Chapter 16 Heatmap

Now that we have our combined, unstratified, and normalised table, we can visualise the dataset to see how the two groups compare.

  • Do samples in the same diet group appear to correlate well with each other?
  • Are samples from one diet group distinguishable from those from the other diet group?

To visualise this we will create a heatmap with hclust2.

Before carrying out the command we will need to edit the file. Carry out the following alterations:

Remove the _Abundance part of the sample names, writing the result to a copy that we will use from here on. It is always a good idea to keep the original file in case a mistake happens.

cat diet_unstratified.relab.tsv | sed "s/_Abundance//g" > diet_unstratified.relab.comp.tsv
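
A quick way to confirm the substitution worked is to look at the new header and count any lines that still contain _Abundance. The sketch below runs the same sed substitution on a small made-up table so it can be tried anywhere. The file names toy.relab.tsv and toy.relab.comp.tsv, the sample names, and the values are hypothetical stand-ins, not course data.

```shell
# Build a tiny made-up table whose header mimics the _Abundance suffix (hypothetical data).
workdir=$(mktemp -d)
printf '# Pathway\tS1_Abundance\tS2_Abundance\nPWY-A\t0.6\t0.4\n' > "$workdir/toy.relab.tsv"

# Same substitution as above, written to a copy so the original is kept.
sed "s/_Abundance//g" "$workdir/toy.relab.tsv" > "$workdir/toy.relab.comp.tsv"

# Show the new header, one column name per line.
head -n 1 "$workdir/toy.relab.comp.tsv" | tr '\t' '\n'

# Count lines that still contain the suffix; 0 means every occurrence was removed.
# grep exits non-zero when nothing matches, hence the "|| true".
remaining=$(grep -c "_Abundance" "$workdir/toy.relab.comp.tsv" || true)
echo "Lines still containing _Abundance: $remaining"
```

To run the same check on the real data, point the head and grep commands at diet_unstratified.relab.comp.tsv instead of the toy file.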

Next, using your text editor of choice, make the following changes to the file diet_unstratified.relab.comp.tsv.

  • Remove the # (including the one space after the #) from the start of the header so it starts as Pathway.
  • Add in the same metadata line as we did for 12.1 but this time below the header line, i.e. as the 2nd line (ensure you are using tabs instead of spaces).
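
If you would rather script these two edits than make them by hand, the sketch below shows one possible approach on a made-up table. The file names, the sample names, and the metadata labels (diet, groupA, groupB) are placeholders, not the values from the course. Substitute the metadata line you used in 12.1.

```shell
# Build a tiny made-up table with a "# Pathway" header (hypothetical data).
workdir=$(mktemp -d)
printf '# Pathway\tS1\tS2\nPWY-A\t0.6\t0.4\n' > "$workdir/toy.comp.tsv"

# Edit 1: strip the leading "# " from the first line only.
sed '1s/^# //' "$workdir/toy.comp.tsv" > "$workdir/toy.header.tsv"

# Edit 2: print the header, then the metadata line, then the remaining rows.
# awk turns the \t escapes in the -v value into real tab characters.
# The labels here are placeholders; use your own metadata line.
awk -v meta='diet\tgroupA\tgroupB' 'NR == 1 { print; print meta; next } { print }' \
    "$workdir/toy.header.tsv" > "$workdir/toy.edited.tsv"

# Display the first two lines unambiguously: tabs appear as \t.
head -n 2 "$workdir/toy.edited.tsv" | sed -n 'l'
```

Writing each step to a new file rather than editing in place keeps the earlier versions available if something goes wrong.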

If you are having issues with creating and editing the file diet_unstratified.relab.comp.tsv, you can copy a pre-made version.

cp /pub14/tea/nsc206/NEOF/Shotgun_metagenomics/lefse/diet_unstratified.relab.comp.tsv .

Now we can use the hclust2 tool to create a heatmap of our pathway abundances.

hclust2.py \
-i diet_unstratified.relab.comp.tsv \
-o diet_unstratified.relab.heatmap.png \
--ftop 40 \
--metadata_rows 1 \
--dpi 300

Note: You will get 2 MatplotlibDeprecationWarnings. These are normal and can be ignored. However, ensure these are the only warnings/errors before continuing.

Parameters

  • -i: The input table file.
  • -o: The output image file. The tool does not specify which image file types it supports, but .png is a reliable choice.
  • --ftop: Specifies how many of the top features (pathways in this case) to include in the heatmap.
  • --metadata_rows: Specifies which row(s) contain the metadata used for the group colouring at the top of the heatmap.
    • Row numbers start at 0 for this tool. Therefore our sample names are in row 0 and the diet info is in row 1.
    • Multiple rows can be specified if you have multiple rows of metadata.
      • e.g. --metadata_rows 1,2,3.
  • --dpi: The image resolution in dpi (dots per inch). 300 dpi is used for publication quality images.
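
Because the row numbers start at 0, it can help to list the first column of the table beside the index that --metadata_rows expects. This is a minimal sketch on a made-up table: the file name toy.tsv and its contents are hypothetical, laid out like the edited file with the sample names in the first line and the metadata in the second.

```shell
# Build a tiny made-up table: header, one metadata row, one pathway row (hypothetical data).
workdir=$(mktemp -d)
printf 'Pathway\tS1\tS2\ndiet\tgroupA\tgroupB\nPWY-A\t0.6\t0.4\n' > "$workdir/toy.tsv"

# awk's NR counts lines from 1, so NR - 1 gives the 0-based row number.
# Print that number beside the first column of each row.
rows=$(awk -F '\t' '{ printf "%d\t%s\n", NR - 1, $1 }' "$workdir/toy.tsv")
echo "$rows"
```

In this toy table the diet row is listed with index 1, which matches the --metadata_rows 1 setting used above.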

There are many more options, which are listed on the hclust2 GitHub page.

Visualise

Now we can view the plot.

firefox diet_unstratified.relab.heatmap.png

From this, we can see a small amount of clustering that follows the difference between the Korean and Western diets. Other factors that we do not know about the samples must also come into play. This is normal, as we cannot account for everything, but it is good to try to account for as much as possible.

MCQs

  1. Which pathway stands out the most?
  2. How many clusters are formed based on diet (colours on the tree at the top of the heatmap)?
  3. How many clusters are formed based on pathways (colours on the tree at the side of the heatmap)?

You can look up the pathway names in the table file to see a fuller description.
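
One way to do this lookup from the command line is to search the table for the pathway identifier and print only the first column. This is a sketch on a made-up table: the file name toy.tsv, the identifiers PWY-A and PWY-B, and their descriptions are hypothetical.

```shell
# Build a tiny made-up table with two pathway rows (hypothetical identifiers and names).
workdir=$(mktemp -d)
printf 'Pathway\tS1\tS2\nPWY-A: first made-up pathway\t0.6\t0.4\nPWY-B: second made-up pathway\t0.1\t0.9\n' > "$workdir/toy.tsv"

# Search for the identifier as a fixed string (-F) and keep only the first column,
# which holds the full pathway name.
full_name=$(grep -F "PWY-B" "$workdir/toy.tsv" | cut -f 1)
echo "$full_name"
```

For the real data, replace the search string with the identifier shown on the heatmap and the file with diet_unstratified.relab.comp.tsv.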