Chapter 23 Depth calculation
We will use MetaBAT2 for genome binning.
It requires depth information of the contigs.
Depth will be calculated for each base of the assembly by aligning the trimmed reads to the metagenome assembly. The depth will be equal to the number of reads/sequences that align to the base, this represents how many times the base was sequenced. If a base from an assembly aligns to 10 of the reads/sequences used for assembly it will have a depth of 10x. The average of all the base depths in a contig is its average depth.
23.1 Index assembly
For the alignment we will use bwa. We need to index our assembly file prior to alignment.
23.2 Alignment
Next we will align our trimmed paired reads we used to create the stitched reads.
We will carry this out with the bwa mem command.
bwa mem is a good aligner for short reads.
If you are using long reads (PacBio or Nanopore) minimap2 will be more appropriate.
23.3 SAM to sorted BAM
After alignment we need to get the file ready for the contig depth summarisation step. This requires converting the SAM file to a BAM (binary form of a SAM file) file and then sorting the BAM file.
23.4 Summarise depths
Now we can summarise the contig depths from the sorted bam files with MetaBAT2's jgi_summarize_bam_contig_depths command.
23.5 View depths
Have a look at the depth file.
Many contigs have low depths of less than 10 (totalAvgDepth) and short lengths of less than <1500 (contigLen).
23.6 Summarise with R
For tutorial purposes we will have a better at the depth by summarising the contig depths with R.
Activate R:
23.6.1 Column summaries
Read in the file and get a summary() of it.
#Read in the table as an object called df (short for data frame)
#We want the first row to be the column names (header=TRUE)
#We do not want R to check the column names and "fix" them (check.names=FALSE)
df <- read.table("K1.depth.txt", header=TRUE, check.names=FALSE)
#Create a summary of the data
summary(df)The last command gave us summary information of all the columns. This includes the minimum, maximum, mean, median, and Inter-Quartile Range (IQR) values.
We can see the values of the contigLen and totalAvgDepth are very low. However, this is most likely due to a bunch of short and low coverage contigs which will be ignored by MetaBAT2, our binning tool.
23.6.2 Filter then summarise
We'll remove rows with information on contigs shorter than 1500 and rerun the summary.
MetaBAT2's documentation dictates the minimum contig length should be >=1500 with its default being 2500.
#Set the new object "df_min1500len" as all rows
#where the value in the column "contigLen" of "df"
#Is greater than or equal to 1500
df_min1500len <- df[df$contigLen >= 1500,]
#Summary of our new data frame
summary(df_min1500len)That is looking better.
The minimum average depth for MetaBAT2 is 1 and our minimum value is 2.700 with a maximum of ~93 (your values may differ slightly).