Chapter 22 Metagenome binning
A metagenome assembly consists of contigs from many different genomes. At this stage we don't know which contigs come from which species. We could try to taxonomically classify each contig, but there are two problems with this approach:
- Some contigs may be misclassified, which can lead to contigs from the same genome/organism being assigned to different taxa.
- Databases are incomplete and so some contigs will not be classified at all (microbial dark matter).
To alleviate these issues, genomic binning can be carried out. This clusters contigs into bins based on:
- Coverage: Contigs with similar coverage are more likely to be from the same genome.
- Composition: Contigs with similar sequence composition (e.g. GC content or tetranucleotide frequencies) are more likely to belong to the same genome.
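To make the two signals concrete, here is a minimal Python sketch, not the method of any particular binning tool. It describes each contig by its log10 coverage and GC fraction and compares contigs by the distance between those values. The contig names, sequences, coverage values and the helper functions (`gc_content`, `features`, `distance`) are all hypothetical and chosen only for illustration. Real binners use richer composition features and more robust clustering.

```python
from math import hypot, log10


def gc_content(seq):
    """Fraction of G/C among the A, C, G and T bases (ambiguous bases ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def features(seq, coverage):
    """Describe a contig as (log10 coverage, GC fraction). Coverage must be > 0."""
    return (log10(coverage), gc_content(seq))


def distance(f1, f2):
    """Euclidean distance between two feature pairs."""
    return hypot(f1[0] - f2[0], f1[1] - f2[1])


# Hypothetical toy contigs: name -> (sequence, mean coverage)
contigs = {
    "contig_1": ("ATGCGCGGCCGCTAGCGGCC", 50.0),
    "contig_2": ("GGCCGCATGCGCGGCGCTAG", 48.0),
    "contig_3": ("ATATTTAAATTATAGCATAA", 5.0),
}

feats = {name: features(seq, cov) for name, (seq, cov) in contigs.items()}

# Contigs 1 and 2 have similar coverage and GC content, so they sit close together.
d12 = distance(feats["contig_1"], feats["contig_2"])
# Contig 3 differs in both coverage and GC content, so it sits further away.
d13 = distance(feats["contig_1"], feats["contig_3"])

print(f"contig_1 vs contig_2: {d12:.3f}")
print(f"contig_1 vs contig_3: {d13:.3f}")
```

In this toy example, contig_1 and contig_2 would be candidates for the same bin, while contig_3 would be placed elsewhere.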
Genomic binning has been used to discover many new genomes. Additionally, it makes downstream analyses quicker, as the downstream steps are carried out on the individual bins rather than on one large metagenome assembly.
Binning produces "bins" of contigs of various quality (e.g. draft, complete). These bins are also known as MAGs (Metagenome-Assembled Genomes). In other words, a MAG is a single genome whose contigs were assembled together with those of other genomes in a metagenome assembly and later separated from them. The term MAG has been adopted by the GSC (Genomic Standards Consortium).
Before binning, make sure your metagenome assembly is of good quality. Binning requires contigs of good length and good coverage. Contigs that are very short or have extremely low coverage will be excluded from binning.
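The exclusion step can be pictured as a simple filter. The sketch below is illustrative only: the `Contig` record, the `filter_contigs` helper, the example contigs and the threshold values (`min_length=1500`, `min_coverage=1.0`) are assumptions made for this example and are not the defaults of any specific binner. Check the documentation of your binning tool for its actual cut-offs.

```python
from typing import NamedTuple


class Contig(NamedTuple):
    name: str
    length: int      # contig length in base pairs
    coverage: float  # mean read depth across the contig


def filter_contigs(contigs, min_length=1500, min_coverage=1.0):
    """Keep contigs that meet both the (illustrative) length and coverage thresholds."""
    return [
        c for c in contigs
        if c.length >= min_length and c.coverage >= min_coverage
    ]


# Hypothetical contigs for illustration
contigs = [
    Contig("contig_1", 25000, 32.5),
    Contig("contig_2", 800, 40.0),   # too short
    Contig("contig_3", 6000, 0.4),   # coverage too low
    Contig("contig_4", 3000, 12.0),
]

kept = filter_contigs(contigs)
print([c.name for c in kept])  # ['contig_1', 'contig_4']
```

If a large fraction of your contigs would be removed by a filter like this, that is a sign the assembly may need to be improved before binning.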