Chapter 2 Overview

2.1 What is metagenomics?

Meta /ˈmɛtə/ : prefix meaning “higher” or “beyond”

Metagenomics is the study of genes and genetic material recovered from environmental samples (whether from the sea, soil, human gut, or anywhere else you can imagine). Unlike genomics, metagenomics deals with a multitude of usually diverse species rather than focussing on a single species/genome.

2.2 Why metagenomics?

Microbes exist virtually everywhere on Earth, even in some of the most seemingly hostile environments. Nearly every process on our planet is influenced in some way by the actions of microbes, and all higher organisms are intrinsically associated with microbial communities.

While much can be learned from studying the genome of a single microbial species in isolation, this tells us nothing about that species' neighbours, i.e. what else lives in its natural environment. Metagenomics offers a top-down approach that allows researchers to investigate and understand interactions between species in different environments, thus providing a much broader and more complete picture.

2.3 Metagenomics vs Metagenetics

Broadly speaking, there are two families of metagenomic analysis:

  • Amplicon-based: This utilises sequencing data generated from amplified marker sequences, for example, regions of the 16S rRNA gene. Sequences are clustered together and taxonomically assigned to estimate the species abundance in a sample. This is sometimes referred to as metagenetics, as it does not involve any genomic analysis beyond the marker gene regions.
  • Shotgun: This utilises sequencing data generated from random fragments of the total genomic DNA in an environmental sample, rather than from targeted genes. Because the reads are sampled from across the genomes of the whole community, this approach allows not only species abundance estimation but also direct functional analysis. This is sometimes referred to as metagenomics, as it involves genome-wide analyses. Shotgun metagenomics is the focus of this practical session.
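
As a rough illustration of what "estimating species abundance" means in either approach, the sketch below turns a list of per-read taxonomic labels into relative abundances. It is a simplified example rather than the method used by any particular tool: the function name relative_abundance and the sample labels are made up for illustration, and real profilers also account for factors such as genome size and marker copy number.

```python
from collections import Counter


def relative_abundance(assignments):
    """Return {taxon: fraction of reads} from a list of per-read taxon labels."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        # No assigned reads, so there is nothing to normalise.
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical per-read assignments for one sample (made-up labels).
sample = ["Bacteroides", "Bacteroides", "Escherichia", "Bacteroides", "Lactobacillus"]
profile = relative_abundance(sample)

# Print taxa from most to least abundant.
for taxon, fraction in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{taxon}\t{fraction:.2f}")
```

Dividing by the total number of assigned reads makes samples with different sequencing depths comparable, which is why profiles are usually reported as fractions or percentages rather than raw counts.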

2.4 Tutorial overview

2.4.1 Basics

This tutorial and practical session focus on performing a range of metagenomic analyses using shotgun sequence data from Illumina platforms.

The analyses discussed here are by no means exhaustive and are instead intended to provide a sample of what can be done with a metagenomic dataset.

2.4.2 Structure

We prefer to allow people to work at a pace that they are comfortable with rather than ensuring that everyone is at the same point of the tutorial at the same time. There will be no instructor telling you what to type and click. Instead, everything you require to carry out the practical is written in this document. Take your time; it's important to spend some time understanding why you are running the commands, rather than simply typing them out.

If at any point you are having trouble or have a question, let one of us know and we'll provide 1-to-1 assistance.

2.4.3 Content

This practical is broken up into the following broad sections.

  1. Raw data: We will first link to a dataset that we have downloaded for this tutorial. We will take a quick look at what the sequence files look like and briefly discuss the origin of the samples.
  2. Trimming data: This entails preprocessing our data to ensure that it is of good quality.
  3. Host removal: When sequencing the genomic content of a host's microbiota (bacteriome, archaeome, mycobiome, and more), it is likely that you will also sequence DNA from the host's own genome. This step shows a method of removing possible host contamination.
  4. Taxonomic profiling: We will analyse the dataset to determine the species abundance in each sample. Following this, we will visualise the data and compare the samples.
  5. Functional profiling: We will analyse the dataset to determine the pathway abundance and completeness in each sample. Following this, we will visualise the data and compare the samples.
  6. Metagenome assembly: Here, we will move away from just analysing the reads directly and will assemble the metagenome into contigs. Prior to this, we will 'stitch' the reads together to ensure we get the best assembly possible.
  7. Binning: This step attempts to separate the assembled contigs into bins, with each bin representing a single genome. These genome assemblies are called metagenome-assembled genomes (MAGs).
  8. Functional annotation: We will take our MAGs, predict genes and then functionally annotate them with MetaCyc.
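
To give a feel for what the trimming step (step 2) does to an individual read, here is a minimal sketch that removes low-quality bases from the 3' end of a read. It is not the tool used in the practical. It assumes the common FASTQ encoding in which each quality character represents a Phred score of ord(character) - 33, and the function name trim_3prime and the threshold of 20 are illustrative choices only.

```python
def trim_3prime(sequence, quality, min_phred=20, offset=33):
    """Trim low-quality bases from the 3' end of one read.

    quality is the FASTQ quality string; each character encodes a Phred
    score as ord(char) - offset (an offset of 33 is assumed here).
    """
    end = len(sequence)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and ord(quality[end - 1]) - offset < min_phred:
        end -= 1
    return sequence[:end], quality[:end]


# 'I' encodes Phred 40, '#' encodes Phred 2 and '!' encodes Phred 0.
seq, qual = trim_3prime("ACGTACGT", "IIIII#!#")
print(seq, qual)
```

Dedicated trimming tools typically do more than this, for example removing adapter sequences and discarding reads that become too short, but the underlying idea of cutting back to the last trustworthy base is the same.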