Chapter 7 Checking the quality of your data
Those of you who attended our Introduction to sequencing data and quality control course will have used FastQC to check the quality of the data. DADA2 has its own quality control function, plotQualityProfile, which plots a similar figure of quality score against read position.
To run it, first import the cutadapt-trimmed files and extract the sample names, then run the plotQualityProfile function.
# Specify the paths and file names of the forward and reverse primer cleaned files
cutFs <- sort(list.files(path.cut, pattern = "_L001_R1_001.fastq", full.names = TRUE))
cutRs <- sort(list.files(path.cut, pattern = "_L001_R2_001.fastq", full.names = TRUE))
# Extract sample names (the second "_"-separated field of each file name)
get.sample.name <- function(fname) strsplit(basename(fname), "_")[[1]][2]
sample.names <- unname(sapply(cutFs, get.sample.name))
head(sample.names)
# Check the quality of the first forward file
plotQualityProfile(cutFs[1:1])

To interpret this plot, the gray-scale heatmap shows the frequency of each quality score at each base position along the forward reads. The green line is the mean quality score and the orange lines are the quartiles (solid for the median, dashed for the 25th and 75th percentiles). The red line at the bottom of the plot shows the proportion of reads that extend to at least that position.
The quality is very good for our forward reads. You can also see that the majority of the forward reads are ~130 bp long after primer removal with cutadapt.
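If you would like to confirm the read lengths numerically rather than reading them off the plot, the following is a minimal sketch. It assumes the cutFs vector created above and uses the ShortRead package (installed as a dependency of DADA2); it is an optional check, not part of the DADA2 workflow itself.

```r
library(ShortRead)

# Read the first primer-trimmed forward file (cutFs is defined above)
fq <- readFastq(cutFs[1])

# Tabulate the read lengths and list the most common ones first
head(sort(table(width(fq)), decreasing = TRUE))
```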
Now check the quality of the first reverse file in the same way.
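A minimal sketch of the reverse-read check, assuming the cutRs vector created above and that the dada2 package is installed:

```r
library(dada2)

# Plot the quality profile of the first reverse file
# (cutRs is the vector of reverse file paths built earlier)
plotQualityProfile(cutRs[1:1])
```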

The reverse reads also look to be of good quality.
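plotQualityProfile accepts several files at once, so the second and third fastq files can be checked by passing cutFs[2:3] (or cutRs[2:3] for the reverse reads) to the function.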
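A minimal sketch of that call, assuming the cutFs and cutRs vectors created above and that they contain at least three files each:

```r
library(dada2)

# Quality profiles for the second and third forward files, one panel per file
plotQualityProfile(cutFs[2:3])

# The same selection for the matching reverse files
plotQualityProfile(cutRs[2:3])
```

Changing the index range (for example cutFs[1:6]) plots more samples. With many files, the aggregate = TRUE argument of plotQualityProfile can be used to combine them into a single summary profile.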