Chapter 8 Cleaning your data
We will now filter our data to remove any poor quality reads.
First, set the paths for the filtered output files, which will be written to a directory called filtered.
# Write the filtered files to a directory called "filtered", next to the cutadapt output
filtFs <- file.path(path.cut, "../filtered", basename(cutFs))
filtRs <- file.path(path.cut, "../filtered", basename(cutRs))
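As an optional sanity check (not part of the original workflow), you can confirm that the trimmed files produced by cutadapt are present before filtering; cutFs and cutRs are the vectors of trimmed read paths from the previous step.
# Both of these should return TRUE if the cutadapt output files exist
all(file.exists(cutFs))
all(file.exists(cutRs))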
Now run filterAndTrim. This time we use the standard filtering parameters:
maxN = 0: After truncation, sequences with more than 0 Ns will be discarded (DADA2 requires that sequences contain no Ns).
truncQ = 2: Truncate reads at the first instance of a quality score less than or equal to 2.
rm.phix = TRUE: Discard reads that match against the phiX genome.
maxEE = c(2, 2): After truncation, reads with more than 2 "expected errors" will be discarded.
minLen = 60: Remove reads shorter than 60 bases (note: these should already have been removed by cutadapt).
multithread = TRUE: Input files are filtered in parallel.
out <- filterAndTrim(cutFs, filtFs, cutRs, filtRs, maxN = 0, maxEE = c(2, 2),
truncQ = 2, minLen = 60, rm.phix = TRUE, compress = TRUE,
multithread = TRUE)
out
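filterAndTrim returns a matrix with one row per sample, giving the number of reads going in (reads.in) and coming out (reads.out). As a quick optional check, not shown in the original code, you can look at the proportion of reads retained per sample:
# Proportion of reads surviving the filter for each sample
round(out[, "reads.out"] / out[, "reads.in"], 3)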
Some samples have very low read numbers after this filtering step. These could be poor-quality samples, but we also have negative controls in this dataset, which we would expect to contain zero or very few reads.
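If you want to flag these low-count samples programmatically, here is a minimal sketch; the cutoff of 100 reads is an arbitrary illustrative threshold, not a recommendation from this tutorial.
# Samples retaining fewer than 100 reads after filtering
# (negative controls are expected to appear here; the threshold of 100 is illustrative)
low <- out[out[, "reads.out"] < 100, , drop = FALSE]
low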