Chapter 8 Cleaning your data

We will now filter our data to remove any poor-quality reads.
First, set the path to a directory called filtered that will store the filtered output files.
filtFs <- file.path(path.cut, "../filtered", basename(cutFs))
filtRs <- file.path(path.cut, "../filtered", basename(cutRs))
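As a quick optional sanity check (not part of the original workflow), you can confirm that the forward and reverse lists are the same length and that the constructed output paths look sensible; head(), basename(), and stopifnot() are base R functions.

# Optional check: forward and reverse file lists should be the same length
stopifnot(length(filtFs) == length(filtRs))
# Inspect the first few output paths
head(basename(filtFs))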
Now run filterAndTrim. This time we use the standard filtering parameters:

maxN = 0: after truncation, sequences with more than 0 Ns will be discarded (DADA2 requires that sequences contain no Ns).
truncQ = 2: truncate reads at the first instance of a quality score less than or equal to 2.
rm.phix = TRUE: discard reads that match against the phiX genome.
maxEE = c(2, 2): after truncation, reads with more than 2 "expected errors" will be discarded.
minLen = 60: remove reads with length less than 60 (note that these should already have been removed by cutadapt).
multithread = TRUE: input files are filtered in parallel.
out <- filterAndTrim(cutFs, filtFs, cutRs, filtRs, maxN = 0, maxEE = c(2, 2),
truncQ = 2, minLen = 60, rm.phix = TRUE, compress = TRUE,
multithread = TRUE)
out

Some samples have very low read numbers after this filtering step. These could be poor-quality samples, but we also have negative controls in this dataset, so we would expect those to contain zero or very few reads.
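As an optional extra check (not shown in the original text), the matrix returned by filterAndTrim has reads.in and reads.out columns, so you can calculate the percentage of reads retained per sample and flag near-empty samples such as the negative controls; the cut-off of 10 reads used below is an arbitrary illustrative value.

# Percentage of reads retained per sample
pct.retained <- round(100 * out[, "reads.out"] / out[, "reads.in"], 1)
cbind(out, pct.retained)

# Samples with very few surviving reads (e.g. negative controls);
# 10 is an arbitrary example threshold
rownames(out)[out[, "reads.out"] < 10]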