Chapter 11 Exercise 2

The directory "~/Linux/6_final_exercise/" has all the files you need. Below is a set of tasks and questions that will require all the skills you have gained from this practical.
My solutions are given under each task. They are not the definitive solutions, only examples. If your method works and you understand why, then you have carried it out correctly.
11.1 Exercise 2 tasks
Task 3

Make a backup of the files in a directory called backup.
mkdir backup
cp 1-P1_ATGCCTGG_L001_R1_001.fastq backup/
cp 1-P1_ATGCCTGG_L001_R2_001.fastq backup/
cp 2-P2_AAGGACAC_L001_R1_001.fastq backup/
cp 2-P2_AAGGACAC_L001_R2_001.fastq backup/
cp 3-P3_CACTTCGA_L001_R1_001.fastq backup/
cp 3-P3_CACTTCGA_L001_R2_001.fastq backup/
cp 4-E1_ATTGGCTC_L001_R1_001.fastq backup/
cp 4-E1_ATTGGCTC_L001_R2_001.fastq backup/
cp metadata.txt backup/
This can be done much more quickly with wildcard characters (covered in the Advanced Linux section).
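A sketch of the wildcard approach, assuming it is run from inside the exercise directory: `*.fastq` expands to every file name ending in `.fastq`, so a single `cp` replaces the eight separate commands. The function name `backup_all` is hypothetical and only wraps the two commands for reuse; typing them directly works just as well.

```shell
# backup_all DEST (hypothetical helper): copy every .fastq file and
# metadata.txt from the current directory into DEST.
backup_all() {
  mkdir -p "$1"                  # -p: no error if DEST already exists
  cp *.fastq metadata.txt "$1"/  # *.fastq expands to all names ending in .fastq
}
```

Usage: `backup_all backup`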
Task 4

How many reads are in the samples?
The command below gives the number of lines in each file. Divide this number by 4 (mentally or using a calculator) to get the number of reads, because each FASTQ entry takes up four lines (header, sequence, "+" line and quality line). The R2 files have the same values as their matching R1 files.
wc -l 1-P1_ATGCCTGG_L001_R1_001.fastq \
2-P2_AAGGACAC_L001_R1_001.fastq \
3-P3_CACTTCGA_L001_R1_001.fastq
An advanced method uses regular expressions, wildcard characters and grep.
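One possible sketch of such a method (not necessarily the course's own solution): a wildcard passes all R1 files to `wc -l`, `grep -v` with the regular expression ` total$` drops the summary line that `wc` adds when given several files, and `awk` divides each line count by 4. It assumes standard four-line FASTQ entries; `reads_per_file` is a hypothetical name.

```shell
# reads_per_file FILE... (hypothetical helper): print "<reads> <file>"
# for each FASTQ file, assuming 4 lines per entry.
reads_per_file() {
  wc -l "$@" |                  # line count per file, plus a "total" line
    grep -v ' total$' |         # regex: drop the line ending in " total"
    awk '{ print $1 / 4, $2 }'  # reads = lines / 4
}
```

Usage: `reads_per_file *_R1_001.fastq`. Counting header lines with `grep -c '^@'` looks simpler but can overcount, because a quality line may also begin with `@`.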
Task 5

Remove the fastq files with no data.
Check which files have no data
wc \
1-P1_ATGCCTGG_L001_R1_001.fastq 1-P1_ATGCCTGG_L001_R2_001.fastq \
2-P2_AAGGACAC_L001_R1_001.fastq 2-P2_AAGGACAC_L001_R2_001.fastq \
3-P3_CACTTCGA_L001_R1_001.fastq 3-P3_CACTTCGA_L001_R2_001.fastq \
4-E1_ATTGGCTC_L001_R1_001.fastq 4-E1_ATTGGCTC_L001_R2_001.fastq
Remove empty files
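A sketch that removes every empty FASTQ file in one step, assuming "no data" means the file is zero bytes long. `-print` lists each file just before `-delete` removes it, and `-maxdepth 1` keeps `find` out of subdirectories, so the copies in `backup/` are left alone. `-maxdepth`, `-empty` and `-delete` are GNU and BSD `find` extensions rather than POSIX. The function name is hypothetical.

```shell
# remove_empty_fastq (hypothetical helper): delete zero-byte .fastq files
# in the current directory only (not in subdirectories such as backup/).
remove_empty_fastq() {
  find . -maxdepth 1 -type f -name '*.fastq' -empty -print -delete
}
```

To preview which files would be removed, run the same `find` command without `-delete`.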
Task 9

In the file "1-P1_ATGCCTGG_L001_R1_001.fastq", look for sequence headers containing the term ‘psychrobacter’ (tip: use grep).
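A sketch of the grep search, assuming the term may be capitalised in the headers (for example ‘Psychrobacter’): `-i` ignores case and `-n` prefixes each match with its line number, which helps when locating the entries in nano. `show_headers` is a hypothetical name.

```shell
# show_headers TERM FILE (hypothetical helper): print every line of FILE
# containing TERM, ignoring case (-i), with its line number (-n).
show_headers() {
  grep -n -i -- "$1" "$2"
}
```

Usage: `show_headers psychrobacter 1-P1_ATGCCTGG_L001_R1_001.fastq`. To count the matches instead, use `grep -c -i psychrobacter` on the file.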
Task 10

In the sample 1-P1 remove any fastq entries where the term ‘psychrobacter’ appears in the fastq header. Do this for the R1 and R2 files.
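Using nano, navigate to the psychrobacter entries. Press "Ctrl+K" to cut each line of an entry (four lines per entry: header, sequence, "+" line and quality line), then press "Ctrl+S" to save and "Ctrl+X" to exit. Repeat this for the R2 file.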
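As a non-interactive alternative to nano, the sketch below uses `awk` to copy a FASTQ file while skipping every four-line entry whose header contains the term, ignoring case. It assumes standard four-line entries and writes to a new file rather than editing in place; `remove_term_records` and the output file name in the usage line are hypothetical. Run it on both the R1 and R2 files.

```shell
# remove_term_records TERM IN OUT (hypothetical helper): copy IN to OUT,
# skipping each 4-line FASTQ entry whose header contains TERM (any case).
remove_term_records() {
  awk -v term="$1" '
    # Every 4th line starting at line 1 is a header: decide keep/skip here
    NR % 4 == 1 { keep = (index(tolower($0), tolower(term)) == 0) }
    # Print the header and its next 3 lines only if the entry is kept
    keep
  ' "$2" > "$3"
}
```

Usage: `remove_term_records psychrobacter 1-P1_ATGCCTGG_L001_R1_001.fastq 1-P1_R1_filtered.fastq`. Checking the new file with `wc -l` before replacing the original with `mv` is a sensible precaution.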
11.2 Exercise 2 conclusion

Stupendous! You have finished the last exercise of the Intro to Linux section.
Thanks for your hard work. You have learnt a lot throughout this course, but there is more to learn if you are willing and have the time. The next section is the Advanced Linux section. It is not required for any of our other workshops, but its skills are very useful for bioinformatics analysis in Linux.