Chapter 13 Advanced practice exercise

Below is a set of tasks and questions that will require all the skills you have gained from the advanced Linux practical.

An example solution follows each task. These are examples rather than definitive answers: if your method works and you understand why, then you have carried it out correctly.

13.1 Advanced tasks

Task 1

Copy the directory ~/Linux/advanced_practice to ~/Linux/advanced_practice_exercise

cp -r ~/Linux/advanced_practice ~/Linux/advanced_practice_exercise

Task 2

Move into ~/Linux/advanced_practice_exercise

cd ~/Linux/advanced_practice_exercise

Task 3

Make a directory called fastq and one called txt

mkdir fastq txt

Task 4

With one command move all the fastq files into the directory fastq

mv *.fastq fastq/

Task 5

With one command move all the txt files, excluding metadata.txt and samples.txt, into the directory txt

mv sample_*txt txt/
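
To see why the pattern sample_*txt leaves metadata.txt and samples.txt behind, the sketch below recreates the situation in a throwaway directory. The file names sample_1_AAAA.txt and sample_2_AAAT.txt are stand-ins chosen for illustration and may not match the practice data exactly.

```shell
# Work in a temporary directory so nothing in ~/Linux is touched
demo_dir=$(mktemp -d)
cd "$demo_dir"
mkdir txt
# Hypothetical file names standing in for the practice data
touch sample_1_AAAA.txt sample_2_AAAT.txt metadata.txt samples.txt
# The underscore after "sample" stops the pattern from matching samples.txt
mv sample_*txt txt/
# List what was moved and what stayed behind
moved=$(cd txt && echo *)
left=$(echo *.txt)
echo "moved: $moved"
echo "left behind: $left"
```

The pattern only matches names that begin with "sample_", so samples.txt (no underscore after "sample") and metadata.txt are never selected.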

Task 6

Create a file in the fastq directory called patient_1_corrected.fastq and put all the corrected fastq data for patient_1 into the file. You can look at the metadata.txt file to see which samples belong to patient_1.

cat fastq/sample_[1-2]_*corrected.fastq > \
fastq/patient_1_corrected.fastq

Task 7

Append the metadata line for sample_1_AAAA to the bottom of the file sample_1_AAAA.txt in the txt directory.

cat metadata.txt | grep "sample_1_AAAA" >> txt/sample_1_AAAA.txt

Task 8

For all the corrected fastq files, find the sequences that start with a stop codon in the forward orientation (i.e. TAG, TAA or TGA). Print the sample name and sequence to the screen, separated only by a “:” (e.g. sample_10_AAGT:TAAGAGAACAATGAACAGATATTAATAATTTTGCCGCTTTTCTGCGGGAT)

grep "^TA[AG]\|^TGA" fastq/*corrected.fastq | \
sed "s/.*sample/sample/" | sed "s/_corrected.fastq//"
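
A quick way to check a pipeline like this is to run it on a tiny data set where the answer is known in advance. The sketch below builds two made-up fastq files (the reads and quality strings are invented for illustration) and uses grep -E, an equivalent spelling of the same pattern with extended regular expressions. Two files are used on purpose: grep only prefixes each match with the file name when it is given more than one file.

```shell
# Build a throwaway fastq directory with two made-up files
demo_dir=$(mktemp -d)
mkdir "$demo_dir/fastq"
cd "$demo_dir"
# In each file exactly one read starts with a stop codon (TAA or TGA)
printf '@read1\nTAAGAGAACA\n+\nIIIIIIIIII\n@read2\nGATTACAGAT\n+\nIIIIIIIIII\n' \
  > fastq/sample_10_AAGT_corrected.fastq
printf '@read1\nCCCTGAAAAA\n+\nIIIIIIIIII\n@read2\nTGACCGTTAG\n+\nIIIIIIIIII\n' \
  > fastq/sample_11_AAGG_corrected.fastq
# Match lines starting with a stop codon, then trim the path and file suffix
result=$(grep -E "^(TA[AG]|TGA)" fastq/*corrected.fastq | \
  sed "s/.*sample/sample/" | sed "s/_corrected.fastq//")
echo "$result"
```

Only the two reads that begin with a stop codon should be printed, each as sample name, a colon, then the sequence.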

Task 9

Count the number of Gs and Cs within file sample_16_AACC.fastq

cat fastq/sample_16_AACC.fastq | grep -B 1 "^+$" | \
grep -v "+\|--" | sed "s/A\|T//g" | tr -d "\n" | wc -c
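
When counting characters, keep in mind that wc -c counts every byte, including the newline at the end of each line, so line endings must be removed before the count is taken. A possible alternative, sketched below on a made-up two-read file, is to print each G or C on its own line with grep -o and count the lines instead. The reads here are invented so that the expected answer (5) is easy to check by eye.

```shell
demo_file=$(mktemp)
# Made-up reads: GGCA has 3 G/C bases and ATCG has 2, so the total is 5
printf '@read1\nGGCA\n+\nIIII\n@read2\nATCG\n+\nIIII\n' > "$demo_file"
# Keep the line before each "+" separator, drop the "+" and "--" lines,
# then print every G or C on its own line and count the lines
gc_count=$(grep -B 1 "^+$" "$demo_file" | grep -v -e "^+$" -e "^--$" | \
  grep -o "[GC]" | wc -l)
echo "$gc_count"
```

Because only G and C are selected, any other character in the sequence lines (such as N) is ignored rather than counted.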

Task 10

With one command, get the fastq headers of sequences containing a homopolymer of five or more As from the uncorrected fastq files for samples 3, 4, 5, 13, 14 and 15.

cat fastq/*[3-5]*[AGCT].fastq | grep -B 2 "^+$" | \
grep -B 1 "AAAAA" | grep "^@" | sed "s/^@s/S/" | \
sed "s/_[AGCT]*_/: Sequence /" | sed "s/ 1:$//"
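
The file selection in this solution relies on the wildcard pattern *[3-5]*[AGCT].fastq. The sketch below shows which names that pattern picks up, using empty files with hypothetical names (the real practice data may differ). Note that the pattern matches any name containing a 3, 4 or 5 before the final base letter, so a sample_23 file, if one existed, would be included as well.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
# Hypothetical file names; sample_23 is included only to show over-matching
touch sample_3_AAAC.fastq sample_13_AACA.fastq sample_6_AAAG.fastq \
  sample_3_AAAC_corrected.fastq sample_23_ACAA.fastq
# Expand the pattern without reading any file contents
matched=$(echo *[3-5]*[AGCT].fastq)
echo "$matched"
```

The corrected file is skipped because its name ends in "d.fastq" rather than a base letter, and sample_6 is skipped because its name contains no 3, 4 or 5.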

13.2 Advanced exercise conclusion

Superlative! Those were definitely difficult tasks, so great going getting through them all. That is all the practice and challenges done. If you would like to continue, there is some information on other languages, followed by the appendix.