Chapter 13 Advanced practice exercise

  1. Copy the directory ~/Linux/advanced_practice to ~/Linux/advanced_practice_exercise
  2. Move into ~/Linux/advanced_practice_exercise
  3. Make a directory called fastq and one called txt
  4. With one command move all the fastq files into the directory fastq
  5. With one command move all the txt files, excluding metadata.txt and samples.txt, into the directory txt
  6. Create a file in the fastq directory called patient_1_corrected.fastq and put all the corrected fastq data for patient_1 into the file. You can look at the metadata.txt file to see which samples belong to patient_1.
  7. Append the metadata line for sample_1_AAAA to the bottom of the file sample_1_AAAA.txt in the txt directory.
  8. For all the corrected fastq files, find the sequences that start with a stop codon in the forward orientation (i.e. TAG, TAA or TGA). Print to the screen the sample name and the sequence, separated only by a “:” (e.g. sample_10_AAGT:TAAGAGAACAATGAACAGATATTAATAATTTTGCCGCTTTTCTGCGGGAT)
  9. Count the number of Gs and Cs within file sample_16_AACC.fastq
  10. With one command, get the fastq headers of sequences that contain a homopolymer of As of length 5 or greater in the uncorrected fastq files for samples 3, 4, 5, 13, 14 and 15.

You can check my solutions in the Answers section. These are not the definitive solutions, only examples. If your method works and you understand why, then you have done it correctly.
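
If you want to rehearse commands before running them on the exercise data, one option is to try them in a throwaway directory first. The sketch below is not a solution to any of the tasks. It only illustrates a few general building blocks (brace expansion, globs with mv, and tr with wc) on made-up files. The names scratch, reads, notes, demo_*.fastq, demo_*.txt and keep.txt are hypothetical and do not exist in the exercise directory.

```shell
# Build a throwaway playground so experiments cannot damage the exercise data.
# Every file and directory name below is invented for illustration.
scratch=$(mktemp -d)
cd "$scratch"

# mkdir accepts several directory names at once.
mkdir reads notes

# Brace expansion creates several similarly named files in one command.
touch demo_{1,2,3}.fastq demo_{1,2}.txt keep.txt

# A glob selects files by pattern; mv takes many sources and one target directory.
mv demo_*.fastq reads/
mv demo_*.txt notes/   # keep.txt does not match the pattern, so it stays put

# Count the files that ended up in each directory.
n_reads=$(ls reads | wc -l)
n_notes=$(ls notes | wc -l)
echo "reads: $n_reads, notes: $n_notes"

# tr -cd deletes every character except those listed; wc -c counts what is left.
n_at=$(printf 'GATTACA\n' | tr -cd 'AT' | wc -c)
echo "A/T count: $n_at"
```

When you are finished, the scratch directory can be deleted safely because it holds no real data.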