Chapter 10 Fastq format

The next exercise will focus on a set of files including fastq files.

  • Fastq files are very commonly used in bioinformatics.
  • Fastq files contain nucleotide sequencing data, typically DNA reads (not amino acid sequences).
  • For each sequence, a fastq file stores the nucleotide content and the sequencing quality of each base.
  • Generally each file holds the sequences from one sample, but not always.
  • A fastq file is a plain text file, so it can be read like any other text file, but its contents follow a specific format.
  • One fastq file contains many fastq entries, one after the other.
  • Each fastq entry contains four lines.
    • One fastq entry represents one sequence.
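
Because each entry is four lines and entries follow one after the other, a file can be walked in groups of four lines. The snippet below is a minimal sketch of that idea, not a full parser: the function name `read_fastq` and the two short entries are made up for illustration, and it assumes every entry is exactly four lines with no blank lines in between.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality_header, quality) for each entry.

    Hypothetical helper: assumes exactly four lines per entry.
    """
    while True:
        # Take the next four lines and strip the trailing newline from each.
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return  # end of file
        if len(lines) < 4:
            raise ValueError("truncated fastq entry")
        yield tuple(lines)


# Two made-up entries held in memory instead of a file on disk.
example = io.StringIO(
    "@Sequence 1\nCTGTT\n+\n=<<<=\n"
    "@Sequence 2\nAACCG\n+\n@@ACD\n"
)

entries = list(read_fastq(example))
print(len(entries), "entries")
for header, sequence, quality_header, quality in entries:
    print(header, sequence, quality)
```

For a real file, the same function could be given the handle returned by `open("reads.fastq")`, where `reads.fastq` is a placeholder file name.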

One entry has the following format:

@Sequence 1
CTGTTAAATACCGACTTGCGTCAGGTGCGTGAACAACTGGGCCGCTTT
+
=<<<=>@@@ACDCBCDAC@BAA@BA@BBCBBDA@BB@>CD@A@B?B@@

The four lines represent:
1. Fastq header. This identifies the sequence and always begins with a ‘@’.
2. The sequence itself, i.e. its nucleotide bases.
3. Quality header. This always begins with a ‘+’. It sometimes repeats the information in the fastq header.
4. Quality values, one character for each base in the 2nd line. NOTE: ‘@’ is also a valid quality value, so a line that starts with ‘@’ is not necessarily a fastq header.
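
The four-line layout described above can be checked on the example entry. The sketch below splits the entry into its lines, confirms the ‘@’ and ‘+’ prefixes, and confirms that there is one quality character per base. The final step converts quality characters to numeric scores and rests on an assumption not stated in this chapter: that the file uses the common Phred+33 encoding, where the score is the character's ASCII code minus 33.

```python
entry = """@Sequence 1
CTGTTAAATACCGACTTGCGTCAGGTGCGTGAACAACTGGGCCGCTTT
+
=<<<=>@@@ACDCBCDAC@BAA@BA@BBCBBDA@BB@>CD@A@B?B@@"""

# One entry is exactly four lines.
header, sequence, quality_header, quality = entry.split("\n")

# Line 1 starts with '@', line 3 starts with '+'.
assert header.startswith("@")
assert quality_header.startswith("+")

# Line 4 holds one quality character per base in line 2.
assert len(sequence) == len(quality)

# ASSUMPTION: Phred+33 encoding (ASCII code minus 33).
PHRED_OFFSET = 33
scores = [ord(char) - PHRED_OFFSET for char in quality]

print(len(sequence), "bases")
print("first five scores:", scores[:5])
```

Under that assumption, a ‘@’ in the quality line (ASCII 64) corresponds to a score of 31, which is why ‘@’ appears so often in line 4 of good-quality reads.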

For more information on the fastq format, see: https://en.wikipedia.org/wiki/FASTQ_format