Chapter 10 Fastq format
The next exercise will focus on a set of files including fastq files.
- Fastq files are very commonly used in bioinformatics.
- Fastq files contain DNA or Amino acid sequencing data.
- Fastq files contain the nucleotide/amino acid content and its sequencing quality for sequences.
- Generally these files are separated by sample but not always.
- A fastq file acts as a normal txt file that can be read but is of a specific format.
- One fastq file contains many fastq entries, one after the other.
- Each fastq entry contains four lines.
- One fastq entry represents one sequence.
The format of one entry is as below:
@Sequence 1
CTGTTAAATACCGACTTGCGTCAGGTGCGTGAACAACTGGGCCGCTTT
+
=<<<=>@@@ACDCBCDAC@BAA@BA@BBCBBDA@BB@>CD@A@B?B@@
The lines represent:
1. Header for fastq entry known as the fastq header. This always begins with a ‘@’.
2. Sequence content of sequence
3. Quality header. Always begins with a ‘+’. Sometimes also contains the same information as fastq header.
4. Quality values for each base in the 2nd line. NOTE: ‘@’ can be used as quality values.
For more information on the fastq format the below resource is good: https://en.wikipedia.org/wiki/FASTQ_format