Chapter 8 File reading and processing

There are many ways to show the contents of a file. Below are a few examples.

The files for the examples are in the directory "/pub14/tea/nsc2xx/Linux/5_reading_files/" ("nsc2xx" is a placeholder: substitute your own user number).

8.2 head and tail

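The head command prints the first n lines of a file to the screen.

The tail command prints the last n lines of a file to the screen.

By default n is 10. Use the -n option to choose how many lines to print. With head, a negative number such as -n -2 prints all but the last 2 lines. With tail, a number prefixed with + such as -n +2 prints every line from line 2 onwards.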
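To see the default without needing the course files, the sketch below builds a throwaway 20-line file with seq and compares the default output with an explicit -n. The file name numbers.txt is a hypothetical choice, and the sketch assumes the GNU tools found on most Linux systems (mktemp -d creates the scratch directory).

```shell
# Work in a fresh temporary directory so no real files are touched
cd "$(mktemp -d)"

# Hypothetical example file containing the numbers 1 to 20, one per line
seq 1 20 > numbers.txt

# Without -n, head and tail each print 10 lines
head numbers.txt        # lines 1 to 10
tail numbers.txt        # lines 11 to 20

# -n sets the number of lines explicitly
head -n 3 numbers.txt   # lines 1 to 3
tail -n 3 numbers.txt   # lines 18 to 20
```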

Carry out the commands below in the directory: "/pub14/tea/nsc2xx/Linux/5_reading_files/"

Print out the top 10 lines of "ecoli.gbk"

head ecoli.gbk

Print out the bottom 10 lines of "ecoli.gbk"

tail ecoli.gbk

Print out the top 25 lines of "ecoli.gbk"

head -n 25 ecoli.gbk

Print out the bottom 2 lines of "ecoli.gbk"

tail -n 2 ecoli.gbk

Print out all but the bottom 2 lines of "Scientist.txt"

head -n -2 Scientist.txt

Print out all lines of "Scientist.txt" starting from the 2nd line

tail -n +2 Scientist.txt

Print out all but the bottom 5 lines of "Scientist.txt"

head -n -5 Scientist.txt

Print out all lines of "Scientist.txt" starting from the 3rd line

tail -n +3 Scientist.txt
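A small sketch makes the difference between these two forms visible. It assumes GNU head and tail (the negative -n form for head is a GNU feature) and uses a hypothetical five-line tab-separated file in place of "Scientist.txt". A common practical use of tail -n +2 is dropping a header line.

```shell
cd "$(mktemp -d)"

# Hypothetical tab-separated file: one header line followed by four data lines
printf 'Name\tBorn\nA\t1800\nB\t1850\nC\t1900\nD\t1950\n' > people.tsv

# All but the last 2 lines: the header, A and B
head -n -2 people.tsv

# Every line from line 2 onwards, i.e. the data without its header
tail -n +2 people.tsv
```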

Print out the top 25 lines of "ecoli.gbk". A leading + is optional for head, so this gives the same output as head -n 25 ecoli.gbk.

head -n +25 ecoli.gbk

Print out the bottom 2 lines of "ecoli.gbk". A leading - is optional for tail, so this gives the same output as tail -n 2 ecoli.gbk.

tail -n -2 ecoli.gbk
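One way to confirm that two forms of a command really are equivalent is to save their outputs and compare them with cmp, which prints nothing and succeeds when the files are identical. This is a sketch assuming GNU head and tail, with a generated 100-line file standing in for "ecoli.gbk"; the file names are hypothetical.

```shell
cd "$(mktemp -d)"
seq 1 100 > lines.txt   # stand-in for a real file such as ecoli.gbk

# Save the output of each form, then compare the saved files byte by byte
head -n +25 lines.txt > head_plus.txt
head -n 25  lines.txt > head_plain.txt
cmp head_plus.txt head_plain.txt && echo "head forms match"

tail -n -2 lines.txt > tail_minus.txt
tail -n 2  lines.txt > tail_plain.txt
cmp tail_minus.txt tail_plain.txt && echo "tail forms match"
```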

8.3 File viewing with less

The less command displays a file’s contents one page at a time, and various keys let you navigate through the file. The same keys work when reading manual pages with the man command, since man normally uses less to display them.

  • q : Exit
  • up and down arrow keys : Move up/down one line at a time
  • space : Move down one page
  • b : Move up one page
  • / : Type a search term after this and press Enter to search for it in the file
  • n : Find the next occurrence of the last search term
  • N : Find the previous occurrence of the last search term
  • g : Jump to the first line of the file
  • G : Jump to the last line of the file

Use the less command to view the contents of the "ecoli.gbk" file. Then find the 3rd occurrence of the word ‘ribosome’. Afterwards move around the file.

less ecoli.gbk

Look at the manual for less and search for the first occurrence of the string ‘percent’. Afterwards look around the manual page.

man less

8.4 Word count

The wc (word count) command counts the lines, words and bytes in files. By default it displays these three counts, in that order, followed by the file name.

Use wc to see the line, word and byte counts of the "short_file.txt", "Scientist.txt" and "ecoli.gbk" files. As this shows, wc can process multiple files at once; when given more than one file it also prints a final line with the totals.

wc short_file.txt Scientist.txt ecoli.gbk
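A small sketch with two hypothetical files shows the column order and the totals row. The file names and contents are made up for illustration.

```shell
cd "$(mktemp -d)"

# Two hypothetical files
printf 'one two three\nfour five\n' > a.txt   # 2 lines, 5 words, 24 bytes
printf 'six\n' > b.txt                        # 1 line, 1 word, 4 bytes

# One row per file (lines, words, bytes, name), then a row labelled "total"
wc a.txt b.txt
```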

Count the number of characters in the "short_file.txt" file

wc -m short_file.txt

Count the number of lines in the "ecoli.gbk" file

wc -l ecoli.gbk
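The sketch below tries each single-count option on a small hypothetical file. Note that -m (characters) and -c (bytes) only give different numbers when the file contains multi-byte characters, such as accented letters in UTF-8 text.

```shell
cd "$(mktemp -d)"
printf 'one two three\nfour five\n' > short.txt   # hypothetical example file

wc -l short.txt   # line count:      2
wc -w short.txt   # word count:      5
wc -c short.txt   # byte count:      24
wc -m short.txt   # character count: 24 (equal to bytes for plain ASCII text)

# Redirecting the file into wc with < prints just the number, without the file name
wc -l < short.txt
```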

8.5 Pattern searching

The grep command will search for a pattern in a text file and output all the lines containing the pattern.

Print out the lines from "Scientist.txt" that contain the number 18. In this particular file that happens to give all the scientists born in the 1800s, but this depends on the data: grep matches "18" anywhere in a line, not only at the start of a year.

grep "18" Scientist.txt
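To see why this will not always list only the 1800s, the sketch below runs the same search on a hypothetical file where one match comes from a different century. The names and years are placeholders, not the contents of "Scientist.txt".

```shell
cd "$(mktemp -d)"

# Hypothetical name<TAB>year lines (placeholder data)
printf 'Person A\t1815\nPerson B\t1918\nPerson C\t1776\n' > people.txt

# Prints Person A and Person B: "18" appears inside both 1815 and 1918
grep "18" people.txt
```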

Print out the lines which have the string "Ada" in them.

grep "Ada" Scientist.txt

Print out the lines which contain the string "ada". This should print nothing, because grep is case sensitive.

grep "ada" Scientist.txt
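grep's -i option makes the search ignore case, which is handy when you do not know how a word is capitalised. A minimal sketch using a hypothetical three-line file:

```shell
cd "$(mktemp -d)"
printf 'Ada\nADA\nada\n' > names.txt   # hypothetical example file

grep "ada" names.txt      # case sensitive: prints only "ada"
grep -i "ada" names.txt   # -i ignores case: prints all three lines
```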

Type in the following command.

grep Scientist.txt

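The above command appears to hang. With only one argument, grep treats "Scientist.txt" as the pattern to search for and, because no file was given, waits for text to be typed on standard input (the keyboard). To cancel the command, press Ctrl+C.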
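Piping some text into the same command shows what grep was waiting for. In this sketch printf supplies two example lines through a pipe, so grep has input and finishes immediately.

```shell
# With a single argument, grep uses it as the PATTERN and reads the lines to
# search from standard input; here printf supplies two lines through a pipe
printf 'notes.txt\nScientist.txt\n' | grep Scientist.txt
# The line "Scientist.txt" is printed because it matches the pattern
```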

8.6 Text editor

Three of the most popular text editors are vim, gedit and nano. Below is a quick introduction to nano.

nano is the easiest to learn but is fairly limited. vim and gedit are similar in power, and different people prefer one or the other.

The section below teaches you nano. If you want to learn vim later, you can find a quick guide in the appendix.

8.6.1 nano

To open the nano text editor, run the nano command followed by a file name: nano file.txt.

You can give nano the name of an existing file, which it opens for editing, or a new file name, in which case nano creates that file when you save it.

Once you are in the editor you can type characters and move around with the arrow keys.

To carry out specific functions you press Ctrl or Alt together with another key. A few of these functions are listed at the bottom of the editor, where ^ indicates Ctrl and M- indicates Alt. For example, ^G Get Help means you press Ctrl+G to get help. Letters used this way in nano are case insensitive (i.e. Caps Lock can be on or off and you will get the same result).

After you carry out a function, check the bottom of the editor again: it may ask you to type something, or it may show a new set of functions you can use.

Below are some important examples:

  • Ctrl+X - Exit nano
  • Ctrl+S - Save file
  • Ctrl+O - Save file as
  • Ctrl+A - Jump to the start of a line
  • Ctrl+E - Jump to the end of a line
  • Ctrl+W - Start search (Where Is). Note: this is also the shortcut for closing a tab in web browsers, so it can't be used within our webVNC.
  • Alt+W - Continue search forward (find next occurrence forward)
  • Alt+Q - Find next occurrence backward
  • Ctrl+K - Cut current line
  • Alt+\ - Go to the first line
  • Alt+/ - Go to the last line

Nano cheatsheet

8.6.2 Tasks

Carry out the following tasks in the directory: "/pub14/tea/nsc2xx/Linux/5_reading_files/"

Using a text editor (nano or vim), add an entry for the scientist Mae Jemison (born 1956) to the file "Scientist.txt". The names and date are separated by a single tab.
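To check that you typed a tab rather than spaces, you can display a file with cat -A (an option of GNU cat), which shows each tab as ^I and marks each line end with $. The sketch below uses two hypothetical one-line files rather than the course's "Scientist.txt".

```shell
cd "$(mktemp -d)"

# Two hypothetical one-line files: one uses a tab, the other four spaces
printf 'Example\t2000\n' > tab_check.txt
printf 'Example    2000\n' > space_check.txt

cat -A tab_check.txt     # Example^I2000$   (^I is the tab)
cat -A space_check.txt   # Example    2000$ (no ^I, so spaces were used)
```

Running cat -A Scientist.txt after editing should show ^I wherever a tab separates the fields.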

Using your text editor of choice, delete all the scientists born before 1000 from the "Scientist.txt" file and save the result as "Scientist_post_1000.txt".

8.7 MCQs: File reading and processing

Please attempt the multiple-choice questions below to reinforce what you have learnt in this chapter.

  1. What command searches for a pattern?
  2. What command word counts files?
  3. What command prints the contents of a file?
  4. What command displays a file's contents one page at a time and allows keyboard navigation?
  5. What command prints out the top n lines of a file?
  6. What command prints out the bottom n lines of a file?