Chapter 14 Bioinformatic Languages
Unfortunately, you cannot do everything you might want directly on the Linux command line, and some of the tasks you can do there are inefficient or awkward. Fortunately, there are many programming languages you can use instead.
There are a large number of programming languages, and it can be hard to know which one to learn. Below is a list of languages commonly used in bioinformatics, each with a brief summary of its purpose and links to online resources that introduce the language.
14.1 awk
awk is typically used as a data extraction and reporting tool. It is well suited to this because it is powerful and versatile, processing text files line by line and field by field.
Tutorial:
http://www.grymoire.com/Unix/Awk.html
Manual:
14.2 Python
Python is a general-purpose language used for software development and many other applications. It is favoured in bioinformatics because it is relatively easy to learn and handles strings well (e.g. DNA and protein sequences). There are also many Python packages that help with the analysis of biological and other scientific data.
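As a small illustration of the string handling mentioned above, here is a minimal sketch that uses only the Python standard library. The function names `gc_content` and `reverse_complement` and the example sequence are made up for illustration. The sketch assumes the sequence contains only the bases A, C, G and T.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    dna = "ATGCGTAC"  # hypothetical example sequence
    print(gc_content(dna))          # 0.5
    print(reverse_complement(dna))  # GTACGCAT
```

For real analyses, a dedicated package such as BioPython is likely a better choice than hand-written helpers like these, since it also handles ambiguous bases and common file formats.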
Python website:
BioPython:
14.3 Perl
Perl is a language similar to Python, with similar uses. The main difference is in how the code looks, i.e. its syntax.
Tutorial:
https://www.tutorialspoint.com/perl/perl_introduction.htm
Perl website:
BioPerl:
14.4 Python or Perl?
Generally the answer is Python. Many programs are written in Python, so even if you do not write any programs yourself, it is useful to know a little Python so that you can debug Python scripts.
Some programs are written in Perl, but it has lost a lot of popularity, so fewer and fewer new programs are written in it. Many Perl programs that were once useful are now outdated.
14.5 Ruby
Ruby is a programming language that was becoming more popular in bioinformatics due to its beginner friendliness. However, it does not appear to have been widely adopted.
Tutorial:
Ruby website:
BioRuby:
Why learn Ruby?
14.6 Golang (AKA Go)
Go is another easy-to-learn programming language, with a library made specifically for bioinformatics (biogo). It has not been widely adopted.
Tutorial:
https://tour.golang.org/welcome/1
Go website:
biogo:
14.7 R
R is a very powerful programming language for statistical analysis and visualisation. Unfortunately, it has a high barrier to entry and its documentation is often unclear. However, it is unlikely to be surpassed by another language any time soon because of its widespread use and the large number of useful, powerful and freely available packages for a wide range of tasks. We recommend using the RStudio IDE when working with R.
Tutorial:
R website:
CRAN website:
RStudio website: