Computational biology is a relatively new interdisciplinary branch
of science.
It evolves on an intersection of biology, mathematics and computer
science.
Its main goal is to develop mathematical models of biological
phenomena,
especially at a molecular level. Then, the models are used to
construct
algorithmic methods dedicated for supporting an analysis of these
phenomena and
structures of biomolecules (DNA, RNA and proteins).
Crucial discoveries in molecular biology made in the second half
of the 20th
century showed that biology is in fact information processing.
The information
processed in living organisms is encoded in macromolecules present
in each
cell, i.e. nucleic acids (DNA and RNA) and proteins.
In order to fully understand mechanisms of life it is necessary
to
understand biological processes taking place at a molecular level.
Such
processes are controlled by genetic information contained
in DNA (in some viruses the information is encoded in RNA).
Consequently, the first stage in such studies is usually to read
DNA
sequences, which is a basic step in genomic research. Due to some
technological constraints that reading cannot be done directly,
e.g. using
a microscope. More complex methods have to be used, instead and
they
usually consist of two main stages: the biochemical and the computational
one. Problems arising in reading DNA sequences are almost always
computationally intractable.
Reading DNA sequence is usually only the first step
in broader genomic research. The main challenge is to understand
information contained in the sequences. The goal is to discover
all genes,
to understand their functions and, moreover, to understand the
regulatory
functions of DNA, thanks to which the genes work in the right
place,
i.e. in the proper tissue, and time. This is an extremelly difficult
task,
mainly because of the fact that there are many unknown interactions
between genes and their products (i.e. proteins). Moreover, it
is
estimated that in a human genome only a very small fraction (few
percents)
of DNA sequences encodes proteins (which are building blocks of
all living
organisms). The role of the most of the remainder is unknown and
it is
usually called junk DNA, since there is a supposition that it
has no function.
However, it may be also the case that a great part of the non-coding
DNA
plays some regulatory functions (which are crucial for the proper
functionality
of an organism).
Nowadays, the process of discovering genes and their functions
is based mainly on
a sequence comparison. Since databases of biological sequences
grow rapidly,
such a process requires advanced algorithmic methods. When a new
gene is discovered
researchers usually try to find a similar already known gene in
order to
infer its function. The problem is not trivial, not only because
of huge
number of sequences in databases, but also because there is not
known
well-defined notion of similarity of proteins and its relations
to similarity
of protein functions. It is not enough to make a simple search
for patterns in
sequences. More flexible approaches are required as, for example,
searching
for motifs. Moreover, a pairwise comparison of sequences is often
useless,
especially in a reconstruction of evolutionary history. In that
case,
on the basis of mutation of analogous genes coming from different
organisms,
researchers try to infer their evolutionary history.
In other words, in evolutionary research the goal is to construct
a phylogenetic (evolutionary) tree for a given group of taxa.
In such
a tree the leaves correspond to currently living organisms and
the internal
nodes represent their hypothetical ancestors. These ancestors
should be
inferred from similarities amang organisms represented by the
leaves
in the tree.
The isues mentioned above are only selected classical problems
of computational
biology. But since this branch of science is stimulated by rapidly
increasing
amount of available genetic information, it evolves very fast
and new exciting
problems appear very often.
Solution to many of them requires new mathematical methods among
which
combinatorial optimization approaches appear to be especially
useful.