Keynote Speaker - Piotr Formanowicz

On some combinatorial problems of computational biology

Computational biology is a relatively new interdisciplinary branch of science.
It evolves on an intersection of biology, mathematics and computer science.
Its main goal is to develop mathematical models of biological phenomena,
especially at a molecular level. Then, the models are used to construct
algorithmic methods dedicated for supporting an analysis of these phenomena and
structures of biomolecules (DNA, RNA and proteins).

Crucial discoveries in molecular biology made in the second half of the 20th
century showed that biology is in fact information processing. The information
processed in living organisms is encoded in macromolecules present in each
cell, i.e. nucleic acids (DNA and RNA) and proteins.

In order to fully understand mechanisms of life it is necessary to
understand biological processes taking place at a molecular level. Such
processes are controlled by genetic information contained
in DNA (in some viruses the information is encoded in RNA).
Consequently, the first stage in such studies is usually to read DNA
sequences, which is a basic step in genomic research. Due to some
technological constraints that reading cannot be done directly, e.g. using
a microscope. More complex methods have to be used, instead and they
usually consist of two main stages: the biochemical and the computational
one. Problems arising in reading DNA sequences are almost always
computationally intractable.

Reading DNA sequence is usually only the first step
in broader genomic research. The main challenge is to understand
information contained in the sequences. The goal is to discover all genes,
to understand their functions and, moreover, to understand the regulatory
functions of DNA, thanks to which the genes work in the right place,
i.e. in the proper tissue, and time. This is an extremelly difficult task,
mainly because of the fact that there are many unknown interactions
between genes and their products (i.e. proteins). Moreover, it is
estimated that in a human genome only a very small fraction (few percents)
of DNA sequences encodes proteins (which are building blocks of all living
organisms). The role of the most of the remainder is unknown and it is
usually called junk DNA, since there is a supposition that it has no function.
However, it may be also the case that a great part of the non-coding DNA
plays some regulatory functions (which are crucial for the proper functionality
of an organism).

Nowadays, the process of discovering genes and their functions is based mainly on
a sequence comparison. Since databases of biological sequences grow rapidly,
such a process requires advanced algorithmic methods. When a new gene is discovered
researchers usually try to find a similar already known gene in order to
infer its function. The problem is not trivial, not only because of huge
number of sequences in databases, but also because there is not known
well-defined notion of similarity of proteins and its relations to similarity
of protein functions. It is not enough to make a simple search for patterns in
sequences. More flexible approaches are required as, for example, searching
for motifs. Moreover, a pairwise comparison of sequences is often useless,
especially in a reconstruction of evolutionary history. In that case,
on the basis of mutation of analogous genes coming from different organisms,
researchers try to infer their evolutionary history.
In other words, in evolutionary research the goal is to construct
a phylogenetic (evolutionary) tree for a given group of taxa. In such
a tree the leaves correspond to currently living organisms and the internal
nodes represent their hypothetical ancestors. These ancestors should be
inferred from similarities amang organisms represented by the leaves
in the tree.

The isues mentioned above are only selected classical problems of computational
biology. But since this branch of science is stimulated by rapidly increasing
amount of available genetic information, it evolves very fast and new exciting
problems appear very often.

Solution to many of them requires new mathematical methods among which
combinatorial optimization approaches appear to be especially useful.