Graph Theory and RNA Structures
Since RNA secondary structures are
essentially 2D networks, they can be represented using planar graphs to
facilitate analysis of RNA structures. Indeed, modeling network motifs
using graphs has proven to be fruitful for many complex systems in biochemistry,
neurobiology, ecology, and engineering.
Graph Theory is a branch of mathematics
that deals with configurations described by nodes and connections. The
configurations may represent physical networks, such as electrical circuits
or chemical compounds, where atoms and bonds are modeled as nodes and connections,
respectively. Formally, such configurations are modeled by graphs consisting
of vertices or nodes (o) and edges or lines (---), which connect the vertices.
In our work, we use planar graphs, whose vertices and edges are drawn in
a 2D plane.
Figures A and B show examples of tree
(A) and non-tree (B) planar graphs. These are connected graphs since
every pair of vertices in the graph is connected by one or more paths;
a path is a ``walk'' from a vertex to another with no repeated edges or
vertices. A tree is a connected graph whose vertex connections do not form
closed paths (e.g., no triangles). Hydrocarbon molecules, such butane and
isobutane, can be represented using trees or tree graphs. In fact, the
use of tree graphs to count possible hydrocarbon structures played a significant
role in the development of graph theory through the graphical enumeration
theorem of Arthur Cayley. We exploit this and other tree enumeration theorems
to count RNA's topological motifs. For RNA pseudoknots, non-tree graphs
are required to describe their complex patterns of connectivity involving
closed paths or faces (Figure B).
Figure A |
|
Figure B |
|
Unlike graphs for chemical structures,
where atoms are vertices and bonds are edges, our RNA graphs are RNA secondary
topologies where a vertex or an edge can represent multiple nucleotide
bases or base pairs, which are themselves composed of multiple atoms and
bonds. To allow graphical representation of complex RNA secondary topologies,
special rules for defining RNA graphs are required. The rules specify how
to represent RNA loops, bulges, junctions, and stems as vertices or edges
in a graph (see How
to Represent RNA Trees as Planar Graphs and How
to Represent All Types of RNA Motifs as Dual Graphs). Essentially,
the tree and dual graph rules simplify RNA secondary motifs to allow their
representation as mathematical graphs; the ``RNA graphs'' specify the skeletal
connectivity of the secondary motifs.