rag_tutorial_graph_theory | Schlick Group at NYU

Graph Theory and RNA Structures

Since RNA secondary structures are essentially 2D networks, they can be represented using planar graphs to facilitate analysis of RNA structures. Indeed, modeling network motifs using graphs has proven to be fruitful for many complex systems in biochemistry, neurobiology, ecology, and engineering. Graph Theory is a branch of mathematics that deals with configurations described by nodes and connections. The configurations may represent physical networks, such as electrical circuits or chemical compounds, where atoms and bonds are modeled as nodes and connections, respectively. Formally, such configurations are modeled by graphs consisting of vertices or nodes (o) and edges or lines (---), which connect the vertices. In our work, we use planar graphs, whose vertices and edges are drawn in a 2D plane.

Figures A and B show examples of tree (A) and non-tree (B) planar graphs. These are connected graphs since every pair of vertices in the graph is connected by one or more paths; a path is a ``walk'' from a vertex to another with no repeated edges or vertices. A tree is a connected graph whose vertex connections do not form closed paths (e.g., no triangles). Hydrocarbon molecules, such butane and isobutane, can be represented using trees or tree graphs. In fact, the use of tree graphs to count possible hydrocarbon structures played a significant role in the development of graph theory through the graphical enumeration theorem of Arthur Cayley. We exploit this and other tree enumeration theorems to count RNA's topological motifs. For RNA pseudoknots, non-tree graphs are required to describe their complex patterns of connectivity involving closed paths or faces (Figure B).

Figure A

Figure B

Unlike graphs for chemical structures, where atoms are vertices and bonds are edges, our RNA graphs are RNA secondary topologies where a vertex or an edge can represent multiple nucleotide bases or base pairs, which are themselves composed of multiple atoms and bonds. To allow graphical representation of complex RNA secondary topologies, special rules for defining RNA graphs are required. The rules specify how to represent RNA loops, bulges, junctions, and stems as vertices or edges in a graph (see How to Represent RNA Trees as Planar Graphs and How to Represent All Types of RNA Motifs as Dual Graphs). Essentially, the tree and dual graph rules simplify RNA secondary motifs to allow their representation as mathematical graphs; the ``RNA graphs'' specify the skeletal connectivity of the secondary motifs.