Exploring the repertoire of RNA secondary motifs using graph theory with implications for RNA design

Understanding the structural repertoire of RNA is crucial for RNA genomics research. Yet current methods for finding novel RNAs are limited to small or known RNA families. To expand the set of known RNA structural motifs, we develop a two-dimensional graphical representation approach for describing RNA's secondary structural repertoire and estimating its size, including both naturally occurring and other possible RNA motifs. We employ tree graphs to describe RNA tree motifs and more general (dual) graphs to describe both RNA tree and pseudoknot motifs. Our estimates of the size of RNA's structural space are vastly smaller than the nucleotide sequence space, suggesting an advantage for finding novel RNAs. Our survey shows that known RNA trees and pseudoknots represent only a small subset of all possible motifs, implying that some of the "missing" motifs may represent novel RNAs. To help pinpoint RNA-like motifs, we show that the motifs of existing functional RNAs are clustered in a narrow range of topological characteristics. We also illustrate applications of our approach to the design of novel RNAs and the automated comparison of RNA structures, and we report several occurrences of RNA motifs within larger RNAs. Thus, our graph theory approach to RNA structures has implications for RNA genomics, structure analysis, and design.
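As a rough illustration of why a tree-graph description yields a small motif space, the sketch below counts distinct unlabeled trees with a given number of vertices and sets that count beside the number of nucleotide sequences of a given length. It assumes that each tree motif corresponds to one unlabeled (free) tree, and it uses the standard recurrence for rooted trees together with Otter's formula for free trees; the function names and the sequence length of 100 are illustrative choices, not taken from the text, and this is not necessarily the enumeration procedure used in the paper.

```python
def rooted_tree_counts(n_max):
    """Return r where r[n] is the number of unlabeled rooted trees on n vertices."""
    r = [0] * (n_max + 1)
    if n_max >= 1:
        r[1] = 1
    for n in range(1, n_max):
        total = 0
        for k in range(1, n + 1):
            # s_k = sum over divisors d of k of d * r[d]
            s_k = sum(d * r[d] for d in range(1, k + 1) if k % d == 0)
            total += s_k * r[n - k + 1]
        r[n + 1] = total // n  # the division is always exact
    return r


def free_tree_counts(n_max):
    """Return t where t[n] is the number of unlabeled free trees on n vertices."""
    r = rooted_tree_counts(n_max)
    t = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        pairs = sum(r[i] * r[n - i] for i in range(1, n))
        # Otter's formula, multiplied by 2 to stay in integer arithmetic
        twice = 2 * r[n] - pairs + (r[n // 2] if n % 2 == 0 else 0)
        t[n] = twice // 2
    return t


trees = free_tree_counts(10)
for v in range(1, 11):
    print(f"{v:2d} vertices: {trees[v]} distinct trees")
# Counts for 1..10 vertices: 1, 1, 1, 2, 3, 6, 11, 23, 47, 106

# Illustrative comparison only: the number of sequences of an assumed length.
seq_len = 100  # hypothetical sequence length, chosen for illustration
print(f"sequences of length {seq_len}: {4 ** seq_len:.3e}")
```

The tree counts grow far more slowly than the 4^N growth of sequence space, which is the qualitative point made above; how many nucleotides a tree of a given size actually represents is not specified here and is left open in this sketch.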

