Archive

Posts Tagged ‘alignment’

Discriminating between RNA basepairs

March 18th, 2009

If you want to predict the secondary structure (i.e. the basepairing interactions) of a single RNA sequence, you would normally rely on energy parameters and try to find the minimum free energy structure. If, however, you have multiple related sequences, you also have access to the evolutionary information contained in a multiple alignment, and this information can be used to support the structure prediction.


Stinus Geek stuff, introduction

Shuffling multiple alignments

March 11th, 2009

Scoring of candidates has been a major problem in large scans for structural ncRNAs. One reason is the lack of a method that maintains the dinucleotide composition of the shuffled alignments used as null examples in these scans. The norm for assessing the biological significance of the predictions is to re-run the prediction method on random data that resembles real genomic data and compare the score distributions. The random data has often been generated by shuffling the original alignments while maintaining mononucleotide frequencies, gap structure and local conservation patterns, using an algorithm by Washietl and Hofacker [2004].
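To make the idea of alignment shuffling concrete, here is a minimal Python sketch of a column shuffle. It is a simplified stand-in, not the Washietl and Hofacker algorithm: it only permutes columns among columns that share the same gap pattern, which keeps each row's mononucleotide counts and gap positions intact but ignores local conservation patterns. The function name `shuffle_alignment_columns` and the toy alignment are made up for illustration.

```python
import random
from collections import defaultdict


def shuffle_alignment_columns(rows, rng=None):
    """Permute alignment columns among columns with an identical gap pattern.

    rows: list of equal-length aligned sequences, with '-' as the gap symbol.
    Returns a new list of aligned sequences.
    """
    rng = rng or random.Random()
    n_cols = len(rows[0])
    if any(len(r) != n_cols for r in rows):
        raise ValueError("all rows must have equal length")

    # Represent the alignment column by column.
    columns = [tuple(r[i] for r in rows) for i in range(n_cols)]

    # Group column indices by which rows carry a gap in that column.
    groups = defaultdict(list)
    for i, col in enumerate(columns):
        pattern = tuple(c == "-" for c in col)
        groups[pattern].append(i)

    # Shuffle the columns within each group only.
    shuffled = list(columns)
    for idxs in groups.values():
        permuted = idxs[:]
        rng.shuffle(permuted)
        for dst, src in zip(idxs, permuted):
            shuffled[dst] = columns[src]

    # Convert back to row-wise sequences.
    return ["".join(col[r] for col in shuffled) for r in range(len(rows))]


if __name__ == "__main__":
    toy = ["ACGU-ACG", "AC-UGACG", "GCGU-ACA"]  # hypothetical alignment
    for row in shuffle_alignment_columns(toy, random.Random(0)):
        print(row)
```

Because whole columns are moved, each column keeps its own composition, but neighbouring columns change, so dinucleotide frequencies are generally not preserved by a shuffle of this kind.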

For prediction of structured RNAs, the folding free energy is a critical attribute. Since G-C base pairs are more energetically favorable than other pairs, random control sequences should ideally exhibit the same mononucleotide frequencies as native sequences. More subtly, dinucleotide frequencies also matter, since stacked pairs in helices also affect folding energy. Several studies (e.g., Workman and Krogh [1999], Clote et al. [2005]) convincingly demonstrate that this is more than just a theoretical concern. However, constraining the dinucleotide frequencies while also maintaining mononucleotide frequencies, gap structure and local conservation patterns often reduces the number of possible shuffles unacceptably.
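The difference between the two kinds of frequencies is easy to check by hand. The sketch below counts mononucleotides and overlapping dinucleotides for two made-up sequences with identical base composition. The helper names are hypothetical, and treating bases on either side of a gap as adjacent is an assumption made here for simplicity.

```python
from collections import Counter


def mononucleotide_counts(seq):
    """Count single bases, ignoring alignment gaps ('-')."""
    return Counter(seq.replace("-", "").upper())


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides, ignoring alignment gaps ('-')."""
    s = seq.replace("-", "").upper()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


if __name__ == "__main__":
    native = "GGGGCCCC"    # made-up example sequence
    shuffled = "GCGCGCGC"  # same bases in a different order

    # Same mononucleotide composition: four G and four C in both.
    print(mononucleotide_counts(native) == mononucleotide_counts(shuffled))

    # Very different dinucleotide composition.
    print(dict(dinucleotide_counts(native)))    # GG: 3, GC: 1, CC: 3
    print(dict(dinucleotide_counts(shuffled)))  # GC: 4, CG: 3
```

A control set built from the second sequence would match the first in base composition while presenting entirely different neighbouring-base contexts, which is the kind of mismatch the paragraph above warns about.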

We therefore designed a shuffling algorithm, Multiperm, for arbitrary multiple alignments. Multiperm preserves mononucleotide frequencies, gap structure and local conservation patterns exactly, while preserving dinucleotide frequencies approximately. The number of distinct shuffles that the algorithm can produce is very large for the vast majority of multiple alignments. Together, these characteristics provide a much more realistic null model for RNA prediction than was previously available.

elfar Algorithms

WAR

February 10th, 2009

The growing interest in non-coding RNAs in recent years has given rise to many different programs for aligning ncRNAs and predicting their secondary structure. It can be difficult for a user to determine which one to use, to judge the different predictions, and sometimes even to run the programs. Stinus Lindgreen and I therefore implemented a webserver that gives users an easy way to run the top available methods simultaneously and get a combined, simple view of the predictions, which can be downloaded in various formats for further analysis. Additional measures are calculated for each program to make it easier to judge the individual predictions, and a consensus prediction taking all the programs into account is also calculated. The webserver can run in global or local mode; the local mode simply uses CMfinder [Yao et al., 2006] to cut out the local regions, which are then fed to the other programs globally as usual.


The consensus is itself presented as a heatmap indicating how well the included multiple alignment programs agree on the prediction. This can be very valuable when studying a set of uncharacterized sequences to find out whether they have well-defined, reliable sequence and structure conservation. Check out this example output.
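How WAR computes its consensus is not described here, but the general idea of measuring agreement between predictions can be illustrated with a small Python sketch. It assumes, purely for illustration, that each program's prediction is available as a dot-bracket string over the same coordinates, and it reports for every base pair the fraction of predictions that contain it. The function names are hypothetical.

```python
from collections import Counter


def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs in a dot-bracket structure."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def pair_agreement(structures):
    """Map each base pair to the fraction of structures that predict it."""
    counts = Counter()
    for s in structures:
        counts.update(base_pairs(s))
    n = len(structures)
    return {pair: c / n for pair, c in counts.items()}


if __name__ == "__main__":
    # Three made-up predictions for the same six-column alignment.
    predictions = ["((..))", "((..))", "(....)"]
    for pair, frac in sorted(pair_agreement(predictions).items()):
        print(pair, round(frac, 2))
```

Values close to 1 mark pairs that all predictions share, and they are the kind of quantity a heatmap could colour; lower values flag pairs that only some methods support.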

elfar Cool Tools, Web