Prediction of potential coding fragments in EST/mRNA sequence.

Method description:

The algorithm is based on a Markov chain model of coding regions and a probabilistic model that combines it with start codon potential.
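To illustrate the general idea of a Markov chain coding score, here is a minimal sketch. It assumes a first-order, frame-independent chain trained on toy sequences; the order, training data, and parameters actually used by the program are not given in this description, and the function names train_markov and log_odds are hypothetical.

```python
import math

BASES = "ACGT"

def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev).

    Assumption: a simple homogeneous chain with add-one smoothing,
    used here only for illustration.
    """
    counts = {a: {b: pseudocount for b in BASES} for a in BASES}
    for s in seqs:
        s = s.upper()
        for prev, nxt in zip(s, s[1:]):
            if prev in counts and nxt in counts[prev]:
                counts[prev][nxt] += 1
    probs = {}
    for a in BASES:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in BASES}
    return probs

def log_odds(seq, coding, noncoding):
    """Sum log(P_coding / P_noncoding) over adjacent base pairs.

    Positive values favour the coding model, negative values the
    non-coding model. Pairs with non-ACGT symbols are skipped.
    """
    seq = seq.upper()
    score = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        if prev in coding and nxt in coding[prev]:
            score += math.log(coding[prev][nxt] / noncoding[prev][nxt])
    return score
```

In a real predictor such a score would typically be computed per reading frame and then combined with a start codon score; that combination step is not sketched here.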

Accuracy:

Our tests show that the accuracy of reading frame recognition (identifying the true ORF) is about 100% for typical mRNA and about 99% for mRNA fragments of 500-800 bp containing a partial coding region. Accuracy is lower for ESTs with frameshift errors and for ESTs with very short coding fragments.

The program outputs the positions of the potential CDS, chosen by taking into account the probability of each potential start codon, as well as the positions of the longest ORF, obtained by extending the CDS upstream from the start codon. If all observed Met codons are recognized as internal, i.e. when the predicted translation start codon is missing from the sequence, the CDS and ORF have the same positions.
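The relation between CDS and ORF positions can be sketched as follows. This is an illustration only, assuming the standard stop codons (TAA, TAG, TGA), 1-based coordinates, and a forward-strand sequence; the helper name extend_orf_upstream is hypothetical and is not part of the program.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def extend_orf_upstream(seq, cds_start):
    """Return the 1-based ORF start found by extending a CDS upstream in frame.

    Steps back one codon at a time from cds_start and stops when the
    next upstream codon is a stop codon or when fewer than three bases
    remain before the 5' end of the sequence.
    """
    seq = seq.upper()
    start = cds_start
    while start - 3 >= 1:
        # Codon occupying 1-based positions start-3 .. start-1.
        codon = seq[start - 4:start - 1]
        if codon in STOP_CODONS:
            break
        start -= 3
    return start
```

With a CDS start of 30 and no in-frame stop codon upstream, the walk ends at position 3, which matches the CDS and ORF start positions shown in the example output.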

Example of Output:


BestORF  Prediction of potential coding fragment in plant EST/mRNA sequence
 Time:   Tue Feb 16 20:03:57 1999.
 Seq name: Seq_name: 
 Length of sequence:  388
 Predicted CDS 1 in +chain 1 in -chain 0
 Position of predicted CDS/ORF:
  G Str Feature   Start      End   Score       ORF       CDS-Len Frame 

  1 +   1 CDSo      30 -     386   30.57      3 -    386    357     +3

Predicted protein fragment:
>BestORF   1   1 fragment (s)     30  -    386    119 aa, chain +
MDELDILIVGGYWGKGSRGGMMSHFLCAVAEKPPPGEKPSVFHTLSRVGSGCTMKELYDL
GLKLAKYWKPFHRKAPPSSILCGTEKPEVYIEPCNSVIVQIKAAEIVPSDMYKTGCTLR

Abbreviations: G - gene (CDS/ORF), Str - Strand, CDS-Len - CDS Length.
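
For downstream processing, a result row from the table above can be split into named fields. The sketch below is based only on the single example row shown here; the field names, and the reading of the number before the feature type as a feature index, are assumptions rather than a documented format.

```python
def parse_cds_row(line):
    """Parse one CDS/ORF result row into a dict.

    Assumed layout (inferred from the example row only):
    G Str FeatureNo FeatureType Start - End Score ORFstart - ORFend CDS-Len Frame
    """
    tok = line.split()
    if len(tok) != 13 or tok[5] != "-" or tok[9] != "-":
        raise ValueError("unexpected row layout: %r" % line)
    return {
        "gene": int(tok[0]),
        "strand": tok[1],
        "feature_no": int(tok[2]),   # assumed meaning: feature index
        "feature": tok[3],
        "start": int(tok[4]),
        "end": int(tok[6]),
        "score": float(tok[7]),
        "orf_start": int(tok[8]),
        "orf_end": int(tok[10]),
        "cds_len": int(tok[11]),
        "frame": tok[12],
    }

row = parse_cds_row("  1 +   1 CDSo      30 -     386   30.57      3 -    386    357     +3")
# Consistency check: CDS length equals End - Start + 1.
assert row["end"] - row["start"] + 1 == row["cds_len"]
```

For the example row, the CDS length of 357 bp corresponds to the 119 aa reported for the predicted protein fragment.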