AbIni3D - Ab initio folding

Problem: The program calculates the 3D structure of a protein when the 3D structures of its individual parts (fragments) are known and the phi and psi angles between the fragments must be found. This problem arises when a protein structure is assembled from fragments whose structures were obtained by a homology search on their primary sequences.
Method: The angles are calculated by a genetic algorithm. The target function to be optimized is the sum of two contributions: (a) the energy of the short-range interaction between the fragments and (b) the energy of the phi/psi angles, derived from the statistics of the angles between secondary-structure fragments in protein 3D structures from the PDB database.
Results: Testing on seven natural proteins (58 to 135 aa long, each consisting of several fragments) demonstrated that the program restores the native structure with a mean accuracy of 5.3-6.7 A. The prediction accuracy depends on the individual protein and on the program operation mode: for the three best proteins, the mean RMSD between the restored and native structures over ten runs was 1.9, 2.3, and 2.6 A.

The program is provided with a viewer.

HELP: questions and answers on the AbIni3D program
Q: What is the program intended for?
A: For calculating protein spatial structures from fragments of the whole structure, which can be obtained by a homology search.
Q: How are the fragments selected?
A: Fragments of the protein sequence (homologous regions) should be selected so that they span the whole sequence of the target protein and do not overlap. The program joins the fragments into a single chain and uses a genetic algorithm to optimize the phi and psi angles at the sites where the fragments were joined, searching for the conformation with minimal energy.
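The optimization step can be illustrated with a minimal genetic-algorithm sketch. This is not the AbIni3D implementation: the two energy terms are toy placeholders that only mirror the "sum of two contributions" structure, and every name and setting here (`optimize`, `energy`, `PREFERRED`, the population size, the mutation width) is an assumption made for illustration. One (phi, psi) pair is optimized per joint, so two fragments give one joint.

```python
import math
import random

PREFERRED = (-60.0, -45.0)  # hypothetical favoured (phi, psi), illustration only


def wrap(angle):
    """Map an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def angle_term(angles):
    # Toy stand-in for the statistical phi/psi energy.
    return sum(
        (1 - math.cos(math.radians(phi - PREFERRED[0])))
        + (1 - math.cos(math.radians(psi - PREFERRED[1])))
        for phi, psi in angles
    )


def interaction_term(angles):
    # Toy stand-in for the short-range interaction between fragments.
    return sum(0.1 * (1 + math.sin(math.radians(phi + psi))) for phi, psi in angles)


def energy(angles):
    # Target function: sum of the two additive contributions.
    return interaction_term(angles) + angle_term(angles)


def random_individual(n_joints, rng):
    return [(rng.uniform(-180, 180), rng.uniform(-180, 180)) for _ in range(n_joints)]


def crossover(a, b, rng):
    # Uniform crossover: each joint is taken from one of the two parents.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(individual, rng, sigma=15.0):
    # Perturb roughly half of the joints with Gaussian noise.
    out = []
    for phi, psi in individual:
        if rng.random() < 0.5:
            phi = wrap(phi + rng.gauss(0, sigma))
            psi = wrap(psi + rng.gauss(0, sigma))
        out.append((phi, psi))
    return out


def optimize(n_joints, cycles=500, pop_size=30, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(n_joints, rng) for _ in range(pop_size)]
    for _ in range(cycles):
        pop.sort(key=energy)
        parents = pop[: pop_size // 2]  # the best half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return min(pop, key=energy)
```

Because the best half of the population is carried over unchanged, the lowest energy found never gets worse from one cycle to the next. A call such as `optimize(n_joints=1, cycles=50)` corresponds to the two-fragment case.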
Q: What are the launching parameters, input, and output formats?
A: The program has two mandatory parameters and one optional parameter: the input COV file, the output PDB file, and, optionally, the number of computing cycles for the genetic algorithm (default value, 500).
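As a small sketch of how a script might assemble these parameters, the helper below builds an argument list. The executable name is not stated in this help text, so it is passed in by the caller. The positional order (COV file, PDB file, cycles) is an assumption taken from the order in which the parameters are listed above, and `build_command` is a hypothetical helper, not part of the program.

```python
def build_command(executable, cov_file, pdb_file, cycles=None):
    """Return an argument list for launching the program.

    executable: path to the program binary (not given in this document).
    cycles: optional number of genetic-algorithm cycles; when omitted,
            the program's own default (500) applies.
    """
    command = [executable, cov_file, pdb_file]
    if cycles is not None:
        if cycles <= 0:
            raise ValueError("cycles must be a positive integer")
        command.append(str(cycles))
    return command
```

The resulting list can be passed to `subprocess.run` once the actual executable path is known.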
Q: How should the run time be selected?
A: This depends on the number of fragments: more fragments require a longer run time. For example, 50 cycles are sufficient for optimizing two fragments.
Q: What is the input COV format?
A: This is a specialized format for this program. It contains information on the primary structure of the fragments, the alignments that cover the target sequence, and the "pieces" of PDB files corresponding to the covering fragments.


   Example:	
========================================================================================

*****  SET 1 *****
>1NDDB qb=0 pb=25 le=20 Sc=98.9
aaaa           bbbbb
MSANFTDKNGRQSKGVLLLR
IKERVEEKEGIPPQQQRLIY
aaaaaaaaa      bbbbb
ATOM    794  N   ILE B 126      37.162  -0.022  40.293  1.00 12.67           N  
ATOM    795  CA  ILE B 126      35.962  -0.674  39.781  1.00 11.72           C  
ATOM    796  C   ILE B 126      35.671  -0.073  38.399  1.00 12.39           C  
ATOM    797  O   ILE B 126      35.366  -0.799  37.452  1.00 14.47           O  
ATOM    798  CB  ILE B 126      34.746  -0.424  40.696  1.00 13.18           C  
ATOM    799  CG1 ILE B 126      35.033  -0.951  42.107  1.00 14.02           C  
ATOM    800  CG2 ILE B 126      33.499  -1.074  40.094  1.00 15.53           C  
ATOM    801  CD1 ILE B 126      33.908  -0.706  43.107  1.00 14.94           C  
ATOM    802  N   LYS B 127      35.806   1.249  38.282  1.00 11.60           N  
ATOM    803  CA  LYS B 127      35.581   1.929  37.006  1.00 11.37           C  

....    ...  ..  ... . ...      ......   .....  ......  .... .....           .

ATOM    964  CZ  TYR B 145      25.681  -2.498  47.587  1.00 17.99           C  
ATOM    965  OH  TYR B 145      25.481  -3.704  48.220  1.00 20.22           O  
>2PDZA qb=20 pb=31 le=17 Sc=93.1
b                
TLAMPSDTNANGDIFGG
KIFKGLAADQTEALFVG
b     aaaa       
ATOM    498  N   LYS A  32      -1.097  -3.476  -1.916  1.00  0.00           N
....    ...  ..  ... . ...      ......   .....  ......  .... .....           .  
TER
========================================================================================

There may be several variants of coverings (SETs); each new variant therefore starts with the corresponding keyword, for example, "SET 1", then "SET 2", etc.
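The requirement that fragments span the whole target sequence without overlapping can be checked from the ">" header lines before a run. The sketch below assumes that `qb` is the 0-based start of the fragment in the target sequence and `le` is its length. This reading fits the example above (qb=0, le=20, followed by qb=20) but is not confirmed by this document. The functions `parse_fragment_headers` and `check_covering` are hypothetical helpers.

```python
import re

HEADER = re.compile(r"^>(\S+)\s+(.*)$")


def parse_fragment_headers(cov_text):
    """Return one dict per '>' header line found in the COV text."""
    fragments = []
    for line in cov_text.splitlines():
        match = HEADER.match(line)
        if not match:
            continue
        # Header fields look like "qb=0 pb=25 le=20 Sc=98.9".
        fields = dict(
            item.split("=", 1) for item in match.group(2).split() if "=" in item
        )
        fragments.append(
            {"name": match.group(1), "qb": int(fields["qb"]), "le": int(fields["le"])}
        )
    return fragments


def check_covering(fragments, target_length):
    """Return a list of problems; an empty list means the fragments tile the target."""
    problems = []
    expected = 0  # next target position that still needs to be covered
    for frag in sorted(fragments, key=lambda f: f["qb"]):
        if frag["qb"] > expected:
            problems.append(
                f"gap before {frag['name']}: positions {expected}-{frag['qb'] - 1}"
            )
        elif frag["qb"] < expected:
            problems.append(f"{frag['name']} overlaps the previous fragment")
        expected = max(expected, frag["qb"] + frag["le"])
    if expected < target_length:
        problems.append(f"tail not covered: positions {expected}-{target_length - 1}")
    return problems
```

Under these assumptions, the two headers in the example cover positions 0 to 36 with no gap and no overlap.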
Q: How is it possible to create a COV file?
A: The file must start with the keyword "SET" followed by any number, for example, 1, 2, etc. The "pieces" of spatial structures in PDB format follow one after another, separated from each other by an empty line.
Example: suppose you want to "disrupt" the native structure of a protein (which you have in PDB format) and then test how well this program restores it. To do this, copy your PDB file, for example, YourProtein.pdb, to a file named, for example, YourProtein.cov, and make the following changes:
- Put the text, for example, " SET 1 ", into the first line (it is important that the first line contains the word SET in capitals) and
- Add empty lines at the points where you want to destroy the protein structure (i.e., break the conformation of the main chain); several breaks (empty lines) are recommended, for example, three to five.

  
 Example:
******* SET 1 *******
REMARK   MSI WebLab Viewer PDB file
REMARK   Created:  Fri Oct 25 07:58:42 2002
CRYST1   57.810   29.700  106.090  90.00 101.99  90.00 A2
ATOM      1  N   GLY A   1      15.740  11.178 -11.733  1.00  0.00              
ATOM      2  CA  GLY A   1      15.234  10.462 -10.556  1.00  0.00              
ATOM      3  C   GLY A   1      16.284   9.483  -9.998  1.00  0.00              
ATOM      4  O   GLY A   1      17.150   8.979 -10.709  1.00  0.00              
....    ...  ..  ... . ...      ......   .....  ......  .... .....           
ATOM    310  N   LEU A  40       6.658  -4.909  19.830  1.00  0.00              
ATOM    311  CA  LEU A  40       6.751  -5.839  20.961  1.00  0.00              
ATOM    312  C   LEU A  40       5.510  -6.747  21.050  1.00  0.00              
ATOM    313  O   LEU A  40       5.642  -7.969  21.132  1.00  0.00              
ATOM    314  CB  LEU A  40       6.968  -5.086  22.286  1.00  0.00              
ATOM    315  CG  LEU A  40       7.926  -5.898  23.179  1.00  0.00              
ATOM    316  CD1 LEU A  40       8.886  -4.973  23.944  1.00  0.00              
ATOM    317  CD2 LEU A  40       7.121  -6.784  24.145  1.00  0.00              
               // Empty line - a point of a break 
ATOM    318  N   GLU A  41       4.357  -6.093  21.040  1.00  0.00              
ATOM    319  CA  GLU A  41       3.066  -6.778  21.082  1.00  0.00              
ATOM    320  C   GLU A  41       2.967  -7.863  19.997  1.00  0.00              
ATOM    321  O   GLU A  41       2.821  -9.046  20.315  1.00  0.00              
ATOM    322  CB  GLU A  41       1.903  -5.775  20.992  1.00  0.00              
ATOM    323  CG  GLU A  41       1.986  -4.741  22.132  1.00  0.00              
ATOM    324  CD  GLU A  41       0.577  -4.464  22.689  1.00  0.00              
ATOM    325  OE1 GLU A  41      -0.227  -5.435  22.661  1.00  0.00              
ATOM    326  OE2 GLU A  41       0.371  -3.298  23.120  1.00  0.00              
TER
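The manual procedure described above (add a SET line, then insert empty lines at the chosen break points) can be scripted. The sketch below is one possible way to do it, not a tool shipped with the program. `pdb_to_cov` is a hypothetical helper. It reads the residue number from columns 23-26 of each ATOM record, as in the standard PDB layout, and inserts an empty line after the last atom of each residue listed in `break_after`.

```python
def pdb_to_cov(pdb_lines, break_after, set_number=1):
    """Return COV lines built from PDB lines.

    break_after: residue numbers after which the main chain is to be broken.
    """
    break_after = set(break_after)
    # The first line must contain the word SET in capitals.
    out = [f"***** SET {set_number} *****"]
    prev_resseq = None
    for line in pdb_lines:
        line = line.rstrip("\n")
        if line.startswith("ATOM"):
            resseq = int(line[22:26])  # PDB columns 23-26: residue number
            if (
                prev_resseq is not None
                and resseq != prev_resseq
                and prev_resseq in break_after
            ):
                out.append("")  # empty line marks a break point
            prev_resseq = resseq
        out.append(line)
    return out
```

For the example above, `pdb_to_cov(lines, [40])` would place the empty line between the last atom of LEU 40 and the first atom of GLU 41. Other records (REMARK, CRYST1, TER) are copied unchanged.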