Search for regulatory motifs conserved in several sequences
As NSITE, Nsite-m is also based on search of statistically significant regulatory site consensus - see NSITE Help for more description. The main features of the approach are the follows:
(i) RE may consist of a single box (a continuous DNA segment) or two boxes, spaced by some DNA sequence, where only length, but not nucleotide content, of this spacer is important for functioning of such a composite site.
(ii) A real RE or its IUPAC consensus contains both variable positions, where the presence of a certain group of nucleotides is permissible, and strictly conserved positions, where strict identity between real site/consensus and predicted motif is required . The nonequivalence of these positions should be taken into account, i.e., complete homology at conserved positions is required, and a violation of homology in the variable positions should be permissible.
(iii) The homology between RE and a motif on query DNA sequence may be a random happening, therefore, estimation of its statistical significance is very important. A conclusion on functional significance of revealed homology can be reached only if the homology is significantly nonrandom, i.e., the homology is not a random event.
(iv) Characteristics such as nucleotide frequencies should not be used when describing consensus because of its small size. Instead, one should use estimates based on number of specific nucleotides in the consensus.
(v) Although all available RE databases usually annotate fixed distance between two boxes of composite elements, some variability of the spacer length usually takes place. Therefore, search algorithm for composite REs should allow some limited flexibility in spacer length.