ALNGG Information


Alngg detects protein-coding genes by comparing the genomes of two species. It has been tuned sufficiently for Human vs. Mouse and Human vs. Rat. We are now testing other combinations (e.g. Chicken, Dog). We expect the tool to be effective for many pairs of vertebrates. However, it is not suited to species that are too closely related (e.g. Human vs. Chimpanzee, Mouse vs. Rat).
Because Alngg predicts genes from regions that are homologous between the two species, it cannot predict species-specific genes. However, we estimate that such genes are few: less than 10% in the case of Human vs. Mouse.
First, Alngg generates preliminary Clumps (i.e. putative exons) and groups them into Clusters (i.e. putative syntenic regions). Next, Alngg refines each Clump using the gene-structure prediction technique of Aln [1]. Finally, Alngg determines the optimal path through the Clumps as the gene structure, using dynamic programming (DP).
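The final DP step can be pictured as a chaining problem. The Python sketch below is a generic illustration, not Alngg's actual algorithm: it assumes each Clump is reduced to a matched interval on Seq1 and Seq2 with a single score (the `Clump` class and `best_chain` function are hypothetical), and it picks the highest-scoring chain of Clumps that occurs in the same order on both sequences without overlap. Any further terms Alngg's real scoring may use (e.g. intron or splice-site costs) are not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clump:
    """Hypothetical putative exon: a matched interval on each sequence plus a score."""
    s1: int       # start on Seq1 (1-based, inclusive)
    e1: int       # end on Seq1
    s2: int       # start on Seq2
    e2: int       # end on Seq2
    score: float

def best_chain(clumps):
    """Return (chain, total): the highest-scoring chain of non-overlapping clumps
    that occurs in the same order on Seq1 and Seq2."""
    if not clumps:
        return [], 0.0
    order = sorted(clumps, key=lambda c: (c.s1, c.s2))
    n = len(order)
    best = [c.score for c in order]  # best[i]: best chain score ending at order[i]
    prev = [-1] * n                  # predecessor of order[i] in that chain
    for i in range(n):
        for j in range(i):
            a, b = order[j], order[i]
            # b may follow a only if it starts after a ends on both sequences
            if a.e1 < b.s1 and a.e2 < b.s2 and best[j] + b.score > best[i]:
                best[i] = best[j] + b.score
                prev[i] = j
    i = max(range(n), key=best.__getitem__)  # end point of the best chain
    total = best[i]
    chain = []
    while i != -1:                           # trace back through predecessors
        chain.append(order[i])
        i = prev[i]
    chain.reverse()
    return chain, total
```

The quadratic double loop keeps the sketch short; with many Clumps, a sorted sweep combined with a range-maximum structure is the usual way to make this kind of chaining faster.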


[1] Gotoh, O. (2000) Homology-based gene structure prediction: simplified matching algorithm by the use of translated codon (tron) and improved accuracy by allowing for long gaps. Bioinformatics, 16, 190-202.

[2] Gotoh, O., Morita, M., Ichiyoshi, N., and Yada, T. (2005) Discovery of protein coding genes through chromosome-to-chromosome sequence comparison. Genome Informatics 2005, Yokohama, Dec. 19-21.

Sequence file

Currently Alngg accepts only FASTA format. Lowercase letters are treated as masked regions and ignored. The input sequences should preferably be masked, because low-complexity patterns increase the number of false positives.
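Because Alngg treats lowercase letters as masked, it can be worth checking how much of each input is soft-masked before a run. A minimal Python sketch, assuming plain FASTA text and reporting 1-based inclusive coordinates; `read_fasta` and `masked_intervals` are hypothetical helpers, not part of Alngg.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}, keeping letter case.

    Lines before the first '>' header are ignored.
    """
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records

def masked_intervals(seq):
    """Return (start, end) runs of lowercase letters, 1-based and inclusive."""
    runs, start = [], None
    for i, ch in enumerate(seq, start=1):
        if ch.islower():
            if start is None:
                start = i          # a masked run begins here
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:          # run extends to the end of the sequence
        runs.append((start, len(seq)))
    return runs
```

The masked fraction of a sequence then follows as `sum(e - s + 1 for s, e in runs) / len(seq)`.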

User Operation

Alngg is generally run from the command line. A typical run specifies the two sequences with the following options:
  % alngg -I1 S1.seq -I2 S2.seq
However, this run produces no output. To obtain results, specify one or more of the output options below.
 [Output Options]
  -O1gtf  Out_file      Predictions for Seq1 (GTF)
  -O2gff  Out_file      Predictions for Seq2 (GFF)
  -Ogene  Out_file      Output for the GUI
  -Ocluster  Out_file   Output for the synteny map
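The `-O1gtf` output is described only as GTF. Assuming it follows the standard nine-column, tab-separated GTF layout with a `gene_id` attribute (the feature names it uses are not documented here), a minimal Python sketch for grouping the predicted features per gene could look like this; `parse_gtf` is a hypothetical helper. The GFF file written by `-O2gff` may use a different attribute syntax, so the regular expression would likely need adjusting for it.

```python
import re
from collections import defaultdict

# key "value" pairs in the GTF attribute column, e.g.  gene_id "g1";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')

def parse_gtf(lines):
    """Group GTF features by gene_id.

    Returns {gene_id: [(feature, start, end, strand), ...]} in input order.
    """
    genes = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue                      # not a 9-column GTF line
        attrs = dict(ATTR_RE.findall(cols[8]))
        gene = attrs.get("gene_id", "unknown")
        genes[gene].append((cols[2], int(cols[3]), int(cols[4]), cols[6]))
    return dict(genes)
```

A file can be passed directly, e.g. `with open("s1.gtf") as fh: genes = parse_gtf(fh)`, where `s1.gtf` stands for whatever name was given to `-O1gtf`.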
To compare only part of a sequence, use the following options:
  -I1Start NNN   start the comparison at position NNN of Seq1
  -I1End   NNN   end the comparison at position NNN of Seq1
  -I2Start NNN   start the comparison at position NNN of Seq2
  -I2End   NNN   end the comparison at position NNN of Seq2
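When Alngg is driven from a script, the command line can be assembled programmatically. The Python sketch below uses only the input, output, and range options documented here; `build_alngg_cmd`, `OUTPUT_OPTIONS`, and the file names are hypothetical. It refuses to build a command with no output option, since such a run produces no results.

```python
# Hypothetical collection of the documented output options.
OUTPUT_OPTIONS = ("-O1gtf", "-O2gff", "-Ogene", "-Ocluster")

def build_alngg_cmd(seq1, seq2, outputs, range1=None, range2=None):
    """Return an argv list for alngg.

    outputs: dict mapping an output option (e.g. "-O1gtf") to an output file name.
    range1, range2: optional (start, end) pairs restricting Seq1 / Seq2.
    """
    if not outputs:
        # Without an output option the run produces no results.
        raise ValueError("specify at least one output option")
    cmd = ["alngg", "-I1", seq1, "-I2", seq2]
    for opt, path in outputs.items():
        if opt not in OUTPUT_OPTIONS:
            raise ValueError(f"unknown output option: {opt}")
        cmd += [opt, path]
    ranges = ((range1, "-I1Start", "-I1End"), (range2, "-I2Start", "-I2End"))
    for rng, start_opt, end_opt in ranges:
        if rng is not None:
            start, end = rng
            cmd += [start_opt, str(start), end_opt, str(end)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the Python standard library, provided `alngg` is on the `PATH`.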
By default, Alngg detects only genes that lie on the same strand in both sequences. To detect genes that lie on the opposite strand of Seq2, run with the following option:
   -rev    compare against the reverse strand of Seq2
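For `-rev` runs it can help to relate positions on the reverse strand of Seq2 back to the forward strand. The sketch below assumes that "reverse strand" means the reverse complement and that positions are 1-based; whether Alngg reports `-rev` coordinates on the reversed or the forward strand should be checked against its actual output before applying the conversion. `reverse_complement`, `to_forward`, and `interval_to_forward` are hypothetical helpers.

```python
# Complement table for the four bases and N, in both cases so soft-masking survives.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Reverse complement; characters outside ACGTN are reversed but not complemented."""
    return seq.translate(COMPLEMENT)[::-1]

def to_forward(pos, length):
    """Map a 1-based position on the reverse complement to the forward strand."""
    return length - pos + 1

def interval_to_forward(start, end, length):
    """Map a 1-based inclusive interval; the ends swap so that start <= end."""
    return to_forward(end, length), to_forward(start, length)
```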
Because Alngg is a homology-based system, it sometimes misses the first or last exon of a gene when the match there is weak. To change this default behavior, specify the following option:
   -completeGene    try again to extract a missing first or last exon.
With this option, prediction accuracy may decrease because of the influence of unreliable matches.

Copyright © 1997-2007 Osamu Gotoh