% cd src
% ./configure [--help]
% make all
% make install
% make clearall
% CFLAGS="-O3 -DDVAL=1" ./configure
% make prrn
% make install_prrn
% make clearall
If you have changed the location of the table directory after installation, set the environment variable ALN_TAB:
% setenv ALN_TAB New_Aln_Tab (csh/tcsh)
$ export ALN_TAB=New_Aln_Tab (sh/bash)
To represent the exon-intron structure of the parental gene, the FASTA format is extended. A line starting with ';C' gives the exon boundaries (inclusive); more than one such line may be used if necessary. The format after ';C ' is essentially the same as that of the Feature field of a GenBank file: the start and end positions of each exon are separated by two dots, and individual exons are delimited by commas. The term 'complement' indicates that the corresponding gene lies on the complementary strand of the genomic sequence. Two examples follow:
>ce13a1 C. elegans chromosome II positive strand
;C join(9803525..9803710,9803766..9804097,9804152..9804251,
;C 9804299..9804855,9804926..9805069,9805115..9805349)
MSFSILIAIAIFVGIISYYLWIWSFWIRKGVKGPRGLPFLGVIHKFTNYENPGALKFSEW
TKKYGPVYGITEGVEKTLVISDPEFVHEVFVKQFDNFYGRKLTAIQGDPNKNKRVPLVAA
QGHRWKRLRTLASPTFSNKSLRKIMGTVEESVTELVRSLEKASAEGKTLDMLEYYQEFTM
DIIGKMAMGQEKSLMFRNPMLDKVKTIFKEGRNNVFMISGIFPFVGIALRNIFAKFPSLQ
MATDIQSILEKALNKRLEQREADEKAGIEPSGEPQDFIDLFLDARSTVDFFEGEAEQDFA
KSEVLKVDKHLTFDEIIGQLFVFLLAGYDTTALSLSYSSYLLATHPEIQKKLQEEVDREC
PDPEVTFDQLSKLKYLECVVKEALRLYPLASLVHNRKCLKTTNVLGMEIEAGTNINVDTW
SLHHDPKVWGDDVNEFKPERWESGDELFFAKGGYLPFGMGPRICIGMRLAMMEMKMLLTN
ILKNYTFETTPETVIPLKLVGTATIAPSSVLLKLKSRF
[EOF]
>ce13a2 C. elegans chromosome II negative strand
;C complement(join(9798263..9798503,9798584..9798727,9798905..9799461,
;C 9799519..9799618,9799680..9800011,9800058..9800243))
MSLSILIAGASFIGLLTYYIWIWSFWIRKGVKGPRGFPFFGVIHEFQDYENPGLLKLGEW
TKEYGPIYGITEGVEKTLIVSNPEFVHEVFVKQFDNFYGRKTNPIQGDPNKNKRAHLVSA
QGHRWKRLRTLSSPTFSNKNLRKIMSTVEETVVELMRHLDDASAKGKAVDLLDYYQEFTL
DIIGRIAMGQTESLMFRNPMLPKVKGIFKDGRKLPFLVSGIFPIAGTMFREFFMRFPSIQ
PAFDIMSTVEKALNKRLEQRAADEKAGIEPSGEPQDFIDLFLDARANVDFFEEESALGFA
KTEIAKVDKQLTFDEIIGQLFVFLLAGYDTTALSLSYSSYLLARHPEIQKKLQEEVDREC
PNPEVTFDQISKLKYMECVVKEALRMYPLASIVHNRKCMKETNVLGVQIEKGTNVQVDTW
TLHYDPKVWGEDANEFRPERWESGDELFYAKGGYLPFGMGPRICIGMRLAMMEKKMLLTH
ILKKYTFETSTQTEIPLKLVGSATTAPRSVMLKLTPRHSN
[EOF]
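As an illustration of the GenBank-style ';C' annotation, the following Python sketch collects the ';C' lines of one record and returns the strand and the list of inclusive exon intervals. It is not part of the aln/prrn/spaln distribution; the function name `parse_c_lines` is hypothetical, and the sketch assumes that every ';C' line belongs to a single record and that positions are written as `start..end`.

```python
import re

def parse_c_lines(lines):
    """Return (strand, [(start, end), ...]) from the ';C' lines of one record."""
    # The annotation may span several ';C' lines, so join their payloads first.
    payload = "".join(line[2:].strip() for line in lines if line.startswith(";C"))
    # 'complement(' marks a gene on the complementary strand.
    strand = "-" if payload.startswith("complement(") else "+"
    # Each exon is written as start..end with inclusive boundaries.
    exons = [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", payload)]
    return strand, exons

record = [
    ">ce13a1 C. elegans chromosome II positive strand",
    ";C join(9803525..9803710,9803766..9804097,9804152..9804251,",
    ";C 9804299..9804855,9804926..9805069,9805115..9805349)",
]
strand, exons = parse_c_lines(record)
# Because the boundaries are inclusive, an exon spans end - start + 1 bases.
coding_length = sum(end - start + 1 for start, end in exons)
```

Since the boundaries are inclusive, the length of each exon is `end - start + 1`, which is what `coding_length` sums over the record.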
The line starting with ';C' may be simplified as follows:
>ce13a1 C. elegans chromosome II positive strand
;C + 9803525 9803710 9803766 9804097 9804152 9804251
;C 9804299 9804855 9804926 9805069 9805115 9805349
...
>ce13a2 C. elegans chromosome II negative strand
;C - 9798263 9798503 9798584 9798727 9798905 9799461
;C 9799519 9799618 9799680 9800011 9800058 9800243
...
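The simplified form can be read with plain whitespace splitting. The Python sketch below is an illustration only, with a hypothetical function name `parse_simple_c_lines`. It assumes that the strand sign, when present, is the first token of the first ';C' line and that the remaining tokens are start/end pairs. Treating a missing sign as '+' is an assumption of this sketch, not something the text above states.

```python
def parse_simple_c_lines(lines):
    """Return (strand, [(start, end), ...]) from simplified ';C' lines."""
    tokens = []
    for line in lines:
        if line.startswith(";C"):
            tokens.extend(line[2:].split())
    # Assumption: a missing strand sign is treated as the positive strand.
    strand = "+"
    if tokens and tokens[0] in ("+", "-"):
        strand = tokens.pop(0)
    numbers = [int(tok) for tok in tokens]
    if len(numbers) % 2 != 0:
        raise ValueError("exon boundaries must come in start/end pairs")
    # Pair consecutive numbers: (start1, end1), (start2, end2), ...
    exons = list(zip(numbers[0::2], numbers[1::2]))
    return strand, exons

record = [
    ">ce13a2 C. elegans chromosome II negative strand",
    ";C - 9798263 9798503 9798584 9798727 9798905 9799461",
    ";C 9799519 9799618 9799680 9800011 9800058 9800243",
]
strand, exons = parse_simple_c_lines(record)
```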
When the parental gene contains one or more frame shifts, the output from aln or spaln contains one line starting with ';M' for each frame shift, in one of the following forms:
;M Deleted n chars at p
;M Insert n chars at p
The first line indicates that n (n = 1 or 2) nucleotides have been deleted from the parental genomic sequence beginning at site p. Likewise, the second line indicates that n blank characters have been inserted after site p to maintain the open reading frame. This information is used to juxtapose intron positions properly along the alignment.
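A reader of such output might extract the frame-shift records with a regular expression. The Python sketch below is illustrative only: the function name `parse_frameshifts` and the positions in the sample lines are hypothetical, and the pattern assumes the ';M' lines are spelled exactly as shown above.

```python
import re

# Matches ';M Deleted n chars at p' and ';M Insert n chars at p'.
M_LINE = re.compile(r"^;M\s+(Deleted|Insert)\s+(\d+)\s+chars\s+at\s+(\d+)\s*$")

def parse_frameshifts(lines):
    """Return a list of (kind, n, p) tuples, one per ';M' line."""
    events = []
    for line in lines:
        match = M_LINE.match(line)
        if match is None:
            continue  # not a frame-shift line
        kind = "delete" if match.group(1) == "Deleted" else "insert"
        events.append((kind, int(match.group(2)), int(match.group(3))))
    return events

# Hypothetical sample lines; the positions are made up for illustration.
sample = [
    ";M Deleted 1 chars at 1200",
    ";M Insert 2 chars at 3456",
    "MSFSILIAIAIFVG",
]
events = parse_frameshifts(sample)
```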
|File 1:|||||File 2:|
Seq1 - Seq2
N is calculated to be 7. Subsequent lines up to the first blank line are ignored. The rest of the file is composed of one or more blocks of a fixed column width of less than 254 characters. Each block is composed of N 'sequence lines' and other (optional) 'non-sequence lines'. The general format of a sequence line is:
<Position> <Sequence> <Name>
where <Position> is a numeral that indicates the sequence position of the first letter in <Sequence> (usually every <Position> in the first block is 1, but this is not required; negative values are also allowed). A line lacking the <Position> field is regarded as a 'non-sequence line' and is ignored upon reading. The i-th sequence line in the second block is concatenated to the i-th sequence line in the first block, and so on. There is no particular limit on N or on the sequence length, but the total number of characters to be stored is limited by MAXAREA, defined in src/sqio.h. Several examples of the native format are provided in the sample directory.
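The block-wise concatenation described above can be sketched in Python as follows. This is not the reader used by the programs themselves. The function name `read_native_blocks` is hypothetical, and the sketch assumes that N is already known, that the header lines have been stripped, that fields are separated by whitespace, and that <Sequence> contains no internal blanks. A non-sequence line whose first token happens to be an integer would be misread by this simple test.

```python
def read_native_blocks(body_lines, n):
    """Concatenate the i-th sequence line of every block; return (name, start, seq) tuples."""
    starts = [None] * n
    names = [None] * n
    seqs = [""] * n
    count = 0  # number of sequence lines seen so far
    for line in body_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or too short: non-sequence line
        try:
            position = int(fields[0])  # negative values are accepted
        except ValueError:
            continue  # no <Position> field: non-sequence line
        i = count % n  # index of this line within its block
        if starts[i] is None:
            starts[i] = position  # keep the position given in the first block
        seqs[i] += fields[1]
        if len(fields) > 2:
            names[i] = fields[2]
        count += 1
    if count % n != 0:
        raise ValueError("last block does not contain N sequence lines")
    return list(zip(names, starts, seqs))

# Hypothetical two-sequence alignment split into two blocks.
body = [
    "   1 MSFSILIAI seqA",
    "   1 MSLSILIAG seqB",
    "     **** ***",
    "",
    "  10 AIFVG seqA",
    "  10 ASFIG seqB",
]
alignment = read_native_blocks(body, 2)
```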
 Gotoh, O. (1982) "An improved algorithm for matching biological sequences." J. Mol. Biol. 162, 705-708.
 Gotoh, O. (1990) "Optimal sequence alignment allowing for long gaps." Bull. Math. Biol. 52, 359-373.
 Berger, M.P., and Munson, P.J. (1991) "A novel randomized iterative strategy for aligning multiple protein sequences." CABIOS 7, 479-484.
 Gotoh, O. (1993) "Optimal alignment between groups of sequences and its application to multiple sequence alignment." CABIOS 9, 361-370.
 Gotoh, O. (1993) "Extraction of conserved or variable regions from a multiple sequence alignment." Proceedings of Genome Informatics Workshop IV, pp. 109-113.
 Gotoh, O. (1994) "Further improvement in group-to-group sequence alignment with generalized profile operations." CABIOS 10, 379-387.
 Gotoh, O. (1995) "A weighting system and algorithm for aligning many phylogenetically related sequences." CABIOS 11, 543-551.
 Gotoh, O. (1996) "Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments." J. Mol. Biol. 264, 823-838.
 Gotoh, O. (1999) "Multiple sequence alignment: algorithms and applications." Adv. Biophys. 36, 159-206.
 Gotoh, O., Yamada, S., and Yada, T. (2006) "Multiple sequence alignment." In Handbook of Computational Molecular Biology (Aluru, S., ed.), Chapman & Hall/CRC, Computer and Information Science Series, Vol. 9, pp. 3.1-3.36.
Copyright © 1997-2020 Osamu Gotoh