Added/modified in Ver. 4.0.4 (2014-06-02)
Added/modified in Ver. 3.1.3 (2010-02-06)
Added/modified in Ver. 3.1.1
Added/modified in Ver. 3.1.0
 Gotoh, O. (1982) "An improved algorithm for matching biological sequences." J. Mol. Biol. 162, 705-708.
 Gotoh, O. (1990) "Optimal sequence alignment allowing for long gaps." Bull. Math. Biol. 52, 359-373.
 Berger, M.P., and Munson, P.J. (1991) "A novel randomized iterative strategy for aligning multiple protein sequences." CABIOS 7, 479-484.
 Gotoh, O. (1993) "Optimal alignment between groups of sequences and its application to multiple sequence alignment." CABIOS 9, 361-370.
 Gotoh, O. (1993) "Extraction of conserved or variable regions from a multiple sequence alignment." Proceedings of Genome Informatics Workshop IV, pp. 109-113.
 Gotoh, O. (1994) "Further improvement in group-to-group sequence alignment with generalized profile operations." CABIOS 10, 379-387.
 Gotoh, O. (1995) "A weighting system and algorithm for aligning many phylogenetically related sequences." CABIOS 11, 543-551.
 Gotoh, O. (1996) "Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments." J. Mol. Biol. 264, 823-838.
 Gotoh, O. (1999) "Multiple sequence alignment: algorithms and applications." Adv. Biophys. 36, 159-206.
 Gotoh, O., Yamada, S., and Yada, T. (2006) "Multiple sequence alignment," in Handbook of Computational Molecular Biology (Aluru, S. ed.), Chapman & Hall/CRC, Computer and Information Science Series, Vol. 9, pp. 3.1-3.36.
 Gotoh, O. (2013) "Heuristic alignment methods," in Multiple Sequence Alignment Methods (Russell, D. ed.), Methods in Molecular Biology, Vol. 1079, pp. 29-44, Humana Press.
 Gotoh, O., Morita, M., and Nelson, D.R. (2014) "Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment," submitted.
To represent the exon-intron structure of the parental gene, the FASTA format is extended. A line starting with ';C' gives the exon boundaries (inclusive). More than one such line may be used if necessary. The format after ';C ' is essentially the same as that of the Feature field of a GenBank file: the start and end positions of each exon are separated by two dots, and individual exons are delimited by commas. The term 'complement' indicates that the corresponding gene lies on the complementary strand of the genomic sequence. Two examples are as follows:
>ce13a1 C. elegans chromosome II positive strand
>ce13a2 C. elegans chromosome II negative strand
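As a rough illustration of how the ';C' lines described above might be read, the following Python sketch joins the payload of all ';C' lines in a record and extracts the exon ranges. The function name `parse_exon_lines`, the sample record, its coordinates, and the use of a GenBank-style `join(...)` wrapper are assumptions made for this example; they are not taken from the aln/spaln/prrn sources.

```python
import re

def parse_exon_lines(lines):
    """Return (is_complement, [(start, end), ...]) from the ';C' lines of a record."""
    # Concatenate the text after ';C' on every such line; other lines are skipped.
    payload = "".join(line[2:].strip() for line in lines if line.startswith(";C"))
    # 'complement' marks a gene on the complementary strand.
    is_complement = "complement" in payload
    # Each exon is written as start..end; exons are separated by commas.
    exons = [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", payload)]
    return is_complement, exons

# Hypothetical record: the name, coordinates, and residues are made up.
record = [
    ">example hypothetical gene",
    ";C complement(join(101..250,",
    ";C 301..420))",
    "MKTAYIAK",
]

is_complement, exons = parse_exon_lines(record)
```

Because the payloads are concatenated before matching, an exon list that continues over several ';C' lines is handled the same way as a single-line list.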
When the parental gene contains one or more frame shifts, the output from aln or spaln contains the corresponding number of lines starting with ';M', of the form:
;M Deleted n chars at p
;M Insert n chars at p
The first line indicates that n (n = 1 or 2) nucleotides have been deleted from the parental genomic sequence beginning at site p. Likewise, the second line indicates that n blank characters are inserted after site p to maintain the open reading frame. This information is used to properly juxtapose intron positions along the alignment.
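The ';M' lines can be collected with a short pattern match. The sketch below is only an illustration: the function name `parse_frame_shifts`, the tuple layout, and the exact whitespace tolerated between words are assumptions, and the sample lines use made-up positions.

```python
import re

# Matches ';M Deleted n chars at p' and ';M Insert n chars at p'.
FRAME_SHIFT = re.compile(r"^;M\s+(Deleted|Insert)\s+(\d+)\s+chars\s+at\s+(\d+)")

def parse_frame_shifts(lines):
    """Return a list of (kind, n, p) tuples, one per ';M' line."""
    events = []
    for line in lines:
        m = FRAME_SHIFT.match(line)
        if m:
            kind = "delete" if m.group(1) == "Deleted" else "insert"
            events.append((kind, int(m.group(2)), int(m.group(3))))
    return events

# Hypothetical lines; the positions are invented for the example.
events = parse_frame_shifts([
    ">example",
    ";M Deleted 1 chars at 120",
    ";M Insert 2 chars at 345",
])
```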
Interleaved (native) format of multiple sequences
This native format is designed so that a multiple-sequence alignment can be read easily by eye. The alignment produced by aln can be used as input to aln or prrn, and this is the most common way to access sets of pre-aligned sequences. Thus, the format of an aligned sequence file is the same as the default output format of aln. The first non-blank line in a file must indicate the number of sequences, N, involved in the alignment. This number is obtained as the sum of the numbers in square brackets, e.g., when the first line is
Seq1 - Seq2
N is calculated to be 7. Subsequent lines up to the first blank line are ignored. The rest of the file is composed of one or more blocks of a fixed column width of less than 254 characters. Each block is composed of N 'sequence lines' and other (optional) 'non-sequence lines'. The general format of a sequence line is:
<Position> <Sequence> <Name>
where <Position> is a numeral that indicates the sequence position of the first letter in <Sequence>. (Usually all <Position>s in the first block are 1, but this is not required; negative values are also allowed.) A line lacking the <Position> field is regarded as a 'non-sequence line' and is ignored upon reading. The i-th sequence line in the second block is concatenated to the i-th sequence line in the first block, and so on. There is no particular limit on N or the length, but the total number of characters to be stored is limited by MAXAREA, defined in src/sqio.h. Several examples of native format are provided in the sample directory.
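The reading rules above can be sketched in Python as follows. This is a simplified illustration, not the reader in src/sqio.h: it assumes that the three fields are separated by whitespace, that <Sequence> contains no internal spaces, and that no non-sequence line begins with an integer. The names `count_sequences` and `read_native` and the sample file contents are hypothetical.

```python
import re

def count_sequences(header_line):
    """N is the sum of the numbers in square brackets on the first non-blank line."""
    return sum(int(n) for n in re.findall(r"\[(\d+)\]", header_line))

def read_native(text):
    """Return (N, [(name, sequence), ...]) from an interleaved alignment."""
    lines = text.splitlines()
    # Locate the first non-blank line and compute N from it.
    i = next(k for k, ln in enumerate(lines) if ln.strip())
    n = count_sequences(lines[i])
    # Lines up to the first blank line are ignored.
    while i < len(lines) and lines[i].strip():
        i += 1
    rows = []  # one [name, sequence] pair per aligned sequence
    k = 0      # index of the next sequence line within the current block
    for ln in lines[i:]:
        fields = ln.split()
        # A sequence line starts with a (possibly negative) integer position.
        if len(fields) < 2 or not re.fullmatch(r"-?\d+", fields[0]):
            continue
        if len(rows) < n:
            # First block: create the entry.
            rows.append([fields[2] if len(fields) > 2 else "", fields[1]])
        else:
            # Later blocks: append to the i-th sequence of the first block.
            rows[k][1] += fields[1]
        k = (k + 1) % n
    return n, [(name, seq) for name, seq in rows]

# Hypothetical two-block file with N = 1 + 2 = 3 sequences.
sample = """\
grpA[1] - grpB[2]
free-form header line, ignored

1 MKT-AY s1
1 MKTQAY s2
1 MK--AY s3

6 LLV s1
7 LIV s2
5 L-V s3
"""

n, alignment = read_native(sample)
```

The sketch does not enforce the column-width limit or MAXAREA; it only shows how N is derived and how blocks are concatenated.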
 (1) prrn [options1] [options2]
 (2) prrn [options1] [options2] seq1 [seq2 seq3 ...]

Options1 is specific to prrn, whereas options2 is common to other programs; default values are used for any option that is omitted. Method (1) is a 'conversational mode', which is discussed in detail below. Method (2) is a 'command-line mode'; the calculation starts immediately using the sequence data in file(s) seqn. When two or more sequences or an unaligned multiple sequence file are given, they are combined and prealigned by a progressive method. The -b Tree option described below generates the initial alignment by a progressive method using the given tree as the guide tree.
Menu Prompt: 1 pgr
is equivalent to
Menu Prompt: 1
Menu Prompt: p
Menu Prompt: g
Menu Prompt: r
[[path][filename]] [start] [end] [attribute]
Any term may be omitted, except that start must be given when you wish to specify end. When path is omitted, the default path is used (this can be specified with options2; see instant mode below). Start and end specify the range of the sequence to be analyzed. The default values are 1 and the last position of the sequence, respectively. Some attribute terms are applicable only to a nucleotide sequence. Currently meaningful attributes are defined below.
^  : the sequence is complemented (nucleotide sequence only).
-  : the sequence is reversed.
<  : the sequence is reverse-complemented.
%  : forces the sequences into a profile.
A  : amino acid sequence.
P  : amino acid sequence.
D  : nucleotide sequence.
R  : nucleotide sequence.
N  : nucleotide sequence, delete ambiguous codes ('N').
T  : tron sequence. See ref. 5.
@N : read coding parts only, according to the N-th CDS (GenBank only).
#N : retain only the columns composed of >N% of non-deletion characters.

If there is no attribute character (default), the sequence remains as read. If filename is specified, the remaining terms are reset to the default values; otherwise non-specified terms are unchanged. Of course, filename should not be omitted in the beginning session.
Parameter  Default  Meaning
pam        250      PAM level of Dayhoff's mutation data matrix MDM [0-300] step by 50.
bias       0        Value to be added to each element of MDM.
scale      1        Precision of the MDM matrix ( >= 1 ).
v          2        Constant term of the gap-weighting function.
u          9        Proportional coefficient of the gap-weighting function for a gap shorter than or equal to k1.

Parameters for nucleotide sequences
Parameter  Default  Meaning
s[=]       2        Similarity measure between identical nucleotides.
s[#]       2        Similarity measure between different nucleotides.
v          4        Constant term of the gap-weighting function.
u          2        Proportional coefficient of the gap-weighting function.

Parameters common to nucleotide and protein sequences
Parameter  Default  Meaning
shldr      100      Window size ( >= 0 ), see note 1.
series     1        Number of iteration series. The result with the best score among the series is finally reported.
omode      0        Message level [0 - 2], see note 2.
algor      0        Select the algorithm for group-to-group alignment. If 0, the program chooses conceivably the best one.
dmode      2        This mode controls the way multiple sequences are divided into two groups, see note 3.
newrn      1        Set the seed of a series of quasi-random numbers.
group      2        Mutual alignment between some sequences in the given alignment may be fixed, see note 4.
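To make the roles of v and u in the parameter lists above concrete, the sketch below evaluates a linear gap-weighting function with a constant term v and a proportional coefficient u. The function name `gap_weight` is hypothetical, the defaults are the nucleotide values listed above, and the exact form used inside prrn (for example, how gaps longer than k1 are treated for proteins) is not specified here, so this should be read as an assumption rather than the program's definition.

```python
def gap_weight(k, u=2, v=4):
    """Linear (affine) gap weight for a gap of length k >= 1.

    v is the constant term and u the proportional coefficient; the
    defaults are the nucleotide values from the parameter list.
    """
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return u * k + v

# Weights of gaps of length 1 to 3 with the default nucleotide parameters.
weights = [gap_weight(k) for k in (1, 2, 3)]
```

Under this form, opening a gap costs v once, and every gapped position adds u, so one long gap is weighted less than several short gaps of the same total length.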
Level = 0: Fewest messages.
Level = 1: The course of refinement is monitored.
All of these messages are reported to stderr.
Mode = 1: A randomly chosen single sequence is aligned with the remaining (M-1) sequences. The average convergence takes O(M) steps.
Mode = 2: Choose only the divisions that bisect an unrooted tree representing the mutual relationship of the sequences in the given alignment. The average convergence takes O(2M) steps. (default)
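The tree-based divisions of Mode 2 can be illustrated with a small enumeration: every edge of the tree splits the sequences into two groups. In the Python sketch below, the nested-tuple tree representation and the names `leaves` and `tree_divisions` are assumptions made for the example and do not reflect prrn's internal data structures.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def tree_divisions(tree):
    """Enumerate the distinct two-group divisions induced by the tree edges."""
    everything = leaves(tree)
    seen = set()

    def visit(node):
        if isinstance(node, str):
            return
        for child in node:
            part = leaves(child)
            # The edge above `child` separates `part` from all other leaves.
            # Storing the pair as an unordered set merges the two root edges,
            # which describe the same division of the unrooted tree.
            seen.add(frozenset([part, everything - part]))
            visit(child)

    visit(tree)
    return seen

# Hypothetical tree relating M = 4 sequences.
tree = (("a", "b"), ("c", "d"))
divisions = tree_divisions(tree)
```

For a binary tree with M leaves this yields 2M-3 divisions, M of which isolate a single sequence; those M single-sequence divisions are the only ones Mode 1 draws from.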
Mode = 0: Previous grouping is used.
Mode = 1: Groups should be manually indicated.
Mode = 2: No grouping.
Options1  Default  Meaning
-bS       tree     Initial alignment is calculated by a progressive method using S as the guide tree.
-ES       SE       Destination of supplementary messages.
-GN       2        Grouping.
-HN       20       Threshold value used in finding conserved regions.
-IN       10       Maximum number of iterations in the outer loop.
-JN       1        N=1: UPGMA method; N=2: NJ method for tree.
-ON       1        Set output mode (omode) to N. 1: output alignment; 2: output outlier indel information; 4: output normalized alignment scores.
-RN                Seed of the series of random numbers.
-SN       1        Number of iteration series.
-U                 Update mode. Members in seq1 are replaced by sequences of the same names in seqn (n>1). Seq1 must be a multiple alignment whose members should have unique names.
-ps                The order of sequences in the output file is rearranged according to the calculated phylogenetic relationship.
Options2  Default  Meaning
-AN       0        Use algorithm N = [0-5]. A value other than 0 is not recommended.
-FN       1        Format of output. N=1-5: native; N=6: Phylip; N=7: GCG; N=8: GDE; N=9: Concatenated Fasta.
-lN       60       Set lpw (# of residues per line) = N > 8.
-mS                Amino acid exchange matrix.
-oS       SO       Output resultant alignment to file.
-sS       ./       Set the default path to sequence files to S.
-uN       2        Set gap-extension penalty u = N.
-vN       4/9      Set gap-opening penalty v = N.
-wN       100      Set shldr = N.
-yJN      10       Bonus ( >= 0 ) given to matched intron positions.
-ypN      250      Set pam = N.

(Note) SE and SO mean standard error (stderr) and standard output (stdout), respectively.
Copyright © 1997-2014 Osamu Gotoh