pairalign

mr-y edited this page Nov 2, 2016 · 3 revisions

Pairalign

Pairalign performs pairwise alignment of DNA sequences given in fasta format, read either from standard input or from a file.

This manual is based on the pairalign help function (-h) but has been successively changed.

Usage

pairalign [arguments] < inputfile.fasta
pairalign [arguments] inputfile.fasta

Arguments

--aligned / -A

Input file is already aligned.

--alignments / -a

Output aligned sequences pairwise.

--difference / -i

Output the difference between the Jukes-Cantor (JC) distance and the proportion of different sites.

--distances / -d

Output the proportion of different sites, the JC distance, and the difference between the two.
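The three quantities above can be illustrated with a minimal Python sketch. It uses the standard Jukes-Cantor formula, d = -3/4 ln(1 - 4p/3), where p is the proportion of different sites. The function names and the gap handling (columns with a gap in either sequence are skipped) are assumptions made for illustration, not a description of how pairalign computes these values internally.

```python
import math

def proportion_different(a, b):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: columns with a gap ('-') in either sequence are skipped.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def jc_distance(p):
    """Jukes-Cantor distance for a proportion of different sites p."""
    # The formula is undefined for p >= 3/4; report infinity in that case.
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Two made-up aligned sequences differing at one of ten sites.
p = proportion_different("ACGTACGTAC", "ACGTACGTTC")
d = jc_distance(p)
print("proportion different:", p)
print("JC distance:", d)
print("difference:", d - p)
print("similarity:", 1 - p)
```

The JC distance is always at least as large as the proportion of different sites, since it corrects for multiple substitutions at the same site, so the difference is never negative.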

--format

Set the format of the input to fasta, or to fasta with the sequences given pairwise (as output with the -a -n options). If the sequences are already aligned, also give the -A switch.

--group / -g

This option clusters sequences that are similar and/or finds the most inclusive taxa in a hierarchy that are alignable according to MAD (Smith et al. 2009, BMC Evol. Biol. 9:37). It needs the taxonomy, given either after the first | in the sequence name or in a separate file. The taxa in the hierarchy should be separated by semicolons, with the highest rank first, followed by increasingly nested levels down to the lowest known level for the sequence.

The groups that can be aligned are written to a file with the ending .alignment_groups and printed to the screen preceded by #. Clusters are printed to the screen after a heading, preceded by ###.

To get alignable groups, give 'alignment_groups' as an extra argument; to cluster, give 'cluster'; to do both, give 'both'. A cut-off value for pairwise similarity can be given after a colon (:) as cut-off= followed by the value, e.g.:

pairalign -g both:cut-off=0.97

A file with the taxonomy can be given with taxonomy=. Each row of the taxonomy file should start with the taxonomy (formatted as above), followed by a | and then the names of the sequences with that taxonomy as a comma (,) and/or space ( ) separated string. The same taxon can be repeated several times.
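To make the taxonomy file layout concrete, here is a small Python sketch that reads rows of that form into a mapping from sequence name to its list of taxa. The taxon and sequence names in the example are invented, and the parser is only an illustration of the format described above, not pairalign's own code.

```python
import re

def parse_taxonomy_lines(lines):
    """Map each sequence name to its list of taxa, highest rank first."""
    taxonomy = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The taxonomy comes first, then a |, then the sequence names.
        taxon_string, _, names = line.partition("|")
        ranks = [t.strip() for t in taxon_string.split(";") if t.strip()]
        # Names may be separated by commas and/or spaces.
        for name in re.split(r"[,\s]+", names.strip()):
            if name:
                taxonomy[name] = ranks
    return taxonomy

# Hypothetical rows; the same taxon may appear on more than one row.
example = [
    "Fungi;Basidiomycota;Agaricales | seq1, seq2",
    "Fungi;Ascomycota | seq3 seq4",
    "Fungi;Basidiomycota;Agaricales | seq5",
]
tax = parse_taxonomy_lines(example)
for name, ranks in sorted(tax.items()):
    print(name, ";".join(ranks))
```

When the taxonomy is instead given in the sequence name, the same semicolon-separated string follows the first | in the name.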

--help / -h

Print this help.

--jc_distance / -j

Output Jukes-Cantor (JC) distance.

--matrix / -m

Output in the form of a space-separated left-upper triangular matrix.

--names / -n

Output sequence names (if outputting alignments then in fasta format).

--proportion_difference / -p

Output the proportion of sites that are different.

--similarity / -s

Output similarity between sequences (1-proportion different).

--threads / -T

This option is only valid if pairalign was compiled with PTHREADS=YES (see installation). Set the number of threads in addition to the controlling thread, e.g.:

pairalign -T 4

The default is 1.

--verbose / -v

Get additional output.