Welcome to Concaterpillar!

                     o o
 __  __  __  __  __  _\/
/  \/  \/  \/  \/  \/ .\
\__/\__/\__/\__/\__/\_~/
 ""  ""  ""  ""

Thank you for downloading Concaterpillar, a hierarchical likelihood-ratio test
for congruence in multi-locus phylogenies.

Concaterpillar takes, as input, a directory of amino acid alignments in a
variety of formats (including NEXUS, PHYLIP, FASTA, and many others).
Alignments should have the extension ".seq" so that Concaterpillar can find
them. It then assesses topological congruence (which proteins share the same
history?) and/or branch-length congruence (which proteins can be concatenated,
and which should have branch lengths optimised separately?). For details,
please see our Systematic Biology publication (1).

The current incarnation of Concaterpillar uses Dr. Alexandros Stamatakis'
RAxML (2) for all phylogeny-related calculations, including tree inference and
likelihood calculation/branch length optimisation. For "historical" reasons,
some of the long command line flags refer to TREE-PUZZLE and PHYML, but these
programs are no longer used.


INSTALLATION

Since Concaterpillar relies on RAxML for phylogenetic inference, etc., RAxML
must be installed on your system. At the time this manual was written,
Concaterpillar had been tested with only the latest version of RAxML-VI-HPC
(version 2.2.3). RAxML can be downloaded from:

    http://icwww.epfl.ch/~stamatak/index-Dateien/Page443.htm

If you use Concaterpillar, please cite RAxML (2) as well!

Concaterpillar is written in Python, and as such, a relatively recent version
of Python must be installed. The current version has only been tested
extensively with Python 2.4, although we suspect that it should work fine with
versions 2.2 and 2.3 as well. In addition, Concaterpillar relies on the SciPy
library (http://www.scipy.org) and, if you wish to use the MPI version (see
below), on the PyMPI library (http://pympi.sourceforge.net).
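The prerequisites listed above can be checked by hand before going any
further. The following is only a minimal sketch and is not part of
Concaterpillar: the function name missing_dependencies is hypothetical, the
executable name "raxmlHPC" and the module name "mpi" (for PyMPI) are
assumptions about a typical installation, and the code uses the modern
Python 3 standard library rather than the Python 2.4 that Concaterpillar
itself was tested with.

```python
import importlib.util
import shutil


def missing_dependencies(executables, modules):
    """Return a list describing what could not be found (empty if all is well).

    executables: candidate names for one program; any one of them on the
                 PATH is enough (hypothetical names, adjust to your system).
    modules:     Python module names that must all be importable.
    """
    missing = []
    # shutil.which returns None when a name is not an executable on the PATH.
    if not any(shutil.which(name) for name in executables):
        missing.append("executable: " + " or ".join(executables))
    # find_spec returns None for a top-level module that cannot be located.
    for module in modules:
        if importlib.util.find_spec(module) is None:
            missing.append("module: " + module)
    return missing
```

For example, missing_dependencies(("raxmlHPC",), ("scipy",)) would report
whichever of RAxML and SciPy cannot be found; add "mpi" to the module list
only if you plan to use the MPI version.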
Prior to installation, the ccpinstall.py script should be run to either find
or verify the path to RAxML.

Usage: python ccpinstall.py [options]

Options:
  -h    print this message
  -r    path to raxml

It is not necessary to specify a path to RAxML; the script will prompt the
user for the path if it is unable to locate RAxML on its own. If the path to
RAxML is specified, it will be verified.

Following execution of the ccpinstall.py script, Concaterpillar (including all
Python scripts in the tarfile) can be copied to the working directory, or
somewhere else in the path. If you keep Concaterpillar in some non-path
directory, you'll just need to use the full path of Concaterpillar when you
run it.


RUNNING CONCATERPILLAR

Usage: python concaterpillar.py [options]

Options:
  -h, --help                  print this message
  -a, --phyml                 begin building trees with RAxML
  -r, --puzzle                begin reanalysis of trees with RAxML
  -t, --topology              begin topological congruence test
  -i, --interrupt             continue interrupted topological congruence test
  -b, --blength               begin branch length compatibility assessment
  -c ncpu, --cpus=ncpu        use specified number of processors (ncpu must be
                              an integer greater than 0, default = 1)
  -p cutoff, --pval=cutoff    use specified p-value as a cutoff for tests
                              (must be a number between 0 and 1,
                              default = 0.05)
  -m model, --model=model     the substitution model to be used by RAxML
                              (valid models are: 'DAYHOFF', 'DCMUT', 'JTT',
                              'MTREV', 'WAG', 'RTREV', 'CPREV', 'VT',
                              'BLOSUM62', 'MTMAM', 'GTR', default = WAG)
  -s nrounds, --save=nrounds  save a backup once the first run of RAxML
                              finishes, and then after every nrounds rounds of
                              the topology test. The backup is saved in the
                              file savepoint.tgz, which is overwritten each
                              time (nrounds must be an integer greater than 0)

Notes:

At least one of the options -a, -r, -t, -i, and -b must be specified. More
than one may be used. If you run with only one of these switches at a time, -a
must be run before -r, and -r before -t.
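The ordering rule above (one switch per run, -a before -r before -t) can be
sketched as a small helper that builds the three command lines in the required
order. This is an illustration only: the helper staged_commands is
hypothetical, passing -m and -p on every run is an assumption, and the
p-value bounds are assumed to be exclusive. It prints the commands and does
not execute them.

```python
# Model names and stage flags are taken from the option list above.
VALID_MODELS = ("DAYHOFF", "DCMUT", "JTT", "MTREV", "WAG", "RTREV", "CPREV",
                "VT", "BLOSUM62", "MTMAM", "GTR")
STAGE_ORDER = ("-a", "-r", "-t")  # build trees, reanalyse, topology test


def staged_commands(model="WAG", pval=0.05, script="concaterpillar.py"):
    """Return one argument list per stage, in the order the notes require."""
    if model not in VALID_MODELS:
        raise ValueError("unknown model: %s" % model)
    # Assumption: "between 0 and 1" is read as excluding both endpoints.
    if not 0 < pval < 1:
        raise ValueError("p-value cutoff must be between 0 and 1")
    shared = ["-m", model, "-p", str(pval)]
    return [["python", script, stage] + shared for stage in STAGE_ORDER]


# Show the command lines for the default settings without running them.
for command in staged_commands():
    print(" ".join(command))
```

Each printed line can then be run in turn, waiting for one stage to finish
before starting the next.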
-i should only be used if trees have already been built (-a) and evaluated
(-r), and the topology test (-t) has begun but was interrupted prior to
completion for some reason.

In theory, -b can be used independently of all other options, in which case
branch length compatibility of all alignments will be evaluated. Otherwise, if
the congruence test has been completed, the branch length compatibility test
will be performed only on the alignments in the largest cluster.

Concaterpillar can be run either on a single CPU or multiple CPUs (highly
recommended for data sets larger than a handful of genes!). If you're running
it with only one CPU, the "Usage" instructions above should be followed,
specifying "-c 1".

Concaterpillar can be run over multiple CPUs in two different ways. If
installed on a cluster with the message passing interface (MPI) set up and
PyMPI (http://pympi.sourceforge.net/) installed, it can be run with MPI. The
exact command will vary, depending on whether or not a queueing system is
installed, as well as which queueing system. With the simplest possible setup,
Concaterpillar can be run with pyMPI as:

    mpirun -np [number_of_processors] [path/to/pyMPI] concaterpillar.py [options]

Depending on the version of MPI installed, you may need to replace "mpirun"
with "mpiexec". Also, if you have trouble, please make sure SciPy and PyMPI
are installed such that the pyMPI interpreter knows where to look for SciPy!
Note also that Concaterpillar will ignore the -c flag if it is run with PyMPI.

If you don't have MPI installed, but have access to a shared memory machine
(e.g. a dual or quad core computer with a single motherboard), you can run
Concaterpillar over multiple processors by following the "Usage" instructions
above, specifying the number of processors to use with the "-c" flag. Be
careful with this: Concaterpillar isn't smart enough to warn you if you try to
use more CPUs than are available!


REFERENCES:

1. Leigh JW, Susko E, Baumgartner M, Roger AJ. Assessing congruence in
   phylogenomic data. Syst Biol. 2007. [Accepted]
2. Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses
   with thousands of taxa and mixed models. Bioinformatics. 2006 Nov
   1;22(21):2688-90.