The 'Expected Likelihood Weights' (ELW) scripts.

What are they?
--------------

We have developed two PERL scripts -- elw.pl and calcwts.pl -- that, together with the PAUP* program (http://paup.csit.fsu.edu) and the PHYLIP program SEQBOOT (http://evolution.genetics.washington.edu), can be used to implement the 'expected likelihood weights' method of Strimmer and Rambaut [Strimmer, K. and Rambaut, A. (2002) Inferring confidence sets of possibly misspecified gene trees. Proc. Royal Soc. London Ser. B 269:137-142] to calculate a 'confidence interval' for the MLE tree in maximum likelihood analysis. A justification for this method can be found in Strimmer and Rambaut (2002).

The programs -- what you need, how to use them and in what order?
------------------------------------------------------------------

Overview:
---------

Step 1 -- Run SEQBOOT to generate a SEQUENTIAL format bootstrapped dataset
Step 2 -- Run elw.pl to make the paupblock
Step 3 -- Run the paupblock using PAUP* to generate likelihood scores files for test trees
Step 4 -- Run calcwts.pl to generate weights files, summary stats and confidence sets

Info about elw.pl
-----------------

This program takes a set of bootstrap resampled datasets generated by the PHYLIP program SEQBOOT in SEQUENTIAL format, combines it with a pre-written paup block (template supplied) and builds it into one huge executable paupblock. When executed by PAUP*, this paupblock looks for a user-specified treefile (in NEXUS format please) and writes a likelihood 'scores' file for each bootstrapped dataset.

Before starting, make sure you have PERL installed on your computer (http://www.perl.com) and the files below in your directory:

Input:
------

1 PHYLIP SEQUENTIAL format bootstrapped file (generated by SEQBOOT)
1 paupblock (a template is provided)
1 file with all your test trees in it in NEXUS format

To run elw.pl:
--------------

Make sure elw.pl is installed either in your working directory or in a system 'bin' directory in your path. Then simply type elw.pl at the prompt and it should execute. If that doesn't work, check the permissions on it ('ls -la') and make sure you have 'execute' privileges (for instance, type: chmod 755 elw.pl).

Output:
-------

elw.pl will generate a 'paupblock' file called 'filename.nexus'.

Next:
-----

Execute PAUP* as you normally would. At the paup prompt type 'execute filename.nexus'. Since processing 'filename.nexus' will normally take quite a while, you will probably want to run your paup job in the background on a UNIX system.

Output:
-------

For each bootstrapped dataset a 'scores' file will be created called 'bootstrapnumber.filename.scores'. Typically you will have done 100-1000 bootstrap replicates, so the files will run from 1.filename.scores up to, say, 1000.filename.scores. When the above job is complete you will want to run calcwts.pl.

Info about calcwts.pl:
----------------------

This program takes the sets of likelihood scores files for the test trees (generated as above), creates a set of likelihood weights from each scores file and, by averaging these weights over all the scores files, calculates the 'expected likelihood weights'.

To run calcwts.pl:
------------------

Make sure calcwts.pl is in your directory or in your path. Then type 'calcwts.pl'. It will ask for the rootname of your file -- e.g. the rootname of '1.filename.scores' is 'filename.scores'.

Output:
-------

calcwts.pl generates a whole whack of files. For each scores file, it will generate a #.weights file (e.g. 1.weights, 2.weights, 3.weights etc.). Then it generates 2 summary files: filename.scores.elw and filename.scores.summary.
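To make the averaging step concrete, here is a minimal Python sketch of the calculation described under 'Info about calcwts.pl'. It is not the code of calcwts.pl. It assumes the scores are -lnL values (smaller is better) already read into memory, one list per bootstrap replicate with the trees in the same order in every list. The function names and the example numbers are hypothetical, and reading the scores files is left out because their layout is not described here.

```python
import math


def likelihood_weights(neg_log_likelihoods):
    """Turn one replicate's -lnL scores into weights that sum to 1."""
    # Subtract the best (smallest) -lnL before exponentiating so that
    # exp() does not underflow to zero for large scores.
    best = min(neg_log_likelihoods)
    relative = [math.exp(-(score - best)) for score in neg_log_likelihoods]
    total = sum(relative)
    return [value / total for value in relative]


def expected_likelihood_weights(replicates):
    """Average the per-replicate weights, tree by tree."""
    per_replicate = [likelihood_weights(scores) for scores in replicates]
    count = len(per_replicate)
    return [sum(column) / count for column in zip(*per_replicate)]


# Hypothetical -lnL scores: 3 test trees, 2 bootstrap replicates.
replicates = [
    [1000.0, 1001.0, 1005.0],
    [998.0, 997.5, 1003.0],
]
elw = expected_likelihood_weights(replicates)
for tree_number, weight in enumerate(elw, start=1):
    print(f"tree {tree_number}: expected likelihood weight {weight:.4f}")
```

Subtracting the best score first does not change the weights, since the common factor cancels when dividing by the total; it only keeps the exponentials in a safe numerical range.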
The filename.scores.summary file has the trees ordered by decreasing expected likelihood weight and gives the cumulative probability (cumulative 'weights') of the trees from highest to lowest. It is this latter column that tells you the 'confidence set'. If the cumulative probability of a tree in this column is < 0.95, then it's in the 95% confidence set/interval. You will be forgiven for mistaking these for Bayesian credible sets, since there is a Bayesian flavour to this procedure. One important difference is that these are posterior probabilities AVERAGED over bootstrap replicates - not your typical Bayesian style of analysis.

Hints:
------

Test trees. As with Shimodaira-Hasegawa tests, whether a given tree falls IN or OUT of the confidence set can depend on which trees are included as 'test' trees. What is most important here is to include a very large sample of 'good' trees, as these are the trees that will accrue the most weight in the bootstrap analysis. Strimmer and Rambaut (2002) considered the cases of 'all' trees and a handful of 'good' trees and got similar results. However, with large numbers of taxa, there are likely to be lots of very good trees, and it won't be easy to capture a large sample of them. What we have done [Silberman, J.D. et al. (2002) Retortamonad flagellates are closely related to diplomonads - Implications for the history of mitochondrial function in eukaryote evolution. Mol. Biol. Evol. (in press)] is to use sets of unique trees derived from bootstrap analysis, plus the test trees, plus the MLE tree.

Some warnings:
--------------

1) These programs -- especially elw.pl -- were written quickly to implement the 'ELW' method and with no regard for memory limitations. You may find that elw.pl is a memory hog and will not complete jobs due to memory limitations.
If you have lots of memory on your system but you are being limited by system-level memory limits for jobs, you may need to type 'unlimit' followed by a carriage return before you run elw.pl. If you do this and your system crashes, then you probably shouldn't have done it. Therefore, make sure you have at least 120-300 MB of free memory before trying it. Also, have a look at the UNIX man page for the command 'unlimit'.

2) I have not implemented the expected likelihood weights analysis for amino acid sequences, despite the first questions asked of you by elw.pl. As it stands, elw.pl can convert any DNA, RNA or protein SEQUENTIAL format SEQBOOT output file for any analysis you can dream up with PAUP*. Thus it stands alone as the only program (albeit poorly coded) able to do this. But likelihood analyses of protein sequences WILL NOT run with the current version of PAUP*.

3) These are amateurish programs...but then again...think of who wrote them (AJR) and how much you paid for them. You got what you paid for! If you feel you got more than you paid for, please send money to me. But seriously - if you have bug reports, please send them to Andrew J. Roger (aroger@is.dal.ca).
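
To illustrate how the cumulative column in filename.scores.summary is read (see 'Info about calcwts.pl' above), here is a minimal Python sketch. It applies the rule stated in this README, i.e. a tree is in the set while its cumulative weight is < 0.95. The function name, the row layout and the example weights are hypothetical and are not taken from calcwts.pl or from any real dataset.

```python
def confidence_set(weights, level=0.95):
    """Rank trees by decreasing weight and flag those inside the set.

    Returns one row per tree: (tree number, weight, cumulative weight,
    inside), where 'inside' follows the README's rule that the
    cumulative weight must be below 'level'.
    """
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    rows = []
    cumulative = 0.0
    for index in order:
        cumulative += weights[index]
        rows.append((index + 1, weights[index], cumulative, cumulative < level))
    return rows


# Hypothetical expected likelihood weights for 5 test trees.
example_weights = [0.08, 0.50, 0.30, 0.10, 0.02]
rows = confidence_set(example_weights)
in_set = [tree for tree, _, _, inside in rows if inside]
for tree, weight, cumulative, inside in rows:
    status = "IN" if inside else "OUT"
    print(f"tree {tree}: weight {weight:.2f}, cumulative {cumulative:.2f}, {status}")
```

With these made-up weights, trees 2, 3 and 4 fall inside the 95% set (cumulative weights of about 0.50, 0.80 and 0.90), while trees 1 and 5 fall outside.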