The 'Expected Likelihood Weights' (ELW) scripts.

What are they? 
------------- 
We have developed two PERL scripts -- elw.pl and calcwts.pl -- that, 
together with the PAUP* program (http://paup.csit.fsu.edu) and the PHYLIP
program SEQBOOT (http://evolution.genetics.washington.edu), can be used to
implement the 'expected likelihood weights' method of Strimmer and Rambaut
[Strimmer, K. and Rambaut, A. (2002) Inferring confidence sets of possibly
misspecified gene trees. Proc. Royal Soc. London Ser. B 269:137-142] to
calculate a 'confidence set' of trees (a 'confidence interval' around the
MLE tree) in maximum likelihood analysis. A justification for this method
can be found in Strimmer and Rambaut (2002).
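In outline, the quantity the scripts estimate can be written as below. The
notation is ours (not taken from the scripts or the paper) and follows the
description of calcwts.pl further down: B bootstrap replicates, T test trees,
and l_ib the log-likelihood of test tree i on replicate b.

```latex
% Likelihood weight of tree i within replicate b; subtracting the largest
% log-likelihood in that replicate leaves the ratio unchanged but avoids underflow.
w_{ib} = \frac{e^{\ell_{ib}}}{\sum_{j=1}^{T} e^{\ell_{jb}}}
       = \frac{e^{\ell_{ib} - \ell^{\max}_{b}}}{\sum_{j=1}^{T} e^{\ell_{jb} - \ell^{\max}_{b}}},
\qquad \ell^{\max}_{b} = \max_{1 \le j \le T} \ell_{jb}

% Expected likelihood weight of tree i: the average over the B replicates.
\mathrm{ELW}_{i} = \frac{1}{B} \sum_{b=1}^{B} w_{ib}
```

The confidence set is then read off by ranking trees by decreasing ELW and
accumulating their weights, as described for the summary file below.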

The programs -- what you need, how to use them and in what order?
------------------------------------------------------------------
Overview:
-------- 
Step 1 -- Run SEQBOOT to generate a SEQUENTIAL format
          bootstrapped dataset 
Step 2 -- Run elw.pl to make the paupblock 
Step 3 -- Run the paupblock using PAUP* to generate likelihood scores 
          files for test trees 
Step 4 -- Run calcwts.pl to generate weights files, summary
          stats and confidence sets


Info about elw.pl 
-----------------
This program takes a set of bootstrap-resampled datasets generated by 
the PHYLIP program SEQBOOT in SEQUENTIAL format, combines them with a 
pre-written paup block (template supplied) and builds them into one huge 
executable paupblock.  When executed by PAUP*, this paupblock reads a 
user-specified treefile (in NEXUS format, please) and writes a likelihood 
'scores' file for each bootstrapped dataset.
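To make the input side concrete, here is a minimal Python sketch (not the
elw.pl code) of splitting SEQBOOT's concatenated SEQUENTIAL output into
replicates. It assumes strict PHYLIP layout: a header line 'ntax nchar', then
each taxon's 10-character name field followed by its sequence, which may wrap
onto later lines. The function name read_seqboot_sequential is hypothetical.
It yields one replicate at a time rather than loading the whole file.

```python
from typing import Iterable, Iterator, List, Tuple

# One replicate = list of (taxon name, sequence) pairs.
Replicate = List[Tuple[str, str]]


def _take(it: Iterator[str], what: str) -> str:
    """Next line, or a clear error if the file ends early."""
    try:
        return next(it)
    except StopIteration:
        raise ValueError(f"file ended while reading {what}") from None


def read_seqboot_sequential(lines: Iterable[str]) -> Iterator[Replicate]:
    """Yield replicates one at a time from concatenated PHYLIP sequential data.

    Assumes strict PHYLIP: a header 'ntax nchar', then for each taxon a
    10-character name field followed by the sequence, which may wrap onto
    following lines until nchar characters have been read.
    """
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header.strip():
            continue  # blank lines between replicates
        ntax, nchar = (int(x) for x in header.split()[:2])
        taxa: Replicate = []
        for _ in range(ntax):
            line = _take(it, "a taxon line")
            while not line.strip():
                line = _take(it, "a taxon line")
            name = line[:10].strip()
            seq = "".join(line[10:].split())  # drop any internal spaces
            while len(seq) < nchar:
                seq += "".join(_take(it, f"sequence of {name}").split())
            if len(seq) != nchar:
                raise ValueError(f"{name}: expected {nchar} sites, got {len(seq)}")
            taxa.append((name, seq))
        yield taxa
```

Opening SEQBOOT's output file (PHYLIP's default output name is 'outfile')
and looping over read_seqboot_sequential(fh) keeps only the current replicate
in memory.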

Before starting, make sure you have PERL installed on your computer
(http://www.perl.com) and the files below in your directory:

Input: 
------ 
1 PHYLIP SEQUENTIAL format bootstrapped file (generated by SEQBOOT). 
1 paupblock (a template is provided) 
1 NEXUS-format treefile containing all your test trees

To run elw.pl: 
-------------- 
Make sure elw.pl is installed either in your working directory or in a 
system 'bin' directory in your path. Then simply type elw.pl at the 
prompt and it should execute (if your working directory is not in your 
path, type './elw.pl' instead).  If that doesn't work, check the 
permissions on it ('ls -la') and make sure you have 'execute' 
privileges (for instance, type: chmod 755 elw.pl).

Output: 
------- 
elw.pl will generate a 'paupblock' file called: 'filename.nexus'
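For readers who want to see roughly what such a file contains, the Python
sketch below writes one NEXUS DATA block per replicate followed by a copy of
a command template, with the scores file named as described below
('<replicate>.<filename>.scores'). This layout is our assumption, not a
description of elw.pl's actual output, and EXAMPLE_TEMPLATE is a hypothetical
stand-in holding only NEXUS [comments]; in real use the PAUP* commands come
from the supplied template. Replicates are written as they arrive, so only
one is held in memory at a time.

```python
import io
import re
from typing import Iterable, List, TextIO, Tuple

Replicate = List[Tuple[str, str]]  # (taxon name, sequence) pairs

# HYPOTHETICAL stand-in for the supplied template: only NEXUS [comments].
# {rep} and {scorefile} are placeholders filled in for each replicate.
EXAMPLE_TEMPLATE = (
    "begin paup;\n"
    "  [replicate {rep}: commands from the supplied template go here,\n"
    "   scoring the test trees into {scorefile}]\n"
    "end;\n"
)

_PLAIN_NAME = re.compile(r"[A-Za-z0-9_.]+")


def nexus_name(name: str) -> str:
    """Quote a taxon name unless it is a plain NEXUS token."""
    if _PLAIN_NAME.fullmatch(name):
        return name
    return "'" + name.replace("'", "''") + "'"


def write_paupblock(out: TextIO, replicates: Iterable[Replicate], root: str,
                    template: str, datatype: str = "dna") -> int:
    """Write a DATA block plus the filled-in template for each replicate.

    Replicates are written as they arrive, so only one is held in memory.
    Returns the number of replicates written.
    """
    out.write("#NEXUS\n")
    count = 0
    for count, taxa in enumerate(replicates, start=1):
        nchar = len(taxa[0][1])
        out.write("\nbegin data;\n")
        out.write(f"  dimensions ntax={len(taxa)} nchar={nchar};\n")
        out.write(f"  format datatype={datatype} missing=? gap=-;\n")
        out.write("  matrix\n")
        for name, seq in taxa:
            out.write(f"  {nexus_name(name)} {seq}\n")
        out.write("  ;\nend;\n")
        # Scores file names follow the README: <replicate>.<filename>.scores
        out.write(template.format(rep=count, scorefile=f"{count}.{root}.scores"))
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    demo = [[("taxonA", "ACGT"), ("taxon B", "ACGG")]]
    write_paupblock(buf, demo, "filename", EXAMPLE_TEMPLATE)
    print(buf.getvalue())
```

Because the template is filled in with str.format, any literal braces in a
real template would need doubling ('{{' and '}}').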

Next: 
---- 
Execute PAUP* as you normally would. At the paup prompt type
'execute filename.nexus'.  Since the processing of the 'filename.nexus'
will normally take quite a while, you probably want to run your paup job
in the background on a UNIX system.

Output: 
------- 
For each bootstrapped dataset a 'scores' file will be created, called
'bootstrapnumber.filename.scores'. Typically you will have done 100-1000
bootstrap replicates, so the files will run from 1.filename.scores up to,
e.g., 1000.filename.scores.

When the above job is complete you will want to run calcwts.pl.

Info about calcwts.pl: 
----------------------
This program takes the sets of likelihood scores files for the test trees 
(generated as above), converts each set of scores into a set of likelihood 
weights and, by averaging these weights over all the scores files, 
calculates the 'expected likelihood weights'.
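The averaging step can be sketched in a few lines of Python. This is an
illustration of the calculation, not the calcwts.pl code: each replicate's
log-likelihoods are turned into weights L_i / sum_j L_j (shifting by the
largest lnL first, so that exp() does not underflow for realistic scores),
and the weights are averaged over replicates. The input is assumed to be
lnL; if your scores hold -lnL values, as PAUP* reports them, negate first.

```python
import math
from typing import List, Sequence


def likelihood_weights(lnl: Sequence[float]) -> List[float]:
    """Turn one replicate's log-likelihoods into weights L_i / sum_j L_j.

    Subtracting the largest lnL before exponentiating avoids underflow:
    exp(-5000) is 0.0 in double precision, but the ratios are unchanged.
    """
    top = max(lnl)
    scaled = [math.exp(x - top) for x in lnl]
    total = sum(scaled)
    return [s / total for s in scaled]


def expected_likelihood_weights(replicates: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-replicate weights; replicates[b][i] = lnL of tree i in replicate b."""
    n_trees = len(replicates[0])
    sums = [0.0] * n_trees
    for lnl in replicates:
        if len(lnl) != n_trees:
            raise ValueError("every replicate must score the same set of trees")
        for i, w in enumerate(likelihood_weights(lnl)):
            sums[i] += w
    return [s / len(replicates) for s in sums]
```

For example, two trees tied in one replicate (weights 0.5 and 0.5) and
differing by ln 3 in another (weights 0.75 and 0.25) give expected likelihood
weights of 0.625 and 0.375.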

To run calcwts.pl: 
------------------ 
Make sure calcwts.pl is in your directory or in your path.  Then 
type 'calcwts.pl'.  It will ask for the rootname of your file 
-- e.g. the rootname of '1.filename.scores' is 'filename.scores'.

Output: 
------- 
calcwts.pl generates a whole whack of files.  For each scores
file, it will generate a #.weights file (e.g. 1.weights, 2.weights,
3.weights etc. etc.). Then it generates 2 summary files:
filename.scores.elw and filename.scores.summary.  The
filename.scores.summary file lists the trees in decreasing order of
expected likelihood weight and gives the cumulative probability
(cumulative 'weights') of the trees from highest to lowest.  It is this
latter column that tells you the 'confidence set'.  If the cumulative
probability of a tree in this column is < 0.95, then it's in the 95%
confidence set/interval.  You will be forgiven for mistaking these for
Bayesian credible sets, since there is a Bayesian flavour associated
with this procedure.  One important difference is that these are
posterior probabilities AVERAGED over bootstrap replicates - not your
typical Bayesian style of analysis.
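Reading off the confidence set can be sketched in Python as below
(confidence_set is a hypothetical name). One caution on the boundary: taken
literally, a '< 0.95' cutoff on a tree's own cumulative value drops the tree
that carries the total past 0.95, and would leave the set empty if the top
tree alone had weight above 0.95. This sketch instead includes a tree when the
trees ranked above it have not yet reached the level, i.e. the smallest
top-ranked set whose total weight is at least 0.95, which we believe matches
Strimmer and Rambaut's definition; check the paper before relying on it.

```python
from typing import List, Sequence, Tuple


def confidence_set(elw: Sequence[float], level: float = 0.95
                   ) -> List[Tuple[int, float, float, bool]]:
    """Summary rows (tree number, ELW, cumulative weight, in_set), best tree first.

    A tree is in the set if the trees ranked above it have not yet reached
    `level`, i.e. the smallest top-ranked set whose total weight >= level.
    Tree numbers start at 1, as in the treefile.
    """
    ranked = sorted(enumerate(elw, start=1), key=lambda t: t[1], reverse=True)
    rows = []
    cumulative = 0.0
    for tree, weight in ranked:
        in_set = cumulative < level  # weight accumulated before this tree
        cumulative += weight
        rows.append((tree, weight, cumulative, in_set))
    return rows


if __name__ == "__main__":
    for row in confidence_set([0.125, 0.5, 0.375], level=0.8):
        print(row)
```

With weights 0.125, 0.5 and 0.375 and an 80% level, trees 2 and 3 are in the
set (cumulative 0.5, then 0.875) and tree 1 is out.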


Hints: 
------ 
Test trees.  As with Shimodaira-Hasegawa tests, you can
get different results regarding whether a given tree is IN or OUT of the
confidence set depending on what trees are included as 'test' trees. 
What is most important here is trying to include a very large sample of
'good' trees, as these are the trees that will accrue the most weight in
the bootstrap analysis. Strimmer and Rambaut (2002) considered the cases
of 'all' trees and a handful of 'good' trees and got similar results.
However, with large numbers of taxa, there are likely lots of very good
trees. It won't be easy to capture a large sample of good trees.  What
we have done [Silberman, J.D. et al. (2002) Retortamonad flagellates are
closely related to diplomonads - Implications for the history of
mitochondrial function in eukaryote evolution. Mol. Biol. Evol. (in
press)] is to use sets of unique trees derived from bootstrap analysis,
plus the test trees, plus the MLE tree.
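When pooling bootstrap trees, test trees and the MLE tree, the same topology
can turn up several times with different branch lengths, support labels or
root positions. A hypothetical Python helper for dropping such duplicates
compares the unrooted bipartitions (splits) of each tree. It is a sketch that
handles plain Newick strings only (e.g. tree descriptions pulled out of a
NEXUS treefile); it does not handle quoted names, [comments] or TRANSLATE
tables, and it is not part of the scripts described here.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Key = Tuple[FrozenSet[str], FrozenSet[FrozenSet[str]]]


def topology_key(newick: str) -> Key:
    """(taxon set, nontrivial splits) of a tree, ignoring root position.

    Assumes plain Newick: unquoted names, no [comments]. Branch lengths and
    internal node labels (e.g. bootstrap values) are skipped.
    """
    s = newick.strip().rstrip(";")
    stack: List[Set[str]] = []
    clades: List[Set[str]] = []
    leaves: Set[str] = set()
    i = 0
    while i < len(s):
        c = s[i]
        if c == "(":
            stack.append(set())
            i += 1
        elif c == ")":
            clade = stack.pop()
            clades.append(clade)
            if stack:
                stack[-1] |= clade
            i += 1
            while i < len(s) and s[i] not in ",()":
                i += 1  # skip label and :length after ')'
        elif c == "," or c.isspace():
            i += 1
        else:
            j = i
            while j < len(s) and s[j] not in ",():":
                j += 1
            name = s[i:j].strip()
            leaves.add(name)
            stack[-1].add(name)
            i = j
            while i < len(s) and s[i] not in ",()":
                i += 1  # skip :length after a leaf
    everyone = frozenset(leaves)
    ref = min(everyone)  # fixed taxon used to orient every split
    splits = set()
    for clade in clades:
        side = everyone - clade if ref in clade else frozenset(clade)
        if 1 < len(side) < len(everyone) - 1:  # both sides have >= 2 taxa
            splits.add(side)
    return everyone, frozenset(splits)


def unique_topologies(trees: Iterable[str]) -> List[str]:
    """Keep the first tree of each distinct unrooted topology, in input order."""
    seen: Set[Key] = set()
    kept: List[str] = []
    for tree in trees:
        key = topology_key(tree)
        if key not in seen:
            seen.add(key)
            kept.append(tree)
    return kept
```

For example, '((A,B),(C,D));', '((C:0.1,D:0.2)90:0.05,(B,A));' and
'(A,B,(C,D));' all describe the same unrooted topology, so only the first is
kept.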

Some warnings: 
--------------
1) These programs -- especially elw.pl -- were written quickly to implement 
the 'ELW' method, with no regard for memory limitations. You may find 
that elw.pl is a memory hog and will not complete jobs due to memory 
limitations.  If you have lots of memory on your system but are being 
limited by system-level memory limits for jobs, you may need to type 
'unlimit' (a csh/tcsh built-in; in sh/bash, use 'ulimit' to raise the 
relevant limit instead) followed by a carriage return before you run 
elw.pl. If you do this and your system crashes, then you probably 
shouldn't have done it, so make sure you have at least 120-300 MB of free 
memory before trying it. Also, have a look at the man page for your 
shell's 'unlimit' or 'ulimit' command.

2) I have not implemented the expected likelihood weights analysis for 
amino acid sequences, despite the first questions elw.pl asks you. As it 
stands, elw.pl can convert any DNA, RNA or protein SEQUENTIAL format 
SEQBOOT output file for any analysis you can dream up with PAUP*. Thus it 
stands alone as the only program (albeit poorly coded) able to do this.  
But likelihood analyses of protein sequences WILL NOT run with the 
current version of PAUP*.

3) These are amateurish programs...but then again...think of who wrote
them (AJR) and how much you paid for them. You got what you paid for! If
you feel you got more than you paid for, please send money to me. But
seriously - if you have bug reports, please send them to Andrew J. Roger
(aroger@is.dal.ca).