Command Reference for MrBayes ver. 3.2.6

(c) John P. Huelsenbeck, Fredrik Ronquist

***************************************************************************
*  1. Command summary                                                     *
***************************************************************************

---------------------------------------------------------------------------
Commands that are available from the command
line or from a MrBayes block include:

About            -- Describes the program
Acknowledgments  -- Shows program acknowledgments
Calibrate        -- Assigns dates to terminals or interior nodes
Charset          -- Assigns a group of sites to a set
Charstat         -- Shows status of characters
Citations        -- Citation of program, models, and methods
Comparetree      -- Compares the trees from two tree files
Constraint       -- Defines a constraint on tree topology
Ctype            -- Assigns ordering for the characters
Databreaks       -- Defines data breaks for autodiscrete gamma model
Delete           -- Deletes taxa from the analysis
Disclaimer       -- Describes program disclaimer
Exclude          -- Excludes sites from the analysis
Execute          -- Executes a file
Help             -- Provides detailed description of commands
Include          -- Includes sites
Link             -- Links parameters across character partitions
Log              -- Logs screen output to a file
Lset             -- Sets the parameters of the likelihood model
Manual           -- Prints a command reference to a text file
Mcmc             -- Starts Markov chain Monte Carlo analysis
Mcmcp            -- Sets parameters of a chain (without starting analysis)
Outgroup         -- Changes outgroup taxon
Pairs            -- Defines nucleotide pairs (doublets) for stem models
Partition        -- Assigns a character partition
Plot             -- Plots parameters from MCMC analysis
Prset            -- Sets the priors for the parameters
Propset          -- Sets proposal probabilities and tuning parameters
Quit             -- Quits the program
Report           -- Controls how model parameters are reported
Restore          -- Restores taxa
Set              -- Sets run conditions and defines active data partition
Showbeagle       -- Show available BEAGLE resources
Showmatrix       -- Shows current character matrix
Showmcmctrees    -- Shows trees used in mcmc analysis
Showmodel        -- Shows model settings
Showmoves        -- Shows moves for current model
Showparams       -- Shows parameters in current model
Showusertrees    -- Shows user-defined trees
Speciespartition -- Defines a partition of tips into species
Ss               -- Starts stepping-stone sampling
Ssp              -- Sets parameters of stepping-stone analysis (without starting)
Startvals        -- Sets starting values of parameters
Sump             -- Summarizes parameters from MCMC analysis
Sumss            -- Summarizes parameters from stepping-stone analysis
Sumt             -- Summarizes trees from MCMC analysis
Taxastat         -- Shows status of taxa
Taxset           -- Assigns a group of taxa to a set
Unlink           -- Unlinks parameters across character partitions
Version          -- Shows program version

Commands that should be in a NEXUS file (data
block, trees block or taxa block) include:

Begin            -- Denotes beginning of block in file
Dimensions       -- Defines size of character matrix
End              -- Denotes end of a block in file
Endblock         -- Alternative way of denoting end of a block
Format           -- Defines character format in data block
Matrix           -- Defines matrix of characters in data block
Taxlabels        -- Defines taxon labels
Translate        -- Defines alternative names for taxa
Tree             -- Defines a tree

Note that this program supports the use of the shortest unambiguous
spelling of the above commands (e.g., "exe" instead of "execute").
---------------------------------------------------------------------------

***************************************************************************
*  2. MrBayes commands                                                    *
***************************************************************************
---------------------------------------------------------------------------
About

This command provides some general information about the program.
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Acknowledgments

This command shows the authors' acknowledgments.
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Calibrate

This command dates a terminal or interior node in the tree. The format is

   calibrate <node_name> = <age_prior>

where <node_name> is the name of a defined interior constraint node or the
name of a terminal node (tip) and <age_prior> is a prior probability
distribution on the age of the node. The latter can either be a fixed date
or a date drawn from one of the available prior probability distributions.
In general, the available prior probability distributions are parameterized
in terms of the expected mean age of the distribution, to make them easier
for users to specify. Some distributions put a positive probability on all
ages above 0.0, while others include a minimum-age constraint and sometimes
a maximum-age constraint. The available distributions and their parameters
are:

   calibrate <node_name> = fixed(<age>)
   calibrate <node_name> = uniform(<min_age>,<max_age>)
   calibrate <node_name> = offsetexponential(<min_age>,<mean_age>)
   calibrate <node_name> = truncatednormal(<min_age>,<mean_age>,<stdev>)
   calibrate <node_name> = lognormal(<mean_age>,<stdev>)
   calibrate <node_name> = offsetlognormal(<min_age>,<mean_age>,<stdev>)
   calibrate <node_name> = gamma(<mean_age>,<stdev>)
   calibrate <node_name> = offsetgamma(<min_age>,<mean_age>,<stdev>)

Note that mean_age is always the mean age and stdev the standard deviation of
the distribution measured in user-defined time units. This way of specifying
the distribution parameters is often different from the parameterization used
elsewhere in the program. For instance, the standard parameters of the gamma
distribution used by MrBayes are shape (alpha) and rate (beta). If you want
to use the standard parameterization, the conversions are as follows:

   exponential distribution: mean    = 1 / rate
   gamma distribution:       mean    = alpha / beta
                             st.dev. = square_root (alpha / beta^2)
   lognormal distribution:   mean    = exp (mean_log + st.dev._log^2/2)
                             st.dev. = square_root ((exp (st.dev._log^2) - 1)
                                       * (exp (2*mean_log + st.dev._log^2)))
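
The conversions listed above can also be run in the other direction, from a
desired mean and standard deviation to the standard parameters. The following
Python sketch is not part of MrBayes; the function names are hypothetical and
the code only illustrates the arithmetic, assuming the formulas given above.

```python
import math


def gamma_mean_sd_to_shape_rate(mean, sd):
    """Return (alpha, beta) of a gamma distribution with the given mean and sd."""
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    # From mean = alpha / beta and sd^2 = alpha / beta^2.
    alpha = (mean / sd) ** 2
    beta = mean / sd ** 2
    return alpha, beta


def lognormal_mean_sd_to_log_scale(mean, sd):
    """Return (mean_log, sd_log) of a lognormal with the given mean and sd."""
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    # Inverting sd^2 = (exp(sd_log^2) - 1) * mean^2.
    var_log = math.log(1.0 + (sd / mean) ** 2)
    mean_log = math.log(mean) - var_log / 2.0
    return mean_log, math.sqrt(var_log)


def lognormal_log_scale_to_mean_sd(mean_log, sd_log):
    """Apply the lognormal conversion formulas exactly as listed above."""
    mean = math.exp(mean_log + sd_log ** 2 / 2.0)
    sd = math.sqrt((math.exp(sd_log ** 2) - 1.0)
                   * math.exp(2.0 * mean_log + sd_log ** 2))
    return mean, sd


if __name__ == "__main__":
    # A gamma with mean 10 and st.dev. 5 (in user-defined time units).
    print(gamma_mean_sd_to_shape_rate(10.0, 5.0))
```

Converting a lognormal mean and standard deviation to the log scale and back
should return the original pair, which is a quick way to check the formulas.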
The truncated normal distribution is an exception in that the mean_age and
stdev parameters are the mean and standard deviation of the underlying
non-truncated normal distribution. The truncation will cause the modified
distribution to have a higher mean and lower standard deviation. The
magnitude of that effect depends on how much of the tail of the distribution
is removed.

Note that prior to version 3.2.2, MrBayes used the standard rate
parameterization of the offset exponential. This should not cause a problem
in most cases because the old parameterization will result in an error in
more recent versions of MrBayes, and the likely source of the error is given
in the error message.

For a practical example, assume that we had three fossil terminals named
'FossilA', 'FossilB', and 'FossilC'. Assume further that we want to fix the
age of FossilA to 100.0 million years, we think that FossilB is somewhere
between 100.0 and 200.0 million years old, and that FossilC is at least 300.0
million years old, possibly older but relatively unlikely to be more than
400.0 million years old. Then we might use the commands:

   calibrate FossilA = fixed(100) FossilB = uniform(100,200)
   calibrate FossilC = offsetexponential(300,400)

Note that it is possible to give more than one calibration for each
'calibrate' statement. Thus, 'calibrate FossilA=<setting> FossilB=<setting>'
would be a valid statement.

To actually use the calibrations to obtain dated trees, you also need to set
a clock model using relevant 'brlenspr' and 'nodeagepr' options of the 'prset'
command. You may also want to examine the 'clockvarpr' and 'clockratepr'
options. Furthermore, you need to activate the relevant constraint(s) using
'topologypr' if you use any dated interior nodes in the tree.

You may wish to remove a calibration from an interior or terminal node that
has previously been calibrated. You can do that using

   calibrate <node_name> = unconstrained

---------------------------------------------------------------------------
---------------------------------------------------------------------------
Charset

This command defines a character set. The format for the charset command
is:

   charset <name> = <character numbers>

For example, "charset first_pos = 1-720\3" defines a character set
called "first_pos" that includes every third site from 1 to 720.
The character set name cannot have any spaces in it. The backslash (\)
is a nifty way of telling the program to assign every third (or
second, or fifth, or whatever) character to the character set.
This option is best used not from the command line, but rather as a
line in the mrbayes block of a file. Note that you can use "." to
stand in for the last character (e.g., charset <name> = 1-.\3).
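
To make the range and "\" notation concrete, the following Python sketch
expands a character list into explicit site numbers. It is not MrBayes code:
the function name expand_sites is hypothetical, and its handling of ".",
"all", and spaces around "-" is an assumption based on the descriptions in
this reference. Named character sets are not handled.

```python
import re


def expand_sites(spec, nchar):
    """Expand a list such as r'1-720\3' or '2 3 10-14 22' into site numbers.

    nchar is the number of characters in the matrix; '.' stands for it.
    """
    # Allow ranges written with spaces around the dash, e.g. "54 - 67".
    spec = re.sub(r"\s*-\s*", "-", spec.strip())
    sites = []
    for token in spec.split():
        if token == "all":
            sites.extend(range(1, nchar + 1))
            continue
        # Split off an optional "\n" stride, then an optional "-end" part.
        body, _, step = token.partition("\\")
        start, _, end = body.partition("-")
        first = nchar if start == "." else int(start)
        if not end:
            last = first
        else:
            last = nchar if end == "." else int(end)
        stride = int(step) if step else 1
        sites.extend(range(first, last + 1, stride))
    return sites


if __name__ == "__main__":
    print(expand_sites("2 3 10-14 22", 30))  # [2, 3, 10, 11, 12, 13, 14, 22]
    print(len(expand_sites(r"1-720\3", 720)))  # 240
```

The same notation is used for the character lists accepted by commands such
as "exclude", "include", and "ctype".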
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Charstat

This command shows the status of all the characters. The correct usage
is

   charstat

After typing "charstat", the character number, whether it is excluded
or included, and the partition identity are shown. The output is paused
every 100 characters. This pause can be turned off by setting autoclose
to "yes" (set autoclose=yes).
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Citations

This command shows a thorough list of citations you may consider using
when publishing the results of a MrBayes analysis.
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Comparetree

This command compares the trees in two files, called "filename1" and
"filename2". It will output a bivariate plot of the split frequencies
as well as plots of the tree distance as a function of the generation. The
plots can be used to get a quick indication of whether two runs have
converged onto the same set of trees. The "Comparetree" command will also
produce a ".pairs" file and a ".dists" file (these file endings are added
to the end of the "Outputname"). The ".pairs" file contains the paired
split frequencies from the two tree samples; the ".dists" file contains the
tree distance values.

Note that the "Sumt" command provides a different set of convergence
diagnostic tools that you may also want to explore. Unlike "Comparetree",
"Sumt" can compare more than two tree samples and will calculate consensus
trees and split frequencies from the pooled samples.

Options:

Relburnin   -- If this option is set to 'Yes', then a proportion of the
               samples will be discarded as burnin when calculating summary
               statistics. The proportion to be discarded is set with
               Burninfrac (see below). When the Relburnin option is set to
               'No', then a specific number of samples is discarded instead.
               This number is set by Burnin (see below). Note that the
               burnin setting is shared with other commands, including
               'mcmc', 'sumt', and 'sump'.
Burnin      -- Determines the number of samples (not generations) that will
               be discarded when summary statistics are calculated. The
               value of this option is only relevant when Relburnin is set
               to 'No'.
Burninfrac  -- Determines the fraction of samples that will be discarded
               when summary statistics are calculated. The value of this
               option is only relevant when Relburnin is set to 'Yes'.
               Example: A value for this option of 0.25 means that 25% of
               the samples will be discarded.
Minpartfreq -- The minimum probability of partitions to include in summary
               statistics.
Filename1   -- The name of the first tree file to compare.
Filename2   -- The name of the second tree file to compare.
Outputname  -- Name of the file to which 'comparetree' results will be
               written.

Parameter       Options                  Current Setting
--------------------------------------------------------
Relburnin       Yes/No                   Yes
Burninfrac      <number>                 0.25
Minpartfreq     <number>                 0.00
Filename1       <name>                   temp.t
Filename2       <name>                   temp.t
Outputname      <name>                   temp.comp
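
The interplay between Relburnin, Burninfrac, and Burnin can be sketched in a
few lines of Python. This is an illustration only, not MrBayes code: the
function name is hypothetical, and rounding the discarded fraction down to a
whole number of samples is an assumption.

```python
def samples_to_discard(n_samples, relburnin=True, burninfrac=0.25, burnin=0):
    """Return how many of n_samples are dropped as burnin.

    relburnin=True  -> discard a fraction (burninfrac) of the samples.
    relburnin=False -> discard a fixed number (burnin) of samples.
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    if relburnin:
        if not 0.0 <= burninfrac < 1.0:
            raise ValueError("burninfrac must be in [0, 1)")
        # Assumption: a fractional count is rounded down.
        return int(n_samples * burninfrac)
    if burnin < 0:
        raise ValueError("burnin must be non-negative")
    return min(burnin, n_samples)


if __name__ == "__main__":
    print(samples_to_discard(1000))                              # 250
    print(samples_to_discard(1000, relburnin=False, burnin=100))  # 100
```

Note that both settings count samples, not generations, as stated in the
description of Burnin above.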
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Constraint

This command defines a tree constraint. The format for the constraint
command is:

   constraint <name> [hard|negative|partial] = <taxon list> [:<taxon list>]

There are three types of constraint implemented in MrBayes. The type of the
constraint is specified by using one of the three keywords 'hard', 'negative',
or 'partial' right after the name of the constraint. If no type is specified,
then the constraint is assumed to be 'hard'.

In a rooted tree, a 'hard' constraint forces the taxa in the list to form a
monophyletic group. In an unrooted tree, the taxon split that separates the
taxa in the list from other taxa is forced to be present. The interpretation
of this depends on whether the tree is rooted on a taxon outside the list or
a taxon in the list. If the outgroup is excluded, the taxa in the list are
assumed to form a monophyletic group, but if the outgroup is included, the
taxa that are not in the list are forced together.

A 'negative' constraint bans all the trees that have the listed taxa in the
same subtree. In other words, it is the opposite of a hard constraint.

A 'partial' or backbone constraint is defined in terms of two sets of taxa
separated by a colon character. The constraint forces all taxa in the first
list to form a monophyletic group that does not include any taxon in the
second list. Taxa that are not included in either list can be placed in any
position on the tree, either inside or outside the constrained group. In an
unrooted tree, the two taxon lists can be switched with each other with no
effect. For a rooted tree, it is the taxa in the first list that have to be
monophyletic, that is, these taxa must share a common ancestor not shared with
any taxon in the second list. The taxa in the second list may or may not fall
in a monophyletic group depending on the rooting of the tree.

A list of taxa can be specified using a taxset, taxon names, taxon numbers, or
any combination of the above, separated by spaces. The constraint is treated
as an absolute requirement of trees, that is, trees that are not compatible
with the constraint have zero prior (and hence zero posterior) probability.

If you are interested in inferring ancestral states for a particular node,
you need to 'hard' constrain that node first using the 'constraint' command.
The same applies if you wish to calibrate an interior node in a dated
analysis. For more information on how to infer ancestral states, see the help
for the 'report' command. For more on dating, see the 'calibrate' command.

It is important to note that simply defining a constraint using this
command is not sufficient for the program to actually implement the
constraint in an analysis. You must also enforce the constraints using
'prset topologypr = constraints (<list of constraints>)'. For more
information on this, see the help on the 'prset' command.

Examples:

   constraint myclade = Homo Pan Gorilla

Defines a hard constraint forcing Homo, Pan, and Gorilla to form a
monophyletic group or a split that does not include any other taxa.

   constraint forbiddenclade negative = Homo Pan Gorilla

Defines a negative constraint that assigns zero posterior probability to all
trees in which Homo, Pan, and Gorilla form a monophyletic group. In other
words, such trees will not be sampled during MCMC.

   constraint backbone partial = Homo Gorilla : Mus

Defines a partial constraint that keeps Mus outside of the clade defined by
the most recent common ancestor of Homo and Gorilla. Other taxa are allowed to
sit anywhere in the tree. Note that this particular constraint is meaningless
in unrooted trees. MrBayes does not assume anything about the position of the
outgroup unless it is explicitly included in the partial constraint. Therefore
a partial constraint must have at least two taxa on each side of the ':' to be
useful in analyses of unrooted trees. The case is different for rooted trees,
where it is sufficient for a partial constraint to have more than one taxon
before the ':', as in the example given above, to constrain tree space.

To define a more complex constraint tree, simply combine constraints into a
list when issuing the 'prset topologypr' command.
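
The three constraint types can be illustrated for rooted trees with a small
Python sketch. It is not how MrBayes implements constraints: the tree
representation (nested tuples of taxon names) and the function names are
hypothetical, and unrooted trees are not covered.

```python
def clades(tree):
    """Return (tips, clade_list) for a rooted tree of nested tuples."""
    if isinstance(tree, str):          # a tip
        return frozenset([tree]), []
    tips, found = frozenset(), []
    for child in tree:
        child_tips, child_clades = clades(child)
        tips = tips | child_tips
        found.extend(child_clades)
    found.append(tips)                 # the clade rooted at this node
    return tips, found


def satisfies(tree, kind, first, second=()):
    """Check one constraint of the given kind against a rooted tree."""
    first, second = frozenset(first), frozenset(second)
    _, all_clades = clades(tree)
    if kind == "hard":
        # The listed taxa form a monophyletic group.
        return first in all_clades
    if kind == "negative":
        # The listed taxa must not form a monophyletic group.
        return first not in all_clades
    if kind == "partial":
        # Some clade holds all of the first list and none of the second.
        return any(first <= c and not (second & c) for c in all_clades)
    raise ValueError("unknown constraint type: " + kind)


if __name__ == "__main__":
    tree = ((("Homo", "Pan"), "Gorilla"), ("Mus", "Rattus"))
    print(satisfies(tree, "hard", ["Homo", "Pan", "Gorilla"]))      # True
    print(satisfies(tree, "negative", ["Homo", "Pan", "Gorilla"]))  # False
    print(satisfies(tree, "partial", ["Homo", "Gorilla"], ["Mus"]))  # True
```

In the partial case, taxa that appear in neither list (here Pan and Rattus)
do not affect the outcome, in line with the description above.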
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Ctype

This command sets the character ordering for standard-type data. The
correct usage is:

   ctype <ordering>:<characters>

The available options for the <ordering> specifier are:

   unordered    -- Movement directly from one state to another is
                   allowed in an instant of time.
   ordered      -- Movement is only allowed between adjacent states.
                   For example, perhaps only between 0 <-> 1 and 1 <-> 2
                   for a three state character ordered as 0 - 1 - 2.
   irreversible -- Rates of change for losses are 0.

The characters to which the ordering is applied are specified in a manner
that is identical to commands such as "include" or "exclude". For
example,

   ctype ordered: 10 23 45

defines characters 10, 23, and 45 to be of type ordered. Similarly,

   ctype irreversible: 54 - 67 71-92

defines characters 54 to 67 and characters 71 to 92 to be of type
irreversible. You can use the "." to denote the last character, and
"all" to denote all of the characters. Finally, you can use the
specifier "\" to apply the ordering to every n-th character or
you can use predefined charsets to specify the characters.

Only one ordering can be used on any specific application of ctype.
If you want to apply different orderings to different characters, then
you need to use ctype multiple times. For example,

   ctype ordered: 1-50
   ctype irreversible: 51-100

sets characters 1 to 50 to be ordered and characters 51 to 100 to be
irreversible.

The ctype command is only sensible with morphological (here called
"standard") characters. The program ignores attempts to apply
character orderings to other types of characters, such as DNA characters.
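
The difference between the three orderings can be shown as a table of which
instantaneous state changes are permitted. The Python sketch below is only an
illustration with a hypothetical function name. In particular, treating
"irreversible" as allowing any change to a higher-numbered state, and none to
a lower-numbered state, is an assumption; the text above only says that the
rates for losses are 0.

```python
def allowed_changes(n_states, ordering):
    """Return a matrix a[i][j] that is True if state i may change to state j."""
    allowed = [[False] * n_states for _ in range(n_states)]
    for i in range(n_states):
        for j in range(n_states):
            if i == j:
                continue
            if ordering == "unordered":
                ok = True                  # any state to any other state
            elif ordering == "ordered":
                ok = abs(i - j) == 1       # adjacent states only
            elif ordering == "irreversible":
                ok = j > i                 # assumption: gains only, no losses
            else:
                raise ValueError("unknown ordering: " + ordering)
            allowed[i][j] = ok
    return allowed


if __name__ == "__main__":
    for row in allowed_changes(3, "ordered"):
        print(row)
```

For a three-state ordered character, a change from 0 to 2 therefore has to
pass through state 1.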
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Databreaks

This command is used to specify breaks in your input data matrix. Your
data may be a mixture of genes or a mixture of different types of data.
Some of the models implemented by MrBayes account for nonindependence at
adjacent characters. The autocorrelated gamma model, for example, allows
rates at adjacent sites to be correlated. However, there is no way for
such a model to tell whether two sites, adjacent in the matrix, are
actually separated by many kilobases or megabases in the genome. The
databreaks command allows you to specify such breaks. The correct
usage is:

   databreaks <break 1> <break 2> <break 3> ...

For example, say you have a data matrix of 3204 characters that include
nucleotide data from three genes. The first gene covers characters 1 to
970, the second gene covers characters 971 to 2567, and the third gene
covers characters 2568 to 3204. Also, let's assume that the genes are
not directly adjacent to one another in the genome, as might be likely
if you have mitochondrial sequences. In this case, you can specify
breaks between the genes using:

   databreaks 970 2567;

The first break, between genes one and two, is after character 970 and
the second break, between genes two and three, is after character 2567.
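
As a quick check of the break positions, the following Python sketch turns a
list of breaks into the contiguous segments they imply. It is an illustration
only; the function name is hypothetical and the code is not taken from
MrBayes.

```python
def segments_from_breaks(nchar, breaks):
    """Return (first, last) site pairs for the segments separated by breaks.

    Each break is the number of the last character before the break.
    """
    segments, start = [], 1
    for b in sorted(breaks):
        if not start <= b < nchar:
            raise ValueError("break out of range: %d" % b)
        segments.append((start, b))
        start = b + 1
    segments.append((start, nchar))
    return segments


if __name__ == "__main__":
    # The three-gene example from the text.
    print(segments_from_breaks(3204, [970, 2567]))
    # [(1, 970), (971, 2567), (2568, 3204)]
```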
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Delete

This command deletes taxa from the analysis. The correct usage is:

   delete <name and/or number and/or taxset> ...

A list of the taxon names or taxon numbers (labelled 1 to ntax in the order
in the matrix) or taxset(s) can be used. For example, the following:

   delete 1 2 Homo_sapiens

deletes taxa 1, 2, and the taxon labelled Homo_sapiens from the analysis.
You can also use "all" to delete all of the taxa. For example,

   delete all

deletes all of the taxa from the analysis. Of course, a phylogenetic
analysis that does not include any taxa is fairly uninteresting.
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Disclaimer

This command shows the disclaimer for the program. In short, the disclaimer
states that the authors are not responsible for any silly things you may do
to your computer or any unforeseen but possibly nasty things the computer
program may inadvertently do to you.
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Exclude

This command excludes characters from the analysis. The correct usage is

   exclude <number> <number> <number>

or

   exclude <number> - <number>

or some combination thereof. Moreover, you can use the specifier "\" to
exclude every nth character. For example, a range followed by "\3"
would exclude every third character in that range. As a specific example,

   exclude 2 3 10-14 22

excludes sites 2, 3, 10, 11, 12, 13, 14, and 22 from the analysis. Also,

   exclude all

excludes all of the characters from the analysis. Excluding all characters
does not leave you much information for inferring phylogeny.
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Execute

This command executes a file called <file name>. The correct usage is:

   execute <file name>

For example,

   execute replicase.nex

would execute the file named "replicase.nex". This file must be in the
same directory as the executable.
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Help

This command provides useful information on the use of this program. The
correct usage is

   help

which gives a list of all available commands with a brief description of
each, or

   help <command>

which gives detailed information on the use of <command>.
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Include

This command includes characters that were previously excluded from the
analysis. The correct usage is

   include <number> <number> <number>

or

   include <number> - <number>

or some combination thereof. Moreover, you can use the specifier "\" to
include every nth character. For example, a range followed by "\3"
would include every third character in that range. As a specific example,

   include 2 3 10-14 22

includes sites 2, 3, 10, 11, 12, 13, 14, and 22 in the analysis. Also,

   include all

includes all of the characters in the analysis. Including all of the
characters (even if many of them are bad) is a very total-evidence-like
thing to do. Doing this will make a certain group of people very happy.
On the other hand, simply using this program would make those same people
unhappy.
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Link

This command links model parameters across partitions of the data. The
correct usage is:

   link <parameter name> = (<all> or <partition list>)

The list of parameters that can be linked includes:

   Tratio         -- Transition/transversion rate ratio
   Revmat         -- Substitution rates of GTR model
   Omega          -- Nonsynonymous/synonymous rate ratio
   Statefreq      -- Character state frequencies
   Shape          -- Gamma/LNorm shape parameter
   Pinvar         -- Proportion of invariable sites
   Correlation    -- Correlation parameter of autodiscrete gamma
   Ratemultiplier -- Rate multiplier for partitions
   Switchrates    -- Switching rates for covarion model
   Topology       -- Topology of tree
   Brlens         -- Branch lengths of tree
   Speciationrate -- Speciation rates for birth-death process
   Extinctionrate -- Extinction rates for birth-death process
   Popsize        -- Population size for coalescence process
   Growthrate     -- Growth rate of coalescence process
   Aamodel        -- Amino acid rate matrix
   Cpprate        -- Rate of Compound Poisson Process (CPP)
   Cppmultdev     -- Standard dev. of CPP rate multipliers (log scale)
   Cppevents      -- CPP events
   TK02var        -- Variance increase in TK02 relaxed clock model
   Igrvar         -- Variance increase in IGR relaxed clock model
   Mixedvar       -- Variance increase in Mixed relaxed clock model

For example,

   link shape=(all)

links the gamma/lnorm shape parameter across all partitions of the data.
You can use "showmodel" to see the current linking status of the
characters. For more information on this command, see the help menu
for link's converse, unlink ("help unlink").
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Log

This command allows output to the screen to also be output to a file.
The correct usage is:

   log start/stop filename=<name> append/replace

Options:

Start/Stop     -- Starts or stops logging of output to file.
Append/Replace -- Either append to or replace existing file.
Filename       -- Name of log file (currently, the name of the log
                  file is "log.out").
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Lset

This command sets the parameters of the likelihood model. The likelihood
function is the probability of observing the data conditional on the
phylogenetic model. In order to calculate the likelihood, you must assume a
model of character change. This command lets you tailor the biological
assumptions made in the phylogenetic model. The correct usage is

   lset <parameter>=<option> ... <parameter>=<option>

For example, "lset nst=6 rates=gamma" would set the model to a general
model of DNA substitution (the GTR) with gamma-distributed rate variation
across sites.

Options:

Applyto   -- This option allows you to apply the lset commands to specific
             partitions. This option should be the first in the list of
             options specified in lset. Moreover, it only makes sense to
             use this option if the data have been partitioned. A
             default partition is set on execution of a matrix. If the data
             are homogeneous (i.e., all of the same data type), then this
             partition will not subdivide the characters. Up to 30 other
             partitions can be defined, and you can switch among them using
             "set partition=<partition name>". Now, you may want to
             specify different models for different partitions of the data.
             Applyto allows you to do this. For example, say you have
             partitioned the data by codon position, and you want to apply
             a nst=2 model to the first two partitions and nst=6 to the
             last. This could be implemented in two uses of lset:

                lset applyto=(1,2) nst=2

                lset applyto=(3) nst=6

             The first applies the parameters after "applyto" to the
             first and second partitions. The second lset applies nst=6
             to the third partition. You can also use applyto=(all), which
             attempts to apply the parameter settings to all of the data
             partitions. Importantly, if the option is not consistent with
             the data in the partition, the program will not apply the
             lset option to that partition.
Nucmodel  -- This specifies the general form of the nucleotide substitution
             model. The options are "4by4" [the standard model of DNA
             substitution in which there are only four states (A,C,G,T/U)],
             "doublet" (a model appropriate for modelling the stem regions
             of ribosomal genes where the state space is the 16 doublets of
             nucleotides), "codon" (the substitution model is expanded
             around triplets of nucleotides--a codon), and "Protein"
             (triplets of nucleotides are translated to amino acids, which
             form the basis of the substitution model).
Nst       -- Sets the number of substitution types: "1" constrains all of
             the rates to be the same (e.g., a JC69 or F81 model); "2"
             allows transitions and transversions to have potentially
             different rates (e.g., a K80 or HKY85 model); "6" allows all
             rates to be different, subject to the constraint of
             time-reversibility (e.g., a GTR model). Finally, 'nst' can be
             set to 'mixed', which results in the Markov chain sampling
             over the space of all possible reversible substitution models,
             including the GTR model and all models that can be derived
             from it by grouping the six rates in various combinations.
             This includes all the named models above and a large number
             of others, with or without name.
Code      -- Enforces the use of a particular genetic code. The default
             is the universal code. Other options include "vertmt" for
             vertebrate mitochondrial, "invermt", "mycoplasma", "yeast",
             "ciliate", "echinoderm", "euplotid", and "metmt" (for
             metazoan mitochondrial except vertebrates).
Ploidy    -- Specifies the ploidy of the organism. Options are "Haploid",
             "Diploid" or "Zlinked". This option is used when a coalescent
             prior is used on trees.
685 Rates -- Sets the model for among-site rate variation. In general, the
\r
686 rate at a site is considered to be an unknown random variable.
\r
687 The valid options are:
\r
688 * equal -- No rate variation across sites.
\r
689 * gamma -- Gamma-distributed rates across sites. The rate
\r
690 at a site is drawn from a gamma distribution.
\r
691 The gamma distribution has a single parameter
\r
692 that describes how much rates vary.
\r
693 * lnorm -- Log Normal-distributed rates across sites. The
\r
694 rate at a site is drawn from a lognormal
\r
695 distribution. The lognormal distribution has a
\r
696 single parameter, sigma (SD), that describes how
\r
697 much rates vary (mean fixed to log(1.0) == 0.0).
\r
698 * adgamma -- Autocorrelated rates across sites. The marg-
\r
699 inal rate distribution is gamma, but adjacent
\r
700 sites have correlated rates.
\r
701 * propinv -- A proportion of the sites are invariable.
\r
702 * invgamma -- A proportion of the sites are invariable while
\r
703 the rates for the remaining sites are drawn from
\r
704 a gamma distribution.
\r
705 Note that MrBayes versions 2.0 and earlier supported options
\r
706 that allowed site specific rates (e.g., ssgamma). In versions
\r
707 3.0 and later, site specific rates are allowed, but set using
\r
708 the 'prset ratepr' command for each partition.
\r
709 Ngammacat -- Sets the number of rate categories for the gamma distribution.
\r
710 The gamma distribution is continuous. However, it is virtually
\r
711 impossible to calculate likelihoods under the continuous gamma
\r
712 distribution. Hence, an approximation to the continuous gamma
\r
713 is used; the gamma distribution is broken into ncat categories
\r
714 of equal weight (1/ncat). The mean rate for each category rep-
\r
715 resents the rate for the entire category. This option allows
\r
716 you to specify how many rate categories to use when approx-
\r
717 imating the gamma. The approximation is better as ncat is inc-
\r
718 reased. In practice, "ncat=4" does a reasonable job of
\r
719 approximating the continuous gamma.
\r
720 It is also used to set the number of rate categories for the
\r
721 lognormal distribution to avoid changing too much of the code,
\r
722 although the name is bad (should add Nlnormcat in future).
\r
723 Nbetacat -- Sets the number of rate categories for the beta distribution.
\r
724 A symmetric beta distribution is used to model the stationary
\r
725 frequencies when morphological data are used. This option
\r
726 specifies how well the beta distribution will be approximated.
\r
727 Omegavar -- Allows the nonsynonymous/synonymous rate ratio (omega) to vary
\r
728 across codons. Ny98 assumes that there are three classes, with
\r
729 potentially different omega values (omega1, omega2, omega3):
\r
730 omega2 = 1; 0 < omega1 < 1; and omega3 > 1. Like the Ny98 model,
\r
731 the M3 model has three omega classes. However, their values are
\r
732 less constrained, with omega1 < omega2 < omega3. The default
\r
733 (omegavar = equal) has no variation in omega across sites.
\r
734 Covarion -- This forces the use of a covarion-like model of substitution
\r
735 for nucleotide or amino acid data. The valid options are "yes"
\r
736 and "no". The covarion model allows the rate at a site to
\r
737 change over its evolutionary history. Specifically, the site
\r
738 is either on or off. When it is off, no substitutions are poss-
\r
739 ible. When the process is on, substitutions occur according to
\r
740 a specified substitution model (specified using the other lset options).
\r
742 Coding -- This specifies how characters were sampled. If all site patterns
\r
743 had the possibility of being sampled, then "All" should be
\r
744 specified (the default). Otherwise "Variable" (only variable
\r
745 characters had the possibility of being sampled), "Informative"
\r
746 (only parsimony informative characters had the possibility of
\r
747 being sampled), "Nosingletons" (characters which are constant
\r
748 in all but one taxon were not sampled), "Noabsencesites" (char-
\r
749 acters for which all taxa were coded as absent were not sampled),
\r
750 "Nopresencesites" (characters for which all taxa were coded as
\r
751 present were not sampled). "All" works for all data types.
\r
752 However, the others only work for morphological (All/Variable/
\r
753 Informative/Nosingletons) or restriction site (All/Variable/
\r
754 Informative/Nosingletons/Noabsencesites/Nopresencesites/
\r
755 Nosingletonpresence/Nosingletonabsence) data.
\r
756 Parsmodel -- This forces calculation under the so-called parsimony model
\r
757 described by Tuffley and Steel (1998). The options are "yes"
\r
758 or "no". Note that the biological assumptions of this model
\r
759 are anything but parsimonious. In fact, this model assumes many
\r
760 more parameters than the next most complicated model implemented
\r
761 in this program. If you really believe that the parsimony model
\r
762 makes the biological assumptions described by Tuffley and Steel,
\r
763 then the parsimony method is misnamed.
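As an illustration of how the options described above combine, the sketch
below shows three alternative model settings inside a MrBayes block. The
particular combinations are assumptions chosen for illustration, not
recommendations; normally only one of the lset lines would be used for a
given data set, since a later lset overrides the same options set earlier.

```nexus
begin mrbayes;
   [ GTR-type model (six substitution types) with a proportion of
     invariable sites and gamma-distributed rates, the gamma being
     approximated by four discrete categories ]
   lset nst=6 rates=invgamma ngammacat=4;

   [ Alternative: two substitution types (K80/HKY85-type), equal rates ]
   lset nst=2 rates=equal;

   [ Alternative: codon model under the vertebrate mitochondrial code,
     with three omega classes (Ny98) ]
   lset nucmodel=codon code=vertmt omegavar=ny98;
end;
```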
\r
765 Default model settings:
\r
767 Parameter   Options                                     Current Setting
\r
768 ------------------------------------------------------------------
\r
769 Nucmodel    4by4/Doublet/Codon/Protein                  4by4
\r
771 Code        Universal/Vertmt/Invermt/Yeast/Mycoplasma/
\r
772             Ciliate/Echinoderm/Euplotid/Metmt           Universal
\r
773 Ploidy      Haploid/Diploid/Zlinked                     Diploid
\r
774 Rates       Equal/Gamma/LNorm/Propinv/
\r
775             Invgamma/Adgamma                            Equal
\r
776 Ngammacat   <number>                                    4
\r
777 Nbetacat    <number>                                    5
\r
778 Omegavar    Equal/Ny98/M3                               Equal
\r
779 Covarion    No/Yes                                      No
\r
780 Coding      All/Variable/Informative/Nosingletons
\r
781             Noabsencesites/Nopresencesites/
\r
782             Nosingletonabsence/Nosingletonpresence      All
\r
783 Parsmodel   No/Yes                                      No
\r
784 ------------------------------------------------------------------
\r
786 ---------------------------------------------------------------------------
\r
789 This command allows you to generate a text file containing help information
\r
790 on all the available commands. This text file can be used as an up-to-date
\r
791 command reference. You can set the name of the text file using the
\r
792 "filename" option; the default is "commref_mb<version>.txt".
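For instance, assuming you want the reference written to a file called
my_commref.txt (a hypothetical name) rather than to the default file, the
command could be issued as:

```nexus
manual filename=my_commref.txt
```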
\r
794 Parameter   Options     Current Setting
\r
795 --------------------------------------------------------
\r
796 Filename    <name>      commref_mb3.2.7-svn.txt
\r
798 ---------------------------------------------------------------------------
\r
799 ---------------------------------------------------------------------------
\r
802 This command starts the Markov chain Monte Carlo (MCMC) analysis. The
\r
803 posterior probability of phylogenetic trees (and other parameters of the
\r
804 substitution model) cannot be determined analytically. Instead, MCMC is
\r
805 used to approximate the posterior probabilities of trees by drawing
\r
806 (dependent) samples from the posterior distribution. This program can
\r
807 implement a variant of MCMC called "Metropolis-coupled Markov chain Monte
\r
808 Carlo", or MCMCMC for short. Basically, "Nchains" are run, with
\r
809 Nchains - 1 of them heated. The chains are labelled 1, 2, ..., Nchains.
\r
810 The heat that is applied to the i-th chain is B = 1 / (1 + temp X (i - 1)). B
\r
811 is the power to which the posterior probability is raised. When B = 0, all
\r
812 trees have equal probability and the chain freely visits trees. B = 1 is
\r
813 the "cold" chain (or the distribution of interest). MCMCMC can mix
\r
814 better than ordinary MCMC; after all of the chains have gone through
\r
815 one cycle, two chains are chosen at random and an attempt is made to
\r
816 swap the states (with the probability of a swap being determined by the
\r
817 Metropolis et al. equation). This allows the chain to potentially jump
\r
818 a valley in a single bound. The correct usage is
\r
820 mcmc <parameter> = <value> ... <parameter> = <value>
\r
822 For example,
\r
824 mcmc ngen=100000 nchains=4 temp=0.5
\r
826 performs an MCMCMC analysis with four chains with the temperature set to
\r
827 0.5. The chains would be run for 100,000 cycles.
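As a worked illustration of the heating formula, the sketch below annotates
the example command with the powers that result. It assumes that the cold
chain is unheated (B = 1) and that the k-th heated chain receives
B = 1 / (1 + temp X k); the chain labels in the comment are for illustration
only.

```nexus
begin mrbayes;
   [ Four chains with temp=0.5:
        cold chain       B = 1 / (1 + 0.5 X 0) = 1.000
        heated chain 1   B = 1 / (1 + 0.5 X 1) = 0.667
        heated chain 2   B = 1 / (1 + 0.5 X 2) = 0.500
        heated chain 3   B = 1 / (1 + 0.5 X 3) = 0.400 ]
   mcmc ngen=100000 nchains=4 temp=0.5;
end;
```

A smaller temp keeps the heated chains closer to the cold chain (B nearer
to 1), which tends to raise the acceptance rate of swaps at the cost of
less adventurous heated chains.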
\r
831 Ngen -- This option sets the number of cycles for the MCMC alg-
\r
832 orithm. This should be a big number as you want the chain
\r
833 to first reach stationarity, and then remain there for
\r
834 enough time to take lots of samples.
\r
835 Nruns -- How many independent analyses are started simultaneously.
\r
836 Nchains -- How many chains are run for each analysis for the MCMCMC
\r
837 variant. The default is 4: 1 cold chain and 3 heated chains.
\r
838 If Nchains is set to 1, MrBayes will use regular MCMC sam-
\r
839 pling, without heating.
\r
840 Temp -- The temperature parameter for heating the chains. The higher
\r
841 the temperature, the more likely the heated chains are to
\r
842 move between isolated peaks in the posterior distribution.
\r
843 However, excessive heating may lead to very low acceptance
\r
844 rates for swaps between different chains. Before changing the
\r
845 default setting, however, note that the acceptance rates of
\r
846 swaps tend to fluctuate during the burn-in phase of the run.
\r
847 Reweight -- Here, you specify three numbers, that respectively represent
\r
848 the percentage of characters to decrease in weight, the
\r
849 percentage of characters to increase in weight, and the
\r
850 increment. An increase/decrease in weight is achieved by
\r
851 replicating/removing a character in the matrix. This is
\r
852 only done to non-cold chains. The format for this parameter
\r
853 is "reweight=(<number>,<number>)" or "reweight=(<number>,
\r
854 <number>,<number>)".
\r
855 Swapfreq -- This specifies how often swaps of states between chains are
\r
856 attempted. You must be running at least two chains for this
\r
857 option to be relevant. The default is Swapfreq=1, resulting
\r
858 in Nswaps (see below) swaps being tried each generation of
\r
859 the run. If Swapfreq is set to 10, then Nswaps swaps will be
\r
860 tried every tenth generation of the run.
\r
861 Nswaps -- The number of swaps tried for each swapping generation of the
\r
862 chain (see also Swapfreq).
\r
863 Samplefreq -- This specifies how often the Markov chain is sampled. You
\r
864 can sample the chain every cycle, but this results in very
\r
865 large output files. Thinning the chain is a way of making
\r
866 these files smaller and making the samples more independent.
\r
867 Printfreq -- This specifies how often information about the chain is
\r
868 printed to the screen.
\r
869 Printall -- If set to NO, only cold chains in an MCMC analysis are printed
\r
870 to screen. If set to YES, both cold and heated chains will be
\r
871 output. This setting only affects the printing to screen, it
\r
872 does not change the way values are written to file.
\r
873 Printmax -- The maximum number of chains to print to screen.
\r
874 Mcmcdiagn -- Determines whether acceptance ratios of moves and swaps will
\r
875 be printed to file. The file will be named similarly to the
\r
876 '.p' and '.t' files, but will have the ending '.mcmc'. If
\r
877 more than one independent analysis is run simultaneously (see
\r
878 Nruns above), convergence diagnostics for tree topology will
\r
879 also be printed to this file. The convergence diagnostic used
\r
880 is the average standard deviation in partition frequency
\r
881 values across independent analyses. The Burnin setting (see
\r
882 below) determines how many samples will be discarded as burnin
\r
883 before calculating the partition frequencies. The Minpartfreq
\r
884 setting (see below) determines the minimum partition frequency
\r
885 required for a partition to be included in the calculation. As
\r
886 the independent analyses approach stationarity (converge), the
\r
887 value of the diagnostic is expected to approach zero.
\r
888 Diagnfreq -- The number of generations between the calculation of MCMC
\r
889 diagnostics (see Mcmcdiagn above).
\r
890 Diagnstat -- The statistic to use for run-time convergence diagnostics.
\r
891 Choices are 'Avgstddev' for average standard deviation of
\r
892 split frequencies and 'Maxstddev' for maximum standard devia-
\r
893 tion of split frequencies.
\r
894 Savetrees -- If you are using a relative burnin for run-time convergence
\r
895 diagnostics, tree samples need to be deleted from split
\r
896 frequency counters as the cut-off point for the burnin moves
\r
897 during the run. If 'Savetrees' is set to 'No', tree samples
\r
898 to be discarded are read back in from file. If 'Savetrees' is
\r
899 set to 'Yes', the tree samples to be removed will be stored
\r
900 in the internal memory instead. This can use up a lot of
\r
901 memory in large analyses.
\r
902 Minpartfreq -- The minimum frequency required for a partition to be included
\r
903 in the calculation of the topology convergence diagnostic. The
\r
904 partition is included if the minimum frequency is reached in
\r
905 at least one of the independent tree samples that are compared.
\r
907 Allchains -- If this option is set to YES, acceptance ratios for moves are
\r
908 recorded for all chains, cold or heated. By default, only the
\r
909 acceptance ratios for the cold chain are recorded.
\r
910 Allcomps -- If this option is set to YES, topological convergence diag-
\r
911 nostics are calculated over all pairwise comparisons of runs.
\r
912 If it is set to NO, only the overall value is reported.
\r
913 Relburnin -- If this option is set to YES, then a proportion of the sampled
\r
914 values will be discarded as burnin when calculating the con-
\r
915 vergence diagnostic. The proportion to be discarded is set
\r
916 with Burninfrac (see below). When the Relburnin option is set
\r
917 to NO, then a specific number of samples will be discarded
\r
918 instead. This number is set by Burnin (see below).
\r
919 Burnin -- Determines the number of samples (not generations) that will
\r
920 be discarded when convergence diagnostics are calculated.
\r
921 The value of this option is only relevant when Relburnin is set to NO.
\r
923 BurninFrac -- Determines the fraction of samples that will be discarded
\r
924 when convergence diagnostics are calculated. The value of
\r
925 this option is only relevant when Relburnin is set to YES.
\r
926 Example: A value for this option of 0.25 means that 25% of
\r
927 the samples will be discarded.
\r
928 Stoprule -- If this option is set to NO, then the chain is run the number
\r
929 of generations determined by Ngen. If it is set to YES, and
\r
930 topological convergence diagnostics are calculated (Mcmcdiagn
\r
931 is set to YES), then the chain will be stopped before the pre-
\r
932 determined number of generations if the convergence diagnostic
\r
933 falls below the stop value.
\r
934 Stopval -- The critical value for the topological convergence diagnostic.
\r
935 Only used when Stoprule and Mcmcdiagn are set to yes, and
\r
936 more than one analysis is run simultaneously (Nruns > 1).
\r
937 Checkpoint -- If this parameter is set to 'Yes', all the current parameter
\r
938 values of all chains will be printed to a check-pointing file
\r
939 every 'Checkfreq' generation of the analysis. The file will be
\r
940 named <Filename>.ckp and allows you to restart the analysis
\r
941 from the last check point. This can be handy if you are
\r
942 running a long analysis and want to extend it, or if there is
\r
943 a risk that a long analysis will be inadvertently interrupted
\r
944 by hardware failure or other factors that are out of your control.
\r
946 Checkfreq -- The number of generations between check-pointing. See the
\r
947 'Checkpoint' parameter above for more information.
\r
948 Filename -- The name of the files that will be generated. Two files
\r
949 are generated: "<Filename>.t" and "<Filename>.p".
\r
950 The .t file contains the trees whereas the .p file con-
\r
951 tains the sampled values of the parameters.
\r
952 Startparams -- The starting values for the model parameters are set to
\r
953 arbitrary or random values when the parameters are created.
\r
954 These starting values can be altered using the 'Startvals'
\r
955 command. The 'Startparams=reset' option allows you to reset
\r
956 the starting values to the default at the start of the ana-
\r
957 lysis, overriding any previous user-defined starting values.
\r
958 Under the default option, 'current', the chains will use the
\r
959 current starting values.
\r
960 Starttree -- The starting tree(s) for the chain can either be randomly
\r
961 selected or user-defined. It might be a good idea to
\r
962 start from randomly chosen trees; convergence seems
\r
963 likely if independently run chains, each of which
\r
964 started from different random trees, converge to the same
\r
965 answer. If you want the chain to start from user-defined
\r
966 trees instead, you first need to read in your tree(s) from a
\r
967 Nexus file with a 'trees' block, and then you need to set the
\r
968 starting tree(s) using the 'Startvals' command. Finally, you
\r
969 need to make sure that 'Starttree' is set to 'current'. If
\r
970 you do not set the starting tree(s), the chains will start
\r
971 with random trees. Setting 'Starttree' to 'random' causes
\r
972 new starting trees to be drawn randomly at the start of the
\r
973 run, overwriting any previous user-defined starting trees.
\r
974 Nperts -- This is the number of random perturbations to apply to the
\r
975 user starting tree. This allows you to have something
\r
976 between completely random and user-defined trees start the chain.
\r
978 Data -- When Data is set to NO, the chain is run without data. This
\r
979 should be used only for examining induced priors. DO NOT SET
\r
980 'DATA' TO 'NO' UNLESS YOU KNOW WHAT YOU ARE DOING!
\r
981 Ordertaxa -- Determines whether taxa should be ordered before trees are
\r
982 printed to file. If set to 'Yes', terminals in the sampled
\r
983 trees will be reordered to match the order of the taxa in the
\r
984 data matrix as closely as possible. By default, trees will be
\r
985 printed without reordering of taxa.
\r
986 Append -- Set this to 'Yes' to append the results of the current run to
\r
987 a previous run. MrBayes will first read in the results of the
\r
988 previous run (number of generations and sampled splits) and
\r
989 will then continue that run where you left it off. Make sure
\r
990 that the output file names used in the previous run are the
\r
991 same as those in the current run.
\r
992 Autotune -- Set this to 'Yes' to autotune the proposals that change
\r
993 substitution model parameters. When set to 'No', the tuning
\r
994 parameters are fixed to their starting values. Note that the
\r
995 autotuning occurs independently for each chain. The target
\r
996 acceptance rate for each move can be changed using the
\r
997 'Propset' command.
\r
998 Tunefreq -- When a proposal has been tried 'Tunefreq' times, its tuning
\r
999 parameter is adjusted to reach the target acceptance rate
\r
1000 if 'Autotune' is set to 'Yes'.
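The options above can be combined in a single command. The following is a
minimal sketch of a run that stops automatically once the topological
convergence diagnostic is small enough; the specific values (run length,
stopval=0.01) are assumptions chosen for illustration, not recommended
settings.

```nexus
begin mrbayes;
   [ Two independent analyses of four chains each. Sample every 500
     generations, calculate the convergence diagnostic every 5000
     generations after discarding the first 25% of samples, and stop
     early if the diagnostic falls below 0.01 ]
   mcmc ngen=2000000 nruns=2 nchains=4 samplefreq=500
        diagnfreq=5000 relburnin=yes burninfrac=0.25
        stoprule=yes stopval=0.01;
end;
```

With samplefreq=500, a run of 1,000,000 generations yields about 2,000
samples per run, of which burninfrac=0.25 discards roughly the first 500
when the diagnostic is calculated.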
\r
1002 Parameter     Options                 Current Setting
\r
1003 -----------------------------------------------------
\r
1004 Ngen          <number>                1000000
\r
1006 Nchains       <number>                4
\r
1007 Temp          <number>                0.100000
\r
1008 Reweight      <number>,<number>       0.00 v 0.00 ^
\r
1009 Swapfreq      <number>                1
\r
1010 Nswaps        <number>                1
\r
1011 Samplefreq    <number>                500
\r
1012 Printfreq     <number>                1000
\r
1013 Printall      Yes/No                  Yes
\r
1014 Printmax      <number>                8
\r
1015 Mcmcdiagn     Yes/No                  Yes
\r
1016 Diagnfreq     <number>                5000
\r
1017 Diagnstat     Avgstddev/Maxstddev     Avgstddev
\r
1018 Minpartfreq   <number>                0.10
\r
1019 Allchains     Yes/No                  No
\r
1020 Allcomps      Yes/No                  No
\r
1021 Relburnin     Yes/No                  Yes
\r
1022 Burnin        <number>                0
\r
1023 Burninfrac    <number>                0.25
\r
1024 Stoprule      Yes/No                  No
\r
1025 Stopval       <number>                0.05
\r
1026 Savetrees     Yes/No                  No
\r
1027 Checkpoint    Yes/No                  Yes
\r
1028 Checkfreq     <number>                2000
\r
1029 Filename      <name>                  temp.<p/t>
\r
1030 Startparams   Current/Reset           Current
\r
1031 Starttree     Current/Random/         Current
\r
1033 Nperts        <number>                0
\r
1035 Ordertaxa     Yes/No                  No
\r
1037 Autotune      Yes/No                  Yes
\r
1038 Tunefreq      <number>                100
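As a sketch of how Checkpoint, Filename and Append work together, the two
commands below would be issued in separate sessions. The file name "myrun"
is hypothetical, and the sketch assumes that Ngen in the second command is
interpreted as the total length of the extended run.

```nexus
[ First session: check-pointing is on by default, so the state of the
  chains is written to myrun.ckp every 2000 generations ]
mcmc ngen=1000000 filename=myrun;

[ Later session, after executing the same data file and model settings:
  continue the earlier run, using the same output file name ]
mcmc ngen=2000000 filename=myrun append=yes;
```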
\r
1040 ---------------------------------------------------------------------------
\r
1041 ---------------------------------------------------------------------------
\r
1044 This command sets the parameters of the Markov chain Monte Carlo (MCMC)
\r
1045 analysis without actually starting the chain. This command is identical
\r
1046 in all respects to Mcmc, except that the analysis will not start after
\r
1047 this command is issued. For more details on the options, check the help menu for Mcmc.
\r
1050 Parameter     Options                 Current Setting
\r
1051 -----------------------------------------------------
\r
1052 Ngen          <number>                1000000
\r
1054 Nchains       <number>                4
\r
1055 Temp          <number>                0.100000
\r
1056 Reweight      <number>,<number>       0.00 v 0.00 ^
\r
1057 Swapfreq      <number>                1
\r
1058 Nswaps        <number>                1
\r
1059 Samplefreq    <number>                500
\r
1060 Printfreq     <number>                1000
\r
1061 Printall      Yes/No                  Yes
\r
1062 Printmax      <number>                8
\r
1063 Mcmcdiagn     Yes/No                  Yes
\r
1064 Diagnfreq     <number>                5000
\r
1065 Diagnstat     Avgstddev/Maxstddev     Avgstddev
\r
1066 Minpartfreq   <number>                0.10
\r
1067 Allchains     Yes/No                  No
\r
1068 Allcomps      Yes/No                  No
\r
1069 Relburnin     Yes/No                  Yes
\r
1070 Burnin        <number>                0
\r
1071 Burninfrac    <number>                0.25
\r
1072 Stoprule      Yes/No                  No
\r
1073 Stopval       <number>                0.05
\r
1074 Savetrees     Yes/No                  No
\r
1075 Checkpoint    Yes/No                  Yes
\r
1076 Checkfreq     <number>                2000
\r
1077 Filename      <name>                  temp.<p/t>
\r
1078 Startparams   Current/Reset           Current
\r
1079 Starttree     Current/Random/         Current
\r
1081 Nperts        <number>                0
\r
1083 Ordertaxa     Yes/No                  No
\r
1085 Autotune      Yes/No                  Yes
\r
1086 Tunefreq      <number>                100
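A typical use, sketched here with illustrative values, is to store the
chain settings early in a MrBayes block and start the analysis later with
a bare mcmc command:

```nexus
begin mrbayes;
   [ Store the chain settings without starting the analysis ]
   mcmcp ngen=500000 nruns=2 samplefreq=100 printfreq=1000;

   [ ... other commands, such as lset and prset, can follow here ... ]

   [ Start the analysis using the settings stored above ]
   mcmc;
end;
```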
\r
1088 ---------------------------------------------------------------------------
\r
1089 ---------------------------------------------------------------------------
\r
1092 This command assigns a taxon to the outgroup. The correct usage is:
\r
1094 outgroup <number>/<taxon name>
\r
1096 For example, "outgroup 3" assigns the third taxon in the matrix to be
\r
1097 the outgroup. Similarly, "outgroup Homo_sapiens" assigns the taxon
\r
1098 "Homo_sapiens" to be the outgroup (assuming that there is a taxon named
\r
1099 "Homo_sapiens" in the matrix). Only a single taxon can be assigned to be the outgroup.
\r
1102 ---------------------------------------------------------------------------
\r
1103 ---------------------------------------------------------------------------
\r
1106 This command is used to specify pairs of nucleotides. For example, your
\r
1107 data may be RNA sequences with a known secondary structure of stems and
\r
1108 loops. Substitutions in nucleotides involved in a Watson-Crick pairing
\r
1109 in stems are not strictly independent; a change in one changes the prob-
\r
1110 ability of a change in the partner. A solution to this problem is to
\r
1111 expand the model around the pair of nucleotides in the stem. This
\r
1112 command allows you to do this. The correct usage is:
\r
1114 pairs <NUC1>:<NUC2>, <NUC1>:<NUC2>,..., <NUC1>:<NUC2>;
\r
1116 For example,
\r
1118 pairs 30:56, 31:55, 32:54, 33:53, 34:52, 35:51, 36:50;
\r
1120 specifies pairings between nucleotides 30 and 56, 31 and 55, etc. Only
\r
1121 nucleotide data (DNA or RNA) may be paired using this command. Note that
\r
1122 in order for the program to actually implement a "doublet" model
\r
1123 involving a 16 X 16 rate matrix, you must specify that the structure of
\r
1124 the model is 16 X 16 using "lset nucmodel=doublet".
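Putting the two steps together, a minimal sketch of a MrBayes block that
declares the pairings and then switches to the doublet model could look as
follows. The site numbers are those of the example above; the sketch
applies the doublet model to the whole data set, which is an assumption
made for brevity (data containing both stems and loops would normally be
partitioned first).

```nexus
begin mrbayes;
   [ Declare which stem sites pair with each other ]
   pairs 30:56, 31:55, 32:54, 33:53, 34:52, 35:51, 36:50;

   [ Use the 16 X 16 doublet rate matrix ]
   lset nucmodel=doublet;
end;
```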
\r
1125 ---------------------------------------------------------------------------
\r
1126 ---------------------------------------------------------------------------
\r
1129 This command allows you to specify a character partition. The format for this command is
\r
1132 partition <name> = <num parts>:<chars in first>, ...,<chars in last>
\r
1134 For example, "partition by_codon = 3:1st_pos,2nd_pos,3rd_pos" specifies
\r
1135 a partition called "by_codon" which consists of three parts (first,
\r
1136 second, and third codon positions). Here, we are assuming that the sites
\r
1137 in each partition were defined using the charset command. You can specify
\r
1138 a partition without using charset as follows:
\r
1140 partition by_codon = 3:1 4 7 10 13,2 5 8 11 14,3 6 9 12 15
\r
1142 However, we recommend that you use the charsets to define a set of char-
\r
1143 acters and then use these predefined sets when defining the partition.
\r
1144 Also, it makes more sense to define a partition as a line in the mrbayes
\r
1145 block than to issue the command from the command line (then again, you
\r
1146 may be a masochist, and want to do extra work).
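The recommended approach can be sketched as follows for a hypothetical
15-site protein-coding alignment. The charset names match those used in
the example above; the site lists are assumptions made for illustration.

```nexus
begin mrbayes;
   [ Define one character set per codon position ]
   charset 1st_pos = 1 4 7 10 13;
   charset 2nd_pos = 2 5 8 11 14;
   charset 3rd_pos = 3 6 9 12 15;

   [ Combine the three sets into a partition with three parts ]
   partition by_codon = 3:1st_pos,2nd_pos,3rd_pos;

   [ Make by_codon the active partition ]
   set partition=by_codon;
end;
```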
\r
1147 ---------------------------------------------------------------------------
\r
1148 ---------------------------------------------------------------------------
\r
1151 This command plots specified parameters in the .p file or one of the .p files
\r
1152 created during an MCMC analysis. An x-y graph of the parameter over the course
\r
1153 of the chain is created. The command can be useful for visually diagnosing
\r
1154 convergence for many of the parameters of the phylogenetic model. The para-
\r
1155 meter to be plotted is specified by the "parameter" option. Several para-
\r
1156 meters can be plotted at once by using the "match" option, which has a
\r
1157 default value of "perfect". For example, if you were to set "parameter = pi"
\r
1158 and "match = consistentwith", then all of the state frequency parameters
\r
1159 would be plotted. You can also set "match=all", in which case all of the
\r
1160 parameters are plotted.
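For instance, assuming the parameter samples are in the default file
temp.p, the two commands below would plot the log likelihood alone and
then all of the state frequency parameters:

```nexus
[ Plot only the parameter whose name matches "lnL" exactly ]
plot filename=temp.p parameter=lnL match=perfect;

[ Plot every parameter whose name is consistent with "pi" ]
plot filename=temp.p parameter=pi match=consistentwith;
```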
\r
1162 Note that the "Sump" command provides a different set of convergence diag-
\r
1163 nostics tools that you may also want to explore. Unlike "Plot", "Sump" can
\r
1164 compare two or more parameter samples and will calculate convergence diagnos-
\r
1165 tics as well as parameter summaries for the pooled sample.
\r
1169 Relburnin -- If this option is set to 'Yes', then a proportion of the
\r
1170 samples will be discarded as burnin when creating the plot.
\r
1171 The proportion to be discarded is set with Burninfrac (see
\r
1172 Burninfrac below). When the Relburnin option is set to 'No',
\r
1173 then a specific number of samples is discarded instead. This
\r
1174 number is set by Burnin (see below). Note that the burnin
\r
1175 setting is shared across the 'comparetree', 'sump' and 'sumt' commands.
\r
1177 Burnin -- Determines the number of samples (not generations) that will
\r
1178 be discarded when summary statistics are calculated. The
\r
1179 value of this option is only relevant when Relburnin is set to 'No'.
\r
1181 Burninfrac -- Determines the fraction of samples that will be discarded
\r
1182 when creating a plot. The value of this parameter is only
\r
1183 relevant when Relburnin is set to 'Yes'. Example: A value of
\r
1184 this option of 0.25 means that 25% of the samples will be discarded.
\r
1186 Filename -- The name of the file to plot.
\r
1187 Parameter -- Specification of parameters to be plotted. See above for details.
\r
1189 Match -- Specifies how to match parameter names to the Parameter
\r
1190 specification. See above for details.
\r
1192 Current settings:
\r
1194 Parameter    Options                       Current Setting
\r
1195 ------------------------------------------------------------
\r
1196 Relburnin    Yes/No                        Yes
\r
1197 Burnin       <number>                      0
\r
1198 Burninfrac   <number>                      0.25
\r
1199 Filename     <name>                        temp.p
\r
1200 Parameter    <name>                        lnL
\r
1201 Match        Perfect/Consistentwith/All    Perfect
\r
1203 ---------------------------------------------------------------------------
\r
1204 ---------------------------------------------------------------------------
\r
1207 This command sets the priors for the phylogenetic model. Remember that
\r
1208 in a Bayesian analysis, you must specify a prior probability distribution
\r
1209 for the parameters of the likelihood model. The prior distribution rep-
\r
1210 resents your prior beliefs about the parameter before observation of the
\r
1211 data. This command allows you to tailor your prior assumptions to a large extent.
\r
1216 Applyto -- This option allows you to apply the prset commands to
\r
1217 specific partitions. This command should be the first
\r
1218 in the list of commands specified in prset. Moreover, it
\r
1219 only makes sense to be using this command if the data
\r
1220 have been partitioned. A default partition is set on
\r
1221 execution of a matrix. If the data are homogeneous
\r
1222 (i.e., all of the same data type), then this partition
\r
1223 will not subdivide the characters. Up to 30 other part-
\r
1224 itions can be defined, and you can switch among them using
\r
1225 "set partition=<partition name>". Now, you may want to
\r
1226 specify different priors to different partitions of the
\r
1227 data. Applyto allows you to do this. For example, say
\r
1228 you have partitioned the data by codon position, and
\r
1229 you want to fix the statefreqs to equal for the first two
\r
1230 partitions but apply a flat Dirichlet prior to the state-
\r
1231 freqs of the last. This could be implemented in two uses of prset:
\r
1234 prset applyto=(1,2) statefreqs=fixed(equal)
\r
1236 prset applyto=(3) statefreqs=dirichlet(1,1,1,1)
\r
1238 The first applies the parameters after "applyto"
\r
1239 to the first and second partitions. The second prset
\r
1240 applies a flat Dirichlet to the third partition. You can
\r
1241 also use applyto=(all), which attempts to apply the para-
\r
1242 meter settings to all of the data partitions. Importantly,
\r
1243 if the option is not consistent with the data in the part-
\r
1244 ition, the program will not apply the prset option to that partition.
\r
1246 Tratiopr -- This parameter sets the prior for the transition/trans-
\r
1247 version rate ratio (tratio). The options are:
\r
1249 prset tratiopr = beta(<number>, <number>)
\r
1250 prset tratiopr = fixed(<number>)
\r
1252 The program assumes that the transition and transversion
\r
1253 rates are independent gamma-distributed random variables
\r
1254 with the same scale parameter when beta is selected. If you
\r
1255 want a diffuse prior that puts equal emphasis on transition/
\r
1256 transversion rate ratios above 1.0 and below 1.0, then use a
\r
1257 flat Beta, beta(1,1), which is the default. If you wish to
\r
1258 concentrate this distribution more in the equal-rates region,
\r
1259 then use a prior of the type beta(x,x), where the magnitude
\r
1260 of x determines how much the prior is concentrated in the
\r
1261 equal rates region. For instance, a beta(20,20) puts more
\r
1262 probability on rate ratios close to 1.0 than a beta(1,1). If
\r
1263 you think it is likely that the transition/transversion rate
\r
1264 ratio is 2.0, you can use a prior of the type beta(2x,x),
\r
1265 where x determines how strongly the prior is concentrated on
\r
1266 tratio values near 2.0. For instance, a beta(2,1) is much
\r
1267 more diffuse than a beta(80,40) but both have the expected
\r
1268 tratio 2.0 in the absence of data. The parameters of the
\r
1269 Beta can be interpreted as counts: if you have observed x
\r
1270 transitions and y transversions, then a beta(x+1,y+1) is a
\r
1271 good representation of this information. The fixed option
\r
1272 allows you to fix the tratio to a particular value.
\r
1273 Revmatpr -- This parameter sets the prior for the substitution rates
\r
1274 of the GTR model for nucleotide data. The options are:
\r
1276 prset revmatpr = dirichlet(<number>,<number>,...,<number>)
\r
1277 prset revmatpr = fixed(<number>,<number>,...,<number>)
\r
1279 The program assumes that the six substitution rates
\r
1280 are independent gamma-distributed random variables with the
\r
1281 same scale parameter when dirichlet is selected. The six
\r
1282 numbers in brackets each corresponds to a particular substi-
\r
1283 tution type. Together, they determine the shape of the prior.
\r
1284 The six rates are in the order A<->C, A<->G, A<->T, C<->G,
\r
1285 C<->T, and G<->T. If you want an uninformative prior you can
\r
1286 use dirichlet(1,1,1,1,1,1), also referred to as a 'flat'
\r
1287 Dirichlet. This is the default setting. If you wish a prior
\r
1288 where the C<->T rate is 5 times and the A<->G rate 2 times
\r
1289 higher, on average, than the transversion rates, which are
\r
1290 all the same, then you should use a prior of the form
\r
1291 dirichlet(x,2x,x,x,5x,x), where x determines how much the
\r
1292 prior is focused on these particular rates. For more info,
\r
1293 see tratiopr. The fixed option allows you to fix the substi-
\r
1294 tution rates to particular values.
\r
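The effect of x in dirichlet(x,2x,x,x,5x,x) can be checked with a small Python sketch. It only uses the standard result that the prior mean of each Dirichlet component is its parameter divided by the sum of all parameters. The helper names are hypothetical.

```python
def dirichlet_means(alphas):
    """Return the prior mean of each component of a Dirichlet(alphas)."""
    total = sum(alphas)
    return [a / total for a in alphas]

# Rate order used in the text: A<->C, A<->G, A<->T, C<->G, C<->T, G<->T
LABELS = ["A<->C", "A<->G", "A<->T", "C<->G", "C<->T", "G<->T"]

def scaled_prior(x):
    """Parameters of dirichlet(x,2x,x,x,5x,x) for a chosen x."""
    return [x, 2 * x, x, x, 5 * x, x]

for x in (1, 10):
    means = dict(zip(LABELS, dirichlet_means(scaled_prior(x))))
    print(x, {label: round(m, 3) for label, m in means.items()})
```

Changing x leaves the prior means unchanged. It only changes how tightly the prior is concentrated around them.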
1295 Revratepr -- This parameter sets the prior for each substitution rate of
\r
1296 the GTR model subspace when 'nst' is set to 'mixed' (see the
\r
1297 'lset' command). The only option is
\r
1299 prset revratepr = symdir(<number>)
\r
1301 which will associate each independent rate in the rate matrix
\r
1302 with a modified symmetric Dirichlet prior, where a singleton
\r
1303 rate has the specified alpha parameter, while a rate that
\r
1304 applies to n pairwise substitution types has an alpha that is
\r
1305 n times the specified number. The higher the specified num-
\r
1306 ber, the more focused the prior will be on equal rates. The
\r
1307 default value is 1, which gives an effect similar to a flat
\r
1308 Dirichlet.
\r
1309 Aamodelpr -- This parameter sets the rate matrix for amino acid data.
\r
1310 You can either fix the model by specifying aamodelpr=fixed
\r
1311 (<model name>), where <model name> is 'poisson' (a glorified
\r
1312 Jukes-Cantor model), 'jones', 'dayhoff', 'mtrev', 'mtmam',
\r
1313 'wag', 'rtrev', 'cprev', 'vt', 'blosum', 'lg', 'equalin'
\r
1314 (a glorified Felsenstein 1981 model), or 'gtr'. You can also
\r
1315 average over the first ten models by specifying aamodelpr=
\r
1316 mixed. If you do so, the Markov chain will sample each model
\r
1317 according to its probability. The sampled model is reported
\r
1318 as an index: poisson(0), jones(1), dayhoff(2), mtrev(3),
\r
1319 mtmam(4), wag(5), rtrev(6), cprev(7), vt(8), or blosum(9).
\r
1320 The 'Sump' command summarizes the MCMC samples and calculates
\r
1321 the posterior probability estimate for each of these models.
\r
1322 Aarevmatpr -- This parameter sets the prior for the substitution rates
\r
1323 of the GTR model for amino acid data. The options are:
\r
1325 prset aarevmatpr = dirichlet(<number>,<number>,...,<number>)
\r
1326 prset aarevmatpr = fixed(<number>,<number>,...,<number>)
\r
1328 The options are the same as those for 'Revmatpr' except that
\r
1329 they are defined over the 190 rates of the time-reversible
\r
1330 GTR model for amino acids instead of over the 6 rates of the
\r
1331 GTR model for nucleotides. The rates are in the order A<->R,
\r
1332 A<->N, etc., to Y<->V. In other words, amino acids are listed
\r
1333 in alphabetic order based on their full name. The first amino
\r
1334 acid (Alanine) is then combined in turn with all amino acids
\r
1335 following it in the list, starting with amino acid 2 (Argi-
\r
1336 nine) and finishing with amino acid 20 (Valine). The second
\r
1337 amino acid (Arginine) is then combined in turn with all amino
\r
1338 acids following it, starting with amino acid 3 (Asparagine)
\r
1339 and finishing with amino acid 20 (Valine), and so on.
\r
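The ordering of the 190 rates can be enumerated with a short Python sketch. It assumes the conventional one-letter ordering that follows the amino acid names (with glutamine, Q, listed before glutamic acid, E). The function name aa_rate_order is hypothetical.

```python
from itertools import combinations

# One-letter codes ordered by amino acid name:
# Alanine, Arginine, Asparagine, ..., Tyrosine, Valine.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def aa_rate_order():
    """List the exchange pairs in the order described in the text."""
    return [f"{a}<->{b}" for a, b in combinations(AMINO_ACIDS, 2)]

pairs = aa_rate_order()
print(len(pairs))                      # number of rates to supply
print(pairs[0], pairs[1], pairs[-1])   # first, second and last pair
```

A list like this can be useful for checking that a long dirichlet(...) or fixed(...) argument lists its values in the intended positions.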
1340 Omegapr -- This parameter specifies the prior on the nonsynonymous/
\r
1341 synonymous rate ratio. The options are:
\r
1343 prset omegapr = dirichlet(<number>,<number>)
\r
1344 prset omegapr = fixed(<number>)
\r
1346 This parameter is only in effect if the nucleotide sub-
\r
1347 stitution model is set to codon using the lset command
\r
1348 (lset nucmodel=codon). Moreover, it only applies to the
\r
1349 case when there is no variation in omega across sites (i.e.,
\r
1350 "lset omegavar=equal").
\r
1351 Ny98omega1pr -- This parameter specifies the prior on the nonsynonymous/
\r
1352 synonymous rate ratio for sites under purifying selection.
\r
1353 The options are:
\r
1355 prset Ny98omega1pr = beta(<number>,<number>)
\r
1356 prset Ny98omega1pr = fixed(<number>)
\r
1358 This parameter is only in effect if the nucleotide sub-
\r
1359 stitution model is set to codon using the lset command
\r
1360 (lset nucmodel=codon). Moreover, it only applies to the
\r
1361 case where omega varies across sites using the model of
\r
1362 Nielsen and Yang (1998) (i.e., "lset omegavar=ny98"). If
\r
1363 fixing the parameter, you must specify a number between
\r
1364 0 and 1.
\r
1365 Ny98omega3pr -- This parameter specifies the prior on the nonsynonymous/
\r
1366 synonymous rate ratio for positively selected sites. The
\r
1367 options are:
\r
1369 prset Ny98omega3pr = uniform(<number>,<number>)
\r
1370 prset Ny98omega3pr = exponential(<number>)
\r
1371 prset Ny98omega3pr = fixed(<number>)
\r
1373 This parameter is only in effect if the nucleotide sub-
\r
1374 stitution model is set to codon using the lset command
\r
1375 (lset nucmodel=codon). Moreover, it only applies to the
\r
1376 case where omega varies across sites according to the
\r
1377 NY98 model. Note that if the NY98 model is specified,
\r
1378 this parameter must be greater than 1, so you should
\r
1379 not specify a uniform(0,10) prior, for example.
\r
1380 M3omegapr -- This parameter specifies the prior on the nonsynonymous/
\r
1381 synonymous rate ratios for all three classes of sites for
\r
1382 the M3 model. The options are:
\r
1384 prset M3omegapr = exponential
\r
1385 prset M3omegapr = fixed(<number>,<number>,<number>)
\r
1387 This parameter is only in effect if the nucleotide sub-
\r
1388 stitution model is set to codon using the lset command
\r
1389 (lset nucmodel=codon). Moreover, it only applies to the
\r
1390 case where omega varies across sites using the M3 model of
\r
1391 Yang et al. (2000) (i.e., "lset omegavar=M3"). Under the
\r
1392 exponential prior, the four rates (dN1, dN2, dN3, and dS)
\r
1393 are all considered to be independent draws from the same
\r
1394 exponential distribution (the parameter of the exponential
\r
1395 does not matter, and so you don't need to specify it). The
\r
1396 rates dN1, dN2, and dN3 are taken to be the order statistics
\r
1397 with dN1 < dN2 < dN3. These three rates are all scaled to
\r
1398 the same synonymous rate, dS. The other option is to simply
\r
1399 fix the three rate ratios to some values.
\r
1400 Codoncatfreqs -- This parameter specifies the prior on frequencies of sites
\r
1401 under purifying, neutral, and positive selection. The
\r
1402 options are:
\r
1404 prset codoncatfreqs = dirichlet(<num>,<num>,<num>)
\r
1405 prset codoncatfreqs = fixed(<number>,<number>,<number>)
\r
1407 This parameter is only in effect if the nucleotide sub-
\r
1408 stitution model is set to codon using the lset command
\r
1409 (lset nucmodel=codon). Moreover, it only applies to the
\r
1410 case where omega varies across sites using the models of
\r
1411 Nielsen and Yang (1998) (i.e., "lset omegavar=ny98")
\r
1412 or Yang et al. (2000) (i.e., "lset omegavar=M3").
\r
1413 Note that the sum of the three frequencies must be 1.
\r
1414 Statefreqpr -- This parameter specifies the prior on the state freq-
\r
1415 uencies. The options are:
\r
1417 prset statefreqpr = dirichlet(<number>)
\r
1418 prset statefreqpr = dirichlet(<number>,...,<number>)
\r
1419 prset statefreqpr = fixed(equal)
\r
1420 prset statefreqpr = fixed(empirical)
\r
1421 prset statefreqpr = fixed(<number>,...,<number>)
\r
1423 For the dirichlet, you can specify either a single number
\r
1424 or as many numbers as there are states. If you specify a
\r
1425 single number, then the prior has all states equally
\r
1426 probable with a variance related to the single parameter
\r
1427 passed in.
\r
1428 Shapepr -- This parameter specifies the prior for the gamma/lnorm shape
\r
1429 parameter for among-site rate variation. The options are:
\r
1431 prset shapepr = uniform(<number>,<number>)
\r
1432 prset shapepr = exponential(<number>)
\r
1433 prset shapepr = fixed(<number>)
\r
1435 Pinvarpr -- This parameter specifies the prior for the proportion of
\r
1436 invariable sites. The options are:
\r
1438 prset pinvarpr = uniform(<number>,<number>)
\r
1439 prset pinvarpr = fixed(<number>)
\r
1441 Note that the valid range for the parameter is between 0
\r
1442 and 1. Hence, "prset pinvarpr=uniform(0,0.8)" is valid
\r
1443 while "prset pinvarpr=uniform(0,10)" is not. The def-
\r
1444 ault setting is "prset pinvarpr=uniform(0,1)".
\r
1445 Ratecorrpr -- This parameter specifies the prior for the autocorrelation
\r
1446 parameter of the autocorrelated gamma distribution for
\r
1447 among-site rate variation. The options are:
\r
1449 prset ratecorrpr = uniform(<number>,<number>)
\r
1450 prset ratecorrpr = fixed(<number>)
\r
1452 Note that the valid range for the parameter is between -1
\r
1453 and 1. Hence, "prset ratecorrpr=uniform(-1,1)" is valid
\r
1454 while "prset ratecorrpr=uniform(-11,10)" is not. The
\r
1455 default setting is "prset ratecorrpr=uniform(-1,1)".
\r
1456 Covswitchpr -- This option sets the prior for the covarion switching
\r
1457 rates. The options are:
\r
1459 prset covswitchpr = uniform(<number>,<number>)
\r
1460 prset covswitchpr = exponential(<number>)
\r
1461 prset covswitchpr = fixed(<number>,<number>)
\r
1463 The covarion model has two rates: a rate from on to off
\r
1464 and a rate from off to on. The rates are assumed to have
\r
1465 independent priors that individually are either uniformly
\r
1466 or exponentially distributed. The other option is to
\r
1467 fix the switching rates, in which case you must specify
\r
1468 both rates. (The first number is off->on and the second
\r
1469 is on->off.)
\r
1470 Symdirihyperpr - This option sets the prior for the stationary frequencies
\r
1471 of the states for morphological (standard) data. There can
\r
1472 be as many as 10 states for standard data. However, the
\r
1473 labelling of the states is somewhat arbitrary. For example,
\r
1474 the state "1" for different characters does not have the
\r
1475 same meaning. This is not true for DNA characters, for ex-
\r
1476 ample, where a "G" has the same meaning across characters.
\r
1477 The fact that the labelling of morphological characters is
\r
1478 arbitrary makes it difficult to allow unequal character-
\r
1479 state frequencies. MrBayes gets around this problem by
\r
1480 assuming that the states have a symmetric Dirichlet prior
\r
1481 (i.e. all Dirichlet parameters are equal). The variation in
\r
1482 the Dirichlet can be controlled by this parameter.
\r
1483 Symdirihyperpr specifies the distribution on the parameter
\r
1484 of the symmetric Dirichlet. The valid options are:
\r
1486 prset Symdirihyperpr = uniform(<number>,<number>)
\r
1487 prset Symdirihyperpr = exponential(<number>)
\r
1488 prset Symdirihyperpr = fixed(<number>)
\r
1489 prset Symdirihyperpr = fixed(infinity)
\r
1491 If "fixed(infinity)" is chosen, the Dirichlet prior is
\r
1492 fixed such that all character states have equal frequency.
\r
1493 Topologypr -- This parameter specifies the prior probabilities of
\r
1494 phylogenies. The options are:
\r
1496 prset topologypr = uniform
\r
1497 prset topologypr = speciestree
\r
1498 prset topologypr = constraints(<list>)
\r
1499 prset topologypr = fixed(<treename>)
\r
1501 If the prior is selected to be "uniform", the default,
\r
1502 then all possible trees are considered a priori equally
\r
1503 probable. The 'speciestree' option is used when the topology
\r
1504 is constrained to fold inside a species tree together with
\r
1505 other (gene) trees. The constraints option allows you to
\r
1506 specify complicated prior probabilities on trees (constraints
\r
1507 are discussed more fully in "help constraint"). Note that
\r
1508 you must specify a list of constraints that you wish to be
\r
1509 obeyed. Constraints in the list can be given by either name or
\r
1510 number. Finally, you can fix the topology to that of a user
\r
1511 tree defined in a trees block. Branch lengths will still be
\r
1512 sampled as usual on the fixed topology.
\r
1513 Brlenspr -- This parameter specifies the prior probability dist-
\r
1514 ribution on branch lengths. The options are specified using:
\r
1516 prset brlenspr = <setting>
\r
1518 where <setting> is one of
\r
1520 unconstrained:uniform(<num>,<num>)
\r
1521 unconstrained:exponential(<number>)
\r
1522 unconstrained:twoexp(<num>,<num>)
\r
1523 unconstrained:gammadir(<num>,<num>,<num>,<num>)
\r
1524 unconstrained:invgamdir(<num>,<num>,<num>,<num>)
\r
1525 clock:uniform
\r
1526 clock:birthdeath
\r
1527 clock:coalescence
\r
1528 clock:fossilization
\r
1529 clock:speciestree
\r
1530 fixed(<treename>)
\r
1532 Trees with unconstrained branch lengths are unrooted
\r
1533 whereas clock-constrained trees are rooted. The option
\r
1534 after the colon specifies the details of the probability
\r
1535 density of branch lengths. If you choose a birth-death
\r
1536 or coalescence prior, you may want to modify the details
\r
1537 of the parameters of those processes (speciation rate,
\r
1538 extinction rate and sample probability for the birth-death
\r
1539 prior; population size and clock rate parameter for the
\r
1540 coalescence prior). When gene trees are constrained to fold
\r
1541 inside species trees, the appropriate branch length prior is
\r
1542 'clock:speciestree'. Under this model, it is possible to
\r
1543 control whether the population size is constant or variable
\r
1544 across the species tree using the 'popvarpr' setting.
\r
1545 Branch lengths can also be fixed but only if the topology is
\r
1546 fixed.
\r
1548 For unconstrained branch lengths, MrBayes offers five alter-
\r
1549 native prior distributions. The first two are the simple
\r
1550 'uniform' and 'exponential' priors. The 'uniform' prior takes
\r
1551 two parameters, the lower and upper bound of the uniform dis-
\r
1552 tribution, respectively. The 'exponential' prior takes a sin-
\r
1553 gle parameter, the rate of the exponential distribution. The
\r
1554 mean of the exponential distribution is the inverse of the
\r
1555 rate. For instance, an 'exp(10)' distribution has an expected
\r
1556 mean of 0.1.
\r
1557 MrBayes also offers three more complex prior distributions
\r
1558 on unconstrained branch lengths. The two-exponential prior
\r
1559 (Yang and Rannala 2005; Yang 2007) uses two different expo-
\r
1560 nential distributions, one for internal and one for external
\r
1561 branch lengths. The two-exponential prior is invoked using
\r
1562 'twoexp(<r_I>,<r_E>)', where '<r_I>' is a number specifying
\r
1563 the rate of the exponential distribution on internal branch
\r
1564 lengths, while '<r_E>' is the rate for external branch
\r
1565 lengths. The prior mean for internal branch lengths is then
\r
1566 1/r_I, and for external ones is 1/r_E. For instance, to set
\r
1567 the prior mean of internal branch lengths to 0.01, and external
\r
1568 ones to 0.1, use 'twoexp(100,10)'.
\r
1569 The setting 'twoexp(10,10)' is equivalent to 'exp(10)'.
\r
1570 The compound Dirichlet priors 'gammadir(<a_T>,<b_T>,<a>,<c>)'
\r
1571 and 'invgamdir(<a_T>,<b_T>,<a>,<c>)' specify a fairly diffuse
\r
1572 prior on tree length 'T', and then partition the tree length
\r
1573 into branch lengths according to a Dirichlet distribution
\r
1574 (Rannala et al. 2012). If 'T' is considered drawn from a
\r
1575 gamma distribution with parameters a_T and b_T, and with mean
\r
1576 a_T/b_T, we recommend setting a_T = 1; if it is instead con-
\r
1577 sidered drawn from an inverse gamma (invgamma) distribution
\r
1578 with parameters a_T and b_T, and with mean b_T/(a_T -1), then
\r
1579 we reccommend setting a_T = 3. In the latter case, b_T should
\r
1580 be chosen so that the prior mean of T is reasonable for the
\r
1581 data. In the former case, setting b_T = 0.1 (corresponding to
\r
1582 a mean tree length of 10) should be appropriate for a wide
\r
1583 range of tree lengths (at least in the interval 1 to 100).
\r
1584 The concentration parameter a of the Dirichlet distribution
\r
1585 is inversely related to the variance of the branch lengths,
\r
1586 while c is the ratio of the prior means for the internal and
\r
1587 external branch lengths. The default setting, a = c = 1,
\r
1588 specifies a uniform Dirichlet distribution of branch lengths
\r
1589 given the tree length. For instance, 'gammadir(1,0.1,1,1)'
\r
1590 specifies a compound Dirichlet prior on branch lengths, where
\r
1591 tree length is associated with a gamma distribution with mean
\r
1592 10, and branch length proportions are associated with a uni-
\r
1593 form Dirichlet distribution (default).
\r
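The prior means quoted above for the unconstrained branch length priors follow directly from the formulas in the text. The Python sketch below only restates them. The function names are hypothetical, and the value b_T = 20 in the last call is an illustrative choice, not a MrBayes default.

```python
def exponential_mean(rate):
    """Mean of an exponential distribution with the given rate."""
    return 1.0 / rate

def gamma_tree_length_mean(a_t, b_t):
    """Mean tree length under gammadir: a_T / b_T."""
    return a_t / b_t

def invgamma_tree_length_mean(a_t, b_t):
    """Mean tree length under invgamdir: b_T / (a_T - 1), defined for a_T > 1."""
    if a_t <= 1:
        raise ValueError("the inverse gamma mean requires a_T > 1")
    return b_t / (a_t - 1)

# twoexp(100,10): prior means of internal and external branch lengths
print(exponential_mean(100), exponential_mean(10))
# gammadir(1,0.1,1,1): prior mean of the tree length
print(gamma_tree_length_mean(1, 0.1))
# invgamdir with a_T = 3: b_T = 20 gives a prior mean tree length of 10
print(invgamma_tree_length_mean(3, 20))
```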
1595 For clock trees with calibrated external nodes (fossils),
\r
1596 MrBayes also offers the fossilized birth-death prior:
\r
1597 'clock:fossilization'.
\r
1598 If 'SampleStrat' is set to 'fossiltip', it assumes that upon
\r
1599 sampling the lineage is dead and won't produce descendants,
\r
1600 meaning each fossil sample is a tip. If 'SampleStrat' is set
\r
1601 to 'random' (default), fossils are sampled serially along the
\r
1602 birth-death tree (Stadler 2010), so they can be tips or an-
\r
1603 cestors. See 'Speciationpr', 'Extinctionpr', 'SampleStrat',
\r
1604 'Fossilizationpr' for more information.
\r
1606 Treeagepr -- This parameter specifies the prior probability distribution
\r
1607 on the tree age when a uniform or fossilization prior is used
\r
1608 on the branch lengths of a clock tree.
\r
1612 prset treeagepr = <setting>
\r
1614 where <setting> is one of
\r
1616 fixed(<age>)
\r
1617 uniform(<min_age>,<max_age>)
\r
1618 offsetexponential(<min_age>,<mean_age>)
\r
1619 truncatednormal(<min_age>,<mean_age>,<st.dev.>)
\r
1620 lognormal(<mean_age>,<st.dev.>)
\r
1621 offsetlognormal(<min_age>,<mean_age>,<st.dev.>)
\r
1622 gamma(<mean_age>,<st.dev.>)
\r
1623 offsetgamma(<min_age>,<mean_age>,<st.dev.>)
\r
1625 These are the same options used for the 'Calibrate' command.
\r
1626 Note that, unlike elsewhere in MrBayes, we always use the
\r
1627 mean and standard deviation of the resulting age distribution
\r
1628 rather than the standard parameterization, if different. This
\r
1629 is meant to help users who want to focus on the
\r
1630 information conveyed about the age. For those who wish to use
\r
1631 the standard parameterization, there are simple conversions
\r
1632 between the two. See the 'Calibrate' command for more
\r
1633 information.
\r
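The conversions between the mean and standard deviation parameterization and the standard parameters can be sketched in Python as follows. These are the usual moment-matching formulas for the gamma and lognormal distributions, not code taken from MrBayes. The function names and the example ages (mean 100, standard deviation 20) are hypothetical.

```python
import math

def gamma_shape_rate(mean, sd):
    """Shape and rate of a gamma distribution with the given mean and sd."""
    shape = (mean / sd) ** 2
    rate = mean / sd ** 2
    return shape, rate

def lognormal_mu_sigma(mean, sd):
    """Log-scale mu and sigma of a lognormal with the given mean and sd."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

# gamma(1,1) in mean/sd form is a gamma with shape 1 and rate 1
print(gamma_shape_rate(1.0, 1.0))
# a lognormal age prior with mean 100 and standard deviation 20
mu, sigma = lognormal_mu_sigma(100.0, 20.0)
print(round(mu, 3), round(sigma, 3))
```

Offsets are not handled here. For the offset distributions one would presumably apply the same conversion to the age measured from the minimum age, but that is an assumption to verify against the 'Calibrate' help text.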
1635 The tree age is simply the age of the most recent common
\r
1636 ancestor of the tree. If the clock rate is fixed to 1.0,
\r
1637 which is the default, the tree age is equivalent to the
\r
1638 expected number of substitutions from the root to the tip of
\r
1639 the tree, that is, tree height. The tree age prior ensures
\r
1640 that the joint probability for the uniform prior (or fossil-
\r
1641 ization prior) model of branch lengths on a clock tree is
\r
1642 proper. The default setting is 'gamma(1,1)'. If the root node
\r
1643 in the tree is calibrated, the root calibration replaces the
\r
1644 tree age prior.
\r
1645 Speciationpr -- This parameter sets the prior on the net speciation rate (net
\r
1646 diversification), that is, (lambda - mu) in the birth-death
\r
1647 model and the general case of fossilized birth-death model.
\r
1648 Or, (lambda - mu - psi) in the special case of f-b-d model
\r
1649 (fossiltip). Values of this parameter are > 0. Prior options:
\r
1651 prset speciationpr = uniform(<number>,<number>)
\r
1652 prset speciationpr = exponential(<number>)
\r
1653 prset speciationpr = fixed(<number>)
\r
1655 This parameter is only relevant if the (fossil) birth-death
\r
1656 process is selected as the prior on branch lengths.
\r
1657 Extinctionpr -- This parameter sets the prior on the relative extinction rate
\r
1658 (turnover), that is, (mu / lambda) in the birth-death model
\r
1659 and the general case of fossilized birth-death model.
\r
1660 Or, (mu + psi) / lambda in the special case of f-b-d model
\r
1661 (fossiltip). Values of this parameter are in range (0,1).
\r
1663 prset extinctionpr = beta(<number>,<number>)
\r
1664 prset extinctionpr = fixed(<number>)
\r
1666 This parameter is only relevant if the (fossil) birth-death
\r
1667 process is selected as the prior on branch lengths.
\r
1668 Fossilizationpr -- This parameter sets the prior on the relative fossilization
\r
1669 rate (sampling proportion), psi/(mu+psi), in the fossilized
\r
1670 b-d model. Values of this parameter are in range (0,1).
\r
1671 If SampleStrat is used to divide up time intervals, it sets
\r
1672 the prior for the fossilization parameter in each interval.
\r
1674 prset fossilizationpr = beta(<number>,<number>)
\r
1675 prset fossilizationpr = fixed(<number>)
\r
1677 This parameter is only relevant if the fossilized birth-death
\r
1678 process is selected as the prior on branch lengths.
\r
1679 SampleStrat -- This parameter sets the strategy under which species were
\r
1680 sampled in the analysis. For the birth-death prior,
\r
1681 'birthdeath' (Hohna et al. 2011), three strategies ('random',
\r
1682 'diversity' and 'cluster' sampling) can be used for extant
\r
1683 taxa. No extinct samples (fossils) are allowed under this prior.
\r
1684 For data with extant and extinct samples, use 'prset brlenspr
\r
1685 =clock:fossilization' (Stadler 2010; Zhang et al. 2015).
\r
1686 For the fossilized birth-death prior, 'fossiltip' assumes
\r
1687 extant taxa are sampled randomly, and extinct taxa (fossils)
\r
1688 are sampled with constant rate and upon sampling the lineage
\r
1689 is dead and won't produce any descendant. So fossils are all
\r
1690 at tips. Unlike 'fossiltip', the following strategies also
\r
1691 allow fossils to be ancestors of other samples.
\r
1692 'random' (default) assumes extant taxa are sampled randomly
\r
1693 with prob rho, while fossils are sampled on the birth-death
\r
1694 tree with piecewise constant rates, psi_i (i = 1,...,s+1).
\r
1695 'diversity' assumes extant taxa are sampled to maximize
\r
1696 diversity, while fossils are sampled randomly.
\r
1697 Time is divided by <s> slice sampling events in the past, each
\r
1698 at time <t_i> with probability <rho_i> (s >= 0). If rho_i = 0,
\r
1699 the slice is only used to divide up time intervals, not for
\r
1700 sampling of fossils. Extant taxa are sampled with prob.
\r
1701 (proportion) rho (set in sampleprob).
\r
1703 prset samplestrat = random
\r
1704 prset samplestrat = diversity
\r
1705 prset samplestrat = cluster
\r
1706 prset samplestrat = fossiltip
\r
1707 prset samplestrat = random <s>:...,<t_i> <rho_i>,...
\r
1708 prset samplestrat = diversity <s>:...,<t_i> <rho_i>,...
\r
1710 Sampleprob -- This parameter sets the fraction of extant species that are
\r
1711 sampled in the analysis. This is used with the birth-death
\r
1712 prior on trees (Yang and Rannala 1997; Stadler 2009; Hohna
\r
1713 et al. 2011), and the fossilized birth-death prior (Stadler
\r
1714 2010; Zhang et al. 2015).
\r
1716 prset sampleprob = <number>
\r
1718 Popsizepr -- This parameter sets the prior on the population size compo-
\r
1719 nent of the coalescent parameter. The options are:
\r
1721 prset popsizepr = uniform(<number>,<number>)
\r
1722 prset popsizepr = lognormal(<number>,<number>)
\r
1723 prset popsizepr = normal(<number>,<number>)
\r
1724 prset popsizepr = gamma(<number>,<number>)
\r
1725 prset popsizepr = fixed(<number>)
\r
1727 This parameter is only relevant if the coalescence process is
\r
1728 selected as the prior on branch lengths. Note that the set-
\r
1729 ting of 'ploidy' in 'lset' is important for how this para-
\r
1730 meter is interpreted.
\r
1731 Popvarpr -- In a gene tree - species tree model, this parameter deter-
\r
1732 mines whether the population size is the same for the entire
\r
1733 species tree ('popvarpr = equal', the default), or varies
\r
1734 across branches of the species tree ('popvarpr=variable').
\r
1735 Nodeagepr -- This parameter specifies the assumptions concerning the age
\r
1736 of the terminal and interior nodes in the tree. The default
\r
1737 model ('nodeagepr = unconstrained') assumes that all terminal
\r
1738 nodes are of the same age while the age of interior nodes is
\r
1739 unconstrained. The alternative ('nodeagepr = calibrated')
\r
1740 option derives a prior probability distribution on terminal
\r
1741 and interior node ages from the calibration settings (see
\r
1742 the 'calibrate' command). The 'nodeagepr' parameter is only
\r
1743 relevant for clock trees.
\r
1744 Clockratepr -- This parameter specifies the prior assumptions concerning the
\r
1745 base substitution rate of the tree, measured in expected num-
\r
1746 ber of substitutions per site per time unit. The default set-
\r
1747 ting is 'Fixed(1.0)', which effectively means that the time
\r
1748 unit is the number of expected substitutions per site.
\r
1749 If you do not have any age calibrations in the tree, you can
\r
1750 still calibrate the tree using 'Clockratepr'. For instance,
\r
1751 if you know that your sequence data evolve at a rate of 0.20
\r
1752 substitutions per site per million years, you might calibrate the tree
\r
1753 by fixing the substitution rate to 0.20 using
\r
1755 prset clockratepr = fixed(0.20)
\r
1757 after which the tree will be calibrated using millions of
\r
1758 years as the unit.
\r
1760 You can also assign a prior probability distribution to the
\r
1761 substitution rate to accommodate uncertainty about it.
\r
1762 When you calibrate the nodes, you should properly set this
\r
1763 prior to match the time unit of the calibrations.
\r
1764 You can choose among normal, lognormal, exponential and gamma
\r
1765 distributions for this purpose. For instance, to assign a
\r
1766 normal distribution truncated at 0, so that only positive
\r
1767 values are allowed, and with mean 0.20 and standard deviation
\r
1768 of 0.02, you would use
\r
1770 prset clockratepr = normal(0.20,0.02)
\r
1772 The lognormal distribution is parameterized in terms of the
\r
1773 mean and standard deviation on the log scale (natural logs).
\r
1774 For instance,
\r
1776 prset clockratepr = lognormal(-1.61,0.10)
\r
1778 specifies a lognormal distribution with a mean of log values
\r
1779 of -1.61 and a standard deviation of log values of 0.10. In
\r
1780 such a case, the mean value of the lognormal distribution is
\r
1781 equal to e^(-1.61 + 0.10^2/2) = 0.20.
\r
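The lognormal arithmetic above can be reproduced, and inverted, with a few lines of Python. The sketch uses the standard relation between the log-scale parameters and the mean of a lognormal distribution. The function names are hypothetical.

```python
import math

def lognormal_mean(mu, sigma):
    """Mean of a lognormal with log-scale mean mu and log-scale sd sigma."""
    return math.exp(mu + sigma ** 2 / 2.0)

def mu_for_mean(mean, sigma):
    """Log-scale mean that gives the requested mean for a given sigma."""
    return math.log(mean) - sigma ** 2 / 2.0

# lognormal(-1.61,0.10) from the text
print(round(lognormal_mean(-1.61, 0.10), 2))
# the reverse direction: which mu gives a mean rate of 0.20 when sigma is 0.10?
print(round(mu_for_mean(0.20, 0.10), 2))
```

The second function is convenient when you know the clock rate you expect on the natural scale and need the first argument of lognormal(<number>,<number>).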
1783 Note that the 'Clockratepr' parameter has no effect on non-
\r
1784 clock trees.
\r
1785 Clockvarpr -- This parameter allows you to specify the type of clock you
\r
1786 are assuming. The default is 'strict', which corresponds to
\r
1787 the standard clock model where the evolutionary rate is
\r
1788 constant throughout the tree. For relaxed clock models, you
\r
1789 can use 'cpp', 'tk02', 'igr'. ('mixed' is not working)
\r
1790 'cpp' invokes a relaxed clock model where the rate evolves
\r
1791 according to a Compound Poisson Process (CPP) (Huelsenbeck
\r
1792 et al., 2000).
\r
1793 'tk02' invokes the Brownian Motion model described by Thorne
\r
1794 and Kishino (2002). [autocorrelated lognormal distributions]
\r
1795 'igr' invokes the Independent Gamma Rate (IGR) model where
\r
1796 each branch has an independent rate drawn from a gamma
\r
1797 distribution (LePage et al., 2007).
\r
1798 Each of the relaxed clock models has additional parameters
\r
1799 with priors. For the CPP model, it is 'cppratepr' and
\r
1800 'cppmultdevpr'; for the TK02 model, it is 'tk02varpr'; for
\r
1801 the IGR model, it is 'igrvarpr'.
\r
1802 The 'clockvarpr' parameter is only relevant for clock trees.
\r
1804 For backward compatibility, 'bm' is allowed as a synonym of
\r
1805 'tk02', and 'ibr' as a synonym of 'igr'.
\r
1806 Cppratepr -- This parameter allows you to specify a prior probability
\r
1807 distribution on the rate of the Poisson process generating
\r
1808 changes in the evolutionary rate in the CPP relaxed clock
\r
1809 model. You can either fix the rate or associate it with an
\r
1810 exponential prior using
\r
1812 prset cppratepr = fixed(<number>)
\r
1813 prset cppratepr = exponential(<number>)
\r
1815 For instance, if you fix the rate to 2, then on a branch
\r
1816 with a length equal to one, expressed in terms of the average
\r
1817 expected number of substitutions per site, you expect to see,
\r
1818 on average, two rate-modifying events.
\r
1819 If you put an exponential(0.1) on the rate, you will be
\r
1820 estimating the rate against a prior probability distribution
\r
1821 where the expected rate is 10 (= 1/0.1).
\r
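The two numerical statements above can be checked with a short Python sketch. It relies only on the properties of a Poisson process (the expected number of events is the rate times the length) and of the exponential distribution (the mean is the inverse of the rate). The function names are hypothetical.

```python
import math

def expected_events(cpp_rate, branch_length):
    """Expected number of rate-modifying events on a branch."""
    return cpp_rate * branch_length

def prob_no_event(cpp_rate, branch_length):
    """Poisson probability that a branch carries no rate-modifying event."""
    return math.exp(-cpp_rate * branch_length)

def exponential_prior_mean(prior_rate):
    """Prior mean of the CPP rate under exponential(prior_rate)."""
    return 1.0 / prior_rate

print(expected_events(2.0, 1.0))            # cppratepr = fixed(2), unit branch
print(round(prob_no_event(2.0, 1.0), 3))    # chance of no event on that branch
print(exponential_prior_mean(0.1))          # cppratepr = exponential(0.1)
```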
1822 Cppmultdevpr -- This parameter allows you to specify the standard deviation
\r
1823 of the log-normal distribution from which the rate multi-
\r
1824 pliers of the CPP relaxed clock model are drawn. The standard
\r
1825 deviation is given on the log scale. The default value of 1.0
\r
1826 thus corresponds to rate multipliers varying from 0.37 (1/e)
\r
1827 to 2.7 (e) when they are +/- one standard deviation from the
\r
1828 expected mean. The expected mean of the logarithm of the
\r
1829 multipliers is fixed to 0, ensuring that the expected mean rate is
\r
1830 1.0. You can change the default value by using
\r
1832 prset cppmultdevpr = fixed(<number>)
\r
1834 where <number> is the standard deviation on the log scale.
\r
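To see what a given standard deviation on the log scale means for the multipliers themselves, you can exponentiate it. The Python sketch below does this for the default value of 1.0, assuming (as stated above) that the mean of the log multiplier is 0. The function name is hypothetical.

```python
import math

def multiplier_range(log_sd, n_sd=1.0):
    """Rate multipliers at -n_sd and +n_sd standard deviations (log scale)."""
    return math.exp(-n_sd * log_sd), math.exp(n_sd * log_sd)

low, high = multiplier_range(1.0)
print(round(low, 2), round(high, 2))

# a tighter setting, e.g. cppmultdevpr = fixed(0.4)
print(tuple(round(m, 2) for m in multiplier_range(0.4)))
```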
1835 TK02varpr -- This parameter allows you to specify the prior probability
\r
1836 distribution for the variance of the rate multiplier in the
\r
1837 Thorne-Kishino ('Brownian motion') relaxed clock model.
\r
1838 Specifically, the parameter specifies the rate at which the
\r
1839 variance increases with respect to the base rate of the
\r
1840 clock. If you have a branch of a length corresponding to 0.4
\r
1841 expected changes per site according to the base rate of the
\r
1842 clock, and the tk02var parameter has a value of 2.0, then the
\r
1843 rate multiplier at the end of the branch will be drawn from a
\r
1844 lognormal distribution with a variance of 0.4*2.0 (on the
\r
1845 linear, not the logarithmic scale). The mean is the same as the
\r
1846 rate multiplier at the start of the branch (again on the
\r
1847 linear scale).
\r
1849 You can set the parameter to a fixed value, or specify that
\r
1850 it is drawn from an exponential or uniform distribution:
\r
1852 prset tk02varpr = fixed(<number>)
\r
1853 prset tk02varpr = exponential(<number>)
\r
1854 prset tk02varpr = uniform(<number>,<number>)
\r
1856 For backward compatibility, 'bmvarpr' is allowed as a synonym
\r
1857 of 'tk02varpr'.
\r
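The TK02 example above (branch length 0.4, tk02var 2.0) can be turned into explicit lognormal parameters with a moment-matching sketch in Python. It takes the description at face value: the linear-scale mean equals the multiplier at the start of the branch, and the linear-scale variance is the branch length times tk02var. This is an illustration of that description, not code taken from MrBayes, and the function name is hypothetical.

```python
import math

def tk02_lognormal_params(start_multiplier, branch_length, tk02var):
    """Log-scale (mu, sigma) matching the linear-scale mean and variance."""
    variance = branch_length * tk02var            # linear-scale variance
    sigma2 = math.log(1.0 + variance / start_multiplier ** 2)
    mu = math.log(start_multiplier) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

mu, sigma = tk02_lognormal_params(1.0, 0.4, 2.0)
print(round(mu, 4), round(sigma, 4))
```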
1858 Igrvarpr -- This parameter allows you to specify a prior on the variance
\r
1859 of the gamma distribution from which the branch lengths are
\r
1860 drawn in the independent gamma rate (IGR) relaxed clock
\r
1861 model. Specifically, the parameter specifies the rate at
\r
1862 which the variance increases with respect to the base rate of
\r
1863 the clock. If you have a branch of a length corresponding to
\r
1864 0.4 expected changes per site according to the base rate of
\r
1865 the clock, and the igrvar parameter has a value of 2.0, then
\r
1866 the effective branch length will be drawn from a distribution
\r
1867 with a variance of 0.4*2.0.
\r
1869 You can set the parameter to a fixed value, or specify that
\r
1870 it is drawn from an exponential or uniform distribution:
\r
1872 prset igrvarpr = fixed(<number>)
\r
1873 prset igrvarpr = exponential(<number>)
\r
1874 prset igrvarpr = uniform(<number>,<number>)
\r
1876 For backward compatibility, 'ibrvarpr' is allowed as a syn-
\r
1877 onym of 'igrvarpr'.
\r
1878 Ratepr -- This parameter allows you to specify the site specific rates
\r
1879 model or any other model that allows different partitions to
\r
1880 evolve at different rates. First, you must have defined a
\r
1881 partition of the characters. For example, you may define a
\r
1882 partition that divides the characters by codon position, if
\r
1883 you have DNA data. You can also divide your data using a
\r
1884 partition that separates different genes from each other.
\r
1885 The next step is to make the desired partition the active one
\r
1886 using the set command. For example, if your partition is
\r
1887 called "by_codon", then you make that the active partition
\r
1888 using "set partition=by_codon". Now that you have defined
\r
1889 and activated a partition, you can specify the rate multi-
\r
1890 pliers for the various partitions. The options are:
\r
1892 prset ratepr = fixed
\r
1893 prset ratepr = variable
\r
1894 prset ratepr = dirichlet(<number>,<number>,...,<number>)
\r
1896 If you specify "fixed", then the rate multiplier for
\r
1897 that partition is set to 1 (i.e., the rate is fixed to
\r
1898 the average rate across partitions). On the other hand,
\r
1899 if you specify "variable", then the rate is allowed to
\r
1900 vary across partitions subject to the constraint that the
\r
1901 average rate of substitution across the partitions is 1.
\r
1902 You must specify a variable rate prior for at least two
\r
1903 partitions, otherwise the option is not activated when
\r
1904 calculating likelihoods. The variable option automatically
\r
1905 associates the partition rates with a dirichlet(1,...,1)
\r
1906 prior. The dirichlet option is an alternative way of setting
\r
1907 a partition rate to be variable, and also gives accurate
\r
1908 control of the shape of the prior. The parameters of the
\r
1909 Dirichlet are listed in the order of the partitions that the
\r
1910 ratepr is applied to. For instance, "prset applyto=(1,3,4)
\r
1911 ratepr = dirichlet(10,40,15)" would set the Dirichlet para-
\r
1912 meter 10 to partition 1, 40 to partition 3, and 15 to parti-
\r
1913 tion 4. The Dirichlet distribution is applied to the weighted
\r
1914 rates; that is, it weights the partition rates according to
\r
1915 the number of included characters in each partition.
\r
1916 Generatepr -- This parameter is similar to 'Ratepr' but applies to gene
\r
1917 trees in the multispecies coalescent, whereas 'Ratepr' app-
\r
1918 lies to partitions within genes.
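As a rough illustration of the constraint described under 'Ratepr', the following Python sketch turns a set of Dirichlet proportions into partition rate multipliers whose average, weighted by the number of included characters, is 1. The function name, the example proportions, and the exact transformation are assumptions made for illustration. They are one reading of the text above, not the MrBayes implementation.

```python
def rate_multipliers(proportions, nchars):
    """Convert proportions of the weighted rate sum into rate multipliers.

    Assumption (not taken from the MrBayes source): each proportion p_i is
    the share of the character-weighted rate sum, so r_i = p_i * N / n_i,
    where n_i is the number of included characters in partition i and N is
    the total number of included characters.
    """
    if len(proportions) != len(nchars):
        raise ValueError("need one proportion per partition")
    total = sum(proportions)
    props = [p / total for p in proportions]  # normalize so they sum to 1
    n_total = sum(nchars)
    return [p * n_total / n for p, n in zip(props, nchars)]


# Hypothetical example: three partitions with 100, 100 and 200 characters.
nchars = [100, 100, 200]
rates = rate_multipliers([0.2, 0.3, 0.5], nchars)
weighted_mean = sum(r * n for r, n in zip(rates, nchars)) / sum(nchars)
print(rates, weighted_mean)
```

Under this reading, the weighted mean of the multipliers is 1 for any choice of proportions, which is the constraint the 'variable' and 'dirichlet' options impose.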
\r
1920 Default model settings:
\r
1922 Parameter Options Current Setting
\r
1923 ------------------------------------------------------------------
\r
1924 Tratiopr Beta/Fixed Beta(1.0,1.0)
\r
1925 Revmatpr Dirichlet/Fixed Dirichlet(1.0,1.0,1.0,1.0,1.0,1.0)
\r
1926 Aamodelpr Fixed/Mixed Fixed(Poisson)
\r
1927 Aarevmatpr Dirichlet/Fixed Dirichlet(1.0,1.0,...)
\r
1928 Omegapr Dirichlet/Fixed Dirichlet(1.0,1.0)
\r
1929 Ny98omega1pr Beta/Fixed Beta(1.0,1.0)
\r
1930 Ny98omega3pr Uniform/Exponential/Fixed Exponential(1.0)
\r
1931 M3omegapr Exponential/Fixed Exponential
\r
1932 Codoncatfreqs Dirichlet/Fixed Dirichlet(1.0,1.0,1.0)
\r
1933 Statefreqpr Dirichlet/Fixed Dirichlet(1.0,1.0,1.0,1.0)
\r
1934 Shapepr Uniform/Exponential/Fixed Exponential(1.0)
\r
1935 Ratecorrpr Uniform/Fixed Uniform(-1.0,1.0)
\r
1936 Pinvarpr Uniform/Fixed Uniform(0.0,1.0)
\r
1937 Covswitchpr Uniform/Exponential/Fixed Uniform(0.0,100.0)
\r
1938 Symdirihyperpr Uniform/Exponential/Fixed Fixed(Infinity)
\r
1939 Topologypr Uniform/Constraints/Fixed/ Uniform
\r
1941 Brlenspr Unconstrained/Clock/Fixed Unconstrained:GammaDir(1.0,0.100,1.0,1.0)
\r
1942 Treeagepr Gamma/Uniform/Fixed/ Gamma(1.00,1.00)
\r
1943 Truncatednormal/Lognormal/
\r
1944 Offsetlognormal/Offsetgamma/
\r
1945 Offsetexponential
\r
1946 Speciationpr Uniform/Exponential/Fixed Exponential(10.0)
\r
1947 Extinctionpr Beta/Fixed Beta(1.0,1.0)
\r
1948 Fossilizationpr Beta/Fixed Beta(1.0,1.0)
\r
1949 SampleStrat Random/Diversity/Cluster/ Random
\r
1951 Sampleprob <number> 1.00000000
\r
1952 Popsizepr Lognormal/Gamma/Uniform/ Gamma(1.0,10.0)
\r
1954 Popvarpr Equal/Variable Equal
\r
1955 Nodeagepr Unconstrained/Calibrated Unconstrained
\r
1956 Clockratepr Fixed/Normal/Lognormal/ Fixed(1.00)
\r
1957 Exponential/Gamma
\r
1958 Clockvarpr Strict/Cpp/TK02/Igr/Mixed Strict
\r
1959 Cppratepr Fixed/Exponential Exponential(0.10)
\r
1960 Cppmultdevpr Fixed Fixed(0.40)
\r
1961 TK02varpr Fixed/Exponential/Uniform Exponential(1.00)
\r
1962 Igrvarpr Fixed/Exponential/Uniform Exponential(10.00)
\r
1963 Ratepr Fixed/Variable=Dirichlet Fixed
\r
1964 Generatepr Fixed/Variable=Dirichlet Fixed
\r
1965 ------------------------------------------------------------------
\r
1967 ---------------------------------------------------------------------------
\r
1970 This command allows the user to change the details of the MCMC samplers
\r
1971 (moves) that update the state of the chain. The usage is:
\r
1973 propset <move_name>$<tuning-parameter>=<value>
\r
1975 Assume we have a topology parameter called 'Tau{all}', which is sampled by
\r
1976 the move 'ExtTBR(Tau{all})' (note that the parameter name is included in the
\r
1977 move name). This move has three tuning parameters: (1) 'prob', the relative
\r
1978 proposal probability (a weight defining its probability relative to other
\r
1979 moves); (2) 'p_ext', the extension probability; and (3) 'lambda', the tuning
\r
1980 parameter of the branch length multiplier. A list of the tuning parameters is
\r
1981 available by using 'Showmoves' (see below). To change the relative proposal
\r
1982 probability to 20 and the extension probability to 0.7, use:
\r
1984 propset etbr(tau{all})$prob=20 etbr(tau{all})$p_ext=0.7
\r
1986 This change would apply to all chains in all runs. It is also possible to set
\r
1987 the tuning parameters of individual runs and chains using the format:
\r
1989 propset <move_name>$<tuning-parameter>(<run>,<chain>)=<value>
\r
1991 where <run> and <chain> are the index numbers of the run and chain for which
\r
1992 you want to change the value. If you leave out the index of the run, the
\r
1993 change will apply to all runs; if you leave out the index of the chain, the
\r
1994 change will similarly apply to all chains. To switch off the exttbr(tau{all})
\r
1995 move in chain 2 of all runs, use:
\r
1997 propset etbr(tau{all})$prob(,2)=0
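The two 'propset' formats above can be sketched as a small Python helper that assembles the command string. The helper name and its arguments are hypothetical and only mirror the syntax shown in this section. It does not check that the move or tuning parameter exists in your model.

```python
def format_propset(move, param, value, run=None, chain=None):
    """Build a propset command string (illustrative helper, not part of MrBayes).

    Leaving out both run and chain gives the form that applies to all chains
    in all runs. Leaving out only one of them leaves that index empty, as in
    the '(,2)' example above.
    """
    if run is None and chain is None:
        index = ""
    else:
        run_txt = "" if run is None else str(run)
        chain_txt = "" if chain is None else str(chain)
        index = f"({run_txt},{chain_txt})"
    return f"propset {move}${param}{index}={value}"


print(format_propset("etbr(tau{all})", "prob", 20))
print(format_propset("etbr(tau{all})", "prob", 0, chain=2))
```

The second call reproduces the command used above to switch off the move in chain 2 of all runs.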
\r
1999 It is important to note that all moves are not available until the model has
\r
2000 been completely defined. Any change to the model will cause all proposal
\r
2001 tuning parameters to return to their default values. To see a list of all the
\r
2002 moves that are currently switched on for the model, use 'showmoves'. You can
\r
2003 also see other available moves by using 'showmoves allavailable=yes'. A list
\r
2004 of the moves for each parameter in the model is available by using the command
\r
2005 'Showparams'. If you change proposal probabilities, make sure that all
\r
2006 parameters that are not fixed in your model have at least one move switched on.
\r
2009 One word of warning: You should be extremely careful when modifying any
\r
2010 of the chain parameters using 'propset'. It is quite possible to completely
\r
2011 wreck any hope of achieving convergence by inappropriately setting the
\r
2012 tuning parameters. In general, you want to set move tuning parameters such
\r
2013 that the acceptance rate of the move is intermediate (we suggest targeting
\r
2014 the range 10% to 70% acceptance, if possible). If the acceptance rate is
\r
2015 outside of this range, the MCMC chain will probably not sample that parameter
\r
2016 very efficiently. The acceptance rates for all moves in the cold chain(s) are
\r
2017 summarized at the end of each run in the screen output. The acceptance rates
\r
2018 (potentially for all chains, cold and heated) are also printed to the .mcmc
\r
2019 file if Mcmc convergence diagnostics are turned on (using 'Mcmc' or 'Mcmcp').
\r
2020 ---------------------------------------------------------------------------
\r
2021 ---------------------------------------------------------------------------
\r
2024 This command quits the program. The correct usage is:
\r
2028 It is a very easy command to use properly.
\r
2029 ---------------------------------------------------------------------------
\r
2030 ---------------------------------------------------------------------------
\r
2033 This command allows you to control how the posterior distribution is
\r
2034 reported. For rate parameters, it allows you to choose among several popular
\r
2035 parameterizations. The report command also allows you to request printing of
\r
2036 some model aspects that are usually not reported. For instance, if a node is
\r
2037 constrained in the analysis, MrBayes can print the probabilities of the
\r
2038 ancestral states at that node. Similarly, if there is rate variation in the
\r
2039 model, MrBayes can print the inferred site rates, and if there is omega varia-
\r
2040 tion, MrBayes can print the inferred omega (positive selection) values for
\r
2041 each codon. In a complex model with several partitions, each partition is
\r
2042 controlled separately using the same 'Applyto' mechanism as in the 'Lset' and
\r
2043 'Prset' commands.
\r
2047 Applyto -- This option allows you to apply the report commands to specific
\r
2048 partitions. This command should be the first in the list of
\r
2049 commands specified in 'report'.
\r
2052 report applyto=(1,2) tratio=ratio
\r
2054 report applyto=(3) tratio=dirichlet
\r
2056 would result in the transition and transversion rates of the
\r
2057 first and second partitions in the model being reported as a
\r
2058 ratio and the transition and transversion rates of the third
\r
2059 partition being reported as proportions of the rate sum (the
\r
2060 Dirichlet parameterization).
\r
2061 Tratio -- This specifies the report format for the transition and trans-
\r
2062 version rates of a nucleotide substitution model with nst=2.
\r
2063 If 'ratio' is selected, the rates will be reported as a ratio
\r
2064 (transition rate/transversion rate). If 'dirichlet' is selected,
\r
2065 the transition and transversion rates will instead be reported
\r
2066 as proportions of the rate sum. For example, if the transition
\r
2067 rate is three times the transversion rate and 'ratio' is selec-
\r
2068 ted, this will be reported as a single value, '3.0'. If 'dirichlet'
\r
2069 is selected instead, the same rates will be reported using two
\r
2070 values, '0.75 0.25'. The sum of the Dirichlet values is always 1.
\r
2071 Although the Dirichlet format may be unfamiliar to some users,
\r
2072 it is more convenient for specifying priors than the ratio format.
\r
2074 Revmat -- This specifies the report format for the substitution rates of
\r
2075 a GTR substitution model for nucleotide or amino acid data. If
\r
2076 'ratio' is selected, the rates will be reported scaled to the
\r
2077 G-T rate (for nucleotides) or the Y-V rate (for amino acids). If
\r
2078 'dirichlet' is specified instead, the rates are reported as pro-
\r
2079 portions of the rate sum. For instance, assume that the C-T rate
\r
2080 is twice the A-G rate and four times the transversion rates,
\r
2081 which are equal. If the report format is set to 'ratio', this
\r
2082 would be reported as '1.0 2.0 1.0 1.0 4.0 1.0' since the rates
\r
2083 are reported in the order rAC, rAG, rAT, rCG, rCT, rGT and scaled
\r
2084 relative to the last rate, the G-T rate. If 'dirichlet' is selec-
\r
2085 ted instead, the same rates would have been reported as '0.1 0.2
\r
2086 0.1 0.1 0.4 0.1' since the rates are now scaled so that they sum
\r
2087 to 1.0. The Dirichlet format is the parameterization used for
\r
2088 formulating priors on the rates.
\r
2089 Ratemult -- This specifies the report format used for the rate multiplier of
\r
2090 different model partitions. Three formats are available. If
\r
2091 'scaled' is selected, then rates are scaled such that the mean
\r
2092 rate per site across partitions is 1.0. If 'ratio' is chosen,
\r
2093 the rates are scaled relative to the rate of the first parti-
\r
2094 tion. Finally, if 'dirichlet' is chosen, the rates are given as
\r
2095 proportions of the rate sum. The latter is the format used
\r
2096 when formulating priors on the rate multiplier.
\r
2097 Tree -- This specifies the report format used for the tree(s). Two op-
\r
2098 tions are available. 'Topology' results in only the topology
\r
2099 being printed to file, whereas 'brlens' causes branch lengths to
\r
2100 be printed as well.
\r
2101 Ancstates -- If this option is set to 'yes', MrBayes will print the pro-
\r
2102 bability of the ancestral states at all constrained nodes. Typ-
\r
2103 ically, you are interested in the ancestral states of only a few
\r
2104 characters and only at one node in the tree. To perform such
\r
2105 an analysis, first define and enforce a topology constraint
\r
2106 using 'constraint' and 'prset topologypr = constraints (...)'.
\r
2107 Then put the character(s) of interest in a separate partition and
\r
2108 set MrBayes to report the ancestral states for that partition.
\r
2109 For instance, if the characters of interest are in partition 2,
\r
2110 use 'report applyto=(2) ancstates=yes' to force MrBayes to print
\r
2111 the probability of the ancestral states of those characters at
\r
2112 the constrained node to the '.p' file.
\r
2113 Siterates -- If this option is set to 'yes' and the relevant model has rate
\r
2114 variation across sites, then the site rates, weighted over rate
\r
2115 categories, will be reported to the '.p' file.
\r
2116 Possel -- If this option is set to 'yes' and the relevant model has omega
\r
2117 variation across sites, the probability that each model site
\r
2118 (codon in this case) is positively selected will be written to file.
\r
2120 Siteomega -- If this option is set to 'yes' and the relevant model has omega
\r
2121 variation across sites, the weighted omega value (over omega
\r
2122 categories) for each model site will be reported to file.
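The relationship between the 'ratio' and 'dirichlet' report formats for 'Tratio' and 'Revmat' can be sketched in Python as follows. The function names are made up for illustration, and the numbers are the ones used in the option descriptions above.

```python
def tratio_to_dirichlet(ratio):
    """Transition/transversion ratio -> (transition, transversion) proportions."""
    return (ratio / (ratio + 1.0), 1.0 / (ratio + 1.0))


def revmat_ratio_to_dirichlet(rates):
    """Rates scaled to the last (G-T) rate -> proportions of the rate sum."""
    total = sum(rates)
    return [r / total for r in rates]


def revmat_dirichlet_to_ratio(proportions):
    """Proportions of the rate sum -> rates scaled to the last (G-T) rate."""
    last = proportions[-1]
    return [p / last for p in proportions]


# Order of the rates: rAC, rAG, rAT, rCG, rCT, rGT
tratio_dirichlet = tratio_to_dirichlet(3.0)
revmat_dirichlet = revmat_ratio_to_dirichlet([1.0, 2.0, 1.0, 1.0, 4.0, 1.0])
revmat_ratio = revmat_dirichlet_to_ratio(revmat_dirichlet)
print(tratio_dirichlet)
print(revmat_dirichlet)
print(revmat_ratio)
```

Both formats carry the same information. The Dirichlet form simply fixes the sum of the values to 1, whereas the ratio form fixes the last rate to 1.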
\r
2124 Default report settings:
\r
2126 Parameter Options Current Setting
\r
2127 --------------------------------------------------------
\r
2128 Tratio Ratio/Dirichlet Ratio
\r
2129 Revmat Ratio/Dirichlet Dirichlet
\r
2130 Ratemult Scaled/Ratio/Dirichlet Scaled
\r
2131 Tree Brlens/Topology Brlens
\r
2132 Ancstates Yes/No No
\r
2133 Siterates Yes/No No
\r
2135 Siteomega Yes/No No
\r
2137 ------------------------------------------------------------------
\r
2138 ---------------------------------------------------------------------------
\r
2141 This command restores taxa to the analysis. The correct usage is:
\r
2143 restore <name and/or number and/or taxset> ...
\r
2145 A list of the taxon names or taxon numbers (labelled 1 to ntax in the order
\r
2146 in the matrix) or taxset(s) can be used. For example, the following:
\r
2148 restore 1 2 Homo_sapiens
\r
2150 restores taxa 1, 2, and the taxon labelled Homo_sapiens to the analysis.
\r
2151 You can also use "all" to restore all of the taxa. For example,
\r
2155 restores all of the taxa to the analysis.
\r
2156 ---------------------------------------------------------------------------
\r
2157 ---------------------------------------------------------------------------
\r
2160 This command is used to set some general features of the model or program
\r
2161 behavior. The correct usage is
\r
2163 set <parameter>=<value> ... <parameter>=<value>
\r
2165 Available options:
\r
2167 Seed -- Sets the seed number for the random number generator. The
\r
2168 random number seed is initialized haphazardly at the beg-
\r
2169 inning of each MrBayes session. This option allows you to
\r
2170 set the seed to some specific value, thereby allowing you
\r
2171 to exactly repeat an analysis. If the analysis uses swapping
\r
2172 between cold and heated chains, you must also set the swap
\r
2173 seed (see below) to exactly repeat the analysis.
\r
2174 Swapseed -- Sets the seed used for generating the swapping sequence
\r
2175 when Metropolis-coupled heated chains are used. This seed
\r
2176 is initialized haphazardly at the beginning of each MrBayes
\r
2177 session. This option allows you to set the seed to some
\r
2178 specific value, thereby allowing you to exactly repeat a
\r
2179 swap sequence. See also the 'Seed' option.
\r
2180 Dir -- The working directory. Specifies the absolute or relative path
\r
2181 to the working directory. If left empty, the working directory
\r
2182 is the current directory.
\r
2183 Partition -- Set this option to a valid partition id, either the number or
\r
2184 name of a defined partition, to enforce a specific partition-
\r
2185 ing of the data. When a data matrix is read in, a partition
\r
2186 called "Default" is automatically created. It divides the
\r
2187 data into one part for each data type. If you only have one
\r
2188 data type, DNA for instance, the default partition will not
\r
2189 divide up the data at all. The default partition is always
\r
2190 the first partition, so 'set partition=1' is the same as
\r
2191 'set partition=default'.
\r
2192 Speciespartition -- Set this option to a valid speciespartition id, either the
\r
2193 number or name of a defined speciespartition, to enforce a
\r
2194 specific partitioning of taxa to species. When a data matrix
\r
2195 is read in, a speciespartition called "Default" is auto-
\r
2196 matically created. It assigns one taxon for each species. The
\r
2197 default speciespartition is always the first speciespartition,
\r
2198 so 'set speciespartition=1' is the same as
\r
2199 'set speciespartition=default'.
\r
2200 Autoclose -- If autoclose is set to 'yes', then the program will not prompt
\r
2201 you during the course of executing a file. This is particular-
\r
2202 ly useful when you run MrBayes in batch mode.
\r
2203 Nowarnings -- If nowarnings is set to yes, then the program will not prompt
\r
2204 you when overwriting or appending an output file that is al-
\r
2205 ready present. If 'nowarnings=no' (the default setting), then
\r
2206 the program prompts the user before overwriting output files.
\r
2207 Autoreplace -- When nowarnings is set to yes, then MrBayes will by default
\r
2208 overwrite output files that already exist. This may cause
\r
2209 irrecoverable loss of previous results if you have not removed
\r
2210 or renamed the files from previous runs. To override this be-
\r
2211 havior, set autoreplace to no, in which case new output will
\r
2212 be appended to existing files instead.
\r
2213 Quitonerror -- If quitonerror is set to yes, then the program will quit when
\r
2214 an error is encountered, after printing an error message. If
\r
2215 quitonerror is set to no (the default setting), then the
\r
2216 program will wait for additional commands from the command
\r
2217 line after the error message is printed.
\r
2218 Scientific -- Set this option to 'Yes' to write sampled values to file in
\r
2219 scientific format and to 'No' to write them in fixed format.
\r
2220 Fixed format is easier for humans to read but you risk losing
\r
2221 precision for small numbers. For instance, sampled values that
\r
2222 are less than 1E-6 will print to file as '0.000000' if fixed
\r
2223 format is used and 'precision' is set to 6.
\r
2224 Precision -- Precision allows you to set the number of decimals to be prin-
\r
2225 ted when sampled values are written to file. Precision must be
\r
2226 in the range 3 to 15.
\r
2227 Usebeagle -- Set this option to 'Yes' to attempt to use the BEAGLE library
\r
2228 to compute the phylogenetic likelihood on a variety of high-
\r
2229 performance hardware including multicore CPUs and GPUs. Some
\r
2230 models in MrBayes are not yet supported by BEAGLE.
\r
2231 Beagleresource -- Set this option to the number of a specific resource you
\r
2232 wish to use with BEAGLE (use 'Showbeagle' to see the list of
\r
2233 available resources). Set to '99' for auto-resource selection.
\r
2234 Beagledevice -- Set this option to 'GPU' or 'CPU' to select processor.
\r
2235 Beagleprecision -- Select 'Single' or 'Double' precision computation.
\r
2236 Beaglescaling -- 'Always' rescales partial likelihoods at each evaluation.
\r
2237 'Dynamic' rescales less frequently and should run faster.
\r
2238 Beaglesse -- Use SSE instructions on Intel CPU processors.
\r
2239 Beagleopenmp -- Use OpenMP to parallelize across multi-core CPU processors.
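The precision loss described under 'Scientific' and 'Precision' can be reproduced with ordinary number formatting. The Python sketch below uses Python's own format specifiers as a stand-in for the way MrBayes writes values to file. The helper name is hypothetical, and the exact exponent layout in MrBayes output may differ.

```python
def format_sample(value, scientific=True, precision=6):
    """Format a sampled value in scientific or fixed notation.

    Illustration only: this uses Python formatting, not the MrBayes output
    code. The 3 to 15 range check mirrors the 'Precision' option above.
    """
    if not 3 <= precision <= 15:
        raise ValueError("precision must be in the range 3 to 15")
    spec = f".{precision}{'e' if scientific else 'f'}"
    return format(value, spec)


small = 3.2e-7  # an arbitrary sampled value smaller than 1E-6
fixed_text = format_sample(small, scientific=False)
scientific_text = format_sample(small, scientific=True)
print(fixed_text)
print(scientific_text)
```

With six decimals, the fixed format shows only zeros for this value, while the scientific format keeps the significant digits.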
\r
2241 Current settings:
\r
2243 Parameter Options Current Setting
\r
2244 --------------------------------------------------------
\r
2245 Seed <number> 1448443295
\r
2246 Swapseed <number> 1448443295
\r
2248 Partition <name> ""
\r
2249 Speciespartition <name> ""
\r
2250 Autoclose Yes/No No
\r
2251 Nowarnings Yes/No No
\r
2252 Autoreplace Yes/No Yes
\r
2253 Quitonerror Yes/No No
\r
2254 Scientific Yes/No Yes
\r
2255 Precision <number> 6
\r
2256 Usebeagle Yes/No No
\r
2257 Beagleresource <number> 99
\r
2258 Beagledevice CPU/GPU CPU
\r
2259 Beagleprecision Single/Double Double
\r
2260 Beaglescaling Always/Dynamic Always
\r
2261 Beaglesse Yes/No No
\r
2262 Beagleopenmp Yes/No No
\r
2264 ---------------------------------------------------------------------------
\r
2265 ---------------------------------------------------------------------------
\r
2268 This command shows available BEAGLE resources.
\r
2269 ---------------------------------------------------------------------------
\r
2270 ---------------------------------------------------------------------------
\r
2273 This command shows the character matrix currently in memory.
\r
2274 ---------------------------------------------------------------------------
\r
2275 ---------------------------------------------------------------------------
\r
2278 This command shows the current trees used by the Markov chains. The correct
\r
2279 usage is "showmcmctrees".
\r
2280 ---------------------------------------------------------------------------
\r
2281 ---------------------------------------------------------------------------
\r
2284 This command shows the current model settings. The correct usage is
\r
2288 After typing "showmodel", the modelling assumptions are shown on a
\r
2289 partition-by-partition basis.
\r
2290 ---------------------------------------------------------------------------
\r
2291 ---------------------------------------------------------------------------
\r
2294 This command shows the MCMC samplers (moves) that are switched on for the
\r
2295 parameters in the current model. The basic usage is
\r
2299 If you want to see all available moves, use
\r
2301 showmoves allavailable=yes
\r
2303 If you want to change any of the tuning parameters for the moves, use the
\r
2304 'propset' command.
\r
2305 ---------------------------------------------------------------------------
\r
2306 ---------------------------------------------------------------------------
\r
2309 This command shows all of the parameters in the current model. The basic
\r
2314 The parameters are listed together with their priors, the available moves,
\r
2315 and the current value(s), which will be used as the starting values in the
\r
2316 next mcmc analysis.
\r
2317 ---------------------------------------------------------------------------
\r
2318 ---------------------------------------------------------------------------
\r
2321 This command shows the currently defined user trees. The correct usage
\r
2322 is "showusertrees".
\r
2323 ---------------------------------------------------------------------------
\r
2324 ---------------------------------------------------------------------------
\r
2327 Defines a partition of tips into species. The format for the speciespartition command is:
\r
2330 Speciespartition <name> = <species name>:<taxon list> ,...,<sp nm>:<tx lst>
\r
2332 The command enumerates a comma-separated list of pairs consisting of 'species
\r
2333 name' and 'taxon list'. The 'taxon list' is a standard taxon list, as used by
\r
2334 the 'Taxset' command. This means that you can use either the index or the name
\r
2335 of a sequence ('taxon'). Ranges are specified using a dash, and a period can
\r
2336 be used as a synonym of the last sequence in the matrix.
\r
2338 For example: speciespartition species = SpeciesA: 1, SpeciesB: 2-.
\r
2339 Here, we name two species. SpeciesA is represented by a single sequence while
\r
2340 SpeciesB is represented by all remaining sequences in the matrix.
\r
2341 Each sequence is specified by its row index in the data matrix.
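To make the range notation concrete, here is a minimal Python sketch that expands an index-based taxon list such as '2-.' into explicit row indices. The function is hypothetical and handles only numeric indices, dashes and the period. Taxon names, which MrBayes also accepts, are not supported in this sketch.

```python
def expand_taxon_list(spec, ntax):
    """Expand a space-separated taxon list of indices and ranges.

    '.' stands for the last sequence in the matrix (index ntax), and
    'a-b' stands for all indices from a to b inclusive.
    """
    indices = []
    for token in spec.split():
        start, dash, end = token.partition("-")
        first = ntax if start == "." else int(start)
        if dash:
            last = ntax if end == "." else int(end)
        else:
            last = first
        if not 1 <= first <= last <= ntax:
            raise ValueError(f"invalid taxon range: {token}")
        indices.extend(range(first, last + 1))
    return indices


# The example above, assuming a matrix with five sequences.
species_a = expand_taxon_list("1", ntax=5)
species_b = expand_taxon_list("2-.", ntax=5)
print(species_a, species_b)
```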
\r
2343 As with ordinary partitioning, you may define multiple species partitioning
\r
2344 schemes. You have to use the command 'set speciespartition' to enable use of one of them.
\r
2347 Currently defined Speciespartitions:
\r
2349 Number Speciespartition name Number of species
\r
2350 --------------------------------------------------------------------------
\r
2352 --------------------------------------------------------------------------
\r
2353 ---------------------------------------------------------------------------
\r
2356 This command is used to start stepping-stone sampling, which is an efficient
\r
2357 and accurate method for estimating the marginal likelihood of the currently
\r
2358 specified model. It is considerably more accurate than the harmonic mean of
\r
2359 the likelihoods from a standard MCMC run on the model (calculated by the
\r
2360 'Sump' command) but it requires a separate MCMC-like run. To be more specific,
\r
2361 stepping-stone sampling uses importance sampling to estimate each ratio in a
\r
2362 series of discrete steps bridging the posterior and prior distributions.
\r
2363 The importance distributions that are used are called power posterior distri-
\r
2364 butions, and are defined as prior*(likelihood^beta). By varying beta from 1 to
\r
2365 0, we get a series of distributions that connect the posterior (beta = 1) to
\r
2366 the prior (beta = 0).
\r
2368 The power posterior distributions are sampled using MCMC. First, we start a
\r
2369 standard MCMC chain on the posterior distribution, and let it run until we
\r
2370 have reached the criterion specified by the 'Burninss' option. After this, we
\r
2371 step through the power posterior distributions until we reach the prior dis-
\r
2372 tribution. In each of the 'Nsteps' steps, we sample from a new power poster-
\r
2373 ior distribution with a distinct beta value. The beta values correspond to
\r
2374 'Nsteps' evenly spaced quantiles in a Beta distribution with the parameters
\r
2375 'Alpha' and 1.0. For the first sampling step, the beta value is equal to the
\r
2376 last quantile, i.e., it is close to 1.0. For each successive step, the beta
\r
2377 value takes on the value of the next quantile, in decreasing order, until it
\r
2378 reaches the value of 0.0. If you change the value of 'FromPrior' from the
\r
2379 default 'No' to 'Yes', the direction of power posterior change during the SS
\r
2380 analysis is opposite to the one described above, i.e., we start by sampling
\r
2381 the prior and finish close to the posterior.
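The schedule of beta values can be sketched in Python. Because the cumulative distribution function of a Beta('Alpha',1.0) distribution is x^Alpha, the quantile at probability p is p^(1/Alpha). The exact indexing of the quantiles below (k/Nsteps for k = Nsteps-1 down to 0) is an assumption chosen to match the description above, in which the first step is close to 1.0 and the last step reaches 0.0. It is not taken from the MrBayes source code.

```python
def ss_beta_values(nsteps, alpha, from_prior=False):
    """Return one beta value per stepping-stone step.

    Assumed schedule: beta_k = (k / nsteps) ** (1 / alpha), visited in
    decreasing order unless from_prior is True.
    """
    if nsteps < 1 or alpha <= 0:
        raise ValueError("nsteps must be >= 1 and alpha must be > 0")
    betas = [(k / nsteps) ** (1.0 / alpha) for k in range(nsteps - 1, -1, -1)]
    if from_prior:
        betas.reverse()  # start at the prior (beta = 0) instead
    return betas


betas = ss_beta_values(nsteps=50, alpha=0.4)
print(betas[0], betas[-1])
```

With 'Alpha' below 1, most of the beta values are packed close to 0, that is, near the prior. This is the skew that the 'Alpha' option controls.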
\r
2383 The 'Ss' procedure uses the same machinery as the standard 'Mcmc' algorithm,
\r
2384 and shares most of its parameters with the 'Mcmc' and 'Mcmcp' commands. All
\r
2385 'Mcmc' parameters, except those related to burnin, have the same meaning and
\r
2386 usage in the 'Ss' command as they have in the 'Mcmc' command. The 'Mcmc'
\r
2387 burnin parameters are used to set up burnin within each step. The 'Ss' command
\r
2388 also uses its own burnin parameter, 'Burninss' (see below for details). The
\r
2389 'Ss' command also has its own parameters for specifying the number of steps
\r
2390 and the shape of the Beta distribution from which the beta values are computed
\r
2393 Note that the 'Ngen' parameter of 'Mcmc' is used to set the maximum number of
\r
2394 generations processed, including both the burnin and the following steps in
\r
2395 the stepping-stone sampling phase. For instance, assume that 'Burninss' is set
\r
2396 to '-1', 'Nsteps' to '49', 'Ngen' to '1000000' and 'Samplefreq' to '1000'.
\r
2397 We will then get 1,000 samples in total (1,000,000 / 1,000). These will fall
\r
2398 into 50 bins, one of which represents the burnin and is discarded. Each step
\r
2399 in the algorithm will thus be represented by 20 samples.
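The arithmetic in this example can be written out as a short Python sketch. The function name is hypothetical, and the use of integer division for uneven cases is an assumption. MrBayes may handle remainders differently.

```python
def ss_samples_per_step(ngen, samplefreq, nsteps, burninss):
    """Number of samples that represent each stepping-stone step.

    A negative burninss is read as a burnin that is |burninss| steps long.
    A positive burninss is read as a fixed number of samples to discard.
    """
    total_samples = ngen // samplefreq
    if burninss < 0:
        # The burnin takes up |burninss| extra bins of the same length as a step.
        return total_samples // (nsteps + abs(burninss))
    return (total_samples - burninss) // nsteps


per_step = ss_samples_per_step(ngen=1000000, samplefreq=1000, nsteps=49, burninss=-1)
print(per_step)
```

For the settings in the text, the 1,000 samples are split into 50 bins of 20 samples each, and the first bin is the discarded burnin.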
\r
2401 More information on 'Mcmc' parameters is available in the help for the 'Mcmc'
\r
2402 and 'Mcmcp' commands. Only the exclusive 'Ss' parameters are listed below.
\r
2403 These can only be set up using the 'Ss' command, while the parameters shared
\r
2404 with 'Mcmc' and 'Mcmcp' can also be set up using those commands.
\r
2406 The correct usage is
\r
2408 ss <parameter>=<value> ... <parameter>=<value>
\r
2410 Note that a command:
\r
2412 ss <setting parameters shared with mcmc> <setting exclusive ss parameters>
\r
2414 would be equivalent to executing two commands:
\r
2416 mcmcp <setting parameters shared with mcmc>;
\r
2417 ss <setting exclusive ss parameters>;
\r
2419 For more information on the stepping-stone algorithm, see:
\r
2421 Xie, W., P. O. Lewis, Y. Fan, L. Kuo, and M.-H. Chen. 2011. Improving marginal
\r
2422 likelihood estimation for Bayesian phylogenetic model selection. Systematic
\r
2423 Biology 60:150-160.
\r
2425 Available options:
\r
2426 (NB: Only exclusive ss parameters are listed here. For additional parameters, see
\r
2427 help on 'mcmc' or 'mcmcp'.)
\r
2429 Alpha -- The beta values used in the stepping-stone sampling procedure
\r
2430 correspond to evenly spaced quantiles from a Beta('Alpha',1.0)
\r
2431 distribution. The parameter 'Alpha' determines the skewness of
\r
2432 the beta values. If 'Alpha' is set to '1.0', the beta values
\r
2433 would be spaced uniformly on the interval (0.0,1.0). However,
\r
2434 better results are obtained if the beta values are skewed.
\r
2435 Empirically, it was observed that 'Alpha' values in the range
\r
2436 of 0.3 to 0.5 produce the most accurate results.
\r
2437 Burninss -- Fixed number of samples discarded before sampling of the first
\r
2438 step starts. 'Burninss' can be specified using either a pos-
\r
2439 itive or a negative number. If the number is positive, it is
\r
2440 interpreted as the number of samples to discard as burnin. If
\r
2441 the number is negative, its absolute value is interpreted as
\r
2442 the length of the burnin in terms of the length of each of the
\r
2443 following steps in the stepping-stone algorithm. For instance,
\r
2444 a value of '-1' means that the length of the burnin is the
\r
2445 same as the length of each of the subsequent steps.
\r
2446 Nsteps -- Number of steps in the stepping-stone algorithm. Typically, a
\r
2447 number above 30 is sufficient for accurate results.
\r
2448 FromPrior -- If it is set to 'Yes', it indicates that in the first step we
\r
2449 sample from the prior, and with each consecutive step we sample
\r
2450 closer to the posterior. 'No' indicates the opposite direction
\r
2451 of power posterior change, i.e., in the first step we sample
\r
2452 close to the posterior, and with each consecutive step we
\r
2453 sample closer to the prior.
\r
2455 Current settings:
\r
2457 Parameter Options Current Setting
\r
2458 --------------------------------------------------------
\r
2459 Alpha <number> 0.40
\r
2460 BurninSS <number> -1
\r
2461 Nsteps <number> 50
\r
2462 FromPrior Yes/No No
\r
2464 ---------------------------------------------------------------------------
\r
2465 ---------------------------------------------------------------------------
\r
2468 This command sets the parameters of the stepping-stone sampling
\r
2469 analysis without actually starting the chain. This command is identical
\r
2470 in all respects to Ss, except that the analysis will not start after
\r
2471 this command is issued. For more details on the options, check the help
\r
2474 Current settings:
\r
2476 Parameter Options Current Setting
\r
2477 --------------------------------------------------------
\r
2478 Alpha <number> 0.40
\r
2479 BurninSS <number> -1
\r
2480 Nsteps <number> 50
\r
2481 FromPrior Yes/No No
\r
2483 ---------------------------------------------------------------------------
\r
2484 ---------------------------------------------------------------------------
\r
2487 Use this command to change the current values for parameters in your model.
\r
2488 These values will be used as the starting values in the next mcmc analysis.
\r
2489 The basic format is:
\r
2491 startvals <param>=(<value_1>,<value_2>,...,<value_n>)
\r
2493 for all substitution model parameters. The format is slightly different for
\r
2494 parameters that are written to a tree file:
\r
2496 startvals <param>=<tree_name>
\r
2498 This version of the command will look for a tree with the specified name
\r
2499 among the trees read in previously when parsing a tree block. The information
\r
2500 stored in that tree will be used to set the starting value of the parameter.
\r
2501 The parameters that are set using this mechanism include topology and branch
\r
2502 length parameters, as well as relaxed clock branch rates, cpp events and
\r
2503 cpp branch rate multipliers.
\r
2505 The above versions of the command will set the value for all runs and chains.
\r
2506 You can also set the value for an individual run and chain by using the format
\r
2508 startvals <param>(<run>,<chain>)=(<value_1>,...)
\r
2510 where <run> is the index of the run and <chain> the index of the chain. If
\r
2511 the run index is omitted, the values will be changed for all runs. Similarly,
\r
2512 if the chain index is omitted, all chains will be set to the specified value.
\r
2513 For example, if we wanted to set the values of the stationary frequency
\r
2514 parameter pi{1} to (0.1,0.1,0.4,0.4) for all chains in run 1, and to
\r
2515 (0.3,0.3,0.2,0.2) for chain 3 of run 2, we would use
\r
2517 startvals pi{1}(1,)=(0.1,0.1,0.4,0.4) pi{1}(2,3)=(0.3,0.3,0.2,0.2)
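The run and chain addressing described above can be sketched as a small Python helper that assembles the text of a startvals command. This is only an illustration: the function name format_startvals is hypothetical and not part of MrBayes, and it covers only the parenthesized value-list form, not the <tree_name> form.

```python
def format_startvals(param, values, run=None, chain=None):
    """Build one 'param(run,chain)=(v1,...,vn)' term for a startvals command.

    Hypothetical helper for illustration; not part of MrBayes.
    Leaving run or chain as None omits that index, which (per the text
    above) applies the values to all runs or all chains, respectively.
    """
    value_list = ",".join(str(v) for v in values)
    if run is None and chain is None:
        target = param  # plain form: applies to all runs and chains
    else:
        run_txt = "" if run is None else str(run)
        chain_txt = "" if chain is None else str(chain)
        target = f"{param}({run_txt},{chain_txt})"
    return f"{target}=({value_list})"


# Reproduces the example command given in the text above.
command = "startvals " + " ".join([
    format_startvals("pi{1}", [0.1, 0.1, 0.4, 0.4], run=1),
    format_startvals("pi{1}", [0.3, 0.3, 0.2, 0.2], run=2, chain=3),
])
print(command)
```

The resulting string can be pasted at the MrBayes prompt or written into a mrbayes block.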
\r
2519 ---------------------------------------------------------------------------
\r
2520 ---------------------------------------------------------------------------
\r
2523 During an MCMC analysis, MrBayes prints the sampled parameter values to one or
\r
2524 more tab-delimited text files, one for each independent run in your analysis.
\r
2525 The command 'Sump' summarizes the information in this parameter file or these
\r
2526 parameter files. By default, the root of the parameter file name(s) is assumed
\r
2527 to be the name of the last matrix-containing nexus file. MrBayes also remem-
\r
2528 bers the number of independent runs in the last analysis that you set up, re-
\r
2529 gardless of whether you actually ran it. For instance, if there were two in-
\r
2530 dependent runs, which is the initial setting when you read in a new matrix,
\r
2531 MrBayes will assume that there are two parameter files with the endings
\r
2532 '.run1.p' and '.run2.p'. You can change the root of the file names and the
\r
2533 number of runs using the 'Filename' and 'Nruns' settings.
\r
2535 When you invoke the 'Sump' command, three items are output: (1) a generation
\r
2536 plot of the likelihood values; (2) estimates of the marginal likelihood of
\r
2537 the model; and (3) a table with the mean, variance, and 95 percent credible
\r
2538 interval for the sampled parameters. All three items are output to screen.
\r
2539 The table of marginal likelihoods is also printed to a file with the ending
\r
2540 '.lstat' and the parameter table to a file with the ending '.pstat'. For some
\r
2541 model parameters, there may also be a '.mstat' file.
\r
2543 When running 'Sump' you typically want to discard a specified number or
\r
2544 fraction of samples from the beginning of the chain as the burn in. This is
\r
2545 done using the same mechanism used by the 'mcmc' command. That is, if you
\r
2546 run an mcmc analysis with a relative burn in of 25 % of samples for con-
\r
2547 vergence diagnostics, then the same burn in will be used for a subsequent
\r
2548 sump command, unless a different burn in is specified. That is, issuing
\r
2552 immediately after 'mcmc', will result in using the same burn in settings as
\r
2553 for the 'mcmc' command. All burnin settings are reset to default values every
\r
2554 time a new matrix is read in, namely relative burnin ('relburnin=yes') with
\r
2555 25 % of samples discarded ('burninfrac = 0.25').
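As a rough illustration of how the relative and absolute burn-in settings interact, the following Python sketch computes how many samples would be dropped. The function name samples_to_discard is hypothetical, and rounding the relative burn-in down is an assumption of this sketch, not a statement about how MrBayes rounds internally.

```python
def samples_to_discard(n_samples, relburnin=True, burnin=0, burninfrac=0.25):
    """Return how many samples (not generations) are dropped as burn-in.

    Defaults mirror the settings described above: relburnin=yes and
    burninfrac=0.25. Rounding down is an assumption of this sketch.
    """
    if relburnin:
        # Relative burn-in: a fraction of all samples is discarded.
        return int(n_samples * burninfrac)
    # Absolute burn-in: a fixed number of samples is discarded.
    return min(burnin, n_samples)


print(samples_to_discard(1000))                               # relative, 25 %
print(samples_to_discard(1000, relburnin=False, burnin=100))  # absolute
```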
\r
2559 Relburnin -- If this option is set to 'Yes', then a proportion of the
\r
2560 samples will be discarded as burnin when calculating summary
\r
2561 statistics. The proportion to be discarded is set with
\r
2562 'Burninfrac' (see below). When the 'Relburnin' option is set
\r
2563 to 'No', then a specific number of samples is discarded
\r
2564 instead. This number is set by 'Burnin' (see below). Note that
\r
2565 the burnin setting is shared across the 'sumt', 'sump', and 'mcmc' commands.
\r
2567 Burnin -- Determines the number of samples (not generations) that will
\r
2568 be discarded when summary statistics are calculated. The
\r
2569 value of this option is only applicable when 'Relburnin' is set to 'No'.
\r
2571 Burninfrac -- Determines the fraction of samples that will be discarded when
\r
2572 summary statistics are calculated. The setting only takes
\r
2573 effect if 'Relburnin' is set to 'Yes'.
\r
2574 Nruns -- Determines how many '.p' files from independent analyses
\r
2575 will be summarized. If Nruns > 1 then the names of the files
\r
2576 are derived from 'Filename' by adding '.run1.p', '.run2.p',
\r
2577 etc. If Nruns=1, then the single file name is obtained by
\r
2578 adding '.p' to 'Filename'.
\r
2579 Filename -- The name of the file to be summarized. This is the base of the
\r
2580 file name to which endings are added according to the current
\r
2581 setting of the 'Nruns' parameter. If 'Nruns' is 1, then only
\r
2582 '.p' is added to the file name. Otherwise, the endings will
\r
2583 be '.run1.p', '.run2.p', etc.
\r
2584 Outputname -- Base name of the file(s) to which 'Sump' results will be printed.
\r
2586 Hpd -- Determines whether credibility intervals will be given as the
\r
2587 region of Highest Posterior Density ('Yes') or as the interval
\r
2588 containing the median 95 % of sampled values ('No').
\r
2589 Minprob -- Determines the minimum probability of submodels to be included
\r
2590 in summary statistics. Only applicable to models that explore
\r
2591 submodel spaces, like 'nst=mixed' and 'aamodelpr=mixed'.
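The difference between the two interval types selected by 'Hpd' can be illustrated with a short Python sketch. Both functions are hypothetical helpers written for this illustration using standard textbook definitions; they are not taken from the MrBayes source, and the tie-breaking and rounding choices are assumptions.

```python
import math


def central_interval(samples, mass=0.95):
    """Interval containing the median `mass` of the samples (equal tails)."""
    xs = sorted(samples)
    n = len(xs)
    cut = int(math.floor(n * (1.0 - mass) / 2.0))  # samples dropped per tail
    return xs[cut], xs[n - 1 - cut]


def hpd_interval(samples, mass=0.95):
    """Shortest interval that contains at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    width = min(n, max(1, int(math.ceil(mass * n))))  # samples inside interval
    # Slide a window of `width` consecutive sorted samples; keep the narrowest.
    best = min(range(n - width + 1), key=lambda i: xs[i + width - 1] - xs[i])
    return xs[best], xs[best + width - 1]


# A right-skewed toy sample: squares of 1..100.
skewed = [x * x for x in range(1, 101)]
print(central_interval(skewed))
print(hpd_interval(skewed))
```

For a skewed sample such as this one, the shortest interval is narrower than the equal-tailed interval and is shifted toward the region where the samples are densest.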
\r
2593 Current settings:
\r
2595 Parameter Options Current Setting
\r
2596 --------------------------------------------------------
\r
2597 Relburnin Yes/No Yes
\r
2598 Burnin <number> 0
\r
2599 Burninfrac <number> 0.25
\r
2601 Filename <name> temp<.run<i>.p>
\r
2602 Outputname <name> temp<.pstat etc>
\r
2604 Minprob <number> 0.050
\r
2606 ---------------------------------------------------------------------------
\r
2607 ---------------------------------------------------------------------------
\r
2610 This command summarizes results of stepping stone analyses. It is a tool to
\r
2611 investigate the obtained results, and to help find the proper step burn-in.
\r
2612 To get more help information on stepping-stone analyses, use 'help ss'.
\r
2614 During stepping-stone analysis, MrBayes collects the sampled likelihoods in
\r
2615 order to estimate the marginal likelihood at the end. It also prints the sam-
\r
2616 pled parameter values to one or more tab-delimited text files, one for each
\r
2617 independent run in your analysis. The command 'Sumss' summarizes likelihood
\r
2618 values stored in these parameter files and calculates marginal likelihood es-
\r
2619 timates. The names of the files that are summarized are exactly the same as
\r
2620 the names of the files used for the 'sump' command. In fact, the 'filename'
\r
2621 setting is a shared setting for the 'sump' and 'sumss' commands. That is, if
\r
2622 you change the setting in one of the commands, it would change the setting in
\r
2623 the other command as well.
\r
2625 When you invoke the 'Sumss' command, three items are output: (1) 'Step contri-
\r
2626 bution table' - summarizes the contribution of each step to the overall esti-
\r
2627 mate; (2) 'Step plot' - plot of the likelihood values for the initial burn-in
\r
2628 phase or a chosen step in the stepping-stone algorithm; (3) 'Joined plot' -
\r
2629 summarizes sampling across all steps in the algorithm.
\r
2631 Step contribution table
\r
2632 The printed table is similar to the one output to the .ss file. The main pur-
\r
2633 pose of the table is to summarize marginal likelihood for different values of
\r
2634 the step burn-in after the stepping stone analysis has finished. The burn-in
\r
2635 is controlled by the 'Relburnin', 'Burnin' and 'Burninfrac' settings.
\r
2636 Note that during stepping-stone analyses, step contributions to marginal
\r
2637 likelihood are calculated based on all generations excluding burn-in. 'Sumss'
\r
2638 on the other hand makes estimates based only on the sampled generations. This
\r
2639 may lead to a slight difference in results compared to the one printed to the .ss file.
\r
2643 The main objective of the plot is to provide a close look at a given step in
\r
2644 the analysis. Which step is printed here is defined by the 'Steptoplot'
\r
2645 setting. The plot could be used to inspect if the chosen step burn-in is
\r
2646 appropriate for the given step. It could also be used to check if the
\r
2647 initial burn-in phase has converged. Note that the amount of discarded samples
\r
2648 is controlled by the 'Discardfrac' setting, and not by the ordinary burn-in settings.
\r
2651 Different steps sample from different power posterior distributions. When we
\r
2652 switch from one distribution to another, it takes some number of generations
\r
2653 before the chain settles at the correct stationary distribution. This lag is
\r
2654 called a 'temperature lag' and if the corresponding samples are not removed,
\r
2655 it will result in a biased estimate. It is difficult to determine the lag be-
\r
2656 forehand, but MrBayes allows you to explore different step burn-in settings
\r
2657 after you have finished the stepping-stone algorithm, without having to rerun
\r
2658 the whole analysis. The 'Joined plot' helps to facilitate the choice of the
\r
2659 right step burn-in. The plot summarizes samples across all steps and gives you
\r
2660 a quick overview of the whole analysis.
\r
2662 Specifically, the following procedure is used to obtain the joined plot. Each
\r
2663 step has the same number N of samples taken. We number each sample 1 to N
\r
2664 within steps according to the order in which the samples are taken. The first
\r
2665 sample in each step is numbered 1, and the last sample is N. For each number i
\r
2666 in [1,..., N], we sum up log likelihoods for all samples numbered i across all
\r
2667 steps. The joined plot is a graph of the step number versus the normalized
\r
2668 sums we get in the procedure described above. This directly visualizes the
\r
2669 temperature lag and allows you to select the appropriate step burn-in.
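The procedure just described can be sketched in a few lines of Python. The function name joined_plot_values is hypothetical, and dividing each sum by the number of steps is an assumed normalization chosen for this sketch; the text above does not say which normalization MrBayes applies.

```python
def joined_plot_values(step_loglikes):
    """Combine samples with the same within-step index across all steps.

    step_loglikes[s][i] is the log likelihood of sample i (0-based here)
    taken in step s. Returns one value per sample index: the sum over all
    steps divided by the number of steps (assumed normalization).
    """
    n_steps = len(step_loglikes)
    n_samples = len(step_loglikes[0])
    if any(len(step) != n_samples for step in step_loglikes):
        raise ValueError("every step must contain the same number of samples")
    return [
        sum(step[i] for step in step_loglikes) / n_steps
        for i in range(n_samples)
    ]


# Two toy steps with four samples each; early samples still show a lag.
steps = [
    [-120.0, -110.0, -105.0, -104.0],
    [-90.0, -84.0, -80.0, -81.0],
]
print(joined_plot_values(steps))
```

In this toy input the first two values are clearly lower than the rest, which is the kind of initial trend that would suggest a larger step burn-in.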
\r
2671 Ideally, after you discard the appropriate step burn-in, the graph should
\r
2672 appear as white noise around the estimated value. If you see an increasing or
\r
2673 decreasing tendency in the beginning of the graph, you should increase the
\r
2674 step burn-in. If you see an increasing or decreasing tendency across the whole
\r
2675 graph, then the initial burn-in phase was not long enough. In this case, you
\r
2676 need to rerun the analysis with a longer initial burn-in.
\r
2678 To make it easier to observe tendencies in the plotted graph you can choose
\r
2679 different levels of curve smoothing. If 'Smoothing' is set to k, it means that
\r
2680 for each step i we take an average over step i and k neighboring samples in
\r
2681 both directions, i.e., the k-smoothed estimate for step i is an average over
\r
2682 values for steps [i-k,...,i+k].
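A minimal Python sketch of the k-smoothing described above follows. The function name smooth is hypothetical, and truncating the window at the two ends of the curve is an assumption of this sketch; the text does not specify how the edges are treated.

```python
def smooth(values, k):
    """Return the k-smoothed curve: point i becomes the mean of [i-k, ..., i+k].

    Near the ends the window is shortened to the points that exist
    (assumed edge handling). k = 0 returns the values unchanged.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(values)
    smoothed = []
    for i in range(n):
        window = values[max(0, i - k): min(n, i + k + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


curve = [1.0, 2.0, 6.0, 2.0, 1.0]
print(smooth(curve, 0))  # no smoothing
print(smooth(curve, 1))  # each point averaged with one neighbor on each side
```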
\r
2687 Allruns -- If set to 'Yes', it forces all runs to be printed on the same
\r
2688 graph when drawing joined and step plots. If set to 'No', each
\r
2689 run is printed on a separate plot.
\r
2690 Askmore -- Long analyses may produce huge .p files. Reading them in may
\r
2691 take several minutes. If you want to investigate different
\r
2692 aspects of your analyses, it could be very inconvenient to
\r
2693 wait for several minutes each time you want to get a new sum-
\r
2694 mary for different settings. If you set 'Askmore' to 'YES',
\r
2695 sumss will read .p files only once. After responding to the
\r
2696 original query, it will interactively ask you if you wish to
\r
2697 produce more tables and plots for different settings of
\r
2698 'Burnin' or 'Smoothing' (see below).
\r
2699 Relburnin -- If this option is set to 'Yes', then a proportion of the
\r
2700 samples from each step will be discarded as burnin when
\r
2701 calculating summary statistics. The proportion to be discarded is
\r
2702 set with 'Burninfrac' (see below). When the 'Relburnin' option
\r
2703 is set to 'No', then a specific number of samples is discarded
\r
2704 instead. This number is set by 'Burnin'. Note that the burnin
\r
2705 settings --- 'Relburnin', 'Burnin', and 'Burninfrac' --- are
\r
2706 shared across the 'sumt', 'sump', 'sumss' and 'mcmc' commands.
\r
2707 Burnin -- Determines the number of samples (not generations) that will
\r
2708 be discarded from each step when summary statistics are calcu-
\r
2709 lated. The value of this option is only applicable when
\r
2710 'Relburnin' is set to 'No'.
\r
2711 Burninfrac -- Determines the fraction of samples that will be discarded from
\r
2712 each step when summary statistics are calculated. The setting
\r
2713 only takes effect if 'Relburnin' is set to 'Yes'.
\r
2714 Discardfrac -- Determines the fraction of samples that will be discarded when
\r
2715 a step plot is printed. It is similar to the 'Burninfrac' set-
\r
2716 ting, but unlike 'Burninfrac' it is used only for better vis-
\r
2717 ualization of the step plot. It has no effect on the number of
\r
2718 samples discarded during marginal likelihood computation.
\r
2719 Filename -- The name of the file to be summarized. This is the base of the
\r
2720 file name to which endings are added according to the current
\r
2721 setting of the 'Nruns' parameter. If 'Nruns' is 1, then only
\r
2722 '.p' is added to the file name. Otherwise, the endings will
\r
2723 be '.run1.p', '.run2.p', etc. Note that the 'Filename' setting
\r
2724 is shared with 'sump' command.
\r
2725 Nruns -- Determines how many '.p' files from independent analyses
\r
2726 will be summarized. If Nruns > 1 then the names of the files
\r
2727 are derived from 'Filename' by adding '.run1.p', '.run2.p',
\r
2728 etc. If Nruns=1, then the single file name is obtained by
\r
2729 adding '.p' to 'Filename'.
\r
2730 Steptoplot -- Defines which step will be printed in the step plot. If the
\r
2731 value is set to 0, then the initial sample from the posterior
\r
2733 Smoothing -- Determines smoothing of the joined plot (see above). A value
\r
2734 equal to 0 results in no smoothing.
\r
2736 Current settings:
\r
2738 Parameter Options Current Setting
\r
2739 --------------------------------------------------------
\r
2740 Allruns Yes/No Yes
\r
2741 Askmore Yes/No Yes
\r
2742 Relburnin Yes/No Yes
\r
2743 Burnin <number> 0
\r
2744 Burninfrac <number> 0.25
\r
2745 Discardfrac <number> 0.80
\r
2746 Filename <name> temp<.run<i>.p>
\r
2748 Steptoplot <number> 0
\r
2749 Smoothing <number> 0
\r
2750 ---------------------------------------------------------------------------
\r
2751 ---------------------------------------------------------------------------
\r
2754 This command is used to produce summary statistics for trees sampled during
\r
2755 a Bayesian MCMC analysis. You can either summarize trees from one individual
\r
2756 analysis, or trees coming from several independent analyses. In either case,
\r
2757 all the sampled trees are read in and the proportion of the time any single
\r
2758 taxon bipartition (split) is found is counted. The proportion of the time that
\r
2759 the bipartition is found is an approximation of the posterior probability of
\r
2760 the bipartition. (Remember that a taxon bipartition is defined by removing a
\r
2761 branch on the tree, dividing the tree into those taxa to the left and right
\r
2762 of the removed branch. This set is called a taxon bipartition.) The branch
\r
2763 length of the bipartition is also recorded, if branch lengths have been saved
\r
2764 to file. The result is a list of the taxon bipartitions found, the frequency
\r
2765 with which they were found, the posterior probability of the bipartition,
\r
2766 the mean and variance of the branch lengths or node depths, and various
\r
2767 other statistics.
\r
2769 The key to the partitions is output to a file with the suffix '.parts'. The
\r
2770 summary statistics pertaining to bipartition probabilities are output to a
\r
2771 file with the suffix '.tstat', and the statistics pertaining to branch or node
\r
2772 parameters are output to a file with the suffix '.vstat'.
\r
2774 A consensus tree is also printed to a file with the suffix '.con.tre' and
\r
2775 printed to the screen as a cladogram, and as a phylogram if branch lengths
\r
2776 have been saved. The consensus tree is either a 50 percent majority rule tree
\r
2777 or a majority rule tree showing all compatible partitions. If branch lengths
\r
2778 have been recorded during the run, the '.con.tre' file will contain a consen-
\r
2779 sus tree with branch lengths and interior nodes labelled with support values.
\r
2780 By default, the consensus tree will also contain other summary information in
\r
2781 a format understood by the program 'FigTree'. To use a simpler format
\r
2782 understood by other tree-drawing programs, such as 'TreeView', set 'Conformat' to 'Simple'.
\r
2785 MrBayes also produces a file with the ending ".trprobs" that contains a list
\r
2786 of all the trees that were found during the MCMC analysis, sorted by their
\r
2787 probabilities. This list of trees can be used to construct a credible set of
\r
2788 trees. For example, if you want to construct a 95 percent credible set of
\r
2789 trees, you include all of those trees whose cumulative probability is less
\r
2790 than or equal to 0.95. You have the option of displaying the trees to the
\r
2791 screen using the "Showtreeprobs" option. The default is to not display the
\r
2792 trees to the screen; the number of different trees sampled by the chain can
\r
2793 be quite large. If you are analyzing a large set of taxa, you may actually
\r
2794 want to skip the calculation of tree probabilities entirely by setting
\r
2795 'Calctreeprobs' to 'No'.
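The credible-set rule stated above can be sketched in Python as follows. The function name credible_set is hypothetical; the sketch takes a plain list of tree probabilities rather than parsing a '.trprobs' file, whose exact layout is not described here.

```python
def credible_set(tree_probs, level=0.95):
    """Return the indices of the trees in the credible set.

    tree_probs holds one posterior probability per distinct tree. Trees are
    taken in order of decreasing probability for as long as the cumulative
    probability stays less than or equal to `level`, following the rule
    stated in the text above.
    """
    order = sorted(range(len(tree_probs)),
                   key=lambda i: tree_probs[i], reverse=True)
    chosen = []
    cumulative = 0.0
    for i in order:
        if cumulative + tree_probs[i] > level:
            break  # adding this tree would exceed the requested level
        cumulative += tree_probs[i]
        chosen.append(i)
    return chosen


# Five toy trees; indices refer to positions in this list.
probs = [0.10, 0.50, 0.04, 0.30, 0.06]
print(credible_set(probs))
```

With these toy probabilities the three most probable trees reach a cumulative probability of about 0.90, and the fourth would push the total past 0.95, so it is left out.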
\r
2797 When calculating summary statistics you probably want to skip those trees that
\r
2798 were sampled in the initial part of the run, the so-called burn-in period. The
\r
2799 number of skipped samples is controlled by the 'Relburnin', 'Burnin', and
\r
2800 'Burninfrac' settings, just as for the 'Mcmc' command. Since version 3.2.0,
\r
2801 the burn-in settings are shared across the 'Sumt', 'Sump' and 'Mcmc' commands.
\r
2802 That is, changing the burn-in setting for one command will change the settings
\r
2803 for subsequent calls to any of the other commands.
\r
2805 If you are summarizing the trees sampled in several independent analyses,
\r
2806 such as those resulting from setting the 'Nruns' option of the 'Mcmc' command
\r
2807 to a value larger than 1, MrBayes will also calculate convergence diagnostics
\r
2808 for the sampled topologies and branch lengths. These values can help you
\r
2809 determine whether it is likely that your chains have converged.
\r
2811 The 'Sumt' command expands the 'Filename' according to the current values of
\r
2812 the 'Nruns' and 'Ntrees' options. For instance, if both 'Nruns' and 'Ntrees'
\r
2813 are set to 1, 'Sumt' will try to open a file named '<Filename>.t'. If 'Nruns'
\r
2814 is set to 2 and 'Ntrees' to 1, then 'Sumt' will open two files, the first
\r
2815 named '<Filename>.run1.t' and the second '<Filename>.run2.t', etc. By default,
\r
2816 the 'Filename' option is set such that 'Sumt' automatically summarizes all the
\r
2817 results from your immediately preceding 'Mcmc' command. You can also use the
\r
2818 'Sumt' command to summarize tree samples in older analyses. If you want to do
\r
2819 that, remember to first read in a matrix so that MrBayes knows what taxon
\r
2820 names to expect in the trees. Then set the 'Nruns', 'Ntrees' and 'Filename'
\r
2821 options appropriately if they differ from the MrBayes defaults.
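The file name expansion described above can be sketched as a small Python function. The name sumt_input_files is hypothetical; the naming patterns follow the descriptions of the 'Nruns' and 'Ntrees' options in this help text.

```python
def sumt_input_files(filename, nruns=2, ntrees=1):
    """List the '.t' files that would be looked for, given the base name.

    A '.tree<j>' part is added only when ntrees > 1 and a '.run<i>' part
    only when nruns > 1; the tree part comes before the run part.
    """
    tree_parts = [""] if ntrees == 1 else [f".tree{t}" for t in range(1, ntrees + 1)]
    run_parts = [""] if nruns == 1 else [f".run{r}" for r in range(1, nruns + 1)]
    return [f"{filename}{t}{r}.t" for t in tree_parts for r in run_parts]


print(sumt_input_files("temp", nruns=1, ntrees=1))
print(sumt_input_files("temp", nruns=2, ntrees=1))
print(sumt_input_files("temp", nruns=2, ntrees=2))
```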
\r
2825 Relburnin -- If this option is set to YES, then a proportion of the
\r
2826 samples will be discarded as burnin when calculating summary
\r
2827 statistics. The proportion to be discarded is set with
\r
2828 Burninfrac (see below). When the Relburnin option is set to
\r
2829 NO, then a specific number of samples is discarded instead.
\r
2830 This number is set by Burnin (see below). Note that the
\r
2831 burnin setting is shared across the 'sumt', 'sump', and 'mcmc' commands.
\r
2833 Burnin -- Determines the number of samples (not generations) that will
\r
2834 be discarded when summary statistics are calculated. The
\r
2835 value of this option is only relevant when Relburnin is set to NO.
\r
2837 BurninFrac -- Determines the fraction of samples that will be discarded
\r
2838 when summary statistics are calculated. The value of this
\r
2839 option is only relevant when Relburnin is set to YES.
\r
2840 Example: A value for this option of 0.25 means that 25% of
\r
2841 the samples will be discarded.
\r
2842 Nruns -- Determines how many '.t' files from independent analyses
\r
2843 will be summarized. If Nruns > 1 then the names of the files
\r
2844 are derived from 'Filename' by adding '.run1.t', '.run2.t',
\r
2845 etc. If Nruns=1 and Ntrees=1 (see below), then only '.t' is
\r
2846 added to 'Filename'.
\r
2847 Ntrees -- Determines how many trees there are in the sampled model. If
\r
2848 'Ntrees' > 1 then the names of the files are derived from
\r
2849 'Filename' by adding '.tree1.t', '.tree2.t', etc. If there
\r
2850 are both multiple trees and multiple runs, the filenames will
\r
2851 be '<Filename>.tree1.run1.t', '<Filename>.tree1.run2.t', etc.
\r
2852 Filename -- The name of the file(s) to be summarized. This is the base of
\r
2853 the file name, to which endings are added according to the
\r
2854 current settings of the 'Nruns' and 'Ntrees' options.
\r
2855 Minpartfreq -- The minimum probability of partitions to include in summary statistics.
\r
2857 Contype -- Type of consensus tree. 'Halfcompat' results in a 50% majority
\r
2858 rule tree; 'Allcompat' adds all compatible groups to such a tree.
\r
2860 Conformat -- Format of consensus tree. The 'Figtree' setting results in a
\r
2861 consensus tree formatted for the program FigTree, with rich
\r
2862 summary statistics. The 'Simple' setting results in a simple
\r
2863 consensus tree written in a format read by a variety of programs.
\r
2865 Outputname -- Base name of the file(s) to which 'sumt' results will be
\r
2866 printed. The default is the same as 'Filename'.
\r
2867 Calctreeprobs -- Determines whether tree probabilities should be calculated.
\r
2868 Showtreeprobs -- Determines whether tree probabilities should be displayed on screen.
\r
2870 Hpd -- Determines whether credibility intervals will be given as the
\r
2871 region of Highest Posterior Density ('Yes') or as the inter-
\r
2872 val containing the median 95 % of sampled values ('No').
\r
2874 Current settings:
\r
2876 Parameter Options Current Setting
\r
2877 --------------------------------------------------------
\r
2878 Relburnin Yes/No Yes
\r
2879 Burnin <number> 0
\r
2880 Burninfrac <number> 0.25
\r
2882 Ntrees <number> 1
\r
2883 Filename <name> temp<.run<i>.t>
\r
2884 Minpartfreq <number> 0.10
\r
2885 Contype Halfcompat/Allcompat Halfcompat
\r
2886 Conformat Figtree/Simple Figtree
\r
2887 Outputname <name> temp<.parts etc>
\r
2888 Calctreeprobs Yes/No Yes
\r
2889 Showtreeprobs Yes/No No
\r
2892 ---------------------------------------------------------------------------
\r
2893 ---------------------------------------------------------------------------
\r
2896 This command shows the status of all the taxa. The correct usage is
\r
2900 After typing "taxastat", the taxon number, name, and whether it is
\r
2901 excluded or included are shown.
\r
2902 ---------------------------------------------------------------------------
\r
2903 ---------------------------------------------------------------------------
\r
2906 This command defines a taxon set. The format for the taxset command is:
\r
2909 taxset <name> = <taxon names or numbers>
\r
2911 For example, "taxset apes = Homo Pan Gorilla Orang gibbon" defines a
\r
2912 taxon set called "apes" that includes five taxa (namely, apes).
\r
2913 You can assign up to 30 taxon sets. This option is best used
\r
2914 not from the command line but rather as a line in the mrbayes block
\r
2916 ---------------------------------------------------------------------------
\r
2917 ---------------------------------------------------------------------------
\r
2920 This command unlinks model parameters across partitions of the data. The
\r
2921 correct usage is:
\r
2923 unlink <parameter name> = (<all> or <partition list>)
\r
2925 A little background is necessary to understand this command. Upon
\r
2926 execution of a file, a default partition is set up. This partition is
\r
2927 referenced either by its name ("default") or number (0). If your data are
\r
2928 all of one type, then this default partition does not actually divide up
\r
2929 your characters. However, if your datatype is mixed, then the default
\r
2930 partition contains as many divisions as there are datatypes in your
\r
2931 character matrix. Of course, you can also define other partitions, and
\r
2932 switch among them using the set command ("set partition=<name/number>").
\r
2933 Importantly, you can also assign model parameters to individual part-
\r
2934 itions or to groups of them using the "applyto" option in lset and
\r
2935 prset. When the program attempts to perform an analysis, the model is
\r
2936 set for individual partitions. If the same parameter applies to different
\r
2937 partitions and if that parameter has the same prior, then the program
\r
2938 will link the parameters: that is, it will use a single value for the
\r
2939 parameter. The program's default, then, is to strive for parsimony.
\r
2940 However, there are lots of cases where you may want to unlink a parameter
\r
2941 across partitions. For example, you may want a different transition/
\r
2942 transversion rate ratio to apply to different partitions. This command
\r
2943 allows you to unlink the parameters, or to make them different across
\r
2944 partitions. The converse of this command is "link", which links to-
\r
2945 gether parameters that were previously told to be different. The list
\r
2946 of parameters that can be unlinked includes:
\r
2948 Tratio -- Transition/transversion rate ratio
\r
2949 Revmat -- Substitution rates of GTR model
\r
2950 Omega -- Nonsynonymous/synonymous rate ratio
\r
2951 Statefreq -- Character state frequencies
\r
2952 Shape -- Gamma/LNorm shape parameter
\r
2953 Pinvar -- Proportion of invariable sites
\r
2954 Correlation -- Correlation parameter of autodiscrete gamma
\r
2955 Ratemultiplier -- Rate multiplier for partitions
\r
2956 Switchrates -- Switching rates for covarion model
\r
2957 Topology -- Topology of tree
\r
2958 Brlens -- Branch lengths of tree
\r
2959 Speciationrate -- Speciation rates for birth-death process
\r
2960 Extinctionrate -- Extinction rates for birth-death process
\r
2961 Popsize -- Population size for coalescence process
\r
2962 Growthrate -- Growth rate of coalescence process
\r
2963 Aamodel -- Aminoacid rate matrix
\r
2964 Cpprate -- Rate of Compound Poisson Process (CPP)
\r
2965 Cppmultdev -- Standard dev. of CPP rate multipliers (log scale)
\r
2966 Cppevents -- CPP events
\r
2967 TK02var -- Variance increase in TK02 relaxed clock model
\r
2968 Igrvar -- Variance increase in IGR relaxed clock model
\r
2969 Mixedvar -- Variance increase in Mixed relaxed clock model
\r
2973 unlink shape=(all)
\r
2975 unlinks the gamma/lnorm shape parameter across all partitions of the data.
\r
2976 You can use "showmodel" to see the current linking status of the
\r
2978 ---------------------------------------------------------------------------
\r
2979 ---------------------------------------------------------------------------
\r
2982 This command shows the release version of the program.
\r
2983 ---------------------------------------------------------------------------
\r
2985 ***************************************************************************
\r
2987 * 3. 'Data' or 'tree' block commands (in #NEXUS file) *
\r
2989 ***************************************************************************
\r
2991 ---------------------------------------------------------------------------
\r
2994 This command is used to format data or commands in the program. The correct usage is
\r
2997 begin <data or mrbayes>;
\r
2999 The two valid uses of the "begin" command, then, are
\r
3004 The "data" specifier is used to specify the beginning of a data block; your
\r
3005 character data should follow. The following is an example of
\r
3006 a data block for four taxa and ten DNA sites:
\r
3009 dimensions ntax=4 nchar=10;
\r
3010 format datatype=dna;
\r
3012 taxon_1 AACGATTCGT
\r
3013 taxon_2 AAGGATTCCA
\r
3014 taxon_3 AACGACTCCT
\r
3015 taxon_4 AAGGATTCCT
\r
3019 The other commands -- dimensions, format, and matrix -- are discussed
\r
3020 in the appropriate help menu. The only thing to note here is that the
\r
3021 block begins with a "begin data" command. The "mrbayes" command is
\r
3022 used to enter commands specific to the MrBayes program into the file.
\r
3023 This allows you to automatically process commands on execution of the
\r
3024 program. The following is a simple mrbayes block:
\r
3027 charset first = 1-10\3;
\r
3028 charset second = 2-10\3;
\r
3029 charset third = 3-10\3;
\r
3032 This mrbayes block sets off the three "charset" commands, used to
\r
3033 predefine some blocks of characters. The mrbayes block can be very useful.
\r
3034 For example, in this case, it would save you the time of typing the char-
\r
3035 acter sets each time you executed the file. Also, note that every
\r
3036 "begin <data or mrbayes>" command ends with an "end". Finally, you can
\r
3037 have so-called foreign blocks in the file. An example of a foreign block
\r
3038 would be "begin paup". The program will simply skip this block. This is
\r
3039 useful because it means that you can use the same file for MrBayes, PAUP*
\r
3040 or MacClade (although it isn't clear why you would want to use those other programs).
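The three charset definitions in the mrbayes block above use the range-with-stride notation, in which "1-10\3" means every third site from 1 through 10. The Python sketch below expands that notation; the function name expand_charset is hypothetical and it handles only a single "start-end\step" range, not the full syntax accepted by the charset command.

```python
def expand_charset(spec):
    r"""Expand one range such as '1-10\3' into a list of 1-based site numbers.

    Illustrative helper only: supports 'start-end' with an optional
    '\step' suffix and nothing else.
    """
    step = 1
    if "\\" in spec:
        spec, step_txt = spec.split("\\")
        step = int(step_txt)
    start_txt, end_txt = spec.split("-")
    return list(range(int(start_txt), int(end_txt) + 1, step))


for name, spec in [("first", r"1-10\3"), ("second", r"2-10\3"), ("third", r"3-10\3")]:
    print(name, expand_charset(spec))
```

Together the three sets cover sites 1 through 10 without overlap, which is the usual way of separating first, second, and third codon positions.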
\r
3042 ---------------------------------------------------------------------------
\r
3043 ---------------------------------------------------------------------------
\r
3046 This command is used in a data block to define the number of taxa and
\r
3047 characters. The correct usage is
\r
3049 dimensions ntax=<number> nchar=<number>
\r
3051 The dimensions must be the first command in a data block. The following
\r
3052 provides an example of the proper use of this command:
\r
3055 dimensions ntax=4 nchar=10;
\r
3056 format datatype=dna;
\r
3058 taxon_1 AACGATTCGT
\r
3059 taxon_2 AAGGATTCCA
\r
3060 taxon_3 AACGACTCCT
\r
3061 taxon_4 AAGGATTCCT
\r
3065 Here, the dimensions command tells MrBayes to expect a matrix with four
\r
3066 taxa and 10 characters.
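What the dimensions command promises about the matrix can be checked with a short Python sketch. The function name check_dimensions is hypothetical, and the matrix is given here as a plain dictionary rather than parsed from a NEXUS file.

```python
def check_dimensions(matrix, ntax, nchar):
    """Compare a matrix with the declared ntax and nchar.

    matrix maps each taxon name to its sequence string. Returns a list of
    human-readable problems; an empty list means the matrix matches.
    """
    problems = []
    if len(matrix) != ntax:
        problems.append(f"expected {ntax} taxa, found {len(matrix)}")
    for name, seq in matrix.items():
        if len(seq) != nchar:
            problems.append(f"{name}: expected {nchar} characters, found {len(seq)}")
    return problems


# The example matrix from the data block above.
example = {
    "taxon_1": "AACGATTCGT",
    "taxon_2": "AAGGATTCCA",
    "taxon_3": "AACGACTCCT",
    "taxon_4": "AAGGATTCCT",
}
print(check_dimensions(example, ntax=4, nchar=10))
```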
\r
3067 ---------------------------------------------------------------------------
\r
3068 ---------------------------------------------------------------------------
\r
3071 This command is used to terminate a data or mrbayes block. The correct
\r
3076 For more information on this, check the help for the "begin" command.
\r
3077 ---------------------------------------------------------------------------
\r
3078 ---------------------------------------------------------------------------
\r
3081 This is an older, deprecated version of "End", see that command.
\r
3082 ---------------------------------------------------------------------------
\r
3083 ---------------------------------------------------------------------------
\r
3086 This command is used in a data block to define the format of the char-
\r
3087 acter matrix. The correct usage is
\r
3089 format datatype=<name> ... <parameter>=<option>
\r
3091 The format command must be the second command in a data block. The following
\r
3092 provides an example of the proper use of this command:
\r
3095 dimensions ntax=4 nchar=10;
\r
3096 format datatype=dna gap=-;
\r
3098 taxon_1 AACGATTCGT
\r
3099 taxon_2 AAGGAT--CA
\r
3100 taxon_3 AACGACTCCT
\r
3101 taxon_4 AAGGATTCCT
\r
3105 Here, the format command tells MrBayes to expect a matrix with DNA char-
\r
3106 acters and with gaps coded as "-".
\r
3108 The following are valid options for format:
\r
3110 Datatype -- This parameter MUST BE INCLUDED in the format command. More-
\r
3111 over, it must be the first parameter in the line. The
\r
3112 datatype command specifies what type of characters are
\r
3113 in the matrix. The following are valid options:
\r
3114 Datatype = Dna: DNA states (A,C,G,T,R,Y,M,K,S,W,H,B,
\r
3116 Datatype = Rna: RNA states (A,C,G,U,R,Y,M,K,S,W,H,B,
\r
3118 Datatype = Protein: Amino acid states (A,R,N,D,C,Q,E,
\r
3119 G,H,I,L,K,M,F,P,S,T,W,Y,V)
\r
3120 Datatype = Restriction: Restriction site (0,1) states
\r
3121 Datatype = Standard: Morphological (0,1) states
\r
3122 Datatype = Continuous: Real number valued states
\r
3123 Datatype = Mixed(<type>:<range>,...,<type>:<range>): A
\r
3124 mixture of the above datatypes. For example,
\r
3125 "datatype=mixed(dna:1-100,protein:101-200)"
\r
3126 would specify a mixture of DNA and amino acid
\r
3127 characters with the DNA characters occupying
\r
3128 the first 100 sites and the amino acid char-
\r
3129 acters occupying the last 100 sites.
\r
3131 Interleave -- This parameter specifies whether the data matrix is in
\r
3132 interleave format. The valid options are "Yes" or "No",
\r
3133 with "No" as the default. An interleaved matrix looks like
\r
3135 format datatype=dna gap=- interleave=yes;
\r
3137 taxon_1 AACGATTCGT
\r
3138 taxon_2 AAGGAT--CA
\r
3139 taxon_3 AACGACTCCT
\r
3140 taxon_4 AAGGATTCCT
\r
3148 Gap -- This parameter specifies the format for gaps. Note that
\r
3149 gap character can only be a single character and that it
\r
3150 cannot correspond to a standard state (e.g., A,C,G,T,R,Y,
\r
3151 M,K,S,W,H,B,V,D,N for nucleotide data).
\r
3153 Missing -- This parameter specifies the format for missing data. Note
\r
3154 that the missing character can only be a single character and
\r
3155 cannot correspond to a standard state (e.g., A,C,G,T,R,Y,
\r
3156 M,K,S,W,H,B,V,D,N for nucleotide data). This is often an
\r
3157 unnecessary parameter to set because many data types, such
\r
3158 as nucleotide or amino acid, already have a missing char-
\r
3159 acter specified. However, for morphological or restriction
\r
3160 site data, "missing=?" is often used to specify ambiguity
\r
3161 or unobserved data.
\r
3163 Matchchar -- This parameter specifies the matching character for the
\r
3164 matrix. For example,
\r
3166 format datatype=dna gap=- matchchar=.;
\r
3168 taxon_1 AACGATTCGT
\r
3169 taxon_2 ..G...--CA
\r
3170 taxon_3 .....C..C.
\r
3171 taxon_4 ..G.....C.
\r
3176 format datatype=dna gap=-;
\r
3178 taxon_1 AACGATTCGT
\r
3179 taxon_2 AAGGAT--CA
\r
3180 taxon_3 AACGACTCCT
\r
3181 taxon_4 AAGGATTCCT
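The relationship between the two matrices in the matchchar example can be checked mechanically. The Python sketch below is not part of MrBayes. It assumes, as the example suggests, that the matching character stands for the state of the first taxon at the same site. The function name expand_matchchar is hypothetical.

```python
def expand_matchchar(rows, matchchar="."):
    """Replace each match character with the first row's state at that site.

    rows is a list of (taxon_name, sequence) pairs; the first row is the
    reference and is assumed to contain no match characters itself.
    """
    reference = rows[0][1]
    expanded = []
    for name, seq in rows:
        if len(seq) != len(reference):
            raise ValueError(
                f"{name}: expected {len(reference)} characters, got {len(seq)}"
            )
        full = "".join(
            ref if ch == matchchar else ch for ch, ref in zip(seq, reference)
        )
        expanded.append((name, full))
    return expanded


# The matchchar matrix from the example above.
rows = [
    ("taxon_1", "AACGATTCGT"),
    ("taxon_2", "..G...--CA"),
    ("taxon_3", ".....C..C."),
    ("taxon_4", "..G.....C."),
]
for name, seq in expand_matchchar(rows):
    print(name, seq)
```

Gap characters are ordinary characters to this function, so they pass through unchanged.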
The only non-standard NEXUS format option is the use of the "mixed",
"restriction", "standard" and "continuous" datatypes. Hence, if
you use any of these datatype specifiers, a program like PAUP* or
MacClade will report an error (as they should because MrBayes is not
strictly NEXUS compliant).
---------------------------------------------------------------------------
---------------------------------------------------------------------------
The matrix command specifies the actual data for the phylogenetic analysis.
The character matrix should follow the dimensions and format commands
in a data block. The matrix can have all of the characters for a taxon
on a single line:

   begin data;
      dimensions ntax=4 nchar=10;
      format datatype=dna gap=-;
      matrix
      taxon_1 AACGATTCGT
      taxon_2 AAGGAT--CA
      taxon_3 AACGACTCCT
      taxon_4 AAGGATTCCT
      ;
   end;

or be in "interleaved" format:

   begin data;
      dimensions ntax=4 nchar=20;
      format datatype=dna gap=- interleave=yes;
      matrix
      taxon_1 AACGATTCGT
      taxon_2 AAGGAT--CA
      taxon_3 AACGACTCCT
      taxon_4 AAGGATTCCT

      taxon_1 TTTTCGAAGC
      taxon_2 TTTTCGGAGC
      taxon_3 TTTTTGATGC
      taxon_4 TTTTCGGAGC
      ;
   end;
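To see how the interleaved blocks relate to nchar, the rows can be joined per taxon. The following Python sketch is only an illustration and does not reproduce the MrBayes reader. The function name join_interleaved is hypothetical, and the sketch assumes that each row starts with the taxon name followed by whitespace.

```python
def join_interleaved(lines):
    """Concatenate the per-taxon segments of an interleaved matrix.

    Returns a dict mapping taxon name to the full sequence, in the order
    in which the taxa first appear.
    """
    sequences = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # blank lines only separate the interleaved blocks
        name = parts[0]
        segment = "".join(parts[1:])  # drop any spaces inside the data
        sequences[name] = sequences.get(name, "") + segment
    return sequences


# The rows of the interleaved example above (ntax=4, nchar=20).
matrix_lines = [
    "taxon_1 AACGATTCGT",
    "taxon_2 AAGGAT--CA",
    "taxon_3 AACGACTCCT",
    "taxon_4 AAGGATTCCT",
    "",
    "taxon_1 TTTTCGAAGC",
    "taxon_2 TTTTCGGAGC",
    "taxon_3 TTTTTGATGC",
    "taxon_4 TTTTCGGAGC",
]
joined = join_interleaved(matrix_lines)
for name, seq in joined.items():
    print(name, seq, len(seq))
```

Each joined sequence has 20 characters, which matches the nchar value given in the dimensions command.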
Note that the taxon names must not have spaces. If you really want to
indicate a space in a taxon name (perhaps between a genus and species
name), then you might use an underscore ("_"). There should be at
least a single space after the taxon name, separating the name from
the actual data on that line. There can be spaces between the
characters.

If you have mixed data, then you specify all of the data in the same
matrix. Here is an example that includes two different data types:

   begin data;
      dimensions ntax=4 nchar=20;
      format datatype=mixed(dna:1-10,standard:11-20) interleave=yes;
      matrix
      taxon_1 AACGATTCGT
      taxon_2 AAGGAT--CA
      taxon_3 AACGACTCCT
      taxon_4 AAGGATTCCT

      taxon_1 0001111111
      taxon_2 0111110000
      taxon_3 1110000000
      taxon_4 1000001111
      ;
   end;
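When writing a mixed datatype specification by hand, it is easy to give ranges that do not line up with nchar. The Python sketch below checks a specification before the file is given to MrBayes. It is not MrBayes code. It assumes that the ranges should cover sites 1 to nchar exactly once and without gaps, and the function names parse_mixed and check_coverage are hypothetical.

```python
import re


def parse_mixed(spec):
    """Parse 'mixed(dna:1-10,standard:11-20)' into [(type, first, last), ...]."""
    match = re.fullmatch(r"mixed\((.*)\)", spec.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a mixed datatype: {spec!r}")
    parts = []
    for item in match.group(1).split(","):
        dtype, _, char_range = item.partition(":")
        first, _, last = char_range.partition("-")
        parts.append((dtype.strip().lower(), int(first), int(last)))
    return parts


def check_coverage(parts, nchar):
    """Return a list of problems; an empty list means sites 1..nchar are covered once."""
    problems = []
    expected = 1  # the site at which the next range should begin
    for dtype, first, last in sorted(parts, key=lambda p: p[1]):
        if first != expected:
            problems.append(f"{dtype} starts at {first}, expected {expected}")
        if last < first:
            problems.append(f"{dtype} range {first}-{last} is reversed")
        expected = last + 1
    if expected != nchar + 1:
        problems.append(f"ranges end at {expected - 1}, but nchar={nchar}")
    return problems


# Two 10-character parts in a 20-character matrix: no problems reported.
print(check_coverage(parse_mixed("mixed(dna:1-10,standard:11-20)"), 20))
# A second range that skips sites 11-20 is reported twice: wrong start, wrong end.
print(check_coverage(parse_mixed("mixed(dna:1-10,standard:21-30)"), 20))
```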
The matrix command is terminated by a semicolon.

Finally, just a note on data presentation. It is much easier for others
to (1) understand your data and (2) repeat your analyses if you make
your data clean, comment it liberally (using the square brackets), and
embed the commands you used for a published analysis in the mrbayes block.
Remember that the data took a long time for you to collect. You might
as well spend a little time making the data file look nice and clear to
anyone who may later request the data for further analysis.
---------------------------------------------------------------------------
---------------------------------------------------------------------------
This command defines taxon labels. It can be used within a taxa block.
---------------------------------------------------------------------------
---------------------------------------------------------------------------
This command is used by MrBayes to specify the mapping between taxon names
and taxon numbers in a NEXUS tree file. For instance,

   translate
      1 Homo,
      2 Pan,
      ...

establishes that the taxon labeled 1 in the trees that follow is Homo, the
taxon labeled 2 is Pan, etc.
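Outside MrBayes, the same mapping can be applied to a tree string to recover the taxon names. The Python sketch below is only an illustration. The function name apply_translate is hypothetical, the names Taxon_C and Taxon_D are placeholders for entries not given here, and the sketch assumes that a numeric label always follows "(" or "," directly, with no whitespace.

```python
import re


def apply_translate(newick, table):
    """Replace numeric taxon labels in a Newick string with names from table."""

    def swap(match):
        return table[match.group(1)]

    # A taxon label is a run of digits directly after "(" or ",".
    # Branch lengths follow ":" and are therefore left untouched.
    return re.sub(r"(?<=[(,])(\d+)", swap, newick)


# Homo and Pan come from the text above; the other two names are placeholders.
table = {"1": "Homo", "2": "Pan", "3": "Taxon_C", "4": "Taxon_D"}
print(apply_translate("((1,2),3,4);", table))
print(apply_translate("((1:0.064573,2:0.029042):0.041239,3:0.203988,4:0.187654);", table))
```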
---------------------------------------------------------------------------
---------------------------------------------------------------------------
This command is used by MrBayes to write trees to a NEXUS tree file. Trees
are written in the Newick format. For instance,

   tree ((1,2),3,4);

describes an unrooted tree with taxa 1 and 2 being more closely related to
each other than to taxa 3 and 4. If branch lengths are saved to file, they
are given after a colon sign immediately following the terminal taxon or the
interior node they refer to. An example of an unrooted tree with branch
lengths is:

   tree ((1:0.064573,2:0.029042):0.041239,3:0.203988,4:0.187654);

Trees that are rooted (clock trees) are written with a basal dichotomy
instead of a basal trichotomy. If the tree described above had been rooted
on the branch leading to taxon 4, it would have been represented as:

   tree (((1,2),3),4);
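The difference between a basal trichotomy and a basal dichotomy can be read off the Newick string by counting the subtrees attached to the outermost node. The Python sketch below illustrates this. It is not how MrBayes reads trees, the function names basal_children and is_rooted are hypothetical, and it assumes that the string holds only the parenthesised tree with no label or branch length on the outermost node.

```python
def basal_children(newick):
    """Count the subtrees attached directly to the outermost node."""
    text = newick.strip().rstrip(";").strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("expected a parenthesised Newick tree")
    depth = 0
    children = 1
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            children += 1  # a comma at depth 1 separates two basal subtrees
    return children


def is_rooted(newick):
    """A basal dichotomy (two subtrees at the base) marks a rooted tree."""
    return basal_children(newick) == 2


print(basal_children("((1,2),3,4);"), is_rooted("((1,2),3,4);"))
print(basal_children("(((1,2),3),4);"), is_rooted("(((1,2),3),4);"))
```

The "tree" keyword is not part of the Newick string and must be removed before calling these functions.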
---------------------------------------------------------------------------