examples/lysozyme/README.txt

This folder contains files for the branch (Yang 1998) and branch-site
(Yang and Nielsen 2002; Yang et al. 2005; Zhang et al. 2005) analyses
using the lysozyme data set of Messier and Stewart (1997).


(A)
This folder contains the control file, the sequence data file, and
the tree file for demonstrating codon models that assign different
dN/dS ratios to different lineages (Yang 1998).  The data set is the
"small" data set analyzed in Yang (1998).  The default control file
lets you reproduce the results for the small data set in table 1 of
Yang (1998).  Also see the tree file for how to specify the branches
of interest, for which positive selection is tested.

To fix a particular w to 1, arrange the labels so that the branch
concerned carries the last (largest) label, and then use

       model = 2
   fix_omega = 1
       omega = 1

For example, the tree

 ((1,2) #2, ((3,4) #1, 5), (6,7) );     /* table 1E&J */

fits a model with w0 (background), w1, and w2.  The specification
above then forces w2 to be fixed at 1.
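To see how the '#' labels translate into w parameters, here is a minimal Python sketch.  It assumes that labels are consecutive integers starting at 1, that unlabeled branches belong to the background (label 0), and that fix_omega = 1 fixes the ratio of the largest label, as described above.  The function omega_parameters is hypothetical and is not part of PAML.

```python
import re


def omega_parameters(newick, fix_omega=False, omega=1.0):
    """Hypothetical helper: list the dN/dS ratios implied by '#k' labels.

    Assumes labels are integers written as '#k' after a node, as in the
    lysozyme tree files; unlabeled branches get label 0 (background w0).
    """
    labels = {0} | {int(k) for k in re.findall(r"#(\d+)", newick)}
    # One ratio per label: w0, w1, ..., w<largest label>
    ratios = {f"w{k}": "estimated" for k in range(max(labels) + 1)}
    if fix_omega:
        # With fix_omega = 1, the ratio of the largest label is held at `omega`.
        ratios[f"w{max(labels)}"] = f"fixed at {omega}"
    return ratios
```

For the tree above, omega_parameters("((1,2) #2, ((3,4) #1, 5), (6,7) );", fix_omega=True) reports w0 and w1 as estimated and w2 as fixed at 1.0.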

Usage:

        codeml lysozymeSmall.ctl

Or you can rename the file lysozymeSmall.ctl to codeml.ctl, and then run

         codeml


(B) The folder also contains another set of files for the "large" data
set analyzed under the branch models by Yang (1998).  This data set is
used by Yang and Nielsen (2002) and Zhang et al. (2005) to test the
branch-site models, which are specified as follows:

   Model A:  model = 2    NSsites = 2
   Model B:  model = 2    NSsites = 3

A complication is that, starting with version 3.14, branch-site model A
was modified slightly.  In the old model of Yang and Nielsen (2002),
w0 = 0 was fixed, while in the new model (described in Yang et al. 2005
and tested in Zhang et al. 2005), w0 is estimated from the data, with
0 < w0 < 1.  The old branch-site model A is no longer available in the
program.  In addition, version 3.14 and later implement the Bayes
empirical Bayes (BEB) procedure for identifying sites (Yang et al.
2005), although the naive empirical Bayes (NEB) results are still
included in the output.  We suggest that you use branch-site model A
to construct branch-site test 2, which is also called the branch-site
test of positive selection.  We advise against using branch-site
test 1 or branch-site model B.

The control file lysozymeLarge.ctl specifies branch-site model A under
the alternative hypothesis.  The specifications below implement both
the null and the alternative hypotheses; see Zhang et al. (2005;
table 5).


Null hypothesis (branch-site model A, with w2 = 1 fixed):

    model = 2    NSsites = 2   fix_omega = 1   omega = 1


Alternative hypothesis (branch-site model A, with w2 estimated):

    model = 2    NSsites = 2   fix_omega = 0   omega = 1.5 (initial value; any value > 1)
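If you prefer to keep lysozymeLarge.ctl unchanged, one option is to generate the two control files from it with a small script.  The Python sketch below is hypothetical (set_ctl_options is not part of PAML) and assumes one "name = value" pair per line, optionally followed by a "*" comment, as in the example control files.

```python
import re


def set_ctl_options(ctl_text, **options):
    """Hypothetical helper: copy a codeml control file with options replaced.

    Assumes one 'name = value' pair per line, optionally followed by a
    '*' comment.  Options not already present are appended at the end.
    """
    lines = ctl_text.splitlines()
    remaining = dict(options)
    for i, line in enumerate(lines):
        # groups: indent, name, ' = ', old value, trailing text (e.g. comment)
        m = re.match(r"(\s*)(\w+)(\s*=\s*)(\S+)(.*)", line)
        if m and m.group(2) in remaining:
            value = remaining.pop(m.group(2))
            lines[i] = f"{m.group(1)}{m.group(2)}{m.group(3)}{value}{m.group(5)}"
    lines += [f"{name} = {value}" for name, value in remaining.items()]
    return "\n".join(lines) + "\n"
```

For example, set_ctl_options(text, fix_omega=1, omega=1, outfile="mlc.null") would give the null-hypothesis file and set_ctl_options(text, fix_omega=0, omega=1.5, outfile="mlc.alt") the alternative one, where text is the contents of lysozymeLarge.ctl; the output file names are placeholders.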

See the tree file lysozymeLarge.trees for how the "branch of interest",
or "foreground" branch, is specified.  If you remove the first line of
numbers, the file can be read by TreeView, which can also display the
branch (node) labels.

The variable ncatG is ignored by the program, since the number of site
classes is fixed under both models A and B.  (To run the site model
"discrete" with only 2 site classes, which is the null model to be
compared with model B in Yang & Nielsen 2002, specify model = 0,
NSsites = 3, ncatG = 2.  Note that this test is not recommended.)
See Yang and Nielsen (2002) for details, and please heed the warnings
in the Discussion section of that paper.
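As an illustration of why ncatG plays no role, the Python sketch below lists the four site classes of the new branch-site model A (Zhang et al. 2005): classes 0, 1, 2a, and 2b, whose proportions are determined by p0 and p1 alone.  The function name and the parameter values used in the check are illustrative only and are not taken from the program.

```python
def model_a_site_classes(p0, p1, w0, w2):
    """Illustrative sketch: site classes of branch-site model A.

    Returns (class, proportion, background w, foreground w) tuples.
    Assumes 0 < w0 < 1 and w2 >= 1; under the null hypothesis w2 = 1.
    """
    if not (p0 > 0 and p1 > 0 and p0 + p1 <= 1):
        raise ValueError("need p0 > 0, p1 > 0 and p0 + p1 <= 1")
    rest = 1.0 - p0 - p1  # proportion of sites in classes 2a and 2b together
    return [
        ("0", p0, w0, w0),                          # conserved on all branches
        ("1", p1, 1.0, 1.0),                        # neutral on all branches
        ("2a", rest * p0 / (p0 + p1), w0, w2),      # conserved -> selected on foreground
        ("2b", rest * p1 / (p0 + p1), 1.0, w2),     # neutral -> selected on foreground
    ]
```

Whatever the parameter values, the list always has four entries, which is consistent with the program ignoring ncatG under this model.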

The branch-site models can be very difficult to use, as the numerical
optimization algorithm often has trouble finding the maximum of the
likelihood.  You are advised to run the program multiple times, using
different initial values.  If you know how to generate the file of
initial values called in.codeml (see the Manual), you can edit that
file to change the initial values.  For example, you can use the
estimates of branch lengths and other parameters from the null model
to start the iteration for the alternative model.
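After several runs under each hypothesis, one would keep the run with the highest log-likelihood for each model and then form the likelihood ratio statistic 2*(lnL1 - lnL0) for branch-site test 2.  The Python sketch below assumes the lnL values have already been collected from the codeml output files (that step is not shown), and the numbers in the check are made up.  Because w2 = 1 lies on the boundary of the alternative model's parameter space, the null distribution of the statistic is a 50:50 mixture of a point mass at 0 and chi-square with 1 df; using chi-square with 1 df alone gives a conservative test, so the sketch reports both p-values.

```python
import math


def chi2_1_sf(x):
    """Upper-tail probability of chi-square with 1 df: P(X > x) = erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0


def branch_site_test2(lnL_null_runs, lnL_alt_runs):
    """Sketch of the LRT for branch-site test 2.

    Each argument is a list of lnL values from repeated runs with different
    initial values; the best (highest) run is kept for each model.
    Returns (statistic, p-value from chi2_1, p-value from the 50:50 mixture).
    """
    lnL0 = max(lnL_null_runs)  # null: w2 = 1 fixed
    lnL1 = max(lnL_alt_runs)   # alternative: w2 estimated
    # The alternative contains the null, so a negative difference means some
    # run did not reach the maximum; it is clamped at 0 here, but rerun instead.
    stat = max(0.0, 2.0 * (lnL1 - lnL0))
    p_chi2 = chi2_1_sf(stat)
    p_mixture = 0.5 * p_chi2 if stat > 0 else 1.0
    return stat, p_chi2, p_mixture
```

With hypothetical runs branch_site_test2([-5000.0, -4995.5], [-4993.0, -4994.0]), the statistic is 5.0, above the 5% critical value of 3.84 for chi-square with 1 df.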


References

Yang, Z. 1998. Likelihood ratio tests for detecting positive selection
and application to primate lysozyme evolution. Mol. Biol. Evol. 15:568-573.

Yang, Z., and R. Nielsen. 2002. Codon-substitution models for detecting
molecular adaptation at individual sites along specific
lineages. Mol. Biol. Evol. 19:908-917.

Yang, Z., W. S. W. Wong, and R. Nielsen. 2005. Bayes empirical Bayes
inference of amino acid sites under positive selection. Mol. Biol.
Evol. 22:1107-1118.

Zhang, J., R. Nielsen, and Z. Yang. 2005. Evaluation of an improved
branch-site likelihood method for detecting positive selection at the
molecular level. Mol. Biol. Evol. 22:2472-2479.

Ziheng Yang

11 September 2001, last modified on 24 November 2005