Refactored wiggle code and added rsem-bam2readdepth program

diff --git a/README.md b/README.md
index 0c900b725107ac55b3074fa6cc0fa5e1b628c978..f8c4e2709436f97e446ac3dbbda2306818df70f4 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@ the EM algorithm, single-end and paired-end read data, quality scores,
  variable-length reads and RSPD estimation. It can also generate
  genomic-coordinate BAM files and UCSC wiggle files for visualization. In
  addition, it provides posterior mean and 95% credibility interval
-estimates for expression levels.
+estimates for expression levels. 
  
  ## <a name="compilation"></a> Compilation & Installation
  
@@ -40,9 +40,13 @@ variable.
  
  ### Prerequisites
  
+A C++ compiler and Perl must be installed.
+
  To take advantage of RSEM's built-in support for the Bowtie alignment
  program, you must have [Bowtie](http://bowtie-bio.sourceforge.net) installed.
  
+If you want to plot the model learned by RSEM, you should also install R.
+
  ## <a name="usage"></a> Usage
  
  ### I. Preparing Reference Sequences
@@ -96,6 +100,12 @@ and provide the SAM or BAM file as an argument.  When using an
  alternative aligner, you may also want to provide the --no-bowtie option
  to rsem-prepare-reference so that the Bowtie indices are not built.
  
+However, please note that RSEM does **not** support gapped
+alignments, so make sure that your aligner does not produce alignments
+with insertions/deletions. Also, please make sure that you use
+'reference_name.idx.fa', which is generated by RSEM, to build your
+aligner's indices.
+
  ### III. Visualization
  
  RSEM contains a version of samtools in the 'sam' subdirectory. When
@@ -125,21 +135,32 @@ Refer to the [UCSC custom track help page](http://genome.ucsc.edu/goldenPath/hel
  
  #### c) Visualize the model learned by RSEM
  
-RSEM provides an R script, plotModel.R, for visulazing the model learned.
+RSEM provides an R script, 'rsem-plot-model', for visualizing the model learned.
  
  Usage:
      
-    plotModel.R modelF outF
+    rsem-plot-model sample_name outF
  
-modelF: the sample_name.model file generated by RSEM
-outF: the file name for plots generated from the model. It is a pdf file
+sample_name: the name of the sample analyzed    
+outF: the file name for plots generated from the model. It is a pdf file    
  
  The plots generated depends on read type and user configuration. It
  may include fragment length distribution, mate length distribution,
-read start position distribution (RSPD), quality score vs percentage
-of sequecing error given the reference base, position vs percentage of
-sequencing errro given the reference base.
+read start position distribution (RSPD), quality score vs observed
+quality given a reference base, position vs percentage of sequencing
+error given a reference base, and a histogram of reads with different
+numbers of alignments.
+
+Fragment length distribution and mate length distribution: x-axis is fragment/mate length, y-axis is the probability of generating a fragment/mate with the associated length.
  
+RSPD: Read Start Position Distribution. x-axis is bin number, y-axis is the probability of each bin. RSPD can be used as an indicator of 3' bias.
+
+Quality score vs. observed quality given a reference base: x-axis is the Phred quality score associated with the data, y-axis is the "observed quality", the Phred quality score learned by RSEM from the data. Q = -10 * log_10(P), where Q is the Phred quality score and P is the probability of a sequencing error for a particular base.
+
+Position vs. percentage sequencing error given a reference base: x-axis is position and y-axis is percentage sequencing error.
+
+Histogram of reads with different numbers of alignments: x-axis is the number of alignments a read has and y-axis is the number of such reads. The 'inf' value on the x-axis is the number of reads filtered out because they had too many alignments.
+ 
  ## <a name="example"></a> Example
  
  Suppose we download the mouse genome from UCSC Genome Browser.  We will
@@ -187,9 +208,7 @@ output_name.sim.isoforms.results, output_name.sim.genes.results : Results estima
  
  ## <a name="acknowledgements"></a> Acknowledgements
  
-RSEM uses randomc.h and mersenne.cpp from
-<http://lxnt.info/rng/randomc.htm> for random number generation. RSEM
-also uses the [Boost C++](http://www.boost.org) and
+RSEM uses the [Boost C++](http://www.boost.org) and
  [samtools](http://samtools.sourceforge.net) libraries.
  
  ## <a name="license"></a> License
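
The note added under section II says that RSEM does not support gapped alignments. A minimal sketch of how one might check an aligner's SAM output for insertions or deletions before passing it to RSEM is shown below. It is not part of RSEM: the function names `has_gap` and `gapped_read_names` are hypothetical, and the check assumes that "gapped" means a CIGAR string containing an `I` or `D` operation, as defined in the SAM format.

```python
import re

# CIGAR operations treated as gaps here (an assumption based on the README
# wording): I = insertion, D = deletion relative to the reference.
GAP_OPS = {"I", "D"}
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def has_gap(cigar):
    """Return True if the CIGAR string contains an insertion or deletion."""
    if cigar == "*":  # CIGAR unavailable, e.g. an unaligned read
        return False
    return any(op in GAP_OPS for _, op in CIGAR_RE.findall(cigar))


def gapped_read_names(sam_lines):
    """Yield the names of reads whose alignment contains an insertion or deletion."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:  # not a complete alignment record
            continue
        if has_gap(fields[5]):  # column 6 of a SAM record is the CIGAR string
            yield fields[0]  # column 1 is the read name
```

Any iterable of SAM text lines can be passed in, for example an open file handle. If the generator yields nothing, no alignment in the input has an insertion or deletion.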
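
The plot description added in this commit relies on the Phred relation Q = -10 * log_10(P). As an illustration only, the sketch below converts between a quality score and an error probability in both directions. The function names are hypothetical and the code is not taken from RSEM or from 'rsem-plot-model'.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def error_prob_to_phred(p):
    """Convert an error probability P (0 < P <= 1) to a Phred score Q = -10 * log10(P)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10.0 * math.log10(p)
```

Under this relation a quality score of 20 corresponds to an error probability of 0.01, and a score of 30 to 0.001, which may help when comparing the x-axis (reported quality) with the y-axis (observed quality) in the plot.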