tested version for tbam2gbam

diff --git a/README.md b/README.md

index 4af35d867a7fd7d3c5ce3dd65f6796ba9a5ab7d2..7ef198ce38fbccaa85149267deb89d74ab69f5d3 100644
--- a/README.md
+++ b/README.md
@@ -13,12 +13,13 @@ Table of Contents
  * [Usage](#usage)
  * [Example](#example)
  * [Simulation](#simulation)
+* [Generate Transcript-to-Gene-Map from Trinity Output](#gen_trinity)
  * [Acknowledgements](#acknowledgements)
  * [License](#license)
  
  * * *
  
-<h2 id="introduction">Introduction</h2>
+## <a name="introduction"></a> Introduction
  
  RSEM is a software package for estimating gene and isoform expression
  levels from RNA-Seq data.  The new RSEM package (rsem-1.x) provides an
@@ -27,9 +28,9 @@ the EM algorithm, single-end and paired-end read data, quality scores,
  variable-length reads and RSPD estimation. It can also generate
  genomic-coordinate BAM files and UCSC wiggle files for visualization. In
  addition, it provides posterior mean and 95% credibility interval
-estimates for expression levels.
+estimates for expression levels. 
  
-<h2 id="compilation">Compilation & Installation</h2>
+## <a name="compilation"></a> Compilation & Installation
  
  To compile RSEM, simply run
     
@@ -40,10 +41,14 @@ variable.
  
  ### Prerequisites
  
+A C++ compiler and Perl must be installed.
+
  To take advantage of RSEM's built-in support for the Bowtie alignment
  program, you must have [Bowtie](http://bowtie-bio.sourceforge.net) installed.
  
-## Usage <a name="usage"></a>
+If you want to plot the model learned by RSEM, you should also install R.
+
+## <a name="usage"></a> Usage
  
  ### I. Preparing Reference Sequences
  
@@ -96,6 +101,12 @@ and provide the SAM or BAM file as an argument.  When using an
  alternative aligner, you may also want to provide the --no-bowtie option
  to rsem-prepare-reference so that the Bowtie indices are not built.
  
+However, please note that RSEM does **not** support gapped
+alignments, so make sure that your aligner does not produce alignments
+with insertions/deletions. Also, please make sure that you use
+'reference_name.idx.fa', which is generated by RSEM, to build your
+aligner's indices.
+
  ### III. Visualization
  
  RSEM contains a version of samtools in the 'sam' subdirectory. When
@@ -123,7 +134,35 @@ wiggle_name: the name the user wants to use for this wiggle plot
  
  Refer to the [UCSC custom track help page](http://genome.ucsc.edu/goldenPath/help/customTrack.html).
  
-<h2 id="example">Example</h2>
+#### c) Visualize the model learned by RSEM
+
+RSEM provides an R script, 'rsem-plot-model', for visualizing the learned model.
+
+Usage:
+    
+    rsem-plot-model sample_name outF
+
+sample_name: the name of the sample analyzed    
+outF: the name of the output file for the plots generated from the model; it is a PDF file    
+
+The plots generated depend on the read type and user configuration. They
+may include the fragment length distribution, mate length distribution,
+read start position distribution (RSPD), quality score vs. observed
+quality given a reference base, position vs. percentage of sequencing
+error given a reference base, and a histogram of reads with different
+numbers of alignments.
+
+Fragment length distribution and mate length distribution: x-axis is the fragment/mate length, y-axis is the probability of generating a fragment/mate with that length.
+
+RSPD: Read Start Position Distribution. x-axis is the bin number, y-axis is the probability of each bin. RSPD can be used as an indicator of 3' bias.
+
+Quality score vs. observed quality given a reference base: x-axis is the Phred quality score reported with the data, y-axis is the "observed quality", i.e. the Phred quality score learned by RSEM from the data. Q = -10 log_10(P), where Q is the Phred quality score and P is the probability of a sequencing error for a particular base.
+
+Position vs. percentage sequencing error given a reference base: x-axis is the position and y-axis is the percentage of sequencing errors.
+
+Histogram of reads with different numbers of alignments: x-axis is the number of alignments a read has and y-axis is the number of such reads. The 'inf' value on the x-axis marks the number of reads filtered out for having too many alignments.
+ 
+## <a name="example"></a> Example
  
  Suppose we download the mouse genome from UCSC Genome Browser.  We will
  use a reference_name of 'mm9'.  We have a FASTQ-formatted file,
@@ -143,15 +182,15 @@ The commands for this scenario are as follows:
      rsem-calculate-expression --bowtie-path /sw/bowtie --phred64-quals --fragment-length-mean 150.0 --fragment-length-sd 35.0 -p 8 --out-bam --calc-ci --memory-allocate 1024 /data/mmliver.fq /ref/mm9 mmliver_single_quals
      rsem-bam2wig mmliver_single_quals.sorted.bam mmliver_single_quals.sorted.wig mmliver_single_quals
  
-<h2 id="simulation">Simulation</h2>
+## <a name="simulation"></a> Simulation
  
  ### Usage: 
  
      rsem-simulate-reads reference_name estimated_model_file estimated_isoform_results theta0 N output_name [-q]
  
-estimated_model_file:  File containing model parameters.  Generated by
+estimated_model_file:  file containing model parameters.  Generated by
  rsem-calculate-expression.   
-estimated_isoform_results: File containing isoform expression levels.
+estimated_isoform_results: file containing isoform expression levels.
  Generated by rsem-calculate-expression.   
  theta0: fraction of reads that are "noise" (not derived from a transcript).   
  N: number of reads to simulate.   
@@ -168,13 +207,22 @@ output_name_1.fq & output_name_2.fq if paired-end with quality score.
  
  output_name.sim.isoforms.results, output_name.sim.genes.results : Results estimated based on sample values.
  
-<h2 id="acknowledgements">Acknowledgements</h2> 
+## <a name="gen_trinity"></a> Generate Transcript-to-Gene-Map from Trinity Output
+
+For Trinity users, RSEM provides a Perl script to generate a transcript-to-gene-map file from the FASTA file produced by Trinity.
+
+### Usage:
+
+    extract-transcript-to-gene-map-from-trinity trinity_fasta_file map_file
+
+trinity_fasta_file: the FASTA file produced by Trinity, which contains all assembled transcripts.    
+map_file: the name of the transcript-to-gene-map file.    
+ 
+## <a name="acknowledgements"></a> Acknowledgements
  
-RSEM uses randomc.h and mersenne.cpp from
-<http://lxnt.info/rng/randomc.htm> for random number generation. RSEM
-also uses the [Boost C++](http://www.boost.org) and
+RSEM uses the [Boost C++](http://www.boost.org) and
  [samtools](http://samtools.sourceforge.net) libraries.
  
-<h2 id="license">License</h2>
+## <a name="license"></a> License
  
  RSEM is licensed under the [GNU General Public License v3](http://www.gnu.org/licenses/gpl-3.0.html).
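
Note on the Phred relation quoted in the new "Visualize the model learned by RSEM" section: the formula Q = -10 log_10(P) can be checked with a few lines of Python. This is only an illustrative sketch and not part of RSEM; the function names `phred_to_error_prob` and `error_prob_to_phred` are made up for this note.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Invert Q = -10 * log10(P): return P = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def error_prob_to_phred(p: float) -> float:
    """Apply Q = -10 * log10(P) for an error probability P in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p)


if __name__ == "__main__":
    # Print the error probability implied by a few common quality scores.
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_prob(q))
```

Each increase of 10 in Q corresponds to a tenfold drop in the error probability, which may help when comparing the reported and "observed" quality axes of the plot.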