--bowtie-chunkmbs

diff --git a/README.md b/README.md

index 5acd5937db2f573f9e2a3900baa9bb5b84cdec3c..7ef198ce38fbccaa85149267deb89d74ab69f5d3 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,7 @@ Table of Contents
  * [Usage](#usage)
  * [Example](#example)
  * [Simulation](#simulation)
+* [Generate Transcript-to-Gene-Map from Trinity Output](#gen_trinity)
  * [Acknowledgements](#acknowledgements)
  * [License](#license)
  
@@ -27,7 +28,7 @@ the EM algorithm, single-end and paired-end read data, quality scores,
  variable-length reads and RSPD estimation. It can also generate
  genomic-coordinate BAM files and UCSC wiggle files for visualization. In
  addition, it provides posterior mean and 95% credibility interval
-estimates for expression levels.
+estimates for expression levels. 
  
  ## <a name="compilation"></a> Compilation & Installation
  
@@ -100,6 +101,12 @@ and provide the SAM or BAM file as an argument.  When using an
  alternative aligner, you may also want to provide the --no-bowtie option
  to rsem-prepare-reference so that the Bowtie indices are not built.
  
+However, please note that RSEM does **not** support gapped
+alignments, so make sure that your aligner does not produce alignments
+with insertions/deletions. Also, please make sure that you use
+'reference_name.idx.fa', which is generated by RSEM, to build your
+aligner's indices.
+
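One way to check an alignment file for gapped alignments before handing it to RSEM is to inspect the CIGAR field of each SAM record. The following is a minimal sketch and not part of RSEM: the helper names `has_indel` and `gapped_read_names` are hypothetical, and it assumes plain-text SAM input with the CIGAR string in the sixth tab-separated column.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_indel(cigar):
    """Return True if the CIGAR string contains an insertion (I) or deletion (D)."""
    return any(op in "ID" for _length, op in CIGAR_OP.findall(cigar))


def gapped_read_names(sam_lines):
    """Yield the read name of every SAM record whose CIGAR contains an indel."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:  # not a complete alignment record
            continue
        if has_indel(fields[5]):
            yield fields[0]
```

For example, `has_indel("20M2I28M")` is true while `has_indel("50M")` is false; an empty result from `gapped_read_names` suggests the file contains no insertions or deletions.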
  ### III. Visualization
  
  RSEM contains a version of samtools in the 'sam' subdirectory. When
@@ -133,16 +140,17 @@ RSEM provides an R script, 'rsem-plot-model', for visulazing the model learned.
  
  Usage:
      
-    rsem-plot-model modelF outF
+    rsem-plot-model sample_name outF
  
-modelF: the sample_name.model file generated by RSEM    
+sample_name: the name of the sample analyzed    
  outF: the file name for plots generated from the model. It is a pdf file    
  
  The plots generated depend on read type and user configuration. They
  may include fragment length distribution, mate length distribution,
  read start position distribution (RSPD), quality score vs observed
  quality given a reference base, position vs percentage of sequencing
-error given a reference base.
+error given a reference base, and a histogram of reads with different
+numbers of alignments.
  
  fragment length distribution and mate length distribution: x-axis is fragment/mate length, y-axis is the probability of generating a fragment/mate with the associated length
  
@@ -151,6 +159,8 @@ RSPD: Read Start Position Distribution. x-axis is bin number, y-axis is the prob
  Quality score vs. observed quality given a reference base: x-axis is Phred quality scores associated with data, y-axis is the "observed quality", Phred quality scores learned by RSEM from the data. Q = -10log_10(P), where Q is Phred quality score and P is the probability of sequencing error for a particular base
  
  Position vs. percentage sequencing error given a reference base: x-axis is position and y-axis is percentage sequencing error
+
+Histogram of reads with different numbers of alignments: x-axis is the number of alignments a read has and y-axis is the number of such reads. The 'inf' bin on the x-axis counts the reads that were filtered out because they had too many alignments
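
The relationship Q = -10log_10(P) used for the quality-score plot can be illustrated with a short sketch. The function names below are hypothetical and are not part of RSEM; the code only restates the formula and its inverse.

```python
import math


def phred_from_error_prob(p):
    """Phred quality score Q for a sequencing-error probability P (0 < P <= 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p)


def error_prob_from_phred(q):
    """Inverse mapping: error probability P for a Phred quality score Q."""
    return 10.0 ** (-q / 10.0)
```

Under this formula, an error probability of 0.001 corresponds to a Phred score of about 30, and a Phred score of 20 corresponds to an error probability of about 0.01.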
   
  ## <a name="example"></a> Example
  
@@ -178,9 +188,9 @@ The commands for this scenario are as follows:
  
      rsem-simulate-reads reference_name estimated_model_file estimated_isoform_results theta0 N output_name [-q]
  
-estimated_model_file:  File containing model parameters.  Generated by
+estimated_model_file:  file containing model parameters.  Generated by
  rsem-calculate-expression.   
-estimated_isoform_results: File containing isoform expression levels.
+estimated_isoform_results: file containing isoform expression levels.
  Generated by rsem-calculate-expression.   
  theta0: fraction of reads that are "noise" (not derived from a transcript).   
  N: number of reads to simulate.   
@@ -197,11 +207,20 @@ output_name_1.fq & output_name_2.fq if paired-end with quality score.
  
  output_name.sim.isoforms.results, output_name.sim.genes.results : Results estimated based on sample values.
  
+## <a name="gen_trinity"></a> Generate Transcript-to-Gene-Map from Trinity Output
+
+For Trinity users, RSEM provides a Perl script to generate a transcript-to-gene-map file from the FASTA file produced by Trinity.
+
+### Usage:
+
+    extract-transcript-to-gene-map-from-trinity trinity_fasta_file map_file
+
+trinity_fasta_file: the FASTA file produced by Trinity, which contains all assembled transcripts.    
+map_file: the name of the transcript-to-gene-map file.    
+ 
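To illustrate the kind of mapping such a script produces, here is a rough Python sketch. It is not the Perl script shipped with RSEM and may differ from it. It assumes that a Trinity transcript ID is the first word of the FASTA header and that the gene ID is the transcript ID with a trailing `_seqN` or `_iN` suffix removed; it also assumes the map is written one `gene_id<TAB>transcript_id` pair per line. All function names are hypothetical.

```python
import re

# Assumed isoform suffixes on Trinity transcript IDs, e.g. "_seq1" or "_i2".
ISOFORM_SUFFIX = re.compile(r"_(?:seq|i)\d+$")


def gene_id_from_transcript(transcript_id):
    """Derive a gene ID by stripping the assumed isoform suffix."""
    return ISOFORM_SUFFIX.sub("", transcript_id)


def transcript_to_gene_pairs(fasta_lines):
    """Return (gene_id, transcript_id) pairs, one per FASTA header line."""
    pairs = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if not fields:
            continue  # header with no ID
        transcript_id = fields[0]
        pairs.append((gene_id_from_transcript(transcript_id), transcript_id))
    return pairs


def format_map(pairs):
    """Render the pairs as tab-separated lines (assumed map-file layout)."""
    return "".join(f"{gene}\t{transcript}\n" for gene, transcript in pairs)
```

Passing the lines of a Trinity FASTA file to `transcript_to_gene_pairs` and writing the result of `format_map` to a file gives a map in the assumed layout.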
  ## <a name="acknowledgements"></a> Acknowledgements
  
-RSEM uses randomc.h and mersenne.cpp from
-<http://lxnt.info/rng/randomc.htm> for random number generation. RSEM
-also uses the [Boost C++](http://www.boost.org) and
+RSEM uses the [Boost C++](http://www.boost.org) and
  [samtools](http://samtools.sourceforge.net) libraries.
  
  ## <a name="license"></a> License