X-Git-Url: https://git.donarmstrong.com/?a=blobdiff_plain;f=README.md;h=f8c4e2709436f97e446ac3dbbda2306818df70f4;hb=a49cbd60d6e84346edbfb27d46f0a3630d404665;hp=0a76a6f06ab54afb865619ae30dafb464e2b1e89;hpb=3a69384beb61e14ce2830191538a6a26bb51d929;p=rsem.git

diff --git a/README.md b/README.md
index 0a76a6f..f8c4e27 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@ Table of Contents
 
 * * *
 
-

Introduction

+## Introduction
 
 RSEM is a software package for estimating gene and isoform expression
 levels from RNA-Seq data. The new RSEM package (rsem-1.x) provides an
@@ -27,9 +27,9 @@ the EM algorithm, single-end and paired-end read data, quality scores,
 variable-length reads and RSPD estimation. It can also generate
 genomic-coordinate BAM files and UCSC wiggle files for visualization. In
 addition, it provides posterior mean and 95% credibility interval
-estimates for expression levels.
+estimates for expression levels.
 
-

Compilation & Installation

+## Compilation & Installation
 
 To compile RSEM, simply run
 
@@ -40,10 +40,14 @@ variable.
 
 ### Prerequisites
 
+C++ and Perl are required to be installed.
+
 To take advantage of RSEM's built-in support for the Bowtie alignment
 program, you must have [Bowtie](http://bowtie-bio.sourceforge.net) installed.
 
-

Usage

+If you want to plot the model learned by RSEM, you should also install R.
+
+## Usage
 
 ### I. Preparing Reference Sequences
 
@@ -62,7 +66,7 @@ To prepare the reference sequences, you should run the
     rsem-prepare-reference --help
 
 to get usage information or visit the [rsem-prepare-reference
-documentation page](rsem-prepare-reference.html).
+documentation page](http://deweylab.biostat.wisc.edu/rsem/rsem-prepare-reference.html).
 
 ### II. Calculating Expression Values
 
@@ -72,7 +76,7 @@ To calculate expression values, you should run the
     rsem-calculate-expression --help
 
 to get usage information or visit the [rsem-calculate-expression
-documentation page](rsem-calculate-expression.html).
+documentation page](http://deweylab.biostat.wisc.edu/rsem/rsem-calculate-expression.html).
 
 #### Calculating expression values from single-end data
 
@@ -96,6 +100,12 @@ and provide the SAM or BAM file as an argument. When using an
 alternative aligner, you may also want to provide the --no-bowtie option
 to rsem-prepare-reference so that the Bowtie indices are not built.
 
+However, please note that RSEM does **not** support gapped
+alignments. So make sure that your aligner does not produce alignments
+with insertions/deletions. Also, please make sure that you use
+'reference_name.idx.fa', which is generated by RSEM, to build your
+aligner's indices.
+
 ### III. Visualization
 
 RSEM contains a version of samtools in the 'sam' subdirectory. When
@@ -123,7 +133,35 @@ wiggle_name: the name the user wants to use for this wiggle plot
 
 Refer to the [UCSC custom track help page](http://genome.ucsc.edu/goldenPath/help/customTrack.html).
 
-

Example

+#### c) Visualize the model learned by RSEM
+
+RSEM provides an R script, 'rsem-plot-model', for visualizing the model learned.
+
+Usage:
+
+    rsem-plot-model sample_name outF
+
+sample_name: the name of the sample analyzed
+outF: the file name for plots generated from the model. It is a PDF file
+
+The plots generated depend on read type and user configuration. They
+may include fragment length distribution, mate length distribution,
+read start position distribution (RSPD), quality score vs. observed
+quality given a reference base, position vs. percentage of sequencing
+error given a reference base, and a histogram of reads with different
+numbers of alignments.
+
+Fragment length distribution and mate length distribution: x-axis is fragment/mate length, y-axis is the probability of generating a fragment/mate with the associated length
+
+RSPD: Read Start Position Distribution. x-axis is bin number, y-axis is the probability of each bin. RSPD can be used as an indicator of 3' bias
+
+Quality score vs. observed quality given a reference base: x-axis is Phred quality scores associated with data, y-axis is the "observed quality", Phred quality scores learned by RSEM from the data. Q = -10log_10(P), where Q is Phred quality score and P is the probability of sequencing error for a particular base
+
+Position vs. percentage sequencing error given a reference base: x-axis is position and y-axis is percentage sequencing error
+
+Histogram of reads with different numbers of alignments: x-axis is the number of alignments a read has and y-axis is the number of such reads. The inf on the x-axis is the number of reads filtered due to too many alignments
+
+## Example
 
 Suppose we download the mouse genome from UCSC Genome Browser. We will
 use a reference_name of 'mm9'. We have a FASTQ-formatted file,
@@ -143,7 +181,7 @@ The commands for this scenario are as follows:
     rsem-calculate-expression --bowtie-path /sw/bowtie --phred64-quals --fragment-length-mean 150.0 --fragment-length-sd 35.0 -p 8 --out-bam --calc-ci --memory-allocate 1024 /data/mmliver.fq /ref/mm9 mmliver_single_quals
     rsem-bam2wig mmliver_single_quals.sorted.bam mmliver_single_quals.sorted.wig mmliver_single_quals
 
-

Simulation

+## Simulation
 
 ### Usage:
 
@@ -168,13 +206,11 @@ output_name_1.fq & output_name_2.fq if paired-end with quality score.
 
 output_name.sim.isoforms.results, output_name.sim.genes.results : Results estimated based on sample values.
 
-

Acknowledgements

+## Acknowledgements
 
-RSEM uses randomc.h and mersenne.cpp from
- for random number generation. RSEM
-also uses the [Boost C++](http://www.boost.org) and
+RSEM uses the [Boost C++](http://www.boost.org) and
 [samtools](http://samtools.sourceforge.net) libraries.
 
-

License

+## License
 
 RSEM is licensed under the [GNU General Public License v3](http://www.gnu.org/licenses/gpl-3.0.html).