X-Git-Url: https://git.donarmstrong.com/?p=rsem.git;a=blobdiff_plain;f=rsem-calculate-expression;h=f90ec4814891280e7c7b12691b5bdcb7ebfa3fbe;hp=cbfef53fd7da453c07c29a5830691933aa8be1fc;hb=4a435fbee229af1dd5f76b4feb0ca71398ac8796;hpb=28196eb2a94c1585850189028fcaf7d51fbf68b8

diff --git a/rsem-calculate-expression b/rsem-calculate-expression
index cbfef53..f90ec48 100755
--- a/rsem-calculate-expression
+++ b/rsem-calculate-expression
@@ -558,11 +558,13 @@ Show help information.
 
 In its default mode, this program aligns input reads against a reference transcriptome with Bowtie and calculates expression values using the alignments.  RSEM assumes the data are single-end reads with quality scores, unless the '--paired-end' or '--no-qualities' options are specified.  Users may use an alternative aligner by specifying one of the --sam and --bam options, and providing an alignment file in the specified format. However, users should make sure that they align against the indices generated by 'rsem-prepare-reference' and the alignment file satisfies the requirements mentioned in ARGUMENTS section. 
 
-One simple way to make the alignment file (e.g. input.sam) satisfying RSEM's requirements (assuming the aligner used put mates in a paired-end read adjacent) is to use the following command:
+One simple way to make the alignment file satisfy RSEM's requirements (assuming the aligner places the mates of a paired-end read adjacent to each other) is to use the 'convert-sam-for-rsem' script. This script only accepts SAM format files as input. If you have a BAM file, please use samtools to convert it to a SAM file first. For example, if '/ref/mouse_125' is the 'reference_name' and the SAM file is named 'input.sam', you can run the following command:
 
-  sort -k 1,1 -s input.sam > input.sorted.sam
+  convert-sam-for-rsem /ref/mouse_125 input.sam -o input_for_rsem.sam  
 
-The SAM/BAM format RSEM uses is v1.4. However, it is compatible with old SAM/BAM format. However, RSEM cannot recognize 0x100 in the FLAG field. In addition, RSEM requires SEQ and QUAL not be '*'. 
+For details, please refer to the documentation page of 'convert-sam-for-rsem'.
+
+The SAM/BAM format RSEM uses is v1.4, but it is compatible with older SAM/BAM formats. However, RSEM cannot recognize 0x100 in the FLAG field. In addition, RSEM requires that SEQ and QUAL are not '*'.
 
 The user must run 'rsem-prepare-reference' with the appropriate reference before using this program.
 
@@ -572,7 +574,7 @@ Please note that some of the default values for the Bowtie parameters are not th
 
 The temporary directory and all intermediate files will be removed when RSEM finishes unless '--keep-intermediate-files' is specified.
 
-With the "--calc-ci" option, 95% credibility intervals and posterior mean estimates will be calculated in addition to maximum likelihood estimates.
+With the '--calc-ci' option, 95% credibility intervals and posterior mean estimates will be calculated in addition to maximum likelihood estimates.
 
 =head1 OUTPUT
 
@@ -647,7 +649,7 @@ This is a folder instead of a file. All model related statistics are stored in t
 
 =head1 EXAMPLES
 
-Assume the path to the bowtie executables is in the user's PATH environment variable. Reference files are under '/ref' with name 'mm9'. 
+Assume the path to the bowtie executables is in the user's PATH environment variable. Reference files are under '/ref' with name 'mouse_125'. 
 
 1) '/data/mmliver.fq', single-end reads with quality scores. Quality scores are encoded as for 'GA pipeline version >= 1.3'. We want to use 8 threads and generate a genome BAM file:
 
@@ -655,7 +657,7 @@ Assume the path to the bowtie executables is in the user's PATH environment vari
                            -p 8 \
                            --output-genome-bam \
                            /data/mmliver.fq \
-                           /ref/mm9 \
+                           /ref/mouse_125 \
                            mmliver_single_quals
 
 2) '/data/mmliver_1.fq' and '/data/mmliver_2.fq', paired-end reads with quality scores. Quality scores are in SANGER format. We want to use 8 threads and do not generate a genome BAM file:
@@ -664,7 +666,7 @@ Assume the path to the bowtie executables is in the user's PATH environment vari
                            --paired-end \
                            /data/mmliver_1.fq \
                            /data/mmliver_2.fq \
-                           /ref/mm9 \
+                           /ref/mouse_125 \
                            mmliver_paired_end_quals
 
 3) '/data/mmliver.fa', single-end reads without quality scores. We want to use 8 threads:
@@ -672,7 +674,7 @@ Assume the path to the bowtie executables is in the user's PATH environment vari
  rsem-calculate-expression -p 8 \
                            --no-qualities \
                            /data/mmliver.fa \
-                           /ref/mm9 \
+                           /ref/mouse_125 \
                            mmliver_single_without_quals
 
 4) Data are the same as 1). We want to take a fragment length distribution into consideration. We set the fragment length mean to 150 and the standard deviation to 35. In addition to a BAM file, we also want to generate credibility intervals.  We allow RSEM to use 1GB of memory for CI calculation:
@@ -686,7 +688,7 @@ Assume the path to the bowtie executables is in the user's PATH environment vari
                            --calc-ci \
                            --ci-memory 1024 \
                            /data/mmliver.fq \
-                           /ref/mm9 \
+                           /ref/mouse_125 \
                            mmliver_single_quals
 
 5) '/data/mmliver_paired_end_quals.bam', paired-end reads with quality scores.  We want to use 8 threads:
@@ -695,7 +697,7 @@ Assume the path to the bowtie executables is in the user's PATH environment vari
                            --bam \
                            -p 8 \
                            /data/mmliver_paired_end_quals.bam \
-                           /ref/mm9 \
+                           /ref/mouse_125 \
                            mmliver_paired_end_quals
 
 =cut
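
The per-record requirements this patch documents (no 0x100 bit in the FLAG field, SEQ and QUAL not '*') can be checked before handing a SAM file to RSEM. The following is a minimal sketch, not part of RSEM or 'convert-sam-for-rsem'. The function name `find_rsem_incompatible_records` is hypothetical, and the check covers only these three conditions, not mate adjacency or any other requirement listed in the ARGUMENTS section.

```python
# Hypothetical helper, not part of RSEM: reports SAM records that violate
# the per-record requirements stated in the documentation above.
FLAG_0X100 = 0x100  # FLAG bit that RSEM cannot recognize


def find_rsem_incompatible_records(sam_lines):
    """Return (line_number, reason) pairs for records that break the rules."""
    problems = []
    for line_number, line in enumerate(sam_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.split("\t")
        if len(fields) < 11:
            problems.append((line_number, "fewer than 11 mandatory fields"))
            continue
        # Mandatory SAM columns: FLAG is column 2, SEQ column 10, QUAL column 11.
        flag, seq, qual = int(fields[1]), fields[9], fields[10]
        if flag & FLAG_0X100:
            problems.append((line_number, "FLAG has 0x100 set"))
        if seq == "*":
            problems.append((line_number, "SEQ is '*'"))
        if qual == "*":
            problems.append((line_number, "QUAL is '*'"))
    return problems
```

Because the function accepts any iterable of lines, an open file handle for 'input.sam' can be passed directly. An empty result means none of the three conditions was violated.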