updated some description

author Bo Li <bli@cs.wisc.edu>

Sat, 10 Dec 2011 23:44:50 +0000 (17:44 -0600)

committer Bo Li <bli@cs.wisc.edu>

Sat, 10 Dec 2011 23:44:50 +0000 (17:44 -0600)
author Bo Li <bli@cs.wisc.edu>
Sat, 10 Dec 2011 23:44:50 +0000 (17:44 -0600)
committer Bo Li <bli@cs.wisc.edu>
Sat, 10 Dec 2011 23:44:50 +0000 (17:44 -0600)
diff --git a/README.md b/README.md

index a4c5ce504a0f284bb594e9a57807a17969a4fc51..b46826a58209fda36bc59cff8e860a95707dfe25 100644 (file)
--- a/README.md
+++ b/README.md
@@ -84,8 +84,8 @@ documentation page](http://deweylab.biostat.wisc.edu/rsem/rsem-calculate-express
  #### Calculating expression values from single-end data
  
  For single-end models, users have the option of providing a fragment
-length distribution via the --fragment-length-mean and
---fragment-length-sd options.  The specification of an accurate fragment
+length distribution via the '--fragment-length-mean' and
+'--fragment-length-sd' options.  The specification of an accurate fragment
  length distribution is important for the accuracy of expression level
  estimates from single-end data.  If the fragment length mean and sd are
  not provided, RSEM will not take a fragment length distribution into
@@ -96,12 +96,21 @@ consideration.
  By default, RSEM automates the alignment of reads to reference
  transcripts using the Bowtie alignment program.  To use an alternative
  alignment program, align the input reads against the file
-'reference_name.idx.fa' generated by rsem-prepare-reference, and format
+'reference_name.idx.fa' generated by 'rsem-prepare-reference', and format
  the alignment output in SAM or BAM format.  Then, instead of providing
-reads to rsem-calculate-expression, specify the --sam or --bam option
+reads to 'rsem-calculate-expression', specify the '--sam' or '--bam' option
  and provide the SAM or BAM file as an argument.  When using an
-alternative aligner, you may also want to provide the --no-bowtie option
-to rsem-prepare-reference so that the Bowtie indices are not built.
+alternative aligner, you may also want to provide the '--no-bowtie' option
+to 'rsem-prepare-reference' so that the Bowtie indices are not built.
+
+Some aligners' (other than Bowtie) output might need to be converted
+so that RSEM can use. For conversion, please run
+ 
+   convert-sam-for-rsem --help
+
+to get usage information or visit the [convert-sam-for-rsem
+documentation
+page](http://deweylab.biostat.wisc.edu/rsem/convert-sam-for-rsem.html).
  
  However, please note that RSEM does ** not ** support gapped
  alignments. So make sure that your aligner does not produce alignments
diff --git a/SingleReadQ.h b/SingleReadQ.h

index 8e7d898bdf8c0810c26f17a0403f0e2603463d4e..9fd3dd18156b74674e5f19a43a50e937403181df 100644 (file)
--- a/SingleReadQ.h
+++ b/SingleReadQ.h
@@ -72,7 +72,6 @@ void SingleReadQ::calc_lq(bool hasPolyA, int seedLen) {
         int numA = 0, numT = 0, numAO = 0, numTO = 0; // numAO : number of A in overlap seed region
         int threshold_1, threshold_2;
  
-
         threshold_1 = int(0.9 * len - 1.5 * sqrt(len * 1.0) + 0.5);
         threshold_2 = (OLEN - 1) / 2 + 1;
         for (int i = 0; i < len; i++) {
diff --git a/convert_sam_for_rsem b/convert_sam_for_rsem

index ee035f0ff471b62515ffd173cdea161fe488c521..a6bc1248c5881738a1d4862e47deb446c5f3c865 100755 (executable)
--- a/convert_sam_for_rsem
+++ b/convert_sam_for_rsem
@@ -165,7 +165,9 @@ Show help information.
  
  This program converts the SAM file generated by user's aligner into a SAM file which RSEM can process. However, users should make sure their aligners use 'reference_name.idx.fa' generated by 'rsem-prepare-reference' as their references. In addition, their aligners should output header information and make two mates of the same alignment adjacent to each other for paired-end data. This program will output the converted file into standard output by default for the purpose of piping. By setting '-o' option, users can make the converted file written into disk.
  
-Note: You do not need to run this script if Bowtie (not Bowtie 2) is used.
+Note: You do not need to run this script if Bowtie (not Bowtie 2) is used, or the order of @SQ lines is the same as 'reference_name.idx.fa' and the alignment lines of a same read group together and the mates of the same alignment are adjacent each other for paired-end reads.
+
+Note: This program can only recognize SAM files. See ARGUMENTS section.
  
  =head1 EXAMPLES
  
diff --git a/rsem-calculate-expression b/rsem-calculate-expression

index cbfef53fd7da453c07c29a5830691933aa8be1fc..f90ec4814891280e7c7b12691b5bdcb7ebfa3fbe 100755 (executable)
--- a/rsem-calculate-expression
+++ b/rsem-calculate-expression
@@ -558,11 +558,13 @@ Show help information.
  
  In its default mode, this program aligns input reads against a reference transcriptome with Bowtie and calculates expression values using the alignments.  RSEM assumes the data are single-end reads with quality scores, unless the '--paired-end' or '--no-qualities' options are specified.  Users may use an alternative aligner by specifying one of the --sam and --bam options, and providing an alignment file in the specified format. However, users should make sure that they align against the indices generated by 'rsem-prepare-reference' and the alignment file satisfies the requirements mentioned in ARGUMENTS section. 
  
-One simple way to make the alignment file (e.g. input.sam) satisfying RSEM's requirements (assuming the aligner used put mates in a paired-end read adjacent) is to use the following command:
+One simple way to make the alignment file satisfying RSEM's requirements (assuming the aligner used put mates in a paired-end read adjacent) is to use 'convert-sam-for-rsem' script. This script only accept SAM format files as input. If a BAM format file is obtained, please use samtools to convert it to a SAM file first. For example, if '/ref/mouse_125' is the 'reference_name' and the SAM file is named 'input.sam', you can run the following command: 
  
-  sort -k 1,1 -s input.sam > input.sorted.sam
+  convert-sam-for-rsem /ref/mouse_125 input.sam -o input_for_rsem.sam  
  
-The SAM/BAM format RSEM uses is v1.4. However, it is compatible with old SAM/BAM format. However, RSEM cannot recognize 0x100 in the FLAG field. In addition, RSEM requires SEQ and QUAL not be '*'. 
+For details, please refer to 'convert-sam-for-rsem's documentation page.
+
+The SAM/BAM format RSEM uses is v1.4. However, it is compatible with old SAM/BAM format. However, RSEM cannot recognize 0x100 in the FLAG field. In addition, RSEM requires SEQ and QUAL are not '*'. 
  
  The user must run 'rsem-prepare-reference' with the appropriate reference before using this program.
  
@@ -572,7 +574,7 @@ Please note that some of the default values for the Bowtie parameters are not th
  
  The temporary directory and all intermediate files will be removed when RSEM finishes unless '--keep-intermediate-files' is specified.
  
-With the "--calc-ci" option, 95% credibility intervals and posterior mean estimates will be calculated in addition to maximum likelihood estimates.
+With the '--calc-ci' option, 95% credibility intervals and posterior mean estimates will be calculated in addition to maximum likelihood estimates.
  
  =head1 OUTPUT
  
@@ -647,7 +649,7 @@ This is a folder instead of a file. All model related statistics are stored in t
  
  =head1 EXAMPLES
  
-Assume the path to the bowtie executables is in the user's PATH environment variable. Reference files are under '/ref' with name 'mm9'. 
+Assume the path to the bowtie executables is in the user's PATH environment variable. Reference files are under '/ref' with name 'mouse_125'. 
  
  1) '/data/mmliver.fq', single-end reads with quality scores. Quality scores are encoded as for 'GA pipeline version >= 1.3'. We want to use 8 threads and generate a genome BAM file:
  
@@ -655,7 +657,7 @@ Assume the path to the bowtie executables is in the user's PATH environment vari
                             -p 8 \
                             --output-genome-bam \
                             /data/mmliver.fq \
-                           /ref/mm9 \
+                           /ref/mouse_125 \
                             mmliver_single_quals
  
  2) '/data/mmliver_1.fq' and '/data/mmliver_2.fq', paired-end reads with quality scores. Quality scores are in SANGER format. We want to use 8 threads and do not generate a genome BAM file:
@@ -664,7 +666,7 @@ Assume the path to the bowtie executables is in the user's PATH environment vari
                             --paired-end \
                             /data/mmliver_1.fq \
                             /data/mmliver_2.fq \
-                           /ref/mm9 \
+                           /ref/mouse_125 \
                             mmliver_paired_end_quals
  
  3) '/data/mmliver.fa', single-end reads without quality scores. We want to use 8 threads:
@@ -672,7 +674,7 @@ Assume the path to the bowtie executables is in the user's PATH environment vari
   rsem-calculate-expression -p 8 \
                             --no-qualities \
                             /data/mmliver.fa \
-                           /ref/mm9 \
+                           /ref/mouse_125 \
                             mmliver_single_without_quals
  
  4) Data are the same as 1). We want to take a fragment length distribution into consideration. We set the fragment length mean to 150 and the standard deviation to 35. In addition to a BAM file, we also want to generate credibility intervals.  We allow RSEM to use 1GB of memory for CI calculation:
@@ -686,7 +688,7 @@ Assume the path to the bowtie executables is in the user's PATH environment vari
                             --calc-ci \
                             --ci-memory 1024 \
                             /data/mmliver.fq \
-                           /ref/mm9 \
+                           /ref/mouse_125 \
                             mmliver_single_quals
  
  5) '/data/mmliver_paired_end_quals.bam', paired-end reads with quality scores.  We want to use 8 threads:
@@ -695,7 +697,7 @@ Assume the path to the bowtie executables is in the user's PATH environment vari
                             --bam \
                             -p 8 \
                             /data/mmliver_paired_end_quals.bam \
-                           /ref/mm9 \
+                           /ref/mouse_125 \
                             mmliver_paired_end_quals
  
  =cut
diff --git a/rsem-prepare-reference b/rsem-prepare-reference

index 072bb61cd5de1b17b4ba41d67683f94dfe77b2a9..78743e946f2b2f1b6b02ecdb6cbeeaaebdf3c70e 100755 (executable)
--- a/rsem-prepare-reference
+++ b/rsem-prepare-reference
@@ -211,7 +211,7 @@ B<'reference_name.transcripts.fa'> contains the extracted reference transcripts
  
  =head1 EXAMPLES
  
-1) Suppose we have mouse RNA-Seq data and want to use the UCSC mm9 version of the mouse genome. We have downloaded the UCSC Genes transcript annotations in GTF format (as mm9.gtf) using the Table Browser and the knownIsoforms.txt file for mm9 from the UCSC Downloads. We also have all chromosome files for mm9 in the directory '/data/mm9'.  We want to put the generated reference files under '/ref' with name 'mm9'. We'll add poly(A) tails with length 125. Please note that GTF files generated from UCSC's Table Browser do not contain isoform-gene relationship information.  For the UCSC Genes annotation, this information can be obtained from the knownIsoforms.txt file.  Suppose we want to build Bowtie indices and Bowtie executables are found in '/sw/bowtie'.
+1) Suppose we have mouse RNA-Seq data and want to use the UCSC mm9 version of the mouse genome. We have downloaded the UCSC Genes transcript annotations in GTF format (as mm9.gtf) using the Table Browser and the knownIsoforms.txt file for mm9 from the UCSC Downloads. We also have all chromosome files for mm9 in the directory '/data/mm9'.  We want to put the generated reference files under '/ref' with name 'mouse_125'. We'll add poly(A) tails with length 125. Please note that GTF files generated from UCSC's Table Browser do not contain isoform-gene relationship information.  For the UCSC Genes annotation, this information can be obtained from the knownIsoforms.txt file.  Suppose we want to build Bowtie indices and Bowtie executables are found in '/sw/bowtie'.
  
  There are two ways to write the command:
  
@@ -219,7 +219,7 @@ There are two ways to write the command:
                          --transcript-to-gene-map knownIsoforms.txt \
                          --bowtie-path /sw/bowtie \                  
                          /data/mm9/chr1.fa,/data/mm9/chr2.fa,...,/data/mm9/chrM.fa \
-                        /ref/mm9
+                        /ref/mouse_125
  
  OR
  
@@ -227,14 +227,14 @@ OR
                          --transcript-to-gene-map knownIsoforms.txt \
                          --bowtie-path /sw/bowtie \
                          /data/mm9 \
-                        /ref/mm9
+                        /ref/mouse_125
  
-2) Suppose we only have transcripts from EST tags in 'mm9.fasta'. In addition, we also have isoform-gene information in 'mapping.txt'. We do not want to add any poly(A) tails. The reference_name will be set to 'mm9'. In addition, we do not want to build Bowtie indices, and will use an alternative aligner to align reads against the 'mm9.idx.fa' output file:
+2) Suppose we only have transcripts from EST tags in 'mm9.fasta'. In addition, we also have isoform-gene information in 'mapping.txt'. We do not want to add any poly(A) tails. The reference_name will be set to 'mouse_0'. In addition, we do not want to build Bowtie indices, and will use an alternative aligner to align reads against the 'mouse_0.idx.fa' output file:
  
   rsem-prepare-reference --transcript-to-gene-map mapping.txt \
                          --no-polyA \
                          --no-bowtie \
                          mm9.fasta \
-                        mm9
+                        mouse_0
  
  =cut
author	Bo Li <bli@cs.wisc.edu>
	Sat, 10 Dec 2011 23:44:50 +0000 (17:44 -0600)
committer	Bo Li <bli@cs.wisc.edu>
	Sat, 10 Dec 2011 23:44:50 +0000 (17:44 -0600)
README.md		patch \| blob \| history
SingleReadQ.h		patch \| blob \| history
convert_sam_for_rsem		patch \| blob \| history
rsem-calculate-expression		patch \| blob \| history
rsem-prepare-reference		patch \| blob \| history