X-Git-Url: https://git.donarmstrong.com/?a=blobdiff_plain;f=README.md;fp=README.md;h=459616fbffb51d04d7582bd5fc1317837b886f1a;hb=8ca5b7c2fb57bc523431c1e37d5ab9337eccbc37;hp=8d3c15f80f90ed0f52755582af952ebd971b7da4;hpb=9c2e46183a19d661f0a618a8eabe8ce1f6a8e2d6;p=rsem.git
diff --git a/README.md b/README.md
index 8d3c15f..459616f 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,7 @@ Table of Contents
* [Simulation](#simulation)
* [Generate Transcript-to-Gene-Map from Trinity Output](#gen_trinity)
* [Differential Expression Analysis](#de)
+* [Authors](#authors)
* [Acknowledgements](#acknowledgements)
* [License](#license)
@@ -300,13 +301,12 @@ named 'EBSeq'.
For more information about EBSeq (including the paper describing their
method), please visit EBSeq
-website. You can also find a local version of vignette under
-'EBSeq/inst/doc/EBSeq_Vignette.pdf'.
+website.
EBSeq requires gene-isoform relationship for its isoform DE
detection. However, for de novo assembled transcriptome, it is hard to
obtain an accurate gene-isoform relationship. Instead, RSEM provides a
-script 'rsem-generate-ngvector', which clusters isoforms based on
+script 'rsem-generate-ngvector', which clusters transcripts based on
measures directly relating to read mapping ambiguity. First, it
calculates the 'unmappability' of each transcript. The 'unmappability'
of a transcript is the ratio between the number of k mers with at
@@ -336,20 +336,54 @@ section 3.2.5 (Page 10) of EBSeq's vignette:
IsoEBres=EBTest(Data=IsoMat, NgVector=NgVec, ...)
For users' convenience, RSEM also provides a script
-'rsem-form-counts-matrix' to extract input matrix from expression
+'rsem-generate-data-matrix' to extract input matrix from expression
results:
- rsem-form-counts-matrix sampleA.[genes/isoforms].results sampleB.[genes/isoforms].results ... > output_name.counts.matrix
+ rsem-generate-data-matrix sampleA.[genes/isoforms].results sampleB.[genes/isoforms].results ... > output_name.counts.matrix
The results files are required to be either all gene level results or
all isoform level results. You can load the matrix into R by
- IsoMat <- read.table(file="output_name.counts.matrix")
+ IsoMat <- data.matrix(read.table(file="output_name.counts.matrix"))
before running function 'EBTest'.
-Questions related to EBSeq should be sent to Ning Leng.
-
+Finally, RSEM provides an R script, 'rsem-find-DE', which runs EBSeq for
+you.
+
+Usage:
+
+ rsem-find-DE data_matrix_file [--ngvector ngvector_file] number_sample_condition1 FDR_rate output_file
+
+This script calls EBSeq to find differentially expressed genes/transcripts in two conditions.
+
+data_matrix_file: m by n matrix containing expected counts, where m is the number of transcripts/genes and n is the total number of samples.
+[--ngvector ngvector_file]: optional field. 'ngvector_file' is calculated by 'rsem-generate-ngvector'. Having this field is recommended for transcript data.
+number_sample_condition1: the number of samples in condition 1. A condition's samples must be adjacent. The left group of samples is defined as condition 1.
+FDR_rate: false discovery rate.
+output_file: the output file.
+
+The results are written as a matrix with row and column names. The row names are the differentially expressed transcripts'/genes' ids. The column names are 'PPEE', 'PPDE', 'PostFC' and 'RealFC'.
+
+PPEE: posterior probability of being equally expressed.
+PPDE: posterior probability of being differentially expressed.
+PostFC: posterior fold change (condition 1 over condition 2).
+RealFC: real fold change (condition 1 over condition 2).
+
+To get the above usage information, type
+
+ rsem-find-DE
+
+Note: any incorrect parameter setting causes 'rsem-find-DE' to print
+usage information and halt.
+
+Questions related to EBSeq should
+be sent to Ning Leng.
+
+## Authors
+
+RSEM is developed by Bo Li, with substantial technical input from Colin Dewey.
+
## Acknowledgements
RSEM uses the [Boost C++](http://www.boost.org) and