'rsem-run-ebseq': Added 'output_file.condmeans' for tests with more than 2 conditions

author Bo Li <bli@cs.wisc.edu>

Sat, 7 Sep 2013 04:00:12 +0000 (23:00 -0500)

committer Bo Li <bli@cs.wisc.edu>

Sat, 7 Sep 2013 04:00:12 +0000 (23:00 -0500)
author Bo Li <bli@cs.wisc.edu>
Sat, 7 Sep 2013 04:00:12 +0000 (23:00 -0500)
committer Bo Li <bli@cs.wisc.edu>
Sat, 7 Sep 2013 04:00:12 +0000 (23:00 -0500)
diff --git a/EBSeq/rsem-for-ebseq-find-DE b/EBSeq/rsem-for-ebseq-find-DE

index 07b989991cea46e63d064eacaa475c3731227398..3c5d304d6ca5bb250789c3ac50b904a686fcde9a 100755 (executable)
--- a/EBSeq/rsem-for-ebseq-find-DE
+++ b/EBSeq/rsem-for-ebseq-find-DE
@@ -58,8 +58,12 @@ if (nc == 2) {
  
    results <- cbind(PP, MultiPP$MAP[rownames(PP)], probs)
    colnames(results) <- c(colnames(PP), "MAP", "PPDE")  
-  results <- results[order(results[,"PPDE"], decreasing = TRUE),]
+  ord <- order(results[,"PPDE"], decreasing = TRUE)
+  results <- results[ord,]
    write.table(results, file = output_file, sep = "\t")
  
    write.table(MultiPP$Patterns, file = paste(output_file, ".pattern", sep = ""), sep = "\t")
+
+  MultiFC <- GetMultiFC(MultiOut)
+  write.table(MultiFC$CondMeans[ord,], file = paste(output_file, ".condmeans", sep = ""), sep = "\t")
  }
diff --git a/README.md b/README.md

index b0fa97cd27c9831e9fb8876f28a5c3d725d6b1fc..f729ef8136adfa47c293ff4d6d0aada39dd30416 100644 (file)
--- a/README.md
+++ b/README.md
@@ -313,9 +313,9 @@ EBSeq, an empirical Bayesian DE analysis tool developed in UW-Madison,
  can take variance due to read mapping ambiguity into consideration by
  grouping isoforms with parent gene's number of isoforms. In addition,
  it is more robust to outliers. For more information about EBSeq
-(including the paper describing their method), please visit <a
-href="http://www.biostat.wisc.edu/~ningleng/EBSeq_Package">EBSeq
-website</a>.
+(including the paper describing their method), please visit [EBSeq's
+website](http://www.biostat.wisc.edu/~ningleng/EBSeq_Package).
+
  
  RSEM includes EBSeq in its folder named 'EBSeq'. To use it, first type
  
@@ -369,7 +369,7 @@ before running either 'EBTest' or 'EBMultiTest'.
  
  Lastly, RSEM provides two scripts, 'rsem-run-ebseq' and
  'rsem-control-fdr', to help users find differential expressed
-genes. First, 'rsem-run-ebseq' calls EBSeq to calculate related statistics
+genes/transcripts. First, 'rsem-run-ebseq' calls EBSeq to calculate related statistics
  for all genes/transcripts. Run 
  
      rsem-run-ebseq --help
@@ -387,6 +387,11 @@ page](http://deweylab.biostat.wisc.edu/rsem/rsem-control-fdr.html). These
  two scripts can perform DE analysis on either 2 conditions or multiple
  conditions.
  
+Please note that 'rsem-run-ebseq' and 'rsem-control-fdr' use EBSeq's
+default parameters. For advanced use of EBSeq or information about how
+EBSeq works, please refer to [EBSeq's
+manual](http://www.bioconductor.org/packages/devel/bioc/vignettes/EBSeq/inst/doc/EBSeq_Vignette.pdf).
+
  Questions related to EBSeq should
  be sent to <a href="mailto:nleng@wisc.edu">Ning Leng</a>.
  
diff --git a/rsem-control-fdr b/rsem-control-fdr

index 86cfb936d6e4ff91acd7f3a3c4ead43ecf6a826c..65c57cbc866277fa772ba59760d59e5c696cbfec 100755 (executable)
--- a/rsem-control-fdr
+++ b/rsem-control-fdr
@@ -74,7 +74,7 @@ rsem-control-fdr [options] input_file fdr_rate output_file
  
  =item B<input_file>
  
-This should be the result file generated by 'rsem-run-ebseq', which contains all genes/transcripts and their associated statistics.
+This should be the main result file generated by 'rsem-run-ebseq', which contains all genes/transcripts and their associated statistics.
  
  =item B<fdr_rate>
  
@@ -82,7 +82,7 @@ The desire false discovery rate (FDR).
  
  =item B<output_file>
  
-This file is a subset of the 'input_file'. It only contains the genes/transcripts called as differentially expressed (DE). When more than 2 conditions exist, DE is defined as not all conditions are equally expressed.
+This file is a subset of the 'input_file'. It only contains the genes/transcripts called as differentially expressed (DE). When more than 2 conditions exist, DE is defined as not all conditions are equally expressed.  Because statistical significance does not necessarily mean biological significance, users should also refer to the fold changes to decide which genes/transcripts are biologically significant. When more than two conditions exist, this file will not contain fold change information and users need to calculate it from 'input_file.condmeans' by themselves.
  
  =back
  
@@ -96,7 +96,7 @@ Use hard threshold method to control FDR. If this option is set, only those gene
  
  =item B<--soft-threshold>
  
-Use soft threshold method to control FDR. If this option is set, this program will try to report as many genes/transcripts as possible, as long as their average PPDE >= 1 - fdr_rate. (Default: off)
+Use soft threshold method to control FDR. If this option is set, this program will try to report as many genes/transcripts as possible, as long as their average PPDE >= 1 - fdr_rate. This option is equivalent to use EBSeq's 'crit_fun' for FDR control. (Default: off)
  
  =item B<-h/--help> 
  
@@ -114,8 +114,4 @@ We assume that we have 'GeneMat.results' as input. We want to control FDR at 0.0
  
   rsem-control-fdr GeneMat.results 0.05 GeneMat.de.txt
  
-If we instead want to use soft threshold method to obtain more called genes/transcripts, we can use:
-
- rsem-control-fdr --soft-threshold GeneMat.results 0.05 GeneMat.de.txt
-
  =cut
diff --git a/rsem-run-ebseq b/rsem-run-ebseq

index c4140c53551f6be26ad23435194edff2dd609cc5..cd4750f4a6107dccbdb27efbaf6e56e025d0a2ae 100755 (executable)
--- a/rsem-run-ebseq
+++ b/rsem-run-ebseq
@@ -93,6 +93,10 @@ If there are more than 2 different conditions among the samples, the output form
  
  This file is only generated when there are more than 2 conditions. It defines all possible expression patterns over the conditions using a matrix with names. Each row of the matrix refers to a different expression pattern and each column gives the expression status of a different condition. Two conditions are equally expressed if and only if their statuses are the same.
  
+=item B<output_file.condmeans>
+
+This file is only generated when there are more than 2 conditions. It gives the normalized mean count value for each gene/transcript at each condition. It is formatted as a matrix with names. Each row represents a gene/transcript and each column represent a condition. The order of genes/transcripts is the same as 'output_file'. This file can be used to calculate fold changes between conditions which users are interested in.  
+
  =back
  
  =head1 EXAMPLES
@@ -107,7 +111,7 @@ The results will be in 'IsoMat.results'.
  
   rsem-run-ebseq GeneMat 3,4,4 GeneMat.results
  
-Two files, 'GeneMat.results' and 'GeneMat.results.pattern', will be generated. 
+Three files, 'GeneMat.results', 'GeneMat.results.pattern', and 'GeneMat.results.condmeans', will be generated. 
  
  =cut
author	Bo Li <bli@cs.wisc.edu>
	Sat, 7 Sep 2013 04:00:12 +0000 (23:00 -0500)
committer	Bo Li <bli@cs.wisc.edu>
	Sat, 7 Sep 2013 04:00:12 +0000 (23:00 -0500)
EBSeq/rsem-for-ebseq-find-DE		patch \| blob \| history
README.md		patch \| blob \| history
rsem-control-fdr		patch \| blob \| history
rsem-run-ebseq		patch \| blob \| history