results <- cbind(PP, MultiPP$MAP[rownames(PP)], probs)
colnames(results) <- c(colnames(PP), "MAP", "PPDE")
- results <- results[order(results[,"PPDE"], decreasing = TRUE),]
+ ord <- order(results[,"PPDE"], decreasing = TRUE)
+ results <- results[ord,]
write.table(results, file = output_file, sep = "\t")
write.table(MultiPP$Patterns, file = paste(output_file, ".pattern", sep = ""), sep = "\t")
+
+ MultiFC <- GetMultiFC(MultiOut)
+ write.table(MultiFC$CondMeans[ord,], file = paste(output_file, ".condmeans", sep = ""), sep = "\t")
}
can take variance due to read mapping ambiguity into consideration by
grouping isoforms with parent gene's number of isoforms. In addition,
it is more robust to outliers. For more information about EBSeq
-(including the paper describing their method), please visit <a
-href="http://www.biostat.wisc.edu/~ningleng/EBSeq_Package">EBSeq
-website</a>.
+(including the paper describing their method), please visit [EBSeq's
+website](http://www.biostat.wisc.edu/~ningleng/EBSeq_Package).
+
RSEM includes EBSeq in its folder named 'EBSeq'. To use it, first type
Lastly, RSEM provides two scripts, 'rsem-run-ebseq' and
'rsem-control-fdr', to help users find differential expressed
-genes. First, 'rsem-run-ebseq' calls EBSeq to calculate related statistics
+genes/transcripts. First, 'rsem-run-ebseq' calls EBSeq to calculate related statistics
for all genes/transcripts. Run
rsem-run-ebseq --help
two scripts can perform DE analysis on either 2 conditions or multiple
conditions.
+Please note that 'rsem-run-ebseq' and 'rsem-control-fdr' use EBSeq's
+default parameters. For advanced use of EBSeq or information about how
+EBSeq works, please refer to [EBSeq's
+manual](http://www.bioconductor.org/packages/devel/bioc/vignettes/EBSeq/inst/doc/EBSeq_Vignette.pdf).
+
Questions related to EBSeq should
be sent to <a href="mailto:nleng@wisc.edu">Ning Leng</a>.
=item B<input_file>
-This should be the result file generated by 'rsem-run-ebseq', which contains all genes/transcripts and their associated statistics.
+This should be the main result file generated by 'rsem-run-ebseq', which contains all genes/transcripts and their associated statistics.
=item B<fdr_rate>
=item B<output_file>
-This file is a subset of the 'input_file'. It only contains the genes/transcripts called as differentially expressed (DE). When more than 2 conditions exist, DE is defined as not all conditions are equally expressed.
+This file is a subset of the 'input_file'. It only contains the genes/transcripts called as differentially expressed (DE). When more than 2 conditions exist, DE is defined as not all conditions are equally expressed. Because statistical significance does not necessarily mean biological significance, users should also refer to the fold changes to decide which genes/transcripts are biologically significant. When more than two conditions exist, this file will not contain fold change information and users need to calculate it from 'input_file.condmeans' by themselves.
=back
=item B<--soft-threshold>
-Use soft threshold method to control FDR. If this option is set, this program will try to report as many genes/transcripts as possible, as long as their average PPDE >= 1 - fdr_rate. (Default: off)
+Use soft threshold method to control FDR. If this option is set, this program will try to report as many genes/transcripts as possible, as long as their average PPDE >= 1 - fdr_rate. This option is equivalent to use EBSeq's 'crit_fun' for FDR control. (Default: off)
=item B<-h/--help>
rsem-control-fdr GeneMat.results 0.05 GeneMat.de.txt
-If we instead want to use soft threshold method to obtain more called genes/transcripts, we can use:
-
- rsem-control-fdr --soft-threshold GeneMat.results 0.05 GeneMat.de.txt
-
=cut
This file is only generated when there are more than 2 conditions. It defines all possible expression patterns over the conditions using a matrix with names. Each row of the matrix refers to a different expression pattern and each column gives the expression status of a different condition. Two conditions are equally expressed if and only if their statuses are the same.
+=item B<output_file.condmeans>
+
+This file is only generated when there are more than 2 conditions. It gives the normalized mean count value for each gene/transcript at each condition. It is formatted as a matrix with names. Each row represents a gene/transcript and each column represent a condition. The order of genes/transcripts is the same as 'output_file'. This file can be used to calculate fold changes between conditions which users are interested in.
+
=back
=head1 EXAMPLES
rsem-run-ebseq GeneMat 3,4,4 GeneMat.results
-Two files, 'GeneMat.results' and 'GeneMat.results.pattern', will be generated.
+Three files, 'GeneMat.results', 'GeneMat.results.pattern', and 'GeneMat.results.condmeans', will be generated.
=cut