From: Don Armstrong
Date: Wed, 15 Jun 2016 20:22:19 +0000 (-0700)
Subject: add supercomputer wishlist
X-Git-Url: https://git.donarmstrong.com/?p=don.git;a=commitdiff_plain;h=a792e30c41dea89e2ced4af53b59f4b6a3f1d45c

add supercomputer wishlist
---

diff --git a/posts/supercomputer_wishlist.mdwn b/posts/supercomputer_wishlist.mdwn
new file mode 100644
index 0000000..7d829ef
--- /dev/null
+++ b/posts/supercomputer_wishlist.mdwn
@@ -0,0 +1,35 @@
+[[!meta title="Bioinformatic Supercomputer Wishlist"]]
+
+Many bioinformatic problems require large amounts of memory and
+processor time to complete. For example, running WGCNA across 10^6 CpG
+sites requires 10^6 choose 2, or about 5 x 10^11, comparisons and about
+10 TB to store the resulting matrix. While the problem is embarrassingly
+parallel, the dataset for the regressions is very large and cannot fit
+into the main memory of most existing supercomputers, which are often
+tuned for small-data, fast-interconnect problems.
+
+Another problem which I am interested in is computing ancestral trees
+from whole human genomes. This involves running maximum likelihood
+calculations across 10^9 bases and thousands of samples. The matrix
+itself could potentially take 1 TB, and calculating the likelihood
+across that many positions is computationally expensive. Furthermore,
+an exhaustive search of trees for 2000 individuals requires 2000!!
+comparisons, or roughly 10^2868; even searching a small fraction of
+that space requires a great deal of computational time.
+
+Some things that a future supercomputer could have that would enable
+better solutions to bioinformatic problems include:
+
+1. Fast local storage
+2. Better hierarchical storage with smarter caching. Data should
+   ideally move easily between local memory, shared memory, local
+   storage, and remote storage.
+3. Fault-tolerant, storage-affinity-aware schedulers.
+4. GPUs and/or other coprocessors with larger memory and faster memory
+   interconnects.
+5. Larger memory (at least on some nodes)
+6. Support for Docker (or similar) images.
+7. Better bioinformatics software which can actually take advantage of
+   advances in computer architecture.
+
+[[!tag biology bioinformatics bluewaters]]
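
The back-of-envelope figures in the post (pairwise comparisons across 10^6 sites, the size of the resulting matrix, and the magnitude of 2000!!) can be checked with a few lines of Python. This is only a rough sketch: the function names are hypothetical, and the storage estimate assumes a full dense matrix of 8-byte floating-point entries, which the post does not specify.

```python
import math


def pairwise_comparisons(n_sites: int) -> int:
    """Number of unordered pairs among n_sites, i.e. n choose 2."""
    return math.comb(n_sites, 2)


def dense_matrix_terabytes(n_sites: int, bytes_per_entry: int = 8) -> float:
    """Size in TB (10^12 bytes) of a full n x n matrix.

    The 8-byte entry size is an assumption, not something stated in the post.
    """
    return n_sites * n_sites * bytes_per_entry / 1e12


def log10_double_factorial(n: int) -> float:
    """log10 of n!! = n * (n - 2) * (n - 4) * ..., summed in log space."""
    return sum(math.log10(k) for k in range(n, 1, -2))


if __name__ == "__main__":
    print(pairwise_comparisons(10**6))    # pairs of CpG sites
    print(dense_matrix_terabytes(10**6))  # TB for a dense 10^6 x 10^6 matrix
    print(log10_double_factorial(2000))   # order of magnitude of 2000!!
```

Under these assumptions the pair count comes out near 5 x 10^11, the dense matrix near 8 TB (the same order of magnitude as the 10 TB quoted), and log10 of 2000!! between 2868 and 2869.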