From: Don Armstrong
Date: Thu, 22 Feb 2018 04:19:23 +0000 (-0800)
Subject: Merge branch 'master' into jobs/data_scientist
X-Git-Url: https://git.donarmstrong.com/?a=commitdiff_plain;h=b38c97c6bc73b0c05266f82de1652bfff0ed9bed;p=resume.git

Merge branch 'master' into jobs/data_scientist
---

b38c97c6bc73b0c05266f82de1652bfff0ed9bed
diff --cc don_armstrong_resume.org
index 5dcfff3,57c9f5e..f42d8d6
--- a/don_armstrong_resume.org
+++ b/don_armstrong_resume.org
@@@ -29,28 -64,25 +64,47 @@@
  ** Batchelor of Science (BS) in Biology \hfill UC Riverside
  * Skills
 +** Data Science
 ++ Reproducible, scalable analyses using *R*, *perl*, and python with
 +  workflows on cloud- and cluster-based systems on terabyte-scale
 +  datasets
 ++ Experimental design and correction to overcome multiple testing,
 +  confounders, and batch effects using Bayesian and frequentist
 +  methods
 ++ Design, development, and deployment of algorithms and data-driven
 +  products, including APIs, reports, and interactive web applications
 ++ Statistical modeling (regression, inference, prediction/forecasting,
 +  time series, and machine learning in very large (> 1TB) datasets)
 ++ Data mining, cleaning, processing and quality assurance of data
 +  sources and products using tidydata formalisms
 ++ Visualization using *R*, ggplot, Shiny, and custom written routines.
 +** Software Development
 ++ Languages: perl, R, C, C++, python, groovy, sh, make
 ++ Collaborative Development: git, travis, continuous integration,
 +  automated testing
 ++ Web, Mobile: Shiny, jQuery, JavaScript
 ++ Databases: Postgresql (PL/SQL), SQLite, Mysql, NoSQL
 ++ Office Software: Gnumeric, Libreoffice, \LaTeX, Word, Excel,
 +  Powerpoint
+ ** Genomics and Epigenomics
+ + NGS and array-based Genomics and Epigenomics of complex human
+   diseases using RNA-seq, targeted DNA sequencing, RRBS, Illumina
+   bead arrays, and Affymetrix microarrays from sample collection to
+   publication.
+ + Reproducible, scalable bioinformatics analysis using make,
+   nextflow, and cwl based workflows on cloud- and cluster-based
+   systems on terabyte-scale datasets
+ + Alignment, annotation, and variant calling using existing and custom
+   software, including GATK, bwa, STAR, and kallisto.
+ + Correcting for and experimental design to overcome multiple
+   testing, confounders, and batch effects using Bayesian and
+   frequentist methods approaches
+ + Using evolutionary genomics to identify causal human variants
+ ** Statistics
+ + Statistical modeling (regression, inference, prediction, and
+   learning in very large (> 1TB) datasets)
+ + Addressing confounders and batch effects
+ + Reproducible research
  ** Big Data
  + Parallel and Cloud Computing (slurm, torque, AWS, OpenStack, Azure)
  + Inter-process communication: MPI, OpenMP