X-Git-Url: https://git.donarmstrong.com/?p=don.git;a=blobdiff_plain;f=resume.mdwn;h=6497721cc42c5a14122092a440dc5dea48d989c1;hp=400c266e88a6c831c4f47f53c642b601b1343a98;hb=HEAD;hpb=6615b322d2c85f55dfb72a962a5e0fc55062e581
diff --git a/resume.mdwn b/resume.mdwn
index 400c266..43e34c0 100644
--- a/resume.mdwn
+++ b/resume.mdwn
@@ -1,11 +1,51 @@
 [[!meta title="Resumé"]]
 
 # Experience
-## Research Scientist at UIUC 2015--2017
-+ Primarily responsible for the planning, design, organization,
-  execution, and analysis of multiple complex epidemiological studies
-  involving epigenomics, transcriptomics, and genomics of diseases of
-  pregnancy and post-traumatic stress disorder.
+## Team Lead Data Engineering at Ginkgo Bioworks 2022–Present
++ Led and managed team of data engineers, system administrators,
+  statisticians, bioinformaticians, and scientists at the PhD level
+  working within the AgBio unit of Ginkgo Bioworks.
++ Mentored and coached team members in data science, bioinformatics,
+  data engineering, and statistics.
++ Key leadership role in successful merger of AgBio unit with Ginkgo,
+  including all relevant R&D business applications and data-adjacent
+  systems.
+
+## Team Lead Data Engineering at Bayer Crop Science 2018–2022
++ Hired, managed, and developed team of 5+ Data Engineers, Systems
+  Administrators, and Business Analysts working within the Biologics
+  R&D unit of Bayer Crop Science enabling data capture, data
+  integration, and operationalization of data analysis pipelines
++ Developed and supervised implementation of data capture,
+  integration, and analysis strategies to increase the value of
+  genomics, metabolomics, transcriptomics, spectroscopic, phenotypic
+  (/in vitro/ and /in planta/), and fermentation/formulation process
+  data for discovery and development
++ Led the development of multiple systems while coaching, mentoring,
+  and developing developers and engineers
++ Served as a key collaborator on multiple cross-function and
+  cross-divisional projects, including leading the architecture of a
+  life science collaboration using serverless architecture to provide
+  machine-learning estimates of critical parameters from
+  spectrographic measurements
++ Established and developed network of internal and external contacts
+  for technical implementation of Bayer program goals.
+
+## Debian Developer 2004–Present
++ Maintained, managed configurations, and resolved issues in multiple
+  packages written in R, perl, python, scheme, C++, and C.
++ Resolved technical conflicts, developed technical standards, and
+  provided leadership as the elected chair of the Technical Committee.
++ Developer of [Debbugs](https://bugs.debian.org), a perl and SQL-based issue-tracker with ≥ 100
+  million entries with web, REST, and SOAP interfaces.
++ Provided vendor-level support for complex systems integration issues
+  on Debian GNU/Linux systems.
+
+## Research Scientist at UIUC 2015–2017
++ Planning, design, organization, execution, and analysis of multiple
+  complex epidemiological studies involving epigenomics,
+  transcriptomics, and genomics of diseases of pregnancy and
+  post-traumatic stress disorder.
 + Published results in scientific publications and presented results
   orally at major scientific conferences.
 + Wrote and completed grants, including budgeting, scientific
@@ -16,141 +56,111 @@
   maintain abreast of current scientific literature, principles of
   scientific research, and modern statistical methodology.
 + Wrote software and designed relational databases using R, perl, C,
-  SQL, make, and very large computational systems.
-
-## Postdoctoral Researcher at USC 2013--2015
-+ Primarily responsible for the design, execution, and analysis of an
-  epidemiological study to identify genomic variants associated with
-  systemic lupus erythematosus using targeted deep sequencing.
-+ Designed, budgeted, configured, maintained, and supported a secure
-  linux analysis cluster (MPI/torque) with a shared filesystem (NFS
-  over gluster) for statistical analyses.
+  SQL, make, and very large computational systems ([Blue Waters](https://bluewaters.ncsa.illinois.edu/))
+
+## Postdoctoral Researcher at USC 2013–2015
++ Design, execution, and analysis of an epidemiological study to
+  identify genomic variants associated with systemic lupus
+  erythematosus using targeted deep sequencing.
 + Wrote multiple pieces of software to reproducibly analyze and
   archive large datasets resulting from genomic sequencing.
 + Coordinated with clinicians, molecular biologists, and biologists to
   produce analyses and major reports.
 
-## Postdoctoral Researcher at UCR 2010--2012
-+ Primarily responsible for the execution and analysis of an
-  epidemiological study to identify genomic variants associated with
-  systemic lupus erythematosus using prior information and array based
-  approaches in a trio and cross sectional study of individuals from
-  the Los Angeles and greater United States.
+## Postdoctoral Researcher at UCR 2010–2012
++ Executed and analyzed an epidemiological study to identify genomic
+  variants associated with systemic lupus erythematosus using prior
+  information and array based approaches in a trio and cross sectional
+  study of individuals from the Los Angeles and greater United States.
 + Wrote and maintained multiple software components to reproducibly
   perform the analyses.
 
-## Debian Developer 2004--Present
-+ Maintained, managed configurations, and resolved issues in multiple
-  packages written in R, perl, python, scheme, C++, and C.
-+ Resolved technical conflicts, developed technical standards, and
-  provided leadership as the elected chair of the Technical Committee.
-+ Developer of [Debbugs](https://bugs.debian.org), a perl and SQL-based issue-tracker with ≥ 100
-  million entries with web, REST, and SOAP interfaces.
-
-## Independent Systems Administrator 2004--Present
-+ Researched, recommended, budgeted, designed, deployed, configured,
-  operated, and monitored highly-available high-performance enterprise
-  hardware and software for web applications, authentication, backup,
-  email, and databases.
-+ Provided vendor-level support for complex systems integration issues
-  on Debian GNU/Linux systems.
-+ Full life-cycle support of medium and small business networking
-  infrastructure, including VPN, network security, wireless networks,
-  routing, DNS, DHCP, and authentication.
-
 # Education
 + Doctor of Philosophy (PhD) in Cell, Molecular and Developmental
   Biology at UC Riverside
 + Batchelor of Science (BS) in Biology at UC Riverside
 
 # Skills
-## Data Science
-+ Reproducible, scalable analyses using *R*, *perl*, and python with
-  workflows on cloud- and cluster-based systems on terabyte-scale
-  datasets
-+ Experimental design and correction to overcome multiple testing,
-  confounders, and batch effects using Bayesian and frequentist
-  methods
-+ Design, development, and deployment of algorithms and data-driven
-  products, including APIs, reports, and interactive web applications
-+ Statistical modeling (regression, inference, prediction/forecasting,
-  time series, and machine learning in very large (> 1TB) datasets)
-+ Data mining, cleaning, processing and quality assurance of data
-  sources and products using tidydata formalisms
-+ Visualization using *R*, ggplot, Shiny, and custom written routines.
-
-## Software Development
-+ Languages: perl, R, C, C++, python, groovy, sh, make
-+ Collaborative Development: git, travis, continuous integration,
-  automated testing
-+ Web, Mobile: Shiny, jQuery, JavaScript
-+ Databases: Postgresql (PL/SQL), SQLite, Mysql, NoSQL
-+ Office Software: Gnumeric, Libreoffice, \LaTeX, Word, Excel,
-  Powerpoint
+## Leadership and Mentoring
++ Lead teams of PhD and MD scientists in multiple scientific and
+  industrial programs
++ Mentored graduate students and Outreachy and Google Summer of Code
+  interns
++ Former chair of Debian's Technical Committee
++ Head developer behind https://bugs.debian.org
 
-## Genomics and Epigenomics
+## Bioinformatics, Genomics, and Epigenomics
 + NGS and array-based Genomics and Epigenomics of complex human
   diseases using RNA-seq, targeted DNA sequencing, RRBS, Illumina
   bead arrays, and Affymetrix microarrays from sample collection to
-  publication.
+  publication
 + Reproducible, scalable bioinformatics analysis using make,
   nextflow, and cwl based workflows on cloud- and cluster-based
   systems on terabyte-scale datasets
 + Alignment, annotation, and variant calling using existing and custom
-  software, including GATK, bwa, STAR, and kallisto.
-+ Correcting for and experimental design to overcome multiple
-  testing, confounders, and batch effects using Bayesian and
-  frequentist methods approaches
+  software, including GATK, bwa, STAR, and kallisto
 + Using evolutionary genomics to identify causal human variants
 
 ## Statistics
-+ Statistical modeling (regression, inference, prediction, and
-  learning in very large (> 1TB) datasets)
-+ Addressing confounders and batch effects
++ Statistical modeling (regression, inference, prediction, and machine
+  learning in very large (> 1TB) datasets) using R and python.
++ Correcting & experimental design to overcome multiple testing,
+  confounders, and batch effects (both Bayesian and frequentist)
 + Reproducible research
 
+## Software Development
++ Languages: python, R, perl, C, C++, groovy, sh (bash, POSIX,
+  and zsh), make
++ Collaborative Development: git, Jira, gitlab CI/CD, github actions,
+  Aha!, continuous integration & deployment, automated testing
++ Web, Mobile: Shiny, jQuery, JavaScript
++ Databases: Postgresql (PL/SQL), SQLite, Mysql, NoSQL
+
 ## Big Data
 + Parallel and Cloud Computing (slurm, torque, AWS, OpenStack, Azure)
 + Inter-process communication: MPI, OpenMP
 + Filestorage: Gluster, CEFS, GPFS, Lustre
 + Linux system administration
 
-## Genomics and Epigenomics
-+ Linkage and association-based mapping of complex phenotypes using
-  next-generation sequencing and arrays
-+ Alignment, annotation, and variant calling using existing and custom
-  software
+## Applications and Daemons
++ Web: apache, nginx, varnish (load balancing/caching), REST, SOAP,
+  Tomcat
++ Build Tools: GNU make, cmake
++ Virtualization: libvirt, KVM, qemu, VMware, docker
++ VCS: git, mercurial, subversion
++ Mail: postfix, exim, sendmail, spamassassin
++ Configuration Infrastructure: puppet, hiera, etckeeper, git
++ Documentation: \LaTeX, confluence, emacs, MarkDown, MediaWiki, ikiwiki, trac
++ Monitoring: munin, nagios, icinga, prometheus
++ Issue Tracking: Debbugs, Request Tracker, Trac, JIRA
++ Office Software: Gnumeric, Libreoffice, \LaTeX, Word, Excel,
+  Powerpoint
 
-## Mentoring and Leadership
-+ Mentored graduate students and Outreachy and Google Summer of Code
-  interns
-+ Former chair of Debian's Technical Committee
+## Networking
++ Hardware, Linux routing and firewall experience, ferm, DHCP,
+  openvpn, bonding, NAT, DNS, SNMP, IPv4, and IPv6.
+
+## Operating systems
++ GNU/Linux (Debian, Ubuntu, Red Hat)
++ Windows
++ MacOS
 
 ## Communication
 + Strong written communication skills as evidenced by publication
   record
-+ Strong verbal and presentation skills as evidenced by presentation
-  and teaching record
-
-## Consortia Involvement
-+ *H3A Bionet*: Generating workflows and cloud resources for H3 Africa
-+ *Psychiatric Genomics Consortium*: Identification of epigenetic
-  variants which are correlated with PTSD.
-+ *SLEGEN*: System lupus erythematosus genetics consortium.
++ Strong verbal and presentation skills as evidenced by presentation,
+  leadership, and teaching record
 
-# Authored Software
+# Authored Open Source Software
 + *[Debbugs](http://bugs.debian.org)*: Bug tracking software for the Debian GNU/Linux
-  distribution. [https://bugs.debian.org]
-+ *[CairoHacks](https://git.donarmstrong.com/r/CairoHacks.git)*: Bookmarks and Raster images for large PDF plots in R.
-+ *[Function2Gene](http://rzlab.ucr.edu/function2gene/)*: Gene selection tool based on literature mining which
-  enables Bayesian approaches to significance testing.
-+ *[Helical Wheel Projections](http://rzlab.ucr.edu/scripts/wheel/wheel.cgi?sequence=ABCDEFGHIJLKMNOP&submit=Submit)*: Web-based tool to draw helical wheel
-  protein projections. [http://rzlab.ucr.edu/scripts/wheel]
-
-# Publications and Presentations
-+ 24 peer-reviewed publications cited over 1800 times:
+  distribution.
++ *[CairoHacks](http://git.donarmstrong.com/r/CairoHacks.git)*: Bookmarks and Raster images for large PDF plots in R.
+* Publications and Presentations
++ 24 peer-reviewed publications cited over 3000 times:
   https://dla2.us/pubs
-+ H index of 11
-+ Numerous invited talks on EWAS of PTSD, genetics of SLE, and Open
++ Publication record in GWAS, transcriptomics, SLE, GBM, epigenetics,
+  comparative evolution of mammals, and lipid membranes
++ H index >= 20
++ Multiple presentations on EWAS of PTSD, genetics of SLE, and Open
   Source: https://dla2.us/pres
 
 # Funding and Awards
@@ -166,16 +176,16 @@
 Role: Key Personnel
 
 ## Scholarships and Fellowships
-+ 2001--2003: University of California, Riverside Doctoral Fellowship
-+ 1997--2001: Regents of the University of California Scholarship.
++ 2001–2003: University of California, Riverside Doctoral Fellowship
++ 1997–2001: Regents of the University of California Scholarship.
 
 # Academic Information
 
-You can also read my [curriculum_vitae](Curriculum Vitæ)
-([dla-cv.pdf](pdf)), [research_statement](Research Statement)
-([research_statement.pdf](pdf)),
-and [teaching_statement](Teaching Statement)
-([teaching_statement.pdf](pdf)).
+You can also read my [Curriculum Vitæ](curriculum_vitae)
+([pdf](dla-cv.pdf)), [Research Statement](research_statement)
+([pdf](research_statement.pdf)),
+and [Teaching Statement](teaching_statement)
+([pdf](teaching_statement.pdf)).
 
 For my contact information or additional references, please e-mail