1 [[!meta title="Resumé"]]
4 ## Career Synopsis & Outlook
5 + A proven, transformative leader of teams that enable businesses to
6 harness the value of scientific and business data to achieve
7 business goals in biotechnology and other biology-adjacent
9 + Significant experience mentoring, coaching, managing, and leading
10 managers and individual contributors from the entry level to
11 principal level, enabling them to develop into their full potential
12 as leaders and contributors.
13 + Extensive scientific, computational, analytical, and business
14 background coupled with a history of effective communication with
15 diverse audiences enables bridging the needs and requirements of
16 challenging stakeholders and earning their trust and buy-in even in
17 complex, highly regulated environments.
18 + Seeking opportunities to grow, lead, and transform organizations
19 with a larger scope and greater impact.
21 ## Director of Data Science and Analytics at Ginkgo Bioworks 2022--Present
22 + Directed a geographically distributed team of managers who lead
23 teams of data engineers, system administrators, statisticians,
24 bioinformaticians, and scientists at the PhD level working within
25 the Ag business unit of [Ginkgo Bioworks](https://www.ginkgobioworks.com)
26 + Accountable for the data architecture, engineering, management, and
governance of all data within the Ag business unit, including
28 complex modalities of research and development data from genomics to
29 complex phenotypic data, including chemistry, production, systems
30 biology, and business data.
31 + Accountable for cost centers totalling $10 M annually, including
32 budgeting, procurement, vendor relationships, and policy compliance.
33 + Hired and developed team members in data science, bioinformatics,
34 data engineering, software engineering, and statistics using
35 coaching, mentorship, and teaching approaches.
36 + Accountable (and frequently responsible) for all R&D IT applications
37 in a business unit, including vendor selection, architectural
38 decisions, deployment, and development where appropriate.
39 + Championed modern approaches to data governance and data stewardship
40 principles across multiple life-science and business functions.
+ Led the development of multiple cloud-based serverless and
container-based applications in AWS and GCP with multiple API and UI
interfaces written in python and javascript to enable the management
of data, with dbt, airflow, postgresql, and snowflake handling data
storage and plumbing roles.
46 + Key leadership role in multiple mergers and acquisitions,
47 specializing in R&D business applications and data-adjacent systems.
+ Extensive collaborations with scientific, business, and customer
leaders attest to my excellent communication and interpersonal
skills.
52 ## Team Lead Data Engineering at Bayer Crop Science 2018--2022
+ Hired, managed, and developed a team of 5+ Data Engineers, Systems
54 Administrators, and Business Analysts working within the Biologics
55 R&D unit of Bayer Crop Science enabling data capture, data
56 integration, and operationalization of data analysis pipelines.
+ Developed and supervised implementation of data capture,
integration, and analysis strategies to increase the value of
genomics, metabolomics, transcriptomics, spectroscopic, phenotypic
(*in vitro* and *in planta*), and fermentation/formulation process
data for discovery and development using AWS, python, postgresql, R, and
+ Led the development of multiple systems while coaching, mentoring,
and developing software and data engineers.
+ Served as a key collaborator on multiple cross-functional and
65 cross-divisional projects, including leading the architecture of a
66 life science collaboration using serverless architecture to provide
67 machine-learning estimates of critical parameters from
68 spectrographic measurements.
70 ## Debian Developer 2004--Present
71 + Maintained, managed configurations, and resolved issues in multiple
72 packages written in R, perl, python, scheme, C++, and C.
73 + Resolved technical conflicts, developed technical standards, and
74 provided leadership as the elected chair of the Technical Committee.
75 + Developer of [Debbugs](https://bugs.debian.org), a perl and SQL-based issue-tracker with ≥ 100
76 million entries with web, REST, and SOAP interfaces.
77 + Provided vendor-level support for complex systems integration issues
78 on Debian GNU/Linux systems.
80 ## Research Scientist at UIUC 2015--2017
81 + Architected and engineered systems to store, retrieve, and analyze
82 complex R&D data including behavioral healthcare data (PTSD),
83 genomic, epigenomic, and other phenotypic healthcare data
84 (pre-eclampsia), while maintaining compliance with data privacy
85 regulations including HIPAA and institutional review boards.
+ Planned, designed, organized, executed, and analyzed multiple
complex epidemiological studies involving epigenomics,
transcriptomics, and genomics of diseases of pregnancy and
post-traumatic stress disorder.
90 + Published results in scientific publications and presented results
91 orally at major scientific conferences.
92 + Wrote and completed grants, including budgeting, scientific
93 direction, project management, and reporting.
94 + Mentored graduate students and collaborated with internal and
+ Reviewed the literature, undertook training, and applied new
techniques to remain abreast of current scientific literature,
principles of scientific research, and modern statistical methodology.
+ Wrote software and designed relational databases using R, perl, C,
SQL, make, and very large computational systems ([Blue Waters](https://bluewaters.ncsa.illinois.edu/)).
102 ## Postdoctoral Researcher at USC 2013--2015
+ Designed, executed, and analyzed an epidemiological study to
identify genomic variants associated with systemic lupus
erythematosus using targeted deep sequencing.
106 + Wrote multiple pieces of software to reproducibly analyze and
107 archive large datasets resulting from genomic sequencing.
108 + Coordinated with clinicians, molecular biologists, and biologists to
109 produce analyses and major reports.
## Postdoctoral Researcher at UCR 2010--2012
+ Executed and analyzed an epidemiological study to identify genomic
variants associated with systemic lupus erythematosus using prior
information and array-based approaches in a trio and cross-sectional
study of individuals from Los Angeles and the greater United States.
116 + Wrote and maintained multiple software components to reproducibly
117 perform the analyses.
120 + Doctor of Philosophy (PhD) in Cell, Molecular and Developmental Biology at UC Riverside
+ Bachelor of Science (BS) in Biology at UC Riverside
124 ## Leadership and Mentoring
+ Led managers and teams of PhD-level scientists in multiple
scientific and industrial programs.
127 + Mentorship of multiple employees, graduate students, and
128 undergraduates throughout career, helping them to fully develop
129 their potential and thrive.
130 + Chair or lead of multiple initiatives and committees, including
131 aligning highly cross-functional and diverse stakeholders.
133 ## Data Governance/Management/Engineering
134 + Leadership and implementation of data governance and management
135 programs across multiple functions within Ginkgo and Bayer.
+ Establishment of metadata and master data management standards and
frameworks in life science and business domains.
138 + Snowflake, dbt, Airflow
140 ## Bioinformatics, Genomics, and Epigenomics
141 + NGS and array-based Genomics and Epigenomics of complex human
142 diseases using RNA-seq, targeted DNA sequencing, RRBS, Illumina
143 bead arrays, and Affymetrix microarrays from sample collection to
145 + Reproducible, scalable bioinformatics analysis using make,
146 nextflow, and cwl based workflows on cloud- and cluster-based
147 systems on terabyte-scale datasets
148 + Alignment, annotation, and variant calling using existing and custom
149 software, including GATK, bwa, STAR, and kallisto
150 + Using evolutionary genomics to identify causal human variants
+ Statistical modeling (regression, inference, prediction, and machine
learning) in very large (> 1 TB) datasets using R and python.
+ Experimental design and statistical correction to address multiple
testing, confounders, and batch effects (both Bayesian and frequentist)
157 + Reproducible research
159 ## Software Development
160 + Languages: python, R, perl, C, C++, groovy, sh (bash, POSIX,
162 + Collaborative Development: git, Jira, gitlab CI/CD, github actions,
163 Aha!, continuous integration & deployment, automated testing
164 + Web, Mobile: Shiny, jQuery, JavaScript
+ Databases: PostgreSQL (PL/SQL), SQLite, MySQL, NoSQL, RDS
166 + Cloud: AWS, Azure, GCP, OpenStack
167 + Infrastructure as Code: AWS Cloudformation, Terraform, puppet,
171 + Parallel and Cloud Computing (slurm, torque, AWS, OpenStack, Azure)
172 + Inter-process communication: MPI, OpenMP
+ File storage: Gluster, CephFS, GPFS, Lustre
174 + Linux system administration
176 ## Applications and Daemons
+ Web: apache, nginx, varnish (load balancing/caching), REST, SOAP,
179 + Build Tools: GNU make, cmake
180 + Virtualization: libvirt, KVM, qemu, VMware, docker
181 + VCS: git, mercurial, subversion
182 + Mail: postfix, exim, sendmail, spamassassin
183 + Configuration Infrastructure: puppet, hiera, etckeeper, git
+ Documentation: LaTeX, confluence, emacs, Markdown, MediaWiki, ikiwiki, trac
185 + Monitoring: munin, nagios, icinga, prometheus
186 + Issue Tracking: Debbugs, Request Tracker, Trac, JIRA
+ Office Software: Gnumeric, Libreoffice, LaTeX, Word, Excel,
+ Hardware, Linux routing and firewall experience, ferm, DHCP,
openvpn, bonding, NAT, DNS, SNMP, IPv4, and IPv6.
194 + GNU/Linux (Debian, Ubuntu, Red Hat)
199 + Strong written communication skills as evidenced by publication
201 + Proven experience communicating with cross-functional and diverse
202 teams and stakeholders at all organizational levels.
203 + Strong verbal and presentation skills as evidenced by presentation,
204 leadership, and teaching record
206 # Authored Open Source Software
207 + *[Debbugs](http://bugs.debian.org)*: Bug tracking software for the Debian GNU/Linux
209 + *[CairoHacks](http://git.donarmstrong.com/r/CairoHacks.git)*: Bookmarks and Raster images for large PDF plots in R.
210 + *[Function2Gene](http://rzlab.ucr.edu/function2gene/)*: Gene selection tool based on literature mining which
211 enables Bayesian approaches to significance testing.
212 + *[Helical Wheel Projections](http://rzlab.ucr.edu/scripts/wheel/wheel.cgi?sequence=ABCDEFGHIJLKMNOP&submit=Submit)*: Web-based tool to draw helical wheel
215 # Publications and Presentations
216 + 24 peer-reviewed publications cited over 4000 times:
218 + Publication record in GWAS, transcriptomics, SLE, GBM, epigenetics,
219 comparative evolution of mammals, and lipid membranes
221 + Multiple presentations on EWAS of PTSD, genetics of SLE, and Open
222 Source: https://dla2.us/pres
226 + 2017 R Consortium: *[Adding Linux Binary Builders to R-Hub](https://www.r-consortium.org/blog/2017/04/03/q1-2017-isc-grants)* Role:
228 + 2015 Blue Waters Allocation Grant: *Making ancestral trees using Bayesian
229 inference to identify disease-causing genetic variants* Role:
+ *Tracking placenta and uterine function using urinary extracellular vesicles* (R21
RFA-HD-16-037) Role: Key Personnel
233 + *NIAMS* R01-AR045650-04 *Genetics of Childhood Onset SLE* to Chaim O. Jacob.
236 ## Scholarships and Fellowships
237 + 2001–2003: University of California, Riverside Doctoral Fellowship
238 + 1997–2001: Regents of the University of California Scholarship.
240 # Academic Information
242 You can also read my [Curriculum Vitæ](curriculum_vitae)
243 ([pdf](dla-cv.pdf)), [Research Statement](research_statement)
244 ([pdf](research_statement.pdf)),
245 and [Teaching Statement](teaching_statement)
246 ([pdf](teaching_statement.pdf)).
248 For my contact information or additional references, please e-mail
249 <don@donarmstrong.com>