Building Repeatable and Automatable Scientific Computing Platforms with Modern Tools: Machine Learning as a Use Case

TITLE:

CSRC Colloquium

DATE:

Friday, May 8, 2020

TIME:

3:00 PM

LOCATION:

Virtual Zoom Conference

SPEAKER:

Dr. John Aven, Director of Engineering, Hashmap, Inc.

ABSTRACT:

Repeatability (Reproducibility, Shareability, Auditability, and Immutability) in research, in academia and in industry, is not optional: it is a fundamental requirement for any modern solution, analysis, or result set. Furthermore, running analyses on your personal devices is passé: monitoring such processes and temporarily losing the use of the device are not attractive. Fortunately, as scientific computing matures and adopts best practices from modern software engineering, as evidenced by the data science phenomenon, the tools we use and how we operate are changing and will continue to change. Also, as the software engineering field evolves, additional tools become available to us. By combining these approaches with the demand for repeatability in our scientific work, it is possible to design and implement frameworks that let us track all stages and versions of our investigations completely, with cloud-native, cross-platform (platform-agnostic) deployability. We will present design concepts around such needs and provide an exemplar implementation.

BIO:

John Aven is a Data Scientist/Data Engineer/Architect/Mathematician with a Ph.D. in Computational Science from SDSU’s CSRC, where he worked with Antonio Palacios and Visarath In on Stochastic Dynamical Systems. He is currently the Director of Engineering at Hashmap, a boutique data and cloud computing consulting firm. His responsibilities there include leading the development of all new consulting practices and technical solutions, their technical strategy and development, the internal infrastructure across all three major cloud vendors, internal book and presentation clubs, and the aggregation and dissemination of knowledge to all technical employees. Over the course of his career, he has worked in numerous areas and has worn many hats. As a researcher, he has spent time in the medical sciences (neuroscience at NIMH and Radiomics/Clinical Trial Design at MD Anderson Cancer Center) and in Oil and Gas (research and development in Reservoir Characterization and related topics, carried out in a Data Science style). From there he moved further into software engineering, data engineering, and architecture, building solutions around clinical trial design and simulation, data collection and processing (both streaming and batch), the productization and industrialization of data science solutions, and the systems- and enterprise-level design of cloud computing solutions in various industries, with a focus on enterprise-wide solutions and XOps (DevOps/DataOps/MLOps/…) solutions, culture, and processes.

HOST:

Jose Castillo

VIDEO: