
CAGI MLD Bootcamp 2022

Start: August 22, 2022

End: September 2, 2022

Location: Online, synchronous.

Meeting times: lecture 4:00-5:00pm EDT; code lab 5:30-6:30pm EDT, each day. All times are Eastern Daylight Time.

Meeting links: directly emailed to participants.

Meeting materials: regularly posted after each lecture for registered CAGI users.




CAGI is pleased to announce a free two-week online genomics and coding bootcamp. The bootcamp aims to introduce undergraduate students to genetics and Python programming.

The course consists of two one-hour sessions per day for two weeks. Each day begins with a one-hour lecture on topics ranging from genetics to machine learning, with guest lecturers from Northeastern University, UC Berkeley, ClinGen, and the MLD Foundation. In the one-hour code lab that follows each lecture, participants will gain hands-on experience with the basics of Python programming and will be guided through building their own machine-learning predictors. We will use real-world datasets and work on problems with the potential to impact individuals with rare diseases.

The bootcamp will be held in parallel with the ARSA challenge of the Critical Assessment of Genome Interpretation (CAGI), which uses data provided by BioMarin. Our goal is that, after this course, participants will be able to write and execute their own Python code and will come away with a deeper understanding of genetics and rare disease.

Undergraduates interested in participating in this bootcamp can apply here. Accepted participants will be announced on August 15th, one week before the course start date.

Please note that spaces are limited. Preference will be given to candidates who have little to no coding experience and who are otherwise most likely to benefit from this course.


Week 1:

August 22:

(lecture, 4:00pm EDT) Introductions, syllabus overview, genetics basics I

(code lab, 5:30pm EDT) Python introduction, installing Anaconda, Jupyter notebooks

August 23:

(lecture, 4:00pm EDT) Genetics basics II

(code lab, 5:30pm EDT) Loading, parsing, and exploring data

August 24:

(lecture, 4:00pm EDT) Machine learning basics I

(code lab, 5:30pm EDT) Exploring data and model building

August 25:

(lecture, 4:00pm EDT) Machine learning basics II

(code lab, 5:30pm EDT) Model building and evaluation

August 26:

(lecture, 4:00pm EDT) Genetics refresh, Critical Assessment of Genome Interpretation (CAGI)

(code lab, 5:30pm EDT) Forming prediction teams, handling ARSA CAGI challenge datasets

Week 2:

August 29:

(lecture, 4:00pm EDT) Data sources for genomics research, meta-predictor development

(code lab, 5:30pm EDT) Teams work on predictions in breakout rooms

August 30:

(lecture, 4:00pm EDT) ClinGen, biocuration, and genetic testing

(code lab, 5:30pm EDT) Teams work on predictions in breakout rooms

August 31:

(lecture, 4:00pm EDT) MLD Foundation and newborn screening

(code lab, 5:30pm EDT) Teams work on predictions in breakout rooms

September 1:

(lecture, 4:00pm EDT) Real-world application and impact of genetics research

(code lab, 5:30pm EDT) Teams submit predictions for the ARSA challenge

September 2:

(4:00pm EDT) Predictor performance evaluation, discussion, concluding ceremony


Courtney Astore, Constantina Bakolitsa, Steven Brenner, Wyatt Clark, Shantanu Jain, Akash Kamandula, Reet Mishra, Vikas Pejaver, Predrag Radivojac, Rashika Ramola, Marena Trinidad, Michelle Velyunskiy, Daniel Zeiberg.

Guest Speakers

Jenny Goldstein (University of North Carolina), Dean Suhr (MLD Foundation).

Last updated: October 8, 2022