Data Quality Assurance Internship

Center for Data Intensive Science

Mentors

Robert Grossman, Ph.D., Professor at UChicago and Chief Data Scientist at Open Data Group
Michael Fitzsimons, Ph.D., Director of User Services and Outreach for the UChicago Center for Data Intensive Science (CDIS)
Gajanan Ganji, M.S., Data Quality Engineer, UChicago

About Us

The Center for Translational Data Science (CTDS) is pioneering the use of data science to advance biology, medicine, healthcare, and the environment. We're a dedicated team of researchers and engineers who draw on diverse backgrounds and ideas to push the boundaries of data-intensive science. We work closely with researchers at the University of Chicago, as well as with other research groups and consortia. We develop and operate large-scale, open-source data clouds and data commons for the scientific research community, including the Bionimbus Protected Data Cloud, the OCC Open Science Data Cloud, and the NCI Genomic Data Commons (GDC). These computational platforms support hundreds of users around the world with varying technical skills, backgrounds, and research objectives.

Internship Description

The intern(s) will act as a data quality assurance liaison, engaging with multiple teams to build and automate frameworks for anomaly detection, reporting, and alerting that ensure data quality. You will gain expertise not only in the data itself but also in the systems that manage it, so that you can interrogate the data and identify gaps in data quality management. Because data and metadata quality has a broad scope, you are expected to work collaboratively across teams to determine priorities and the best methods for achieving objectives. Work will be performed in a fast-paced, entrepreneurial setting where the intern(s) will be expected to drive efforts with a well-organized project management approach.

Learning Objectives

  • Receive training in GDC data test methodology, the GDC data model and data management, and the development of data quality tests, preparing you to assist with CTDS projects.
  • Gain hands-on experience performing tasks across the test lifecycle: designing tests, executing them, reporting test results and defects, and managing tasks in Jira.
  • Deliverables of this initiative may include additions to, or improvements of, existing data quality tests.

Qualifications

  • Eagerness to learn new skills and tools
  • Creativity, curiosity, empathy, and observational skills
  • Experience with software testing is highly recommended
  • Experience with Python, SQL queries, and basic shell scripting is highly recommended
  • Experience with graph and NoSQL databases is a plus!
  • Experience with test automation is a plus!
  • Openness to receiving constructive feedback on assignments
  • Experience with cloud platforms and genomics is a plus!

Your application must include a current CV and a cover letter describing why this internship is relevant to your career goals.