Develop AI Chat Interfaces for Data Commons and Meshes
Mentors
Aarti Venkat, Robert Grossman
About Us
The Center for Translational Data Science is pioneering translational data science to advance biology, medicine, healthcare, and the environment. We’re a dedicated team of researchers and engineers drawing from different backgrounds and ideas to push the boundaries of data-intensive science. We work closely with researchers at the University of Chicago, along with other research groups and consortia. We develop and operate large-scale open-source data clouds and data commons for the scientific research community, including the Bionimbus Protected Data Cloud, OCC Open Science Data Cloud, and the NCI Genomic Data Commons. These computational platforms support thousands of users across the world with varying technical skills, backgrounds, and research objectives.
Externship Background
The Center for Translational Data Science (CTDS) is a leader in the development of data commons and data meshes to accelerate research and discovery from large biological datasets. It also develops large language models and generative AI models over this data. One of the first data commons built by CTDS is the Genomic Data Commons (GDC), the world’s largest data commons for cancer genomics researchers. We’ve built several other data commons, including BioData Catalyst (BDC), which focuses on heart, lung, blood, and sleep research, and the Medical Imaging and Data Resource Center (MIDRC), which focuses on medical imaging data. We are currently developing chat interfaces based on Retrieval Augmented Generation (RAG) and Retrieval Interleaved Generation (RIG) over data and metadata in data commons and meshes. This externship will focus on developing components for these chat interfaces and other AI interfaces. The extern will learn how AI tools can help answer user queries that span different biological domains.
Externship Objectives
- Understand the data model and API structure of individual data commons, including the GDC.
- Teach LLMs to interoperate with external knowledge bases using RAG pipelines.
- Use NLP approaches and postprocessing techniques to help LLMs answer user queries spanning data and metadata across data commons.
The extern will receive specific training from mentors, with a structured weekly in-person time commitment in the CTDS office. Remote work will also be considered.
Qualifications
Doctoral/Postdoctoral