Developing Open-Source User Tools for Interacting with Cancer Data Models

Mentors

Mark A. Jensen, Ph.D., Director of Data Management and Interoperability, Biomedical Informatics and Data Science Directorate, Frederick National Lab for Cancer Research

About Us

Our team is responsible to the National Cancer Institute (NCI) for the management and development of the Cancer Research Data Commons (CRDC), a highly interoperable network of data resources that will enable cancer biologists and bioinformaticists to search and aggregate clinical, genomic, proteomic, image, and other data over thousands of cancer study participants. We believe the CRDC will catalyze breakthroughs in cancer research by enabling the aggregation and analysis of new and existing cancer research data on an unprecedented scale.

Internship Description

The CRDC is conceived around the idea that enabling exploration of “all the data” does not require complete reorganization or extraction of existing data collections. Creating “adaptors” that translate concepts and values between different databases can enable cross-database queries from a single access point, and allow disparate data to be normalized and structured so that it can be analyzed robustly to yield new discoveries.

We are developing in-house tools that help us and others explore, understand, and interconnect disparate models of clinical and genomic cancer data. These tools vary in purpose from command-line file crunchers to graphical interfaces for exploration of highly structured standard models. These tools are open source from their inception and are made available to the community on the GitHub platform.

The intern would be actively coding new features into these tools, or creating web-based graphical user interfaces to their functionality. Depending on the skills and interests of the intern, there would be opportunities to participate in data model development and in researching/creating tools that map cancer data between different models.

This opportunity is fully remote. The intern will perform work and interact with the rest of the team using distributed team tools like GitHub, Slack, and Google Docs. The intern will meet weekly with the mentor, but will have immediate access to the entire team via Slack. The intern may also participate in relevant weekly team meetings, such as the project scrum and the data systems team meeting.

Learning Objectives

There is a wide scope for learning the following highly marketable skills:

  • How to contribute productively to a project within a distributed team environment
  • How to find and apply new software technologies to a practical need
  • How to create software that is easy for others to use and modify
  • Open source software culture and conventions
  • Data modeling principles and techniques
  • Neo4j graph database querying and exploration
  • JavaScript web development frameworks and libraries such as Express, jQuery, and D3

The successful intern will be able to point to their working software products online as open source software, and will be able to continue to participate in their development as desired after the internship.

Qualifications

  • Experience in purposeful coding outside of coursework, in any programming language
  • Curiosity about software technologies and a willingness to research and experiment with them with minimal external direction
  • Interest in structuring and organizing potentially large scientific datasets for practical ends

Your application must include a current CV and a cover letter describing why this internship is relevant to your career goals.
