Genomic Data Commons offers the largest resource in cancer genomics

Overview

  • Source: University of Chicago Medical Center

  • Date: 27 Feb, 2021

The National Cancer Institute’s Genomic Data Commons (GDC), launched in 2016 by then-Vice President Joseph Biden and hosted at the University of Chicago, has become one of the largest and most widely used resources in cancer genomics. It holds more than 3.3 petabytes of data from more than 65 projects and more than 84,000 anonymized patient cases, and it serves more than 50,000 unique users every month.

In new papers published Feb. 22 in Nature Communications and Nature Genetics, the UChicago-based research team shares new details about the GDC, which is funded by the National Cancer Institute (NCI) via a subcontract with the Frederick National Laboratory for Cancer Research, currently operated by Leidos Biomedical Research, Inc.

One of the papers describes the design and performance of the GDC. The other explains the pipelines the GDC uses to harmonize submitted data and to generate the datasets used by the GDC research community.

The goal of the GDC is to provide the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine.

Data processing for what would become the GDC started in June 2015 on a private cloud. After just a year, the GDC had analyzed more than 50,000 raw sequencing data inputs. The processing pipelines described in the Nature Communications paper have generated more than 1,660 TB of data on more than two dozen types of primary cancers. These data are stored in the GDC Data Portal, where they are available for viewing and downloading.

In addition to the data portal, the GDC offers other user resources: the GDC Data Analysis, Visualization, and Exploration (DAVE) Tools for interactive exploration of data by genomic variant or specific alteration; the GDC Data Submission Portal for submitting data; the GDC Data Transfer Tool (DTT) for downloading large genomic datasets; and the GDC data harmonization system, which allows users to run data submitted to the GDC through the harmonization pipelines.

“These data have a critical role to play. As data accumulate, new signals will become easier to identify as important targets for understanding cancer biology. In addition, the data-sharing infrastructure can serve to inform research studies, providing new insight into genetic variation between individuals and how it may affect cancer patient outcomes.”

Robert Grossman, PhD, Principal Investigator, Genomic Data Commons, Director, Center for Translational Data Science, University of Chicago

Journal reference:

Zhang, Z., et al. (2021) Uniform genomic data analysis in the NCI Genomic Data Commons. Nature Communications. doi.org/10.1038/s41467-021-21254-9.
