GDC demo shows interactive gene-expression clustering for cohort visualization

Bill, a presenter with the Genomic Data Commons, demonstrated the GDC data portal’s gene expression clustering tool during a live webinar and walked attendees through building cohorts, configuring gene sets and exporting visualization outputs.

The session showed how to create cohorts from TCGA data (examples used: a ‘lung NOS’ cohort of 37 cases and a ‘middle lobe’ cohort of 51 cases) and then use the clustering tool to generate an interactive heat map. Bill explained the matrix layout: “the rows represent genes and the columns represent cases,” and demonstrated that hovering over any cell reveals the case identifier, the gene identifier and the z-score-transformed expression value (examples given included –1.27 and 3.83). A dendrogram on each axis shows similarity relationships; Bill cautioned that adjacency alone does not guarantee similarity and advised users to pay attention to dendrogram structure when interpreting clusters.
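For readers who want to see what a z-score-transformed value means in a genes-by-cases matrix, here is a minimal Python sketch. It assumes each gene (row) is standardized across the cases (columns) using the population standard deviation; the webinar did not specify the exact normalization the portal applies, so treat that as an assumption. The function name and the input numbers are hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Z-score each row (gene) across its columns (cases)."""
    scored = []
    for row in matrix:
        mu = mean(row)
        sigma = pstdev(row)  # population std dev; an assumption, not confirmed by the demo
        if sigma == 0:
            # A constant row has no variation to scale, so report zeros.
            scored.append([0.0 for _ in row])
        else:
            scored.append([(x - mu) / sigma for x in row])
    return scored


# Hypothetical expression values: 2 genes (rows) x 4 cases (columns).
expression = [
    [2.0, 4.0, 4.0, 6.0],
    [10.0, 10.0, 10.0, 10.0],
]
z = zscore_rows(expression)
```

Under this assumption, a cell value such as 3.83 would mean that case's expression of the gene sits 3.83 standard deviations above the gene's mean across the cohort.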

Why this matters: the portal’s interactive views let researchers quickly test hypotheses about cohort-level expression patterns without writing code, while still supporting reproducible export of the underlying data.

Key portal features highlighted in the demo included:

- Gene-set selection: manually type gene symbols (validated by the tool), load the cohort’s top variably expressed genes (top N computed per cohort), or import curated sets from MSigDB (Bill used a lung-cancer survival gene set as an example).
- Clinical and biospecimen overlays: add variables such as primary diagnosis and gender to the matrix legend; Bill showed adenocarcinoma vs. squamous cell carcinoma and gender coloring (female blue, male orange) to distinguish groups.
- Clustering and display controls: toggle sample clustering (removes sample dendrogram), choose clustering and distance methods, and adjust dendrogram height/width. The demo included changing the z-score cap (default 5) to show how color intensity responds to that threshold.
- Export options: download the visualization as an SVG image or export the matrix data as a TSV; Bill noted that TSVs are arranged with rows=cases and columns=genes (the transpose of the on-screen matrix) and recommended transposing if needed for publication figures.
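The export layout and the z-score cap described above can be illustrated with a short Python sketch. It assumes the exported TSV has a header row of gene symbols and a first column of case identifiers; the header label, gene names, case names, and values here are hypothetical, and the real file's column names may differ. It also assumes the cap simply clips values to the range [-cap, +cap] before coloring, which the demo implied but did not spell out.

```python
import csv
import io


def read_cases_by_genes(tsv_text):
    """Parse a TSV whose rows are cases and whose columns are genes."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    genes = rows[0][1:]                      # header row, minus the ID column
    cases = [r[0] for r in rows[1:]]         # first column holds case IDs
    values = [[float(v) for v in r[1:]] for r in rows[1:]]
    return cases, genes, values


def transpose(values):
    """Flip cases x genes into genes x cases (the on-screen layout)."""
    return [list(col) for col in zip(*values)]


def cap(values, limit=5.0):
    """Clip every value to [-limit, +limit]; assumed cap behavior."""
    return [[max(-limit, min(limit, v)) for v in row] for row in values]


# Hypothetical export: 3 cases (rows) x 2 genes (columns).
tsv_text = (
    "case\tGENE_A\tGENE_B\n"
    "CASE_1\t-1.27\t3.83\n"
    "CASE_2\t0.50\t-6.20\n"
    "CASE_3\t7.10\t0.00\n"
)
cases, genes, values = read_cases_by_genes(tsv_text)
genes_by_cases = transpose(values)   # rows are now genes, columns are cases
capped = cap(genes_by_cases, limit=5.0)
```

Lowering `limit` in this sketch saturates more cells at the extremes, which is one way to reason about why the heat map's color intensity shifts when the cap changes.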

Bill also pointed to complementary gene-level tools in the portal: lollipop plots (somatic mutations for individual genes), gene-summary pages, and case-level disco plots that visualize somatic mutations across the genome. He closed the demo by noting the portal’s customization options (cell outline, legend layout, renaming genes for publication) and pointing attendees to the documentation at docs.gdc.cancer.gov, including the GDC Apps documentation.

The webinar included questions on dendrogram orientation and UI details. An attendee reported a different dendrogram ordering and asked if the layout is random; Bill explained that branch rotation can change visual ordering without altering the underlying hierarchical relationships. He also confirmed the triangular teal “play” button in the analysis center launches the clustering tool.
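Bill's point about branch rotation can be checked with a small Python sketch. It models a dendrogram as nested two-element tuples with string leaves, which is a simplification chosen for illustration and not the portal's internal representation; all function names are hypothetical. Swapping the children at every node reverses the left-to-right leaf order, yet the set of clusters defined by the tree is unchanged.

```python
def leaf_order(tree):
    """Return leaves in left-to-right drawing order."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaf_order(left) + leaf_order(right)


def clusters(tree):
    """Return the set of leaf groups defined by every internal node."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    here = frozenset(leaf_order(tree))
    return {here} | clusters(left) | clusters(right)


def rotate_all(tree):
    """Swap the two children at every internal node."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return (rotate_all(right), rotate_all(left))


# Hypothetical dendrogram: A joins B, C joins D, then the two pairs join.
original = (("A", "B"), ("C", "D"))
rotated = rotate_all(original)

print(leaf_order(original))  # drawing order before rotation
print(leaf_order(rotated))   # drawing order after rotation
print(clusters(original) == clusters(rotated))
```

The same toy tree also shows why adjacency alone can mislead: B and C sit next to each other in the original order, but they only join at the root.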

Next steps and resources: Bill directed users to the GDC documentation and suggested joining the GDC user listserv for release notes and feature announcements. He also offered support via support@nci-gdc.datacommons.io.
