GDC showcases single-cell RNA‑seq portal and API in technical webinar

Genomic Data Commons (National Cancer Institute) Webinar · July 16, 2025

AI-Generated Content: All content on this page was generated by AI to highlight key points from the meeting. For complete details and context, we recommend watching the full video.

Summary

The Genomic Data Commons demonstrated its single-cell RNA‑seq harmonization workflow, downloadable file formats, portal visualization app and gene-expression API in a recorded webinar. All data except BAM files are open access, and the API is queried with Ensembl gene IDs and case or file UUIDs.

The Genomic Data Commons (GDC) held a webinar demonstrating tools for exploring single‑cell RNA sequencing data, including harmonization pipelines, downloadable analysis files and an expression API. Bill, director of user services for the GDC, opened the session and said the recording and slides would be posted with captions and that questions would be taken at the end.

Zhenyu, director of bioinformatics at the University of Chicago, described the two-step harmonization workflow used for 10x Genomics single‑cell data. The first step uses Cell Ranger to align reads and generate an alignment BAM, a raw count matrix and a filtered count matrix; GDC then applies additional filters to remove empty droplets and doublets, producing a high‑confidence cell matrix. The second step uses the Seurat R package to perform dimensionality reduction (PCA, t‑SNE, UMAP) and graph‑based clustering. Its outputs are an analysis TSV, a full Seurat HDF5 (loom) file containing logs and intermediate data, and differential‑expression TSVs that compare each cluster to the rest of the sample to identify marker genes.
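
The "each cluster versus the rest of the sample" comparison can be illustrated with a minimal Python sketch for a single gene. This is not Seurat's implementation or the GDC pipeline: the pseudocount, the function name and the result keys are assumptions made for illustration, and no p-values are computed.

```python
import math

def cluster_vs_rest(expr, clusters, target, pseudocount=1.0):
    """Compare one cluster to all other cells for a single gene.

    expr:     per-cell expression values for one gene
    clusters: per-cell cluster labels, in the same order as expr
    target:   the cluster label to compare against the rest
    """
    inside = [x for x, c in zip(expr, clusters) if c == target]
    outside = [x for x, c in zip(expr, clusters) if c != target]
    if not inside or not outside:
        raise ValueError("need at least one cell inside and outside the cluster")

    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    return {
        # Simplified fold change; the pseudocount avoids division by zero.
        "avg_log2FC": math.log2((mean_in + pseudocount) / (mean_out + pseudocount)),
        # Fraction of cells with non-zero expression in and out of the cluster.
        "pct_in": sum(1 for x in inside if x > 0) / len(inside),
        "pct_out": sum(1 for x in outside if x > 0) / len(outside),
    }

# Toy data (made up): six cells, two clusters, one gene.
expr = [3.0, 5.0, 4.0, 0.0, 1.0, 0.0]
clusters = ["0", "0", "0", "1", "1", "1"]
result = cluster_vs_rest(expr, clusters, "0")
print(result)
```

A gene with a large positive fold change that is expressed in most cells of the cluster and few cells outside it is a candidate marker for that cluster.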

Bill reviewed the file formats users can download: a matrix‑market tarball containing barcodes, feature/gene lists and the expression matrix; an analysis TSV with per‑barcode read and gene counts, cluster assignments and UMAP/t‑SNE/PCA coordinates; differential‑expression TSVs listing genes, average log2 fold change, adjusted p values and proportions of cells in and out of clusters; and a Seurat HDF5 (loom) file that can be inspected with R and Python packages. He noted Mac users may need to use gunzip in a terminal to extract internal files.
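
As a rough illustration of how the three pieces of the matrix‑market tarball fit together, the Python sketch below parses Matrix Market coordinate text and joins it with feature and barcode lists. It assumes the usual 10x layout (rows are genes, columns are cell barcodes); the barcodes and counts are made up, and real files would first need to be extracted and decompressed.

```python
def read_matrix_market(lines):
    """Parse Matrix Market coordinate text into (n_rows, n_cols, entries).

    entries maps zero-based (row, col) to the stored value.
    """
    it = iter(lines)
    header = next(it)
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("expected a Matrix Market coordinate header")
    # Skip comment lines (starting with '%') and blanks until the size line.
    for line in it:
        if line.strip() and not line.startswith("%"):
            n_rows, n_cols, n_entries = (int(tok) for tok in line.split())
            break
    else:
        raise ValueError("missing size line")
    entries = {}
    for line in it:
        if not line.strip():
            continue
        row, col, value = line.split()
        # Matrix Market indices are 1-based; convert to 0-based.
        entries[(int(row) - 1, int(col) - 1)] = float(value)
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return n_rows, n_cols, entries

# Toy inputs (made up): two genes by three cells.
features = ["ENSG00000141510", "ENSG00000012048"]
barcodes = ["AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCGC-1", "AAACCTGAGAAACCTA-1"]
mtx_text = """%%MatrixMarket matrix coordinate integer general
% toy example
2 3 3
1 1 4
2 1 1
2 3 7
""".splitlines()

n_genes, n_cells, counts = read_matrix_market(mtx_text)

# Regroup the sparse entries as barcode -> {gene ID: count}.
per_cell = {}
for (gene_idx, cell_idx), value in counts.items():
    per_cell.setdefault(barcodes[cell_idx], {})[features[gene_idx]] = value
print(per_cell)
```

Only non-zero counts are stored in the coordinate format, so a cell with no entry for a gene has a count of zero.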

Xinzhou (Sentrue), who collaborates with the University of Chicago and the GDC team, demoed the portal’s single‑cell visualization app. The app presents a table of available experiments and interactive UMAP/t‑SNE/PCA plots where each dot represents a cell and colors indicate Seurat clusters. Users can hide or isolate clusters via the legend, pan and zoom the plot, overlay gene expression as a customizable color gradient, generate contour maps from expression overlays, and switch to a summary tab that compares expression across clusters. Xinzhou also demonstrated clicking a differentially expressed gene in a table to overlay its expression on the UMAP.

Bill then demonstrated the single‑cell gene‑expression API (api.gdc.cancer.gov). The API accepts POST requests with Ensembl gene IDs and a case or file UUID (or submitter/barcode) and returns JSON objects mapping case and file IDs to per‑cell expression values. He recommended using JSON formatters such as jq for readability when handling large outputs. Bill emphasized that all data shown in the demo are open access except for BAM files.
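
The request and response pattern described above might look roughly like the Python sketch below. The summary does not give the endpoint path, the request field names or the exact response layout, so ENDPOINT_PATH, the body keys and the nested case -> file -> gene -> barcode structure are all hypothetical placeholders; the real schema is in the GDC documentation. The request is built but not sent.

```python
import json
import urllib.request

API_HOST = "https://api.gdc.cancer.gov"
# Hypothetical placeholder: the endpoint path is not given in the summary.
ENDPOINT_PATH = "/PLACEHOLDER_ENDPOINT_PATH"

def build_request(gene_ids, case_id):
    """Build (but do not send) a JSON POST request."""
    # Hypothetical field names; the real ones are defined by the GDC API docs.
    body = {"gene_ids": list(gene_ids), "case_id": case_id}
    return urllib.request.Request(
        API_HOST + ENDPOINT_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def flatten_response(payload):
    """Yield (case_id, file_id, gene_id, barcode, value) rows.

    Assumes a nested layout case -> file -> gene -> {barcode: value},
    which is an illustrative guess at the response shape.
    """
    for case_id, files in payload.items():
        for file_id, genes in files.items():
            for gene_id, cells in genes.items():
                for barcode, value in cells.items():
                    yield (case_id, file_id, gene_id, barcode, value)

req = build_request(["ENSG00000141510"], "case-uuid-1")

# Made-up response in the assumed layout.
sample = {
    "case-uuid-1": {
        "file-uuid-1": {
            "ENSG00000141510": {
                "AAACCTGAGAAACCAT-1": 2.0,
                "AAACCTGAGAAACCGC-1": 0.0,
            }
        }
    }
}
rows = list(flatten_response(sample))
for row in rows:
    print(row)
```

Flattening the nested JSON into rows serves a similar purpose to piping the output through jq: large per‑cell responses become easier to scan, filter or load into a table.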

For documentation and follow‑up, presenters directed users to gdc.cancer.gov, portal.gdc.cancer.gov and docs.gdc.cancer.gov, and encouraged signing up for the GDC user listserv for release notifications. The webinar concluded with no audience questions recorded; the recording and slides will be posted when processed.