NCI and VA’s BD STEP fellowship trains data scientists on national VA health data to boost cancer research
Summary
The Big Data Scientist Training Enhancement Program (BD STEP), a collaboration between the National Cancer Institute and the Veterans Health Administration, places early-career data scientists in VA settings to work with large electronic health record repositories, with the goal of strengthening cancer research, retaining technical talent in the VA and preparing fellows to apply AI responsibly in clinical contexts.
The National Cancer Institute and the Veterans Health Administration’s partnership on the Big Data Scientist Training Enhancement Program (BD STEP) places early-career quantitative researchers inside the VA to work with its national electronic health record resources and build careers in cancer research and clinical data science.
Oliver Bogler, host of NCI’s Inside Cancer Careers podcast, introduced BD STEP in an episode featuring Dr. Michelle Birney Lang, director of BD STEP at NCI’s Center for Strategic Scientific Initiatives; Dr. Frank Meng, National Director for BD STEP at the Department of Veterans Affairs; and Dr. Ted Feldman, a BD STEP alumnus and current mentor and data scientist at a VA research center.
Frank Meng described the VA’s Corporate Data Warehouse (CDW) as “a centralized repository of all the EHR data of the patients that get their care within the VA,” spanning decades and millions of patients. He said CDW combines structured elements such as ICD diagnosis codes, lab results and vitals with unstructured clinical notes and imaging, a mix that allows trainees to test methods that operate on real-world clinical data. "The VA is actually the only national health care network in the U.S.," Meng said, calling the system a "powerful test bed" for research.
NCI’s role, Michelle Lang said, began with seed investment to help launch BD STEP. "From our perspective, we see tremendous opportunity for important cancer research analysis," she said, noting the program helps examine treatment delivery and outcomes in populations that can be more diverse than typical clinical trials. Lang said BD STEP also aims to prepare fellows for future NCI and NIH research careers.
The fellowship combines a national curriculum with site-based research and a required service project that Meng said occupies roughly 25% of a fellow’s time. Those service projects are intended to broaden fellows’ experience and connect them to VA stakeholders; Feldman described a project on early colorectal cancer detection that compared lab-based algorithms with risk-score approaches and drew guidance from NCI biostatisticians.
Meng said the program currently has 14 fellows in place against a capacity of 16. Since the program began in 2015, 58 fellows have received VA funding, and roughly 60 people have participated in total when those supported by other funds are included. He said fellows come from a widening set of disciplines, from computer science and engineering to computational biology, epidemiology, health policy and clinical psychology, and that staff data scientists help fellows shore up database and SQL skills when needed.
Feldman, who returned to BD STEP as a mentor, said the fellowship’s value includes hands-on exposure to where data arise in clinical workflows and practical lessons on data reliability. "You get a front row seat," he said, pointing to the operational resources available in the VA, such as certain genetic datasets and the Million Veteran Program.
The guests discussed artificial intelligence as both a growing tool and a training priority. Meng said he expects BD STEP fellows to understand "the good and the bad" of AI — including biases in data — and to be able to choose appropriate methods. Lang and Feldman emphasized explainability and validation; Feldman said that sometimes "regression is good enough" and that part of training is learning when a complex model is actually needed.
The episode closed with career-advice takeaways for aspiring health-data scientists: build technical skills (databases, programming and statistics), cultivate communication and cross-discipline collaboration skills, and pursue diverse opportunities. Feldman and Lang also highlighted the value of mentorship and networking as mechanisms that help fellows move into sustainable research roles.
The podcast episode and its discussion reflect BD STEP’s dual goals: to leverage the VA’s large, longitudinal data resources for cancer research and to embed trainees in the clinical and operational environment so they can develop practical, ethically informed skills — including how to apply AI to healthcare questions. The episode ends with contact information and production credits.
An announcement for the Worta McCaskill-Stevens K12 career development award aired during the episode; listeners were referred to the show notes for application details.

