Duke researchers report AI, image‑quality tools improved pocket colposcope performance in Kenya trial
Summary
At an NCI seminar, Dr. Nimmi Ramanujam described an NCI-funded Kenya randomized trial and related algorithm work: the pocket colposcope's positive predictive value exceeded that of VIA, domain transfer plus synthetic image augmentation raised classifier AUC, and blur detection raised test AUC from 0.73 to 0.77, she said.
Researchers presenting at a National Cancer Institute seminar described how algorithmic methods and image‑quality controls can improve diagnostic consistency for portable colposcopes.
Dr. Nimmi Ramanujam described an NCI-funded program in Kenya that included a randomized trial comparing a pocket colposcope arm with a standard visual inspection with acetic acid (VIA) arm among HPV-positive and HIV-positive women. She said sensitivity and specificity did not differ significantly between the pocket and VIA arms on the trial's metrics, "but what we did notice is that the positive predictive value was higher for the pocket arm versus the VIA arm." The trial served as a baseline for developing more consistent decision-making tools.
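At low disease prevalence, a modest specificity difference that is not statistically significant on its own can still translate into a visibly higher positive predictive value, because false positives dominate the positive calls. A minimal sketch with hypothetical rates (not the trial's numbers):

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule: P(disease | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical rates, not the trial's: identical sensitivity, a modestly
# better specificity, and 10% disease prevalence.
via_ppv = ppv(0.85, 0.80, 0.10)     # ~0.32
pocket_ppv = ppv(0.85, 0.90, 0.10)  # ~0.49
```

Here a 10-point specificity gain roughly halves the false positives, lifting PPV far more than the raw accuracy metrics would suggest.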
To address low disease prevalence and device-to-device image differences, the team used domain transfer: they pretrained classifiers on standard colposcope images, then used generative models to synthesize pocket-colposcope images, especially positive examples, to enrich the pocket dataset before fine-tuning. Ramanujam said that on an internal test set the algorithm reached an area under the curve (AUC) of about 0.73, a figure she called comparable to the existing literature.
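The AUC figures quoted are ranking metrics: the probability that a randomly chosen positive case is scored above a randomly chosen negative one. A minimal pure-Python sketch on toy scores (not the study's data):

```python
def roc_auc(labels, scores):
    """Mann-Whitney formulation of ROC AUC: the fraction of
    (positive, negative) pairs the classifier ranks correctly,
    counting ties as half-correct."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: one of the four positive/negative pairs is mis-ranked,
# so AUC = 3/4.
print(roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]))  # 0.75
```

An AUC of 0.73 therefore means a randomly chosen positive image outranks a randomly chosen negative one about 73% of the time.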
Image quality proved important. The group built a blur-detection pipeline, pseudo-labeling images with a fast Fourier transform threshold and then training a YOLO network, to label and prospectively flag blurry images. Ramanujam said removing blurry images from the test set raised AUC from 0.73 to 0.77 on their data, and that integrating blur feedback into routine review improved providers' image clarity over time in the field.
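A hedged numpy sketch of the FFT pseudo-labeling idea: blurring suppresses high spatial frequencies, so the share of spectral power outside a low-frequency window can separate sharp from blurry frames. The window cutoff, and any flagging threshold applied to the ratio, are illustrative assumptions to be calibrated per device, not the study's parameters:

```python
import numpy as np

def high_freq_ratio(img, cutoff=0.25):
    """Fraction of spectral power outside a central low-frequency window."""
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(spectrum) ** 2
    h, w = img.shape
    ch, cw = int(h * cutoff), int(w * cutoff)
    low = power[h // 2 - ch:h // 2 + ch, w // 2 - cw:w // 2 + cw].sum()
    return 1.0 - low / power.sum()

# Demo: a random texture vs. a 5x5 box-blurred copy of it.
rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
k = 5
blurred = sum(np.roll(np.roll(sharp, i, 0), j, 1)
              for i in range(k) for j in range(k)) / k ** 2
assert high_freq_ratio(blurred) < high_freq_ratio(sharp)
```

Thresholding this ratio yields the cheap blur/sharp pseudo-labels on which a detector such as YOLO can then be trained for prospective flagging.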
On dataset composition, Ramanujam reported that the primary training corpus included images from six countries but contained relatively few HIV-positive cases. Adding Kenyan HIV-positive data to the training set improved performance in some respects, but the algorithm still lagged on a fully held-out HIV-positive test set. "The percentage of that dataset that's HIV positive is really, really small," she said, adding that more accrual is needed to assess performance robustly for this subgroup.
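One reason a tiny subgroup makes performance hard to assess: the sampling error of any estimated rate grows as the subgroup shrinks. A minimal sketch using a hypothetical 80% sensitivity (binomial standard error, not the study's figures):

```python
import math

def se_rate(p, n):
    """Binomial standard error of an estimated rate (e.g. sensitivity)
    measured on n cases."""
    return math.sqrt(p * (1 - p) / n)

# The same underlying 80% rate, measured on shrinking subgroups:
for n in (1000, 100, 20):
    print(n, round(se_rate(0.80, n), 3))
# 1000 -> 0.013, 100 -> 0.04, 20 -> 0.089
```

At 20 cases the uncertainty is roughly seven times what it is at 1,000, which is why further accrual is needed before subgroup performance can be judged.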
The presentation emphasized validation across independent clinical sites and the need for operational protocols (daily, weekly, and monthly reviews plus automated quality checks) to maintain performance. Ramanujam concluded that AI could serve as a second or even primary virtual reader in future workflows, but that further validation, larger datasets, and careful integration with providers and developers are required before deployment.
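An automated quality check of the kind described could be as simple as a daily gate on each site's blur-flag rate. A hypothetical sketch, where the baseline rate and alert factor are illustrative assumptions rather than anything from the presentation:

```python
def blur_rate_alert(daily_flags, baseline=0.10, factor=2.0):
    """Return (alert, rate): alert is True when today's blur-flag rate
    exceeds factor x the site's baseline rate."""
    rate = sum(daily_flags) / len(daily_flags)
    return rate > factor * baseline, rate

# 3 of 10 images flagged blurry today: 30% exceeds 2 x the 10% baseline,
# so the site is flagged for review.
alert, rate = blur_rate_alert([True] * 3 + [False] * 7)
```

A check like this turns the daily review into a yes/no signal a site coordinator can act on without inspecting every image.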

