
Librarians outline standards and cautious roadmap for using AI in systematic reviews

Network of the National Library of Medicine (NNLM) Region 3 Health Bites Webinar · October 1, 2025

AI-Generated Content: All content on this page was generated by AI to highlight key points from the meeting. For complete details and context, we recommend watching the full video.

Summary

Speakers from Texas A&M’s Medical Sciences Library told an NNLM Region 3 webinar that review authors remain responsible for AI outputs, urged piloting and human-in-the-loop monitoring, and flagged RAISE and other groups developing guidance for responsible AI use in evidence synthesis.

Margaret Foster, head of the Center for Systematic Reviews and Research Synthesis at the Texas A&M University Medical Sciences Library, told a Network of the National Library of Medicine Region 3 webinar that authors must remain “responsible for whatever, you know, the AI that they're incorporating.” She said that responsibility includes verifying AI outputs, reporting AI use transparently and ensuring ethical standards are met.

Foster framed the discussion around RAISE (Responsible AI and Evidence Synthesis) guidance, noting RAISE’s workstreams on building, evaluating and selecting AI for reviews. She said researchers should first define which review steps will use AI — for example, sorting, data extraction or question formulation — then validate tools and pilot them before full integration. “One of their working groups … is focusing on safe and responsible use of AI,” Foster said, adding that teams should give feedback to AI creators so tools improve over time.
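
To make that sequencing concrete, the sketch below shows one way a team could pre-register which review steps will use AI before piloting. It is a minimal illustration, not a tool from the talk; the names (ReviewStep, PilotProtocol) and the thresholds are assumptions.

```python
# A minimal sketch of a pre-registered pilot protocol: record, per review
# step, whether AI is used, whether a human checks every output, and the
# acceptance threshold for the pilot. All names and values are illustrative.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ReviewStep:
    name: str                  # e.g. "title/abstract sorting"
    ai_tool: Optional[str]     # tool identifier, or None for human-only steps
    human_in_loop: bool        # does a human verify every AI output?
    min_recall: float          # recall the tool must reach in the pilot

@dataclass
class PilotProtocol:
    review_title: str
    steps: list[ReviewStep] = field(default_factory=list)

    def steps_using_ai(self) -> list[ReviewStep]:
        return [s for s in self.steps if s.ai_tool is not None]

protocol = PilotProtocol(
    review_title="Example systematic review",
    steps=[
        ReviewStep("question formulation", None, True, 1.0),
        ReviewStep("title/abstract sorting", "ml-screener", True, 0.95),
        ReviewStep("data extraction", "llm-extractor", True, 0.90),
    ],
)
print([s.name for s in protocol.steps_using_ai()])
# ['title/abstract sorting', 'data extraction']
```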

Why it matters: The speakers argued that standards and validation are central because large language models (LLMs) can produce inconsistent results, while some machine-learning (ML) sorting tools are more reproducible. Foster distinguished ML-based tools, which can be validated against known training data, from LLMs, which she said “are not transparent” and can return different answers to the same prompt. She recommended human-in-the-loop pilot stages, predefined protocols, and ongoing monitoring with clear quality checks.
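
Foster’s reproducibility point can be tested directly: send the same input to a tool several times and measure how often the answers agree. The sketch below is a hedged illustration; classify is a stand-in for whatever screening call a team is evaluating, not an API from the webinar or a vendor.

```python
# Repeat the same classification request and report the fraction of runs
# that return the most common label. A deterministic ML sorter scores 1.0;
# an LLM that answers the same prompt differently scores lower.
from collections import Counter

def reproducibility(classify, abstract: str, runs: int = 5) -> float:
    """Fraction of runs agreeing with the modal label."""
    labels = [classify(abstract) for _ in range(runs)]
    modal_count = Counter(labels).most_common(1)[0][1]
    return modal_count / runs

# Demo with a deterministic stand-in; a real pilot would pass the actual
# tool's classification call instead of this lambda.
print(reproducibility(lambda text: "include", "example abstract"))  # 1.0
```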

Supporting details: Foster highlighted existing reporting communities — including PRISMA guidance and comments from the International Committee of Medical Journal Editors — and recommended following Cochrane resources for evidence on AI tool performance. She also advised teams to measure tool reliability (precision, recall) and to consider non-technical impacts such as energy and infrastructure costs.
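
As one concrete way to follow that advice, the sketch below computes precision and recall for a screening tool against a human-labeled pilot set. The labels are invented for illustration; True means "include".

```python
# Compare tool decisions against human pilot labels and report
# precision (how many tool inclusions were correct) and recall
# (how many true inclusions the tool caught).
def precision_recall(human: list[bool], tool: list[bool]) -> tuple[float, float]:
    tp = sum(h and t for h, t in zip(human, tool))
    fp = sum(t and not h for h, t in zip(human, tool))
    fn = sum(h and not t for h, t in zip(human, tool))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

human_labels = [True, True, False, True, False, False]  # invented pilot labels
tool_labels  = [True, False, False, True, True, False]  # invented tool output
p, r = precision_recall(human_labels, tool_labels)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67
```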

Next steps and limitations: Foster said her team will evaluate a one-year trial subscription to the Consensus service and encouraged librarians to pilot tools, document protocols, and publish SOAR-style studies (standalone evaluations) to build evidence. She repeatedly cautioned that LLMs are not yet appropriate for decision-making steps that require reproducibility, urging restraint where wrong outputs could mislead reviewers.

The session concluded with a pledge to make the slides and resource links available via the NNLM YouTube channel.