AI in Literature Reviews: Maximising Benefits, Reducing Harms

In the evolving landscape of systematic literature reviews (SLRs), AI offers a promising toolkit for enhancing efficiency and effectiveness, but only when used wisely. Here we share our experience of how best to achieve those efficiencies while avoiding the pitfalls.

Maximising Benefits

  • Targeted Application: Deploy AI where it can significantly cut down on the time spent by human reviewers, such as in abstract screening for large, complex projects. For smaller projects or those with straightforward queries, the benefits gained from AI may be marginal when efficiency has already been pushed close to its limits.
  • Realistic Comparisons: AI won’t achieve 100% accuracy, but neither do human reviewers. We should compare AI’s results to those of an average human reviewer, not the most experienced or the newest team member, to gauge the practical efficiency improvements on offer.
  • Adherence to Guidelines: Ensure your AI models align with current standards and expectations, especially those of HTA agencies and journals. Staying informed about industry developments is crucial to ensuring that the AI tools we develop are both relevant and accepted. For example, the NICE “Use of AI in Evidence Generation” position statement stipulates that a human-in-the-loop (HITL) approach is required for any AI-assisted evidence product.

Reducing Harms

  • Essential Human Oversight: Human involvement is critical in AI-driven evidence synthesis. A HITL approach ensures that human input guides the AI throughout the review process. Oversight is especially crucial in the early stages of the review, where significant errors, such as the incorrect removal of relevant studies, can propagate through everything downstream. The reviewer’s role should therefore be expected to shift towards checking AI outputs rather than generating content from scratch.
  • Interdisciplinary Collaboration: RAISE (Responsible AI in Evidence SynthEsis) is a proposed set of guidelines and recommendations, currently out for consultation, that sets out roles for the various stakeholders in the evidence synthesis field. For example, development teams need to respect the values of good evidence synthesis and the material their models are built on, and to be accountable for mistakes: errors should be fixed, and models that remain substandard should not be released. Evidence synthesis teams need to communicate clearly, conduct robust evaluations and report transparently; they also assume ultimate responsibility for AI-assisted reviews. Successful AI deployment in literature reviews will depend on seamless collaboration between these groups, which means funders (and clients) have a role in promoting multidisciplinary teams in funding applications and tenders.
  • Quality over Quantity: As noted at the Global Evidence Summit 2024, there is a risk of flooding the field with low-quality reviews. It is the responsibility of all evidence synthesis stakeholders to maintain high standards and to ensure that AI-assisted reviews meet or exceed current quality benchmarks.

What Are We Doing About AI?

At Costello Medical, we’ve found success using AI to summarise abstracts into PICOS concepts, as demonstrated in our ISPOR poster (MSR107). In parallel, we’re comparing a machine learning classifier against a traditional RCT search filter in a real-life SLR context, testing whether the classifier performs better or worse on metrics such as sensitivity and specificity, as well as on time and user experience.
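
For readers less familiar with these metrics, the sketch below shows how sensitivity and specificity can be computed when a tool’s include/exclude decisions are scored against human reviewer decisions on the same records. The function and data are hypothetical, for illustration only, and are not our evaluation code.

```python
# Purely illustrative sketch: made-up decisions, not real study data.
# True = include the record, False = exclude it.

def screening_metrics(human: list[bool], tool: list[bool]) -> dict[str, float]:
    """Score a screening tool's decisions against human reviewer decisions."""
    tp = sum(h and t for h, t in zip(human, tool))          # correctly included
    tn = sum(not h and not t for h, t in zip(human, tool))  # correctly excluded
    fn = sum(h and not t for h, t in zip(human, tool))      # relevant records missed
    fp = sum(not h and t for h, t in zip(human, tool))      # irrelevant records kept
    return {
        "sensitivity": tp / (tp + fn),  # proportion of relevant records retained
        "specificity": tn / (tn + fp),  # proportion of irrelevant records removed
    }

# Toy example with six records:
human = [True, True, True, False, False, False]
tool = [True, True, False, False, False, True]
print(screening_metrics(human, tool))  # sensitivity 0.67, specificity 0.67
```

In abstract screening, sensitivity is usually the priority: a missed relevant study is far more costly than an extra record passed on for human review.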

One of our ongoing initiatives focuses on developing sophisticated AI prompts for LLM (large language model)-assisted data extraction from the studies that underpin the evidence base for HTA (Health Technology Assessment) submissions, such as randomised controlled trials (RCTs), economic evaluations and health-related cost and resource use studies. Crucially, the prompts we are developing are being validated on a separate set of new, unseen data to test and improve their generalisability. While performance does not yet match that of a human, in some cases it is “good” or even “excellent” in terms of F1 score (a balance of precision and recall). This work is also helping us pinpoint the areas where the models are simply not there yet. The results are promising, and in a more distant future AI may replace one human reviewer in dual extractions; for now, however, we can confidently say that we will not be giving the robots free rein any time soon.
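
For context, the F1 score is the harmonic mean of precision and recall. A minimal sketch, using hypothetical counts rather than results from our validation work:

```python
# Purely illustrative: toy counts, not real extraction results.

def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)  # extracted values that were correct
    recall = tp / (tp + fn)     # correct values that were actually extracted
    return 2 * precision * recall / (precision + recall)

# Toy example: 40 data points extracted correctly, 10 extracted
# incorrectly, and 10 missed that a human reviewer would have captured.
print(f1_score(tp=40, fp=10, fn=10))  # 0.8
```

Because F1 penalises both missed and spurious extractions, a “good” score requires the model to be accurate in both directions.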

The results from our research projects will be coming out soon, so watch this space!

Looking ahead, our efforts will expand into integrating AI technologies into our in-house SLR platform, making it a one-stop shop for the full literature review lifecycle, from protocol development through record review and data extraction, all the way to analysis and reporting of results.

In Conclusion

When thoughtfully integrated, AI can be a powerful ally in conducting literature reviews. AI assistance has the potential to give human evidence experts more time and headspace to focus less on process and more on the strategic decisions an evidence synthesis project demands. If this is done while maintaining rigorous oversight, we firmly believe that we can leverage AI’s capabilities without compromising the quality and integrity of our work.

If you would like any further information on the themes presented above, please get in touch, or visit our Literature Reviews page to learn how our expertise can benefit you. Ania Bobrowska (UK Head of Literature Reviews) created this article on behalf of Costello Medical. The views/opinions expressed are her own and do not necessarily reflect those of Costello Medical’s clients or affiliated partners.
