The use of large language models (LLMs) to augment processes across literature searches, abstract and full-text screening, and the extraction and summarisation of data has demonstrated the potential to increase efficiency and accuracy in literature reviews, transforming them from the resource-intensive research projects they currently are into a living source of all current evidence, summarised and synthesised at the click of a button. Literature reviews have therefore been described as the ‘low-hanging fruit’ of artificial intelligence (AI) applications in health economics and outcomes research (HEOR). Is this description justified?
Following on from ISPOR Europe 2024 in Barcelona, where research demonstrated early success in automating abstract screening, the use of AI in literature reviews again proved to be a hot topic at ISPOR 2025. The presentations and panels delved further into increasingly sophisticated AI techniques, with a greater focus on automated data extraction. However, the challenges discussed last year around hallucinations, legal considerations and the need for benchmarking remained a critical focus in ensuring the trustworthiness and regulatory acceptability of AI tools in this domain.
There were multiple research presentations focusing on the use of AI in data extraction, demonstrating progress in this field.
At least one piece of research focused on the use of AI in search strategy development. An LLM-based reasoning agent was evaluated for building Boolean search strings using chain-of-thought reasoning and an iterative agentic workflow, whereby a ‘Generator’ LLM suggested search terms and a ‘Critic’ LLM evaluated the search results and suggested refinements.2 Results were validated against 10 Cochrane SLRs, achieving a recall of 76.8%. This was deemed an acceptable level of recall, but given that almost a quarter of relevant records were not identified, it raises the question of whether such performance could really be considered acceptable for literature reviews used to inform regulatory and/or health technology assessment (HTA) submissions.
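For context, recall here is simply the proportion of known-relevant records (e.g., the studies included in the benchmark Cochrane SLRs) that the generated search retrieves. A minimal sketch, using made-up record IDs rather than the study’s actual data:

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Proportion of known-relevant records that the search retrieved.
    'relevant' would be the studies included in a benchmark SLR."""
    return len(retrieved & relevant) / len(relevant)

# Illustrative only: 3 of 4 relevant records retrieved -> 75% recall.
# At the reported 76.8%, roughly a quarter of relevant records are missed.
print(recall({"rec1", "rec2", "rec3", "rec9"},
             {"rec1", "rec2", "rec3", "rec4"}))  # 0.75
```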
The acceptability of using AI in literature reviews for submissions to key decision-making bodies, such as those submitted as part of HTA, was discussed on a broader level. A scoping review aimed to identify guidelines and recommendations for using AI in literature reviews, summarising the following:2
Hallucinations, where models generate plausible but incorrect data, remain a substantial concern. These can arise from poor prompting, from a failure to apply mitigation techniques (such as quality control by another LLM), or from limitations in the models themselves. Safeguards could include asking the LLM to report its level of confidence in each output, or to state explicitly when it does not know the answer. Sending the same prompt multiple times and taking the modal output is another way to achieve something akin to LLM quality control, although this multiplies the computing power used per prompt by the number of repeats. Even then, problems can arise: many models converge on the same answer because they are often trained on the same data. For example, Claude Sonnet and ChatGPT have been found to make identical mistakes on the same abstracts even though they are different models.4 So, even when options exist to mitigate concerns, there are still nuances to be considered.
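As a minimal sketch of that repeat-and-vote idea (sometimes called self-consistency), the wrapper below sends the same prompt several times and keeps the modal answer; `query_llm` is a hypothetical stand-in for whichever LLM API is in use, not a real client call:

```python
from collections import Counter

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM API; replace with
    your provider's client (e.g. for an abstract-screening prompt)."""
    raise NotImplementedError

def modal_answer(prompt: str, n_runs: int = 5) -> tuple[str, float]:
    """Send the same prompt n_runs times and return the most common
    (modal) answer plus the share of runs that agreed with it.
    Note: compute cost scales linearly with n_runs, and agreement is
    no guarantee of correctness if the model errs consistently."""
    answers = [query_llm(prompt).strip() for _ in range(n_runs)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n_runs

# Example usage: flag outputs with low agreement for human review.
# answer, agreement = modal_answer("Does this abstract report an RCT? ...")
# if agreement < 0.8:
#     print("Low agreement - route to human reviewer")
```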
Rigorous performance validation, benchmarking (e.g., via the ELEVATE-AI and TRIPOD-LLM initiatives), and transparent documentation are critical to ensure trustworthy AI outputs.5 At the moment there are no established benchmarks for which LLM to use for which purpose, as models are being updated all the time.4 This raises a key challenge when using AI, both in literature reviews and in the wider HEOR space: how to automate the review and benchmarking of LLM outputs, since doing so would set up a cycle of continuous improvement.4
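To make the benchmarking challenge concrete, one simple pattern is to score LLM-extracted fields against a human-curated gold standard. The field names and exact-match criterion below are illustrative assumptions, not any established benchmark:

```python
def benchmark_extraction(llm_output: dict[str, str],
                         gold_standard: dict[str, str]) -> dict[str, float]:
    """Score extracted fields against a human-curated gold standard.
    Exact (case-insensitive) matching is a simplification; real
    benchmarks would also need normalisation and fuzzy matching."""
    matches = sum(
        llm_output.get(field, "").strip().lower() == value.strip().lower()
        for field, value in gold_standard.items()
    )
    return {"fields": len(gold_standard),
            "accuracy": matches / len(gold_standard)}

# One hypothetical record: the LLM gets the endpoint wrong (2/3 correct).
gold = {"sample_size": "120", "primary_endpoint": "PFS", "design": "RCT"}
out = {"sample_size": "120", "primary_endpoint": "OS", "design": "RCT"}
print(benchmark_extraction(out, gold))  # accuracy = 2/3
```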
Moreover, legal considerations regarding copyright for paywalled content and proprietary datasets further complicate deployment. Careful management of intellectual property rights is required as, in some cases at least, the onus is still on the user rather than the platform owner to determine the copyright implications for each publication accessed.
The community recognises that, while AI could substantially enhance the efficiency of literature screening and data extraction, full automation cannot replace the need for human oversight. This poses a number of challenges. Human resource requirements are shifting from generating work to reviewing work, and reviewing an AI-generated output is a very different skill from generating work from scratch.6 This is where a second LLM could come in, evaluating the outputs and pointing out the areas requiring human review6 (although, as previously discussed, there are inherent challenges in using LLMs to review their own work). Additionally, the human review of records and extraction of data carries a research benefit: reviewers learn about the research question and the body of evidence, which often informs and improves the review outputs.7 To overlook this is to oversimplify the role humans play in literature reviews, and possibly to exaggerate the benefits to be gained from using AI.
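A hedged sketch of that triage step: a second, ‘critic’ model scores each extraction and routes low-confidence items to a human expert. Here `critic_score` is a hypothetical placeholder for a prompt to a different model family (to reduce the correlated-error risk noted earlier):

```python
from typing import Iterator

def critic_score(extraction: str, source_text: str) -> float:
    """Hypothetical placeholder: a second LLM rates (0-1) how well an
    extracted value is supported by its source text."""
    raise NotImplementedError

def triage(items: list[tuple[str, str]],
           threshold: float = 0.7) -> Iterator[tuple[str, str]]:
    """Route low-scoring extractions to a human reviewer; even
    'auto-accepted' items should still be spot-checked by an expert."""
    for value, source in items:
        if critic_score(value, source) < threshold:
            yield value, "human review required"
        else:
            yield value, "auto-accepted (spot-check)"
```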
AI’s role in literature reviews is poised to grow, with ongoing developments aimed at addressing current limitations. Embracing structured prompting techniques, robust validation and audit trails will be essential to harness AI effectively in this domain. For organisations, the key takeaway is to balance technological innovation with adherence to validation standards, ensuring that automation complements human expertise. While AI is certainly set to make great strides in this area, describing literature reviews as the ‘low-hanging fruit’ application of AI in HEOR perhaps oversimplifies the work that still needs to be done: developing suitable AI tools, rigorously testing them, and keeping not just a human but an expert human in the loop, so that systematic reviews can still be relied upon as the gold standard of evidence generation in HEOR.
References
The Literature Reviews team at Costello Medical are actively working on developing and implementing AI tools to increase efficiency in literature reviews. If you would be interested in collaborating with us to test these on your literature review projects, or if you would like any further information on the themes presented above, please get in touch, or visit our Literature Reviews page to find out how our expertise can benefit you. Liz Lunn (Account Coordination Manager) created this article on behalf of Costello Medical. The views/opinions expressed are her own and do not necessarily reflect those of Costello Medical’s clients/affiliated partners.