From Promise to Practice: The Evolving Role of GenAI in HEOR

Generative artificial intelligence (genAI) has taken the world by storm over the past year, largely thanks to advances in large language models (LLMs) such as GPT-4 and LLaMA. Many industries are racing to adopt genAI into their workflows, and the field of HEOR is no different. As a result, ISPOR International 2024 was brimming with posters, issue panels, and symposia discussing the topic. Much like at ISPOR International and ISPOR EU in 2023, countless discussions explored the technology’s potential to revolutionize the field while highlighting the current limitations that must be overcome to best harness its capabilities.

Navigating the Hype Cycle

Feelings towards genAI in HEOR and market access can be described through the lens of the Gartner Hype Cycle, which charts attitudes towards a new technology over time. The cycle starts at the “Technology Trigger” – in this case, the release of genAI and LLMs. After this, visibility of the new technology increases until reaching the “Peak of Inflated Expectations.” As users begin to encounter the technology’s limitations firsthand, frustration sets in and the Hype Cycle descends into the “Trough of Disillusionment,” before climbing the “Slope of Enlightenment” and finally reaching the “Plateau of Productivity,” where the technology’s benefits are widely recognised and utilised.1,2

Considering most genAI tools initially gained popularity in late 2022, where are we in this Hype Cycle 18 months later? Judging by audience polls and the general sentiment of AI-related sessions throughout the conference, for most applications genAI appears to have rapidly ascended to the “Peak of Inflated Expectations,” driven by the surging popularity of LLMs and their promise to revolutionize various industries.1,3,4

Figure: The Gartner Hype Cycle

Inflated Expectations vs. Reality: Limitations of GenAI

The potential benefits of LLMs have been discussed in depth previously, so here we will instead focus on their current challenges: factuality (hallucination), consistency, and computational requirements. To understand these issues, let us first look at how LLMs are constructed.

Some AI models are trained using labelled data, e.g. a cat-recognition AI trained on cat images each tagged with the label “cat”. LLMs, on the other hand, are trained on vast amounts of data (often measured in petabytes for mainstream models) in the form of unlabelled text, i.e., unsupervised training without active human intervention. The model learns words and sentence structures by observing patterns in existing written text. When prompted, it generates responses by predicting the words most likely to come next, based on the examples it was trained on. The initial product of this first stage of training is a general-purpose model that can be applied across various fields of study, often referred to as a foundation model. A foundation model can then be fine-tuned into different expert models using domain-specific data to fit the needs of a specific field. For example, Me-LLaMA is an LLM fine-tuned with biomedical data such as claims and electronic health records (EHRs) that specialises in medical questions, disease diagnostics, and biomedical literature reviews.5
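
To make next-token prediction concrete, here is a minimal sketch using the small, open-source GPT-2 model via the Hugging Face transformers library. The model choice and prompt are our own illustrative assumptions, not anything from the sessions; larger foundation models work the same way at vastly greater scale.

```python
# Minimal sketch of next-token prediction, the mechanism underlying LLMs.
# GPT-2 is used here purely as a small, freely available stand-in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Health economics is the study of"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # one score per vocabulary token, per position

# The model's output is simply a probability distribution over possible next tokens
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r:>15}  p={prob:.3f}")
```

Nothing in this process checks the candidate continuations against reality; the model simply ranks what usually follows such words in its training data, which is the root of the factuality issues discussed next.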

Since the introduction of ChatGPT, LLMs have been accused of “hallucinating”: generating grammatically correct but factually incorrect responses. Because of the way LLMs are trained (learning sentence structures through observation), responses are generated from known sentence patterns rather than from any awareness of factual accuracy.

They [LLMs] are dream machines. We direct their dreams with prompts. The prompts start the dream and based on the LLM’s hazy recollection of its training documents, most of the time the result goes someplace useful. It’s only when the dreams go into deemed factually incorrect territory that we label it a “hallucination”. It looks like a bug, but it’s just the LLM doing what it always does.
– Andrej Karpathy, founding member of OpenAI6

LLMs also suffer from run-to-run inconsistency when generating results, for similar reasons. Furthermore, just because a pattern is observed in the data does not necessarily mean it is appropriate; for example, the prevalence of opioid prescriptions does not make opioids a one-size-fits-all solution for pain management.
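
The inconsistency stems from how responses are decoded: by default, most LLMs sample from the next-token distribution rather than always taking the single most likely word. The sketch below illustrates this with GPT-2 again; the prompt and decoding settings are illustrative assumptions, and note that making decoding deterministic does not make it more factual.

```python
# Minimal sketch of why outputs vary run to run: generation samples from a
# probability distribution over next tokens.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "The main driver of rising drug spending is"

# Sampling: two calls with the same prompt can return different continuations
for _ in range(2):
    out = generator(prompt, max_new_tokens=12, do_sample=True, temperature=1.0)
    print(out[0]["generated_text"])

# Greedy decoding always picks the single most likely next token, so repeated
# runs are identical; determinism, however, does not imply factual accuracy
out = generator(prompt, max_new_tokens=12, do_sample=False)
print(out[0]["generated_text"])
```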

State-of-the-art LLMs contain parameters measured in the trillions (GPT-4 is estimated to have over 1.7 trillion parameters), and the next generation is likely to grow even larger. The complexity and size of these models mean that we rely on powerful supercomputers to train and deploy LLMs, which may not be cost-effective for what we are currently trying to achieve. AI-focused compute clusters can be rented from cloud providers for training and inference, but these may introduce security and confidentiality concerns, especially for users dealing with sensitive medical data.
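
Some back-of-envelope arithmetic shows why this scale forces reliance on specialised hardware. The parameter count below is the unconfirmed GPT-4 estimate cited above, and bytes per parameter depends on the numeric precision assumed.

```python
# Rough arithmetic only; the parameter count is an unconfirmed public estimate
params = 1.7e12          # estimated GPT-4 parameter count
bytes_per_param = 2      # 16-bit (fp16/bf16) weights
weights_gb = params * bytes_per_param / 1e9

print(f"Weights alone: ~{weights_gb:,.0f} GB")                          # ~3,400 GB
print(f"80 GB GPUs needed just to hold them: ~{weights_gb / 80:.0f}")   # ~42
# Training needs several times more memory again for gradients and optimiser
# state, before activations or training data are even considered
```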

What’s Next?

While these models will continue to improve over the coming years, what can we use them for in the meantime?

A recent report from the Deloitte AI Institute suggests that genAI may be farther along the Hype Cycle for applications in content generation, highlighting the ability to generate evidence-based content, tailor information to specific audiences, and ensure compliance with guidelines and standards.7 At the AMCP Annual Meeting earlier this spring, we learned how essential the timely release of an up-to-date, comprehensive AMCP dossier can be in aiding health care decision makers (HCDMs) in their consideration of a drug, especially within the guidance of the newly released AMCP Format 5.0. The development timelines needed to publish AMCP dossiers promptly, and to keep them updated, can create resourcing challenges for both manufacturers and agencies. The current capabilities of genAI could be well suited to supporting rapid, evidence-based content development for these and other reimbursement dossiers, with an appropriate level of human oversight.

LLMs may not always be the best answer, or the best AI, for the problem at hand, but they can be integrated into the building process for other AI models to accelerate the development cycle. For example, in medical Q&A, most decision-making criteria (e.g. treatment eligibility, dosing schedules) are well defined in product labels and clinical guidelines. This type of IF/THEN relationship and decision-making reasoning can be modelled with symbolic AI, which is computationally less intensive and more transparent in its mechanism than an LLM, as in the sketch below. LLMs can then be used to prepare the necessary data by extracting key information from written materials, facilitating the development of symbolic AI models.8
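
Here is a minimal sketch of what such a symbolic model might look like. The drug, fields, and thresholds are entirely hypothetical and not taken from any real product label; in the workflow described above, an LLM would extract these criteria from the label text, while the rule engine itself stays cheap, deterministic, and auditable.

```python
# Minimal sketch of a symbolic (rule-based) eligibility check: explicit
# IF/THEN criteria that are transparent and inexpensive to run.
from dataclasses import dataclass

@dataclass
class Patient:
    age: int
    egfr: float              # renal function, mL/min/1.73m2 (hypothetical criterion)
    on_anticoagulant: bool

def eligible_for_drug_x(p: Patient) -> tuple[bool, str]:
    """Each IF/THEN rule mirrors a criterion an LLM could extract from a label."""
    if p.age < 18:
        return False, "not approved for patients under 18"
    if p.egfr < 30:
        return False, "contraindicated in severe renal impairment"
    if p.on_anticoagulant:
        return False, "interaction: concurrent anticoagulant use"
    return True, "meets all label criteria"

print(eligible_for_drug_x(Patient(age=54, egfr=72.0, on_anticoagulant=False)))
print(eligible_for_drug_x(Patient(age=54, egfr=25.0, on_anticoagulant=True)))
```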

While these kinds of current use cases for genAI were not as frequently discussed at ISPOR 2024, the concept of human involvement arose at nearly every genAI-related session. Dubbed “human in the loop,” the practice of incorporating human expertise and intervention within an AI-driven workflow is necessary so that critical outputs – such as dossier content or data extractions – are reviewed, validated, and improved by a person, ensuring accuracy, reliability, and high quality.1,3,4 Given the complexity and sensitivity of the data we encounter in our work, human involvement is critical to provide contextual insights and domain-specific knowledge that genAI might overlook, enhancing the quality and integrity of the output.
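
In workflow terms, “human in the loop” amounts to a review gate between generation and release. The sketch below is one illustrative way to structure such a gate; the statuses and review function are our own assumptions rather than a description of any specific tool.

```python
# Minimal sketch of a human-in-the-loop review gate: genAI drafts are held
# until an expert explicitly approves them.
from dataclasses import dataclass, field

@dataclass
class Draft:
    text: str
    status: str = "pending_review"
    notes: list[str] = field(default_factory=list)

def human_review(draft: Draft, approved: bool, note: str) -> Draft:
    """Record the reviewer's decision; nothing is released without approval."""
    draft.notes.append(note)
    draft.status = "approved" if approved else "needs_revision"
    return draft

draft = Draft(text="GenAI-drafted dossier section...")
draft = human_review(draft, approved=False, note="verify efficacy figures against the CSR")
print(draft.status, draft.notes)
```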

Here at Costello Medical, we are actively investigating the use of genAI in our work and have started to collaborate with clients to incorporate these technologies into projects. Early applications include the genAI-assisted development of summary content to distil complex papers and posters into an easily digestible format, as well as genAI-assisted literature searching to find impactful scientific publications to cite. We are also using genAI to accelerate internal processes and reduce administrative burden, in applications ranging from email-drafting assistance to incorporating genAI into our internal apps.

While we watch genAI progress through the Hype Cycle, it will be critical to experiment with practical applications to find immediate utility and fuel the continued use of these tools in our field. As the HEOR community continues to explore and shape the trajectory of genAI, the collaborative effort between technologists and HEOR experts will be crucial. The coming years will undoubtedly witness significant advancements in how genAI is applied within HEOR, and we are excited to continue engaging with these technologies to figure out how they can best be put to use.

References

  1. Educational Symposium Session 134. Harnessing the Power of AI: Revolutionizing HEOR and Market Access. ISPOR International Congress, Atlanta, Georgia, 2024.
  2. Gartner Hype Cycle Research Methodology. Available here. Last accessed May 16, 2024.
  3. Issue Panel Session 103. Emerging Landscape of Health Economic Evaluation in the Era of Generative AI. ISPOR International Congress, Atlanta, Georgia, 2024.
  4. Symposium Session 235. From Hype to Reality: Applications of Generative AI in HEOR and Market Access. ISPOR International Congress, Atlanta, Georgia, 2024.
  5. Breakout Session 206. The Future of Data-Driven HEOR Decision-Making Powered by Generative AI: How Soon Is Now? ISPOR International Congress, Atlanta, Georgia, 2024.
  6. Karpathy A. On the “hallucination problem”. Twitter/X, 2023. Available here.
  7. Deloitte AI Institute. The Consumer Generative AI Dossier. Available here. Last accessed May 16, 2024.
  8. Educational Symposium Session 105. Accelerating Evidence Generation and Time to Insight With Clinical AI. ISPOR International Congress, Atlanta, Georgia, 2024.

Fittingly, genAI was used to assist in the development of this post. If you would like any further information on the themes presented above, please do not hesitate to contact Alex Emerson, Senior Analyst (LinkedIn) or Blake Liu, Health Economist (LinkedIn). Alex and Blake are employees at Costello Medical. The views/opinions expressed are their own and do not necessarily reflect those of Costello Medical’s clients/affiliated partners.