From Promise to Practice: The Evolving Role of GenAI in HEOR
Generative artificial intelligence (genAI) has taken the world by storm over the past year, largely thanks to the advancement of large language models (LLMs) such as GPT-4 and LLaMA. Many industries are racing to adopt genAI into their workflows, and the field of HEOR is no different. As a result, ISPOR International 2024 was brimming with posters, issue panels, and symposia discussing the topic. Much like at ISPOR International and ISPOR EU in 2023, countless discussions explored the technology’s potential to revolutionize the field while highlighting the current limitations that must be overcome to best harness its capabilities.
Navigating the Hype Cycle
Feelings towards genAI in HEOR and market access can be described through the lens of the Gartner Hype Cycle, which charts attitudes towards a new technology over time. The cycle starts at the “Technology Trigger” – in this case, the release of genAI and LLMs. After this, visibility of the new technology increases until it reaches the “Peak of Inflated Expectations.” As users begin to encounter the technology’s limitations firsthand, frustration can set in, and the Hype Cycle descends into the “Trough of Disillusionment” before climbing the “Slope of Enlightenment” and finally reaching the “Plateau of Productivity,” where its benefits are widely recognized and utilized.1, 2
Considering that most genAI tools initially gained popularity in late 2022, where are we in this Hype Cycle 18 months later? Judging by audience polls and the general sentiment of AI-related sessions throughout the conference, for most applications genAI appears to have rapidly ascended to the “Peak of Inflated Expectations,” driven by the increased popularity of LLMs and their promise of revolutionizing various industries.1, 3, 4
Inflated Expectations vs. Reality: Limitations of GenAI
The potential benefits of LLMs have been discussed in depth previously, so instead we will focus on the current challenges with LLMs: factuality (hallucination), consistency, and computational requirements. To understand these issues, let us first look at how LLMs are constructed.
Some AI models are trained using labelled data, e.g., a cat-recognition AI trained on cat images with a label “cat” attached to them. LLMs, on the other hand, are trained on a large amount of data (often measured in petabytes for mainstream models) in the form of texts and sentences without any labelling attached, i.e., unsupervised training without active human intervention. The program learns words and sentence structures by observing patterns in existing written sentences. When prompted, it generates responses by filling in gaps in sentences based on the example texts it was trained on. The initial product of this first stage of training is a model that can understand and be generalised across various fields of study, often referred to as a foundation model. A foundation model can then be fine-tuned into different expert models using domain-specific data to fit the needs of a specific field. For example, Me-LLaMA is an LLM fine-tuned with biomedical data such as claims and electronic health records (EHR) that specializes in medical questions, disease diagnostics, and biomedical literature reviews.5
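The idea of learning from raw, unlabeled text and then “filling in the gap” after a prompt can be illustrated with a deliberately tiny sketch. This is not how a real LLM is implemented (LLMs use neural networks over vast corpora, not word counts); it is a toy bigram model over a made-up corpus, intended only to make the unsupervised-training intuition concrete. All names and text below are hypothetical.

```python
import random
from collections import defaultdict

# A toy "training corpus": raw text with no labels attached.
corpus = (
    "the model reads the text . the model learns the patterns . "
    "the patterns guide the next word ."
).split()

# Unsupervised "training": simply record which word follows which.
next_words = defaultdict(list)
for current, following in zip(corpus, corpus[1:]):
    next_words[current].append(following)

def generate(prompt_word, length=5, seed=0):
    """Continue a prompt by repeatedly sampling an observed next word."""
    rng = random.Random(seed)
    out = [prompt_word]
    for _ in range(length):
        candidates = next_words.get(out[-1])
        if not candidates:  # no observed continuation; stop
            break
        out.append(rng.choice(candidates))
    return " ".join(out)

print(generate("the"))
```

A real foundation model does the analogous thing at vastly greater scale and with learned representations rather than lookup tables, which is what allows it to generalise; fine-tuning then reweights those learned patterns with domain-specific text.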