The Promise and Pitfalls of Generative AI for Health Economics

Artificial intelligence (AI) is a rapidly evolving field that has the potential to transform health economics (HE) in various ways. At the ISPOR Europe 2023 conference in Copenhagen, several sessions and poster presentations showcased innovative applications of AI in HE.

One of the most intriguing and tangible examples of AI use in HE was a podium presentation that demonstrated how a large language model (LLM) can be used directly in HE model development.1 GPT-4 was able to reliably replicate results (to within 1–10% for the incremental cost-effectiveness ratio [ICER]) from an existing 3-state partitioned survival model for non-small cell lung cancer (NSCLC) by modifying a generic shell R script using around 35 prompts.
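For readers less familiar with the model type mentioned above, the following is a minimal Python sketch of a 3-state partitioned survival model (progression-free, progressed, dead) and the ICER it produces. It is not the model or the R script from the presentation. The exponential survival curves, the function names (`state_occupancy`, `run_psm`, `icer`) and every rate, cost, utility and discount value are hypothetical placeholders chosen purely for illustration.

```python
import math


def state_occupancy(os_rate, pfs_rate, t):
    """Return (progression-free, progressed, dead) proportions at time t.

    Assumes exponential overall survival (OS) and progression-free
    survival (PFS) curves; a real model would use fitted curves.
    """
    os_t = math.exp(-os_rate * t)
    pfs_t = min(math.exp(-pfs_rate * t), os_t)  # PFS cannot exceed OS
    return pfs_t, os_t - pfs_t, 1.0 - os_t


def run_psm(os_rate, pfs_rate, cost_pf, cost_pd, util_pf, util_pd,
            horizon_years=10, cycle_years=1 / 12, discount=0.035):
    """Return (discounted cost, discounted QALYs) for one treatment arm.

    Costs are annual costs per health state; the dead state accrues
    neither costs nor QALYs. No half-cycle correction is applied.
    """
    n_cycles = round(horizon_years / cycle_years)
    total_cost = 0.0
    total_qalys = 0.0
    for i in range(n_cycles):
        t = i * cycle_years
        pf, pd, _dead = state_occupancy(os_rate, pfs_rate, t)
        factor = cycle_years / (1 + discount) ** t  # cycle length x discounting
        total_cost += factor * (pf * cost_pf + pd * cost_pd)
        total_qalys += factor * (pf * util_pf + pd * util_pd)
    return total_cost, total_qalys


def icer(cost_new, qalys_new, cost_old, qalys_old):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    delta_qalys = qalys_new - qalys_old
    if delta_qalys == 0:
        raise ValueError("ICER is undefined when incremental QALYs are zero")
    return (cost_new - cost_old) / delta_qalys


# Hypothetical inputs for two arms (illustrative values only).
comparator = run_psm(os_rate=0.60, pfs_rate=1.20, cost_pf=10_000,
                     cost_pd=15_000, util_pf=0.75, util_pd=0.60)
intervention = run_psm(os_rate=0.45, pfs_rate=0.80, cost_pf=40_000,
                       cost_pd=15_000, util_pf=0.75, util_pd=0.60)

result = icer(*intervention, *comparator)
print(f"ICER: {result:,.0f} per QALY gained")
```

The defining feature of the approach is that state occupancy is read directly from the survival curves (progressed = OS minus PFS) rather than from transition probabilities, which is why the whole model fits in a few short functions.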

However, challenges remain in the application of AI to HE. Random variability in model design is inherent to the nature of LLMs, as rerunning the same series of prompts can generate different models. The above session highlighted this: when GPT-4 was run 15 times, 15 distinct models were generated.1 Outcomes may therefore not be reproducible, which represents an additional category of uncertainty when using AI-generated models. Furthermore, AI-produced models are unlikely to be completely accurate; several of the 15 models contained at least one error. Another common problem with LLMs is that of ‘hallucinations’, where the LLM generates incorrect or misleading data, parameters, assumptions, or results. If not checked and remedied by humans, these may bias the model or undermine its validity and reliability.
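One simple way to quantify this run-to-run variability is to compare the ICER from each regenerated model against the ICER of the reference model. The sketch below assumes a 10% tolerance (the upper end of the range quoted above) and uses a hypothetical helper, `summarise_runs`. The ICER values are made-up placeholders, not results from the presentation.

```python
from statistics import mean, stdev


def relative_deviation(value, reference):
    """Absolute deviation from the reference, as a proportion of it."""
    return abs(value - reference) / abs(reference)


def summarise_runs(run_icers, reference_icer, tolerance=0.10):
    """Summarise how far repeated model runs stray from a reference ICER."""
    deviations = [relative_deviation(v, reference_icer) for v in run_icers]
    return {
        "mean_icer": mean(run_icers),
        "sd_icer": stdev(run_icers),
        "max_deviation": max(deviations),
        # Zero-based indices of runs whose ICER is outside the tolerance
        "runs_outside_tolerance": [
            i for i, d in enumerate(deviations) if d > tolerance
        ],
    }


# Placeholder values for illustration only (not real study results).
reference_icer = 50_000
run_icers = [50_500, 49_000, 54_000, 56_000, 44_000]

summary = summarise_runs(run_icers, reference_icer)
print(summary["runs_outside_tolerance"])  # runs needing human review
```

A check of this kind only flags divergent outputs; it does not identify the coding error or hallucinated input behind them, so human review of flagged (and unflagged) models remains necessary.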

The challenges around transparency of AI-generated HE models, highlighted by several speakers and panellists, are also an important consideration.2 Currently, the vast majority of economic models used in health technology assessment (HTA) are produced in Microsoft Excel, a platform that users generally find easy to interpret and navigate. However, generative AI is far better suited to developing models in R or Python. Despite being heralded as the future of HE modelling, R- or Python-based models have been relatively slow to gain widespread traction in the industry, largely as a result of challenges to transparency and usability for end users. AI-generated models are likely to face this same barrier. However, in addition to generating HE model code, LLMs such as GPT could help users who are unfamiliar with the coding language to interpret existing R- or Python-based models.

A natural further question is whether R or Python should be the preferred language for LLM-generated models, or whether both could become widely used. Python arguably has more potential in AI: it has a rich ecosystem of libraries and packages for AI and generative modelling, and is generally faster and more scalable than R, especially when dealing with large and complex data sets and models. These benefits were demonstrated by the patient simulation model utilising SimPy presented in a podium presentation at ISPOR 2023; however, this model did not directly produce HE outcomes such as ICERs.3 Conversely, R has many specialised health economics and domain-specific packages. These existing packages are likely to improve LLMs’ ability to tackle the nuanced features of health economic modelling, and will remain a clear advantage of R until similar features are available for Python users.

The potential benefits of LLMs for HE are clear – in particular, their ability to synthesise and interpret vast quantities of information rapidly – and these are becoming ever more apparent with the continual improvement of LLMs. However, the barrier to immediate adoption remains high; AI might not be the silver bullet in HE just yet. As noted by the speakers in Session 114, the focus should be on using AI to augment the capabilities of health economists, improving the efficiency of model generation and allowing rapid validation or interpretation of existing models.4

For more on the ISPOR learnings on the use of generative AI in HEOR more broadly, please see our commentary “AI in HEOR: Pathways, Challenges and Future Directions”.


  1. Session 225. P1 Automating Economic Modelling: A Case Study of AI’s Potential With Large Language Models. ISPOR Europe Congress, Copenhagen, Denmark, 2023.
  2. Session 114. Artificial Intelligence and Machine Learning Tools at the Heart of Future NICE Evaluations: A Prospect or a Pipe Dream? ISPOR Europe Congress, Copenhagen, Denmark, 2023.
  3. Session 225. P4 Testing SimPy, a Library for Patient-Level Simulations in Python: An Application to Resource Management in Healthcare Systems. ISPOR Europe Congress, Copenhagen, Denmark, 2023.
  4. Session 114. Artificial Intelligence and Machine Learning Tools at the Heart of Future NICE Evaluations: A Prospect or a Pipe Dream? ISPOR Europe Congress, Copenhagen, Denmark, 2023.

If you would like any further information on the themes presented above, please do not hesitate to contact Jack Smith-Tilley, Senior Health Economist or Natalie Hearmon, Global Head of Health Economics and Statistics. Jack Smith-Tilley and Natalie Hearmon are employees at Costello Medical. The views/opinions expressed are their own and do not necessarily reflect those of Costello Medical’s clients/affiliated partners.