Catalyst for Change or Status Quo? Reflections on TSD 26 on Expert Elicitation

In March 2025, the National Institute for Health and Care Excellence (NICE) Decision Support Unit (DSU) published Technical Support Document (TSD) 26. With a particular focus on oncology, TSD26 provides guidance on the application of expert elicitation to address the challenges faced in health technology assessments (HTAs) that require extrapolation of survival data over an extended modelled time horizon.

The use of structured expert elicitation (SEE) to validate survival extrapolations in oncology is not yet commonplace in HTA, with choice of extrapolation typically informed by statistical and visual fit to available data. Assessment of external validity through consultation with clinical experts is common, but this tends to take the form of individual consultations or advisory boards, focusing on retrospective validation of existing extrapolations or elicitation of subjective ‘best estimates’ of survival. Retrospective validation is inherently subjective and prone to personal bias, as highlighted in TSD21.

At Costello Medical, we have a long track record of published research focusing on methods for survival extrapolation, and we have always advocated for the use of structured methods to validate selection of survival outcomes in oncology.1-10 Only four of 35 appraisals reviewed in TSD26 were identified as using SEE for survival outcomes, and Costello Medical supported three out of the four appraisals using structured approaches. It should be noted that these elicitation exercises were conducted under high time-pressure and within limited budgets, necessitating modification of published elicitation protocols. Beyond our work supporting SEE for HTA submissions, we have also conducted SEE for survival outcomes using formal methods such as the Sheffield Elicitation Framework (SHELF).

We therefore welcome the publication of more detailed guidelines on SEE for survival outcomes, but we understand the challenges faced by manufacturers that may limit their implementation. A key challenge is timeline pressure, particularly the availability of clinical data often close to submission deadlines; these data are required for inclusion in the evidence dossier shared with experts. Structured elicitation exercises are also substantially more resource-intensive than the more pragmatic approaches outlined above, requiring more preparation and more time from experts. Manufacturers are increasingly facing budget pressures in the UK setting, and the added value of structured methods in terms of HTA outcomes remains unclear.

Based on our experience conducting SEE for survival outcomes, and supporting over 190 HTA submissions in oncology indications, we have reflected on the recommendations provided in TSD26 and shared our perspectives below.

Why Have the NICE DSU Published This TSD?

TSD26 provides detailed guidelines on the use of SEE for technology appraisals (TAs) in oncology, building on the recommendations from TSD14 and TSD21 regarding model-based extrapolations. A review within TSD26 highlighted that few NICE submissions so far have employed methods of structured elicitation, with general consultation remaining the preferred method of elicitation. General consultation approaches are often criticised by healthcare decision-makers, as they typically rely on retrospective validation of existing survival curves or elicitation of subjective ‘best estimates’ of survival, which are prone to bias and do not adequately capture uncertainty. TSD26 advocates for the broader adoption of SEE to improve methodological rigour in HTA submissions. In providing more clarity on best practices for conducting SEE, it follows in the footsteps of updated guidance from the Zorginstituut Nederland (ZIN) last year, without going as far as making quantitative expert elicitation obligatory for NICE submissions.11

What Are TSD26’s Key Recommendations?

Oakley et al. suggest that established protocols for eliciting probability distributions from experts in a structured manner should form the basis of how elicitation exercises are conducted, such as the example SHELF protocol below.

[Figure: SHELF adapted protocol]

TSD26 makes the following recommendations for conducting SEE for long-term survival outcomes. These recommendations should be used by manufacturers when submitting appraisals and by NICE external assessment groups (EAGs) when evaluating the validity and credibility of elicitation exercises.

  • The quantities of interest and target population should be clearly defined at the outset, ensuring that they are aligned with the decision problem and positioning
  • A variety of standard SEE protocols are recommended, including Cooke’s classical method,12 Delphi,13 Investigate, Discuss, Estimate, Aggregate (IDEA),14 the Medical Research Council (MRC) reference protocol15 and SHELF16
  • Delphi methods are only recommended as SEE if they are conducted via one-to-one interviews
  • A minimum of three experts is recommended such that stakeholders can have confidence in the use of SEE, ensuring a breadth of experience is captured across the group
  • Experts who have knowledge of the trial data can be included; however, those who have previously seen model-based extrapolations of the data should be excluded, if feasible
  • An evidence dossier should be prepared and circulated ahead of the elicitation exercise to ensure the experts are equally informed
  • Recommended content to present in the dossier includes:
    • Decision problem and clear definitions for the quantities of interest
    • Kaplan–Meier plots of the survival data with confidence intervals and number of patients at risk and trends in the empirical hazard
    • Prognostic patient characteristics
    • General population mortality data
    • Key external study data
  • Model-based extrapolations should not be included within the dossier so as to avoid “anchoring effects” amongst the experts
  • Experts should be trained in making probability judgements and in understanding survivor and hazard functions ahead of the elicitation exercise
  • A practice elicitation exercise involving survival extrapolation should be conducted
  • A probability distribution should be obtained to quantify expert uncertainty about the value of the survival function at a specific time point
  • Experts should also be asked for qualitative opinions about how the hazard may change over time
    • Use of a ‘hazard checklist’ is recommended to encourage experts to consider factors that might influence the hazard over the extrapolation period such as patient characteristics, disease progression, and treatment mechanisms; the checklist should guide a structured discussion for the experts to provide qualitative judgements on how the hazard might change over time
    • Scenario testing could also be used, in which experts are asked to make initial judgements about the survivor function before being presented with scenarios and then asked again to provide a judgement
  • Distributions for survivor function values at additional time points may be elicited, noting whether the experts are drawing on additional specific knowledge or whether the judgements are based primarily on expectations of a ‘smooth-looking’ survival curve
  • Appropriate survival extrapolations should be identified by firstly removing any with poor statistical fit, in line with the recommendations in TSDs 14 and 21, followed by excluding any with hazard functions in conflict with the qualitative expert judgements, and finally comparing with the quantitative expert survival estimates
    • The experts’ uncertainty should be considered rather than interpreting based on point estimates from the distribution (e.g., the median value)
  • Cost-effectiveness results should be presented for all relevant models that are consistent with data and expert judgement
  • Alternative approaches of synthesising expert opinion and data within a Bayesian framework are also described within the TSD
  • Full details of the design and conduct of the elicitation should be described to facilitate independent replication
  • Both individual and aggregated judgements should be reported for all elicitations
  • Details of the expert recruitment process, including experts’ names, expertise, and conflicts of interest, should be fully disclosed, whilst individual expert judgements or qualitative statements should be anonymised

Three Key Takeaways

1. As a minimum, a probability distribution quantifying expert uncertainty at a single time point should be elicited from experts

The primary recommendation is to elicit probability distributions from clinical experts, so that expert uncertainty is quantified, and experts are not merely asked to provide ‘best estimates’ or approve the clinical validity of pre-selected model-based survival extrapolations.

This should be elicited in line with structured expert elicitation protocols, such as modified Delphi methods or SHELF, which have been modified to reflect the unique challenges associated with elicitation of long-term survival estimates.

The timepoint at which survival is elicited should be carefully chosen: it should not be too close to the latest available timepoint of trial data, nor at a point where the proportion of survivors is likely to be negligibly small. The candidate parametric curves should also have diverged sufficiently by the chosen timepoint that not all models would be plausible given the experts’ judgements.

It is also possible to elicit probability distributions at multiple timepoints; however, an expert’s survival estimate at the second timepoint is likely to depend on their assumed estimate at the first, and this dependence can influence the outputs. A joint probability distribution that accounts for the dependence would therefore be required, and the TSD states a preference for eliciting judgements at a single timepoint.
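The criteria for choosing an elicitation timepoint can be illustrated with a minimal Python sketch. It is not taken from TSD26: the three parametric survival functions, their parameter values, the follow-up duration and the thresholds (`min_gap`, `min_spread`, `min_survival`) are all hypothetical values chosen purely for illustration.

```python
import math

# Hypothetical fitted survival functions; parameter values are illustrative only.
MODELS = {
    "exponential": lambda t: math.exp(-0.35 * t),
    "weibull": lambda t: math.exp(-((t / 3.0) ** 1.3)),
    "log-logistic": lambda t: 1 / (1 + (t / 2.0) ** 1.6),
}


def screen_timepoints(models, timepoints, last_follow_up,
                      min_gap=2.0, min_spread=0.05, min_survival=0.02):
    """Flag candidate timepoints (in years) against three assumed checks.

    min_gap      : required distance beyond the latest trial data
    min_spread   : required divergence between the highest and lowest curve
    min_survival : floor below which survival is treated as negligible
    """
    results = []
    for t in timepoints:
        predictions = [survival(t) for survival in models.values()]
        spread = max(predictions) - min(predictions)
        suitable = (
            t - last_follow_up >= min_gap          # not too close to trial data
            and spread >= min_spread               # curves have diverged
            and max(predictions) >= min_survival   # survivors not negligible
        )
        results.append({"t": t, "spread": spread, "suitable": suitable})
    return results


# Hypothetical example: three years of trial follow-up.
for row in screen_timepoints(MODELS, [4, 5, 10, 20], last_follow_up=3.0):
    print(row["t"], round(row["spread"], 3), row["suitable"])
```

In this hypothetical example only the 10-year timepoint passes all three checks; in practice the choice would also rest on clinical judgement rather than on numerical thresholds alone.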

Our comments:

Whether taking a pragmatic or structured approach to expert elicitation, at Costello Medical, we always recommend quantifying uncertainty around clinician estimates of survival. At a minimum, in addition to ‘best estimates’, we suggest eliciting ‘highest plausible’ and ‘lowest plausible’ limits, where clinicians would judge it to be extremely unlikely that the true value of survival could be higher or lower than these values, respectively. TSD26 presents an example protocol using the quartile method, in which upper and lower quartiles are elicited in addition to a median value and upper and lower plausible limits, facilitating generation of a probability distribution. This is likely to provide more robust information with which to select survivor functions, but extends the time required for the elicitation exercise.
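One simple way to turn elicited quartiles into a probability distribution is sketched below. As an assumption made for brevity, it fits a normal distribution on the logit scale to a median and two quartiles. This is not the fitting approach prescribed by TSD26, and tools such as SHELF can fit a range of candidate distributions; the function names and the elicited values shown are hypothetical.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def fit_logit_normal(lower_quartile, median, upper_quartile):
    """Fit a normal distribution on the logit scale to three elicited values.

    Assumption: the expert's uncertainty about a survival probability is
    roughly symmetric on the logit scale. The median fixes the mean, and the
    interquartile range fixes the standard deviation.
    """
    if not 0 < lower_quartile < median < upper_quartile < 1:
        raise ValueError("values must be ordered and lie strictly between 0 and 1")
    z75 = NormalDist().inv_cdf(0.75)  # standard normal upper quartile
    mu = logit(median)
    sigma = (logit(upper_quartile) - logit(lower_quartile)) / (2 * z75)
    return NormalDist(mu, sigma)


def survival_quantile(dist, prob):
    """Return the survival probability at a given quantile of the fitted distribution."""
    return expit(dist.inv_cdf(prob))


# Hypothetical judgements for survival at the chosen timepoint.
fitted = fit_logit_normal(lower_quartile=0.10, median=0.15, upper_quartile=0.22)
low = survival_quantile(fitted, 0.025)
high = survival_quantile(fitted, 0.975)
print(f"Central 95% interval for survival: {low:.3f} to {high:.3f}")
```

The resulting interval, rather than the median alone, could then be compared against the survival predicted by each candidate extrapolation.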

2. Qualitative opinions on hazard function trends during the extrapolation period should also be elicited

The TSD also recommends incorporating expert qualitative opinions on hazard function trends during the extrapolation period to enhance the accuracy and credibility of survival models. It notes that this may assist both with choosing between survival models and with checking for internal consistency in an expert’s judgement. This approach also enables a clearer picture of the full survival curve to be elicited, avoiding the issues of dependence that arise when eliciting inputs across multiple timepoints. The TSD suggests two methods by which this can be elicited: a recommended hazard checklist and optional scenario testing (see further details in the key recommendations above).

This qualitative input helps in understanding potential hazard increases or decreases over time, and the TSD recommends a step-wise approach to selecting curves by firstly ruling out inappropriate curves based on statistical fit data, and then considering qualitative input to further rule out any incompatible survival curves before considering quantitative probability distributions from the experts.

Taking this approach ensures the extrapolated survival functions align closely with expert insights and clinical realities.
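The step-wise selection described above could be sketched as follows. This is an illustrative outline only: the candidate models, AIC values, hazard shape labels, predicted survival values and elicited interval are hypothetical, and both the AIC tolerance used to define ‘poor statistical fit’ and the use of an elicited credible interval as the comparison rule are assumptions rather than criteria specified in TSD26.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    aic: float            # statistical fit to the observed data
    hazard_shape: str     # long-term hazard trend implied by the fitted model
    survival_at_t: float  # predicted survival at the elicitation timepoint


def select_models(candidates, plausible_shapes, elicited_interval, aic_tolerance=5.0):
    """Apply the three filtering steps in order and return each stage."""
    # Step 1: drop models with poor statistical fit (assumed rule: AIC more
    # than `aic_tolerance` points above the best-fitting model).
    best_aic = min(c.aic for c in candidates)
    step1 = [c for c in candidates if c.aic - best_aic <= aic_tolerance]

    # Step 2: drop models whose hazard trend conflicts with the experts'
    # qualitative judgements.
    step2 = [c for c in step1 if c.hazard_shape in plausible_shapes]

    # Step 3: compare with the quantitative judgements, using the elicited
    # interval rather than a single point estimate.
    lower, upper = elicited_interval
    step3 = [c for c in step2 if lower <= c.survival_at_t <= upper]
    return step1, step2, step3


# Hypothetical candidate extrapolations.
candidates = [
    Candidate("exponential", 512.0, "constant", 0.04),
    Candidate("weibull", 505.1, "increasing", 0.02),
    Candidate("log-logistic", 503.4, "increasing then decreasing", 0.14),
    Candidate("log-normal", 503.0, "increasing then decreasing", 0.17),
    Candidate("generalised gamma", 504.8, "decreasing", 0.26),
]

_, _, retained = select_models(
    candidates,
    plausible_shapes={"decreasing", "increasing then decreasing"},
    elicited_interval=(0.08, 0.24),
)
print([c.name for c in retained])
```

Returning each intermediate stage keeps the reasons for excluding a model transparent, which supports reporting cost-effectiveness results for every model that remains consistent with the data and expert judgement.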

Our comments:

Whilst the guidance from NICE outlined above is a useful contribution to recommended best practices, several challenges remain unaddressed by the latest guidance:

  • It remains challenging for clinicians to provide qualitative feedback on interventions that might be associated with complex hazard functions (for example, where flexible spline models or mixture-cure models [MCMs] may be more appropriate approaches to extrapolation). This is particularly challenging for curative therapies, where hazard trends must be considered for two distinct populations simultaneously
  • TSD26 also does not address the challenging situation where qualitative and quantitative estimates from the experts are directly contradictory; for example, where an exponential function is favoured based on the description of hazards but substantially underestimates survival relative to the distribution elicited at the selected timepoint

3. Anchoring effects should be avoided by not presenting model-based extrapolations before experts provide their judgments

Anchoring occurs when experts are influenced by initial information or estimates, affecting their judgments. For example, if experts see an extrapolated survival curve from a model before making their own estimates, they might unconsciously base their judgments on this starting point, even if they intend to provide an independent assessment.

To mitigate this effect, the TSD recommends not showing model extrapolations to experts before they provide their own estimates. This practice helps ensure that experts’ judgments are based on their own knowledge and not biased by previous models.

Our comments:

Whether taking a pragmatic or structured approach to expert elicitation, at Costello Medical, we always recommend that our clients withhold model-based extrapolations from experts until after the experts have provided their judgements. Avoiding the risk of anchoring effects is beneficial regardless of the elicitation method used.

Looking Forwards: Will TSD 26 Move the Needle?

Time and resource constraints are likely to remain a barrier to adoption

Having supported our clients with multiple different structured approaches to expert elicitation, we know that these approaches are the gold standard, producing the most robust estimates of long-term survival. That being said, in line with the DSU’s findings, we haven’t seen much uptake across submissions we have supported. Only four of 35 appraisals reviewed in TSD26 were identified as using SEE for survival outcomes, and Costello Medical supported three of those four appraisals.

More ‘pragmatic’ methods of elicitation tend to be less resource- and time-intensive, and as a result remain preferred by manufacturers, particularly whilst the added value of structured methods remains unclear; use of more pragmatic methods has not prohibited successful reimbursement from NICE to date. That being said, the most time-consuming step for manufacturers exploring any form of validation or elicitation is contracting the participating experts – the time requirements for structured methods of elicitation can therefore be similar to those of traditional advisory boards. It should be noted, however, that these structured elicitation exercises, given their narrow focus, typically don’t substitute for the breadth of opinion provided by a traditional advisory board, instead representing an additional exercise that would need to be considered.

As such, we appreciate that the combination of time and resource constraints may necessitate ‘pragmatic’ methods of elicitation – in such circumstances, we would urge manufacturers to bear in mind our key recommendations above:

  • Relevant clinical evidence should be provided to experts to characterise the population and support their judgements, as well as appropriate training regarding the elicitation process
  • Model-based extrapolations should not be presented to avoid anchoring effects
  • Effort should be made to quantify uncertainty around clinician estimates of survival

We have found that these steps can be accommodated within standard approaches to clinical validation without significant time or resource implications. Exercises directly aligned with TSD26 are likely to add the most value where long-term survival is a particularly important driver of uncertainty, for example where survival data are very immature.

A clear demonstration of the added value for HTA outcomes is required

It remains to be seen whether the publication of this guidance will result in a shift towards greater use of structured approaches. For this to change, there would need to be a clear indication that these approaches have a positive impact, either on the time to reimbursement (by reducing post-submission timelines or avoiding managed access) or through a reduction in uncertainty that is reflected in NICE’s decision-making in a tangible way, for example when determining the appropriate willingness-to-pay threshold. Until that impact becomes tangible, it may continue to be challenging for manufacturers to justify the additional resource and time requirements of structured approaches.

References

  1. Micallef J, New E, Satija A, et al. Capturing The Value Of Potentially Curative Oncology Therapies: Lessons From The Use And Acceptance Of Cure Modelling Assumptions In NICE Technology Appraisals. Presented at ISPOR Europe 2021.
  2. Liu BL, Griffiths M. Adoption of Piecewise Modelling: A Review of NICE Health Technology Appraisals in Oncology. Presented at ISPOR 2022.
  3. Davies C, Emerson A, Porteous A. Are Landmark Survival Models Accepted in National Institute for Health and Care Excellence Health Technology Evaluations? Presented at ISPOR Europe 2022.
  4. Porteous A, van Hest N, Curteis T, et al. PCN4 Accuracy of Life Year Gain Predictions for Nivolumab Monotherapy in the Long Term: An Analysis Across Four Indications. Value in Health 2020;23:S22.
  5. Porteous A, Herbert K, Painter C. PCN20 Accurate Predictions Of Life Year Gains For Immuno-Oncology Therapies In The Long Term? An Analysis Based On Published Checkmate 057 Nivolumab Data. Value in Health 2019;22:S438.22.
  6. Davies C, Liu BL. P14 A Case Study Using Keynote-024 to Examine the Impact of Cut-Point Selection on Long-Term Survival Estimates from Piecewise Modeling. Oral Presentation. Presented at ISPOR Europe 2023.
  7. Porteous A, Hilton B, Gregori D. P49 Accuracy of Life Year Gains Predictions for CAR-T Therapy in the Long Term: An Analysis for Axicabtagene Ciloleucel in Refractory Large B-Cell Lymphoma. Oral Presentation. Presented at ISPOR Europe 2021.
  8. Harrington H, Madueke S, Sodiwala TA, et al. Forecasting the Long-Term Treatment Effect Duration of Immuno-Oncology Therapies: An Analysis of the Predictive Accuracy of Treatment Waning Methods Applied to Pembrolizumab in Non-Small Cell Lung Cancer. Presented at ISPOR Europe 2022.
  9. Harrington H, Madueke S, Vasilyeva AV, et al. Duration and Timing of Treatment Waning: Determining the Start and Stop Points Using Pembrolizumab in NSCLC as a Case Study. Presented at ISPOR 2023.
  10. Micallef J, Harrington H, van Hest N. When Does a Treatment Effect Really Stop? Exploration of Different Methods for Modelling Treatment Waning. Presented at ISPOR Europe 2022.
  11. Zorginstituut Nederland. Guideline for economic evaluations in healthcare. Last accessed: August 2025.
  12. Cooke RM. Experts in uncertainty: opinion and subjective probability in science. USA: Oxford University Press, 1991.
  13. European Food Safety Authority. Guidance on expert knowledge elicitation in food and feed safety risk assessment. EFSA Journal 2014;12:3734.
  14. Hemming V, Walshe TV, Hanea AM, et al. Eliciting improved quantitative judgements using the IDEA protocol: A case study in natural resource management. PLoS One 2018;13:e0198468.
  15. Bojke L, Soares M, Claxton K, et al. Developing a reference protocol for structured expert elicitation in health-care decision-making: a mixed-methods study. Health Technology Assessment 2021;25:1.
  16. Oakley JE, O’Hagan A. SHELF: the Sheffield Elicitation Framework (version 4). Last accessed: August 2025.

If you have any questions relating to the guidance, would like to explore how we could support you with conducting structured expert elicitation, or would like advice on which approaches would best suit your individual submission challenges, please get in touch, or visit our HTA page.

Alex Porteous (Head of HTA) and Alice Reading (Consultant) created this article on behalf of Costello Medical. The views/opinions expressed are their own and do not necessarily reflect those of Costello Medical’s clients/affiliated partners.
