In March 2025, the National Institute for Health and Care Excellence (NICE) Decision Support Unit (DSU) published Technical Support Document (TSD) 26. With a particular focus on oncology, TSD26 provides guidance on the application of expert elicitation to address the challenges faced in health technology assessments (HTAs) that require extrapolation of survival data over an extended modelled time horizon.
The use of structured expert elicitation (SEE) to validate survival extrapolations in oncology is not yet commonplace in HTA, with choice of extrapolation typically informed by statistical and visual fit to available data. Assessment of external validity through consultation with clinical experts is common, but this tends to take the form of individual consultations or advisory boards, focusing on retrospective validation of existing extrapolations or elicitation of subjective ‘best estimates’ of survival. Retrospective validation is inherently subjective and prone to personal bias, as highlighted in TSD21.
At Costello Medical, we have a long track record of published research focusing on methods for survival extrapolation, and we have always advocated for the use of structured methods to validate selection of survival outcomes in oncology.1-10 Only four of 35 appraisals reviewed in TSD26 were identified as using SEE for survival outcomes, and Costello Medical supported three out of the four appraisals using structured approaches. It should be noted that these elicitation exercises were conducted under high time-pressure and within limited budgets, necessitating modification of published elicitation protocols. Beyond our work supporting SEE for HTA submissions, we have also conducted SEE for survival outcomes using formal methods such as the Sheffield Elicitation Framework (SHELF).
We therefore welcome the publication of more detailed guidelines on SEE for survival outcomes, but we understand the challenges faced by manufacturers that may limit their implementation. A key challenge is timeline pressure, particularly the availability of clinical data often close to submission deadlines; these data are required for inclusion in the evidence dossier shared with experts. Structured elicitation exercises are also substantially more resource-intensive than the more pragmatic approaches outlined above, requiring more preparation and more time from experts. Manufacturers are increasingly facing budget pressures in the UK setting, and the added value of structured methods in terms of HTA outcomes remains unclear.
Based on our experience conducting SEE for survival outcomes, and supporting over 190 HTA submissions in oncology indications, we have reflected on the recommendations provided in TSD26 and shared our perspectives below.
TSD26 provides detailed guidelines on the use of SEE for technology appraisals (TAs) in oncology, building on the recommendations from TSD14 and TSD21 regarding model-based extrapolations. A review within TSD26 highlighted that few NICE submissions so far have employed methods of structured elicitation, with general consultation remaining the preferred method of elicitation. These approaches are often met with criticism from healthcare decision-makers as they typically rely on retrospective validation of existing survival curves or elicitation of subjective ‘best estimates’ of survival, which are prone to bias and do not adequately capture uncertainty. TSD26 advocates for the broader adoption of SEE to improve methodological rigour in HTA submissions. In providing more clarity on best practices for conducting SEE, it follows in the footsteps of updated guidance issued last year by the Zorginstituut Nederland (ZIN), but does not go as far as making SEE obligatory where quantitative expert elicitation is used in NICE submissions.11
Oakley et al. suggest that established protocols for eliciting probability distributions from experts in a structured manner should form the basis of how elicitation exercises are conducted, such as the example SHELF protocol below.
TSD26 makes the following recommendations for conducting SEE for long-term survival outcomes. These recommendations should be used by manufacturers when submitting appraisals and by NICE external assessment groups (EAGs) when evaluating the validity and credibility of elicitation exercises.
The primary recommendation is to elicit probability distributions from clinical experts, so that expert uncertainty is quantified, and experts are not merely asked to provide ‘best estimates’ or approve the clinical validity of pre-selected model-based survival extrapolations.
These distributions should be elicited in line with structured expert elicitation protocols, such as modified Delphi methods or SHELF, which have been modified to reflect the unique challenges associated with elicitation of long-term survival estimates.
The timepoint at which survival is elicited should be carefully chosen: it should not be too close to the latest available timepoint of trial data, nor at a point where the proportion of survivors is likely to be negligibly small. The candidate parametric curves should also have diverged sufficiently at the chosen timepoint that not all models could plausibly be consistent with the experts’ judgements.
It is possible to additionally elicit probability distributions at multiple timepoints; however, dependence resulting from an expert basing their survival estimate at the second timepoint on their assumed estimate at the first timepoint can influence the outputs. As such, a joint probability distribution that accounts for this dependence would be required, and the TSD therefore recommends eliciting judgements at a single timepoint.
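The timepoint criteria described above can be sketched as a simple screening check. This is a minimal illustration rather than a method set out in TSD26: the three survivor functions, their parameter values, and the thresholds (`min_gap`, `min_survival`, `min_spread`) are all hypothetical assumptions chosen for demonstration, with time measured in years.

```python
import math

# Hypothetical fitted survivor functions; all parameter values are illustrative only.
def exponential(t, rate=0.25):
    return math.exp(-rate * t)

def weibull(t, scale=4.0, shape=1.3):
    return math.exp(-((t / scale) ** shape))

def loglogistic(t, scale=3.0, shape=1.5):
    return 1.0 / (1.0 + (t / scale) ** shape)

CURVES = {"exponential": exponential, "weibull": weibull, "loglogistic": loglogistic}

def assess_timepoint(t, last_follow_up, min_gap=2.0, min_survival=0.05, min_spread=0.10):
    """Screen a candidate elicitation timepoint against three illustrative criteria."""
    values = {name: curve(t) for name, curve in CURVES.items()}
    spread = max(values.values()) - min(values.values())
    return {
        "survival": values,
        # Not too close to the latest available trial data.
        "far_enough_from_data": (t - last_follow_up) >= min_gap,
        # Even the most optimistic curve should predict a non-negligible proportion alive.
        "survivors_not_negligible": max(values.values()) >= min_survival,
        # Curves should disagree enough that expert judgement can discriminate between them.
        "curves_diverged": spread >= min_spread,
    }

for candidate in (4.0, 5.0, 10.0, 30.0):
    result = assess_timepoint(candidate, last_follow_up=3.0)
    flags = {k: v for k, v in result.items() if k != "survival"}
    print(candidate, flags)
```

In practice the thresholds would be a matter of judgement for the specific indication, and the check would be run on the curves actually fitted to the trial data.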
Whether taking a pragmatic or structured approach to expert elicitation, at Costello Medical, we always recommend quantifying uncertainty around clinician estimates of survival. At a minimum, in addition to ‘best estimates’, we suggest eliciting ‘highest plausible’ and ‘lowest plausible’ limits, where clinicians would judge it to be extremely unlikely that the true value of survival could be higher or lower than these values, respectively. TSD26 presents an example protocol using the quartile method, which recommends eliciting a median and upper and lower quartiles in addition to upper and lower plausible limits, facilitating generation of a probability distribution. This is likely to provide more robust information with which to select survivor functions, but extends the time required for the elicitation exercise.
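To illustrate how quartile-method judgements can be turned into a probability distribution, the sketch below fits a normal distribution on the logit scale to an elicited median and quartiles. The logit-normal form is our assumption for this example, chosen because it has a closed-form fit using only the Python standard library; it is not the distribution prescribed by TSD26 or SHELF, which typically consider several candidate distributions. The elicited values are hypothetical.

```python
import math
from statistics import NormalDist

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_logit_normal(lower_quartile, median, upper_quartile):
    """Fit a normal distribution on the logit scale to three elicited quantiles.

    The median is matched exactly; the spread is set so that the fitted
    interquartile range on the logit scale equals the elicited one.
    """
    z75 = NormalDist().inv_cdf(0.75)
    mu = logit(median)
    sigma = (logit(upper_quartile) - logit(lower_quartile)) / (2.0 * z75)
    return NormalDist(mu, sigma)

def survival_quantile(dist, prob):
    """Back-transform a quantile of the fitted distribution to the survival scale."""
    return expit(dist.inv_cdf(prob))

# Hypothetical expert judgements about the proportion alive at the chosen timepoint.
elicited = {
    "lower_limit": 0.02,
    "lower_quartile": 0.08,
    "median": 0.12,
    "upper_quartile": 0.18,
    "upper_limit": 0.40,
}

fitted = fit_logit_normal(
    elicited["lower_quartile"], elicited["median"], elicited["upper_quartile"]
)

# Feedback check: how much probability does the fit place within the plausible limits?
coverage = fitted.cdf(logit(elicited["upper_limit"])) - fitted.cdf(logit(elicited["lower_limit"]))

print("fitted quartiles:", survival_quantile(fitted, 0.25), survival_quantile(fitted, 0.75))
print("probability within plausible limits:", coverage)
```

Feeding the fitted quartiles and the coverage of the plausible limits back to the expert is one way to check that the fitted distribution is a fair reflection of their beliefs before it is used to inform curve selection.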
The TSD recommends also incorporating expert qualitative opinions on hazard function trends during the extrapolation period to enhance the accuracy and credibility of survival models. It notes that this may assist with both choosing between survival models and with checking for internal consistency in an expert’s judgement. This approach also enables a clearer picture of the full survival curve to be elicited, to avoid the issues of dependence when eliciting inputs across multiple timepoints. The TSD suggests two methods by which this can be elicited, through the use of a recommended hazard checklist and optional scenario testing (see further details in the section on ‘Recommended Best Practices’ above).
This qualitative input helps in understanding potential hazard increases or decreases over time, and the TSD recommends a step-wise approach to selecting curves by firstly ruling out inappropriate curves based on statistical fit data, and then considering qualitative input to further rule out any incompatible survival curves before considering quantitative probability distributions from the experts.
Taking this approach ensures the extrapolated survival functions align closely with expert insights and clinical realities.
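The step-wise selection described above could be sketched as three successive filters. This is a simplified illustration under stated assumptions: AIC is used as the measure of statistical fit, each model's hazard behaviour in the extrapolation period is reduced to a single label, and the experts' quantitative judgements are summarised as a plausible range rather than a full probability distribution. The `Candidate` class, the `max_delta_aic` threshold, and all values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    aic: float             # statistical fit to the observed data (lower is better)
    hazard_trend: str      # behaviour in the extrapolation period
    survival_at_t: float   # predicted survival at the elicitation timepoint

def stepwise_select(candidates, expert_trends, plausible_range, max_delta_aic=10.0):
    """Apply the three filters in order and return the survivors of each step."""
    best_aic = min(c.aic for c in candidates)
    # Step 1: rule out curves with clearly inferior statistical fit.
    step1 = [c for c in candidates if c.aic - best_aic <= max_delta_aic]
    # Step 2: rule out curves whose hazard trend the experts consider implausible.
    step2 = [c for c in step1 if c.hazard_trend in expert_trends]
    # Step 3: compare the remaining curves with the experts' quantitative judgements.
    low, high = plausible_range
    step3 = [c for c in step2 if low <= c.survival_at_t <= high]
    return step1, step2, step3

# Hypothetical candidate models and expert inputs.
models = [
    Candidate("exponential", 512.4, "constant", 0.08),
    Candidate("weibull", 505.1, "increasing", 0.04),
    Candidate("loglogistic", 503.8, "decreasing", 0.14),
    Candidate("lognormal", 504.6, "decreasing", 0.21),
    Candidate("gompertz", 530.2, "increasing", 0.01),
]

step1, step2, step3 = stepwise_select(
    models,
    expert_trends={"constant", "decreasing"},
    plausible_range=(0.05, 0.18),
)

for label, kept in (("fit", step1), ("hazard", step2), ("elicited range", step3)):
    print(label, [c.name for c in kept])
```

Keeping the output of each step separate makes it easier to report which curves were excluded at which stage, and why.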
Whilst the guidance from NICE outlined above is a useful contribution to recommended best practices, several challenges remain unaddressed:
Anchoring occurs when experts are influenced by initial information or estimates, affecting their judgments. For example, if experts see an extrapolated survival curve from a model before making their own estimates, they might unconsciously base their judgments on this starting point, even if they intend to provide an independent assessment.
To mitigate this effect, the TSD recommends not showing model extrapolations to experts before they provide their own estimates. This practice helps ensure that experts’ judgments are based on their own knowledge and not biased by previous models.
Whether taking a pragmatic or structured approach to expert elicitation, at Costello Medical, we always recommend that our clients withhold model-based extrapolations from experts until after all judgements have been provided. It is beneficial to avoid the risk of anchoring effects regardless of the elicitation method used.
Having supported our clients with multiple different structured approaches to expert elicitation, we know that these approaches are the gold standard, producing the most robust estimates of long-term survival. That being said, in line with the DSU’s findings, we haven’t seen much uptake across submissions we have supported. Only four of 35 appraisals reviewed in TSD26 were identified as using SEE for survival outcomes, and Costello Medical supported three of those four.
More ‘pragmatic’ methods of elicitation tend to be less resource- and time-intensive, and as a result remain preferred by manufacturers, particularly whilst the added value of structured methods remains unclear; use of more pragmatic methods has not prevented successful reimbursement from NICE to date. That being said, the most time-consuming step for manufacturers exploring any form of validation or elicitation is contracting the participating experts – the time requirements for structured methods of elicitation can therefore be similar to those for traditional advisory boards. It should be noted, however, that these structured elicitation exercises, given their narrow focus, typically don’t substitute for the breadth of opinion provided by a traditional advisory board, instead representing an additional exercise that would need to be considered.
As such, we appreciate that the combination of time and resource constraints may necessitate ‘pragmatic’ methods of elicitation – in such circumstances, we would urge manufacturers to bear in mind our key recommendations above:
We have found that these steps can be accommodated within standard approaches to clinical validation without significant time or resource implications. Exercises directly aligned with TSD26 are likely to add the most value where long-term survival is a particularly important driver of uncertainty, for example where survival data are very immature.
It remains to be seen whether the publication of this guidance will result in a shift towards greater use of structured approaches. For this to change, there would need to be a clear indication either that these approaches have a positive impact on the time to reimbursement – by reducing post-submission timelines or avoiding managed access – or that the reduction in uncertainty is reflected in NICE’s decision-making in a tangible way, for example when determining the appropriate willingness-to-pay threshold. Until that impact becomes tangible, it may continue to be challenging for manufacturers to justify the additional resource and time requirements of structured approaches.
References
If you have any questions relating to the guidance, or would like to explore how we could support you with conducting any structured expert elicitation, or advice and recommendations on what approaches would best suit your individual submission challenges, please get in touch, or visit our HTA page. Alex Porteous (Head of HTA) and Alice Reading (Consultant) created this article on behalf of Costello Medical. The views/opinions expressed are their own and do not necessarily reflect those of Costello Medical’s clients/affiliated partners.