Navigating the Data Wilderness: Best Practices in Real-World Data Source Landscaping

As health technology assessment (HTA) processes become more complex, real-world evidence has become essential for a strong evidence package. Real-world data can provide vital insights for regulators, payers, and healthcare professionals by helping to characterise patient populations, healthcare resource use, real-world effectiveness of treatments and patient-reported outcomes – all essential components for advancing patient-centred care.

Leveraging structured datasets such as administrative healthcare databases, claims data or patient registries can enhance the reliability of real-world evidence, as these sources often provide large, representative samples and standardised data collection.

Administrative database/electronic health records
  • Description: Data collected mainly for operational or administrative purposes by health organisations
  • Strengths: Usually covers large, defined populations over long periods, enabling population-level analyses
  • Limitations: Often lacks clinical details, such as disease severity or patient-reported outcomes; may be affected by coding errors or variations

Claims data
  • Description: Records generated from billing and reimbursement processes in insurance or national health systems
  • Strengths: Provides detailed data on healthcare service use, costs and providers, useful for health economics
  • Limitations: Limited clinical detail (as for administrative databases/electronic health records); may be influenced by billing practices

Patient registry
  • Description: Standardised data collected on individuals with specific conditions, used for multiple purposes including research and quality improvement
  • Strengths: Rich clinical, demographic and outcome data, often longitudinal; useful for studying rare conditions or long-term outcomes
  • Limitations: Participation is often voluntary, leading to potential selection bias; may lack linkage to broader administrative/claims data

Observational study with primary data collection (for secondary use)
  • Description: Secondary use of a dataset generated for a prior primary research study, in which new data were collected directly from participants
  • Strengths: Can provide granular data tailored to specific research needs; may include patient-reported outcomes that are not routinely collected
  • Limitations: May be affected by selection or measurement bias; secondary use limits control over data quality and scope

When appropriately used and integrated, such datasets can strengthen the validity of evidence presented in HTA submissions. However, the fragmented nature of the real-world data landscape, along with variations in source populations, scope of content and data quality, can pose a challenge to researchers attempting to identify fit-for-purpose real-world data.

Evaluating Real-World Data Source Trade-Offs

In the process of selecting a suitable data source, it is unlikely that a single dataset will fully meet all requirements of the planned study, and compromises may be required. Various data sources tend to excel in specific aspects, such as population coverage, clinical detail or length of follow-up, and each possesses unique strengths and limitations.

A careful evaluation is essential to understand how well a dataset aligns with the study objectives, and combining multiple sources may sometimes be necessary to address gaps and ensure comprehensive results.
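To make the idea of combining sources concrete, the following is a minimal Python sketch that, assuming the variables available in each candidate source are already known, finds the smallest combination of sources covering every required variable. The function name `smallest_covering_combination` and all source and variable names are hypothetical. The sketch deliberately ignores whether the sources can actually be linked, how their populations overlap and how their data quality differs, all of which need separate assessment.

```python
from itertools import combinations


def smallest_covering_combination(required, sources):
    """Return the smallest tuple of source names whose combined variables
    cover every required variable, or None if no combination does."""
    names = sorted(sources)
    # Try all combinations of size 1, then size 2, and so on.
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            covered = set().union(*(sources[name] for name in combo))
            if required <= covered:
                return combo
    return None


# Hypothetical variable availability per candidate source.
required = {"diagnosis", "treatment", "hospitalisation", "pro_score"}
sources = {
    "claims_db": {"diagnosis", "treatment", "hospitalisation"},
    "registry": {"diagnosis", "pro_score"},
    "ehr": {"diagnosis", "treatment"},
}
print(smallest_covering_combination(required, sources))  # ('claims_db', 'registry')
```

The brute-force search is adequate for the handful of sources in a typical landscape, but its cost grows exponentially with the number of candidates.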

When evaluating a data source, the fitness-for-purpose can be summarised by:

  • Data quality: refers to the accuracy and completeness of key study variables
  • Data relevance: describes how closely the care setting, data content, population characteristics and sample size, intervention and follow-up timeframe match the target population and research question
  • Data access: covers practical considerations, such as whether the dataset is available for commercial research, timelines for data delivery and associated costs, which determine the feasibility of the planned study
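Completeness, one of the two quality dimensions above, can be quantified directly when a sample extract or variable-level summary is available. The minimal Python sketch below assumes patient-level records represented as dictionaries, treats an absent key or `None` as missing, and flags variables below an illustrative threshold of 80%; the records, variable names and threshold are hypothetical. Accuracy cannot be computed this way and usually requires validation against a reference standard, such as chart review.

```python
def completeness(records, variables):
    """Proportion of records with a non-missing value for each variable.

    A value counts as missing if the key is absent or the value is None.
    """
    if not records:
        return {var: 0.0 for var in variables}
    return {
        var: sum(rec.get(var) is not None for rec in records) / len(records)
        for var in variables
    }


# Hypothetical patient-level extract from a candidate data source.
records = [
    {"age": 54, "disease_severity": "moderate", "hba1c": 7.1},
    {"age": 61, "disease_severity": None, "hba1c": 6.8},
    {"age": 47, "hba1c": None},
    {"age": 70, "disease_severity": "severe", "hba1c": 8.0},
]
scores = completeness(records, ["age", "disease_severity", "hba1c"])
print(scores)  # {'age': 1.0, 'disease_severity': 0.5, 'hba1c': 0.75}

THRESHOLD = 0.8  # illustrative cut-off, not a published standard
flagged = [var for var, share in scores.items() if share < THRESHOLD]
print(flagged)  # ['disease_severity', 'hba1c']
```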

Building on these considerations, the UK National Institute for Health and Care Excellence (NICE) Real-World Evidence Framework acknowledges that compromises are often unavoidable and can be justified based on the context-specific challenges associated with real-world data collection, as well as the purpose of real-world evidence within the submission.

“We recognise the need for trade-offs between different characteristics of data sources including quality, size, clinical detail and locality. International data may be appropriate for some questions in the absence of sufficient national data or when results are expected to translate well between settings. We also recognise that there may be challenges in identifying or collecting the highest quality evidence in some applications including in rare diseases and for some medical devices and interventional procedures.”

NICE Real-World Evidence Framework: Assessing Data Suitability

Importance of Data Landscaping for Successful Integrated Evidence Planning

To ensure decision makers have confidence in the selected real-world data, and the evidence derived from these data, it is essential that the choice of data source is thoroughly justified. This should involve not only an assessment of the dataset’s fitness-for-purpose but also a justification for using the data source over potential alternatives. A systematic assessment of the real-world data landscape is therefore recommended to effectively identify, characterise and critically evaluate available datasets.

Failure to conduct this preliminary step may result in greater uncertainty and stakeholder scepticism in the evidence presented.

This can contribute to non-recommendation by reimbursement bodies, particularly if uncertainties about the provenance, accuracy and suitability of the selected real-world data reduce confidence in the incremental cost-effectiveness ratio (ICER) presented in a submission. As such, a clearly defined and well-implemented approach to real-world data landscaping and assessment provides a stable platform from which pharmaceutical manufacturers can determine which data sources best align with their strategic evidence generation goals.

Best Practice for Real-World Data Source Landscaping at Costello Medical

The case study above underscores the importance of a systematic and transparent landscape assessment, which can be achieved through the following structured approach. Applying these best practices can strengthen your HTA submissions and better support timely patient access.

[Figure: Best practices for real-world data source landscaping]

An essential first step is to assess the evidence generation needs and key aspects of the study design. This allows priority data source requirements to be identified, e.g. geography, identification of population and subgroups of interest, sample size requirements, setting and timeframe of follow-up, as well as the outcomes and covariates required for the analysis.

  • If evidence generation planning is at an early stage, or the hypothetical study design is already well understood, this initial step may simply involve listing key eligibility criteria
  • For study designs requiring more in-depth consideration, such as comparative effectiveness studies to generate decision-grade evidence, it may be beneficial to develop a target trial protocol, guided by the use of frameworks such as the Structured Process to Identify Fit-For-Purpose Data (SPIFD, v2)

The development of the search protocol should be guided by the data source requirements identified in the preliminary step, and should include transparent eligibility criteria and a comprehensive range of search sources:

  • Medical literature databases (Medline, Embase) to identify published studies that use relevant study designs in the indication of interest
  • Data source or real-world evidence study registers, including generic registers (e.g. the European Medicines Agency (EMA) Catalogue of Real-World Data Sources or ClinicalTrials.gov) and indication-specific registers (e.g. Orphanet for rare diseases)
  • Websites of relevant clinical centres or medical societies that may advertise or recruit for patient registries
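Because the same data source is often surfaced by several search sources, results typically need de-duplicating before the eligibility criteria are applied. The minimal Python sketch below assumes two hypothetical criteria (country and care setting), uses a crude name-normalisation key for de-duplication and records an exclusion reason for each rejected source, which supports transparent reporting. All names and criteria are illustrative; in practice, matching data source names across registers usually needs manual checking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A data source surfaced by one search source (fields are illustrative)."""
    name: str
    country: str
    setting: str    # e.g. "primary care", "secondary care"
    found_via: str  # search source that surfaced it


def normalise(name):
    """Crude de-duplication key: lower-case and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def screen(candidates, countries, settings):
    """De-duplicate hits, then apply hypothetical eligibility criteria.

    Returns (included, excluded), where excluded maps name -> reason.
    """
    unique = {}
    for cand in candidates:
        unique.setdefault(normalise(cand.name), cand)  # keep the first hit
    included, excluded = [], {}
    for cand in unique.values():
        if cand.country not in countries:
            excluded[cand.name] = "geography"
        elif cand.setting not in settings:
            excluded[cand.name] = "care setting"
        else:
            included.append(cand.name)
    return included, excluded


hits = [
    Candidate("Example EHR Database", "UK", "primary care", "Embase"),
    Candidate("example-EHR database", "UK", "primary care", "EMA catalogue"),
    Candidate("Registry A", "UK", "secondary care", "Orphanet"),
    Candidate("Registry B", "FR", "secondary care", "society website"),
    Candidate("Pharmacy Panel C", "UK", "community pharmacy", "Medline"),
]
included, excluded = screen(hits, {"UK"}, {"primary care", "secondary care"})
print(included)  # ['Example EHR Database', 'Registry A']
print(excluded)  # {'Registry B': 'geography', 'Pharmacy Panel C': 'care setting'}
```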

A comprehensive data source inventory is developed, into which details of the data source characteristics and data content for each of the shortlisted data sources are extracted:

  • Relevant fields may include the design and provenance of the data source, population coverage, variable content and formatting, practical considerations for data access and strengths/limitations affecting the planned study
  • The extractions are informed by careful review of existing publications, data source websites and data dictionaries. It is often beneficial to contact data source administrators directly to obtain further information on data not readily available from public sources
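An inventory of this kind is essentially a table with one row per data source. The Python sketch below shows one possible structure: a dataclass whose fields loosely follow those listed above, an export to CSV, and a helper that lists fields still left blank, which could form the basis of queries to data source administrators. The field names and example values are hypothetical, not a prescribed template.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class InventoryEntry:
    """One row of a data source inventory (hypothetical field names)."""
    name: str
    design: str                      # e.g. "patient registry", "claims"
    provenance: str                  # who collects the data, and for what purpose
    population_coverage: str
    key_variables: str               # semicolon-separated for a flat export
    access_route: str                # e.g. "application to data custodian"
    lead_time_months: Optional[int]
    strengths: str
    limitations: str
    info_source: str                 # publication, website, data dictionary, etc.


def to_csv(entries):
    """Serialise entries to CSV text; None is written as an empty cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(InventoryEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


def open_questions(entry):
    """Fields still blank: candidates for a query to the data administrator."""
    return [f.name for f in fields(entry) if getattr(entry, f.name) in ("", None)]


entry = InventoryEntry(
    name="Registry A",
    design="patient registry",
    provenance="specialist centre network (hypothetical)",
    population_coverage="",
    key_variables="diagnosis date; disease severity; PRO scores",
    access_route="",
    lead_time_months=None,
    strengths="rich clinical detail",
    limitations="voluntary participation",
    info_source="registry website",
)
print(open_questions(entry))  # ['population_coverage', 'access_route', 'lead_time_months']
print(to_csv([entry]))
```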

Qualitative assessment of data source suitability should be conducted according to the priority criteria outlined at the outset of the project:

  • Overall feasibility should be determined on the balance of data suitability/availability (are the required data recorded, or can they be obtained through proxies/algorithms?), data quality (are the data complete and reliable?) and the relative priority of any missing parameters (essential vs desired)
  • The suitability of individual data sources can be further informed by a review of previous HTA submissions in the indication, to assess precedent and use cases, as well as any opinions expressed by decision makers
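The assessment itself is a qualitative judgement, but the essential-versus-desired logic can be illustrated with a deliberately simplified rule. In the Python sketch below, which is an assumption rather than a standard method, a source is "not feasible" if any essential parameter is neither recorded nor obtainable through a proxy, and "feasible with limitations" if it relies on proxies or lacks desired parameters. Data completeness and reliability are not modelled and would still need separate review; the parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    parameter: str
    essential: bool  # True = essential, False = desired


def assess(requirements, recorded, proxies):
    """Classify a data source against prioritised requirements.

    recorded: parameters captured directly by the source.
    proxies: parameters derivable through a proxy or algorithm.
    """
    status = {}
    for req in requirements:
        if req.parameter in recorded:
            status[req.parameter] = "recorded"
        elif req.parameter in proxies:
            status[req.parameter] = "proxy"
        else:
            status[req.parameter] = "missing"
    missing_essential = [r.parameter for r in requirements
                         if r.essential and status[r.parameter] == "missing"]
    missing_desired = [r.parameter for r in requirements
                       if not r.essential and status[r.parameter] == "missing"]
    if missing_essential:
        verdict = "not feasible"
    elif missing_desired or "proxy" in status.values():
        verdict = "feasible with limitations"
    else:
        verdict = "feasible"
    return {"verdict": verdict, "status": status,
            "missing_essential": missing_essential,
            "missing_desired": missing_desired}


requirements = [
    Requirement("diagnosis", True),
    Requirement("treatment_line", True),
    Requirement("disease_severity", False),
    Requirement("pro_score", False),
]
result = assess(requirements, recorded={"diagnosis"}, proxies={"treatment_line"})
print(result["verdict"])          # feasible with limitations
print(result["missing_desired"])  # ['disease_severity', 'pro_score']
```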

Conclusion

Implementing a systematic and transparent approach to data source landscaping is integral to strategic evidence generation planning. By embracing best practices in evaluating and integrating diverse datasets, companies can better navigate data trade-offs, manage uncertainties and align evidence with strategic objectives. Ultimately, this groundwork enables the production of stronger, more credible evidence that supports payer confidence and timely access.

If you would like any further information on the themes presented above, please get in touch, or visit our Real-World Evidence page to learn how our expertise can benefit you. Audrey Artignan (Consultant) created this article on behalf of Costello Medical. The views/opinions expressed are her own and do not necessarily reflect those of Costello Medical’s clients or affiliated partners.
