Revolutionising Rare Disease Data Generation

The Persistent Data Challenge in Rare Diseases

Rare diseases, though individually uncommon, collectively affect an estimated 263–446 million people globally, or 3.5–5.9% of the world’s population.1 Despite this scale, data collection for rare disease research remains a challenge:

  • The small size of patient populations in each rare disease makes it difficult to gather statistically robust and generalisable data
  • Many studies rely on single-arm trials without control groups, which limits the strength and credibility of the evidence generated
  • Recruitment for both qualitative and quantitative studies is often hindered by the geographic dispersion of patients and their limited availability
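To make the first point concrete, the sketch below shows how the uncertainty around an observed proportion narrows as the cohort grows. It uses the Wilson score interval, a standard method for binomial proportions; the 40% response rate and the cohort sizes are made-up illustrative values, not figures from any study.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical 40% response rate observed in cohorts of increasing size.
for n in (20, 200, 2000):
    low, high = wilson_interval(round(0.4 * n), n)
    print(f"n={n:>4}: 95% CI {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

With only 20 patients the interval spans a wide range of plausible response rates, which is why small rare disease cohorts struggle to produce robust, generalisable estimates.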

Moreover, the high costs associated with rare disease research, especially for advanced therapies like gene and cell treatments, are compounded by a lack of reliable evidence, deterring investment from payers and stakeholders.2 These challenges create a vicious cycle: limited data lead to limited evidence, which in turn restricts funding and innovation. Addressing these issues requires a paradigm shift in how data are collected, integrated and analysed.

Use of Multimodal Data and Methodologies: A New Era for Rare Diseases

The fragmented nature of rare disease data is one of the most pressing obstacles in real-world evidence (RWE) generation. Data are scattered across electronic health records (EHRs), insurance claims, disease registries, provider notes and even genomic databases. This fragmentation complicates the assembly of patient cohorts, which is essential for observational studies. However, combining data from different sources or modalities (multimodal data) offers a promising path forward, as discussed in an issue panel at ISPOR 2025.3

The issue panel highlighted the importance of leveraging various real-world databases, including closed and open claims, chargemaster data and clinical-genomic records. These sources, when combined, can overcome the limitations of any single dataset. For instance, claims data provide longitudinal insights into healthcare utilisation, while EHRs offer clinical depth. Disease registries, on the other hand, often contain detailed phenotypic and genotypic information specific to rare conditions.

Linking different databases and tokenising patient data can offer significant advantages in generating RWE for rare diseases. These methods enable researchers to integrate fragmented data sources, providing a more complete and longitudinal view of patient journeys while preserving privacy through pseudonymisation. This approach enhances the ability to study treatment effects in diverse populations and supports regulatory submissions with robust, privacy-compliant data.
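As a rough illustration of how tokenisation can support linkage, the sketch below derives a pseudonymous token from normalised identifiers with a keyed hash (HMAC-SHA256) and groups records from two sources under that token. This is a minimal sketch under stated assumptions, not a description of any vendor's tokenisation product: the key, the field names, the `tokenise` and `link` helpers and the records are all hypothetical, and real-world linkage also has to handle typos, name changes and key management.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical key for illustration; in practice it would be held by a trusted party.
SECRET_KEY = b"demo-key-not-for-production"
IDENTIFIER_FIELDS = {"first", "last", "dob"}


def tokenise(first_name: str, last_name: str, dob: str) -> str:
    """Derive a pseudonymous token from normalised identifiers using a keyed hash."""
    normalised = "|".join(part.strip().lower() for part in (first_name, last_name, dob))
    return hmac.new(SECRET_KEY, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def link(*sources):
    """Group records from several (name, records) sources by patient token."""
    journeys = defaultdict(list)
    for source_name, records in sources:
        for record in records:
            token = tokenise(record["first"], record["last"], record["dob"])
            # Keep only non-identifying fields alongside the token.
            payload = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
            journeys[token].append((source_name, payload))
    return dict(journeys)


# Made-up records from two hypothetical sources describing the same person.
claims = [{"first": "Ana", "last": "Silva", "dob": "1990-04-12", "event": "infusion visit"}]
ehr = [{"first": " ana", "last": "SILVA ", "dob": "1990-04-12", "event": "genetic test ordered"}]

linked = link(("claims", claims), ("ehr", ehr))
for token, events in linked.items():
    print(token[:12], events)
```

Because both sources apply the same normalisation and key, the two records resolve to one token and can be viewed as a single patient journey without names or dates of birth appearing in the linked output.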

Despite these advances, data standardisation across sources is still lacking. Additionally, the cost and expertise required to manage and analyse multimodal data can be prohibitive for smaller research teams. Nonetheless, the feasibility of this approach is improving with the advent of cloud-based platforms and open-source tools. The potential for change is significant: multimodal methodologies could not only enhance data availability in rare diseases but also support the approval and availability of treatments, addressing the unmet needs of patients.

The Potential of AI for Data Generation in Rare Diseases

A second issue panel explored the potential of generative AI (genAI) for data generation in rare diseases and offered a similarly promising outlook.4

One of the most promising applications is the ability of large language models (LLMs) to unlock hidden data within unstructured EHRs. By extracting and synthesising clinical narratives, AI can reveal patterns and phenotypes that traditional methods might miss.

Knowledge graph development is another transformative application. These graphs map relationships between symptoms, genetic markers, treatments and outcomes, offering a dynamic representation of disease pathways. For rare diseases, where natural histories are often poorly understood, such tools can fill critical knowledge gaps. AI-driven knowledge graphs can also support hypothesis generation and guide the design of clinical trials.
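The idea of a knowledge graph can be sketched with a handful of subject-relation-object triples and a simple two-hop query. Everything here is an assumption made for illustration: the entities ("Disease X", "GENE1", "Therapy T"), the relation names and the helper functions are placeholders rather than clinical facts or the API of any particular graph library.

```python
from collections import defaultdict

# Hypothetical triples (subject, relation, object); placeholders, not clinical facts.
TRIPLES = [
    ("Disease X", "has_symptom", "Symptom A"),
    ("Disease X", "associated_gene", "GENE1"),
    ("Disease Y", "has_symptom", "Symptom A"),
    ("Disease Y", "has_symptom", "Symptom B"),
    ("Therapy T", "targets", "GENE1"),
]


def build_graph(triples):
    """Index triples by node, storing each edge in both directions."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
        graph[obj].append((f"inverse_{relation}", subject))
    return dict(graph)


def neighbours(graph, node, relation):
    """Return the nodes reachable from `node` via `relation`, sorted by name."""
    return sorted(target for rel, target in graph.get(node, []) if rel == relation)


def candidate_therapies(graph, disease):
    """Two-hop query: disease -> associated gene -> therapies targeting that gene."""
    therapies = set()
    for gene in neighbours(graph, disease, "associated_gene"):
        therapies.update(neighbours(graph, gene, "inverse_targets"))
    return sorted(therapies)


graph = build_graph(TRIPLES)
print(neighbours(graph, "Symptom A", "inverse_has_symptom"))
print(candidate_therapies(graph, "Disease X"))
```

Queries of this kind, which chain relationships across symptoms, genes and treatments, are one way such a graph could surface hypotheses for further investigation.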

GenAI can also mine scientific literature to identify emerging trends, unmet needs and potential biomarkers. This capability is particularly valuable for early diagnosis and trial recruitment, where timely insights can significantly impact patient outcomes. Moreover, AI can simulate patient populations and predict treatment responses, aiding in the design of more efficient and targeted studies.

The feasibility of these applications is increasingly supported by advancements in computational power, natural language processing and data availability. However, challenges such as algorithmic bias, data quality and regulatory acceptance must be addressed. Ensuring transparency and explainability in AI models is essential for building trust among clinicians, regulators and patients.

References

  1. Nguengang Wakap, S, Lambert, DM, Olry, A, et al. Estimating Cumulative Point Prevalence of Rare Diseases: Analysis of the Orphanet Database. European Journal of Human Genetics. 2020;28:165–173.
  2. Berry, D, Hickey, C, Kahlman, L, et al. Ensuring Patient Access to Gene Therapies for Rare Diseases: Navigating Reimbursement and Coverage Challenges. American Society of Gene and Cell Therapy. Available from ASGCT. Last accessed: May 2025.
  3. Issue Panel Session 113: How Do We Generate RWE in Rare Diseases or Targeted Subgroups? Use of Multimodal Data and Methodologies. ISPOR International Congress, Montreal, Quebec, Canada, 2025.
  4. Issue Panel Session 128: Rare but Common: Generative AI’s Potential on Data, Evidence, and Insight Generation in Rare Diseases. ISPOR International Congress, Montreal, Quebec, Canada, 2025.

If you would like any further information on the themes presented above, please get in touch, or visit our Rare Diseases page to find out how our expertise can benefit you. Jose Medrano (Analyst) created this article on behalf of Costello Medical. The views/opinions expressed are his own and do not necessarily reflect those of Costello Medical’s clients/affiliated partners.
