We cannot continue to ignore heterogeneity in ALS
The tools exist, it’s time to embrace them. Long-read sequencing is just one of the force-multipliers at hand
It’s looking increasingly likely that ending the failure streak in drug development for ALS won’t be possible until biotech and pharma companies address the incredible heterogeneity in the patient population. The path to that outcome must include a rethink of how genetic data are collected and analyzed.
Could there be a one-size-fits-all solution to this disease? It’s theoretically possible. But the diversity in clinical presentation, pattern and rate of progression, and the different protein pathologies in patients’ brains suggest it’s extremely unlikely.
It's no surprise that the most significant clinical advancement in amyotrophic lateral sclerosis (ALS) in recent years came from a precision medicine therapy targeting patients with pathogenic variants in a specific gene. Last year, FDA granted accelerated approval to SOD1-targeting antisense therapy tofersen, marketed as Qalsody, to treat ALS patients with SOD1 mutations, a subgroup that’s more homogeneous than the general patient population yet still includes many genetic variants.
Before the field can systematically match patients to therapeutic mechanisms, we first need to widen the aperture on the genetics to get a broader view of the disease biology. So far, the roughly 50 genes linked to ALS account for approximately 30% of cases, and the ClinGen ALS Expert Panel classifies the evidence linking 21 of those genes to ALS as “definitive.”
In the biopharma industry’s ongoing quest to understand complex diseases, particularly those affecting the CNS, the tools we use to decode genomes are nearly as critical as the insights we seek to uncover. Yet as we assess the genetic landscape of ALS, we must ask ourselves: Are we using the right sequencing tools to capture the true complexity of the human genome? Are we adequately applying what we know about genetics and epigenetics? And are our genomic datasets reflective of the diverse populations we aim to serve?
The answer to these questions is no — change is needed on all three fronts.
The solutions involve adopting long-read sequencing technologies to capture not only single nucleotide variants but also other complex forms of genetic variation, such as structural variants, that contribute to the disease. To do this right, bioinformatics workflows must be deeply integrated with cutting-edge genetics and tailored specifically to ALS for more precise and comprehensive analysis.
Moreover, progress on the genetics must be accompanied by epigenetic profiling to understand how environmental factors contribute to disease biology. On both fronts, blind spots will persist without datasets that accurately capture the diversity of the patient population.
These remedies, while not exhaustive, would go a long way toward catalyzing precision medicine not only for ALS but for many complex diseases. ALS has been called the “cruellest disease” because its progressive paralysis robs people of autonomy while leaving their minds sharp to experience the process. But Alzheimer's, Parkinson’s and other neurodegenerative diseases are all cruel in different ways. They all suffer from the heterogeneity problem, and all stand to benefit from taking the steps outlined below.
The case for long-read sequencing
Most sequencing efforts to date have relied on short-read sequencing. The technology is powerful, but it limits our view of the intricacies of the human genome because it is best at detecting single nucleotide variants (SNVs) and other small changes. SNVs are the most numerous type of variant, yet they account for only a fraction of the sequence that differs between genomes. The reality is that a genome is much more than a simple table of SNVs.
A prime example is structural variants, which involve larger segments of the genome and can have profound effects on gene function and regulation. Tandem repeats, a common type of structural variation, are particularly relevant in CNS diseases because expansions of these repeats are critical features of many neurodegenerative diseases. Yet short reads, typically a few hundred bases long or less, cannot span larger repeat expansions, leaving significant gaps in our understanding of disease etiology.
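To see why read length matters for repeat expansions, consider a simplified model: a read can report the full length of a repeat only if it covers the entire repeat plus some unique flanking sequence on each side to anchor it in the genome. The sketch below is illustrative only. The 150-base short read, the 15,000-base long read, the 20-base anchor per side and the repeat sizes chosen for each gene are all assumptions, and real repeat callers use techniques beyond full spanning.

```python
# Simplified model (an assumption for illustration): a read can report the full
# length of a tandem repeat only if it covers the entire repeat plus a stretch
# of unique flanking sequence on each side to anchor it in the genome.

def bases_needed(repeat_units: int, unit_len: int, flank: int = 20) -> int:
    """Minimum read length needed to span a repeat plus both flanking anchors."""
    return repeat_units * unit_len + 2 * flank


def can_span(read_len: int, repeat_units: int, unit_len: int, flank: int = 20) -> bool:
    """True if a single read of read_len bases can cover the repeat end to end."""
    return read_len >= bases_needed(repeat_units, unit_len, flank)


SHORT_READ = 150     # assumed short-read length, in bases
LONG_READ = 15_000   # assumed long-read length, in bases

# (label, repeat units, bases per unit); repeat sizes are illustrative
examples = [
    ("ATXN2 CAG, 33 units", 33, 3),
    ("HTT CAG, 150 units", 150, 3),
    ("C9orf72 GGGGCC, 1,000 units", 1000, 6),
]

for label, units, unit_len in examples:
    need = bases_needed(units, unit_len)
    print(
        f"{label}: needs {need} bases; "
        f"short read spans: {can_span(SHORT_READ, units, unit_len)}, "
        f"long read spans: {can_span(LONG_READ, units, unit_len)}"
    )
```

Under these assumptions, a 33-unit ATXN2 allele (139 bases) still fits within one 150-base read, but a 150-unit HTT expansion (490 bases) or a 1,000-unit C9orf72 expansion (6,040 bases) does not, so short-read tools have to estimate such lengths indirectly rather than read them off a single molecule.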
Long-read sequencing suffered technical and cost limitations in the past, but those issues have largely been resolved. Today, the technique offers a practical approach to obtaining a much more comprehensive view of patients’ genetics than short-read sequencing provides. Yet, due in part to the inertia that comes from standardizing around short-read sequencing, long-read sequencing is still the exception rather than the norm in genetics research.
Long-read sequencing technology can capture a much broader range of genomic variation and is particularly adept at resolving complex regions of the genome that short reads struggle with. In my experience with one long-read sequencing provider that my portfolio company GenieUs Genomics Pty. Ltd. works with, a recent dataset yielded 10 times more prioritized variants in genes associated with neurodegeneration than short-read sequencing did, including more SNVs. This was surprising, given that SNVs have always been short-read sequencing’s stronghold.
This difference is not just academic — it represents a significant leap forward in identifying potential genetic drivers of disease.
The highly variable tandem repeat in the CACNA1C gene, as discussed in Song et al. (2018) and the more recent Moya et al. paper, was discovered through the use of long-read sequencing technologies. This approach enabled researchers to identify the size and sequence variability of the tandem repeat, which had not been fully appreciated using earlier methods. This discovery has advanced our understanding of the genetic factors contributing to neuropsychiatric conditions such as bipolar disorder and schizophrenia.
Variants in the GBA gene are a major genetic risk factor for Parkinson’s disease. The gene harbors complex variants, including structural changes, that traditional sequencing methods often miss, in part because a nearby, highly similar pseudogene (GBAP1) confounds short-read alignment. These variants can significantly affect gene function. Long-read sequencing provides the resolution needed to capture the complexity of GBA variants, offering a more comprehensive understanding of their role in Parkinson’s disease.
Another case study comes from repeat expansions in the ATXN2 gene, which are associated with both ALS and spinocerebellar ataxia type 2 (SCA2). Intermediate expansions of 27–33 repeats increase ALS risk, while larger expansions of 35 or more repeats cause SCA2. These expansions disrupt protein function, contributing to neurodegeneration in both conditions. Long-read sequencing is well suited to detecting and sizing these expansions accurately because it can capture the full length and variability of the repeat region.
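As a rough illustration of how the ATXN2 thresholds above could be applied once a repeat has been read end to end, the sketch below counts the longest uninterrupted CAG run in a sequence and maps it onto the cited ranges. The function names and the toy allele are hypothetical. Real allele sizing must handle interruptions within the repeat and sequencing error, which this sketch ignores, and the cited ranges leave 34 repeats unassigned.

```python
import re

# Thresholds as cited in the text for ATXN2 CAG repeats.
INTERMEDIATE_MIN, INTERMEDIATE_MAX = 27, 33  # increased ALS risk
SCA2_MIN = 35                                 # causes SCA2


def longest_cag_run(seq: str) -> int:
    """Longest uninterrupted run of CAG codons in seq, in repeat units.

    Simplification: any interruption within the repeat ends a run, so for an
    interrupted allele this is a lower bound on total repeat length.
    """
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_atxn2(units: int) -> str:
    """Map a repeat count onto the ranges cited in the text."""
    if units >= SCA2_MIN:
        return "SCA2 range"
    if INTERMEDIATE_MIN <= units <= INTERMEDIATE_MAX:
        return "intermediate (increased ALS risk)"
    if units > INTERMEDIATE_MAX:
        return "between cited ranges"  # 34 units falls in neither cited range
    return "below intermediate range"


allele = "GCC" + "CAG" * 30 + "TTA"  # hypothetical toy allele
units = longest_cag_run(allele)
print(units, classify_atxn2(units))  # prints: 30 intermediate (increased ALS risk)
```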
These examples are likely the tip of the iceberg — much of the genetics of CNS diseases remains to be discovered.
The critical interplay between genetics and epigenetics
Unraveling disease heterogeneity will also require looking beyond genetic drivers. A patient’s genetic background can make them more or less vulnerable to environmental insults, so integrating genetics with insights from epigenetics deepens our understanding of how genetic variation manifests as disease. Without both components, our view of the biology will remain incomplete.
A striking example of how genetics and epigenetics intersect to influence disease progression comes from Huntington’s disease, where research into trinucleotide-repeat expansions has shown that the repeats can grow longer in certain neurons over time. In the HTT gene, these somatic expansions can take a CAG repeat from 36 units, the lower bound of the disease-causing range, to more than 150, leading to profound changes in gene expression and ultimately to neuronal death.
In ALS, there are numerous families in which the same pathogenic variant yields different disease phenotypes. In other cases, the same pathogenic variant causes ALS in some patients and ataxia in others. Similarly, the C9orf72 hexanucleotide repeat expansion, the most common genetic cause of ALS, also causes frontotemporal dementia.
The observation that shared genetics translate into different phenotypes in different people demands looking beyond the genomic variants themselves to the intersection of polygenic risk and epigenetic factors.
Tools are emerging to address this need, including epigenetic analysis tools that detect changes in DNA methylation, histone modifications and chromatin accessibility. Bioinformatics tools that enable deep variant hunting, annotation and prioritization can help predict the functional consequences of epigenetic changes, and bioinformatics platforms have emerged to mine large datasets across genomics, transcriptomics and clinical information.
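Variant prioritization depends on which variant classes reach the filter in the first place. The minimal sketch below assumes hypothetical variant records, made-up class labels and an illustrative four-gene panel (not the ClinGen curated list). It filters the same call set twice: once admitting only SNVs and small indels, and once also admitting structural variants and repeat expansions. Real prioritization also weighs predicted impact, population frequency and inheritance pattern, all omitted here.

```python
from dataclasses import dataclass

# Illustrative subset of ALS-linked genes; NOT the ClinGen curated list.
ALS_PANEL = {"SOD1", "C9orf72", "TARDBP", "FUS"}

SNV_INDEL_CLASSES = {"SNV", "indel"}
ALL_CLASSES = SNV_INDEL_CLASSES | {"SV", "repeat_expansion"}


@dataclass(frozen=True)
class Variant:
    gene: str
    kind: str   # hypothetical class label: "SNV", "indel", "SV" or "repeat_expansion"
    label: str  # hypothetical identifier


def prioritize(variants, panel, kinds):
    """Keep variants that fall in panel genes and belong to an admitted class."""
    return [v for v in variants if v.gene in panel and v.kind in kinds]


# Hypothetical call set for one patient (toy data, not real results).
calls = [
    Variant("SOD1", "SNV", "v1"),
    Variant("C9orf72", "repeat_expansion", "v2"),
    Variant("FUS", "SV", "v3"),
    Variant("TTN", "SNV", "v4"),
]

snv_view = prioritize(calls, ALS_PANEL, SNV_INDEL_CLASSES)
full_view = prioritize(calls, ALS_PANEL, ALL_CLASSES)
print(len(snv_view), len(full_view))  # prints: 1 3
```

The TTN record is dropped in both views because it falls outside the panel; the two records gained in the second view are a structural variant and a repeat expansion, classes an SNV-centric pipeline never hands to the filter.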
The importance of diverse datasets
While refining our tools is critical, it won’t be enough to fill the knowledge gaps unless the reference datasets that underpin research are updated to reflect the diversity in the patient population. Without this, we’ll simply never get a full picture of the disease biology and its heterogeneity. And when the breakthroughs come, some patients will be left behind.
Many of the biobanks we rely on are not only built on short-read sequencing data, which imposes its own limitations on progress, but also lack genetic diversity, further limiting the representation needed to usher in an era of precision medicine.
A broad population approach is needed to increase the chances of identifying the underlying causes of ALS for the 70% of patients who do not know why they’ve developed the disease.
For example, we already know that C9orf72 expansions are more common in Europeans, SOD1 mutations in Asians and VAPB mutations in South Americans. Certain SOD1 mutations, such as A4V, are more common in North Americans of European descent and are associated with rapidly progressing ALS. In contrast, D90A is particularly prevalent in Scandinavian populations, where it can be inherited in either a dominant or a recessive manner, typically resulting in a milder form of ALS with slower disease progression. Without reference datasets that respect this diversity, harmonized for ancestry, race and ethnicity, we risk missing critical insights.
The emergence of human pangenome references, which capture a broader spectrum of genetic variation, is a step in the right direction. Projects including the draft human pangenome reference, the pangenome reference of Chinese ethnicities, a draft Arab pangenome reference and a Pacific Islanders pangenome are beginning to fill this gap. But much more work is needed to ensure that the reference datasets are comprehensive and inclusive.
Biotech and pharmaceutical companies should support these efforts and seek to use the most complete datasets they can obtain in discovering targets and biomarkers.
The vision: putting it all together
To truly advance our understanding of complex diseases such as ALS and achieve the promise of precision medicine for patients, we need to embrace the diversity in front of us and use best-in-class approaches and tools.
While dozens of genes have been linked to ALS, we’re still only beginning to reap the insights genetics promises, let alone those from the force multipliers of other omics tools and the phenotyping data becoming available in electronic health records.
If companies working in the ALS space aren’t looking beyond SNVs, and even beyond genetics to epigenetics, in diverse datasets, they aren’t seeing the full picture of the disease, and progress in precision medicine will continue to be blocked.
Long-read sequencing providers are available and affordable. Bioinformatics workflows designed to hunt for pathogenic variants and adapted to validate their role in diseases such as ALS are becoming more agile and capable of mining large datasets. Other omics tools, including epigenetic profiling and transcriptomics, are becoming more robust and accessible, and increasingly diverse datasets from populations across the world are emerging. It’s time to embrace these tools.
Masha Stromme is co-chair of family office PAACS Invest, executive chair of GenieUs Genomics, and advisor to the Perron Institute for Neurological and Translational Science.
Signed commentaries do not necessarily reflect the views of BioCentury.