Small-Molecule Design for Historically 'Undruggable' Targets

"Undruggable" has always been a temporary classification. It describes a target where the binding sites identified by structural analysis lack the geometry, charge distribution, or hydrophobic character needed to engage a typical drug-like small molecule with sufficient affinity and selectivity. The word captures a difficulty at a specific point in time with specific tools, not a permanent physical impossibility.

Over the past decade, several historically intractable target classes have started yielding to approaches that combine improved structural characterization (cryo-EM at near-atomic resolution, MD-based cryptic site identification) with more sophisticated hit identification methods. We have worked on programs in a few of these spaces. What follows is our honest account of what changes the odds — and what does not.

Why These Targets Were Called Undruggable

The label has been applied to several structurally distinct problem classes that are worth separating:

Flat protein-protein interaction (PPI) interfaces: Protein interaction surfaces are typically 1,000-3,000 Å² in area, shallow, and dominated by backbone contacts rather than the deep hydrophobic pockets that accommodate small molecules at sub-micromolar affinity. Classical docking against PPI surfaces produces many low-confidence poses and few reliable hits.

Transcription factors: Many oncogenic transcription factors (MYC-MAX family, NF-κB family subunits) lack conventional deep binding pockets. Their surfaces are dynamic and present different conformational states in different cellular contexts, making structure-based design against a single apo or partner-bound structure unreliable.

RAS family GTPases: The nucleotide-binding site is highly conserved across the RAS family and binds GTP with picomolar K_D, while intracellular GTP is abundant, so competitive displacement is essentially impossible. Early programs targeting KRAS focused on this site and consistently failed. The biology is clear; the problem is that the nucleotide site cannot be addressed by conventional competitive inhibition.
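The competitive-displacement argument can be made quantitative with the standard single-site competition expression. The sketch below is illustrative only: the function names are hypothetical, and the 10 pM nucleotide K_D and 0.5 mM intracellular GTP used in the note that follows are assumed round numbers, not measurements from a specific program.

```python
def inhibitor_occupancy(inhibitor_conc, inhibitor_kd, gtp_conc, gtp_kd):
    """Equilibrium fraction of protein bound by a reversible competitive inhibitor.

    All concentrations and dissociation constants share one unit (molar here).
    Single-site competition: theta_I = (I/Ki) / (1 + I/Ki + L/Kl).
    """
    i_term = inhibitor_conc / inhibitor_kd
    l_term = gtp_conc / gtp_kd
    return i_term / (1.0 + i_term + l_term)


def kd_needed_for_occupancy(target_occupancy, inhibitor_conc, gtp_conc, gtp_kd):
    """Inhibitor K_D required to reach target_occupancy against competing GTP."""
    if not 0.0 < target_occupancy < 1.0:
        raise ValueError("target_occupancy must be strictly between 0 and 1")
    l_term = gtp_conc / gtp_kd
    # Rearranged from theta = x / (1 + x + l), where x = I / Ki
    x = target_occupancy * (1.0 + l_term) / (1.0 - target_occupancy)
    return inhibitor_conc / x
```

Under those assumed values, kd_needed_for_occupancy(0.5, 1e-6, 5e-4, 1e-11) comes out near 2e-14 M. In other words, a compound dosed to 1 µM free concentration would need roughly femtomolar affinity to occupy half the protein at equilibrium, which is why the nucleotide site was abandoned.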

Intrinsically disordered proteins (IDPs): Some therapeutically relevant proteins are substantially disordered in their functional state. Designing a small molecule to bind a protein with no stable tertiary structure requires either exploiting transiently populated folded states (identified by NMR or MD) or working with the disordered region directly, which typically means accepting weaker and less selective binding.

The common thread: conventional structure-based drug design assumes a well-defined, stable binding pocket. These targets violate that assumption in different ways.

What Has Actually Changed

Two developments have meaningfully expanded the tractability of difficult targets, and they are worth separating from the noise of overpromised approaches.

Cryptic binding site identification from MD simulations and cryo-EM ensembles: Cryptic sites are pockets that do not appear in the ground-state protein structure but transiently open during conformational dynamics. Extended MD simulations (microsecond-scale) frequently reveal these sites in proteins that appear to have no suitable pocket in their crystal structures. Cryo-EM datasets, by capturing multiple conformational states, can identify the same transient pockets from experimental data rather than simulation. For RAS family proteins, cryptic allosteric sites of this kind have been the starting point for the covalent and non-covalent inhibitor programs that have progressed further than any GTP-competitive approach.
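One simple way to summarize a candidate cryptic site from a trajectory is the fraction of frames in which it is open, plus a two-state estimate of the opening free energy. The sketch below assumes per-frame pocket volumes are already available from some pocket-detection tool; the function name, the volume threshold, and the two-state treatment are all illustrative assumptions, and the estimate is only meaningful if the sampling is converged.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def open_state_stats(pocket_volumes, open_threshold, temperature=300.0):
    """Summarize how often a candidate pocket is open across trajectory frames.

    pocket_volumes: per-frame pocket volumes (cubic angstroms).
    open_threshold: volume at or above which a frame counts as open (assumed cutoff).
    Returns (open_fraction, opening_free_energy in kcal/mol, or None if undefined).
    """
    if not pocket_volumes:
        raise ValueError("need at least one frame")
    n_open = sum(1 for v in pocket_volumes if v >= open_threshold)
    frac = n_open / len(pocket_volumes)
    if n_open == 0 or n_open == len(pocket_volumes):
        return frac, None  # both states must be sampled to estimate a free energy
    # Two-state estimate: dG_open = -RT ln(p_open / p_closed)
    dg_open = -R_KCAL * temperature * math.log(frac / (1.0 - frac))
    return frac, dg_open
```

A pocket open in 5% of frames corresponds to an opening penalty of roughly 1.8 kcal/mol at 300 K under this two-state picture, a cost that any ligand binding the open state has to pay back.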

Fragment-based lead discovery (FBLD) at low-affinity binding sites: Fragment screening uses small molecules (150-250 Da) at high concentrations (0.1-1 mM) in SPR, NMR, or X-ray crystallographic assays, and can pick up binding events too weak to register in conventional HTS. Fragment hits at flat PPI surfaces at K_D = 1-10 mM have been grown into leads with K_D values below 100 nM through iterative structure-guided elaboration. This is a validated path, not a theoretical one.
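The size of that affinity gain is easier to appreciate in free-energy terms. A minimal sketch, using the standard relations dG = RT ln(K_D) and ligand efficiency LE = -dG / (number of heavy atoms); the function names and the heavy-atom counts in the note below are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature=298.15):
    """dG = RT ln(K_D) in kcal/mol (1 M standard state); more negative is tighter."""
    return R_KCAL * temperature * math.log(kd_molar)


def ligand_efficiency(kd_molar, heavy_atoms, temperature=298.15):
    """LE = -dG / heavy atom count, in kcal/mol per non-hydrogen atom."""
    return -binding_free_energy(kd_molar, temperature) / heavy_atoms
```

Going from K_D = 1 mM to K_D = 100 nM means gaining about 5.5 kcal/mol of binding free energy. A 1 mM fragment with 13 heavy atoms has an LE of about 0.31, and a 100 nM lead with 30 heavy atoms sits at about 0.32, so the elaboration has to hold efficiency roughly constant across every added atom.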

What has not fundamentally changed: the thermodynamic reality that flat, polar, solvent-exposed binding sites are genuinely harder to target with small molecules than buried hydrophobic pockets. Computational approaches can find cryptic sites and accelerate fragment expansion, but they cannot change the underlying physical chemistry of protein-ligand binding. A target with no accessible cryptic sites and no hot spot residues contributing disproportionately to interface energy may genuinely require non-small-molecule modalities.

Where Generative Chemistry Changes the Search

For targets with cryptic or allosteric binding sites, the generative chemistry advantage is the same as for conventional targets — but the stakes are higher. Because the chemical space of viable binders for these unusual sites is smaller and less well-explored than for conventional pockets, commercial libraries cover it even less well than they cover conventional druggable targets. This is exactly where enumerative library screening fails most acutely and where generative exploration offers the largest benefit.

Take a flat PPI surface with a known hot spot: two key hydrophobic residues on one face of the interface contributing most of the binding energy, surrounded by polar contacts that are less critical. A fragment hit at K_D = 800 µM engages the hot spot, confirmed by X-ray crystallography showing the aromatic fragment stacking against the hydrophobic side chains. Now the goal is to grow this fragment into a lead-like molecule (MW 350-450 Da) while maintaining or improving the hot spot contacts and adding interactions with the peripheral polar residues.

Fragment growing on a flat surface is constrained differently from growing in a traditional binding pocket: you cannot simply extend vectors into available volume, because the "volume" is solvent-exposed and not enclosed by protein on all sides. Effective growing means adding contacts with the peripheral protein surface, where steric and conformational constraints are less forgiving than in a deep pocket.

Generative models trained on fragment elaboration data, guided by predicted binding affinity from a graph neural network (GNN) trained on PPI-focused ChEMBL data, can systematically explore growing directions that are not obvious from the fragment crystal structure alone. The key constraint: every grown candidate must maintain the core fragment contacts with the hot spot residues confirmed by the crystallographic data. We implement this as a required pharmacophore constraint in the generative objective: any candidate that does not present the key aromatic moiety in the correct geometric relationship to engage the hot spot residues is penalized out of the generation trajectory.
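To make the idea of a pharmacophore penalty concrete, here is a deliberately simplified sketch, not our production objective. It assumes each candidate comes with a predicted pose, checks only how far the aromatic ring centroid has moved from its crystallographic position, and subtracts a quadratic penalty from a predicted affinity score. All names, the 1.0 Å tolerance, and the weight are hypothetical; a fuller version would also check ring-plane orientation.

```python
import math


def centroid(points):
    """Geometric center of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def hot_spot_penalty(ring_atoms, ref_centroid, tolerance=1.0, weight=10.0):
    """Penalty for moving the aromatic ring away from its crystallographic position.

    ring_atoms: (x, y, z) coordinates in angstroms of the ring atoms in the predicted pose.
    ref_centroid: ring centroid observed in the fragment crystal structure.
    Zero within `tolerance` angstroms, quadratic in the excess distance beyond it.
    """
    distance = math.dist(centroid(ring_atoms), ref_centroid)
    excess = max(0.0, distance - tolerance)
    return weight * excess ** 2


def constrained_score(predicted_affinity, ring_atoms, ref_centroid):
    """Higher is better: predicted affinity (e.g. a pKd) minus the penalty."""
    return predicted_affinity - hot_spot_penalty(ring_atoms, ref_centroid)
```

With these illustrative settings, a ring displaced 3 Å from the reference position incurs a penalty of 40 score units, which swamps any realistic predicted pKd and effectively removes the candidate from consideration.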

The Covalent Modifier Consideration

For targets with specific reactive cysteines near an accessible site — which describes several kinase mutants and some PPI-engaged surfaces — covalent targeting is a mechanistically distinct option that deserves separate consideration. Covalent modifiers can achieve effective inhibition at binding sites where the non-covalent affinity of a similarly sized molecule would be insufficient for useful occupancy.

We flag this here not because we specialize in covalent design — we do not — but because the decision of whether to pursue a covalent approach should happen before the hit identification campaign, not after a non-covalent campaign has failed. For target cysteines with pKa values suggesting reasonable reactivity with mild electrophiles (acrylamides, vinyl sulfones, cyanoacrylates), a parallel covalent fragment screen alongside the non-covalent campaign is worth the incremental cost at the hit identification stage.
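As a rough first-pass filter on cysteine reactivity, the Henderson-Hasselbalch relation gives the fraction of the side chain present as the more nucleophilic thiolate at a given pH. This is a sketch under simple assumptions: the function name is hypothetical, the pKa values in the note below are illustrative, and thiolate fraction is only one input alongside solvent accessibility and local geometry.

```python
def thiolate_fraction(pka, ph=7.4):
    """Fraction of a cysteine side chain in the deprotonated (thiolate) form.

    Henderson-Hasselbalch: f = 1 / (1 + 10 ** (pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))
```

At pH 7.4, a cysteine with a pKa of 8.5 is about 7% thiolate, while one whose pKa has been lowered to 7.4 by its environment is 50% thiolate. A difference of that size is a reasonable argument for running the parallel covalent fragment screen.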

Calibrating Ambition Against Target Biology

The honest limitation of everything described above: not all targets once labeled undruggable have accessible cryptic sites, tractable hot spots, or available cysteines. Some are genuinely difficult in a way that current small-molecule methods cannot address efficiently. For KRAS G12C, a reactive cysteine adjacent to the switch II pocket enabled the covalent allosteric inhibitor programs. Not every KRAS mutation has that cysteine. KRAS G12D remains substantially harder: the aspartate side chain is a far less tractable covalent handle than cysteine, so inhibitors must rely largely on non-covalent affinity, and the chemistry is correspondingly more difficult.

We treat the "undruggable" question as an empirical one at the level of the specific target, specific mutation state, and specific structural characterization available. When a team brings us a target with no precedent and limited structural data, our first recommendation is not to launch a generative chemistry campaign — it is to invest in structural characterization (cryo-EM if the protein permits, MD simulations to identify cryptic sites, NMR fragment screening if the protein is <30 kDa) before committing compute resources to hit generation.

Generative chemistry on a poorly characterized target produces novel compounds with unknown binding modes and predicted properties that are less reliable than usual. The computational investment is better spent after the target biology is understood well enough to define meaningful constraints. That sequencing is the difference between a generative campaign that produces actionable candidates and one that produces expensive noise.