Why Brute-Force Hit Identification Fails Small Biotech Teams

The pitch for high-throughput screening is intuitive: run millions of compounds against a target, measure binding or activity, filter the actives. It sounds like brute force should work, and at large pharma, with a 2-million-compound library and liquid-handling robotics that can process 100,000 wells per day, it does, at least sometimes. The problem is that the same math that makes HTS workable at industrial scale becomes actively hostile to small teams.

We work with medicinal chemistry teams running programs on a handful of targets with tight budgets and tighter timelines. When they come to us, they have usually already tried the obvious approaches — purchased a screening set, sent compounds to a CRO for a biochemical assay panel, maybe run a fragment screen — and come back with either nothing, or actives that fall apart at the first ADMET pass. Understanding why this keeps happening requires looking at the geometry of chemical space.

The Chemical Space Problem Is Worse Than You Think

Drug-like chemical space is commonly estimated at around 10⁶⁰ molecules when you enumerate all structurally reasonable, low-molecular-weight compounds satisfying basic Lipinski criteria. Even generous estimates put the number of synthesized and catalogued compounds at around 10⁸, counting commercial libraries, ChEMBL entries, PubChem records, the works. That means every HTS campaign, regardless of how large the library, samples at most about one part in 10⁵² of the relevant space.
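The gap is easier to see in log space. Below is a minimal Python sketch that uses only the order-of-magnitude estimates quoted above (10⁶⁰ drug-like molecules, 10⁸ catalogued compounds, a 2-million-compound pharma library). The constant and function names are illustrative.

```python
import math

# Order-of-magnitude estimates quoted in the text (not measurements).
DRUG_LIKE_SPACE = 1e60   # commonly cited size of drug-like chemical space
CATALOGUED = 1e8         # generous count of synthesized/catalogued compounds
PHARMA_LIBRARY = 2e6     # the large-pharma HTS library from the opening example


def log10_fraction(sampled: float, space: float = DRUG_LIKE_SPACE) -> float:
    """log10 of the fraction of `space` represented by `sampled` compounds."""
    return math.log10(sampled) - math.log10(space)


for label, n in [("all catalogued compounds", CATALOGUED),
                 ("2-million-compound HTS library", PHARMA_LIBRARY)]:
    exponent = log10_fraction(n)
    # A fraction of 10^x is 10^(x + 2) percent.
    print(f"{label}: 10^{exponent:.1f} of drug-like space "
          f"(10^{exponent + 2:.1f} percent)")
```

Working in logarithms avoids underflow and makes the point directly: the exponent, not the library size, dominates the comparison.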

Big pharma accepts this. Their libraries are curated over decades to cluster near known bioactive chemotypes. They are not random samples of chemical space but are biased toward scaffolds that have historically worked. This bias is actually useful when your target class is well-precedented. But for small teams working on novel targets, or targets with no validated chemotype, that historical bias becomes a liability. You are searching a well-excavated region of chemical space for something you have never found there before.

The Enamine REAL library currently contains around 6 billion make-on-demand compounds. That sounds enormous until you recognize that it still represents only about 6 × 10⁻⁵¹ of drug-like chemical space. Exhaustive virtual screening against it also means docking every compound. At realistic compute costs, that forces a choice between tolerating crude scoring functions and committing a very large CPU budget: even at an assumed one second of CPU time per ligand, a single docking pass over 6 billion compounds adds up to roughly 190 core-years.
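Here is a back-of-envelope sketch of that budget. The per-ligand CPU times (1 s and 10 s) and the 10,000-core cluster are assumptions for illustration, not benchmarks of any particular docking program. The 6 billion figure is the REAL size quoted above.

```python
REAL_COMPOUNDS = 6e9               # approximate Enamine REAL size quoted in the text
DRUG_LIKE_SPACE = 1e60             # commonly cited estimate of drug-like space
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def docking_core_years(n_compounds: float, cpu_seconds_per_ligand: float) -> float:
    """Total CPU cost of docking each compound once, in core-years."""
    return n_compounds * cpu_seconds_per_ligand / SECONDS_PER_YEAR


def wall_clock_days(core_years: float, cores: int) -> float:
    """Wall-clock time assuming perfect parallel scaling across `cores`."""
    return core_years * 365.25 / cores


fraction = REAL_COMPOUNDS / DRUG_LIKE_SPACE
print(f"REAL covers about {fraction:.0e} of drug-like space")

# Per-ligand CPU times and cluster size are assumptions, not tool benchmarks.
for secs in (1.0, 10.0):
    cy = docking_core_years(REAL_COMPOUNDS, secs)
    print(f"{secs:4.0f} s/ligand: {cy:,.0f} core-years, "
          f"~{wall_clock_days(cy, 10_000):,.0f} days on 10,000 cores")
```

The model assumes perfect parallel scaling and one docking run per compound. Real campaigns add overhead for ligand preparation and rescoring, so these figures are a floor rather than an estimate.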

Where Small Teams Bleed Time and Budget

The hit identification phase is not where small teams spend most of their money. It is where they lose most of their time. Consider a typical scenario: a biotech team is working on a GPCR target with an allosteric binding site identified from cryo-EM data in 2023. They purchase a 50,000-compound diversity set, run a fluorescence polarization assay, and pull out 200 apparent actives showing >30% inhibition at 10 µM. After counterscreening for promiscuous binders and aggregators, they are left with 40 compounds. Of those, 8 show dose-response IC₅₀ < 1 µM. Good so far.

But then the ADMET pass happens. Microsomal stability: 4 of the 8 have t½ < 20 minutes in human liver microsomes. CYP3A4 inhibition: 2 of the remaining 4 show IC₅₀ < 1 µM in the fluorescence-based CYP inhibition assay, which is a clinical drug-drug interaction (DDI) concern at most plausible projected doses. hERG: one more flags at IC₅₀ = 0.8 µM. You started with 50,000 compounds and you are left with one chemically tractable hit. Even that hit likely still carries structural liabilities that neither the screen nor the first ADMET pass was designed to catch.

This is not an unusual outcome. It is the expected outcome when hit identification is decoupled from property prediction. The screen finds binders; the ADMET cascade removes them.
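The screening half of this funnel can be tallied stage by stage. This Python sketch uses only the counts from the scenario, through the microsomal-stability cut. The `attrition` helper is hypothetical; it reports both the per-stage pass rate and the cumulative yield against the 50,000 compounds screened.

```python
from typing import Iterator, List, Tuple

# Stage counts from the hypothetical GPCR scenario above.
FUNNEL: List[Tuple[str, int]] = [
    ("diversity set screened", 50_000),
    ("primary actives (>30% inhibition at 10 uM)", 200),
    ("after aggregator/promiscuity counterscreens", 40),
    ("dose-response IC50 < 1 uM", 8),
    ("HLM half-life >= 20 min", 4),
]


def attrition(funnel: List[Tuple[str, int]]) -> Iterator[Tuple[str, int, float, float]]:
    """Yield (stage, count, pass rate vs. previous stage, cumulative yield vs. input)."""
    start = funnel[0][1]
    previous = start
    for stage, count in funnel:
        yield stage, count, count / previous, count / start
        previous = count


for stage, count, step, cumulative in attrition(FUNNEL):
    print(f"{stage:<45} {count:>6}  step {step:7.2%}  cumulative {cumulative:.4%}")
```

Laid out this way, the 0.4% primary hit rate is not the bottleneck. The steepest relative losses come after potency is already established, which is the decoupling problem described above.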

Why the Library Diversity Argument Is Partially Misleading

The standard response to this problem is "buy a better library." Suppliers sell diversity sets curated to maximize pharmacophore coverage, molecular framework diversity, 3D shape variation. These are real improvements over random compound collections. We are not dismissing library design — it matters.

The limitation is that diversity metrics for commercial libraries are computed in descriptor space (ECFP4 Tanimoto distance, shape-based clustering). They are not computed in the multi-dimensional property space you actually care about: binding affinity × metabolic stability × permeability × synthetic accessibility. A library that looks diverse in fingerprint space can still present correlated ADMET failure modes, because the underlying chemotypes share common metabolic liabilities. Amide hydrolysis, aldehyde oxidase (AO)-mediated oxidation of aza-heterocycles, and glucuronidation of phenols are all scaffold-level properties. A library curated for structural diversity can still cluster badly in metabolic space.

We have seen this in practice with benzimidazole-containing diversity sets. Structurally, they look diverse. In terms of CYP1A2 inhibition potential, they cluster tightly — the fused heterocycle is the dominant predictor, and the appendages vary around a common liability. Screening a set like this gives you an inflated sense of diversity that does not survive the CYP panel.
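The following toy example shows how fingerprint diversity and liability clustering can diverge. Real work would compute ECFP4 fingerprints and match a structural alert with a cheminformatics toolkit. To keep this sketch dependency-free, each compound is instead a hand-made set of on-bits. Bits 0 to 3 stand in for a shared fused-heterocycle core, and the remaining bits stand in for varied appendages. Both the bit layout and the compound names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable

# Hypothetical toy fingerprints: bits 0-3 stand in for a shared fused-heterocycle
# core; each compound's remaining bits represent a different appendage.
CORE_ALERT: FrozenSet[int] = frozenset({0, 1, 2, 3})
LIBRARY: Dict[str, FrozenSet[int]] = {
    f"cpd_{i}": CORE_ALERT | frozenset(range(10 * i, 10 * i + 6))
    for i in range(1, 5)
}


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def mean_pairwise_distance(fps: Iterable[FrozenSet[int]]) -> float:
    """Average (1 - Tanimoto) over all pairs, a common fingerprint 'diversity' score."""
    pairs = list(combinations(fps, 2))
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


def alert_fraction(fps: Iterable[FrozenSet[int]], alert: FrozenSet[int]) -> float:
    """Fraction of compounds whose bits include every bit of the alert."""
    fps = list(fps)
    return sum(alert <= fp for fp in fps) / len(fps)


fps = list(LIBRARY.values())
print(f"mean pairwise Tanimoto distance: {mean_pairwise_distance(fps):.2f}")
print(f"fraction carrying the core alert: {alert_fraction(fps, CORE_ALERT):.0%}")
```

Every pair sits at a Tanimoto similarity of 0.25, which reads as structurally dissimilar by common ECFP4 rules of thumb. Yet every compound carries the core-level alert.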

The Assay Throughput Illusion

Modern biochemical and biophysical assay platforms are genuinely impressive. Homogeneous formats such as AlphaScreen and TR-FRET can generate hundreds of thousands of data points per week at CROs, and SPR throughput has improved considerably. These throughput numbers get cited in slide decks as evidence that brute-force screening is viable at small scale, as long as you pay for the right assay service.

The problem is that assay throughput does not address chemical space coverage. Running 500,000 compounds through a binding assay does not explore 500,000 distinct regions of chemical space. It tests 500,000 representatives drawn from a highly non-uniform distribution of known, purchasable compounds. The regions of chemical space most likely to contain your ideal hit may have no representatives in any commercial library, because those molecules have never been synthesized.
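The coverage point can be made quantitative. For independent draws, the expected number of distinct regions hit is the sum over regions of 1 - (1 - p)^n, where p is a region's sampling probability and n is the number of compounds. This sketch partitions chemical space into 10,000 hypothetical regions and compares a uniform library with a skewed one. In the skewed library, half the regions have zero weight (never synthesized) and the rest follow a Zipf-like 1/k² distribution. The partition and the weights are assumptions chosen to show the shape of the effect; they are not estimates of any real library.

```python
from typing import Sequence

N_REGIONS = 10_000     # hypothetical partition of chemical space
N_SAMPLES = 500_000    # compounds screened, as in the text


def expected_regions_covered(probs: Sequence[float], n_samples: int) -> float:
    """Expected distinct regions hit by n independent draws: sum of 1 - (1 - p)^n."""
    return sum(1.0 - (1.0 - p) ** n_samples for p in probs)


uniform = [1.0 / N_REGIONS] * N_REGIONS

# Skewed library (assumed shape): half the regions have never been synthesized
# (zero weight); the other half follow Zipf-like 1/k^2 weights.
half = N_REGIONS // 2
weights = [1.0 / k ** 2 for k in range(1, half + 1)] + [0.0] * half
total = sum(weights)
skewed = [w / total for w in weights]

for name, probs in (("uniform", uniform), ("skewed", skewed)):
    covered = expected_regions_covered(probs, N_SAMPLES)
    print(f"{name:>7}: ~{covered:,.0f} of {N_REGIONS:,} regions covered")
```

Under these assumptions, the skewed library reaches only a small fraction of the regions despite 500,000 tests, while the uniform library covers essentially all of them. No amount of extra throughput reaches the zero-weight regions.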

This is particularly acute for targets with cryptic or allosteric binding sites that do not resemble known druggable pockets. Fragment-based lead discovery addresses this partially — fragments sample chemical space more efficiently per compound than lead-like molecules — but fragment hits at K_D = 500 µM to 1 mM require significant elaboration before they become usable leads, and the elaboration step is where most fragment programs stall.
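One common way to quantify how much elaboration a fragment needs is ligand efficiency (LE). LE is the binding free energy per heavy atom, approximated as LE ≈ 1.37 × pK_D / N_heavy kcal/mol at about 298 K. The original text does not discuss LE. The 12-heavy-atom fragment and the 100 nM lead target below are hypothetical values chosen for illustration.

```python
import math

KCAL_PER_LOG_UNIT = 1.37  # ~2.303 * R * T in kcal/mol at about 298 K


def p_value(k_molar: float) -> float:
    """Convert a dissociation constant in molar units to pK."""
    return -math.log10(k_molar)


def ligand_efficiency(kd_molar: float, heavy_atoms: int) -> float:
    """LE in kcal/mol per heavy atom, approximated as 1.37 * pK_D / N_heavy."""
    return KCAL_PER_LOG_UNIT * p_value(kd_molar) / heavy_atoms


def heavy_atoms_at_constant_le(target_kd_molar: float, le: float) -> float:
    """Heavy-atom count needed to reach target_kd_molar if LE stays fixed."""
    return KCAL_PER_LOG_UNIT * p_value(target_kd_molar) / le


fragment_le = ligand_efficiency(500e-6, heavy_atoms=12)   # hypothetical fragment
needed = heavy_atoms_at_constant_le(100e-9, fragment_le)  # hypothetical 100 nM goal
print(f"fragment LE ~ {fragment_le:.2f} kcal/mol per heavy atom")
print(f"~{needed:.0f} heavy atoms needed for 100 nM at the same LE")
```

Even if efficiency were perfectly preserved, a 500 µM fragment of that size would need to roughly double its heavy-atom count to reach 100 nM. In practice, LE often erodes as fragments are grown.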

What Changes When You Compute Before You Synthesize

The argument for computational-first hit identification is not that computation is more accurate than experiment. It is not. Experimental binding affinity measurements are ground truth; computational predictions carry uncertainty that we report explicitly and do not paper over. The argument is about decision ordering.

When you run HTS and then ADMET, you are discarding compounds after you have paid to test them. When you predict ADMET before selecting candidates to synthesize or purchase, you are discarding compounds before you pay to test them. Consider a team with a $500K annual chemistry budget. Filtering 50,000 compounds experimentally versus filtering 2 million candidates computationally and then synthesizing the 20 best can decide whether that team has a lead by the end of the year.
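The decision-ordering argument amounts to moving the ADMET filter ahead of the purchase or synthesis step. The minimal sketch below assumes each computational candidate already carries model predictions. The `Candidate` fields, the `select_for_synthesis` helper, and the 10 µM hERG cutoff are hypothetical. The 20-minute microsomal half-life and 1 µM CYP3A4 cutoffs mirror the experimental scenario above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical per-compound model outputs (predictions, not measurements)."""
    cid: str
    pred_pic50: float            # predicted potency against the target
    pred_hlm_t_half_min: float   # predicted human liver microsome half-life, minutes
    pred_cyp3a4_ic50_um: float   # predicted CYP3A4 IC50, micromolar
    pred_herg_ic50_um: float     # predicted hERG IC50, micromolar


def passes_admet(c: Candidate) -> bool:
    """Apply the property cutoffs before any money is spent on the compound."""
    return (c.pred_hlm_t_half_min >= 20.0      # mirrors the scenario's 20 min cut
            and c.pred_cyp3a4_ic50_um >= 1.0   # mirrors the scenario's 1 uM cut
            and c.pred_herg_ic50_um >= 10.0)   # assumed safety margin, not from the text


def select_for_synthesis(candidates: List[Candidate], n: int = 20) -> List[Candidate]:
    """Filter on predicted ADMET first, then take the n most potent survivors."""
    survivors = [c for c in candidates if passes_admet(c)]
    survivors.sort(key=lambda c: c.pred_pic50, reverse=True)
    return survivors[:n]


pool = [
    Candidate("c1", 7.8, 45.0, 12.0, 30.0),
    Candidate("c2", 8.4, 12.0, 15.0, 40.0),  # most potent, but predicted unstable
    Candidate("c3", 7.1, 60.0, 0.5, 25.0),   # predicted CYP3A4 liability
    Candidate("c4", 6.9, 35.0, 8.0, 20.0),
]
print([c.cid for c in select_for_synthesis(pool, n=2)])
```

The most potent candidate never reaches the synthesis list. In the HTS-first ordering, that same compound would have been bought, assayed, and only then discarded.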

The practical caveat: computational property models are most reliable in well-populated chemical space and most uncertain in novel scaffold classes. A VAE-generated scaffold with no close analogs in ChEMBL will have wider confidence intervals on its predicted pIC₅₀, its LogP, and its CYP inhibition profile. We flag this explicitly in every output package — not as a disclaimer but as a prioritization signal. Compounds with tight confidence intervals on all 11 ADMET dimensions should go to the front of the synthesis queue.
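Using confidence-interval width as a prioritization signal can be sketched as a two-level sort. Candidates whose widest interval across all predicted properties falls under a tolerance go first, and predicted potency breaks ties within each group. The dictionary layout, the property names, the assumption that every interval is on a comparable log scale, and the 1.0 log-unit tolerance are all hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (lower, upper) bounds of a prediction interval


def widest_interval(intervals: Dict[str, Interval]) -> float:
    """Largest interval width across properties (assumes comparable log-scale units)."""
    return max(upper - lower for lower, upper in intervals.values())


def synthesis_queue(candidates: List[dict], tolerance: float = 1.0) -> List[dict]:
    """Tight-interval candidates first; within each group, higher predicted potency first."""
    def priority(c: dict) -> Tuple[bool, float]:
        too_uncertain = widest_interval(c["intervals"]) > tolerance
        return (too_uncertain, -c["pred_pic50"])  # False sorts before True
    return sorted(candidates, key=priority)


candidates = [
    {"id": "near_chembl", "pred_pic50": 7.2,
     "intervals": {"pIC50": (6.8, 7.6), "logP": (2.1, 2.7), "CYP3A4_pIC50": (4.5, 5.2)}},
    {"id": "novel_vae", "pred_pic50": 8.0,
     "intervals": {"pIC50": (6.5, 9.5), "logP": (1.0, 3.5), "CYP3A4_pIC50": (4.0, 6.5)}},
    {"id": "close_analog", "pred_pic50": 6.9,
     "intervals": {"pIC50": (6.6, 7.2), "logP": (3.0, 3.4), "CYP3A4_pIC50": (4.8, 5.3)}},
]
print([c["id"] for c in synthesis_queue(candidates)])
```

The novel scaffold has the best point estimate for potency but lands last. That ordering is intended when the goal is to spend early synthesis slots where the predictions are most trustworthy.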

Where Brute Force Is Still the Right Call

We are not arguing that HTS is obsolete or that every hit identification program should start computationally. There are target classes where brute-force screening is genuinely the right first move: when you have a high-throughput phenotypic assay with no structural hypothesis, when your target has no known binders to bias a computational model, or when your binding pocket is so unusual that available docking scoring functions are known to perform poorly (certain RNA-targeting programs fall into this category).

The failure mode we are describing is specifically a resource allocation problem for small teams: spending six months and $300K on a screen that yields a single chemically tractable hit carrying known liabilities, when a four-week computational campaign could have delivered 50 candidates scored across binding, ADMET, and synthetic accessibility simultaneously. The HTS approach is not wrong in principle — it is wrong in context when the team does not have the compound library, the assay infrastructure, or the iteration budget that makes it work at the large-pharma scale it was designed for.

The chemistry is the same. The constraints are different. Designing around those constraints is what changes the outcome.