Why Synthetic Accessibility Must Be a First-Class Model Objective

Early generative molecular design papers share a characteristic blind spot: they report binding affinity predictions and ADMET scores but present synthetic accessibility as an afterthought: a column in a supplementary table, an SA score annotation, occasionally a note that "the proposed compounds appear synthetically feasible." The implication is that synthesis is a downstream concern, something for chemists at a contract research organization (CRO) to handle once the computational pipeline delivers its results.

This framing is wrong in a way that costs programs real time. A molecule with predicted pIC₅₀ of 8.5 and excellent ADMET scores is worthless if it requires 14 synthetic steps, non-commercial intermediates, and reaction conditions that no medicinal chemistry CRO will run at hit validation scale. The synthesis feasibility problem has to be part of the generation objective, not a post-hoc annotation.

What SA Score Measures and What It Misses

The SA score, introduced by Ertl and Schuffenhauer in 2009 and available in RDKit, is the most widely used computational proxy for synthetic accessibility. It assigns a score from 1 (easiest to synthesize) to 10 (most difficult) based on fragment frequency in a training corpus of known synthesized compounds, adjusted by a penalty for structural complexity. The intuition is that molecules built from common fragments and simple ring systems should score better than molecules with unusual structural features.

SA score is useful as a coarse filter. Molecules scoring above 6 reliably contain structural features that most CRO chemists will flag immediately: unusual fused ring systems, high stereocenter density, multiple uncommon functional groups in combination. For our purposes, we use SA < 4.5 as a generative constraint and SA < 4.0 as the preferred range for the top synthesis queue.

What SA score does not capture: the availability of specific starting materials at commercial scale, the feasibility of specific reaction steps in sequence, and the cost of chiral resolution or asymmetric synthesis when stereocenters are required. A molecule scoring SA = 3.8 might still be impractical if the final step requires a Pd-catalyzed cross-coupling with a boronate ester that is not commercially available and takes two weeks to prepare.

This is the fundamental limitation of fragment-frequency-based accessibility scores: they assess structural features of the target molecule but not the practical synthetic route to reach it. SCScore (synthetic complexity score), which was trained on reaction databases to learn which molecules appear as products versus starting materials in published chemistry, captures some of this directional information. We use SA score for coarse filtering and SCScore for secondary ranking of scaffolds above SA = 4.0.
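The two-stage use of the scores described above can be sketched as follows. This is a minimal illustration, not our production code: it assumes both scores have already been computed elsewhere, and the `Candidate` class, the `triage` function, and the choice to order the preferred queue by SA score are all hypothetical.

```python
from dataclasses import dataclass

SA_HARD_LIMIT = 4.5  # generative constraint stated in the text
SA_PREFERRED = 4.0   # preferred range for the top synthesis queue


@dataclass(frozen=True)
class Candidate:
    name: str        # hypothetical identifier for a generated molecule
    sa: float        # SA score, 1 (easy) to 10 (hard); assumed precomputed
    scscore: float   # SCScore, 1 (simple) to 5 (complex); assumed precomputed


def triage(candidates):
    """Coarse-filter by SA score, then rank the SA >= 4.0 remainder by SCScore."""
    kept = [c for c in candidates if c.sa < SA_HARD_LIMIT]
    # Ordering the preferred queue by SA score is an assumption of this sketch.
    preferred = sorted((c for c in kept if c.sa < SA_PREFERRED), key=lambda c: c.sa)
    # Lower SCScore (less synthetic complexity) ranks first.
    secondary = sorted((c for c in kept if c.sa >= SA_PREFERRED), key=lambda c: c.scscore)
    return preferred, secondary
```

Called on a list of candidates, `triage` returns the top synthesis queue and a separate SCScore-ranked list for the 4.0 to 4.5 band; anything at or above 4.5 is dropped.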

Integrating Accessibility into the Generative Loop

Treating SA score as a post-hoc filter has a known failure mode: the generative model, unconstrained, will drift toward structures that optimize binding affinity and ADMET while gradually accumulating structural complexity. This is not a bug in the model; it is expected behavior. From the model's perspective, adding a ring closure or a stereocenter might improve binding affinity by 0.3 log units, and if SA score is not penalizing that move, the model will make it repeatedly. When you then filter at SA < 4.5, you may discard 70% of the output.

The better approach: include SA score as a penalty term in the objective function during gradient-guided latent space navigation. Each gradient step that would increase SA above the threshold incurs a penalty proportional to the excess. This keeps the generative trajectory within the accessible region of chemical space rather than letting it drift toward complexity and then filtering it back.

In practice, we parameterize the SA penalty as a soft constraint: no penalty for SA < 4.0, linear penalty for 4.0–5.0, quadratic penalty above 5.0. The quadratic tail is important. Without it, the model sometimes finds ways to hover near the top of the linear band, generating structures around SA = 4.9 that score as marginally acceptable but consistently fail CRO quoting.
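One way to write the piecewise penalty is sketched below. The weights `w_lin` and `w_quad` are hypothetical, and continuing the linear term past 5.0 (so the penalty stays continuous and its slope never decreases) is an assumption of this sketch rather than something the text specifies. The `sa` argument is whatever accessibility estimate the optimizer sees; for gradient-based navigation that would presumably be a differentiable surrogate.

```python
def sa_penalty(sa, free_below=4.0, quad_above=5.0, w_lin=1.0, w_quad=2.0):
    """Soft SA constraint: zero, then linear, then linear plus quadratic."""
    if sa <= free_below:
        return 0.0
    linear = w_lin * (sa - free_below)
    if sa <= quad_above:
        return linear
    # Assumption: keep the linear term and add a quadratic excess above 5.0.
    return linear + w_quad * (sa - quad_above) ** 2


def penalized_objective(reward, sa, **penalty_kwargs):
    """Subtract the SA penalty from a scalar reward (affinity, ADMET, etc.)."""
    return reward - sa_penalty(sa, **penalty_kwargs)
```

With these default weights, a candidate at SA = 4.5 pays 0.5 reward units and one at SA = 6.0 pays 4.0, so complexity gains above 5.0 need a much larger predicted benefit to be worth taking.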

We also incorporate a second accessibility signal: Enamine REAL space proximity. If a generated structure's Murcko scaffold has a Tanimoto similarity above 0.5 to a scaffold found in Enamine REAL, we treat this as a positive accessibility signal independent of SA score. The reasoning: Enamine REAL compounds are pre-validated for make-on-demand synthesis; proximity to their scaffolds suggests the structural features are achievable with standard CRO reagent catalogs.
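The proximity check reduces to a thresholded Tanimoto comparison. The sketch below assumes each Murcko scaffold has already been converted to a fingerprint and is represented as a set of on-bit indices; the function names and that representation are hypothetical, and a real implementation would more likely use a cheminformatics toolkit's fingerprint and bulk-similarity routines.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def real_space_proximal(scaffold_bits, reference_scaffolds, threshold=0.5):
    """True if any reference scaffold is more similar than the threshold."""
    return any(tanimoto(scaffold_bits, ref) > threshold for ref in reference_scaffolds)
```

The comparison is strict, matching "above 0.5" in the text, so a scaffold at exactly 0.5 similarity does not count as proximal.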

The Reagent Availability Problem

A layer below SA score is reagent availability: whether the building blocks needed to synthesize the target molecule are commercially available at CRO-compatible prices and lead times. A molecule might score SA = 3.2 because its structure looks simple, but if the key heterocyclic building block is available from a single supplier with a 10-week lead time, it is not practically accessible for a hit validation program.

We address this through a catalog cross-reference step. After generating and filtering candidates by SA score and SCScore, we run the proposed synthesis routes through a building block availability check against Enamine and Sigma-Aldrich commercial catalogs. Compounds where all key intermediates are available in catalog quantities (>1 g, <$200/g) are flagged as "catalog-accessible." Compounds requiring non-catalog intermediates are flagged with the specific missing building block.
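The flagging logic can be sketched as a lookup against a catalog table. Everything here is illustrative: the `CatalogEntry` fields, the `check_route` function, the "non-catalog" label, and the idea that the catalog is a plain dictionary keyed by building block ID are assumptions, and the real check runs against supplier data rather than an in-memory dict.

```python
from dataclasses import dataclass

MIN_QUANTITY_G = 1.0     # catalog quantity threshold from the text (>1 g)
MAX_PRICE_PER_G = 200.0  # price threshold from the text (<$200/g)


@dataclass(frozen=True)
class CatalogEntry:
    quantity_g: float    # largest quantity offered, in grams
    price_per_g: float   # price in USD per gram


def check_route(building_blocks, catalog):
    """Return a status flag and the list of building blocks that fail the check."""
    missing = []
    for block_id in building_blocks:
        entry = catalog.get(block_id)
        available = (
            entry is not None
            and entry.quantity_g > MIN_QUANTITY_G
            and entry.price_per_g < MAX_PRICE_PER_G
        )
        if not available:
            missing.append(block_id)
    status = "catalog-accessible" if not missing else "non-catalog"
    return status, missing
```

Returning the failing block IDs alongside the flag keeps the specific missing building block visible to whoever reviews the synthesis queue.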

This additional step reduces the synthesis queue to a tractable size. In our experience, about 60–65% of SA < 4.0 compounds pass catalog availability checks for routes up to 4 synthetic steps. That 35–40% failure rate, at what we consider the most accessible end of our output, reflects how much practical accessibility information SA score leaves on the table.

When to Relax Accessibility Constraints

There are programs where accepting higher synthetic complexity is the right call. If you are working on a target where no tractable lead exists, and the generative model can only find solutions at SA = 5.5–6.0, the question is whether those are worth pursuing at the cost of longer and more expensive synthesis. The answer depends on program stage and resource availability.

For hit identification, our strong recommendation is to hold to SA < 4.5. At this stage, you need to validate binding and basic pharmacology quickly; complex syntheses slow iteration to the point where the information value of each cycle drops below the cost. Accept that some potentially excellent binders will be set aside at this stage — they are not lost, they are deferred to lead optimization where synthesis complexity is more justified.

For lead optimization, relaxing to SA < 5.5 is reasonable when the binding affinity data justifies the investment. At that stage, a medicinal chemist is evaluating a much smaller set of candidates with confirmed experimental binding data, and the cost-benefit calculation for a complex synthesis shifts.

We are not suggesting that accessibility should always trump binding affinity — that would eliminate whole categories of useful chemistry. The argument is specifically about where the accessibility constraint belongs in the decision process: as part of the generation objective, not as a filter applied to the results. The molecules that survive a well-designed generative run with integrated SA penalty are a better starting point for a CRO synthesis campaign than the subset that survives post-hoc filtering of a larger, unconstrained run.

Validating the Accessibility Claims

We track synthesis outcomes from programs we have contributed to, specifically looking at how predicted SA score relates to CRO quoting success, defined as receipt of a confirmed synthesis route within 3 weeks. For compounds with SA < 3.5, quoting success is above 88%. For compounds at SA 3.5–4.5, it drops to 71%. Above SA 4.5, it drops to 44%.

These numbers come from a small set of programs and should not be treated as statistically sound benchmarks. We report them as directional evidence for calibrating the SA threshold, not as validated performance metrics. The key observation is that the relationship between SA score and CRO quoting success is monotonic in our data, even if the exact numbers will vary by CRO and target class. That relationship is why SA score belongs inside the optimization objective, not in a post-processing column.
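For readers who want to run the same calibration on their own outcomes, the sketch below bins (SA score, quoted or not) records and attaches a Wilson score interval to each rate, which makes the small-sample uncertainty explicit. The Wilson interval is our choice for this illustration, not part of the tracking described above; the function names and the assignment of boundary values to bins are assumptions.

```python
from math import sqrt

# Bin edges follow the text; putting exactly 3.5 and 4.5 in the upper bin is an assumption.
BINS = [
    ("SA < 3.5", 0.0, 3.5),
    ("3.5 <= SA < 4.5", 3.5, 4.5),
    ("SA >= 4.5", 4.5, float("inf")),
]


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def quoting_success_by_bin(records):
    """records: iterable of (sa_score, quoted_ok) pairs, quoted_ok a bool."""
    records = list(records)
    summary = {}
    for label, low, high in BINS:
        outcomes = [ok for sa, ok in records if low <= sa < high]
        n = len(outcomes)
        k = sum(outcomes)
        rate = k / n if n else None
        summary[label] = (k, n, rate, wilson_interval(k, n))
    return summary
```

With only a handful of compounds per bin the intervals are wide (8 successes out of 10 gives roughly 0.49 to 0.94), which is the reason to read bin-level rates as directional.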