Molecular Generation Engine

The Nanolix molecular generation engine.

From target constraints to ranked candidates. A computational pipeline built for medicinal chemistry teams who need candidates that can actually be made — not just candidates that score well in silico.

Architecture

Target constraints in. Ranked candidates out.

Property prediction models run in parallel with the generative engine, scoring each candidate on all 11 modeled properties during generation rather than after it is complete. That co-optimization step is what separates useful candidate sets from structures that score well but cannot be synthesized.
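Here is a minimal sketch of scoring inside the generation loop. It assumes each property model is an independent callable. The predictor functions and `generate_candidates` are hypothetical placeholders, not Nanolix's models, and only two of the eleven properties are stubbed. The point is that every predictor scores each candidate concurrently as soon as it is produced, instead of in a separate pass over a finished library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical stand-ins for trained property models (two of the eleven).
def predict_solubility(smiles: str) -> float:
    return float(len(smiles))  # placeholder arithmetic, not a real model

def predict_herg(smiles: str) -> float:
    return smiles.count("c") / 10.0  # placeholder arithmetic, not a real model

PREDICTORS: Dict[str, Callable[[str], float]] = {
    "sol_uM": predict_solubility,
    "hERG": predict_herg,
}

def generate_candidates() -> Iterator[str]:
    # Hypothetical generator; a real engine would decode structures from latent space.
    yield from ["CCO", "c1ccccc1O", "CC(=O)Nc1ccccc1"]

def score_during_generation(pool: ThreadPoolExecutor) -> List[Tuple[str, Dict[str, float]]]:
    scored = []
    for smiles in generate_candidates():
        # Submit every predictor for this candidate at once...
        futures = {name: pool.submit(fn, smiles) for name, fn in PREDICTORS.items()}
        # ...and collect all scores before the next candidate is generated.
        scored.append((smiles, {name: f.result() for name, f in futures.items()}))
    return scored

with ThreadPoolExecutor(max_workers=len(PREDICTORS)) as pool:
    results = score_during_generation(pool)
```

Threads keep the sketch self-contained and stdlib-only. CPU-bound models would more typically run in separate processes or be batched on a GPU.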

What we model

11 properties, spanning ADMET, potency, and synthesizability. Predicted before synthesis.

Binding affinity (IC50)
Aqueous solubility
Metabolic stability (T1/2)
Membrane permeability (Papp)
hERG channel inhibition
CYP3A4 inhibition
Plasma protein binding
Hepatic clearance
Oral bioavailability (F%)
Acute cytotoxicity (IC50)
Synthetic accessibility (SA)

Methodology

Ensemble prediction with calibrated confidence

Binding affinity prediction combines docking-score rescoring with a graph neural network (GNN) ensemble trained on ChEMBL activity data. The GNN encoder learns from molecular topology rather than precomputed fingerprints, which gives it better performance on novel scaffold classes than fingerprint-based models.
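Here is a minimal sketch of one way to combine the two signals. It assumes the docking score has already been rescored into pIC50 units and that the GNN ensemble returns one prediction per member. The linear blend and its weight `w_dock` are assumptions for illustration; the page does not say how the two sources are combined.

```python
import numpy as np

def ensemble_pic50(member_preds: np.ndarray, rescored_docking: np.ndarray,
                   w_dock: float = 0.3) -> tuple[np.ndarray, np.ndarray]:
    """Blend a GNN ensemble with a rescored docking estimate.

    member_preds: shape (n_members, n_molecules), pIC50 from each GNN member.
    rescored_docking: shape (n_molecules,), docking score mapped to pIC50 units.
    w_dock: hypothetical blending weight, chosen for illustration only.
    """
    gnn_mean = member_preds.mean(axis=0)
    gnn_spread = member_preds.std(axis=0, ddof=1)  # disagreement between members
    blended = (1.0 - w_dock) * gnn_mean + w_dock * rescored_docking
    return blended, gnn_spread

members = np.array([[8.0, 7.0], [8.2, 7.4], [8.4, 7.2]])  # 3 members x 2 molecules
docking = np.array([8.5, 6.5])
pic50, spread = ensemble_pic50(members, docking)
```

The member spread is one raw input to an uncertainty estimate. On its own, it is not a calibrated interval.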

ADMET models use a multi-task architecture that learns property correlations jointly. This matters: a model that learns solubility and permeability together captures the physical relationship between them. Training data spans ~2.1M compounds from ChEMBL plus proprietary assay partnerships.
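Here is a minimal sketch of the multi-task idea. It assumes a shared encoder with one linear head per property. The loss skips missing labels, on the assumption that a dataset merged from ChEMBL and partner assays will not have every property measured for every compound. The weights are random and untrained. During training, gradients from both tasks would update `W_shared`, which is where correlations such as the one between solubility and permeability get learned.

```python
import numpy as np

def masked_multitask_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """MSE over observed labels only; NaN in `target` marks an unmeasured assay."""
    observed = ~np.isnan(target)
    diff = np.where(observed, pred - np.nan_to_num(target), 0.0)
    return float((diff ** 2).sum() / observed.sum())

# Toy forward pass: shared encoder, then per-task heads (random, untrained weights).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 16))          # hypothetical molecular feature vectors
W_shared = rng.normal(size=(16, 8))   # encoder shared by every property
W_heads = rng.normal(size=(8, 2))     # one column per task: solubility, permeability
pred = np.tanh(X @ W_shared) @ W_heads

# The second compound lacks a permeability label; the fourth lacks solubility.
target = np.array([[1.0, 0.5], [0.2, np.nan], [0.8, -0.3], [np.nan, 0.1]])
loss = masked_multitask_mse(pred, target)
```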

Every prediction includes a calibrated confidence interval. We report uncertainty honestly — wide intervals tell you where to focus the first synthesis round rather than falsely narrowing the decision space.
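The page does not say how its intervals are calibrated. One standard, model-agnostic technique is split conformal prediction, shown here as an illustration rather than as Nanolix's method. Absolute errors on a held-out calibration set yield a half-width that covers a chosen fraction of new predictions, assuming calibration compounds resemble the compounds being predicted.

```python
import numpy as np

def split_conformal_halfwidth(cal_pred, cal_true, alpha: float = 0.1) -> float:
    """Half-width q so that [pred - q, pred + q] covers at least (1 - alpha)
    of new cases on average, assuming exchangeable calibration data."""
    scores = np.abs(np.asarray(cal_true, dtype=float) - np.asarray(cal_pred, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample corrected rank
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return float(np.sort(scores)[k - 1])  # k-th smallest calibration error
```

If a novel scaffold class is poorly represented in the calibration set, its intervals deserve extra caution.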

Generative Chemistry

We navigate, not enumerate

The generative engine explores via gradient-guided sampling in a learned latent chemical space. It is not enumerative — it navigates toward multi-property optima rather than cataloging exhaustive libraries. This is the key difference from virtual screening: we sample regions where the gradient of the property landscape points, not regions where enumeration is convenient.

A variational autoencoder maps molecules to a continuous latent space. Navigation in that space is guided by multi-property gradients — the engine moves toward regions expected to satisfy all your constraints simultaneously, not just binding affinity.
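Here is a toy illustration of gradient-guided navigation. It assumes differentiable property surrogates defined directly on a 2-D latent space. Real latent spaces are higher-dimensional, the surrogates would be learned property heads, and decoding `z` back to a molecule with the VAE decoder is omitted. Combining properties as a weighted sum is one simple scalarization, not necessarily the engine's.

```python
import numpy as np

# Hypothetical property optima in a toy 2-D latent space.
AFFINITY_OPT = np.array([2.0, 1.0])
SOLUBILITY_OPT = np.array([1.0, 2.0])

def affinity(z: np.ndarray) -> float:
    # Toy surrogate: highest at AFFINITY_OPT.
    return float(-np.sum((z - AFFINITY_OPT) ** 2))

def solubility(z: np.ndarray) -> float:
    # Toy surrogate: highest at SOLUBILITY_OPT.
    return float(-np.sum((z - SOLUBILITY_OPT) ** 2))

def grad_objective(z: np.ndarray, weights=(1.0, 1.0)) -> np.ndarray:
    # Analytic gradient of w_a * affinity(z) + w_s * solubility(z).
    w_a, w_s = weights
    return w_a * (-2.0 * (z - AFFINITY_OPT)) + w_s * (-2.0 * (z - SOLUBILITY_OPT))

def navigate(z0, step: float = 0.1, n_steps: int = 200, weights=(1.0, 1.0)) -> np.ndarray:
    z = np.asarray(z0, dtype=float)
    for _ in range(n_steps):
        z = z + step * grad_objective(z, weights)  # gradient ascent on the combined objective
    return z

z_balanced = navigate([0.0, 0.0])
z_affinity_heavy = navigate([0.0, 0.0], weights=(3.0, 1.0))
```

With equal weights, the walk settles midway between the two single-property optima, at about (1.5, 1.5). Tripling the affinity weight moves the endpoint to about (1.75, 1.25), closer to the affinity optimum.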

Synthetic accessibility is a first-class constraint during navigation — not a post-generation filter. Candidates are scored on SA throughout the sampling process, so the output skews toward structures your CRO can actually quote.
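Here is a minimal sketch of SA as an in-loop constraint. It assumes a hinge penalty on an SA score where lower means more accessible, matching the SA_score convention in the sample output on this page. The threshold, the penalty weight, and the proposal tuples are hypothetical. Because the penalty applies when each step is chosen, a slightly weaker but accessible proposal can beat a stronger, hard-to-make one.

```python
from typing import List, Tuple

Proposal = Tuple[str, float, float]  # (hypothetical id, property score, SA score)

def penalized_score(property_score: float, sa_score: float,
                    sa_threshold: float = 3.0, penalty_weight: float = 2.0) -> float:
    # Hinge penalty: no cost at or below the threshold, linear cost above it.
    return property_score - penalty_weight * max(0.0, sa_score - sa_threshold)

def pick_next(proposals: List[Proposal]) -> Proposal:
    # Called at every sampling step, so SA shapes the path, not just the final filter.
    return max(proposals, key=lambda p: penalized_score(p[1], p[2]))

step_choice = pick_next([("A", 8.4, 5.5), ("B", 8.1, 2.4)])
```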

Output Package

What you receive at delivery

SDF file with all ranked candidate structures and embedded property annotations
ADMET prediction table with calibrated confidence intervals for all 11 properties
Top-5 synthesis route options per scaffold class, pre-checked against Enamine and WuXi AppTec catalog availability
Diversity analysis report showing structural distance across the candidate set and relative to known IP space
Pareto ranking rationale: which candidates sit on the property trade-off frontier and where the properties conflict

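Here is a minimal sketch of Pareto (non-dominated) filtering over the five sample rows shown on this page. The sketch treats pIC50 and sol_uM as higher-is-better and hERG and SA_score as lower-is-better, which is an assumption: the page does not list which properties enter the ranking. Confidence intervals are ignored here.

```python
from typing import Dict, List, Tuple

Row = Tuple[float, float, float, float]  # (pIC50, sol_uM, hERG, SA_score)

# Sample rows from discovery_run_output.csv on this page.
CANDIDATES: Dict[str, Row] = {
    "001": (8.42, 142, 0.08, 2.1),
    "002": (8.19, 89, 0.12, 2.4),
    "003": (8.11, 204, 0.06, 1.9),
    "004": (7.98, 61, 0.21, 2.7),
    "005": (7.84, 317, 0.05, 1.7),
}
DIRECTIONS = (+1, +1, -1, -1)  # +1: higher is better, -1: lower is better

def dominates(a: Row, b: Row) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    at_least = all(d * x >= d * y for d, x, y in zip(DIRECTIONS, a, b))
    strictly = any(d * x > d * y for d, x, y in zip(DIRECTIONS, a, b))
    return at_least and strictly

def pareto_front(cands: Dict[str, Row]) -> List[str]:
    return sorted(k for k, v in cands.items()
                  if not any(dominates(w, v) for j, w in cands.items() if j != k))

print(pareto_front(CANDIDATES))  # ['001', '003', '005']
```

On these rows, 002 and 004 are dominated by 001, which is better on all four columns. 001, 003, and 005 each trade potency against solubility or safety, so none dominates another.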
discovery_run_output.csv
rank smiles pIC50 sol_uM hERG SA_score CI_width
001 CC1CC(N…)C(=O) 8.42 142 0.08 2.1 ±0.31
002 COc1cc(C…)nc2 8.19 89 0.12 2.4 ±0.28
003 FC(F)(F)c1cccc 8.11 204 0.06 1.9 ±0.42
004 O=C(Nc1cccc)n2 7.98 61 0.21 2.7 ±0.35
005 Cc1ccc(F)cc1NC 7.84 317 0.05 1.7 ±0.29

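pIC50 = predicted binding affinity (−log10 of IC50 in molar units); sol_uM = predicted aqueous solubility (µM); hERG = predicted hERG inhibition probability; SA_score = synthetic accessibility score (lower = more accessible); CI_width = prediction confidence interval, shown as ± half-width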
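Here is a minimal sketch for loading the output file into Python. It assumes the file is comma-separated, as the .csv extension suggests, with the header shown above and `±` prefixed to CI_width. Reading the `±` value as a half-width is an interpretation, and the triage thresholds are hypothetical. The truncated SMILES are kept exactly as they appear on this page.

```python
import csv
import io

# Assumed layout: comma-separated, same header as the sample above.
SAMPLE = """rank,smiles,pIC50,sol_uM,hERG,SA_score,CI_width
001,CC1CC(N…)C(=O),8.42,142,0.08,2.1,±0.31
004,O=C(Nc1cccc)n2,7.98,61,0.21,2.7,±0.35
005,Cc1ccc(F)cc1NC,7.84,317,0.05,1.7,±0.29
"""

def load_candidates(text: str) -> list[dict]:
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "rank": row["rank"],  # kept as a string to preserve zero padding
            "smiles": row["smiles"],
            "pIC50": float(row["pIC50"]),
            "sol_uM": float(row["sol_uM"]),
            "hERG": float(row["hERG"]),
            "SA_score": float(row["SA_score"]),
            "ci_halfwidth": float(row["CI_width"].lstrip("±")),
        })
    return rows

rows = load_candidates(SAMPLE)
# Hypothetical triage thresholds, for illustration only.
shortlist = [r["rank"] for r in rows if r["hERG"] < 0.1 and r["SA_score"] <= 2.5]
```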

Integrations

Fits into Maestro, LIMS, and CRO handoff without reformatting

Molecular Software
Schrödinger Maestro SDF Import

All output SDF files are formatted for direct import into Maestro for visualization, docking validation, and further analysis. No conversion step is required.
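SD files carry per-structure properties as data items appended after each molblock, and Maestro and other cheminformatics tools generally read these as named properties. This stdlib-only sketch writes one record and assumes a V2000 molblock is already available. The single-carbon molblock is a placeholder, and the property names simply mirror the sample output columns.

```python
# Placeholder V2000 molblock (one carbon atom), standing in for a real candidate.
PLACEHOLDER = """placeholder
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
"""

def sdf_record(molblock: str, props: dict[str, object]) -> str:
    """Append SD data items to a molblock and terminate the record with $$$$."""
    lines = [molblock.rstrip("\n")]
    for name, value in props.items():
        lines.append(f"> <{name}>")  # data header line with the field name
        lines.append(str(value))      # field value
        lines.append("")              # blank line ends the data item
    lines.append("$$$$")
    return "\n".join(lines) + "\n"

record = sdf_record(PLACEHOLDER, {"pIC50": 8.42, "SA_score": 2.1})
```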

Data Systems
REST API for LIMS Delivery

Program-tier engagements include REST API delivery. The output schema is configurable to match your LIMS ingest format, with JSON or CSV endpoints depending on your platform requirements.
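The page does not document the API itself. This sketch only illustrates the configurable-schema idea: a field map renames output columns to a customer's LIMS fields, and the records are then serialized as JSON or CSV. `FIELD_MAP`, the LIMS field names, and `to_lims` are hypothetical, and no endpoint or request format is implied.

```python
import csv
import io
import json

# Hypothetical mapping from output columns to a customer's LIMS field names.
FIELD_MAP = {"rank": "candidate_id", "smiles": "structure_smiles", "pIC50": "pred_pic50"}

def to_lims(records: list[dict], fmt: str = "json") -> str:
    # Keep only mapped fields, renamed to the LIMS schema.
    mapped = [{lims: rec[src] for src, lims in FIELD_MAP.items()} for rec in records]
    if fmt == "json":
        return json.dumps(mapped)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(FIELD_MAP.values()), lineterminator="\n")
        writer.writeheader()
        writer.writerows(mapped)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")

example = [{"rank": "005", "smiles": "Cc1ccc(F)cc1NC", "pIC50": 7.84, "sol_uM": 317}]
payload_csv = to_lims(example, fmt="csv")
```

Unmapped columns such as sol_uM are dropped, so the field map alone decides what the LIMS receives.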

CRO Handoff
Enamine & WuXi AppTec Compatibility

Synthesis routes are pre-checked for reagent availability against Enamine REAL Space and the WuXi AppTec standard catalog, so your CRO can quote candidates without custom reagent sourcing.
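Here is a minimal sketch of the availability check. It assumes each route has already been reduced to a list of reagent identifiers and that catalog stock is available as a set of the same identifiers. The identifiers, scaffold names, and stock set are hypothetical. A real check would match reagents by structure (for example by canonical SMILES or InChIKey) against vendor catalog data.

```python
from dataclasses import dataclass

@dataclass
class Route:
    scaffold: str
    reagents: list[str]  # reagent identifiers, assumed already normalized

def missing_reagents(route: Route, in_stock: set[str]) -> list[str]:
    # Reagents that would need custom sourcing.
    return sorted(set(route.reagents) - in_stock)

def quotable_routes(routes: list[Route], in_stock: set[str]) -> list[Route]:
    # A route is quotable only if every reagent is in stock.
    return [r for r in routes if not missing_reagents(r, in_stock)]

# Hypothetical stock list and routes, for illustration only.
IN_STOCK = {"REAG-001", "REAG-002", "REAG-003"}
ROUTES = [
    Route("scaffold_A", ["REAG-001", "REAG-002"]),
    Route("scaffold_B", ["REAG-001", "REAG-999"]),
]
```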

Start here

Run a sample against your target.

Give us your target constraints. After a 30-minute call to align on parameters, we run the generation and return a sample candidate set with predicted properties within 5 business days, before any contract.
