Most clinical microbiologists first encounter nanopore sequencing through a vendor presentation focused on clinical applications and time-to-result numbers. The underlying technology (what the device is actually measuring and how those measurements become base calls) is rarely explained in clinical terms. This matters for laboratory adoption decisions, because the technology's limitations and the points at which human judgment is required are direct consequences of how nanopore sequencing physically works. This is a technology explanation for clinical laboratory professionals, not for genomics engineers.
What a nanopore is and what it measures
A nanopore is a protein pore — a molecular channel approximately 1–2 nm in diameter — embedded in a synthetic membrane. The protein pores used in current clinical-grade devices are engineered variants of naturally occurring pore-forming proteins (the specific proteins are proprietary to device manufacturers, but they are derived from biological pore-forming structures). The membrane separates two chambers, and an electrical potential is applied across it.
When the membrane is intact (no pore), current cannot flow. When a pore is inserted into the membrane, a small ionic current flows through it, typically on the order of 100–300 pA. When a DNA or RNA strand is threaded through the pore by a motor protein (a helicase that docks at the pore), the nucleotides passing through the pore partially obstruct it, reducing the ionic current in a base-dependent way. Different nucleotides, and different combinations of nucleotides in the vicinity of the pore constriction, produce different current levels.
This is the raw measurement: a time series of ionic current values, sampled at 4000–6000 Hz, as a DNA strand passes through the pore at a controlled rate. The current trace is called a squiggle. A basecaller converts the squiggle to a nucleotide sequence.
Why the error rate is what it is
The relationship between ionic current level and nucleotide identity is not a clean one-to-one mapping. Several factors introduce ambiguity:
k-mer context effects: The current at any point in the squiggle reflects not just the nucleotide immediately at the pore constriction, but the approximately 5–6 surrounding bases that are physically present in the pore at the same time. The same central base produces different current levels depending on its neighbors. The basecaller must learn this entire context-dependent mapping from training data, and it learns it imperfectly.
Current noise: Thermal and electronic noise in the ionic current measurement adds moment-to-moment variability. A single pore observing a homogeneous run of the same nucleotide will not produce a perfectly flat current trace — it will produce a noisy trace that the basecaller must interpret as a continuous run rather than a sequence of multiple different bases.
Motor protein stepping variability: The helicase motor protein that controls DNA translocation through the pore steps in approximately 1-base increments, but the stepping rate is not perfectly uniform. Variable dwell times per base mean that some bases produce longer current observations and others shorter, adding uncertainty about how many bases correspond to a given segment of the current trace.
These three factors together explain why nanopore reads carry 5–15% per-base error rates in current implementations, and why homopolymer runs are the most systematically miscalled context: the flat current signal combined with motor protein stepping variability makes homopolymer length estimation genuinely difficult.
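The interaction of these three factors can be sketched with a toy simulation. This is an illustration only: the function names, the 3-base context window (real pores sense roughly 5–6 bases), the current levels, and the noise and dwell parameters are all invented for the example and do not describe any vendor's signal model.

```python
import random

# Invented per-base contributions in pA; real models are learned from data.
BASE_WEIGHT = {"A": 0.0, "C": 7.0, "G": 13.0, "T": 21.0}


def kmer_level(kmer: str) -> float:
    """Hypothetical mean current (pA) for a 3-base context.

    The centre base dominates and the neighbours shift the level,
    which mimics the k-mer context effect.
    """
    left, centre, right = kmer
    return 80.0 + BASE_WEIGHT[centre] + 0.3 * BASE_WEIGHT[left] + 0.2 * BASE_WEIGHT[right]


def simulate_squiggle(seq: str, mean_dwell: float = 10.0, noise_sd: float = 1.5, seed: int = 0):
    """Return (samples, segments) for a toy squiggle.

    segments holds (kmer, level, dwell) per motor step, i.e. the ground
    truth that a basecaller never sees directly.
    """
    rng = random.Random(seed)
    samples, segments = [], []
    for i in range(1, len(seq) - 1):
        kmer = seq[i - 1:i + 2]
        level = kmer_level(kmer)
        # Variable dwell time: the motor does not step at a uniform rate.
        dwell = max(1, int(rng.expovariate(1.0 / mean_dwell)))
        segments.append((kmer, level, dwell))
        # Current noise: every sample is jittered around the k-mer level.
        samples.extend(level + rng.gauss(0.0, noise_sd) for _ in range(dwell))
    return samples, segments


if __name__ == "__main__":
    samples, segments = simulate_squiggle("CGAAAAAATC")
    for kmer, level, dwell in segments:
        print(f"{kmer}  level={level:5.1f} pA  dwell={dwell} samples")
```

In this toy model the interior AAA contexts of the homopolymer all share one current level, so the only cue to the run length is how long the flat segment lasts, and that duration is exactly what variable dwell times blur.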
The flow cell as a clinical consumable
A flow cell contains a synthetic membrane with pores embedded in it, along with the electrical circuitry to apply voltage and measure current from each pore independently. Current clinical-grade flow cells contain hundreds to thousands of individually addressable pores (the exact count is device-specific and proprietary). Each pore operates as an independent sequencing channel — the device runs many pores in parallel to achieve practical throughput.
Flow cells are single-use consumables. They have a maximum run life measured in hours, after which pore quality degrades sufficiently that further sequencing is unproductive. They must be stored under controlled temperature conditions and have a defined shelf life. For clinical laboratory operations, this means flow cell management — lot tracking, expiration monitoring, storage compliance — is part of the quality system in the same way reagent management is for any clinical assay.
Pore occupancy and active pore count are reported during a run and are key QC metrics. A flow cell with a very low active pore count (early pore loss from a contaminated sample or damaged library) will produce insufficient data for confident identification. The basecalling software should flag runs in which the active pore count falls below the threshold needed to meet the minimum coverage requirement.
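A threshold check of this kind might look like the sketch below. The function name, the input shape (active pore counts keyed by minutes into the run), and the threshold value are assumptions for illustration; the actual threshold would come from the laboratory's own validation of its coverage requirement.

```python
def minutes_below_threshold(active_pores_by_minute: dict[int, int], min_active: int) -> list[int]:
    """Return the time points (minutes) at which the active pore count
    was below the laboratory-defined minimum.

    Both the input format and the threshold are hypothetical.
    """
    return [
        minute
        for minute, count in sorted(active_pores_by_minute.items())
        if count < min_active
    ]


if __name__ == "__main__":
    # Invented example run: pore count decays over time.
    run = {0: 400, 30: 350, 60: 90}
    flagged = minutes_below_threshold(run, min_active=100)
    if flagged:
        print(f"QC flag: active pores below threshold at minutes {flagged}")
```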
From squiggle to base call: the basecalling software layer
Basecalling is the conversion of raw ionic current (squiggle) to nucleotide sequence. It is performed by a neural network trained on known sequences paired with their observed current traces. The network learns the complex mapping from current segment patterns to base sequences, including the k-mer context effects described above.
From a clinical laboratory perspective, the most important thing to understand about basecalling is that it is a software component with a specific version and a specific training data provenance, and its performance is not immutable. Different basecalling models produce different accuracy levels for different sequence contexts. Updates to the basecalling model change the output of the analytical pipeline in ways that may require re-validation under the laboratory-developed test (LDT) framework. Basecalling software version is a material component of the clinical test system, in the same way that a lot number is material for a reagent.
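One way a laboratory might operationalize this is to record the software components of each run and compare them against the validated configuration. The sketch below is hypothetical: the class, field names, and version strings are invented, and which changes actually trigger re-validation is a decision for the laboratory's quality system.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical record of the software components of a test system."""
    basecaller_version: str
    basecalling_model: str
    analysis_pipeline_version: str


def changed_components(validated: PipelineConfig, current: PipelineConfig) -> list[str]:
    """List the fields that differ from the validated configuration."""
    v, c = asdict(validated), asdict(current)
    return [name for name in v if v[name] != c[name]]


if __name__ == "__main__":
    validated = PipelineConfig("1.0", "model_a", "2.3")  # invented values
    current = PipelineConfig("1.1", "model_a", "2.3")
    diff = changed_components(validated, current)
    if diff:
        print(f"Review for re-validation; changed components: {diff}")
```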
Per-base quality scores output by the basecaller are Phred-scaled: Q10 means a 90% probability that the base call is correct; Q20 means 99%. For context, Illumina short-read sequencing routinely achieves Q30 (99.9%) or higher per base. Current nanopore reads typically range from Q10 to Q20 mean quality, which corresponds to per-base error rates of roughly 10% down to 1%, depending on the chemistry and basecalling model. This difference in per-base accuracy has direct implications for how downstream analysis must be performed: statistical consensus over multiple reads, rather than single-read analysis, is required for confident identification and variant calling.
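The Phred scale is a fixed logarithmic relationship, Q = -10 · log10(P_error), so the conversion in either direction takes a few lines. The function names here are chosen for the example.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score to the probability that the call is wrong."""
    return 10 ** (-q / 10)


def error_to_phred(p_error: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p_error)


if __name__ == "__main__":
    for q in (10, 20, 30):
        p = phred_to_error(q)
        print(f"Q{q}: error probability {p:.3%}, accuracy {1 - p:.1%}")
```

Because the scale is logarithmic, each 10-point increase in Q corresponds to a tenfold reduction in error probability.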
What Phred quality scores mean for clinical interpretation
Per-base quality scores are not the clinical result — they are inputs to the analysis. A clinical microbiologist reviewing a sequencing report does not need to interpret Q-scores directly. However, understanding what they represent helps when reviewing QC metrics or troubleshooting unexpected results.
The mean Q-score of a read is a summary of how confident the basecaller was across all positions in that read. A run with a mean read Q-score of Q12–Q14 is operating within the expected range for current nanopore chemistry. A run with a mean Q-score below Q8 suggests a quality problem (degraded DNA, a contaminated sample, or a flow cell issue) that warrants investigation before reporting the result.
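As a rough sketch of how a mean read Q-score can be computed from a FASTQ quality string (standard Phred+33 encoding): one common convention is to average the per-base error probabilities and convert the result back to the Phred scale, rather than averaging the Q values directly. Whether a given device's software uses exactly this convention is an assumption here, and the function names are invented. The Q8 cutoff is the figure quoted above.

```python
import math


def mean_read_qscore(quality_string: str) -> float:
    """Mean Q-score of a read from its Phred+33 quality string.

    Averages error probabilities, then converts back to Phred, so a few
    very poor bases pull the mean down more than a plain average of Q would.
    """
    if not quality_string:
        raise ValueError("empty quality string")
    error_probs = [10 ** (-(ord(ch) - 33) / 10) for ch in quality_string]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def needs_investigation(quality_string: str, min_q: float = 8.0) -> bool:
    """Flag a read whose mean Q-score is below the QC cutoff (Q8 in the text)."""
    return mean_read_qscore(quality_string) < min_q


if __name__ == "__main__":
    # '+' encodes Q10 and '5' encodes Q20 in Phred+33.
    print(round(mean_read_qscore("+5"), 2))
```

For a two-base read with Q10 and Q20, this gives about Q12.6 rather than Q15, because the mean error probability (5.5%) is dominated by the weaker base.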
The confidence score reported in the organism identification result is a higher-level abstraction: it expresses the probability that the identified organism is correct, given all the reads that contributed to the identification, accounting for their individual quality scores, coverage depth, and the specificity of the alignment. This is the number the clinical microbiologist should engage with. Mean Q-score is instrument QC; identification confidence is clinical result quality.
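Why depth matters can be illustrated with a deliberately simplified model: treat each read as an independent vote at one position, wrong with the same probability, and ask how often a simple majority would be wrong. This is not how any particular identification pipeline computes its confidence score, and real nanopore errors are not fully independent (homopolymer errors in particular are systematic), so the numbers are illustrative only.

```python
from math import comb


def majority_vote_error(per_read_error: float, depth: int) -> float:
    """Probability that more than half of `depth` independent reads are
    wrong at a position, under a simple binomial model.

    Use an odd depth to avoid ties. Assumes independent, identically
    distributed errors, which is an idealization.
    """
    k_min = depth // 2 + 1
    return sum(
        comb(depth, k) * per_read_error ** k * (1 - per_read_error) ** (depth - k)
        for k in range(k_min, depth + 1)
    )


if __name__ == "__main__":
    for depth in (1, 3, 11, 21):
        print(f"depth {depth:>2}: consensus error {majority_vote_error(0.10, depth):.2e}")
```

With a 10% per-read error rate, three reads already reduce the modeled consensus error to 2.8%, and it keeps falling steeply as depth increases.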
Common questions from clinical lab directors
"How does this compare to Sanger sequencing for organism identification?" Sanger sequencing of the 16S rRNA gene is a gold standard for bacterial identification from pure culture. Nanopore-based 16S is faster but noisier — appropriate for metagenomics where no pure culture is available, not necessarily superior to Sanger for single-organism identification from culture. The workflows serve different purposes.
"If the reads are noisy, how can we trust the result?" The clinical result is derived from consensus across many reads, not from any single read. The same logic applies here as in PCR: a single reaction with a borderline Ct is less reliable than triplicate reactions at the same Ct. Depth and confidence scoring are the clinical quality control mechanism.
"What happens when there are two organisms present?" Metagenomic analysis will identify both — the analysis reports on all organisms above the detection threshold, with separate confidence scores. Mixed results must be reviewed carefully; the relative abundance of each organism (inferred from read counts) provides context about whether each is likely to be clinically significant or background contamination.
The technology is genuinely accessible to clinical laboratories with the right software and operational framework. The learning curve is steeper than for PCR panels but shallower than it often appears from the outside — primarily because most of the technical complexity is abstracted into the software layer, which is where it belongs for a clinical tool.