onnx_batch.py — Bottle-End Detection and Per-Cell Vision Inference¶
Overview¶
onnx_batch.py is the per-cell bottle-end detection worker of the Wine Platform vision pipeline.
It processes the individual crop images generated by crop_with_json.py, runs ONNX inference for bottle-end detection, applies additional rule-based validation, generates annotated debug images, and writes a structured CSV summary for later database comparison.
This script is the core machine-vision inference stage of the automated stock system.
Its job is not only to detect bottle ends, but also to:
- classify shelf cell shape
- clamp impossible counts by geometry
- estimate image blur
- flag uncertain cells for human review
- produce a detector results table for downstream reconciliation
Position in the Vision Pipeline¶
flowchart TD
CROP["crop_with_json.py"]
CELLPNG["cell crop images"]
ONNX["onnx_batch.py"]
CSV["results.csv"]
DBG["annotated debug images"]
DB["DB_compare.py"]
CROP --> CELLPNG
CELLPNG --> ONNX
ONNX --> CSV
ONNX --> DBG
CSV --> DB
Purpose of the Module¶
Each crop image represents one physical shelf cell.
The script processes each crop independently and answers the operational question:
How many bottles are present in this cell?
To do that, it combines:
- ONNX object detection
- geometry-based cell-type interpretation
- confidence filtering
- image-quality inspection
- optional fallback circle counting for review support
This combination is more robust than raw object detection alone.
Inputs¶
The script reads:
ONNX model¶
<cfg.paths["model_dir"]>/best_V4.onnx
Input crop images¶
<cfg.paths["crop_out"]>
These images are produced by the crop stage and are expected to be one image per cell.
Outputs¶
Main CSV output¶
<cfg.paths["det_out"]>/results.csv
This file is the authoritative vision output used later by DB_compare.py.
Debug images¶
Annotated per-cell images are written into:
<cfg.paths["det_out"]>
with the same base filenames as the input crops.
These images show:

- grey rectangles for raw detections discarded by the stricter trust filter
- blue rectangles for kept detections that contribute to the final ONNX count
- green rectangles for kept detections whose box also contains a circle-helper match
- a summary overlay
- a review flag
Operational Design¶
The script is designed to run:
- headless
- on Raspberry Pi or Mac
- using CPU-only ONNX Runtime
- in batch over all crop images
It avoids network calls and focuses entirely on local inference.
Main Runtime Configuration¶
Inside run(cfg), the script defines several important parameters.
Model and directories¶
onnx_model¶
cfg.paths["model_dir"] / "best_V4.onnx"
Path to the trained bottle-end detector.
input_dir¶
cfg.paths["crop_out"]
Directory containing per-cell crop images.
out_dir¶
cfg.paths["det_out"]
Directory where results and debug artifacts are written.
Inference parameters¶
imgsz = 416¶
Model input size for letterboxed inference.
Role: all input images are resized and padded to this square size before inference.
conf_thres = 0.35¶
Initial detector confidence threshold.
Role: weak detections below this threshold are removed before NMS.
Comment in code notes that this may be lowered in difficult lighting conditions.
iou_thres = 0.45¶
Non-maximum suppression overlap threshold.
Role: suppresses duplicate overlapping boxes.
max_det = 20¶
Maximum allowed detections kept after NMS.
This is a safety upper bound.
Review heuristics¶
min_mean_conf = 0.45¶
If detections exist but average kept confidence is below this threshold, the image is flagged for review.
blur_thresh = 35.0¶
If blur score falls below this threshold, the image is flagged for review.
flag_uncertain_shape = True¶
Whether ambiguous shape classification near the decision boundary should mark the cell for review.
High-Level Processing Flow¶
flowchart TD
START["run(cfg)"]
LOAD["load ONNX model"]
LIST["list crop images"]
LOOP["for each crop image"]
TYPE["detect cell type"]
PRED["run ONNX detector"]
FILTER["confidence filter"]
CLAMP["clamp count by cell type"]
QUALITY["blur + confidence + shape review checks"]
CIRCLE["optional circle fallback"]
DBG["write annotated debug image"]
ROW["append CSV row"]
CSV["write results.csv"]
RET["return summary dict"]
START --> LOAD
LOAD --> LIST
LIST --> LOOP
LOOP --> TYPE
TYPE --> PRED
PRED --> FILTER
FILTER --> CLAMP
CLAMP --> QUALITY
QUALITY --> CIRCLE
CIRCLE --> DBG
DBG --> ROW
ROW --> LOOP
LOOP --> CSV
CSV --> RET
Cell Type Detection from Black Background¶
detect_cell_type_from_black_bg(bgr)¶
This helper determines whether the crop corresponds to:
- a triangle
- a rhombus
This is important because the maximum physically possible bottle count differs by cell shape.
Why This Works¶
Because crop_with_json.py masks pixels outside the polygon to black, the visible non-black region represents the actual cell footprint.
This means the detector can infer shape from the crop silhouette.
Processing Steps¶
- convert BGR to grayscale
- threshold non-black pixels
- apply morphological close and open
- find largest contour
- compute convex hull
- compute rotated minimum-area rectangle
- compare shape fill extent
Variable Roles¶
mask = (gray > 10) * 255¶
Identifies non-black pixels as foreground.
This assumes black background is meaningful and intentional.
Morphological kernel¶
cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
Used to stabilize the foreground mask.
rot_extent¶
Computed as:
area_of_convex_hull / area_of_rotated_rect
Interpretation:
- more compact fill suggests rhombus
- lower fill suggests triangle
verts¶
Number of vertices after approximating the hull.
Used as a tie-breaker in the uncertain zone.
Threshold Logic¶
If rot_extent >= 0.72¶
Cell classified as:
rhombus
If rot_extent <= 0.66¶
Cell classified as:
triangle
If in between¶
Fallback logic:
- verts == 3 → triangle
- otherwise → rhombus
This makes shape classification more robust in ambiguous cases.
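The threshold logic above can be summarized as a small decision function. This is a minimal sketch: the function name `classify_cell` and its `(cell_type, uncertain)` return shape are illustrative, not the script's actual API, which computes `rot_extent` and `verts` from the crop itself.

```python
def classify_cell(rot_extent: float, verts: int) -> tuple:
    """Classify a cell as 'rhombus' or 'triangle' from its fill extent.

    Returns (cell_type, uncertain), where `uncertain` marks the
    ambiguous band used later for review flagging.
    """
    if rot_extent >= 0.72:
        return "rhombus", False
    if rot_extent <= 0.66:
        return "triangle", False
    # Ambiguous zone: fall back to the vertex count of the hull approximation.
    return ("triangle" if verts == 3 else "rhombus"), True
```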
Count Clamping by Cell Type¶
clamp_count_by_type(count, cell_type)¶
This function imposes physical constraints.
Triangle cell¶
Maximum count:
3
Rhombus cell¶
Maximum count:
9
This is a structural prior from the shelf design.
It prevents implausible model outputs from being accepted directly.
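A minimal sketch of this clamp (the fallback maximum for an unrecognized shape string is an assumption of this sketch, not documented behavior):

```python
def clamp_count_by_type(count: int, cell_type: str) -> int:
    """Cap the detected count at the physical maximum for the cell shape.

    The per-shape maxima come from the shelf design: 3 for triangles,
    9 for rhombuses. Unknown shapes fall back to 9 (sketch assumption).
    """
    max_by_type = {"triangle": 3, "rhombus": 9}
    return min(count, max_by_type.get(cell_type, 9))
```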
Blur Estimation¶
blur_score(bgr)¶
Uses Laplacian variance on grayscale image.
Interpretation:

- high variance = sharper image
- low variance = blurrier image
This score does not directly change count, but contributes to:
needs_review
This is important because blur can cause false negatives.
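A dependency-light sketch of Laplacian-variance blur scoring. The script presumably uses `cv2.Laplacian`; here the 4-neighbour Laplacian kernel is applied with NumPy slicing to keep the example self-contained:

```python
import numpy as np

def blur_score(gray: np.ndarray) -> float:
    """Laplacian variance of a grayscale image: higher means sharper."""
    g = gray.astype(np.float64)
    # 4-neighbour Laplacian: up + down + left + right - 4*center
    lap = (g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())
```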
Preprocessing for ONNX Inference¶
letterbox(img, new_shape=(320, 320))¶
This helper resizes and pads the input image while preserving aspect ratio.
Output:
- padded image
- scale ratio r
- padding offsets (dw, dh)
This is standard YOLO-style preprocessing.
Variable Roles¶
r¶
Scale ratio between original image and model input size.
Used later to map detections back to original image coordinates.
dw, dh¶
Horizontal and vertical padding.
These must be undone after inference.
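The letterbox transform can be sketched as follows. This is not the script's implementation: a nearest-neighbour index mapping stands in for `cv2.resize`, and the grey pad value 114 is the common YOLO convention, assumed here:

```python
import numpy as np

def letterbox(img: np.ndarray, new_shape=(320, 320), pad_value=114):
    """Resize with preserved aspect ratio, then pad to a square.

    Returns (padded image, scale ratio r, padding offsets (dw, dh)).
    """
    h, w = img.shape[:2]
    r = min(new_shape[0] / h, new_shape[1] / w)
    nh, nw = int(round(h * r)), int(round(w * r))
    # Nearest-neighbour resize via index mapping (cv2.resize stand-in).
    ys = (np.arange(nh) / r).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / r).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    dh, dw = (new_shape[0] - nh) / 2, (new_shape[1] - nw) / 2
    top, left = int(round(dh - 0.1)), int(round(dw - 0.1))
    out = np.full((new_shape[0], new_shape[1], *img.shape[2:]),
                  pad_value, dtype=img.dtype)
    out[top:top + nh, left:left + nw] = resized
    return out, r, (dw, dh)
```

Detections are later mapped back to the original image by subtracting `(dw, dh)` and dividing by `r`.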
Non-Maximum Suppression¶
nms_xyxy(boxes_xyxy, scores, iou_thres)¶
This helper applies OpenCV NMS to suppress duplicate detections.
The function internally converts boxes from:
xyxy
to:
xywh
because OpenCV expects width-height format.
Output: the list of kept indices.
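The script delegates suppression to `cv2.dnn.NMSBoxes` (hence the xywh conversion); the equivalent greedy algorithm can be sketched in pure Python, working directly on xyxy boxes:

```python
def nms_xyxy(boxes, scores, iou_thres):
    """Greedy NMS on xyxy boxes; pure-Python stand-in for cv2.dnn.NMSBoxes."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area = lambda r: max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])
        union = area(a) + area(b) - inter
        return inter / union if union > 0 else 0.0

    # Visit boxes in descending score order; keep a box only if it does
    # not overlap an already-kept box beyond the IoU threshold.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thres for j in keep):
            keep.append(i)
    return keep
```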
Circle-Based Cap Detection¶
detect_caps(gray, minR, maxR, minDistCoeff)¶
This is a fallback / auxiliary heuristic based on Hough circles.
It attempts to detect bright circular bottle caps.
This function is not the primary detector.
It is mainly used when:
- mean confidence is zero
- or the sample already needs review
So it supports diagnostic reasoning rather than replacing the model.
Variable Roles¶
minR, maxR¶
Minimum and maximum allowed circle radii.
minDistCoeff¶
Controls minimum spacing between detected circles.
param1, param2¶
Internal Hough transform thresholds.
Brightness filter¶
Candidate circle centers must satisfy:
gray[y, x] > mean + 0.3 * std
This filters out circles that are not bright enough to plausibly be bottle ends.
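The brightness gate can be sketched as a small filter over candidate circle centers (the function name `bright_enough` is illustrative; in the script this check lives inside the Hough-circle helper):

```python
import numpy as np

def bright_enough(gray: np.ndarray, centers, k: float = 0.3):
    """Keep only circle centers brighter than mean + k*std of the crop.

    `centers` is a list of (x, y) pixel coordinates.
    """
    thresh = gray.mean() + k * gray.std()
    return [(x, y) for (x, y) in centers if gray[y, x] > thresh]
```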
count_bottles()¶
This wrapper uses detect_caps() and optionally draws debug circles.
It returns:
- cap count
- method string
- debug image
- detected radii
In the current pipeline, it is only invoked when the primary detection result is uncertain or absent.
When debug coloring is enabled, circle detections are also associated with kept ONNX boxes.
Association rule:
- compute circle centers (x, y) from detect_caps()
- test whether the center falls inside, or very near, a kept ONNX box
- upgrade that box color from blue to green
This association is used only for debug visualization. It does not change the CSV count logic.
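The association test reduces to a point-in-box check with a small tolerance. The margin value below is an illustrative assumption, not the script's exact constant:

```python
def circle_matches_box(cx, cy, box, margin=4):
    """True if a circle center lies inside, or within `margin` px of, a box.

    `box` is (x1, y1, x2, y2). A matching kept box is recolored from
    blue to green in the debug image.
    """
    x1, y1, x2, y2 = box
    return (x1 - margin) <= cx <= (x2 + margin) and \
           (y1 - margin) <= cy <= (y2 + margin)
```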
YOLOv8 ONNX Detector Class¶
YoloV8OnnxDetector¶
This class encapsulates model loading, preprocessing, output decoding, and prediction.
Constructor¶
__init__(onnx_path, imgsz=320)¶
Loads the ONNX model with:
providers=["CPUExecutionProvider"]
This is aligned with Raspberry Pi operation and avoids GPU dependency.
It also caches:
- input tensor name
- output tensor name
_prepare(bgr)¶
Preprocessing steps:
- letterbox resize/pad to square
- convert BGR to RGB
- normalize pixel range to [0, 1]
- transpose to CHW
- add batch dimension
Output tensor shape:
(1, 3, H, W)
_decode_output(out)¶
This helper makes the script tolerant to multiple common YOLOv8 ONNX output layouts.
Supported general forms:
- (1, N, 5)
- (1, 5, N)
- (1, N, 4+nc)
- (1, 4+nc, N)
It normalizes these into:
- boxes_xywh
- scores
This makes the worker more robust to export variations.
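A sketch of that normalization. The transpose heuristic below (assume there are more candidate boxes than attributes) is an assumption of this sketch; the script's actual `_decode_output` may discriminate layouts differently:

```python
import numpy as np

def decode_output(out: np.ndarray):
    """Normalize common YOLOv8 ONNX output layouts to (boxes_xywh, scores).

    Handles (1, N, 5), (1, 5, N), (1, N, 4+nc) and (1, 4+nc, N).
    Single-class exports carry one objectness column; multi-class
    exports take the max class score per candidate.
    """
    a = out[0]
    # Heuristic: if attributes are on axis 0 (channels-first layout),
    # transpose to (N, attrs). Assumes N > number of attributes.
    if a.shape[0] < a.shape[1] and a.shape[0] >= 5:
        a = a.T
    boxes_xywh = a[:, :4]
    scores = a[:, 4] if a.shape[1] == 5 else a[:, 4:].max(axis=1)
    return boxes_xywh, scores
```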
predict(...)¶
This is the main inference method.
Steps:
- preprocess image
- run ONNX session
- decode raw output
- filter by confidence
- convert xywh -> xyxy
- sort by score descending
- apply NMS
- limit to max_det
- map boxes back to original image coordinates
- clip to image bounds
Outputs:
- boxes_xyxy
- confs
These are the raw model detections before additional pipeline heuristics.
Main Runtime Loop¶
Listing input images¶
The script processes all files in crop_out with suffix:
.png, .jpg, or .jpeg
If no images are found, it raises RuntimeError.
Per-image processing stages¶
For each crop image:
- load image
- determine cell type
- determine max physical count
- run ONNX detector
- apply secondary confidence filter
- compute final count
- evaluate review status
- optionally run circle helper
- draw debug overlay
- append CSV row
Secondary Detection Filtering¶
After the detector returns boxes and confidences, the script applies an additional threshold:
conf_min_det = 0.50
This is stricter than the initial conf_thres.
Purpose¶
- keep weaker predictions during the model stage
- but trust only stronger detections for the final count

This two-level approach is helpful because:

- it preserves uncertainty information
- it still produces stable final counts
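The trust filter and its derived variables can be sketched in a few lines (the name `trust_filter` and the tuple return are illustrative; in the script these are inline computations in the per-image loop):

```python
def trust_filter(confs, conf_min_det=0.50):
    """Apply the stricter second-stage threshold to detector confidences.

    Returns (keep mask, raw_count, filtered_count, mean_conf of kept).
    """
    keep = [c >= conf_min_det for c in confs]
    raw_count = len(confs)
    filtered_count = sum(keep)
    kept = [c for c, k in zip(confs, keep) if k]
    mean_conf = sum(kept) / filtered_count if filtered_count else 0.0
    return keep, raw_count, filtered_count, mean_conf
```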
Variable Roles¶
raw_count¶
Number of detections after detector thresholding and NMS, before strict trust filtering.
keep¶
Boolean mask for detections with confidence >= conf_min_det.
boxes_f, confs_f¶
Final trusted detections.
For debug image generation, these boxes are drawn in blue by default and may be upgraded to green if a circle-helper detection matches the same box.
filtered_count¶
Number of trusted detections.
count¶
Final count after physical clamp by cell type.
mean_conf¶
Average confidence of kept detections.
Review Logic¶
The script calculates:
needs_review
This is one of its most important output signals.
A cell is flagged if any of the following conditions apply.
1. Uncertain shape¶
If rot_extent lies between:
0.66 and 0.72
and shape uncertainty flagging is enabled, the sample is marked for review.
2. Raw count exceeds physical max¶
If the detector produced more detections than physically plausible before clamping, this is suspicious.
3. Low mean confidence¶
If detections exist but mean trusted confidence is below:
min_mean_conf = 0.45
the result is considered uncertain.
4. Low blur score¶
If image sharpness is below:
blur_thresh = 35.0
the result is flagged because false negatives become more likely.
5. Weak or absent detections¶
If detections existed but all were discarded by the trust filter, or if uncertainty is otherwise high, the circle-based fallback is run for additional evidence.
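The flagging conditions (1–4 above) combine into a single boolean. This is a sketch under the thresholds documented here; the function name, parameter order, and exact boundary handling of the ambiguous shape band are assumptions:

```python
def needs_review(rot_extent, raw_count, max_count, mean_conf, blur,
                 filtered_count,
                 flag_uncertain_shape=True,
                 min_mean_conf=0.45, blur_thresh=35.0):
    """Return True if any review heuristic fires for this cell."""
    # 1. shape classification landed in the ambiguous band
    if flag_uncertain_shape and 0.66 < rot_extent < 0.72:
        return True
    # 2. detector saw more bottles than physically possible
    if raw_count > max_count:
        return True
    # 3. detections exist but their average confidence is weak
    if filtered_count > 0 and mean_conf < min_mean_conf:
        return True
    # 4. image is too blurry to trust negatives
    if blur < blur_thresh:
        return True
    return False
```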
Debug Image Generation¶
For each input crop, the script creates an annotated debug image.
Grey boxes¶
Drawn for discarded detections.
Meaning:
- model proposed them
- they are included in raw_count
- but they were rejected by the stricter trust threshold and do not contribute to count
Blue boxes¶
Drawn for kept detections by default.
Meaning:
- they survived the stricter trust filter
- they contribute to final ONNX count
- no circle-helper match was found for that specific box
Green boxes¶
Drawn for kept detections that also match the circle helper.
Meaning:
- they survived the stricter trust filter
- they contribute to final ONNX count
- a detected circle center falls inside, or very near, the same box
Text overlay¶
Shows:
- cell type
- rot_extent
- raw count
- final count
- mean confidence
- blur score
If review is needed, the image also contains:
NEEDS_REVIEW
in red.
These debug images are extremely valuable for validation and tuning.
CSV Output¶
The script writes results.csv with rows containing:
- file
- bin_ID
- cell_type
- rot_extent
- raw_count
- count
- mean_conf
- blur
- needs_review
- cirle_count
CSV Output Field Explanation¶
The results.csv file produced by onnx_batch.py contains several diagnostic fields that describe how the final bottle count decision was reached.
These fields allow the system (and operators) to evaluate geometry, detector output, confidence, and image quality simultaneously.
The most important diagnostic fields are:
rot_extent, raw_count, mean_conf, and blur
Together they determine whether a result is trustworthy or requires human review.
rot_extent — Shape Confidence (Triangle vs Rhombus)¶
What it is¶
rot_extent measures how much of the rotated bounding rectangle of the cell is actually filled by the visible non-black cell area.
Technically:
rot_extent = (area of cell shape) / (area of rotated bounding box)
This value is computed during cell-type detection using the convex hull of the cell silhouette.
Why the Rotated Bounding Box Matters¶
The shelf cells are rotated diamonds (rhombuses).
If a normal axis-aligned bounding box were used, the bounding area would be much larger than the real cell footprint.
Using the minimum rotated rectangle ensures the ratio reflects the true shape occupancy.
Typical Values in the System¶
| Shape | rot_extent range | Meaning |
|---|---|---|
| Triangle | ~0.45 – 0.65 | Sparse triangular footprint |
| Rhombus | ~0.75 – 0.95 | Dense diamond-shaped footprint |
| Borderline | ~0.66 – 0.72 | Ambiguous geometry |
Values inside the borderline zone are flagged for review.
Why This Field Is Important¶
rot_extent is used to:
- determine cell geometry
- prevent wrong bottle-count limits
- flag uncertain shapes for review
Example:
triangle → max bottles = 3
rhombus → max bottles = 9
Without this geometry detection the detector could produce physically impossible counts.
This field is the reason triangle/rhombus misclassification errors were eliminated.
raw_count — What the Detector Actually Saw¶
What it is¶
raw_count is the number of bottle ends detected before any correction or clamping.
Example:
raw_count = 6
cell_type = triangle
final_count = min(6,3) = 3
Why It Matters¶
This value represents the pure ML model output.
It allows the system to differentiate between:
| Situation | Meaning |
|---|---|
| raw_count > final count | detections were clamped by geometry |
| raw_count == 0 but bottles exist | likely false negative |
| raw_count very high | detector confusion |
Operational Use¶
raw_count is used to detect suspicious conditions:
- excessive detections
- detector confusion
- geometry conflicts
It is the key variable when comparing ML perception vs physical shelf constraints.
mean_conf — Average Detector Confidence¶
What it is¶
mean_conf is the average confidence score of all trusted bottle-end detections in the cell.
Example:
detections = [0.81, 0.76, 0.72]
mean_conf = 0.763
Interpretation¶
| mean_conf | Meaning |
|---|---|
| > 0.75 | Very reliable detection |
| 0.55 – 0.75 | Normal reliability |
| 0.35 – 0.55 | Suspicious |
| < 0.35 | Likely noise |
Why It Matters¶
Low confidence typically indicates difficult visual conditions:
- bottle reflections
- glare
- partial occlusion
- weak lighting
- label interference
Operational Use¶
mean_conf is used to:
- flag potential false positives
- identify uncertain detections
- support review decisions
This field represents ML certainty.
blur — Image Quality / Focus Indicator¶
What it is¶
blur measures image sharpness using Laplacian variance.
Interpretation:

- high variance → sharp image
- low variance → blurry image
Blur directly affects detection reliability.
Typical Values for the Current Camera Setup¶
| blur value | Meaning |
|---|---|
| > 80 | Very sharp |
| 40 – 80 | Acceptable |
| 25 – 40 | Risky |
| < 25 | Likely missed bottles |
Why Blur Is Critical¶
False negatives (missed bottles) often correlate with:
- motion blur
- lighting instability
- compression artifacts
- camera exposure changes
Several previously observed detection errors were caused by this exact situation.
Operational Use¶
blur allows the system to distinguish between:
| Situation | Interpretation |
|---|---|
| low blur + low count | likely image-quality issue |
| high blur + low confidence | unreliable detection |
| sharp image + normal detection | trustworthy result |
This metric represents image reliability, not ML reliability.
How These Fields Work Together¶
The power of the system comes from combining all four diagnostics.
| Field | Role |
|---|---|
| rot_extent | Geometry truth |
| raw_count | Model perception |
| mean_conf | Model certainty |
| blur | Image reliability |
Together they answer the key operational question:
Is this bottle count trustworthy,
or should a human review this bin?
This multi-layer evaluation explains why the system now achieves very high accuracy in final counts: the ML detector is supported by geometry constraints, confidence analysis, and image-quality diagnostics rather than acting alone.
Important Note on bin_ID¶
The script derives bin_ID from the filename stem:
stem[:5]
Given filenames like:
01_01.png
this produces:
01_01
So stable crop naming from crop_with_json.py is essential.
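The derivation is a single slice of the filename stem; a minimal sketch:

```python
from pathlib import Path

def bin_id_from_crop(path: str) -> str:
    """Derive bin_ID from the first five characters of the filename stem.

    Relies on the stable `RR_CC`-style naming produced by
    crop_with_json.py (e.g. "01_01.png" -> "01_01").
    """
    return Path(path).stem[:5]
```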
Return Summary¶
At the end, run(cfg) returns:
- step name
- number of processed cells
- CSV path
- debug directory
- number of review-flagged cells
This summary is later captured by stock_runtime.py.
Manual Debug Entry Point¶
main()¶
Standalone mode:
- configure logging
- load config
- run batch detection
- print summary
Useful for:

- detector tuning
- model replacement
- crop validation
- review analysis
Relationship with Upstream Crop Stage¶
This script assumes:
- crops already exist
- polygon masking has been applied
- filenames encode stable bin IDs
- black background outside polygon is meaningful
In particular, the shape-classification stage depends on that black background behavior.
So onnx_batch.py and crop_with_json.py are tightly coupled by design.
Relationship with Downstream DB Comparison¶
The main operational output is:
results.csv
Later, DB_compare.py reads this file and compares count grouped by bin_ID against database quantities.
This means onnx_batch.py is the authoritative vision-count generator.
Failure Modes¶
Typical failures include:
| Failure | Meaning |
|---|---|
| missing ONNX model | model deployment issue |
| empty crop directory | upstream crop stage failed |
| unreadable crop image | storage/path issue |
| unsupported ONNX output shape | model export mismatch |
| write failure for debug/CSV | filesystem problem |
| low-quality images causing review spikes | illumination or focus problem |
The script raises on critical pipeline failures, but preserves per-image tolerance where possible.
Summary¶
onnx_batch.py is the per-cell inference engine of the Wine Platform vision pipeline.
Its responsibilities include:
- loading the ONNX detector
- running bottle-end detection over all crop images
- classifying cell shape from masked geometry
- constraining counts by physical shelf rules
- flagging uncertain cases
- writing annotated debug images
- exporting structured results for stock reconciliation
It combines machine learning with shelf-specific heuristics, which makes the final stock estimation more robust and more auditable than raw detector output alone.