
onnx_batch.py — Bottle-End Detection and Per-Cell Vision Inference

Overview

onnx_batch.py is the per-cell bottle-end detection worker of the Wine Platform vision pipeline.

It processes the individual crop images generated by crop_with_json.py, runs ONNX inference for bottle-end detection, applies additional rule-based validation, generates annotated debug images, and writes a structured CSV summary for later database comparison.

This script is the core machine-vision inference stage of the automated stock system.

Its job is not only to detect bottle ends, but also to:

  • classify shelf cell shape
  • clamp impossible counts by geometry
  • estimate image blur
  • flag uncertain cells for human review
  • produce a detector results table for downstream reconciliation

Position in the Vision Pipeline

flowchart TD

CROP["crop_with_json.py"]
CELLPNG["cell crop images"]
ONNX["onnx_batch.py"]
CSV["results.csv"]
DBG["annotated debug images"]
DB["DB_compare.py"]

CROP --> CELLPNG
CELLPNG --> ONNX
ONNX --> CSV
ONNX --> DBG
CSV --> DB

Purpose of the Module

Each crop image represents one physical shelf cell.

The script processes each crop independently and answers the operational question:

How many bottles are present in this cell?

To do that, it combines:

  • ONNX object detection
  • geometry-based cell-type interpretation
  • confidence filtering
  • image-quality inspection
  • optional fallback circle counting for review support

This combination is more robust than raw object detection alone.


Inputs

The script reads:

ONNX model

<cfg.paths["model_dir"]>/best_V4.onnx

Input crop images

<cfg.paths["crop_out"]>

These images are produced by the crop stage and are expected to be one image per cell.


Outputs

Main CSV output

<cfg.paths["det_out"]>/results.csv

This file is the authoritative vision output used later by DB_compare.py.

Debug images

Annotated per-cell images are written into:

<cfg.paths["det_out"]>

with the same base filenames as the input crops.

These images show:

  • grey rectangles for raw detections discarded by the stricter trust filter
  • blue rectangles for kept detections that contribute to the final ONNX count
  • green rectangles for kept detections whose box also contains a circle-helper match
  • a summary text overlay
  • a review flag


Operational Design

The script is designed to run:

  • headless
  • on Raspberry Pi or Mac
  • using CPU-only ONNX Runtime
  • in batch over all crop images

It avoids network calls and focuses entirely on local inference.


Main Runtime Configuration

Inside run(cfg), the script defines several important parameters.


Model and directories

onnx_model

cfg.paths["model_dir"] / "best_V4.onnx"

Path to the trained bottle-end detector.

input_dir

cfg.paths["crop_out"]

Directory containing per-cell crop images.

out_dir

cfg.paths["det_out"]

Directory where results and debug artifacts are written.


Inference parameters

imgsz = 416

Model input size for letterboxed inference.

Role: all input images are resized and padded to this square size before inference.

conf_thres = 0.35

Initial detector confidence threshold.

Role: weak detections below this threshold are removed before NMS.

A comment in the code notes that this threshold may be lowered in difficult lighting conditions.

iou_thres = 0.45

Non-maximum suppression overlap threshold.

Role: suppresses duplicate overlapping boxes.

max_det = 20

Maximum allowed detections kept after NMS.

This is a safety upper bound.


Review heuristics

min_mean_conf = 0.45

If detections exist but average kept confidence is below this threshold, the image is flagged for review.

blur_thresh = 35.0

If blur score falls below this threshold, the image is flagged for review.

flag_uncertain_shape = True

Whether ambiguous shape classification near the decision boundary should mark the cell for review.


High-Level Processing Flow

flowchart TD

START["run(cfg)"]
LOAD["load ONNX model"]
LIST["list crop images"]
LOOP["for each crop image"]
TYPE["detect cell type"]
PRED["run ONNX detector"]
FILTER["confidence filter"]
CLAMP["clamp count by cell type"]
QUALITY["blur + confidence + shape review checks"]
CIRCLE["optional circle fallback"]
DBG["write annotated debug image"]
ROW["append CSV row"]
CSV["write results.csv"]
RET["return summary dict"]

START --> LOAD
LOAD --> LIST
LIST --> LOOP
LOOP --> TYPE
TYPE --> PRED
PRED --> FILTER
FILTER --> CLAMP
CLAMP --> QUALITY
QUALITY --> CIRCLE
CIRCLE --> DBG
DBG --> ROW
ROW --> LOOP
LOOP --> CSV
CSV --> RET

Cell Type Detection from Black Background

detect_cell_type_from_black_bg(bgr)

This helper determines whether the crop corresponds to:

  • a triangle
  • or a rhombus

This is important because the maximum physically possible bottle count differs by cell shape.


Why This Works

Because crop_with_json.py masks pixels outside the polygon to black, the visible non-black region represents the actual cell footprint.

This means the detector can infer shape from the crop silhouette.


Processing Steps

  1. convert BGR to grayscale
  2. threshold non-black pixels
  3. apply morphological close and open
  4. find largest contour
  5. compute convex hull
  6. compute rotated minimum-area rectangle
  7. compare shape fill extent

Variable Roles

mask = (gray > 10) * 255

Identifies non-black pixels as foreground.

This assumes the black background is meaningful and intentional.

Morphological kernel

cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

Used to stabilize the foreground mask.

rot_extent

Computed as:

area_of_convex_hull / area_of_rotated_rect

Interpretation:

  • higher fill (more compact shape) suggests a rhombus
  • lower fill suggests a triangle

verts

Number of vertices after approximating the hull.

Used as a tie-breaker in the uncertain zone.


Threshold Logic

If rot_extent >= 0.72

Cell classified as:

rhombus

If rot_extent <= 0.66

Cell classified as:

triangle

If in between

Fallback logic:

  • verts == 3 → triangle
  • otherwise → rhombus

This makes shape classification more robust in ambiguous cases.
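The threshold logic above can be condensed into a small pure function. This is an illustrative sketch, not the script's actual code; the function name and signature are assumptions:

```python
def classify_cell(rot_extent: float, verts: int) -> str:
    """Classify a cell silhouette as 'rhombus' or 'triangle'.

    rot_extent: convex-hull area / rotated min-area-rect area.
    verts: vertex count of the approximated hull (tie-breaker).
    """
    if rot_extent >= 0.72:   # compact fill -> rhombus
        return "rhombus"
    if rot_extent <= 0.66:   # sparse fill -> triangle
        return "triangle"
    # ambiguous zone: fall back to the vertex count
    return "triangle" if verts == 3 else "rhombus"
```

The vertex tie-breaker only matters inside the 0.66–0.72 band, so a slightly noisy rot_extent value does not flip confidently classified cells.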


Count Clamping by Cell Type

clamp_count_by_type(count, cell_type)

This function imposes physical constraints.

Triangle cell

Maximum count:

3

Rhombus cell

Maximum count:

9

This is a structural prior from the shelf design.

It prevents implausible model outputs from being accepted directly.
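A minimal sketch of this clamp (the fallback maximum for an unknown cell type is an assumption, not documented behavior):

```python
def clamp_count_by_type(count: int, cell_type: str) -> int:
    """Cap the detection count at the physical maximum for the cell shape."""
    max_by_type = {"triangle": 3, "rhombus": 9}
    # assumption: unknown types fall back to the larger rhombus limit
    return min(count, max_by_type.get(cell_type, 9))
```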


Blur Estimation

blur_score(bgr)

Uses Laplacian variance on grayscale image.

Interpretation:

  • high variance = sharper image
  • low variance = blurrier image

This score does not directly change count, but contributes to:

needs_review

This is important because blur can cause false negatives.
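The metric can be sketched as follows. The script presumably uses cv2.Laplacian(gray, cv2.CV_64F).var(); this is a pure-NumPy equivalent using the standard 4-neighbour Laplacian kernel, shown for illustration:

```python
import numpy as np

def blur_score(gray: np.ndarray) -> float:
    """Variance of the discrete Laplacian: higher means sharper."""
    g = gray.astype(np.float64)
    # 4-neighbour Laplacian over interior pixels
    lap = (-4.0 * g[1:-1, 1:-1]
           + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return float(lap.var())
```

A perfectly flat image scores 0.0; a high-contrast sharp pattern scores in the hundreds of thousands, which is why a threshold like 35.0 sits far below any well-focused crop.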


Preprocessing for ONNX Inference

letterbox(img, new_shape=(320, 320))

This helper resizes and pads the input image while preserving aspect ratio.

Output:

  • padded image
  • scale ratio r
  • padding offsets (dw, dh)

This is standard YOLO-style preprocessing.
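The geometry behind letterboxing can be sketched without the pixel operations (the actual helper also performs the resize and border padding; this illustrative function only computes the quantities r, dw, and dh described below):

```python
def letterbox_geometry(h: int, w: int, new_shape=(320, 320)):
    """Compute the scale ratio and padding a letterbox resize would use.

    Returns (r, (new_w, new_h), (dw, dh)): the scale applied to the
    original image, the unpadded resized size, and the per-side padding.
    """
    r = min(new_shape[0] / h, new_shape[1] / w)   # preserve aspect ratio
    new_w, new_h = round(w * r), round(h * r)
    dw = (new_shape[1] - new_w) / 2               # split padding evenly
    dh = (new_shape[0] - new_h) / 2
    return r, (new_w, new_h), (dw, dh)
```

For a 320×240 landscape crop and a 320×320 model input, r is 1.0 and 40 pixels of padding are added above and below.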


Variable Roles

r

Scale ratio between original image and model input size.

Used later to map detections back to original image coordinates.

dw, dh

Horizontal and vertical padding.

These must be undone after inference.


Non-Maximum Suppression

nms_xyxy(boxes_xyxy, scores, iou_thres)

This helper applies OpenCV NMS to suppress duplicate detections.

The function internally converts boxes from:

xyxy

to:

xywh

because OpenCV expects width-height format.

Output: list of kept indices.
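The suppression itself can be sketched in pure NumPy. This greedy IoU-based loop illustrates what the OpenCV call performs internally; it is not the script's implementation:

```python
import numpy as np

def nms_xyxy(boxes, scores, iou_thres=0.45):
    """Greedy NMS on (x1, y1, x2, y2) boxes; returns kept indices."""
    boxes = np.asarray(boxes, dtype=np.float64)
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of the current box with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = rest[iou <= iou_thres]        # drop heavy overlaps
    return keep
```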


Circle-Based Cap Detection

detect_caps(gray, minR, maxR, minDistCoeff)

This is a fallback / auxiliary heuristic based on Hough circles.

It attempts to detect bright circular bottle caps.

This function is not the primary detector.
It is mainly used when:

  • mean confidence is zero
  • or the sample already needs review

So it supports diagnostic reasoning rather than replacing the model.


Variable Roles

minR, maxR

Minimum and maximum allowed circle radii.

minDistCoeff

Controls minimum spacing between detected circles.

param1, param2

Internal Hough transform thresholds.

Brightness filter

Candidate circle centers must satisfy:

gray[y, x] > mean + 0.3 * std

This filters out circles that are not bright enough to plausibly be bottle ends.
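The gate can be sketched as a small helper (the function name is illustrative; the condition matches the rule above):

```python
import numpy as np

def bright_centers(gray, centers, k=0.3):
    """Keep only circle centers brighter than mean + k * std of the image."""
    g = gray.astype(np.float64)
    thresh = g.mean() + k * g.std()
    # centers are (x, y) pairs; NumPy indexing is row-major, hence [y, x]
    return [(x, y) for (x, y) in centers if g[y, x] > thresh]
```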


count_bottles()

This wrapper uses detect_caps() and optionally draws debug circles.

It returns:

  • cap count
  • method string
  • debug image
  • detected radii

In the current pipeline, it is only invoked when the primary detection result is uncertain or absent.

When debug coloring is enabled, circle detections are also associated with kept ONNX boxes.

Association rule:

  • compute circle centers (x, y) from detect_caps()
  • test whether the center falls inside, or very near, a kept ONNX box
  • if so, upgrade that box's color from blue to green

This association is used only for debug visualization. It does not change the CSV count logic.


YOLOv8 ONNX Detector Class

YoloV8OnnxDetector

This class encapsulates model loading, preprocessing, output decoding, and prediction.


Constructor

__init__(onnx_path, imgsz=320)

Loads the ONNX model with:

providers=["CPUExecutionProvider"]

This is aligned with Raspberry Pi operation and avoids GPU dependency.

It also caches:

  • input tensor name
  • output tensor name

_prepare(bgr)

Preprocessing steps:

  1. letterbox resize/pad to square
  2. convert BGR to RGB
  3. normalize pixel range to [0, 1]
  4. transpose to CHW
  5. add batch dimension

Output tensor shape:

(1, 3, H, W)

_decode_output(out)

This helper makes the script tolerant to multiple common YOLOv8 ONNX output layouts.

Supported general forms:

  • (1, N, 5)
  • (1, 5, N)
  • (1, N, 4+nc)
  • (1, 4+nc, N)

It normalizes these into:

  • boxes_xywh
  • scores

This makes the worker more robust to export variations.
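The single-class cases can be sketched in NumPy. This simplified function handles only the (1, N, 5) and (1, 5, N) layouts; the real helper also decodes the multi-class (4+nc) variants:

```python
import numpy as np

def decode_single_class(out: np.ndarray):
    """Normalize a (1, N, 5) or (1, 5, N) YOLO output to (boxes_xywh, scores)."""
    a = out[0]
    # channel-first export: (5, N) -> (N, 5); ambiguous (5, 5) left as-is
    if a.shape[0] == 5 and a.shape[1] != 5:
        a = a.T
    return a[:, :4], a[:, 4]
```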


predict(...)

This is the main inference method.

Steps:

  1. preprocess image
  2. run ONNX session
  3. decode raw output
  4. filter by confidence
  5. convert xywh -> xyxy
  6. sort by score descending
  7. apply NMS
  8. limit to max_det
  9. map boxes back to original image coordinates
  10. clip to image bounds

Outputs:

  • boxes_xyxy
  • confs

These are the raw model detections before additional pipeline heuristics.


Main Runtime Loop

Listing input images

The script processes all files in crop_out with suffix:

  • .png
  • .jpg
  • .jpeg

If no images are found, it raises a RuntimeError.


Per-image processing stages

For each crop image:

  1. load image
  2. determine cell type
  3. determine max physical count
  4. run ONNX detector
  5. apply secondary confidence filter
  6. compute final count
  7. evaluate review status
  8. optionally run circle helper
  9. draw debug overlay
  10. append CSV row

Secondary Detection Filtering

After the detector returns boxes and confidences, the script applies an additional threshold:

conf_min_det = 0.50

This is stricter than the initial conf_thres.

Purpose

  • keep weaker predictions during model stage
  • but trust only stronger detections for final count

This two-level approach is helpful because:

  • it preserves uncertainty information
  • it still produces stable final counts
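The second-stage filter can be sketched as follows (an illustrative helper, not the script's code; it derives the raw_count, filtered_count, and mean_conf values described below):

```python
import numpy as np

def trust_filter(confs, conf_min_det=0.50):
    """Split detector output into raw count and trusted count.

    confs: confidences of boxes that already passed conf_thres + NMS.
    """
    confs = np.asarray(confs, dtype=np.float64)
    keep = confs >= conf_min_det          # stricter trust threshold
    raw_count = int(confs.size)
    filtered_count = int(keep.sum())
    mean_conf = float(confs[keep].mean()) if filtered_count else 0.0
    return raw_count, filtered_count, mean_conf, keep
```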


Variable Roles

raw_count

Number of detections after detector thresholding and NMS, before strict trust filtering.

keep

Boolean mask for detections with confidence >= conf_min_det.

boxes_f, confs_f

Final trusted detections.

For debug image generation, these boxes are drawn in blue by default and may be upgraded to green if a circle-helper detection matches the same box.

filtered_count

Number of trusted detections.

count

Final count after physical clamp by cell type.

mean_conf

Average confidence of kept detections.


Review Logic

The script calculates:

needs_review

This is one of its most important output signals.

A cell is flagged if any of the following conditions apply.


1. Uncertain shape

If rot_extent lies between:

0.66 and 0.72

and shape uncertainty flagging is enabled, the sample is marked for review.


2. Raw count exceeds physical max

If the detector produced more detections than physically plausible before clamping, this is suspicious.


3. Low mean confidence

If detections exist but mean trusted confidence is below:

min_mean_conf = 0.45

the result is considered uncertain.


4. Low blur score

If image sharpness is below:

blur_thresh = 35.0

the result is flagged because false negatives become more likely.


5. Weak or absent detections

If detections existed but all were discarded by the trust filter, or if uncertainty is otherwise high, the circle-based fallback is run for additional evidence.
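The first four conditions combine into a single flag, roughly as sketched below (an illustrative reconstruction from the thresholds documented above; condition 5 triggers the circle fallback rather than the flag itself):

```python
def needs_review(rot_extent, raw_count, max_count, mean_conf, blur,
                 has_detections,
                 min_mean_conf=0.45, blur_thresh=35.0,
                 flag_uncertain_shape=True):
    """Combine the review heuristics into a single flag (illustrative)."""
    uncertain_shape = flag_uncertain_shape and 0.66 < rot_extent < 0.72
    too_many = raw_count > max_count          # exceeded physical maximum
    low_conf = has_detections and mean_conf < min_mean_conf
    too_blurry = blur < blur_thresh
    return uncertain_shape or too_many or low_conf or too_blurry
```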


Debug Image Generation

For each input crop, the script creates an annotated debug image.


Grey boxes

Drawn for discarded detections.

Meaning:

  • the model proposed them
  • they are included in raw_count
  • they were rejected by the stricter trust threshold and do not contribute to the final count


Blue boxes

Drawn for kept detections by default.

Meaning:

  • they survived the stricter trust filter
  • they contribute to the final ONNX count
  • no circle-helper match was found for that specific box


Green boxes

Drawn for kept detections that also match the circle helper.

Meaning:

  • they survived the stricter trust filter
  • they contribute to the final ONNX count
  • a detected circle center falls inside, or very near, the same box


Text overlay

Shows:

  • cell type
  • rot_extent
  • raw count
  • final count
  • mean confidence
  • blur score

If review is needed, the image also contains:

NEEDS_REVIEW

in red.

These debug images are extremely valuable for validation and tuning.


CSV Output

The script writes results.csv with rows containing:

  • file
  • bin_ID
  • cell_type
  • rot_extent
  • raw_count
  • count
  • mean_conf
  • blur
  • needs_review
  • cirle_count

CSV Output Field Explanation

The results.csv file produced by onnx_batch.py contains several diagnostic fields that describe how the final bottle count decision was reached.

These fields allow the system (and operators) to evaluate geometry, detector output, confidence, and image quality simultaneously.

The most important diagnostic fields are:

  • rot_extent
  • raw_count
  • mean_conf
  • blur

Together they determine whether a result is trustworthy or requires human review.


rot_extent — Shape Confidence (Triangle vs Rhombus)

What it is

rot_extent measures how much of the rotated bounding rectangle of the cell is actually filled by the visible non-black cell area.

Technically:

rot_extent = (area of cell shape) / (area of rotated bounding box)

This value is computed during cell-type detection using the convex hull of the cell silhouette.


Why the Rotated Bounding Box Matters

The shelf cells are rotated diamonds (rhombuses). If an ordinary axis-aligned bounding box were used, the bounding area would be much larger than the real cell footprint.

Using the minimum rotated rectangle ensures the ratio reflects the true shape occupancy.


Typical Values in the System

| Shape | rot_extent range | Meaning |
|---|---|---|
| Triangle | ~0.45 – 0.65 | Sparse triangular footprint |
| Rhombus | ~0.75 – 0.95 | Dense diamond-shaped footprint |
| Borderline | ~0.66 – 0.72 | Ambiguous geometry |

Values inside the borderline zone are flagged for review.


Why This Field Is Important

rot_extent is used to:

  • determine cell geometry
  • prevent wrong bottle-count limits
  • flag uncertain shapes for review

Example:

triangle → max bottles = 3
rhombus → max bottles = 9

Without this geometry detection the detector could produce physically impossible counts.

This field is the reason triangle/rhombus misclassification errors were eliminated.


raw_count — What the Detector Actually Saw

What it is

raw_count is the number of bottle ends detected before any correction or clamping.

Example:

raw_count = 6
cell_type = triangle
final_count = min(6,3) = 3

Why It Matters

This value represents the pure ML model output.

It allows the system to differentiate between:

| Situation | Meaning |
|---|---|
| raw_count > final count | detections were clamped by geometry |
| raw_count == 0 but bottles exist | likely false negative |
| raw_count very high | detector confusion |

Operational Use

raw_count is used to detect suspicious conditions:

  • excessive detections
  • detector confusion
  • geometry conflicts

It is the key variable when comparing ML perception vs physical shelf constraints.


mean_conf — Average Detector Confidence

What it is

mean_conf is the average confidence score of all trusted bottle-end detections in the cell.

Example:

detections = [0.81, 0.76, 0.72]
mean_conf = 0.763

Interpretation

| mean_conf | Meaning |
|---|---|
| > 0.75 | Very reliable detection |
| 0.55 – 0.75 | Normal reliability |
| 0.35 – 0.55 | Suspicious |
| < 0.35 | Likely noise |

Why It Matters

Low confidence typically indicates difficult visual conditions:

  • bottle reflections
  • glare
  • partial occlusion
  • weak lighting
  • label interference

Operational Use

mean_conf is used to:

  • flag potential false positives
  • identify uncertain detections
  • support review decisions

This field represents ML certainty.


blur — Image Quality / Focus Indicator

What it is

blur measures image sharpness using Laplacian variance.

Interpretation:

High variance → sharp image
Low variance → blurry image

Blur directly affects detection reliability.


Typical Values for the Current Camera Setup

| blur value | Meaning |
|---|---|
| > 80 | Very sharp |
| 40 – 80 | Acceptable |
| 25 – 40 | Risky |
| < 25 | Likely missed bottles |

Why Blur Is Critical

False negatives (missed bottles) often correlate with:

  • motion blur
  • lighting instability
  • compression artifacts
  • camera exposure changes

Several previously observed detection errors were caused by this exact situation.


Operational Use

blur allows the system to distinguish between:

| Situation | Interpretation |
|---|---|
| low blur score + low count | likely image-quality issue |
| high blur score + low confidence | unreliable detection |
| sharp image + normal detection | trustworthy result |

This metric represents image reliability, not ML reliability.


How These Fields Work Together

The power of the system comes from combining all four diagnostics.

| Field | Role |
|---|---|
| rot_extent | Geometry truth |
| raw_count | Model perception |
| mean_conf | Model certainty |
| blur | Image reliability |

Together they answer the key operational question:

Is this bottle count trustworthy,
or should a human review this bin?

This multi-layer evaluation explains why the system now achieves very high accuracy in final counts: the ML detector is supported by geometry constraints, confidence analysis, and image-quality diagnostics rather than acting alone.

Important Note on bin_ID

The script derives bin_ID from the filename stem:

stem[:5]

Given filenames like:

01_01.png

this produces:

01_01

So stable crop naming from crop_with_json.py is essential.
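The derivation amounts to a one-liner (shown here as a standalone helper for illustration):

```python
from pathlib import Path

def bin_id(crop_path: str) -> str:
    """Derive the bin identifier from the first five characters of the stem."""
    return Path(crop_path).stem[:5]
```

Any renaming upstream that changes the first five characters of the filename would silently break the downstream DB comparison.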


Return Summary

At the end, run(cfg) returns:

  • step name
  • number of processed cells
  • CSV path
  • debug directory
  • number of review-flagged cells

This summary is later captured by stock_runtime.py.


Manual Debug Entry Point

main()

Standalone mode:

  • configure logging
  • load config
  • run batch detection
  • print summary

Useful for:

  • detector tuning
  • model replacement
  • crop validation
  • review analysis


Relationship with Upstream Crop Stage

This script assumes:

  • crops already exist
  • polygon masking has been applied
  • filenames encode stable bin IDs
  • black background outside polygon is meaningful

In particular, the shape-classification stage depends on that black background behavior.

So onnx_batch.py and crop_with_json.py are tightly coupled by design.


Relationship with Downstream DB Comparison

The main operational output is:

results.csv

Later, DB_compare.py reads this file and compares count grouped by bin_ID against database quantities.

This means onnx_batch.py is the authoritative vision-count generator.


Failure Modes

Typical failures include:

| Failure | Meaning |
|---|---|
| missing ONNX model | model deployment issue |
| empty crop directory | upstream crop stage failed |
| unreadable crop image | storage/path issue |
| unsupported ONNX output shape | model export mismatch |
| write failure for debug/CSV | filesystem problem |
| low-quality images causing review spikes | illumination or focus problem |

The script raises on critical pipeline failures, but preserves per-image tolerance where possible.


Summary

onnx_batch.py is the per-cell inference engine of the Wine Platform vision pipeline.

Its responsibilities include:

  • loading the ONNX detector
  • running bottle-end detection over all crop images
  • classifying cell shape from masked geometry
  • constraining counts by physical shelf rules
  • flagging uncertain cases
  • writing annotated debug images
  • exporting structured results for stock reconciliation

It combines machine learning with shelf-specific heuristics, which makes the final stock estimation more robust and more auditable than raw detector output alone.