
onnx_batch.py — Bottle-End Detection and Per-Cell Vision Inference

Overview

onnx_batch.py is the per-cell bottle-end detection worker of the Wine Platform vision pipeline.

It processes the individual crop images generated by crop_with_json.py, runs ONNX inference for bottle-end detection, applies additional rule-based validation, generates annotated debug images, and writes a structured CSV summary for later database comparison.

This script is the core machine-vision inference stage of the automated stock system.

Its job is not only to detect bottle ends, but also to:

  • classify shelf cell shape
  • clamp impossible counts by geometry
  • estimate image blur
  • flag uncertain cells for human review
  • produce a detector results table for downstream reconciliation

Position in the Vision Pipeline

flowchart TD

CROP["crop_with_json.py"]
CELLPNG["cell crop images"]
ONNX["onnx_batch.py"]
CSV["results.csv"]
DBG["annotated debug images"]
DB["DB_compare.py"]

CROP --> CELLPNG
CELLPNG --> ONNX
ONNX --> CSV
ONNX --> DBG
CSV --> DB

Purpose of the Module

Each crop image represents one physical shelf cell.

The script processes each crop independently and answers the operational question:

How many bottles are present in this cell?

To do that, it combines:

  • ONNX object detection
  • geometry-based cell-type interpretation
  • confidence filtering
  • image-quality inspection
  • optional fallback circle counting for review support

This combination is more robust than raw object detection alone.


Inputs

The script reads:

ONNX model

<cfg.paths["model_dir"]>/best_V4.onnx

Input crop images

<cfg.paths["crop_out"]>

These images are produced by the crop stage and are expected to be one image per cell.


Outputs

Main CSV output

<cfg.paths["det_out"]>/results.csv

This file is the authoritative vision output used later by DB_compare.py.

Debug images

Annotated per-cell images are written into:

<cfg.paths["det_out"]>

with the same base filenames as the input crops.

These images show:

  • grey rectangles for raw detections discarded by the stricter trust filter
  • blue rectangles for kept detections that contribute to the final ONNX count
  • green rectangles for kept detections whose box also contains a circle-helper match
  • a summary text overlay
  • a review flag


Operational Design

The script is designed to run:

  • headless
  • on Raspberry Pi or Mac
  • using CPU-only ONNX Runtime
  • in batch over all crop images

It avoids network calls and focuses entirely on local inference.


Main Runtime Configuration

Inside run(cfg), the script defines several important parameters.


Model and directories

onnx_model

cfg.paths["model_dir"] / "best_V4.onnx"

Path to the trained bottle-end detector.

input_dir

cfg.paths["crop_out"]

Directory containing per-cell crop images.

out_dir

cfg.paths["det_out"]

Directory where results and debug artifacts are written.


Inference parameters

imgsz = 416

Model input size for letterboxed inference.

Role: all input images are resized and padded to this square size before inference.

conf_thres = 0.35

Initial detector confidence threshold.

Role: weak detections below this threshold are removed before NMS.

A comment in the code notes that this threshold may be lowered in difficult lighting conditions.

iou_thres = 0.45

Non-maximum suppression overlap threshold.

Role: suppresses duplicate overlapping boxes.

max_det = 20

Maximum allowed detections kept after NMS.

This is a safety upper bound.


Review heuristics

min_mean_conf = 0.45

If detections exist but average kept confidence is below this threshold, the image is flagged for review.

blur_thresh = 35.0

If blur score falls below this threshold, the image is flagged for review.

flag_uncertain_shape = True

Whether ambiguous shape classification near the decision boundary should mark the cell for review.


High-Level Processing Flow

flowchart TD

START["run(cfg)"]
LOAD["load ONNX model"]
LIST["list crop images"]
LOOP["for each crop image"]
TYPE["detect cell type"]
PRED["run ONNX detector"]
FILTER["confidence filter"]
CLAMP["clamp count by cell type"]
QUALITY["blur + confidence + shape review checks"]
CIRCLE["optional circle fallback"]
DBG["write annotated debug image"]
ROW["append CSV row"]
CSV["write results.csv"]
RET["return summary dict"]

START --> LOAD
LOAD --> LIST
LIST --> LOOP
LOOP --> TYPE
TYPE --> PRED
PRED --> FILTER
FILTER --> CLAMP
CLAMP --> QUALITY
QUALITY --> CIRCLE
CIRCLE --> DBG
DBG --> ROW
ROW --> LOOP
LOOP --> CSV
CSV --> RET

Cell Type Detection from Black Background

detect_cell_type_from_black_bg(bgr)

This helper determines whether the crop corresponds to:

  • a triangle
  • or a rhombus

This is important because the maximum physically possible bottle count differs by cell shape.


Why This Works

Because crop_with_json.py masks pixels outside the polygon to black, the visible non-black region represents the actual cell footprint.

This means the detector can infer shape from the crop silhouette.


Processing Steps

  1. convert BGR to grayscale
  2. threshold non-black pixels
  3. apply morphological close and open
  4. find largest contour
  5. compute convex hull
  6. compute rotated minimum-area rectangle
  7. compare shape fill extent

Variable Roles

mask = (gray > 10) * 255

Identifies non-black pixels as foreground.

This assumes the black background is meaningful and intentional.

Morphological kernel

cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

Used to stabilize the foreground mask.

rot_extent

Computed as:

area_of_convex_hull / area_of_rotated_rect

Interpretation:

  • higher fill (more compact shape) suggests a rhombus
  • lower fill suggests a triangle

verts

Number of vertices after approximating the hull.

Used as a tie-breaker in the uncertain zone.


Threshold Logic

If rot_extent >= 0.72

Cell classified as:

rhombus

If rot_extent <= 0.66

Cell classified as:

triangle

If in between

Fallback logic:

  • verts == 3 → triangle
  • otherwise → rhombus

This makes shape classification more robust in ambiguous cases.
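The threshold logic above can be condensed into a small pure function. This is an illustrative sketch, not the script's actual code; the function name and signature are assumptions:

```python
def classify_cell(rot_extent: float, verts: int) -> str:
    """Classify a cell silhouette as 'rhombus' or 'triangle'.

    rot_extent: convex-hull area / rotated min-area-rect area.
    verts: vertex count of the approximated hull (tie-breaker).
    """
    if rot_extent >= 0.72:   # compact fill -> rhombus
        return "rhombus"
    if rot_extent <= 0.66:   # sparse fill -> triangle
        return "triangle"
    # ambiguous zone: fall back to the vertex count
    return "triangle" if verts == 3 else "rhombus"
```

The vertex tie-breaker only matters inside the 0.66–0.72 band, so a slightly noisy rot_extent value does not flip confidently classified cells.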


Count Clamping by Cell Type

clamp_count_by_type(count, cell_type)

This function imposes physical constraints.

Triangle cell

Maximum count:

3

Rhombus cell

Maximum count:

9

This is a structural prior from the shelf design.

It prevents implausible model outputs from being accepted directly.
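A minimal sketch of this clamp (the fallback maximum for an unknown cell type is an assumption, not documented behavior):

```python
def clamp_count_by_type(count: int, cell_type: str) -> int:
    """Cap the detection count at the physical maximum for the cell shape."""
    max_by_type = {"triangle": 3, "rhombus": 9}
    # assumption: unknown types fall back to the larger rhombus limit
    return min(count, max_by_type.get(cell_type, 9))
```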


Blur Estimation

blur_score(bgr)

Uses Laplacian variance on grayscale image.

Interpretation:

  • high variance = sharper image
  • low variance = blurrier image

This score does not directly change count, but contributes to:

needs_review

This is important because blur can cause false negatives.
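The metric can be sketched as follows. The script presumably uses cv2.Laplacian(gray, cv2.CV_64F).var(); this is a pure-NumPy equivalent using the standard 4-neighbour Laplacian kernel, shown for illustration:

```python
import numpy as np

def blur_score(gray: np.ndarray) -> float:
    """Variance of the discrete Laplacian: higher means sharper."""
    g = gray.astype(np.float64)
    # 4-neighbour Laplacian over interior pixels
    lap = (-4.0 * g[1:-1, 1:-1]
           + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return float(lap.var())
```

A perfectly flat image scores 0.0; a high-contrast sharp pattern scores in the hundreds of thousands, which is why a threshold like 35.0 sits far below any well-focused crop.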


Preprocessing for ONNX Inference

letterbox(img, new_shape=(320, 320))

This helper resizes and pads the input image while preserving aspect ratio.

Output:

  • padded image
  • scale ratio r
  • padding offsets (dw, dh)

This is standard YOLO-style preprocessing.
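The geometry behind letterboxing can be sketched without the pixel operations (the actual helper also performs the resize and border padding; this illustrative function only computes the quantities r, dw, and dh described below):

```python
def letterbox_geometry(h: int, w: int, new_shape=(320, 320)):
    """Compute the scale ratio and padding a letterbox resize would use.

    Returns (r, (new_w, new_h), (dw, dh)): the scale applied to the
    original image, the unpadded resized size, and the per-side padding.
    """
    r = min(new_shape[0] / h, new_shape[1] / w)   # preserve aspect ratio
    new_w, new_h = round(w * r), round(h * r)
    dw = (new_shape[1] - new_w) / 2               # split padding evenly
    dh = (new_shape[0] - new_h) / 2
    return r, (new_w, new_h), (dw, dh)
```

For a 320×240 landscape crop and a 320×320 model input, r is 1.0 and 40 pixels of padding are added above and below.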


Variable Roles

r

Scale ratio between original image and model input size.

Used later to map detections back to original image coordinates.

dw, dh

Horizontal and vertical padding.

These must be undone after inference.


Non-Maximum Suppression

nms_xyxy(boxes_xyxy, scores, iou_thres)

This helper applies OpenCV NMS to suppress duplicate detections.

The function internally converts boxes from:

xyxy

to:

xywh

because OpenCV expects width-height format.

Output: list of kept indices.
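The suppression itself can be sketched in pure NumPy. This greedy IoU-based loop illustrates what the OpenCV call performs internally; it is not the script's implementation:

```python
import numpy as np

def nms_xyxy(boxes, scores, iou_thres=0.45):
    """Greedy NMS on (x1, y1, x2, y2) boxes; returns kept indices."""
    boxes = np.asarray(boxes, dtype=np.float64)
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of the current box with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = rest[iou <= iou_thres]        # drop heavy overlaps
    return keep
```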


Circle-Based Cap Detection

detect_caps(gray, minR, maxR, minDistCoeff)

This is a fallback / auxiliary heuristic based on Hough circles.

It attempts to detect bright circular bottle caps.

This function is not the primary detector.
It is mainly used when:

  • mean confidence is zero
  • or the sample already needs review

So it supports diagnostic reasoning rather than replacing the model.


Variable Roles

minR, maxR

Minimum and maximum allowed circle radii.

minDistCoeff

Controls minimum spacing between detected circles.

param1, param2

Internal Hough transform thresholds.

Brightness filter

Candidate circle centers must satisfy:

gray[y, x] > mean + 0.3 * std

This filters out circles that are not bright enough to plausibly be bottle ends.
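The gate can be sketched as a small helper (the function name is illustrative; the condition matches the rule above):

```python
import numpy as np

def bright_centers(gray, centers, k=0.3):
    """Keep only circle centers brighter than mean + k * std of the image."""
    g = gray.astype(np.float64)
    thresh = g.mean() + k * g.std()
    # centers are (x, y) pairs; NumPy indexing is row-major, hence [y, x]
    return [(x, y) for (x, y) in centers if g[y, x] > thresh]
```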


count_bottles()

This wrapper uses detect_caps() and optionally draws debug circles.

It returns:

  • cap count
  • method string
  • debug image
  • detected radii

In the current pipeline, it is only invoked when the primary detection result is uncertain or absent.

When debug coloring is enabled, circle detections are also associated with kept ONNX boxes.

Association rule:

  • compute circle centers (x, y) from detect_caps()
  • test whether the center falls inside, or very near, a kept ONNX box
  • if so, upgrade that box's color from blue to green

This association is used only for debug visualization. It does not change the CSV count logic.


YOLOv8 ONNX Detector Class

YoloV8OnnxDetector

This class encapsulates model loading, preprocessing, output decoding, and prediction.


Constructor

__init__(onnx_path, imgsz=320)

Loads the ONNX model with:

providers=["CPUExecutionProvider"]

This is aligned with Raspberry Pi operation and avoids GPU dependency.

It also caches:

  • input tensor name
  • output tensor name

_prepare(bgr)

Preprocessing steps:

  1. letterbox resize/pad to square
  2. convert BGR to RGB
  3. normalize pixel range to [0, 1]
  4. transpose to CHW
  5. add batch dimension

Output tensor shape:

(1, 3, H, W)

_decode_output(out)

This helper makes the script tolerant to multiple common YOLOv8 ONNX output layouts.

Supported general forms:

  • (1, N, 5)
  • (1, 5, N)
  • (1, N, 4+nc)
  • (1, 4+nc, N)

It normalizes these into:

  • boxes_xywh
  • scores

This makes the worker more robust to export variations.
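The single-class cases can be sketched in NumPy. This simplified function handles only the (1, N, 5) and (1, 5, N) layouts; the real helper also decodes the multi-class (4+nc) variants:

```python
import numpy as np

def decode_single_class(out: np.ndarray):
    """Normalize a (1, N, 5) or (1, 5, N) YOLO output to (boxes_xywh, scores)."""
    a = out[0]
    # channel-first export: (5, N) -> (N, 5); ambiguous (5, 5) left as-is
    if a.shape[0] == 5 and a.shape[1] != 5:
        a = a.T
    return a[:, :4], a[:, 4]
```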


predict(...)

This is the main inference method.

Steps:

  1. preprocess image
  2. run ONNX session
  3. decode raw output
  4. filter by confidence
  5. convert xywh -> xyxy
  6. sort by score descending
  7. apply NMS
  8. limit to max_det
  9. map boxes back to original image coordinates
  10. clip to image bounds

Outputs:

  • boxes_xyxy
  • confs

These are the raw model detections before additional pipeline heuristics.


Main Runtime Loop

Listing input images

The script processes all files in crop_out with suffix:

  • .png
  • .jpg
  • .jpeg

If no images are found, it raises a RuntimeError.


Per-image processing stages

For each crop image:

  1. load image
  2. determine cell type
  3. determine max physical count
  4. run ONNX detector
  5. apply secondary confidence filter
  6. compute final count
  7. evaluate review status
  8. optionally run circle helper
  9. draw debug overlay
  10. append CSV row

Secondary Detection Filtering

After the detector returns boxes and confidences, the script applies an additional threshold:

conf_min_det = 0.50

This is stricter than the initial conf_thres.

Purpose

  • keep weaker predictions during model stage
  • but trust only stronger detections for final count

This two-level approach is helpful because:

  • it preserves uncertainty information
  • it still produces stable final counts
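The second-stage filter can be sketched as follows (an illustrative helper, not the script's code; it derives the raw_count, filtered_count, and mean_conf values described below):

```python
import numpy as np

def trust_filter(confs, conf_min_det=0.50):
    """Split detector output into raw count and trusted count.

    confs: confidences of boxes that already passed conf_thres + NMS.
    """
    confs = np.asarray(confs, dtype=np.float64)
    keep = confs >= conf_min_det          # stricter trust threshold
    raw_count = int(confs.size)
    filtered_count = int(keep.sum())
    mean_conf = float(confs[keep].mean()) if filtered_count else 0.0
    return raw_count, filtered_count, mean_conf, keep
```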


Variable Roles

raw_count

Number of detections after detector thresholding and NMS, before strict trust filtering.

keep

Boolean mask for detections with confidence >= conf_min_det.

boxes_f, confs_f

Final trusted detections.

For debug image generation, these boxes are drawn in blue by default and may be upgraded to green if a circle-helper detection matches the same box.

filtered_count

Number of trusted detections.

count

Final count after physical clamp by cell type.

mean_conf

Average confidence of kept detections.


Review Logic

The script calculates:

needs_review

This is one of its most important output signals.

A cell is flagged if any of the following conditions apply.


1. Uncertain shape

If rot_extent lies between:

0.66 and 0.72

and shape uncertainty flagging is enabled, the sample is marked for review.


2. Raw count exceeds physical max

If the detector produced more detections than physically plausible before clamping, this is suspicious.


3. Low mean confidence

If detections exist but mean trusted confidence is below:

min_mean_conf = 0.45

the result is considered uncertain.


4. Low blur score

If image sharpness is below:

blur_thresh = 35.0

the result is flagged because false negatives become more likely.


5. Weak or absent detections

If detections existed but all were discarded by the trust filter, or if uncertainty is otherwise high, the circle-based fallback is run for additional evidence.
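The first four conditions combine into a single flag, roughly as sketched below (an illustrative reconstruction from the thresholds documented above; condition 5 triggers the circle fallback rather than the flag itself):

```python
def needs_review(rot_extent, raw_count, max_count, mean_conf, blur,
                 has_detections,
                 min_mean_conf=0.45, blur_thresh=35.0,
                 flag_uncertain_shape=True):
    """Combine the review heuristics into a single flag (illustrative)."""
    uncertain_shape = flag_uncertain_shape and 0.66 < rot_extent < 0.72
    too_many = raw_count > max_count          # exceeded physical maximum
    low_conf = has_detections and mean_conf < min_mean_conf
    too_blurry = blur < blur_thresh
    return uncertain_shape or too_many or low_conf or too_blurry
```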


Debug Image Generation

For each input crop, the script creates an annotated debug image.


Grey boxes

Drawn for discarded detections.

Meaning:

  • the model proposed them
  • they are included in raw_count
  • they were rejected by the stricter trust threshold and do not contribute to the final count


Blue boxes

Drawn for kept detections by default.

Meaning:

  • they survived the stricter trust filter
  • they contribute to the final ONNX count
  • no circle-helper match was found for that specific box


Green boxes

Drawn for kept detections that also match the circle helper.

Meaning:

  • they survived the stricter trust filter
  • they contribute to the final ONNX count
  • a detected circle center falls inside, or very near, the same box


Text overlay

Shows:

  • cell type
  • rot_extent
  • raw count
  • final count
  • mean confidence
  • blur score

If review is needed, the image also contains:

NEEDS_REVIEW

in red.

These debug images are extremely valuable for validation and tuning.


CSV Output

The script writes results.csv with rows containing:

  • file
  • bin_ID
  • cell_type
  • rot_extent
  • raw_count
  • count
  • mean_conf
  • blur
  • needs_review
  • cirle_count

CSV Output Field Explanation

The results.csv file produced by onnx_batch.py contains several diagnostic fields that describe how the final bottle count decision was reached.

These fields allow the system (and operators) to evaluate geometry, detector output, confidence, and image quality simultaneously.

The most important diagnostic fields are:

  • rot_extent
  • raw_count
  • mean_conf
  • blur

Together they determine whether a result is trustworthy or requires human review.


rot_extent — Shape Confidence (Triangle vs Rhombus)

What it is

rot_extent measures how much of the rotated bounding rectangle of the cell is actually filled by the visible non-black cell area.

Technically:

rot_extent = (area of cell shape) / (area of rotated bounding box)

This value is computed during cell-type detection using the convex hull of the cell silhouette.


Why the Rotated Bounding Box Matters

The shelf cells are rotated diamonds (rhombuses). If an ordinary axis-aligned bounding box were used, the bounding area would be much larger than the real cell footprint.

Using the minimum rotated rectangle ensures the ratio reflects the true shape occupancy.


Typical Values in the System

| Shape | rot_extent range | Meaning |
|---|---|---|
| Triangle | ~0.45 – 0.65 | Sparse triangular footprint |
| Rhombus | ~0.75 – 0.95 | Dense diamond-shaped footprint |
| Borderline | ~0.66 – 0.72 | Ambiguous geometry |

Values inside the borderline zone are flagged for review.


Why This Field Is Important

rot_extent is used to:

  • determine cell geometry
  • prevent wrong bottle-count limits
  • flag uncertain shapes for review

Example:

triangle → max bottles = 3
rhombus → max bottles = 9

Without this geometry detection the detector could produce physically impossible counts.

This field is the reason triangle/rhombus misclassification errors were eliminated.


raw_count — What the Detector Actually Saw

What it is

raw_count is the number of bottle ends detected before any correction or clamping.

Example:

raw_count = 6
cell_type = triangle
final_count = min(6,3) = 3

Why It Matters

This value represents the pure ML model output.

It allows the system to differentiate between:

| Situation | Meaning |
|---|---|
| raw_count > final count | detections were clamped by geometry |
| raw_count == 0 but bottles exist | likely false negative |
| raw_count very high | detector confusion |

Operational Use

raw_count is used to detect suspicious conditions:

  • excessive detections
  • detector confusion
  • geometry conflicts

It is the key variable when comparing ML perception vs physical shelf constraints.


mean_conf — Average Detector Confidence

What it is

mean_conf is the average confidence score of all trusted bottle-end detections in the cell.

Example:

detections = [0.81, 0.76, 0.72]
mean_conf = 0.763

Interpretation

| mean_conf | Meaning |
|---|---|
| > 0.75 | Very reliable detection |
| 0.55 – 0.75 | Normal reliability |
| 0.35 – 0.55 | Suspicious |
| < 0.35 | Likely noise |

Why It Matters

Low confidence typically indicates difficult visual conditions:

  • bottle reflections
  • glare
  • partial occlusion
  • weak lighting
  • label interference

Operational Use

mean_conf is used to:

  • flag potential false positives
  • identify uncertain detections
  • support review decisions

This field represents ML certainty.


blur — Image Quality / Focus Indicator

What it is

blur measures image sharpness using Laplacian variance.

Interpretation:

High variance → sharp image
Low variance → blurry image

Blur directly affects detection reliability.


Typical Values for the Current Camera Setup

| blur value | Meaning |
|---|---|
| > 80 | Very sharp |
| 40 – 80 | Acceptable |
| 25 – 40 | Risky |
| < 25 | Likely missed bottles |

Why Blur Is Critical

False negatives (missed bottles) often correlate with:

  • motion blur
  • lighting instability
  • compression artifacts
  • camera exposure changes

Several previously observed detection errors were caused by this exact situation.


Operational Use

blur allows the system to distinguish between:

| Situation | Interpretation |
|---|---|
| low blur score + low count | likely image-quality issue |
| high blur score + low confidence | unreliable detection |
| sharp image + normal detection | trustworthy result |

This metric represents image reliability, not ML reliability.


How These Fields Work Together

The power of the system comes from combining all four diagnostics.

| Field | Role |
|---|---|
| rot_extent | Geometry truth |
| raw_count | Model perception |
| mean_conf | Model certainty |
| blur | Image reliability |

Together they answer the key operational question:

Is this bottle count trustworthy,
or should a human review this bin?

This multi-layer evaluation explains why the system now achieves very high accuracy in final counts: the ML detector is supported by geometry constraints, confidence analysis, and image-quality diagnostics rather than acting alone.

Important Note on bin_ID

The script derives bin_ID from the filename stem:

stem[:5]

Given filenames like:

01_01.png

this produces:

01_01

So stable crop naming from crop_with_json.py is essential.
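The derivation amounts to a one-liner (shown here as a standalone helper for illustration):

```python
from pathlib import Path

def bin_id(crop_path: str) -> str:
    """Derive the bin identifier from the first five characters of the stem."""
    return Path(crop_path).stem[:5]
```

Any renaming upstream that changes the first five characters of the filename would silently break the downstream DB comparison.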


Return Summary

At the end, run(cfg) returns:

  • step name
  • number of processed cells
  • CSV path
  • debug directory
  • number of review-flagged cells

This summary is later captured by stock_runtime.py.


Manual Debug Entry Point

main()

Standalone mode:

  • configure logging
  • load config
  • run batch detection
  • print summary

Useful for:

  • detector tuning
  • model replacement
  • crop validation
  • review analysis


Relationship with Upstream Crop Stage

This script assumes:

  • crops already exist
  • polygon masking has been applied
  • filenames encode stable bin IDs
  • black background outside polygon is meaningful

In particular, the shape-classification stage depends on that black background behavior.

So onnx_batch.py and crop_with_json.py are tightly coupled by design.


Relationship with Downstream DB Comparison

The main operational output is:

results.csv

Later, DB_compare.py reads this file and compares count grouped by bin_ID against database quantities.

This means onnx_batch.py is the authoritative vision-count generator.


Failure Modes

Typical failures include:

| Failure | Meaning |
|---|---|
| missing ONNX model | model deployment issue |
| empty crop directory | upstream crop stage failed |
| unreadable crop image | storage/path issue |
| unsupported ONNX output shape | model export mismatch |
| write failure for debug/CSV | filesystem problem |
| low-quality images causing review spikes | illumination or focus problem |

The script raises on critical pipeline failures, but preserves per-image tolerance where possible.


Summary

onnx_batch.py is the per-cell inference engine of the Wine Platform vision pipeline.

Its responsibilities include:

  • loading the ONNX detector
  • running bottle-end detection over all crop images
  • classifying cell shape from masked geometry
  • constraining counts by physical shelf rules
  • flagging uncertain cases
  • writing annotated debug images
  • exporting structured results for stock reconciliation

It combines machine learning with shelf-specific heuristics, which makes the final stock estimation more robust and more auditable than raw detector output alone.