Skip to content

DB_compare.py — Vision vs Database Stock Reconciliation

Overview

DB_compare.py is the final comparison stage of the Wine Platform automated vision lane.

Its task is to compare:

  • the vision-derived bottle counts produced by onnx_batch.py
  • against the current stock quantities stored in the backend database

It performs this comparison bin by bin and writes outputs containing only the mismatches.

This script is the final analytical step of the scheduled detection pipeline.

Its role is not to update the DB automatically.
Instead, it produces structured discrepancy outputs for review or later action.


Position in the Vision Pipeline

flowchart TD

ONNX["onnx_batch.py"]
CSV["results.csv"]
API["GET /bins/summary"]
CMP["DB_compare.py"]
DIFFCSV["diff_only.csv"]
DIFFJSON["diff_only.json"]
POST["POST /vision/proposals"]
VALIHA["Home Assistant validation"]

ONNX --> CSV
API --> CMP
CSV --> CMP
CMP --> DIFFCSV
CMP --> DIFFJSON
CMP --> POST
POST --> VALIHA

Main Purpose

The goal of this script is operationally simple:

Find all bins where vision count != DB quantity

This is the reconciliation point between:

  • physical reality as seen by the camera
  • logical stock state as stored in MariaDB via the API layer

Only bins with differences are written to output.

This keeps the result focused and practical.


Inputs

The script consumes two independent data sources.


1. Vision results CSV

Path:

<cfg.paths["det_out"]>/results.csv

Produced by:

onnx_batch.py

Expected important fields:

  • bin_ID
  • count

2. Backend bins summary API

Retrieved via API.

Endpoint:

GET /bins/summary

Example response:

[ { "st_bin": "03_03", "qty": 5 }, { "st_bin": "03_04", "qty": 6 } ]


Outputs

Written into:

<cfg.paths["det_out"]>

Files:

diff_only.csv

Flat CSV list of mismatching bins.

diff_only.json

Structured JSON payload with:

  • total diff count
  • list of mismatch rows

Proposal Creation

Instead of modifying the inventory automatically, the script creates a proposal.

Endpoint:

POST /vision/proposals

Example payload:

{ "id": 999, "bin_code": "03_03", "quantity_difference": -1, "snapshot_path": "/local/wine/03_03.png", "status": "pending", "created_at": "2026-03-12T10:00:00", "bin_details": { "items":[ {"description": "Chablis", "year": 2022, "qty": 3}, {"description": "Sancerre", "year": 2021, "qty": 2} ]} }

Snapshot Path

The snapshot image is generated earlier in the pipeline.

Example:

/local/wine/03_03.png

This path is compatible with Home Assistant image rendering.


Architectural Role

This script is the final read-only comparison stage.

It does not:

  • modify DB
  • trigger automatic stock update
  • attempt conflict resolution

Instead, it provides a reliable discrepancy report.

This is a safe architectural choice because: - vision can be uncertain - DB can be outdated - reconciliation should remain observable and reviewable


Configuration and Constants

API_TIMEOUT = 10.0

Timeout used for the backend HTTP request.

Purpose: - prevent pipeline hang if the backend is unreachable - fail clearly in bounded time


High-Level Execution Flow

flowchart TD

START["run(cfg)"]
CSV["load vision results.csv"]
API["fetch /bins/summary"]
AGG["aggregate vision counts by bin"]
MAP["map DB qty by st_bin"]
CMP["compare all bins"]
WRITE["write diff_only.csv + diff_only.json"]
POST["prepare and initiate router call"]
RET["return summary dict"]

START --> CSV
CSV --> AGG
AGG --> API
API --> MAP
MAP --> CMP
CMP --> WRITE
CMP --> POST
POST --> WRITE
WRITE --> RET

Loading Vision Counts

_load_vision_counts(csv_path)

This helper reads the vision CSV and returns two things:

  1. aggregated counts by bin_ID
  2. raw rows list

The raw rows are currently not used downstream inside this script, but returning them keeps the function flexible.


CSV Validation

The function checks:

  • file exists
  • CSV has header
  • bin_ID column exists
  • count column exists

If any of these are missing, it raises an error.

This is important because downstream comparison must not proceed on malformed detector output.


Count Aggregation Logic

Counts are aggregated by bin_ID using:

defaultdict(int)

Each row contributes:

counts_by_bin[bin_id] += c

This means if multiple rows exist for the same bin, their counts are summed.

That makes the comparison tolerant to multi-row detector outputs.


Numeric Conversion

count values are converted via:

int(float(row.get("count") or 0))

This tolerates: - integer strings - float-like strings - blanks

If conversion fails, count is treated as 0.

This design is practical for pipeline robustness.


Fetching DB State

_fetch_bins_summary(api_base, timeout_s=10.0)

This helper performs:

GET {api_base}/bins/summary

using the requests library.


Validation Rules

The response must be:

  • successful HTTP status
  • JSON-decoded
  • top-level list

If the payload is not a list, the function raises ValueError.

This protects against router regressions or API contract mismatches.


Comparison Logic

_compare(vision_counts, db_rows)

This is the core reconciliation function.

It converts the DB payload into a simple mapping:

st_bin -> qty

Then compares that mapping against:

bin_ID -> vision_count

DB Field Contract

For each DB row:

st_bin

The unique bin identifier.

qty

The database stock quantity.

The script explicitly requires qty to exist.
If a row lacks it, comparison fails.

This strictness is intentional because field-name ambiguity would make reconciliation unreliable.


Union of Bins

The function compares over:

all_bins = set(db_qty.keys()) | set(vision_counts.keys())

This is important.

It means the script detects differences even when:

  • a bin exists in DB but not in vision
  • a bin exists in vision but not in DB
  • a bin exists in both but counts differ

So it is not limited to only overlapping keys.


Mismatch Record Format

For each mismatching bin, the script appends:

{
  "st_bin": "...",
  "db_qty": ...,
  "vision_count": ...,
  "diff": ...
}

Where:

diff = vision_count - db_qty

Interpretation: - positive diff → vision sees more bottles than DB says - negative diff → vision sees fewer bottles than DB says


Writing Outputs

_write_output(diffs, out_path_csv, out_path_json)

This helper writes both JSON and CSV discrepancy outputs.


JSON Output

Payload format:

{
  "diff_count": N,
  "diffs": [...]
}

This is useful for: - machine reading - API forwarding - later automation - audit archiving


CSV Output

Field order:

  • st_bin
  • db_qty
  • vision_count
  • diff

This produces a compact table suitable for quick inspection or spreadsheet import.


Public Pipeline Entry Point

run(cfg)

This is the callable orchestration entry point used by stock_runtime.py.

It performs the full compare workflow:

  1. resolve API base and detection output folder
  2. load vision counts from CSV
  3. fetch DB stock state from API
  4. compare results
  5. build proposal payload
  6. send POST request to backend
  7. backend stores proposal
  8. write diff-only outputs
  9. return structured summary
  10. Home Assistant receives notification

Path Roles Inside run(cfg)

api_base

str(cfg.api_base)

Backend base URL used to contact the winecellar API.

det_out

Path(cfg.paths["det_out"])

Directory containing: - results.csv - output discrepancy files

vision_csv

det_out / results.csv

out_csv

det_out / diff_only.csv

out_json

det_out / diff_only.json

Path compatible with Home Assistant image rendering

This should be moved to settings.py

/local/wine/03_03.png

Returned Summary Structure

run(cfg) returns a dictionary containing:

  • step name
  • API base
  • vision CSV path
  • diff count
  • diff CSV path
  • diff JSON path

This makes the reconciliation stage easy to summarize inside pipeline runtime logs.


Manual Entry Point

main()

Standalone mode:

  • configure logging
  • load config
  • run comparison
  • print result summary

Useful for: - backend API testing - detector result validation - debugging mismatches without running full pipeline


Relationship with Upstream Detector Output

The comparison depends directly on:

results.csv

generated by onnx_batch.py.

Important assumptions:

  • bin_ID identifies the shelf bin
  • count is the final detector-derived stock count
  • the CSV file exists and is well-formed

So DB_compare.py does not perform visual inference itself.
It trusts the detector stage as its input truth on the vision side.


Relationship with Backend API

The script depends on the backend route:

/bins/summary

This means it compares against the API’s logical business view of stock, not directly against raw database tables.

That is architecturally sound because: - the API defines the application contract - internal DB schema can evolve - workers remain decoupled from direct DB access


Failure Modes

Typical failures include:

Failure Meaning
missing results.csv detector stage failed or not run
malformed CSV header detector output contract broken
backend unreachable API/network problem
HTTP error from /bins/summary service/router problem
non-list JSON payload backend contract mismatch
missing qty field backend schema/API regression
output write failure filesystem problem

The script raises on all critical issues so the orchestrator can mark the pipeline as failed.


Why “Diff Only” Output Is Useful

The script intentionally writes only discrepancies rather than a full joined stock table.

Advantages:

  • much smaller output
  • operational focus on actionable problems
  • easy daily review
  • less noise in scheduled runs

This is especially useful in a cellar environment where most bins usually remain unchanged.


Example Interpretation

If an output row is:

{
  "st_bin": "03_05",
  "db_qty": 2,
  "vision_count": 4,
  "diff": 2
}

it means:

  • DB says bin 03_05 contains 2 bottles
  • vision counted 4
  • the mismatch is +2

This does not automatically mean vision is correct.
It simply means the two system views disagree and require reconciliation.


Summary

DB_compare.py is the stock reconciliation stage of the Wine Platform vision lane.

Its responsibilities are:

  • read detector results
  • fetch backend stock state
  • compare counts bin by bin
  • write discrepancy-only reports
  • stop the pipeline clearly if comparison cannot be trusted

It provides the final operational answer of the vision run:

Which bins disagree between camera-based stock and database stock?

This makes it the final analytical checkpoint before any human or system-level stock correction workflow.