DB_compare.py — Vision vs Database Stock Reconciliation¶

Overview¶

DB_compare.py is the final comparison stage of the Wine Platform automated vision lane.

Its task is to compare:

the vision-derived bottle counts produced by onnx_batch.py
against the current stock quantities stored in the backend database

It performs this comparison bin by bin and writes outputs containing only the mismatches.

This script is the final analytical step of the scheduled detection pipeline.

Its role is not to update the DB automatically.
Instead, it produces structured discrepancy outputs for review or later action.

Position in the Vision Pipeline¶

flowchart TD

ONNX["onnx_batch.py"]
CSV["results.csv"]
API["GET /bins/summary"]
CMP["DB_compare.py"]
DIFFCSV["diff_only.csv"]
DIFFJSON["diff_only.json"]
POST["POST /vision/proposals"]
VALIHA["Home Assistant validation"]

ONNX --> CSV
API --> CMP
CSV --> CMP
CMP --> DIFFCSV
CMP --> DIFFJSON
CMP --> POST
POST --> VALIHA

Main Purpose¶

The goal of this script is operationally simple:

Find all bins where vision count != DB quantity

This is the reconciliation point between:

physical reality as seen by the camera
logical stock state as stored in MariaDB via the API layer

Only bins with differences are written to output.

This keeps the result focused and practical.

Inputs¶

The script consumes two independent data sources.

1. Vision results CSV¶

Path:

<cfg.paths["det_out"]>/results.csv

Produced by:

onnx_batch.py

Expected important fields:

bin_ID
count

2. Backend bins summary API¶

Retrieved via API.

Endpoint:

GET /bins/summary

Example response:

[ { "st_bin": "03_03", "qty": 5 }, { "st_bin": "03_04", "qty": 6 } ]

Outputs¶

Written into:

<cfg.paths["det_out"]>

Files:

`diff_only.csv`¶

Flat CSV list of mismatching bins.

`diff_only.json`¶

Structured JSON payload with:

total diff count
list of mismatch rows

Proposal Creation¶

Instead of modifying the inventory automatically, the script creates a proposal.

Endpoint:

POST /vision/proposals

Example payload:

{ "id": 999, "bin_code": "03_03", "quantity_difference": -1, "snapshot_path": "/local/wine/03_03.png", "status": "pending", "created_at": "2026-03-12T10:00:00", "bin_details": { "items":[ {"description": "Chablis", "year": 2022, "qty": 3}, {"description": "Sancerre", "year": 2021, "qty": 2} ]} }

Snapshot Path¶

The snapshot image is generated earlier in the pipeline.

Example:

/local/wine/03_03.png

This path is compatible with Home Assistant image rendering.

Architectural Role¶

This script is the final read-only comparison stage.

It does not:

modify DB
trigger automatic stock update
attempt conflict resolution

Instead, it provides a reliable discrepancy report.

This is a safe architectural choice because: - vision can be uncertain - DB can be outdated - reconciliation should remain observable and reviewable

Configuration and Constants¶

`API_TIMEOUT = 10.0`¶

Timeout used for the backend HTTP request.

Purpose: - prevent pipeline hang if the backend is unreachable - fail clearly in bounded time

High-Level Execution Flow¶

flowchart TD

START["run(cfg)"]
CSV["load vision results.csv"]
API["fetch /bins/summary"]
AGG["aggregate vision counts by bin"]
MAP["map DB qty by st_bin"]
CMP["compare all bins"]
WRITE["write diff_only.csv + diff_only.json"]
POST["prepare and initiate router call"]
RET["return summary dict"]

START --> CSV
CSV --> AGG
AGG --> API
API --> MAP
MAP --> CMP
CMP --> WRITE
CMP --> POST
POST --> WRITE
WRITE --> RET

Loading Vision Counts¶

`_load_vision_counts(csv_path)`¶

This helper reads the vision CSV and returns two things:

aggregated counts by bin_ID
raw rows list

The raw rows are currently not used downstream inside this script, but returning them keeps the function flexible.

CSV Validation¶

The function checks:

file exists
CSV has header
bin_ID column exists
count column exists

If any of these are missing, it raises an error.

This is important because downstream comparison must not proceed on malformed detector output.

Count Aggregation Logic¶

Counts are aggregated by bin_ID using:

defaultdict(int)

Each row contributes:

counts_by_bin[bin_id] += c

This means if multiple rows exist for the same bin, their counts are summed.

That makes the comparison tolerant to multi-row detector outputs.

Numeric Conversion¶

count values are converted via:

int(float(row.get("count") or 0))

This tolerates: - integer strings - float-like strings - blanks

If conversion fails, count is treated as 0.

This design is practical for pipeline robustness.

Fetching DB State¶

`_fetch_bins_summary(api_base, timeout_s=10.0)`¶

This helper performs:

GET {api_base}/bins/summary

using the requests library.

Validation Rules¶

The response must be:

successful HTTP status
JSON-decoded
top-level list

If the payload is not a list, the function raises ValueError.

This protects against router regressions or API contract mismatches.

Comparison Logic¶

`_compare(vision_counts, db_rows)`¶

This is the core reconciliation function.

It converts the DB payload into a simple mapping:

st_bin -> qty

Then compares that mapping against:

bin_ID -> vision_count

DB Field Contract¶

For each DB row:

`st_bin`¶

The unique bin identifier.

`qty`¶

The database stock quantity.

The script explicitly requires qty to exist.
If a row lacks it, comparison fails.

This strictness is intentional because field-name ambiguity would make reconciliation unreliable.

Union of Bins¶

The function compares over:

all_bins = set(db_qty.keys()) | set(vision_counts.keys())

This is important.

It means the script detects differences even when:

a bin exists in DB but not in vision
a bin exists in vision but not in DB
a bin exists in both but counts differ

So it is not limited to only overlapping keys.

Mismatch Record Format¶

For each mismatching bin, the script appends:

{
  "st_bin": "...",
  "db_qty": ...,
  "vision_count": ...,
  "diff": ...
}

Where:

diff = vision_count - db_qty

Interpretation: - positive diff → vision sees more bottles than DB says - negative diff → vision sees fewer bottles than DB says

Writing Outputs¶

`_write_output(diffs, out_path_csv, out_path_json)`¶

This helper writes both JSON and CSV discrepancy outputs.

JSON Output¶

Payload format:

{
  "diff_count": N,
  "diffs": [...]
}

This is useful for: - machine reading - API forwarding - later automation - audit archiving

CSV Output¶

Field order:

st_bin
db_qty
vision_count
diff

This produces a compact table suitable for quick inspection or spreadsheet import.

Public Pipeline Entry Point¶

`run(cfg)`¶

This is the callable orchestration entry point used by stock_runtime.py.

It performs the full compare workflow:

resolve API base and detection output folder
load vision counts from CSV
fetch DB stock state from API
compare results
build proposal payload
send POST request to backend
backend stores proposal
write diff-only outputs
return structured summary
Home Assistant receives notification

Path Roles Inside `run(cfg)`¶

`api_base`¶

str(cfg.api_base)

Backend base URL used to contact the winecellar API.

`det_out`¶

Path(cfg.paths["det_out"])

Directory containing: - results.csv - output discrepancy files

`vision_csv`¶

det_out / results.csv

`out_csv`¶

det_out / diff_only.csv

`out_json`¶

det_out / diff_only.json

Path compatible with Home Assistant image rendering¶

This should be moved to settings.py

/local/wine/03_03.png

Returned Summary Structure¶

run(cfg) returns a dictionary containing:

step name
API base
vision CSV path
diff count
diff CSV path
diff JSON path

This makes the reconciliation stage easy to summarize inside pipeline runtime logs.

Manual Entry Point¶

`main()`¶

Standalone mode:

configure logging
load config
run comparison
print result summary

Useful for: - backend API testing - detector result validation - debugging mismatches without running full pipeline

Relationship with Upstream Detector Output¶

The comparison depends directly on:

results.csv

generated by onnx_batch.py.

Important assumptions:

bin_ID identifies the shelf bin
count is the final detector-derived stock count
the CSV file exists and is well-formed

So DB_compare.py does not perform visual inference itself.
It trusts the detector stage as its input truth on the vision side.

Relationship with Backend API¶

The script depends on the backend route:

/bins/summary

This means it compares against the API’s logical business view of stock, not directly against raw database tables.

That is architecturally sound because: - the API defines the application contract - internal DB schema can evolve - workers remain decoupled from direct DB access

Failure Modes¶

Typical failures include:

Failure	Meaning
missing `results.csv`	detector stage failed or not run
malformed CSV header	detector output contract broken
backend unreachable	API/network problem
HTTP error from `/bins/summary`	service/router problem
non-list JSON payload	backend contract mismatch
missing `qty` field	backend schema/API regression
output write failure	filesystem problem

The script raises on all critical issues so the orchestrator can mark the pipeline as failed.

Why “Diff Only” Output Is Useful¶

The script intentionally writes only discrepancies rather than a full joined stock table.

Advantages:

much smaller output
operational focus on actionable problems
easy daily review
less noise in scheduled runs

This is especially useful in a cellar environment where most bins usually remain unchanged.

Example Interpretation¶

If an output row is:

{
  "st_bin": "03_05",
  "db_qty": 2,
  "vision_count": 4,
  "diff": 2
}

it means:

DB says bin 03_05 contains 2 bottles
vision counted 4
the mismatch is +2

This does not automatically mean vision is correct.
It simply means the two system views disagree and require reconciliation.

Summary¶

DB_compare.py is the stock reconciliation stage of the Wine Platform vision lane.

Its responsibilities are:

read detector results
fetch backend stock state
compare counts bin by bin
write discrepancy-only reports
stop the pipeline clearly if comparison cannot be trusted

It provides the final operational answer of the vision run:

Which bins disagree between camera-based stock and database stock?

This makes it the final analytical checkpoint before any human or system-level stock correction workflow.