DB_compare.py — Vision vs Database Stock Reconciliation¶
Overview¶
DB_compare.py is the final comparison stage of the Wine Platform automated vision lane.
Its task is to compare:
- the vision-derived bottle counts produced by
onnx_batch.py - against the current stock quantities stored in the backend database
It performs this comparison bin by bin and writes outputs containing only the mismatches.
This script is the final analytical step of the scheduled detection pipeline.
Its role is not to update the DB automatically.
Instead, it produces structured discrepancy outputs for review or later action.
Position in the Vision Pipeline¶
flowchart TD
ONNX["onnx_batch.py"]
CSV["results.csv"]
API["GET /bins/summary"]
CMP["DB_compare.py"]
DIFFCSV["diff_only.csv"]
DIFFJSON["diff_only.json"]
POST["POST /vision/proposals"]
VALIHA["Home Assistant validation"]
ONNX --> CSV
API --> CMP
CSV --> CMP
CMP --> DIFFCSV
CMP --> DIFFJSON
CMP --> POST
POST --> VALIHA
Main Purpose¶
The goal of this script is operationally simple:
Find all bins where vision count != DB quantity
This is the reconciliation point between:
- physical reality as seen by the camera
- logical stock state as stored in MariaDB via the API layer
Only bins with differences are written to output.
This keeps the result focused and practical.
Inputs¶
The script consumes two independent data sources.
1. Vision results CSV¶
Path:
<cfg.paths["det_out"]>/results.csv
Produced by:
onnx_batch.py
Expected important fields:
bin_IDcount
2. Backend bins summary API¶
Retrieved via API.
Endpoint:
GET /bins/summary
Example response:
[ { "st_bin": "03_03", "qty": 5 }, { "st_bin": "03_04", "qty": 6 } ]
Outputs¶
Written into:
<cfg.paths["det_out"]>
Files:
diff_only.csv¶
Flat CSV list of mismatching bins.
diff_only.json¶
Structured JSON payload with:
- total diff count
- list of mismatch rows
Proposal Creation¶
Instead of modifying the inventory automatically, the script creates a proposal.
Endpoint:
POST /vision/proposals
Example payload:
{ "id": 999, "bin_code": "03_03", "quantity_difference": -1, "snapshot_path": "/local/wine/03_03.png", "status": "pending", "created_at": "2026-03-12T10:00:00", "bin_details": { "items":[ {"description": "Chablis", "year": 2022, "qty": 3}, {"description": "Sancerre", "year": 2021, "qty": 2} ]} }
Snapshot Path¶
The snapshot image is generated earlier in the pipeline.
Example:
/local/wine/03_03.png
This path is compatible with Home Assistant image rendering.
Architectural Role¶
This script is the final read-only comparison stage.
It does not:
- modify DB
- trigger automatic stock update
- attempt conflict resolution
Instead, it provides a reliable discrepancy report.
This is a safe architectural choice because: - vision can be uncertain - DB can be outdated - reconciliation should remain observable and reviewable
Configuration and Constants¶
API_TIMEOUT = 10.0¶
Timeout used for the backend HTTP request.
Purpose: - prevent pipeline hang if the backend is unreachable - fail clearly in bounded time
High-Level Execution Flow¶
flowchart TD
START["run(cfg)"]
CSV["load vision results.csv"]
API["fetch /bins/summary"]
AGG["aggregate vision counts by bin"]
MAP["map DB qty by st_bin"]
CMP["compare all bins"]
WRITE["write diff_only.csv + diff_only.json"]
POST["prepare and initiate router call"]
RET["return summary dict"]
START --> CSV
CSV --> AGG
AGG --> API
API --> MAP
MAP --> CMP
CMP --> WRITE
CMP --> POST
POST --> WRITE
WRITE --> RET
Loading Vision Counts¶
_load_vision_counts(csv_path)¶
This helper reads the vision CSV and returns two things:
- aggregated counts by
bin_ID - raw rows list
The raw rows are currently not used downstream inside this script, but returning them keeps the function flexible.
CSV Validation¶
The function checks:
- file exists
- CSV has header
bin_IDcolumn existscountcolumn exists
If any of these are missing, it raises an error.
This is important because downstream comparison must not proceed on malformed detector output.
Count Aggregation Logic¶
Counts are aggregated by bin_ID using:
defaultdict(int)
Each row contributes:
counts_by_bin[bin_id] += c
This means if multiple rows exist for the same bin, their counts are summed.
That makes the comparison tolerant to multi-row detector outputs.
Numeric Conversion¶
count values are converted via:
int(float(row.get("count") or 0))
This tolerates: - integer strings - float-like strings - blanks
If conversion fails, count is treated as 0.
This design is practical for pipeline robustness.
Fetching DB State¶
_fetch_bins_summary(api_base, timeout_s=10.0)¶
This helper performs:
GET {api_base}/bins/summary
using the requests library.
Validation Rules¶
The response must be:
- successful HTTP status
- JSON-decoded
- top-level list
If the payload is not a list, the function raises ValueError.
This protects against router regressions or API contract mismatches.
Comparison Logic¶
_compare(vision_counts, db_rows)¶
This is the core reconciliation function.
It converts the DB payload into a simple mapping:
st_bin -> qty
Then compares that mapping against:
bin_ID -> vision_count
DB Field Contract¶
For each DB row:
st_bin¶
The unique bin identifier.
qty¶
The database stock quantity.
The script explicitly requires qty to exist.
If a row lacks it, comparison fails.
This strictness is intentional because field-name ambiguity would make reconciliation unreliable.
Union of Bins¶
The function compares over:
all_bins = set(db_qty.keys()) | set(vision_counts.keys())
This is important.
It means the script detects differences even when:
- a bin exists in DB but not in vision
- a bin exists in vision but not in DB
- a bin exists in both but counts differ
So it is not limited to only overlapping keys.
Mismatch Record Format¶
For each mismatching bin, the script appends:
{
"st_bin": "...",
"db_qty": ...,
"vision_count": ...,
"diff": ...
}
Where:
diff = vision_count - db_qty
Interpretation: - positive diff → vision sees more bottles than DB says - negative diff → vision sees fewer bottles than DB says
Writing Outputs¶
_write_output(diffs, out_path_csv, out_path_json)¶
This helper writes both JSON and CSV discrepancy outputs.
JSON Output¶
Payload format:
{
"diff_count": N,
"diffs": [...]
}
This is useful for: - machine reading - API forwarding - later automation - audit archiving
CSV Output¶
Field order:
st_bindb_qtyvision_countdiff
This produces a compact table suitable for quick inspection or spreadsheet import.
Public Pipeline Entry Point¶
run(cfg)¶
This is the callable orchestration entry point used by stock_runtime.py.
It performs the full compare workflow:
- resolve API base and detection output folder
- load vision counts from CSV
- fetch DB stock state from API
- compare results
- build proposal payload
- send POST request to backend
- backend stores proposal
- write diff-only outputs
- return structured summary
- Home Assistant receives notification
Path Roles Inside run(cfg)¶
api_base¶
str(cfg.api_base)
Backend base URL used to contact the winecellar API.
det_out¶
Path(cfg.paths["det_out"])
Directory containing:
- results.csv
- output discrepancy files
vision_csv¶
det_out / results.csv
out_csv¶
det_out / diff_only.csv
out_json¶
det_out / diff_only.json
Path compatible with Home Assistant image rendering¶
This should be moved to settings.py
/local/wine/03_03.png
Returned Summary Structure¶
run(cfg) returns a dictionary containing:
- step name
- API base
- vision CSV path
- diff count
- diff CSV path
- diff JSON path
This makes the reconciliation stage easy to summarize inside pipeline runtime logs.
Manual Entry Point¶
main()¶
Standalone mode:
- configure logging
- load config
- run comparison
- print result summary
Useful for: - backend API testing - detector result validation - debugging mismatches without running full pipeline
Relationship with Upstream Detector Output¶
The comparison depends directly on:
results.csv
generated by onnx_batch.py.
Important assumptions:
bin_IDidentifies the shelf bincountis the final detector-derived stock count- the CSV file exists and is well-formed
So DB_compare.py does not perform visual inference itself.
It trusts the detector stage as its input truth on the vision side.
Relationship with Backend API¶
The script depends on the backend route:
/bins/summary
This means it compares against the API’s logical business view of stock, not directly against raw database tables.
That is architecturally sound because: - the API defines the application contract - internal DB schema can evolve - workers remain decoupled from direct DB access
Failure Modes¶
Typical failures include:
| Failure | Meaning |
|---|---|
missing results.csv |
detector stage failed or not run |
| malformed CSV header | detector output contract broken |
| backend unreachable | API/network problem |
HTTP error from /bins/summary |
service/router problem |
| non-list JSON payload | backend contract mismatch |
missing qty field |
backend schema/API regression |
| output write failure | filesystem problem |
The script raises on all critical issues so the orchestrator can mark the pipeline as failed.
Why “Diff Only” Output Is Useful¶
The script intentionally writes only discrepancies rather than a full joined stock table.
Advantages:
- much smaller output
- operational focus on actionable problems
- easy daily review
- less noise in scheduled runs
This is especially useful in a cellar environment where most bins usually remain unchanged.
Example Interpretation¶
If an output row is:
{
"st_bin": "03_05",
"db_qty": 2,
"vision_count": 4,
"diff": 2
}
it means:
- DB says bin
03_05contains 2 bottles - vision counted 4
- the mismatch is +2
This does not automatically mean vision is correct.
It simply means the two system views disagree and require reconciliation.
Summary¶
DB_compare.py is the stock reconciliation stage of the Wine Platform vision lane.
Its responsibilities are:
- read detector results
- fetch backend stock state
- compare counts bin by bin
- write discrepancy-only reports
- stop the pipeline clearly if comparison cannot be trusted
It provides the final operational answer of the vision run:
Which bins disagree between camera-based stock and database stock?
This makes it the final analytical checkpoint before any human or system-level stock correction workflow.