take_clean_snapshot.py — Snapshot Acquisition, Corner Tracking, and Shelf Rectification

Overview

take_clean_snapshot.py is the image acquisition and geometric normalization entry point of the Wine Platform vision lane.

Its role is not limited to taking a picture. It performs a complete preparation pipeline:

  1. connect to RTSP camera
  2. wait until the visible shelf scene is bright enough and stable enough
  3. save an accepted snapshot
  4. detect the four corner markers of the shelf
  5. smooth and stabilize the detected geometry
  6. rectify the shelf into a normalized front-facing image
  7. write debug images and corner JSON history

This module is one of the most important computer-vision foundations in the system, because all downstream steps depend on the quality and geometric consistency of the rectified shelf image.

It is used by the scheduled pipeline orchestrator:

stock_runtime.py

and feeds the next stages such as:

  • crop_with_json.py
  • onnx_batch.py
  • DB_compare.py

Role in the Vision Pipeline

flowchart LR

subgraph Runtime
SR["stock_runtime.py"]
end

subgraph SnapshotStage
TCS["take_clean_snapshot.py"]
RTSP["RTSP frame acquisition"]
READY["illumination readiness gate"]
TRACK["corner marker tracking"]
SMOOTH["temporal + geometric stabilization"]
RECT["perspective rectification"]
end

subgraph NextStages
CROP["crop_with_json.py"]
ONNX["onnx_batch.py"]
DBC["DB_compare.py"]
end

SR --> TCS
TCS --> RTSP
RTSP --> READY
READY --> TRACK
TRACK --> SMOOTH
SMOOTH --> RECT
RECT --> CROP
CROP --> ONNX
ONNX --> DBC

Manual Main Points Source

A critical design detail of this module is that the expected shelf corner positions are not discovered automatically from scratch on every run.

Instead, they are provided by the manual calibration workflow through the kiosk page:

apps/winecellar/frontend/qt_kiosk/pages/pince_roi.py

The ROI calibration page lets the user:

  1. open the live RTSP stream
  2. freeze the last visible frame
  3. tap four shelf corner points manually
  4. save them into:
pince_shelf.ini

under:

[main_points]
top_left
top_right
bottom_right
bottom_left

This is the upstream source of the runtime tracker’s expected positions.


Manual ROI Calibration Architecture

flowchart TD

LIVE["Live RTSP stream on kiosk page"]
FREEZE["Freeze last frame"]
TAP["User taps 4 points in order"]
SAVE["Write [main_points] to pince_shelf.ini"]

LIVE --> FREEZE
FREEZE --> TAP
TAP --> SAVE

Required click order on the kiosk page:

  1. Top-Left
  2. Top-Right
  3. Bottom-Right
  4. Bottom-Left

These saved coordinates are later mapped by take_clean_snapshot.py into internal tracker keys:

  • TL
  • TR
  • BR
  • BL

Purpose of the Module

This module solves three core vision problems:

1. Capture a good frame

The script does not blindly accept the first RTSP frame.
It waits until the shelf region is:

  • bright enough
  • textured enough
  • stable enough over recent frames

This improves robustness when the light has just been switched on.

2. Track the four shelf markers

The script tracks four corner markers using:

  • expected positions from manual setup
  • template matching on gradient magnitude
  • local dark-dot centroid refinement

3. Normalize shelf geometry

The script transforms the raw oblique camera view into a rectified shelf image using perspective warping.

That rectified image is the canonical input for cropping and bottle detection.


Main Outputs

The module writes:

| Output | Purpose |
| --- | --- |
| snapshot.jpg | accepted raw frame |
| shelf_rectified.jpg | perspective-normalized shelf image |
| corners.json | detected corner positions |
| prev_points.json | smoothed history between runs |
| debug images | inspection of every major stage |

Debug images are written into:

snapshot/tmp/shelf_debug

High-Level Execution Flow

flowchart TD

START["run() start"]

CAPTURE["take_snapshot()"]
LOAD["load snapshot image"]
EXPECTED["load expected corner points from cfg.main_points"]
TEMPL["load corner templates"]

TRACK["track_all_corners()"]
SMOOTH["smooth_with_history()"]
GEOM["geometric_stabilize()"]
RECT["rectify_by_markers()"]

WRITE1["write shelf_rectified.jpg"]
WRITE2["write corners.json"]
RETURN["return runtime result dict"]

START --> CAPTURE
CAPTURE --> LOAD
LOAD --> EXPECTED
EXPECTED --> TEMPL
TEMPL --> TRACK
TRACK --> SMOOTH
SMOOTH --> GEOM
GEOM --> RECT
RECT --> WRITE1
WRITE1 --> WRITE2
WRITE2 --> RETURN

Core Data Model

CornerPts

CornerPts = Dict[str, Tuple[float, float]]

Corner coordinates are represented as a dictionary keyed by:

  • TL
  • TR
  • BL
  • BR

Each value is an (x, y) image coordinate.


Settings Dataclass

The module contains a local Settings dataclass with tracking and rectification parameters.

Variables and Roles

SEARCH_RADIUS_PX = 260

Defines the half-size of the search window around each expected point.

Role:

  • the tracker does not search the full image
  • it searches inside a bounded ROI centered on the manually configured expected point

Effect:

  • a larger value tolerates more camera drift
  • a smaller value reduces false matches and improves speed


MIN_CORR = 0.35

Minimum acceptable normalized correlation score from cv2.matchTemplate.

Role:

  • rejects weak template matches
  • protects against false corner locking

Effect:

  • a lower threshold is more tolerant but riskier
  • a higher threshold is stricter but may fail under difficult lighting


TEMPLATE_SIZE_PX = 96

Square crop size used when initializing templates from a reference snapshot.

Role: determines how much visual context is stored for each corner marker template.

Effect:

  • too small: not enough context
  • too large: too much background variation


DOT_WIN_PX = 28

Half-window size for dark-dot refinement around the matched template center.

Role: searches a small local region for the marker’s inner dark dot.


DOT_DARK_THR = 95

Threshold used to isolate dark pixels for black-dot refinement.

Role: pixels darker than this threshold contribute to dot candidate extraction.


MIN_QUAD_AREA = 12000.0

Minimum allowed area of the four-point quadrilateral.

Role:

  • rejects degenerate or tiny detections
  • protects against pathological matches


OUT_MARGIN_PX = 20

Margin inserted between the warped image border and the destination corner positions.

Role: avoids placing the shelf corners exactly on the outer image border.


OUT_WIDTH and OUT_HEIGHT

Optional overrides for final rectified size.

If None, output size is inferred from the measured source geometry.


Detailed Image Preparation Pipeline

The script performs several image transformations before and during tracking.


Stage 1 — Raw RTSP Frame Acquisition

The function:

read_good_frame_with_reconnect()

reads frames from the RTSP camera.

This stage includes:

  • stream opening
  • warmup reads
  • retry logic
  • reconnect logic
  • illumination readiness evaluation

This stage is essential because the camera view immediately after light switch-on may be:

  • too dark
  • still changing in brightness
  • temporarily unstable

Stage 2 — Shelf ROI Construction for Readiness Gating

Before deciding whether a frame is acceptable, the module builds a shelf-focused ROI using the manually defined expected corner points.

Function:

_shelf_roi_from_expected()

Input:

  • cfg.main_points
  • current frame shape
  • pad_px

Output: a (x0, y0, x1, y1) bounding box around the shelf area

Role:

  • readiness is evaluated only where the shelf matters
  • avoids influence from unrelated dark or bright regions outside the shelf

pad_px = 40

Adds a safety border around the bounding box built from the four expected corners.

This makes the readiness region slightly larger than the exact marker box.
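A minimal pure-Python sketch of how such a padded, clamped bounding box can be built (the helper's real signature and rounding details in the module may differ):

```python
def shelf_roi_from_expected(points, frame_w, frame_h, pad_px=40):
    """Build a padded, clamped bounding box around the four expected corners.

    points: dict like {"TL": (x, y), "TR": ..., "BR": ..., "BL": ...}
    Returns (x0, y0, x1, y1) in image coordinates.
    """
    xs = [p[0] for p in points.values()]
    ys = [p[1] for p in points.values()]
    x0 = max(0, int(min(xs)) - pad_px)      # pad outward, but never
    y0 = max(0, int(min(ys)) - pad_px)      # leave the image
    x1 = min(frame_w, int(max(xs)) + pad_px)
    y1 = min(frame_h, int(max(ys)) + pad_px)
    return x0, y0, x1, y1
```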


Stage 3 — Luminance Statistics

Function:

_frame_luma_stats()

Processing:

  1. crop the frame to the shelf ROI
  2. convert the patch to grayscale
  3. compute the mean luminance and the luminance standard deviation

These two values drive the readiness gate.

Mean luminance

Represents overall brightness.

Standard deviation of luminance

Represents contrast / visual richness.

If the standard deviation is too low, the scene may be uniformly gray or washed out even if bright enough.


Stage 4 — Illumination Readiness Gate

Function:

_scene_illumination_ready()

This is one of the most important parts of the module.

The function evaluates the most recent RTSP frames and only accepts the scene if it is:

  • bright enough
  • not too flat
  • stable over time

Inputs

  • frames
  • roi
  • min_mean_luma
  • max_delta_mean
  • min_std_luma

Variables and Roles

illum_check_frames = 4

Number of recent frames kept for decision making.

Role:

  • readiness is not decided from a single frame
  • it is decided from a short recent history

min_mean_luma = 80.0

Minimum allowed average brightness.

If latest shelf ROI brightness is below this, the frame is rejected as:

too_dark

min_std_luma = 18.0

Minimum allowed grayscale standard deviation.

If lower, the frame is rejected as:

too_flat

Meaning: the scene may be bright but lacks useful texture or contrast.

max_delta_mean = 3.0

Maximum allowed brightness jump between consecutive recent frames.

If any recent luminance difference exceeds this, the frame is rejected as:

not_stable_yet

Meaning:

  • the light has not yet stabilized
  • exposure or scene brightness is still changing


Illumination Readiness Logic

flowchart TD

FRAMES["Recent RTSP frames"]
ROI["Crop shelf ROI"]
GRAY["Convert ROI to grayscale"]
MEANSTD["Compute mean and std"]
BRIGHT["Check mean >= min_mean_luma"]
TEXTURE["Check std >= min_std_luma"]
STABLE["Check recent deltas <= max_delta_mean"]

FRAMES --> ROI
ROI --> GRAY
GRAY --> MEANSTD
MEANSTD --> BRIGHT
BRIGHT --> TEXTURE
TEXTURE --> STABLE
STABLE --> READY["Frame accepted"]
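The gate logic above can be sketched in pure Python. The function below is an illustration that operates on precomputed per-frame (mean, std) statistics; the module's actual implementation works on frames and a ROI, and its signature may differ:

```python
def scene_illumination_ready(stats, min_mean_luma=80.0,
                             min_std_luma=18.0, max_delta_mean=3.0):
    """Decide whether the shelf scene is ready for capture.

    stats: list of (mean_luma, std_luma) tuples for the most recent
    frames, oldest first. Returns (ready, reason).
    """
    if not stats:
        return False, "no_frames"
    mean, std = stats[-1]
    if mean < min_mean_luma:
        return False, "too_dark"        # not bright enough
    if std < min_std_luma:
        return False, "too_flat"        # bright but texture-less
    # brightness must not jump between consecutive recent frames
    for (m0, _), (m1, _) in zip(stats, stats[1:]):
        if abs(m1 - m0) > max_delta_mean:
            return False, "not_stable_yet"
    return True, "ready"
```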

Debug Images for Capture Readiness

Two debug images are written.

04_capture_ready_roi.png

Shows the shelf ROI used for readiness gating.

Purpose: verify that the readiness window covers the correct shelf region.

05_capture_ready_frame.png

Shows the accepted frame and overlays the ROI plus metrics.

Purpose: confirm the exact frame accepted by the illumination gate.


RTSP Robustness Strategy

Function:

read_good_frame_with_reconnect()

contains several defensive mechanisms.

Variables and Roles

warmup_frames = 10

Number of initial frames discarded after opening the stream.

Role:

  • ignore unstable startup frames
  • allow the decoder and stream to settle

max_consecutive_bad

Maximum allowed consecutive invalid reads before giving up the current attempt.

Role:

  • tolerate brief hiccups
  • stop trying if the stream is effectively unusable

open_retries

Number of full stream-open attempts.

reconnects

Number of reconnect cycles after an initial successful open but later failed readiness acquisition.

overall_timeout_s

Global time budget for capture acquisition.

Role: prevents infinite waiting if camera or light conditions never become acceptable.


Snapshot Writing

Function:

take_snapshot()

Responsibilities:

  • choose the RTSP URL based on the camera index
  • call read_good_frame_with_reconnect()
  • write the accepted frame to disk

Output file:

snapshot.jpg

This is the raw accepted snapshot before rectification.


Expected Point Extraction from Config

Function:

_expected_from_cfg()

This function maps the manually saved kiosk coordinates into the tracker’s internal format.

Mapping:

| Config key | Tracker key |
| --- | --- |
| top_left | TL |
| top_right | TR |
| bottom_right | BR |
| bottom_left | BL |

This is where the manual calibration page and the automated runtime meet.

If any point is malformed, the function raises an error.


Template Initialization

Templates are reference crops for each corner marker:

  • TL.png
  • TR.png
  • BL.png
  • BR.png

Stored under:

CONF_DIR / "pince_shelf" / "templates"

init_templates_from_snapshot()

This function builds templates by cropping around the manually expected positions in a known good snapshot.

Variables:

size_px

Template crop size.

The function computes:

half = size_px // 2

and crops a square patch around each expected point.

Role:

  • captures local marker appearance
  • serves as the reference for later template matching

This mode is only used for manual initialization or refresh.


Image Preprocessing for Tracking

Two helper functions prepare images for robust marker tracking.


_clahe_gray()

Pipeline:

  1. convert BGR to grayscale
  2. apply CLAHE
  3. apply Gaussian blur

Step 1 — grayscale conversion

Removes color dependency and focuses on intensity structure.

Step 2 — CLAHE

CLAHE = Contrast Limited Adaptive Histogram Equalization.

Role:

  • improve local contrast
  • reduce sensitivity to uneven illumination
  • make marker edges and patterns more visible

Step 3 — Gaussian blur

Role:

  • suppress small noise
  • improve template matching stability
  • make gradient computation less noisy

Output: a processed grayscale image


_gradmag()

Pipeline:

  1. Sobel X gradient
  2. Sobel Y gradient
  3. magnitude computation
  4. normalization to 0–255

Role:

  • represent edge structure instead of raw intensity
  • make template matching more robust to brightness variation

This is a key design choice.

Templates are matched against gradient magnitude images, not raw grayscale images.

That means the tracker relies more on shape and contour structure than on absolute brightness.


ROI Geometry Helpers

_roi_rect()

Builds a search rectangle centered around an expected point.

Inputs:

  • center (cx, cy)
  • radius r
  • image width and height

Output: a clamped rectangle (x0, y0, x1, y1)

Role: defines the local search region for each corner tracker
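An illustrative implementation of the clamped search rectangle (the name and exact rounding are assumptions, not the module's verbatim code):

```python
def roi_rect(cx, cy, r, img_w, img_h):
    """Clamped square search window of half-size r centered on (cx, cy)."""
    x0 = max(0, int(cx - r))
    y0 = max(0, int(cy - r))
    x1 = min(img_w, int(cx + r))   # clamp so the ROI never leaves the image
    y1 = min(img_h, int(cy + r))
    return x0, y0, x1, y1
```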


Dark Dot Refinement

Function:

refine_by_dot()

After template matching, the script refines the corner position by searching for the small dark dot inside the white marker.

This stage improves precision beyond the raw template-match center.

Processing Steps

  1. crop a small window around the matched point
  2. blur it slightly
  3. threshold with binary inverse
  4. perform morphological opening
  5. connected-components analysis
  6. score candidate blobs
  7. return best centroid

Variables and Roles

win

Half-size of local refinement window.

Usually comes from:

DOT_WIN_PX

thr

Darkness threshold used in:

cv2.THRESH_BINARY_INV

Pixels darker than threshold become foreground candidates.

Usually comes from:

DOT_DARK_THR

Candidate Filtering

Connected components are filtered by area:

  • too small: likely noise
  • too large: likely not the marker dot

Current accepted area range:

6 <= area <= 260

Also rejected if touching the ROI boundary.

Role: avoid partial or unreliable blobs.


Candidate Scoring

Each blob receives a score:

score = area - 0.6 * distance_from_window_center

Interpretation:

  • larger blobs are better
  • blobs closer to the window center are better

This favors the most plausible dark marker center.
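The scoring rule can be illustrated as follows. The blob list format and window convention are simplified assumptions; in the real module the candidates come from the connected-components analysis:

```python
import math

def score_blobs(blobs, win):
    """Pick the most plausible dark-dot blob inside a (2*win, 2*win) window.

    blobs: list of (area, cx, cy) candidates, already area-filtered.
    win: half-size of the window, so (win, win) is its center.
    Returns the centroid (cx, cy) of the best-scoring blob, or None.
    """
    best, best_score = None, float("-inf")
    for area, cx, cy in blobs:
        dist = math.hypot(cx - win, cy - win)   # distance from window center
        score = area - 0.6 * dist               # bigger and more central wins
        if score > best_score:
            best, best_score = (cx, cy), score
    return best
```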


Corner Tracking

Function:

track_corner()

This is the core local tracker for one corner.

Pipeline

  1. preprocess full image to grayscale and gradient magnitude
  2. build ROI around expected point
  3. run template matching inside ROI
  4. reject if correlation below threshold
  5. compute matched center
  6. refine center by dark-dot search
  7. return (x, y) and score

Why It Is Robust

The method is robust because it combines:

  • manual prior position from kiosk setup
  • local ROI restriction
  • gradient-based template matching
  • dot-based sub-local refinement

This is much more stable than trying to detect markers globally in the full image.


Tracking All Corners

Function:

track_all_corners()

This runs track_corner() for:

  • TL
  • TR
  • BL
  • BR

It also writes useful debug images.


Debug Image 00_roi_overlay.png

Shows the original frame with the four search ROIs.

Purpose:

  • verify whether the expected search windows are centered correctly
  • useful when the camera has physically drifted too far


Debug Image 01_grad_roi.png

Shows the gradient magnitude image used for matching.

Purpose: - inspect whether edge information is usable - diagnose low-contrast or poorly illuminated scenes


Debug Image 02_tracked_points.png

Shows the tracked points and their correlation scores overlaid on the original image.

Purpose:

  • inspect final tracker decisions
  • verify corner order and accuracy


Quadrilateral Validation

Function:

_is_good_quad()

After all four corners are found, the script validates the resulting shape.

Checks:

  1. contour area must exceed MIN_QUAD_AREA
  2. hull area must be nonzero
  3. contour-to-hull area ratio must be at least 0.70

Meaning:

  • the quadrilateral must be large enough
  • the corners must not collapse
  • the shape must be reasonably convex

This prevents invalid warps from bad tracking results.
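The validation can be sketched in pure Python. The real module uses OpenCV's contour-area and convex-hull functions; the shoelace formula and monotone-chain hull below are stand-ins for illustration:

```python
def polygon_area(pts):
    """Shoelace area of a polygon given as an ordered list of (x, y)."""
    s = 0.0
    for i in range(len(pts)):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % len(pts)]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0

def convex_hull(pts):
    """Andrew's monotone chain convex hull (returns hull vertices in order)."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def is_good_quad(quad, min_area=12000.0, min_ratio=0.70):
    """quad: four (x, y) tuples in contour order (e.g. TL, TR, BR, BL)."""
    area = polygon_area(quad)
    hull_area = polygon_area(convex_hull(list(quad)))
    if area < min_area or hull_area <= 0:
        return False
    return area / hull_area >= min_ratio   # reject strongly concave shapes
```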


Temporal Smoothing

Function:

smooth_with_history()

This performs persisted exponential moving average smoothing between runs.

It uses file:

prev_points.json

stored in:

corner_json_history

Purpose

The camera and tracker may show small frame-to-frame or run-to-run variations.

This smoothing reduces jitter in the detected corner positions across repeated pipeline runs.


Variables and Roles

alpha

EMA history weight.

Used as:

smoothed = alpha * previous + (1 - alpha) * current

Interpretation:

  • higher alpha = stronger inertia
  • lower alpha = more responsive to new measurements

This value comes from:

cfg.tracking.smooth_alpha

jump_threshold_px

Maximum tolerated distance from previous smoothed point before history is reset.

Role:

  • detect a true camera repositioning or large geometry jump
  • avoid “smoothing through” a major viewpoint change

If a new point jumps more than this threshold, smoothing history is discarded and replaced with current points.

This value comes from:

cfg.tracking.jump_threshold_px
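A compact sketch of EMA smoothing with jump reset. The alpha and threshold defaults here are placeholders for the cfg.tracking values, and the real function also persists its state via prev_points.json:

```python
import math

def smooth_with_history(current, previous, alpha=0.7, jump_threshold_px=60.0):
    """Blend current corner points with the previous smoothed points (EMA).

    current, previous: dicts like {"TL": (x, y), ...}. If any corner moved
    farther than jump_threshold_px, the history is considered stale and the
    current points are adopted as-is.
    """
    if previous is None:
        return dict(current)
    for key, (cx, cy) in current.items():
        px, py = previous[key]
        if math.hypot(cx - px, cy - py) > jump_threshold_px:
            return dict(current)           # large jump: reset history
    return {
        key: (alpha * previous[key][0] + (1 - alpha) * cx,
              alpha * previous[key][1] + (1 - alpha) * cy)
        for key, (cx, cy) in current.items()
    }
```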

Geometric Stabilization

Function:

geometric_stabilize()

This stage improves quadrilateral consistency without erasing the detected perspective.

It computes several midpoint and edge-based adjustments to make the quad more stable and symmetric.

What It Does

It progressively enforces consistency between:

  • diagonal midpoints
  • top and bottom edge direction
  • left and right edge direction

Then blends the enforced geometry back toward the original detected points.


Variable beta

Controls how strongly the enforced geometry affects the final points.

Interpretation:

  • beta = 0 → no geometric stabilization
  • higher beta → more geometric regularization

This value comes from:

cfg.tracking.geom_beta

Role:

  • reduce noisy skew
  • keep the shelf geometry stable over time
  • still preserve the true scene shape


Perspective Rectification

Function:

rectify_by_markers()

This stage converts the oblique shelf image into a normalized front-facing view.

Steps

  1. build source point array from tracked corners
  2. ensure order is TL, TR, BL, BR
  3. estimate source widths and heights
  4. choose output size
  5. build destination rectangle with margin
  6. compute homography
  7. warp image

_order_points_tl_tr_bl_br()

This helper ensures corner order is correct using point sums and coordinate differences.

Correct point order is essential for valid perspective transformation.
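The sum/difference ordering trick can be sketched as follows (an illustrative stand-in for the helper, not its verbatim code):

```python
def order_points_tl_tr_bl_br(pts):
    """Order four (x, y) points as TL, TR, BL, BR.

    TL has the smallest x+y sum, BR the largest; TR has the smallest
    y-x difference, BL the largest.
    """
    by_sum = sorted(pts, key=lambda p: p[0] + p[1])
    tl, br = by_sum[0], by_sum[-1]
    by_diff = sorted(pts, key=lambda p: p[1] - p[0])
    tr, bl = by_diff[0], by_diff[-1]
    return tl, tr, bl, br
```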


Output Size Logic

The output width and height are inferred from source geometry if not explicitly overridden.

Width

Computed from the longer of:

  • top edge length
  • bottom edge length

Height

Computed from the longer of:

  • left edge length
  • right edge length

This makes the destination image scale adapt to the detected shelf geometry.
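The size inference can be illustrated as below; the function name and rounding details are assumptions about the module's internals:

```python
import math

def infer_output_size(tl, tr, bl, br, out_width=None, out_height=None):
    """Infer the rectified output size from the measured source geometry.

    Width is the longer of the top and bottom edges, height the longer
    of the left and right edges; explicit overrides win if provided.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    width = out_width or int(round(max(dist(tl, tr), dist(bl, br))))
    height = out_height or int(round(max(dist(tl, bl), dist(tr, br))))
    return width, height
```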


Destination Margin

OUT_MARGIN_PX is used to inset the target corner positions.

This avoids a border-touching warp and gives a cleaner usable image.


Warp

The transformation matrix is computed with:

cv2.getPerspectiveTransform()

and applied with:

cv2.warpPerspective()

Result: shelf_rectified.jpg


Debug Image 03_warped.png

Shows the final rectified shelf image.

Purpose:

  • verify the output geometry
  • inspect whether shelf normalization is correct before the cropping stage


Full Geometry Processing Flow

flowchart TD

RAW["Raw accepted snapshot"]
GRAY["CLAHE grayscale"]
GRAD["Gradient magnitude"]
ROI["Local search ROI around each expected point"]
TM["Template match"]
DOT["Dark dot refinement"]
QUAD["Quad validation"]
EMA["History smoothing"]
GEO["Geometric stabilization"]
WARP["Perspective warp"]

RAW --> GRAY
GRAY --> GRAD
GRAD --> ROI
ROI --> TM
TM --> DOT
DOT --> QUAD
QUAD --> EMA
EMA --> GEO
GEO --> WARP

Debug Artifacts Summary

| File | Meaning |
| --- | --- |
| 00_roi_overlay.png | search windows for each corner |
| 01_grad_roi.png | gradient image used for template matching |
| 02_tracked_points.png | tracked corners and match scores |
| 03_warped.png | final rectified shelf |
| 04_capture_ready_roi.png | shelf ROI used for readiness gating |
| 05_capture_ready_frame.png | accepted frame with readiness metrics |

These debug artifacts are extremely useful for diagnosing:

  • bad lighting
  • wrong expected points
  • camera drift
  • weak template matches
  • poor rectification


run() Function

This is the main programmatic entry point used by the vision pipeline.

Responsibilities

  1. define file paths
  2. capture snapshot
  3. read saved image
  4. load settings and expected points
  5. optionally initialize templates
  6. load templates
  7. track corners
  8. smooth points
  9. stabilize geometry
  10. rectify image
  11. write output files and JSON
  12. return result metadata

Output Files Used in run()

| Variable | Meaning |
| --- | --- |
| shelf | raw accepted snapshot path |
| inp | input image path for later processing |
| tmp_dir | temporary/debug directory |
| out | rectified output image path |
| debug_dir | debug image folder |
| json_file | final corners JSON |
| json_dir | parent folder for history JSON |

Returned Result Dictionary

The function returns a structured dictionary containing:

  • step label
  • detected points
  • match scores
  • output paths
  • debug folder path
  • snapshot file path

This integrates cleanly with stock_runtime.py.


CLI Mode

The module also has a small CLI for manual testing.

Options:

  • --cam
  • --init-templates

This is useful for:

  • calibration
  • debugging
  • template refresh
  • standalone validation


Relationship with the Kiosk ROI Page

The manual ROI kiosk page is a critical upstream dependency.

What the kiosk page does

  • shows live RTSP stream
  • freezes latest frame on user request
  • stores 4 manually tapped points
  • writes them atomically to pince_shelf.ini

Why this matters

The runtime tracker assumes that these expected points are approximately correct.

They are not the final runtime result.
They are the starting priors around which local tracking happens.

This hybrid model is powerful:

  • human sets approximate geometry once
  • runtime performs precise local re-detection every run

That combines:

  • manual reliability
  • automated repeatability


System-Level Interaction

sequenceDiagram

participant User
participant Kiosk
participant Ini
participant Runtime
participant Camera
participant Tracker

User->>Kiosk: tap 4 shelf corners
Kiosk->>Ini: save [main_points]

Runtime->>Camera: read RTSP frames
Camera-->>Runtime: recent frames

Runtime->>Runtime: illumination readiness gate
Runtime->>Tracker: expected points from ini
Tracker->>Tracker: template match + dot refine
Tracker-->>Runtime: tracked corners

Runtime->>Runtime: smooth + stabilize + rectify
Runtime-->>Runtime: shelf_rectified.jpg

Reliability Strategy

This module is robust because it does not rely on a single fragile mechanism.

It combines:

  • manual initialization of expected geometry
  • shelf-only readiness gating
  • retry/reconnect RTSP acquisition
  • gradient-based template matching
  • marker dot refinement
  • quadrilateral validation
  • persisted smoothing
  • geometric regularization
  • debug artifact generation

This layered design is why it is suitable for unattended cellar operation.


Failure Modes

Typical failure modes include:

| Failure | Meaning |
| --- | --- |
| missing RTSP URL | camera config problem |
| cannot open stream | network/camera issue |
| scene never becomes ready | lighting or exposure instability |
| missing templates | initialization incomplete |
| low correlation | marker not found inside the search ROI |
| bad quad | detection geometrically invalid |
| snapshot save failure | filesystem issue |
| image read failure | corrupted or missing snapshot |

The module raises explicit errors in these cases so the pipeline orchestrator can log them and fail clearly.


Summary

take_clean_snapshot.py is the image-preparation and geometric normalization core of the Wine Platform vision system.

Its responsibilities include:

  • acquiring a visually ready RTSP frame
  • evaluating shelf illumination stability
  • using manually configured corner priors from the kiosk page
  • tracking the real corner marker positions
  • smoothing and stabilizing geometry
  • producing a normalized rectified shelf image

This module transforms a noisy real-world camera stream into a stable, machine-usable canonical shelf image, which is the foundation for all downstream bin cropping and bottle detection.