take_clean_snapshot.py — Snapshot Acquisition, Corner Tracking, and Shelf Rectification¶
Overview¶
take_clean_snapshot.py is the image acquisition and geometric normalization entry point of the Wine Platform vision lane.
Its role is not limited to taking a picture. It performs a complete preparation pipeline:
- connect to RTSP camera
- wait until the visible shelf scene is bright enough and stable enough
- save an accepted snapshot
- detect the four corner markers of the shelf
- smooth and stabilize the detected geometry
- rectify the shelf into a normalized front-facing image
- write debug images and corner JSON history
This module is one of the most important computer-vision foundations in the system, because all downstream steps depend on the quality and geometric consistency of the rectified shelf image.
It is used by the scheduled pipeline orchestrator:
stock_runtime.py
and feeds the next stages such as:
- crop_with_json.py
- onnx_batch.py
- DB_compare.py
Role in the Vision Pipeline¶
flowchart LR
subgraph Runtime
SR["stock_runtime.py"]
end
subgraph SnapshotStage
TCS["take_clean_snapshot.py"]
RTSP["RTSP frame acquisition"]
READY["illumination readiness gate"]
TRACK["corner marker tracking"]
SMOOTH["temporal + geometric stabilization"]
RECT["perspective rectification"]
end
subgraph NextStages
CROP["crop_with_json.py"]
ONNX["onnx_batch.py"]
DBC["DB_compare.py"]
end
SR --> TCS
TCS --> RTSP
RTSP --> READY
READY --> TRACK
TRACK --> SMOOTH
SMOOTH --> RECT
RECT --> CROP
CROP --> ONNX
ONNX --> DBC
Manual Main Points Source¶
A critical design detail of this module is that the expected shelf corner positions are not discovered automatically from scratch on every run.
Instead, they are provided by the manual calibration workflow through the kiosk page:
apps/winecellar/frontend/qt_kiosk/pages/pince_roi.py
The ROI calibration page lets the user:
- open the live RTSP stream
- freeze the last visible frame
- tap four shelf corner points manually
- save them into:
pince_shelf.ini
under:
[main_points]
top_left
top_right
bottom_right
bottom_left
This is the upstream source of the runtime tracker’s expected positions.
Manual ROI Calibration Architecture¶
flowchart TD
LIVE["Live RTSP stream on kiosk page"]
FREEZE["Freeze last frame"]
TAP["User taps 4 points in order"]
SAVE["Write [main_points] to pince_shelf.ini"]
LIVE --> FREEZE
FREEZE --> TAP
TAP --> SAVE
Required click order on the kiosk page:
- Top-Left
- Top-Right
- Bottom-Right
- Bottom-Left
These saved coordinates are later mapped by take_clean_snapshot.py into internal tracker keys:
- TL
- TR
- BR
- BL
Purpose of the Module¶
This module solves three core vision problems:
1. Capture a good frame¶
The script does not blindly accept the first RTSP frame.
It waits until the shelf region is:
- bright enough
- textured enough
- stable enough over recent frames
This improves robustness when the light has just been switched on.
2. Track the four shelf markers¶
The script tracks four corner markers using:
- expected positions from manual setup
- template matching on gradient magnitude
- local dark-dot centroid refinement
3. Normalize shelf geometry¶
The script transforms the raw oblique camera view into a rectified shelf image using perspective warping.
That rectified image is the canonical input for cropping and bottle detection.
Main Outputs¶
The module writes:
| Output | Purpose |
|---|---|
| snapshot.jpg | accepted raw frame |
| shelf_rectified.jpg | perspective-normalized shelf image |
| corners.json | detected corner positions |
| prev_points.json | smoothed history between runs |
| debug images | inspection of every major stage |
Debug images are written into:
snapshot/tmp/shelf_debug
High-Level Execution Flow¶
flowchart TD
START["run() start"]
CAPTURE["take_snapshot()"]
LOAD["load snapshot image"]
EXPECTED["load expected corner points from cfg.main_points"]
TEMPL["load corner templates"]
TRACK["track_all_corners()"]
SMOOTH["smooth_with_history()"]
GEOM["geometric_stabilize()"]
RECT["rectify_by_markers()"]
WRITE1["write shelf_rectified.jpg"]
WRITE2["write corners.json"]
RETURN["return runtime result dict"]
START --> CAPTURE
CAPTURE --> LOAD
LOAD --> EXPECTED
EXPECTED --> TEMPL
TEMPL --> TRACK
TRACK --> SMOOTH
SMOOTH --> GEOM
GEOM --> RECT
RECT --> WRITE1
WRITE1 --> WRITE2
WRITE2 --> RETURN
Core Data Model¶
CornerPts¶
CornerPts = Dict[str, Tuple[float, float]]
Corner coordinates are represented as a dictionary keyed by:
- TL
- TR
- BL
- BR
Each value is an (x, y) image coordinate.
Settings Dataclass¶
The module contains a local Settings dataclass with tracking and rectification parameters.
Variables and Roles¶
SEARCH_RADIUS_PX = 260¶
Defines the half-size of the search window around each expected point.
Role:
- tracker does not search the full image
- tracker searches inside a bounded ROI centered around the manually configured expected point
Effect:
- larger value tolerates more camera drift
- smaller value reduces false matches and improves speed
MIN_CORR = 0.35¶
Minimum acceptable normalized correlation score from cv2.matchTemplate.
Role:
- rejects weak template matches
- protects against false corner locking
Effect:
- lower threshold is more tolerant but more risky
- higher threshold is stricter but may fail under difficult lighting
TEMPLATE_SIZE_PX = 96¶
Square crop size used when initializing templates from a reference snapshot.
Role:
- determines how much visual context is stored for each corner marker template
Effect:
- too small: not enough context
- too large: too much background variation
DOT_WIN_PX = 28¶
Half-window size for dark-dot refinement around the matched template center.
Role:
- searches a small local region for the marker’s inner dark dot
DOT_DARK_THR = 95¶
Threshold used to isolate dark pixels for black-dot refinement.
Role:
- pixels darker than this contribute to dot candidate extraction
MIN_QUAD_AREA = 12000.0¶
Minimum allowed area of the four-point quadrilateral.
Role:
- rejects degenerate or tiny detections
- protects against pathological matches
OUT_MARGIN_PX = 20¶
Margin inserted between the warped image border and the destination corner positions.
Role:
- avoids placing the shelf corners exactly on the outer image border
OUT_WIDTH and OUT_HEIGHT¶
Optional overrides for final rectified size.
If None, output size is inferred from the measured source geometry.
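The parameters above can be collected in one place as follows. This is a minimal sketch that only restates the documented names and defaults; the field types and the exact layout of the real Settings dataclass are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    """Tracking and rectification parameters with the documented defaults."""
    SEARCH_RADIUS_PX: int = 260       # half-size of the search window per corner
    MIN_CORR: float = 0.35            # minimum template-match correlation
    TEMPLATE_SIZE_PX: int = 96        # square template crop size
    DOT_WIN_PX: int = 28              # half-window for dark-dot refinement
    DOT_DARK_THR: int = 95            # darkness threshold for dot extraction
    MIN_QUAD_AREA: float = 12000.0    # minimum area of the tracked quadrilateral
    OUT_MARGIN_PX: int = 20           # inset of the destination corners
    OUT_WIDTH: Optional[int] = None   # None: infer width from source geometry
    OUT_HEIGHT: Optional[int] = None  # None: infer height from source geometry


settings = Settings()
```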
Detailed Image Preparation Pipeline¶
The script performs several image transformations before and during tracking.
Stage 1 — Raw RTSP Frame Acquisition¶
The function:
read_good_frame_with_reconnect()
reads frames from the RTSP camera.
This stage includes:
- stream opening
- warmup reads
- retry logic
- reconnect logic
- illumination readiness evaluation
This stage is essential because the camera view immediately after light switch-on may be:
- too dark
- still changing in brightness
- temporarily unstable
Stage 2 — Shelf ROI Construction for Readiness Gating¶
Before deciding whether a frame is acceptable, the module builds a shelf-focused ROI using the manually defined expected corner points.
Function:
_shelf_roi_from_expected()
Input:
- cfg.main_points
- current frame shape
- pad_px
Output:
- (x0, y0, x1, y1) bounding box around the shelf area
Role:
- readiness is evaluated only where the shelf matters
- avoids influence from unrelated dark or bright regions outside the shelf
pad_px = 40¶
Adds a safety border around the bounding box built from the four expected corners.
This makes the readiness region slightly larger than the exact marker box.
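A padded, clamped bounding box of this kind can be sketched as below. The function name, the argument order, and the integer truncation are assumptions made for illustration; only the inputs, the (x0, y0, x1, y1) output, and the pad_px default come from the description above.

```python
from typing import Dict, Tuple

CornerPts = Dict[str, Tuple[float, float]]


def shelf_roi_from_expected(expected: CornerPts, frame_w: int, frame_h: int,
                            pad_px: int = 40) -> Tuple[int, int, int, int]:
    """Bounding box of the expected corners, grown by pad_px and clamped to the frame."""
    xs = [p[0] for p in expected.values()]
    ys = [p[1] for p in expected.values()]
    x0 = max(0, int(min(xs)) - pad_px)
    y0 = max(0, int(min(ys)) - pad_px)
    x1 = min(frame_w, int(max(xs)) + pad_px)
    y1 = min(frame_h, int(max(ys)) + pad_px)
    return x0, y0, x1, y1


roi = shelf_roi_from_expected(
    {"TL": (100, 50), "TR": (900, 60), "BR": (920, 700), "BL": (90, 690)},
    frame_w=1000, frame_h=720,
)
```

In this example the bottom edge is clamped to the frame height because the padded box would otherwise extend past the image.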
Stage 3 — Luminance Statistics¶
Function:
_frame_luma_stats()
Processing:
1. crop the frame to the shelf ROI
2. convert the patch to grayscale
3. compute the mean luminance and the luminance standard deviation
These two values drive the readiness gate.
Mean luminance¶
Represents overall brightness.
Standard deviation of luminance¶
Represents contrast / visual richness.
If the standard deviation is too low, the scene may be uniformly gray or washed out even if bright enough.
Stage 4 — Illumination Readiness Gate¶
Function:
_scene_illumination_ready()
This is one of the most important parts of the module.
The function evaluates the most recent RTSP frames and only accepts the scene if it is:
- bright enough
- not too flat
- stable over time
Inputs¶
- frames
- roi
- min_mean_luma
- max_delta_mean
- min_std_luma
Variables and Roles¶
illum_check_frames = 4¶
Number of recent frames kept for decision making.
Role:
- readiness is not decided from a single frame
- it is decided from a short recent history
min_mean_luma = 80.0¶
Minimum allowed average brightness.
If latest shelf ROI brightness is below this, the frame is rejected as:
too_dark
min_std_luma = 18.0¶
Minimum allowed grayscale standard deviation.
If lower, the frame is rejected as:
too_flat
Meaning:
- the scene may be bright but lacks useful texture or contrast
max_delta_mean = 3.0¶
Maximum allowed brightness jump between consecutive recent frames.
If any recent luminance difference exceeds this, the frame is rejected as:
not_stable_yet
Meaning:
- the light has not yet stabilized
- exposure or scene brightness is still changing
Illumination Readiness Logic¶
flowchart TD
FRAMES["Recent RTSP frames"]
ROI["Crop shelf ROI"]
GRAY["Convert ROI to grayscale"]
MEANSTD["Compute mean and std"]
BRIGHT["Check mean >= min_mean_luma"]
TEXTURE["Check std >= min_std_luma"]
STABLE["Check recent deltas <= max_delta_mean"]
FRAMES --> ROI
ROI --> GRAY
GRAY --> MEANSTD
MEANSTD --> BRIGHT
BRIGHT --> TEXTURE
TEXTURE --> STABLE
STABLE --> READY["Frame accepted"]
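The decision logic in the flowchart can be sketched as a pure function over per-frame ROI statistics. This is an illustration under assumptions: the function takes precomputed (mean, std) pairs rather than frames, and the labels "not_enough_frames" and "ready" are hypothetical. The thresholds and the three rejection reasons are the documented ones.

```python
from typing import List, Tuple


def scene_illumination_ready(stats: List[Tuple[float, float]],
                             min_mean_luma: float = 80.0,
                             min_std_luma: float = 18.0,
                             max_delta_mean: float = 3.0,
                             check_frames: int = 4) -> Tuple[bool, str]:
    """stats holds (mean, std) of the shelf ROI per frame, oldest first."""
    recent = stats[-check_frames:]
    if len(recent) < check_frames:
        return False, "not_enough_frames"   # hypothetical label
    mean, std = recent[-1]                  # brightness and texture of the latest frame
    if mean < min_mean_luma:
        return False, "too_dark"
    if std < min_std_luma:
        return False, "too_flat"
    # Brightness jump between each pair of consecutive recent frames.
    deltas = [abs(b[0] - a[0]) for a, b in zip(recent, recent[1:])]
    if deltas and max(deltas) > max_delta_mean:
        return False, "not_stable_yet"
    return True, "ready"                    # hypothetical label
```

A light that is still ramping up, for example means of 90, 95, 100, 101, fails the stability check even though every frame is bright enough.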
Debug Images for Capture Readiness¶
Two debug images are written.
04_capture_ready_roi.png¶
Shows the shelf ROI used for readiness gating.
Purpose:
- verify that the readiness window covers the correct shelf region
05_capture_ready_frame.png¶
Shows the accepted frame and overlays the ROI plus metrics.
Purpose:
- confirm the exact frame accepted by the illumination gate
RTSP Robustness Strategy¶
Function:
read_good_frame_with_reconnect()
contains several robustness mechanisms.
Variables and Roles¶
warmup_frames = 10¶
Number of initial frames discarded after opening the stream.
Role:
- ignore unstable startup frames
- allow decoder and stream to settle
max_consecutive_bad¶
Maximum allowed consecutive invalid reads before giving up the current attempt.
Role:
- tolerate brief hiccups
- stop trying if the stream is effectively unusable
open_retries¶
Number of full stream-open attempts.
reconnects¶
Number of reconnect cycles after an initial successful open but later failed readiness acquisition.
overall_timeout_s¶
Global time budget for capture acquisition.
Role:
- prevents infinite waiting if camera or light conditions never become acceptable
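How these limits interact can be sketched with a simplified loop. This is not the module's implementation: the stream is abstracted behind hypothetical open_stream and is_ready callables, open_retries and reconnects are merged into a single attempts budget, and every default except warmup_frames is an assumed value.

```python
import time
from typing import Any, Callable, Optional

ReadFn = Callable[[], Optional[Any]]   # returns a frame, or None on a bad read


def read_with_reconnect(open_stream: Callable[[], Optional[ReadFn]],
                        is_ready: Callable[[Any], bool],
                        warmup_frames: int = 10,
                        max_consecutive_bad: int = 5,
                        attempts: int = 3,
                        overall_timeout_s: float = 30.0) -> Any:
    deadline = time.monotonic() + overall_timeout_s
    for _ in range(attempts):
        read = open_stream()
        if read is None:
            continue                       # open failed, use the next attempt
        for _ in range(warmup_frames):     # discard unstable startup frames
            read()
        bad = 0
        while time.monotonic() < deadline:
            frame = read()
            if frame is None:
                bad += 1
                if bad > max_consecutive_bad:
                    break                  # stream unusable, reconnect
                continue
            bad = 0
            if is_ready(frame):
                return frame               # accepted by the readiness gate
        if time.monotonic() >= deadline:
            break                          # global time budget exhausted
    raise RuntimeError("no ready frame within the attempt/time budget")
```

Raising an explicit error at the end mirrors the documented behaviour that the orchestrator should see a clear failure rather than wait forever.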
Snapshot Writing¶
Function:
take_snapshot()
Responsibilities:
- choose RTSP URL based on camera index
- call read_good_frame_with_reconnect()
- write the accepted frame to disk
Output file:
snapshot.jpg
This is the raw accepted snapshot before rectification.
Expected Point Extraction from Config¶
Function:
_expected_from_cfg()
This function maps the manually saved kiosk coordinates into the tracker’s internal format.
Mapping:
| Config key | Tracker key |
|---|---|
| top_left | TL |
| top_right | TR |
| bottom_right | BR |
| bottom_left | BL |
This is where the manual calibration page and the automated runtime meet.
If any point is malformed, the function raises an error.
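The mapping can be illustrated with the standard configparser module. This sketch assumes each point is stored as an "x,y" string and parses the INI text directly; the real function reads cfg.main_points, whose exact value format is not shown here, and the function name below is hypothetical.

```python
import configparser
from typing import Dict, Tuple

CornerPts = Dict[str, Tuple[float, float]]

KEY_MAP = {"top_left": "TL", "top_right": "TR",
           "bottom_right": "BR", "bottom_left": "BL"}


def expected_from_ini_text(ini_text: str) -> CornerPts:
    """Map [main_points] entries to tracker keys, raising on malformed values."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["main_points"]
    points: CornerPts = {}
    for cfg_key, tracker_key in KEY_MAP.items():
        raw = section.get(cfg_key, "")
        parts = [p.strip() for p in raw.split(",")]   # assumed "x,y" format
        if len(parts) != 2:
            raise ValueError(f"malformed main_points.{cfg_key}: {raw!r}")
        try:
            points[tracker_key] = (float(parts[0]), float(parts[1]))
        except ValueError as exc:
            raise ValueError(f"malformed main_points.{cfg_key}: {raw!r}") from exc
    return points
```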
Template Initialization¶
Templates are reference crops for each corner marker:
- TL.png
- TR.png
- BL.png
- BR.png
Stored under:
CONF_DIR / "pince_shelf" / "templates"
init_templates_from_snapshot()¶
This function builds templates by cropping around the manually expected positions in a known good snapshot.
Variables:
size_px¶
Template crop size.
The function computes:
half = size_px // 2
and crops a square patch around each expected point.
Role:
- captures local marker appearance
- serves as the reference for later template matching
This mode is only used for manual initialization or refresh.
Image Preprocessing for Tracking¶
Two helper functions prepare images for robust marker tracking.
_clahe_gray()¶
Pipeline:
1. convert BGR to grayscale
2. apply CLAHE
3. apply Gaussian blur
Step 1 — grayscale conversion¶
Removes color dependency and focuses on intensity structure.
Step 2 — CLAHE¶
CLAHE = Contrast Limited Adaptive Histogram Equalization.
Role:
- improve local contrast
- reduce sensitivity to uneven illumination
- make marker edges and patterns more visible
Step 3 — Gaussian blur¶
Role:
- suppress small noise
- improve template matching stability
- make gradient computation less noisy
Output:
- processed grayscale image
_gradmag()¶
Pipeline:
1. Sobel X gradient
2. Sobel Y gradient
3. magnitude computation
4. normalization to 0–255
Role:
- represent edge structure instead of raw intensity
- make template matching more robust to brightness variation
This is a key design choice.
Templates are matched against gradient magnitude images, not raw grayscale images.
That means the tracker relies more on shape and contour structure than on absolute brightness.
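The brightness robustness can be seen in a small pure-Python sketch of a Sobel gradient magnitude. It stands in for the OpenCV calls the module presumably uses; leaving border pixels at zero and scaling by the peak value are simplifying assumptions.

```python
from typing import List

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def gradmag(gray: List[List[float]]) -> List[List[int]]:
    """Sobel gradient magnitude of a grayscale image, scaled to 0..255."""
    h, w = len(gray), len(gray[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):              # border pixels stay 0 in this sketch
        for x in range(1, w - 1):
            gx = gy = 0.0
            for j in range(3):
                for i in range(3):
                    v = gray[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * v
                    gy += SOBEL_Y[j][i] * v
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    peak = max(max(row) for row in mag)
    if peak == 0:                          # completely flat image
        return [[0] * w for _ in range(h)]
    return [[int(round(255 * m / peak)) for m in row] for row in mag]


img = [[10, 10, 200, 200] for _ in range(4)]         # one vertical edge
brighter = [[v + 40 for v in row] for row in img]    # same scene, brighter
```

Because both Sobel kernels sum to zero, gradmag(img) and gradmag(brighter) are identical: a uniform brightness offset does not change the image the templates are matched against.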
ROI Geometry Helpers¶
_roi_rect()¶
Builds a search rectangle centered around an expected point.
Inputs:
- center (cx, cy)
- radius r
- image width and height
Output:
- clamped rectangle (x0, y0, x1, y1)
Role:
- defines the local search region for each corner tracker
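A clamped search rectangle of this kind can be sketched as follows. The rounding of the center and the exclusive upper bounds are assumptions; the inputs and the (x0, y0, x1, y1) output follow the description above.

```python
from typing import Tuple


def roi_rect(cx: float, cy: float, r: int, w: int, h: int) -> Tuple[int, int, int, int]:
    """Square window of half-size r around (cx, cy), clamped to a w x h image."""
    icx, icy = int(round(cx)), int(round(cy))
    x0 = max(0, icx - r)
    y0 = max(0, icy - r)
    x1 = min(w, icx + r)
    y1 = min(h, icy + r)
    return x0, y0, x1, y1


# Expected point near the left border of a 1920x1080 frame, SEARCH_RADIUS_PX = 260.
rect = roi_rect(100, 500, 260, 1920, 1080)
```

Near the image border the window simply becomes smaller, so a search radius of 260 does not guarantee a 520-pixel-wide search region.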
Dark Dot Refinement¶
Function:
refine_by_dot()
After template matching, the script refines the corner position by searching for the small dark dot inside the white marker.
This stage improves precision beyond the raw template-match center.
Processing Steps¶
- crop a small window around the matched point
- blur it slightly
- threshold with binary inverse
- perform morphological opening
- connected-components analysis
- score candidate blobs
- return best centroid
Variables and Roles¶
win¶
Half-size of local refinement window.
Usually comes from:
DOT_WIN_PX
thr¶
Darkness threshold used in:
cv2.THRESH_BINARY_INV
Pixels darker than threshold become foreground candidates.
Usually comes from:
DOT_DARK_THR
Candidate Filtering¶
Connected components are filtered by area:
- too small: likely noise
- too large: likely not the marker dot
Current accepted area range:
6 <= area <= 260
Blobs touching the ROI boundary are also rejected.
Role:
- avoid partial or unreliable blobs
Candidate Scoring¶
Each blob receives a score:
score = area - 0.6 * distance_from_window_center
Interpretation:
- larger blobs are better
- closer-to-center blobs are better
This favors the most plausible dark marker center.
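The filtering and scoring rules can be sketched on already-extracted blobs. The Blob structure and the function name are hypothetical; in the module the blobs come from connected-components analysis, which is not reproduced here. The area range and the score formula are the documented ones.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Blob:
    area: int
    cx: float   # centroid, window coordinates
    cy: float
    x: int      # bounding box, window coordinates
    y: int
    w: int
    h: int


def best_dot(blobs: Iterable[Blob], win_size: int) -> Optional[Tuple[float, float]]:
    """Return the centroid of the best-scoring blob, or None if none qualifies."""
    center = win_size / 2.0
    best, best_score = None, -math.inf
    for b in blobs:
        if not 6 <= b.area <= 260:         # too small: noise, too large: not the dot
            continue
        if b.x <= 0 or b.y <= 0 or b.x + b.w >= win_size or b.y + b.h >= win_size:
            continue                       # touches the window boundary
        dist = math.hypot(b.cx - center, b.cy - center)
        score = b.area - 0.6 * dist
        if score > best_score:
            best, best_score = (b.cx, b.cy), score
    return best
```

With this score, a slightly smaller blob near the window center beats a larger blob far from it, because each pixel of distance costs 0.6 units of area.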
Corner Tracking¶
Function:
track_corner()
This is the core local tracker for one corner.
Pipeline¶
- preprocess full image to grayscale and gradient magnitude
- build ROI around expected point
- run template matching inside ROI
- reject if correlation below threshold
- compute matched center
- refine center by dark-dot search
- return (x, y) and score
Why It Is Robust¶
The method is robust because it combines:
- manual prior position from kiosk setup
- local ROI restriction
- gradient-based template matching
- dot-based local refinement
This is much more stable than trying to detect markers globally in the full image.
Tracking All Corners¶
Function:
track_all_corners()
This runs track_corner() for:
- TL
- TR
- BL
- BR
It also writes useful debug images.
Debug Image 00_roi_overlay.png¶
Shows the original frame with the four search ROIs.
Purpose:
- verify whether expected search windows are centered correctly
- useful when camera has physically drifted too far
Debug Image 01_grad_roi.png¶
Shows the gradient magnitude image used for matching.
Purpose:
- inspect whether edge information is usable
- diagnose low-contrast or poorly illuminated scenes
Debug Image 02_tracked_points.png¶
Shows the tracked points and their correlation scores overlaid on the original image.
Purpose:
- inspect final tracker decisions
- verify corner order and accuracy
Quadrilateral Validation¶
Function:
_is_good_quad()
After all four corners are found, the script validates the resulting shape.
Checks:
- contour area must exceed MIN_QUAD_AREA
- hull area must be nonzero
- contour-to-hull area ratio must be at least 0.70
Meaning:
- quadrilateral must be large enough
- corners must not collapse
- shape must be reasonably convex
This prevents invalid warps from bad tracking results.
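The three checks can be sketched without OpenCV using the shoelace formula and a monotone-chain convex hull. These stand in for whatever contour and hull routines the module actually calls; the function names and the perimeter order TL, TR, BR, BL are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(pts: Sequence[Point]) -> float:
    """Shoelace formula: absolute area of the closed polygon through pts."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(pts, list(pts[1:]) + [pts[0]]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


def convex_hull(pts: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain convex hull."""
    p = sorted(set(pts))
    if len(p) < 3:
        return list(p)

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    upper: List[Point] = []
    for q in p:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], q) <= 0:
            lower.pop()
        lower.append(q)
    for q in reversed(p):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], q) <= 0:
            upper.pop()
        upper.append(q)
    return lower[:-1] + upper[:-1]


def is_good_quad(tl: Point, tr: Point, br: Point, bl: Point,
                 min_area: float = 12000.0, min_ratio: float = 0.70) -> bool:
    contour = [tl, tr, br, bl]             # perimeter order
    area = polygon_area(contour)
    hull_area = polygon_area(convex_hull(contour))
    if area <= min_area or hull_area <= 0:
        return False
    return area / hull_area >= min_ratio
```

A quad whose corners were swapped (a bow-tie) has a shoelace area near zero, and a strongly concave quad fails the ratio check, so both are rejected before any warp is attempted.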
Temporal Smoothing¶
Function:
smooth_with_history()
This performs persisted exponential moving average smoothing between runs.
It uses file:
prev_points.json
stored in:
corner_json_history
Purpose¶
The camera and tracker may show small frame-to-frame or run-to-run variations.
This smoothing reduces jitter in the detected corner positions across repeated pipeline runs.
Variables and Roles¶
alpha¶
EMA history weight.
Used as:
smoothed = alpha * previous + (1 - alpha) * current
Interpretation:
- higher alpha = stronger inertia
- lower alpha = more responsive to new measurements
This value comes from:
cfg.tracking.smooth_alpha
jump_threshold_px¶
Maximum tolerated distance from previous smoothed point before history is reset.
Role:
- detect true camera repositioning or large geometry jump
- avoid “smoothing through” a major viewpoint change
If a new point jumps more than this threshold, smoothing history is discarded and replaced with current points.
This value comes from:
cfg.tracking.jump_threshold_px
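The smoothing rule and the jump reset can be sketched in memory as below. Loading and saving prev_points.json is left out, the function name is hypothetical, and resetting all four points when any single point jumps is an assumption about how the reset is applied.

```python
import math
from typing import Dict, Optional, Tuple

CornerPts = Dict[str, Tuple[float, float]]


def smooth_points(current: CornerPts, previous: Optional[CornerPts],
                  alpha: float, jump_threshold_px: float) -> CornerPts:
    """EMA of corner points with a history reset on large jumps."""
    if not previous or set(previous) != set(current):
        return dict(current)               # no usable history yet
    for key, (cx, cy) in current.items():
        px, py = previous[key]
        if math.hypot(cx - px, cy - py) > jump_threshold_px:
            return dict(current)           # large jump: discard history
    return {
        key: (alpha * previous[key][0] + (1 - alpha) * cx,
              alpha * previous[key][1] + (1 - alpha) * cy)
        for key, (cx, cy) in current.items()
    }


prev = {"TL": (100.0, 100.0), "TR": (900.0, 100.0)}
cur = {"TL": (104.0, 98.0), "TR": (902.0, 100.0)}
smoothed = smooth_points(cur, prev, alpha=0.5, jump_threshold_px=25.0)
```

With alpha = 0.5 the result lies halfway between the stored and the new position; a value closer to 1 would keep the result nearer to the stored history.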
Geometric Stabilization¶
Function:
geometric_stabilize()
This stage improves quadrilateral consistency without erasing the detected perspective.
It computes several midpoint and edge-based adjustments to make the quad more stable and symmetric.
What It Does¶
It progressively enforces consistency between:
- diagonal midpoints
- top and bottom edge direction
- left and right edge direction
Then blends the enforced geometry back toward the original detected points.
Variable beta¶
Controls how strongly the enforced geometry affects the final points.
Interpretation:
- beta = 0 → no geometric stabilization
- higher beta → more geometric regularization
This value comes from:
cfg.tracking.geom_beta
Role:
- reduce noisy skew
- keep stable shelf geometry over time
- still preserve the true scene shape
Perspective Rectification¶
Function:
rectify_by_markers()
This stage converts the oblique shelf image into a normalized front-facing view.
Steps¶
- build source point array from tracked corners
- ensure order is TL, TR, BL, BR
- estimate source widths and heights
- choose output size
- build destination rectangle with margin
- compute homography
- warp image
_order_points_tl_tr_bl_br()¶
This helper ensures corner order is correct using point sums and coordinate differences.
Correct point order is essential for valid perspective transformation.
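A common way to order four points from sums and differences is sketched below. It is an assumption that the helper follows exactly this rule; the function name is shortened for illustration.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def order_points(pts: Sequence[Point]) -> Dict[str, Point]:
    """Label four unordered points as TL, TR, BL, BR."""
    tl = min(pts, key=lambda p: p[0] + p[1])   # smallest x + y
    br = max(pts, key=lambda p: p[0] + p[1])   # largest x + y
    tr = min(pts, key=lambda p: p[1] - p[0])   # smallest y - x
    bl = max(pts, key=lambda p: p[1] - p[0])   # largest y - x
    return {"TL": tl, "TR": tr, "BL": bl, "BR": br}


ordered = order_points([(900, 60), (90, 690), (100, 50), (920, 700)])
```

This heuristic assumes the shelf is roughly upright in the image; for a quad rotated by about 45 degrees the sums and differences become ambiguous.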
Output Size Logic¶
The output width and height are inferred from source geometry if not explicitly overridden.
Width¶
Computed from the longer of:
- top edge length
- bottom edge length
Height¶
Computed from the longer of:
- left edge length
- right edge length
This makes the destination image scale adapt to the detected shelf geometry.
Destination Margin¶
OUT_MARGIN_PX is used to inset the target corner positions.
This avoids a border-touching warp and gives a cleaner usable image.
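The size and margin logic can be sketched as follows. The rounding and the use of w - 1 and h - 1 for the far corners are assumptions, as is the function name; the max-of-opposite-edges rule, the inset margin, and the optional overrides follow the description above.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def output_geometry(c: Dict[str, Point], margin: int = 20,
                    out_w: Optional[int] = None, out_h: Optional[int] = None
                    ) -> Tuple[int, int, List[Point]]:
    """Return (width, height, destination points in TL, TR, BL, BR order)."""
    def dist(a: Point, b: Point) -> float:
        return math.hypot(a[0] - b[0], a[1] - b[1])

    top, bottom = dist(c["TL"], c["TR"]), dist(c["BL"], c["BR"])
    left, right = dist(c["TL"], c["BL"]), dist(c["TR"], c["BR"])
    w = out_w if out_w is not None else int(round(max(top, bottom)))
    h = out_h if out_h is not None else int(round(max(left, right)))
    dst = [(margin, margin), (w - 1 - margin, margin),
           (margin, h - 1 - margin), (w - 1 - margin, h - 1 - margin)]
    return w, h, dst


corners = {"TL": (0, 0), "TR": (400, 0), "BL": (0, 300), "BR": (430, 300)}
w, h, dst = output_geometry(corners)
```

The destination points are what would be paired with the ordered source corners when computing the perspective transform.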
Warp¶
The transformation matrix is computed with:
cv2.getPerspectiveTransform()
and applied with:
cv2.warpPerspective()
Result:
- shelf_rectified.jpg
Debug Image 03_warped.png¶
Shows the final rectified shelf image.
Purpose:
- verify the output geometry
- inspect whether shelf normalization is correct before cropping stage
Full Geometry Processing Flow¶
flowchart TD
RAW["Raw accepted snapshot"]
GRAY["CLAHE grayscale"]
GRAD["Gradient magnitude"]
ROI["Local search ROI around each expected point"]
TM["Template match"]
DOT["Dark dot refinement"]
QUAD["Quad validation"]
EMA["History smoothing"]
GEO["Geometric stabilization"]
WARP["Perspective warp"]
RAW --> GRAY
GRAY --> GRAD
GRAD --> ROI
ROI --> TM
TM --> DOT
DOT --> QUAD
QUAD --> EMA
EMA --> GEO
GEO --> WARP
Debug Artifacts Summary¶
| File | Meaning |
|---|---|
| 00_roi_overlay.png | search windows for each corner |
| 01_grad_roi.png | gradient image used for template matching |
| 02_tracked_points.png | tracked corners and match scores |
| 03_warped.png | final rectified shelf |
| 04_capture_ready_roi.png | shelf ROI used for readiness gating |
| 05_capture_ready_frame.png | accepted frame with readiness metrics |
These debug artifacts are extremely useful for diagnosing:
- bad lighting
- wrong expected points
- camera drift
- weak template matches
- poor rectification
run() Function¶
This is the main programmatic entry point used by the vision pipeline.
Responsibilities¶
- define file paths
- capture snapshot
- read saved image
- load settings and expected points
- optionally initialize templates
- load templates
- track corners
- smooth points
- stabilize geometry
- rectify image
- write output files and JSON
- return result metadata
Output Files Used in run()¶
| Variable | Meaning |
|---|---|
| shelf | raw accepted snapshot path |
| inp | input image path for later processing |
| tmp_dir | temporary/debug directory |
| out | rectified output image path |
| debug_dir | debug image folder |
| json_file | final corners JSON |
| json_dir | parent folder for history JSON |
Returned Result Dictionary¶
The function returns a structured dictionary containing:
- step label
- detected points
- match scores
- output paths
- debug folder path
- snapshot file path
This integrates cleanly with stock_runtime.py.
CLI Mode¶
The module also has a small CLI for manual testing.
Options:
- --cam
- --init-templates
This is useful for:
- calibration
- debugging
- template refresh
- standalone validation
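The two options could be declared with argparse roughly as follows. Only the flag names come from this page; the integer type, the default of 0, and the help texts are assumptions.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Capture and rectify a shelf snapshot")
    # Camera index used to choose the RTSP URL (integer type and default are assumed).
    parser.add_argument("--cam", type=int, default=0, help="camera index")
    # When set, rebuild the corner templates from the new snapshot.
    parser.add_argument("--init-templates", action="store_true",
                        help="initialize or refresh corner templates")
    return parser.parse_args(argv)


args = parse_args(["--cam", "1", "--init-templates"])
```

Passing an explicit argument list, as in the last line, makes the parser easy to exercise without touching sys.argv.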
Relationship with the Kiosk ROI Page¶
The manual ROI kiosk page is a critical upstream dependency.
What the kiosk page does¶
- shows live RTSP stream
- freezes latest frame on user request
- stores 4 manually tapped points
- writes them atomically to pince_shelf.ini
Why this matters¶
The runtime tracker assumes that these expected points are approximately correct.
They are not the final runtime result.
They are the starting priors around which local tracking happens.
This hybrid model is powerful:
- human sets approximate geometry once
- runtime performs precise local re-detection every run
That combines:
- manual reliability
- automated repeatability
System-Level Interaction¶
sequenceDiagram
participant User
participant Kiosk
participant Ini
participant Runtime
participant Camera
participant Tracker
User->>Kiosk: tap 4 shelf corners
Kiosk->>Ini: save [main_points]
Runtime->>Camera: read RTSP frames
Camera-->>Runtime: recent frames
Runtime->>Runtime: illumination readiness gate
Runtime->>Tracker: expected points from ini
Tracker->>Tracker: template match + dot refine
Tracker-->>Runtime: tracked corners
Runtime->>Runtime: smooth + stabilize + rectify
Runtime-->>Runtime: shelf_rectified.jpg
Reliability Strategy¶
This module is robust because it does not rely on a single fragile mechanism.
It combines:
- manual initialization of expected geometry
- shelf-only readiness gating
- retry/reconnect RTSP acquisition
- gradient-based template matching
- marker dot refinement
- quadrilateral validation
- persisted smoothing
- geometric regularization
- debug artifact generation
This layered design is why it is suitable for unattended cellar operation.
Failure Modes¶
Typical failure modes include:
| Failure | Meaning |
|---|---|
| missing RTSP URL | camera config problem |
| cannot open stream | network/camera issue |
| scene never becomes ready | lighting or exposure instability |
| missing templates | initialization incomplete |
| low correlation | marker not found inside search ROI |
| bad quad | detections geometrically invalid |
| snapshot save failure | filesystem issue |
| image read failure | corrupted or missing snapshot |
The module raises explicit errors in these cases so the pipeline orchestrator can log them and fail clearly.
Summary¶
take_clean_snapshot.py is the image-preparation and geometric normalization core of the Wine Platform vision system.
Its responsibilities include:
- acquiring a visually ready RTSP frame
- evaluating shelf illumination stability
- using manually configured corner priors from the kiosk page
- tracking the real corner marker positions
- smoothing and stabilizing geometry
- producing a normalized rectified shelf image
This module transforms a noisy real-world camera stream into a stable, machine-usable canonical shelf image, which is the foundation for all downstream bin cropping and bottle detection.