AI Label Parsing Pipeline¶
Purpose¶
The AI Label Parsing pipeline extracts structured wine metadata from a bottle label image captured in the Winecellar kiosk.
The system is built in three layers, each adding detail to the output of the previous one:
- OCR extraction using Google Cloud Vision
- Heuristic parsing using local logic
- AI enrichment using OpenAI structured extraction
Because the first two layers do not depend on OpenAI, the kiosk workflow still returns a usable prefill when the AI service is unavailable. OCR itself remains a hard dependency.
High-Level Architecture¶
flowchart TD
A[Label Image Upload] --> B[FastAPI Endpoint /label/parse]
B --> C[label.py Router]
C --> D[Google Vision OCR]
D --> E[Raw OCR Text]
E --> F[Heuristic Parsing]
F --> G[Basic Prefill]
G --> H{AI Enabled}
H -->|No| I[Return OCR Prefill]
H -->|Yes| J[OpenAI Label Parser]
J --> K[Structured Wine Metadata]
K --> L[Merge Results]
L --> M[Return Final Response]
Files Involved¶
Router¶
apps/winecellar/backend/app/api/routers/label.py
Handles the HTTP request and orchestrates the full parsing workflow.
OCR Service¶
apps/winecellar/backend/app/services/vision_ocr.py
Responsible for extracting text from the label image using Google Cloud Vision.
AI Label Parser¶
apps/winecellar/backend/app/services/ai_label_parser.py
Uses OpenAI models to convert OCR text into structured wine metadata.
Backend Configuration¶
apps/winecellar/backend/app/core/config.py
Loads runtime configuration and environment variables.
Detailed Execution Flow¶
1. Image Upload¶
The kiosk or frontend uploads a label image.
Request example:
POST /label/parse?ai=true
Content-Type: multipart/form-data
file=<label_image>
The request is handled by the router:
label.py
2. OCR Stage¶
The router calls the OCR service.
Function:
google_vision_ocr_bytes(image_bytes, language_hints)
This function uses Google Cloud Vision:
vision.ImageAnnotatorClient()
document_text_detection()
Result returned:
{
"text": "raw OCR extracted text"
}
Important characteristics:
- Extracts only text
- Does not perform semantic parsing
- Uses document_text_detection, the Vision feature optimized for dense text
3. Heuristic Parsing¶
After OCR, the router performs basic local parsing.
Function:
parse_basic(text)
This logic extracts minimal metadata using local rules.
Typical extracted values:
| Field | Method |
|---|---|
| year | regex detection |
| wine name | first strong text line |
| appellation | keyword detection |
Example extracted content:
2018
Chateau Margaux
Margaux
This step ensures the system works even without AI services.
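The three rules in the table can be sketched as follows. This is a minimal illustration, not the router's actual parse_basic: the function name parse_basic_sketch, the keyword list, and the reading of "first strong text line" as "first line with at least three letters" are all assumptions.

```python
import re
from typing import Optional

# Hypothetical keyword list; the real router may use a different vocabulary.
APPELLATION_KEYWORDS = ("margaux", "pauillac", "chablis", "barolo", "rioja")

# Four-digit numbers in the range 1900-2099 are treated as vintages.
YEAR_RE = re.compile(r"\b(19\d{2}|20\d{2})\b")


def parse_basic_sketch(text: str) -> dict:
    """Extract year, wine name, and appellation from raw OCR text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # Year: first four-digit number that looks like a vintage.
    match = YEAR_RE.search(text)
    year: Optional[int] = int(match.group(1)) if match else None

    # Wine name: first line containing at least three letters
    # (this skips a line that holds only the year).
    name = next(
        (ln for ln in lines if sum(c.isalpha() for c in ln) >= 3),
        None,
    )

    # Appellation: first line that exactly matches a known keyword.
    appellation = next(
        (ln for ln in lines if ln.lower() in APPELLATION_KEYWORDS),
        None,
    )
    return {"year": year, "name": name, "appellation": appellation}
```

Applied to the example content above (one value per line), the sketch yields the year 2018, the name "Chateau Margaux", and the appellation "Margaux".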
4. Prefill Construction¶
Function:
build_prefill_from_basic()
Transforms parsed values into kiosk form fields.
Mapping example:
| Parsed Field | Kiosk Field |
|---|---|
| wine_name | description |
| appellation | region |
| vintage | year |
| producer | winery |
Example prefill:
{
"description": "Chateau Margaux",
"region": "Margaux",
"year": 2018,
"color": "red",
"winery": "Chateau Margaux"
}
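The mapping table translates directly into a rename step. The sketch below assumes the input keys are spelled exactly as in the table; build_prefill_sketch and FIELD_MAP are hypothetical names, and the real build_prefill_from_basic may accept different arguments or fill extra fields such as color.

```python
from typing import Any

# Parsed key -> kiosk form field, as listed in the mapping table.
FIELD_MAP = {
    "wine_name": "description",
    "appellation": "region",
    "vintage": "year",
    "producer": "winery",
}


def build_prefill_sketch(parsed: dict[str, Any]) -> dict[str, Any]:
    """Rename parsed keys to kiosk fields, skipping missing or empty values."""
    prefill: dict[str, Any] = {}
    for source_key, kiosk_field in FIELD_MAP.items():
        value = parsed.get(source_key)
        if value not in (None, ""):
            prefill[kiosk_field] = value
    return prefill
```

Skipping empty values keeps the kiosk form from being prefilled with blank strings, which leaves those fields free for manual entry.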
5. Optional AI Parsing¶
If the request includes:
?ai=true
the router calls the AI parser.
File:
ai_label_parser.py
Function:
parse_wine_label_from_ocr_text(text)
The function sends OCR text to OpenAI.
Model typically used:
gpt-4o-mini
The prompt instructs the model to return structured JSON metadata.
Example output:
{
"producer": "Chateau Margaux",
"wine_name": "Margaux",
"appellation": "Margaux",
"region": "Bordeaux",
"country": "France",
"vintage": 2018,
"color": "red",
"grapes": ["Cabernet Sauvignon", "Merlot"],
"confidence": 0.93
}
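Model output is not guaranteed to match the requested shape, so it is usually worth normalizing the reply before it is merged. The following sketch assumes the reply arrives as a JSON string with the keys shown in the example output; normalize_ai_output is a hypothetical helper, not a function confirmed to exist in ai_label_parser.py.

```python
import json
from typing import Any

# Keys taken from the example output above.
EXPECTED_KEYS = (
    "producer", "wine_name", "appellation", "region",
    "country", "vintage", "color", "grapes", "confidence",
)


def normalize_ai_output(raw: str) -> dict[str, Any]:
    """Parse the model's JSON reply and coerce fields to predictable types."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    # Unknown keys are dropped; missing keys become None.
    result: dict[str, Any] = {key: data.get(key) for key in EXPECTED_KEYS}

    # Vintage may arrive as "2018" or 2018; keep it only if it converts to int.
    try:
        result["vintage"] = int(result["vintage"])
    except (TypeError, ValueError):
        result["vintage"] = None

    # Grapes should always be a list of strings.
    grapes = result["grapes"]
    result["grapes"] = [str(g) for g in grapes] if isinstance(grapes, list) else []

    # Clamp confidence into [0, 1]; treat anything non-numeric as unknown.
    try:
        result["confidence"] = min(1.0, max(0.0, float(result["confidence"])))
    except (TypeError, ValueError):
        result["confidence"] = None
    return result
```

A malformed reply raises an exception here, which fits the soft-dependency model: the caller can catch it and fall back to the heuristic prefill.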
6. Result Merge Strategy¶
Function:
build_prefill_from_ai()
AI results are merged with heuristic results.
Priority order:
AI value
↓
heuristic value
↓
empty
For each field, the AI value is used when present; otherwise the heuristic value is used; if neither exists, the field is left empty.
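The priority order amounts to a per-field fallback chain. A minimal sketch, assuming both inputs are plain dictionaries that already use kiosk field names and that "empty" is represented as None; merge_prefill is a hypothetical name and does not reflect the actual signature of build_prefill_from_ai.

```python
from typing import Any, Optional


def _is_empty(value: Any) -> bool:
    """Treat None, empty strings, and empty lists as missing."""
    return value is None or value == "" or value == []


def merge_prefill(
    ai: Optional[dict],
    heuristic: dict,
    fields: tuple[str, ...],
) -> dict[str, Any]:
    """Pick the AI value first, then the heuristic value, else None."""
    ai = ai or {}
    merged: dict[str, Any] = {}
    for field in fields:
        ai_value = ai.get(field)
        heuristic_value = heuristic.get(field)
        if not _is_empty(ai_value):
            merged[field] = ai_value
        elif not _is_empty(heuristic_value):
            merged[field] = heuristic_value
        else:
            merged[field] = None
    return merged
```

Passing `ai=None` covers the case where AI parsing was skipped or failed, so the same function serves both paths.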
API Response Example¶
Typical response:
{
"provider": "google_vision+openai",
"text": "2018 Chateau Margaux Margaux",
"year": 2018,
"name": "Chateau Margaux",
"appellation": "Margaux",
"prefill": {
"description": "Chateau Margaux",
"region": "Margaux",
"year": 2018,
"color": "red",
"winery": "Chateau Margaux"
},
"ai": {
"producer": "Chateau Margaux",
"region": "Bordeaux"
}
}
Configuration System¶
Configuration is handled by:
config.py
Using:
pydantic-settings
Environment File Discovery¶
config.py looks for an environment file in the following order:
1. Explicit override¶
WINECELLAR_ENV_FILE
2. Shared platform configuration¶
shared/config/shared.env
3. Local development fallback¶
.env
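The discovery order can be modeled as a list of candidate paths where the first existing file wins. This is a sketch under assumptions: the source does not say how relative paths are resolved or whether a missing override falls through to the next candidate, so the base_dir parameter and the fall-through behavior here are illustrative choices, and resolve_env_file is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_env_file(
    environ: Mapping[str, str] = os.environ,
    base_dir: Path = Path("."),
) -> Optional[Path]:
    """Return the first env file that exists, following the documented order."""
    candidates = []

    # 1. Explicit override via WINECELLAR_ENV_FILE.
    override = environ.get("WINECELLAR_ENV_FILE")
    if override:
        candidates.append(Path(override))

    # 2. Shared platform configuration.
    candidates.append(base_dir / "shared" / "config" / "shared.env")

    # 3. Local development fallback.
    candidates.append(base_dir / ".env")

    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Returning None when nothing is found lets the caller rely purely on process environment variables, which matches a systemd deployment that injects them directly.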
API Credentials¶
The code does not contain hardcoded API keys.
Credentials are read from the runtime environment.
Google Vision Credentials¶
Expected environment variable:
GOOGLE_APPLICATION_CREDENTIALS
Example:
GOOGLE_APPLICATION_CREDENTIALS=/home/pi/secrets/google_vision_service_account.json
Used automatically by:
vision.ImageAnnotatorClient()
OpenAI Credentials¶
Expected environment variable:
OPENAI_API_KEY
Example:
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxx
Used automatically by:
OpenAI()
Where Environment Variables Come From¶
Two sources can provide environment variables.
1. Shared Environment File¶
Location:
shared/config/shared.env
Example:
OPENAI_API_KEY=xxxx
GOOGLE_APPLICATION_CREDENTIALS=/home/pi/secrets/google.json
Loaded automatically by the backend configuration loader.
2. Systemd Service Environment¶
Location:
/etc/systemd/system/winecellar-api.service
Example configuration:
Environment=OPENAI_API_KEY=xxxx
Environment=GOOGLE_APPLICATION_CREDENTIALS=/path/file.json
Or:
EnvironmentFile=/home/pi/wine_platform/shared/config/shared.env
Configuration Precedence¶
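When the backend runs via systemd, environment variables are injected into the process before the application starts.
pydantic-settings gives process environment variables priority over values read from an env file, so a value set in the service unit overrides the same key in shared.env.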
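The precedence rule can be illustrated without any framework. The sketch below is a simplified model of the behavior, not the loader in config.py: parse_env_text handles only plain KEY=VALUE lines, and both function names are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def effective_settings(
    env_file_text: str,
    process_env: Mapping[str, str],
    keys: Iterable[str],
) -> dict[str, Optional[str]]:
    """Process environment wins; the env file only fills in what is missing."""
    file_values = parse_env_text(env_file_text)
    return {key: process_env.get(key, file_values.get(key)) for key in keys}
```

For example, if shared.env defines OPENAI_API_KEY and the service unit also sets it with an Environment= line, the value from the service unit is the one the application sees.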
Reliability Model¶
| Component | Dependency | Impact |
|---|---|---|
| Google OCR | Hard dependency | request fails if unavailable |
| OpenAI parser | Soft dependency | request still succeeds with OCR and heuristic results |
| Heuristic parsing | Local | always available |
Failure Handling¶
OpenAI Failure¶
Possible causes:
- missing API key
- network issue
- quota exceeded
Behavior:
ai_error field returned in response
OCR results are still returned.
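The difference between the hard and the soft dependency comes down to where exceptions are caught. The sketch below injects the three stages as callables so it runs on its own; parse_label_sketch is a hypothetical name, and the "google_vision" provider value for the non-AI path is an assumption (the source only shows "google_vision+openai").

```python
from typing import Any, Callable


def parse_label_sketch(
    image_bytes: bytes,
    use_ai: bool,
    ocr: Callable[[bytes], str],
    parse_basic: Callable[[str], dict],
    ai_parse: Callable[[str], dict],
) -> dict[str, Any]:
    """Run OCR and heuristics, then optionally AI parsing with a soft failure."""
    # Hard dependency: an OCR exception propagates and fails the request.
    text = ocr(image_bytes)
    basic = parse_basic(text)
    response: dict[str, Any] = {"provider": "google_vision", "text": text, **basic}

    if use_ai:
        try:
            response["ai"] = ai_parse(text)
            response["provider"] = "google_vision+openai"
        except Exception as exc:
            # Soft dependency: report the error but keep the OCR results.
            response["ai_error"] = str(exc)
    return response
```

Catching a broad Exception is deliberate in this sketch, since missing keys, network errors, and quota errors surface as different exception types and all should degrade the same way.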
Google OCR Failure¶
Possible causes:
- missing credentials
- invalid credential file
- network failure
Behavior:
RuntimeError raised
The request fails.
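The first two causes can be checked before any network call is made. The following diagnostic is a sketch, not part of the documented backend: google_credentials_problem is a hypothetical helper, and the check for a client_email key relies on the standard layout of a Google service account key file.

```python
import json
import os
from typing import Mapping, Optional


def google_credentials_problem(
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return a description of the problem, or None if the file looks usable."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"credential file not found: {path}"
    try:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError is a subclass of ValueError.
        return f"credential file is not readable JSON: {exc}"
    if not isinstance(data, dict) or "client_email" not in data:
        return "credential file does not look like a service account key"
    return None
```

Running a check like this at startup turns a late RuntimeError during the first label scan into an early, readable message in the service log.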
Troubleshooting¶
Inspect service configuration¶
systemctl cat winecellar-api.service
Look for environment variables:
OPENAI_API_KEY
GOOGLE_APPLICATION_CREDENTIALS
Verify environment file¶
shared/config/shared.env
Test endpoint manually¶
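curl -X POST "http://localhost:8000/label/parse?ai=true" -F "file=@label.jpg"
Replace label.jpg with the path to a label photo; the endpoint expects the image in the multipart field named file.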
Summary¶
| Component | Responsibility |
|---|---|
| label.py | API endpoint and orchestration |
| vision_ocr.py | OCR text extraction |
| ai_label_parser.py | AI metadata extraction |
| config.py | configuration loading |
Pipeline:
label image
→ OCR extraction
→ heuristic parsing
→ optional AI enrichment
→ kiosk field prefill