Skip to content

AI Label Parsing Pipeline

Purpose

The AI Label Parsing pipeline extracts structured wine metadata from a bottle label image captured in the Winecellar kiosk.

The system is designed with three layers of intelligence to guarantee reliability:

  1. OCR extraction using Google Cloud Vision
  2. Heuristic parsing using local logic
  3. AI enrichment using OpenAI structured extraction

This architecture ensures the kiosk workflow still functions even if the AI service is unavailable.


High-Level Architecture

flowchart TD

A[Label Image Upload] --> B[FastAPI Endpoint /label/parse]

B --> C[label.py Router]

C --> D[Google Vision OCR]
D --> E[Raw OCR Text]

E --> F[Heuristic Parsing]
F --> G[Basic Prefill]

G --> H{AI Enabled}

H -->|No| I[Return OCR Prefill]

H -->|Yes| J[OpenAI Label Parser]

J --> K[Structured Wine Metadata]

K --> L[Merge Results]

L --> M[Return Final Response]

Files Involved

Router

apps/winecellar/backend/app/api/routers/label.py

Handles the HTTP request and orchestrates the full parsing workflow.


OCR Service

apps/winecellar/backend/app/services/vision_ocr.py

Responsible for extracting text from the label image using Google Cloud Vision.


AI Label Parser

apps/winecellar/backend/app/services/ai_label_parser.py

Uses OpenAI models to convert OCR text into structured wine metadata.


Backend Configuration

apps/winecellar/backend/app/core/config.py

Loads runtime configuration and environment variables.


Detailed Execution Flow

1. Image Upload

The kiosk or frontend uploads a label image.

Request example:

POST /label/parse?ai=true
Content-Type: multipart/form-data
file=<label_image>

The request is handled by the router:

label.py

2. OCR Stage

The router calls the OCR service.

Function:

google_vision_ocr_bytes(image_bytes, language_hints)

This function uses Google Cloud Vision:

vision.ImageAnnotatorClient()
document_text_detection()

Result returned:

{
  "text": "raw OCR extracted text"
}

Important characteristics:

  • Extracts only text
  • Does not perform semantic parsing
  • Uses document_text_detection for high accuracy

3. Heuristic Parsing

After OCR, the router performs basic local parsing.

Function:

parse_basic(text)

This logic extracts minimal metadata using local rules.

Typical extracted values:

Field Method
year regex detection
wine name first strong text line
appellation keyword detection

Example extracted content:

2018
Chateau Margaux
Margaux

This step ensures the system works even without AI services.


4. Prefill Construction

Function:

build_prefill_from_basic()

Transforms parsed values into kiosk form fields.

Mapping example:

OCR Data Kiosk Field
wine_name description
appellation region
vintage year
producer winery

Example prefill:

{
  "description": "Chateau Margaux",
  "region": "Margaux",
  "year": 2018,
  "color": "red",
  "winery": "Chateau Margaux"
}

5. Optional AI Parsing

If the request includes:

?ai=true

the router calls the AI parser.

File:

ai_label_parser.py

Function:

parse_wine_label_from_ocr_text(text)

The function sends OCR text to OpenAI.

Model typically used:

gpt-4o-mini

The prompt instructs the model to return structured JSON metadata.

Example output:

{
  "producer": "Chateau Margaux",
  "wine_name": "Margaux",
  "appellation": "Margaux",
  "region": "Bordeaux",
  "country": "France",
  "vintage": 2018,
  "color": "red",
  "grapes": ["Cabernet Sauvignon", "Merlot"],
  "confidence": 0.93
}

6. Result Merge Strategy

Function:

build_prefill_from_ai()

AI results are merged with heuristic results.

Priority order:

AI value
↓
heuristic value
↓
empty

This guarantees that the best available data is used.


API Response Example

Typical response:

{
  "provider": "google_vision+openai",

  "text": "2018 Chateau Margaux Margaux",

  "year": 2018,
  "name": "Chateau Margaux",
  "appellation": "Margaux",

  "prefill": {
    "description": "Chateau Margaux",
    "region": "Margaux",
    "year": 2018,
    "color": "red",
    "winery": "Chateau Margaux"
  },

  "ai": {
    "producer": "Chateau Margaux",
    "region": "Bordeaux"
  }
}

Configuration System

Configuration is handled by:

config.py

Using:

pydantic-settings

Environment File Discovery

config.py loads configuration in this order.

1. Explicit override

WINECELLAR_ENV_FILE

2. Shared platform configuration

shared/config/shared.env

3. Local development fallback

.env

API Credentials

The code does not contain hardcoded API keys.

Credentials are read from the runtime environment.


Google Vision Credentials

Expected environment variable:

GOOGLE_APPLICATION_CREDENTIALS

Example:

GOOGLE_APPLICATION_CREDENTIALS=/home/pi/secrets/google_vision_service_account.json

Used automatically by:

vision.ImageAnnotatorClient()

OpenAI Credentials

Expected environment variable:

OPENAI_API_KEY

Example:

OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxx

Used automatically by:

OpenAI()

Where Environment Variables Come From

Two sources can provide environment variables.


1. Shared Environment File

Location:

shared/config/shared.env

Example:

OPENAI_API_KEY=xxxx
GOOGLE_APPLICATION_CREDENTIALS=/home/pi/secrets/google.json

Loaded automatically by the backend configuration loader.


2. Systemd Service Environment

Location:

/etc/systemd/system/winecellar-api.service

Example configuration:

Environment=OPENAI_API_KEY=xxxx
Environment=GOOGLE_APPLICATION_CREDENTIALS=/path/file.json

Or:

EnvironmentFile=/home/pi/wine_platform/shared/config/shared.env

Configuration Precedence

When the backend runs via systemd, environment variables are injected before the application starts.

Runtime environment variables therefore take priority.


Reliability Model

Component Dependency Impact
Google OCR Hard dependency request fails if unavailable
OpenAI parser Soft dependency fallback to OCR
Heuristic parsing Local always available

Failure Handling

OpenAI Failure

Possible causes:

  • missing API key
  • network issue
  • quota exceeded

Behavior:

ai_error field returned in response

OCR results are still returned.


Google OCR Failure

Possible causes:

  • missing credentials
  • invalid credential file
  • network failure

Behavior:

RuntimeError raised

The request fails.


Troubleshooting

Inspect service configuration

systemctl cat winecellar-api.service

Look for environment variables:

OPENAI_API_KEY
GOOGLE_APPLICATION_CREDENTIALS

Verify environment file

shared/config/shared.env

Test endpoint manually

curl -X POST http://localhost:8000/label/parse

Summary

Component Responsibility
label.py API endpoint and orchestration
vision_ocr.py OCR text extraction
ai_label_parser.py AI metadata extraction
config.py configuration loading

Pipeline:

label image
→ OCR extraction
→ heuristic parsing
→ optional AI enrichment
→ kiosk field prefill