CLAUDE.md — AI Development Context

This file provides context for AI-assisted development sessions on the Church Live Transcription Display project.

Project Summary

A live captioning system for deaf/hard-of-hearing church congregants. A Windows PC captures audio, transcribes it locally using Whisper (GPU-accelerated), and sends rolling text over MQTT to an ESP32 driving a large e-ink display. No cloud services. No internet required during operation.

Architecture

[Audio source]
     ↓ (USB mic or mixer line-in)
[Windows PC]
  ├── WhisperLiveKit (local Whisper server, WebSocket on port 8000)
  ├── Mosquitto MQTT broker (port 1883)
  └── bridge.py (Python: WS subscriber → sentence buffer → MQTT publisher)
     ↓ (WiFi / MQTT topic: display/text)
[ESP32-WROOM or S3]
  └── Waveshare 7.5" V2 e-ink display (SPI, GxEPD2 library)

PC Environment

OS: Windows 10/11
GPU: NVIDIA RTX series (tested with RTX 4070 Super)
Python: 3.11+
MQTT broker: Mosquitto (localhost:1883)
Whisper server: WhisperLiveKit (wlk --model large-v3 --language en)
Whisper WebSocket: ws://localhost:8000/asr
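
The endpoints above can be collected in one place so the bridge does not hard-code them. This is a minimal sketch: the BridgeConfig name and its field names are hypothetical, and only the values (host, ports, WebSocket URL, topic names) come from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeConfig:
    """Connection settings for the bridge (hypothetical container, values from this file)."""
    mqtt_host: str = "localhost"
    mqtt_port: int = 1883
    whisper_ws_url: str = "ws://localhost:8000/asr"
    text_topic: str = "display/text"
    clear_topic: str = "display/clear"
    status_topic: str = "display/status"


# Defaults match the PC environment listed above; override per deployment.
config = BridgeConfig()
```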

ESP32 Environment

Board: ESP32-WROOM-32 or ESP32-S3
Framework: Arduino (via PlatformIO)
Display: Waveshare 7.5" V2 (800×480 pixels, black/white)
Display library: GxEPD2
MQTT library: PubSubClient
Build tool: PlatformIO (VSCode)

SPI Wiring (Waveshare 7.5" V2 to ESP32)

Display Pin	ESP32 Pin
BUSY	GPIO 4
RST	GPIO 16
DC	GPIO 17
CS	GPIO 5
CLK	GPIO 18
DIN	GPIO 23
GND	GND
VCC	3.3V

MQTT Topics

Topic	Direction	Payload
`display/text`	PC → ESP32	JSON: `{"lines": ["line1", "line2", "line3"]}`
`display/clear`	PC → ESP32	Empty / any
`display/status`	ESP32 → PC	JSON: `{"ready": true}`
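
As an illustration of the payload shapes in the table, the sketch below encodes a display/text message and checks a display/status message. The function names are hypothetical, and the three-line limit is taken from the display layout described under Refresh Strategy.

```python
import json


def build_text_payload(lines, max_lines=3):
    """Encode up to max_lines display lines as the display/text JSON payload."""
    if len(lines) > max_lines:
        raise ValueError(f"expected at most {max_lines} lines, got {len(lines)}")
    # Compact separators keep the payload small for the ESP32's MQTT buffer.
    return json.dumps({"lines": list(lines)}, separators=(",", ":")).encode("utf-8")


def parse_status_payload(payload):
    """Return True only if the display/status payload reports ready: true."""
    try:
        data = json.loads(payload)
    except (ValueError, UnicodeDecodeError):
        return False
    return isinstance(data, dict) and data.get("ready") is True


payload = build_text_payload(["line1", "line2", "line3"])
```

Rejecting malformed status messages instead of raising keeps the bridge running if the ESP32 publishes something unexpected.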

Key Files

bridge/bridge.py — Main Python bridge. Connects to Whisper WS, buffers text, publishes to MQTT.
esp32/src/main.cpp — ESP32 firmware. WiFi + MQTT client, renders text to e-ink.
esp32/platformio.ini — Board and library config.

Design Constraints & Decisions

Refresh Strategy

Full e-ink refresh: ~1.5–2 seconds, with a visible full-screen flash. Acceptable for sentence-level updates.
Partial refresh: ~300ms, some ghosting. Use for rapid updates if needed.
Current approach: buffer until sentence boundary or 4-second silence, then push full screen update.
Display shows 3 lines of text. New text pushes old text up; oldest line drops off.
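
The scrolling behaviour (new text pushes old text up, oldest line drops off) maps naturally onto a bounded deque. A minimal sketch, with RollingDisplay as a hypothetical helper name:

```python
from collections import deque


class RollingDisplay:
    """Keep only the most recent lines; older lines scroll off the top."""

    def __init__(self, max_lines=3):
        self._lines = deque(maxlen=max_lines)

    def push(self, new_lines):
        # deque(maxlen=...) silently discards the oldest entries when full.
        self._lines.extend(new_lines)
        return list(self._lines)


display = RollingDisplay()
display.push(["THE LORD IS", "MY SHEPHERD"])
visible = display.push(["I SHALL NOT WANT", "HE MAKES ME LIE DOWN"])
```

After the second push only the three newest lines remain, so "THE LORD IS" has scrolled off.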

Text Formatting

Target font size: large enough to read at 3–5 metres (approx 36–48px equivalent at 800px wide)
At ~800px wide with a large font: approximately 35–45 characters per line
Lines wrap at word boundaries
All caps optional for readability (configurable)
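
One way to apply these rules on the PC side is Python's standard textwrap module. The sketch below assumes a 40-character line (the middle of the 35–45 range above); wrap_for_display is a hypothetical helper name.

```python
import textwrap


def wrap_for_display(text, width=40, all_caps=False):
    """Wrap text at word boundaries to at most `width` characters per line."""
    if all_caps:
        text = text.upper()
    # break_long_words=True splits any single word longer than `width`,
    # so no line can overflow the display.
    return textwrap.wrap(text, width=width, break_long_words=True)


lines = wrap_for_display(
    "For God so loved the world that he gave his only Son", width=40
)
```

Character counts are only a proxy for pixel width with a proportional font, so the width value would likely need tuning against the actual font on the display.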

Audio Input

Preferred: direct feed from church mixing desk (line-in or USB audio interface)
Fallback: USB condenser microphone near pulpit/lectern
Whisper performs best with clean, low-noise input
VAD (Voice Activity Detection) in WhisperLiveKit handles silence automatically

Network

All on local WiFi (church LAN or dedicated hotspot)
MQTT broker on Windows PC
ESP32 connects to same WiFi network
Static IP recommended for ESP32 to avoid reconnection delays

Bridge Script Logic (bridge.py)

1. Connect to Mosquitto MQTT broker
2. Connect to WhisperLiveKit WebSocket (ws://localhost:8000/asr)
3. Receive partial transcription updates
4. Accumulate words into a sentence buffer
5. On sentence-end signal (or timeout):
   a. Word-wrap text into lines (max ~40 chars each)
   b. Maintain a rolling 3-line buffer
   c. Publish JSON payload to MQTT topic display/text
6. On connection loss: re-establish the WS and MQTT connections
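
Steps 3 to 5 can be sketched as a small buffer class with no network code. This is an illustration under stated assumptions, not the actual bridge.py: it assumes each message is a dict with text and is_final fields (as noted under Development Notes), that text carries only newly recognised words rather than the cumulative transcript, and that sentence-ending punctuation counts as a boundary. SentenceBuffer, feed and poll are hypothetical names.

```python
import time


class SentenceBuffer:
    """Accumulate transcript fragments and release them as whole sentences."""

    def __init__(self, silence_timeout=4.0, clock=time.monotonic):
        self.silence_timeout = silence_timeout
        self._clock = clock  # injectable so the timeout can be tested
        self._parts = []
        self._last_update = clock()

    def feed(self, message):
        """Add one message; return the finished sentence, or None if still buffering."""
        text = message.get("text", "").strip()
        if text:
            self._parts.append(text)
            self._last_update = self._clock()
        ends_sentence = text.endswith((".", "?", "!"))
        if self._parts and (message.get("is_final") or ends_sentence):
            return self._flush()
        return None

    def poll(self):
        """Call periodically; flushes buffered text after a long enough silence."""
        silent_for = self._clock() - self._last_update
        if self._parts and silent_for >= self.silence_timeout:
            return self._flush()
        return None

    def _flush(self):
        sentence = " ".join(self._parts)
        self._parts = []
        return sentence


# Usage with a fake clock so the example does not depend on real time.
now = [0.0]
buf = SentenceBuffer(clock=lambda: now[0])
first = buf.feed({"text": "Let us", "is_final": False})
second = buf.feed({"text": "pray.", "is_final": True})
```

Each flushed sentence would then be word-wrapped, added to the rolling 3-line buffer and published to display/text. Keeping the buffer free of I/O makes the flush rules testable without a running Whisper server or broker.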

Known Issues / Open Questions

Partial refresh ghosting threshold — how many partial refreshes before forcing a full clear?
Whisper latency with large-v3 model — may need to test medium or distil-large-v3 for lower latency
Line-wrapping edge cases with long words (e.g. proper nouns, scripture references)
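ESP32 RAM: WROOM has 520KB of on-chip SRAM, and a full 800×480 1-bit frame buffer alone needs 48,000 bytes; large font bitmaps may require PSRAM (use an S3 module fitted with PSRAM)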
WiFi reconnection handling in firmware — need watchdog/retry logic
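
For the ghosting question, one common policy is to count partial refreshes and force a full refresh after every N of them. The sketch below shows only the counting logic, in Python for easy experimentation; it would need porting to the C++ firmware. RefreshPolicy is a hypothetical name and the max_partials value is a placeholder, since the right threshold is still open.

```python
class RefreshPolicy:
    """Choose partial or full refresh, forcing a full one after N partials."""

    def __init__(self, max_partials=5):
        # max_partials is a placeholder; the right value is an open question.
        self.max_partials = max_partials
        self._partials_since_full = 0

    def next_refresh(self):
        if self._partials_since_full >= self.max_partials:
            self._partials_since_full = 0
            return "full"
        self._partials_since_full += 1
        return "partial"


policy = RefreshPolicy(max_partials=3)
sequence = [policy.next_refresh() for _ in range(8)]
```

With max_partials=3 the sequence is three partial refreshes followed by one full refresh, repeating.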

Development Notes

WhisperLiveKit WebSocket returns incremental JSON with text and is_final fields
GxEPD2 supports both full and partial refresh; partial requires setPartialWindow()
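PubSubClient's default packet buffer is small (256 bytes in version 2.8, 128 bytes in earlier releases), so it should be raised to leave headroom for the JSON payloads (~200 bytes, plus topic and header)
Call client.setBufferSize(512) during PubSubClient setup (setBufferSize is available from version 2.8)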
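
To check that a payload fits the configured buffer, the size of the MQTT PUBLISH packet can be estimated on the PC. This sketch assumes QoS 0 and MQTT 3.1.1 framing, and a worst case of three 45-character lines. publish_packet_size is a hypothetical helper; PubSubClient also reserves a few header bytes internally, so treat the result as an estimate.

```python
import json


def publish_packet_size(topic, payload):
    """Size in bytes of a QoS 0 MQTT 3.1.1 PUBLISH packet."""
    topic_bytes = topic.encode("utf-8")
    # Variable header (2-byte topic length + topic) plus payload.
    remaining = 2 + len(topic_bytes) + len(payload)
    # The "remaining length" field uses 7 bits per byte, 1 to 4 bytes total.
    length_field = 1
    value = remaining
    while value > 127:
        value //= 128
        length_field += 1
    # One byte of fixed header, then the length field, then the rest.
    return 1 + length_field + remaining


# Worst case assumed here: three lines of 45 characters each.
worst_case = json.dumps({"lines": ["W" * 45] * 3}).encode("utf-8")
packet_size = publish_packet_size("display/text", worst_case)
fits = packet_size <= 512
```

Under these assumptions the worst-case packet exceeds 128 bytes but stays well under 512, which is consistent with the setBufferSize(512) setting.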

Testing Approach

Test Whisper server standalone: speak into mic, verify text in browser at http://localhost:8000
Test MQTT: use MQTT Explorer or mosquitto_sub to verify bridge publishes correctly
Test ESP32 display: send static MQTT messages manually before connecting bridge
End-to-end: full pipeline test with recorded sermon audio
In-situ trial: 1–2 Sunday services with a volunteer congregant providing feedback
