# CLAUDE.md: AI Development Context

This file provides context for AI-assisted development sessions on the Church Live Transcription Display project.

## Project Summary

A live captioning system for deaf/hard-of-hearing church congregants. A Windows PC captures audio, transcribes it locally using Whisper (GPU-accelerated), and sends rolling text over MQTT to an ESP32 driving a large e-ink display. No cloud services. No internet required during operation.

## Architecture

```
[Audio source]
    ↓ (USB mic or mixer line-in)
[Windows PC]
    ├── WhisperLiveKit (local Whisper server, WebSocket on port 8000)
    ├── Mosquitto MQTT broker (port 1883)
    └── bridge.py (Python: WS subscriber → sentence buffer → MQTT publisher)
    ↓ (WiFi / MQTT topic: display/text)
[ESP32-WROOM or S3]
    └── Waveshare 7.5" V2 e-ink display (SPI, GxEPD2 library)
```

## PC Environment

- OS: Windows 10/11
- GPU: NVIDIA RTX series (tested with RTX 4070 Super)
- Python: 3.11+
- MQTT broker: Mosquitto (localhost:1883)
- Whisper server: WhisperLiveKit (`wlk --model large-v3 --language en`)
- Whisper WebSocket: `ws://localhost:8000/asr`

## ESP32 Environment

- Board: ESP32-WROOM-32 or ESP32-S3
- Framework: Arduino (via PlatformIO)
- Display: Waveshare 7.5" V2 (800×480 pixels, black/white)
- Display library: GxEPD2
- MQTT library: PubSubClient
- Build tool: PlatformIO (VSCode)

### SPI Wiring (Waveshare 7.5" V2 to ESP32)

| Display Pin | ESP32 Pin |
|---|---|
| BUSY | GPIO 4 |
| RST | GPIO 16 |
| DC | GPIO 17 |
| CS | GPIO 5 |
| CLK | GPIO 18 |
| DIN | GPIO 23 |
| GND | GND |
| VCC | 3.3V |

## MQTT Topics

| Topic | Direction | Payload |
|---|---|---|
| `display/text` | PC → ESP32 | JSON: `{"lines": ["line1", "line2", "line3"]}` |
| `display/clear` | PC → ESP32 | Empty / any |
| `display/status` | ESP32 → PC | JSON: `{"ready": true}` |

## Key Files

- `bridge/bridge.py`: Main Python bridge. Connects to Whisper WS, buffers text, publishes to MQTT.
- `esp32/src/main.cpp`: ESP32 firmware.
  WiFi + MQTT client, renders text to e-ink.
- `esp32/platformio.ini`: Board and library config.

## Design Constraints & Decisions

### Refresh Strategy

- Full e-ink refresh: ~1.5–2 seconds with flash. Acceptable for sentence-level updates.
- Partial refresh: ~300ms, some ghosting. Use for rapid updates if needed.
- **Current approach**: buffer until sentence boundary or 4-second silence, then push full screen update.
- Display shows 3 lines of text. New text pushes old text up; oldest line drops off.

### Text Formatting

- Target font size: large enough to read at 3–5 metres (approx 36–48px equivalent at 800px wide)
- At ~800px wide with a large font: approximately 35–45 characters per line
- Lines wrap at word boundaries
- All caps optional for readability (configurable)

### Audio Input

- Preferred: direct feed from church mixing desk (line-in or USB audio interface)
- Fallback: USB condenser microphone near pulpit/lectern
- Whisper performs best with clean, low-noise input
- VAD (Voice Activity Detection) in WhisperLiveKit handles silence automatically

### Network

- All on local WiFi (church LAN or dedicated hotspot)
- MQTT broker on Windows PC
- ESP32 connects to same WiFi network
- Static IP recommended for ESP32 to avoid reconnection delays

## Bridge Script Logic (bridge.py)

```
1. Connect to Mosquitto MQTT broker
2. Connect to WhisperLiveKit WebSocket (ws://localhost:8000/asr)
3. Receive partial transcription updates
4. Accumulate words into a sentence buffer
5. On sentence-end signal (or timeout):
   a. Word-wrap text into lines (max ~40 chars each)
   b. Maintain a rolling 3-line buffer
   c. Publish JSON payload to MQTT topic display/text
6. On reconnect events: re-establish WS and MQTT connections
```

## Known Issues / Open Questions

- [ ] Partial refresh ghosting threshold: how many partial refreshes before forcing a full clear?
- [ ] Whisper latency with large-v3 model: may need to test medium or distil-large-v3 for lower latency
- [ ] Line-wrapping edge cases with long words (e.g. proper nouns, scripture references)
- [ ] ESP32 RAM: WROOM has 520KB; large font bitmaps may require PSRAM (use S3 variant)
- [ ] WiFi reconnection handling in firmware: need watchdog/retry logic

## Development Notes

- WhisperLiveKit WebSocket returns incremental JSON with `text` and `is_final` fields
- GxEPD2 supports both full and partial refresh; partial requires `setPartialWindow()`
- PubSubClient's default packet buffer is small (256 bytes in 2.8, 128 bytes in older releases) and must hold the MQTT header and topic as well as the payload, so increase it to leave headroom for the JSON payloads (~200 bytes)
- Use `client.setBufferSize(512)` in PubSubClient setup

## Testing Approach

1. Test Whisper server standalone: speak into mic, verify text in browser at `http://localhost:8000`
2. Test MQTT: use MQTT Explorer or `mosquitto_sub` to verify bridge publishes correctly
3. Test ESP32 display: send static MQTT messages manually before connecting bridge
4. End-to-end: full pipeline test with recorded sermon audio
5. In-situ trial: 1–2 Sunday services with a volunteer congregant providing feedback
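
For testing steps 2 and 3, and as a starting point for steps 5a to 5c of the bridge logic, the wrap-and-roll behaviour can be sketched in Python. This is a minimal sketch, not the actual contents of `bridge.py`: the class name `RollingDisplay`, its parameters, and the use of the standard-library `textwrap` module are assumptions made for illustration. Only the ~40-character width, the 3-line buffer, the optional all-caps mode, and the `{"lines": [...]}` payload shape come from this document.

```python
import json
import textwrap
from collections import deque

MAX_CHARS = 40  # "max ~40 chars each" from the bridge logic
MAX_LINES = 3   # the display shows 3 lines of text


class RollingDisplay:
    """Hypothetical helper: wraps sentences and keeps the newest lines."""

    def __init__(self, max_lines=MAX_LINES, max_chars=MAX_CHARS, upper=False):
        self.max_chars = max_chars
        self.upper = upper  # optional all-caps mode
        # A deque with maxlen silently drops the oldest line when full,
        # which gives the "new text pushes old text up" behaviour.
        self.lines = deque(maxlen=max_lines)

    def push_sentence(self, sentence):
        """Wrap one finished sentence and return the JSON payload string."""
        text = sentence.upper() if self.upper else sentence
        # Wrap at word boundaries; words longer than max_chars are split
        # (one possible answer to the long-word open question).
        wrapped = textwrap.wrap(text, width=self.max_chars,
                                break_long_words=True)
        self.lines.extend(wrapped)
        return json.dumps({"lines": list(self.lines)})


if __name__ == "__main__":
    display = RollingDisplay()
    for sentence in [
        "The Lord is my shepherd; I shall not want.",
        "He makes me lie down in green pastures.",
    ]:
        print(display.push_sentence(sentence))
```

The returned string can be published to `display/text` by whatever MQTT client the bridge uses, or pasted into `mosquitto_pub -t display/text -m '...'` for a manual display test. With three full 40-character ASCII lines the serialized payload is about 143 bytes, which is in line with the ~200-byte estimate in the Development Notes.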