CLAUDE.md — AI Development Context

This file provides context for AI-assisted development sessions on the Church Live Transcription Display project.

Project Summary

A live captioning system for deaf/hard-of-hearing church congregants. A Windows PC captures audio, transcribes it locally using Whisper (GPU-accelerated), and sends rolling text over MQTT to an ESP32 driving a large e-ink display. No cloud services. No internet required during operation.

Architecture

[Audio source]
     ↓ (USB mic or mixer line-in)
[Windows PC]
  ├── WhisperLiveKit (local Whisper server, WebSocket on port 8000)
  ├── Mosquitto MQTT broker (port 1883)
  └── bridge.py (Python: WS subscriber → sentence buffer → MQTT publisher)
     ↓ (WiFi / MQTT topic: display/text)
[ESP32-WROOM or S3]
  └── Waveshare 7.5" V2 e-ink display (SPI, GxEPD2 library)

PC Environment

  • OS: Windows 10/11
  • GPU: NVIDIA RTX series (tested with RTX 4070 Super)
  • Python: 3.11+
  • MQTT broker: Mosquitto (localhost:1883)
  • Whisper server: WhisperLiveKit (wlk --model large-v3 --language en)
  • Whisper WebSocket: ws://localhost:8000/asr

ESP32 Environment

  • Board: ESP32-WROOM-32 or ESP32-S3
  • Framework: Arduino (via PlatformIO)
  • Display: Waveshare 7.5" V2 (800×480 pixels, black/white)
  • Display library: GxEPD2
  • MQTT library: PubSubClient
  • Build tool: PlatformIO (VSCode)

SPI Wiring (Waveshare 7.5" V2 to ESP32)

Display Pin   ESP32 Pin
BUSY          GPIO 4
RST           GPIO 16
DC            GPIO 17
CS            GPIO 5
CLK           GPIO 18
DIN           GPIO 23
GND           GND
VCC           3.3V

MQTT Topics

Topic            Direction    Payload
display/text     PC → ESP32   JSON: {"lines": ["line1", "line2", "line3"]}
display/clear    PC → ESP32   Empty / any
display/status   ESP32 → PC   JSON: {"ready": true}
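
The payload shapes in the table above can be produced and checked with a few lines of standard-library Python. This is a minimal sketch, not the code in bridge.py: the function names and the three-line limit check are illustrative assumptions.

```python
import json

TEXT_TOPIC = "display/text"
STATUS_TOPIC = "display/status"
MAX_LINES = 3  # the display shows three lines of text


def build_text_payload(lines):
    """Encode up to MAX_LINES strings as the display/text JSON payload."""
    if len(lines) > MAX_LINES:
        raise ValueError(f"expected at most {MAX_LINES} lines, got {len(lines)}")
    # Compact separators keep the payload small for the ESP32's MQTT buffer.
    return json.dumps({"lines": list(lines)}, separators=(",", ":")).encode("utf-8")


def parse_status_payload(payload):
    """Return True only if a display/status payload reports {"ready": true}."""
    try:
        message = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return isinstance(message, dict) and message.get("ready") is True
```

Treating a malformed status message as "not ready" keeps the bridge from publishing to a display whose state it cannot confirm.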

Key Files

  • bridge/bridge.py — Main Python bridge. Connects to Whisper WS, buffers text, publishes to MQTT.
  • esp32/src/main.cpp — ESP32 firmware. WiFi + MQTT client, renders text to e-ink.
  • esp32/platformio.ini — Board and library config.

Design Constraints & Decisions

Refresh Strategy

  • Full e-ink refresh: ~1.5–2 seconds, with a visible flash. Acceptable for sentence-level updates.
  • Partial refresh: ~300 ms, with some ghosting. Use for rapid updates if needed.
  • Current approach: buffer until a sentence boundary or 4 seconds of silence, then push a full-screen update.
  • Display shows 3 lines of text. New text pushes old text up; oldest line drops off.
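
The "sentence boundary or 4-second silence" rule can be sketched as a small buffer class. Two assumptions here are not from the project: a sentence boundary is detected by trailing punctuation, and the class and method names are hypothetical. The clock is injectable so the timeout can be tested without waiting.

```python
import time

SENTENCE_ENDINGS = (".", "!", "?")  # assumption: punctuation marks a boundary
SILENCE_TIMEOUT_S = 4.0             # from the refresh strategy above


class SentenceBuffer:
    """Accumulates words and reports when they should go to the display."""

    def __init__(self, timeout_s=SILENCE_TIMEOUT_S, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._words = []
        self._last_update = clock()

    def add(self, text):
        """Append newly transcribed words and record when they arrived."""
        words = text.split()
        if words:
            self._words.extend(words)
            self._last_update = self._clock()

    def should_flush(self):
        """True at a sentence boundary or after timeout_s without new words."""
        if not self._words:
            return False
        if self._words[-1].endswith(SENTENCE_ENDINGS):
            return True
        return self._clock() - self._last_update >= self._timeout_s

    def flush(self):
        """Return the buffered text and empty the buffer."""
        text = " ".join(self._words)
        self._words.clear()
        return text
```

A caller would poll should_flush() after each transcription update and on a periodic timer, so that the silence timeout fires even when no new text arrives.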

Text Formatting

  • Target font size: large enough to read at 3–5 metres (approx. 36–48 px glyph height on the 800 px wide display)
  • At 800 px wide with a font in that range: approximately 35–45 characters per line
  • Lines wrap at word boundaries
  • All caps optional for readability (configurable)
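
The characters-per-line figure follows from simple arithmetic: display width divided by average glyph width. The sketch below assumes an average glyph width of half the font height, which is a rough rule of thumb and varies by typeface; the function names are illustrative. Wrapping uses the standard textwrap module.

```python
import textwrap

DISPLAY_WIDTH_PX = 800  # Waveshare 7.5" V2 is 800x480


def estimate_chars_per_line(font_px, glyph_width_ratio=0.5, width_px=DISPLAY_WIDTH_PX):
    """Estimate characters per line from font height and an assumed glyph width ratio."""
    return int(width_px // (font_px * glyph_width_ratio))


def format_lines(text, max_chars=40, all_caps=False):
    """Wrap text at word boundaries, optionally upper-casing it first."""
    if all_caps:
        text = text.upper()
    # break_long_words=True splits any single word longer than max_chars,
    # so no line can exceed the character budget.
    return textwrap.wrap(text, width=max_chars, break_long_words=True)
```

Under the 0.5 ratio, a 40 px font gives 40 characters per line, and the 36–48 px range gives 44 down to 33. Measuring the actual font on the display is the only reliable check.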

Audio Input

  • Preferred: direct feed from church mixing desk (line-in or USB audio interface)
  • Fallback: USB condenser microphone near pulpit/lectern
  • Whisper performs best with clean, low-noise input
  • VAD (Voice Activity Detection) in WhisperLiveKit handles silence automatically

Network

  • All on local WiFi (church LAN or dedicated hotspot)
  • MQTT broker on Windows PC
  • ESP32 connects to same WiFi network
  • Static IP recommended for ESP32 to avoid reconnection delays

Bridge Script Logic (bridge.py)

1. Connect to Mosquitto MQTT broker
2. Connect to WhisperLiveKit WebSocket (ws://localhost:8000/asr)
3. Receive partial transcription updates
4. Accumulate words into a sentence buffer
5. On sentence-end signal (or timeout):
   a. Word-wrap text into lines (max ~40 chars each)
   b. Maintain a rolling 3-line buffer
   c. Publish JSON payload to MQTT topic display/text
6. On disconnect: re-establish the WS and MQTT connections
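
Steps 3 to 5 can be sketched without any network code, which also makes them easy to unit-test. This is not the actual bridge.py: the class name is hypothetical, and it assumes each WebSocket message is a JSON object with text and is_final fields and that is_final marks the sentence end. Publishing the returned payload is left to the caller.

```python
import json
import textwrap
from collections import deque

MAX_CHARS = 40  # approximate line width from step 5a
MAX_LINES = 3   # rolling buffer size from step 5b


class RollingDisplay:
    """Keeps the most recent MAX_LINES wrapped lines of finalised text."""

    def __init__(self, max_chars=MAX_CHARS, max_lines=MAX_LINES):
        self._max_chars = max_chars
        # A bounded deque drops the oldest line automatically.
        self._lines = deque(maxlen=max_lines)

    def handle_message(self, raw):
        """Process one WebSocket message; return a JSON payload string or None."""
        message = json.loads(raw)
        text = (message.get("text") or "").strip()
        # Ignore partial hypotheses; only finalised text reaches the display.
        if not text or not message.get("is_final"):
            return None
        self._lines.extend(textwrap.wrap(text, width=self._max_chars))
        return json.dumps({"lines": list(self._lines)})
```

Each non-None return value is what would be published to display/text. Keeping the WebSocket and MQTT clients outside this class means step 6 (reconnection) can be handled separately without losing the line buffer.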

Known Issues / Open Questions

  • Partial refresh ghosting threshold — how many partial refreshes before forcing a full clear?
  • Whisper latency with large-v3 model — may need to test medium or distil-large-v3 for lower latency
  • Line-wrapping edge cases with long words (e.g. proper nouns, scripture references)
  • ESP32 RAM: the WROOM has 520 KB of SRAM; large font bitmaps may require PSRAM (use an S3 variant fitted with PSRAM)
  • WiFi reconnection handling in firmware — need watchdog/retry logic

Development Notes

  • WhisperLiveKit WebSocket returns incremental JSON with text and is_final fields
  • GxEPD2 supports both full and partial refresh; partial requires setPartialWindow()
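  • PubSubClient's default maximum packet size is 256 bytes including header in current releases (128 bytes in older ones), which leaves little or no headroom for the JSON payloads (~200 bytes), so it must be increased
  • Use client.setBufferSize(512) in PubSubClient setup (setBufferSize() was added in PubSubClient 2.8)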
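
To check how much buffer a given message needs, the on-wire size of an MQTT 3.1.1 PUBLISH packet can be computed directly: a 1-byte fixed header, a variable-length "remaining length" field, a 2-byte topic length, the topic, an optional 2-byte packet identifier, and the payload. The sketch below assumes QoS 0 and a worst case of three 40-character ASCII lines; the function names are illustrative.

```python
import json


def remaining_length_size(n):
    """Bytes needed to encode n as an MQTT variable-length integer."""
    size = 1
    while n > 127:
        n //= 128
        size += 1
    return size


def publish_packet_size(topic, payload, qos=0):
    """Total size in bytes of an MQTT 3.1.1 PUBLISH packet."""
    topic_bytes = len(topic.encode("utf-8"))
    packet_id = 2 if qos > 0 else 0  # packet identifier only for QoS 1 and 2
    remaining = 2 + topic_bytes + packet_id + len(payload)
    return 1 + remaining_length_size(remaining) + remaining


# Assumed worst case: three full 40-character lines of ASCII text.
worst_case = json.dumps({"lines": ["X" * 40] * 3}).encode("utf-8")
size = publish_packet_size("display/text", worst_case)
print(f"payload {len(worst_case)} bytes, packet {size} bytes")
```

Under these assumptions the packet comes to 160 bytes. Non-ASCII characters take more room (json.dumps escapes each as \uXXXX by default), so the result should be compared against the firmware's setBufferSize() value with some margin.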

Testing Approach

  1. Test Whisper server standalone: speak into mic, verify text in browser at http://localhost:8000
  2. Test MQTT: use MQTT Explorer or mosquitto_sub to verify bridge publishes correctly
  3. Test ESP32 display: send static MQTT messages manually before connecting bridge
  4. End-to-end: full pipeline test with recorded sermon audio
  5. In-situ trial: 1–2 Sunday services with a volunteer congregant providing feedback