# CLAUDE.md: AI Development Context

This file provides context for AI-assisted development sessions on the Church Live Transcription Display project.

## Project Summary

A live captioning system for deaf/hard-of-hearing church congregants. A Windows PC captures audio, transcribes it locally using Whisper (GPU-accelerated), and sends rolling text over MQTT to an ESP32 driving a large e-ink display. No cloud services. No internet required during operation.

## Architecture

```
[Audio source]
 ↓ (USB mic or mixer line-in)
[Windows PC]
 ├── WhisperLiveKit (local Whisper server, WebSocket on port 8000)
 ├── Mosquitto MQTT broker (port 1883)
 └── bridge.py (Python: WS subscriber → sentence buffer → MQTT publisher)
 ↓ (WiFi / MQTT topic: display/text)
[ESP32-WROOM or S3]
 └── Waveshare 7.5" V2 e-ink display (SPI, GxEPD2 library)
```

## PC Environment

- OS: Windows 10/11
- GPU: NVIDIA RTX series (tested with RTX 4070 Super)
- Python: 3.11+
- MQTT broker: Mosquitto on port 1883 (bridge.py connects via `localhost`; the ESP32 connects via the PC's LAN IP address)
- Whisper server: WhisperLiveKit (`wlk --model large-v3 --language en`)
- Whisper WebSocket: `ws://localhost:8000/asr`

## ESP32 Environment

- Board: ESP32-WROOM-32 or ESP32-S3
- Framework: Arduino (via PlatformIO)
- Display: Waveshare 7.5" V2 (800×480 pixels, black/white)
- Display library: GxEPD2
- MQTT library: PubSubClient
- Build tool: PlatformIO (VSCode)

### SPI Wiring (Waveshare 7.5" V2 to ESP32-WROOM-32)

| Display Pin | ESP32 Pin |
|---|---|
| BUSY | GPIO 4 |
| RST | GPIO 16 |
| DC | GPIO 17 |
| CS | GPIO 5 |
| CLK | GPIO 18 |
| DIN | GPIO 23 |
| GND | GND |
| VCC | 3.3V |

CS, CLK and DIN are the classic ESP32's default VSPI pins. The ESP32-S3 has no GPIO 22–25, so an S3 build needs its own pin map (CLK/DIN set via `SPI.begin()`, the control pins via the GxEPD2 display constructor).

## MQTT Topics

| Topic | Direction | Payload |
|---|---|---|
| `display/text` | PC → ESP32 | JSON: `{"lines": ["line1", "line2", "line3"]}` |
| `display/clear` | PC → ESP32 | Empty / any |
| `display/status` | ESP32 → PC | JSON: `{"ready": true}` |

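A minimal Python sketch of encoding and checking the `display/text` payload, assuming the schema in the table above. The function names and the three-line check are illustrative, not taken from `bridge.py`; compact separators simply keep the payload small for the ESP32's MQTT buffer.

```python
import json

MAX_LINES = 3  # the display shows three lines (see Refresh Strategy)


def encode_display_payload(lines: list[str]) -> str:
    """Build the display/text JSON body: {"lines": [...]}."""
    if len(lines) > MAX_LINES:
        raise ValueError(f"expected at most {MAX_LINES} lines, got {len(lines)}")
    # Compact separators and raw UTF-8 (no \uXXXX escapes) keep the payload small.
    return json.dumps({"lines": lines}, ensure_ascii=False, separators=(",", ":"))


def decode_display_payload(payload: bytes) -> list[str]:
    """Parse a display/text payload, e.g. when checking messages from a test subscriber."""
    data = json.loads(payload.decode("utf-8"))
    lines = data.get("lines") if isinstance(data, dict) else None
    if not isinstance(lines, list) or not all(isinstance(s, str) for s in lines):
        raise ValueError("payload must be an object with a 'lines' list of strings")
    return lines


if __name__ == "__main__":
    body = encode_display_payload(["In the beginning was the Word,", "and the Word was with God."])
    print(body)
    print(decode_display_payload(body.encode("utf-8")))
```
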
## Key Files

- `bridge/bridge.py`: Main Python bridge. Connects to the Whisper WebSocket, buffers text, publishes to MQTT.
- `esp32/src/main.cpp`: ESP32 firmware. WiFi and MQTT client; renders text to the e-ink display.
- `esp32/platformio.ini`: Board and library config.

## Design Constraints & Decisions

### Refresh Strategy
- Full e-ink refresh: ~1.5–2 seconds, with visible flashing. Acceptable for sentence-level updates.
- Partial refresh: ~300 ms, with some ghosting. Use for rapid updates if needed.
- **Current approach**: buffer until a sentence boundary or 4 seconds of silence, then push a full-screen update.
- Display shows 3 lines of text. New text pushes old text up; the oldest line drops off.

### Text Formatting
- Target font size: large enough to read at 3–5 metres (roughly a 36–48 px font on the 800×480 panel)
- At 800 px wide with a large font: approximately 35–45 characters per line
- Lines wrap at word boundaries
- All caps optional for readability (configurable)

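Word wrapping can lean on Python's standard `textwrap` module. In this sketch the 40-character width is an assumption drawn from the estimate above and `wrap_for_display` is a hypothetical helper; over-long tokens are split rather than allowed to overflow the line.

```python
import textwrap

LINE_WIDTH = 40  # assumed characters per line, from the 35–45 estimate


def wrap_for_display(text: str, width: int = LINE_WIDTH, all_caps: bool = False) -> list[str]:
    """Word-wrap transcript text into display lines at word boundaries."""
    text = " ".join(text.split())  # collapse newlines and repeated spaces from the transcript
    if all_caps:
        text = text.upper()
    return textwrap.wrap(
        text,
        width=width,
        break_long_words=True,   # a single token longer than `width` is split, not allowed to overflow
        break_on_hyphens=False,  # don't split at hyphens, e.g. keeps "13:4-7" together
    )
```

Character counts only approximate pixel widths with a proportional font; measuring strings in firmware (for example with Adafruit GFX's `getTextBounds()`, which GxEPD2 inherits) would be exact.
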
### Audio Input
- Preferred: direct feed from church mixing desk (line-in or USB audio interface)
- Fallback: USB condenser microphone near pulpit/lectern
- Whisper performs best with clean, low-noise input
- VAD (Voice Activity Detection) in WhisperLiveKit handles silence automatically

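### Network
- All on local WiFi (church LAN or dedicated hotspot)
- MQTT broker on the Windows PC. Mosquitto 2.x listens only on localhost unless a listener is configured, so its config needs a `listener 1883` line (plus `allow_anonymous true` or user authentication), and Windows Firewall must allow inbound TCP 1883
- ESP32 connects to the same WiFi network
- Give the PC a static IP (or a DHCP reservation) so the broker address configured in the firmware stays valid; a static IP on the ESP32 also avoids DHCP delays on reconnect
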
## Bridge Script Logic (bridge.py)

```
1. Connect to Mosquitto MQTT broker
2. Connect to WhisperLiveKit WebSocket (ws://localhost:8000/asr)
3. Receive partial transcription updates
4. Accumulate words into a sentence buffer
5. On sentence-end signal (or timeout):
   a. Word-wrap text into lines (max ~40 chars each)
   b. Maintain a rolling 3-line buffer
   c. Publish JSON payload to MQTT topic display/text
6. On disconnect: re-establish WS and MQTT connections
```

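Steps 5a–5c can be sketched with a `collections.deque` whose `maxlen` drops the oldest line automatically. `RollingDisplay` is a hypothetical name, and the MQTT publish call is deliberately left out so the logic can be tested without a broker.

```python
import json
import textwrap
from collections import deque


class RollingDisplay:
    """Holds the lines currently on screen; new lines push the oldest off the top."""

    def __init__(self, max_lines: int = 3, width: int = 40) -> None:
        self.width = width
        # A deque with maxlen discards from the left when a line is appended to a full buffer.
        self.lines: deque[str] = deque(maxlen=max_lines)

    def push_sentence(self, sentence: str) -> str:
        """Wrap a finished sentence (5a), scroll it in (5b), and return the JSON to publish (5c)."""
        for line in textwrap.wrap(" ".join(sentence.split()), width=self.width):
            self.lines.append(line)
        return json.dumps({"lines": list(self.lines)})


if __name__ == "__main__":
    screen = RollingDisplay()
    for sentence in ["Good morning, everyone.", "Please turn to Psalm 23.", "Let us pray."]:
        print(screen.push_sentence(sentence))
```

One consequence of this design: a sentence that wraps to more than three lines scrolls its own first lines off before they are ever shown, so the bridge may need to publish long sentences in three-line pages.
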
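## Known Issues / Open Questions

- [ ] Partial refresh ghosting threshold: how many partial refreshes before forcing a full clear?
- [ ] Whisper latency with the large-v3 model: may need to test medium or distil-large-v3 for lower latency
- [ ] Line-wrapping edge cases with long words (e.g. proper nouns, scripture references)
- [ ] ESP32 RAM: the WROOM has 520 KB of SRAM, of which only part is usable heap. A full 800×480 1-bit frame buffer needs 48,000 bytes and `const` font bitmaps are stored in flash, so PSRAM (e.g. an S3 module with PSRAM) should only be needed if measured heap runs short
- [ ] WiFi reconnection handling in firmware: needs watchdog/retry logic
- [ ] Audio path into WhisperLiveKit: the `/asr` WebSocket transcribes the audio sent on that same connection (the browser UI streams the microphone), so bridge.py may need to capture and send the audio itself rather than only reading results; verify against the installed version
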
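One way to frame the ghosting question is as a counter that forces a full refresh after N partial ones. In practice the policy would live in the firmware (switching between GxEPD2's `setFullWindow()` and `setPartialWindow()`); it is written in Python here only to match the other examples. `RefreshPolicy` is hypothetical and `max_partials=5` is a placeholder, not a measured value.

```python
from dataclasses import dataclass


@dataclass
class RefreshPolicy:
    """Decides between a partial and a full e-ink refresh for each update."""

    max_partials: int = 5   # placeholder: the right threshold is the open question above
    partials_since_full: int = 0
    need_full: bool = True  # assumed: start from a clean full refresh after boot

    def next_refresh(self) -> str:
        """Return "full" or "partial" for the next screen update."""
        if self.need_full or self.partials_since_full >= self.max_partials:
            self.need_full = False
            self.partials_since_full = 0
            return "full"
        self.partials_since_full += 1
        return "partial"
```

Trying a few values of `max_partials` during the in-situ trial would turn the open question into a measured answer.
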
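## Development Notes

- WhisperLiveKit's WebSocket sends incremental JSON updates. Confirm the field names against the installed version before parsing: recent releases put committed segments in a `lines` array and unconfirmed text in `buffer_transcription`, rather than `text`/`is_final` fields
- GxEPD2 supports both full and partial refresh; partial refresh requires `setPartialWindow()`
- PubSubClient's buffer (`MQTT_MAX_PACKET_SIZE`) defaults to 256 bytes in v2.8 (128 bytes in earlier releases) and must hold the whole packet, topic included; larger incoming messages are silently dropped. The JSON payloads here run to roughly 200 bytes
- Call `client.setBufferSize(512)` (PubSubClient v2.8+) in setup
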
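To check whether a payload fits PubSubClient's buffer, count the whole MQTT 3.1.1 PUBLISH packet: one fixed-header byte, a 1–4 byte Remaining Length field, a 2-byte topic length plus the topic, a 2-byte packet identifier when QoS > 0, then the payload. A sketch of that arithmetic (the function names are hypothetical):

```python
import json


def remaining_length_bytes(n: int) -> int:
    """Size of MQTT's variable-length Remaining Length field: 7 bits per byte, 1-4 bytes."""
    count = 1
    while n >= 128:
        n //= 128
        count += 1
    return count


def publish_packet_size(topic: str, payload: bytes, qos: int = 0) -> int:
    """Total bytes of an MQTT 3.1.1 PUBLISH packet carrying `payload` on `topic`."""
    remaining = 2 + len(topic.encode("utf-8")) + len(payload)  # topic length prefix + topic + payload
    if qos > 0:
        remaining += 2  # packet identifier is only present for QoS 1 and 2
    return 1 + remaining_length_bytes(remaining) + remaining  # + fixed header byte + length field


def fits_buffer(topic: str, payload: bytes, buffer_size: int = 512, qos: int = 0) -> bool:
    """True if the whole packet fits a PubSubClient buffer of `buffer_size` bytes."""
    return publish_packet_size(topic, payload, qos) <= buffer_size


if __name__ == "__main__":
    # Three 40-character lines, serialised with json.dumps defaults.
    payload = json.dumps({"lines": ["x" * 40] * 3}).encode("utf-8")
    size = publish_packet_size("display/text", payload)
    print(f"payload {len(payload)} B, packet {size} B, fits 256: {size <= 256}")
```

For example, three 40-character lines serialise to 143 bytes of JSON, giving a 160-byte packet on `display/text`: inside 256 bytes but over the old 128-byte default. Non-ASCII characters or longer lines grow this quickly, which is why 512 leaves useful headroom.
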
## Testing Approach

1. Test the Whisper server standalone: speak into the mic and verify text appears in the browser at `http://localhost:8000`
2. Test MQTT: use MQTT Explorer or `mosquitto_sub` to verify the bridge publishes correctly
3. Test the ESP32 display: send static MQTT messages manually before connecting the bridge
4. End-to-end: full pipeline test with recorded sermon audio
5. In-situ trial: 1–2 Sunday services with a volunteer congregant providing feedback
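
For step 3, a handful of fixed payloads can be written to files and sent with `mosquitto_pub -f`, which publishes a file's contents as the message and sidesteps quoting JSON in the Windows shell. The payload set and the `mqtt_test_payloads` directory name are hypothetical; adjust them to the layouts worth checking.

```python
import json
from pathlib import Path

# Hypothetical fixtures for manual display tests; each becomes <name>.json.
TEST_PAYLOADS: dict[str, list[str]] = {
    "short": ["Welcome to the service."],
    "full_screen": [
        "The Lord is my shepherd;",
        "I shall not want.",
        "He makes me lie down in green pastures.",
    ],
    "scripture_ref": ["Reading from 1 Corinthians 13:4-7"],
    "over_length": ["An over-length line that is deliberately wider than forty characters"],
    "empty": [],
}


def write_test_payloads(out_dir: Path) -> list[Path]:
    """Write each fixture as a display/text JSON file and return the file paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, lines in TEST_PAYLOADS.items():
        path = out_dir / f"{name}.json"
        path.write_text(json.dumps({"lines": lines}), encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    for path in write_test_payloads(Path("mqtt_test_payloads")):
        # mosquitto_pub -f sends the file's contents as the message body.
        print(f"mosquitto_pub -h localhost -t display/text -f {path}")
```

Running it on the PC prints one command per fixture; when publishing from another machine, replace `localhost` with the broker's LAN address.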