1 month ago · 6d417b9941
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,129 @@
 
				+# CLAUDE.md — AI Development Context
			
 
				+
			
 
				+This file provides context for AI-assisted development sessions on the Church Live Transcription Display project.
			
 
				+
			
 
				+## Project Summary
			
 
				+
			
 
				+A live captioning system for deaf/hard-of-hearing church congregants. A Windows PC captures audio, transcribes it locally using Whisper (GPU-accelerated), and sends rolling text over MQTT to an ESP32 driving a large e-ink display. No cloud services. No internet required during operation.
			
 
				+
			
 
				+## Architecture
			
 
				+
			
 
				+```
			
 
				+[Audio source]
			
 
				+     ↓ (USB mic or mixer line-in)
			
 
				+[Windows PC]
			
 
				+  ├── WhisperLiveKit (local Whisper server, WebSocket on port 8000)
			
 
				+  ├── Mosquitto MQTT broker (port 1883)
			
 
				+  └── bridge.py (Python: WS subscriber → sentence buffer → MQTT publisher)
			
 
				+     ↓ (WiFi / MQTT topic: display/text)
			
 
				+[ESP32-WROOM or S3]
			
 
				+  └── Waveshare 7.5" V2 e-ink display (SPI, GxEPD2 library)
			
 
				+```
			
 
				+
			
 
				+## PC Environment
			
 
				+
			
 
				+- OS: Windows 10/11
			
 
				+- GPU: NVIDIA RTX series (tested with RTX 4070 Super)
			
 
				+- Python: 3.11+
			
 
				+- MQTT broker: Mosquitto (localhost:1883)
			
 
				+- Whisper server: WhisperLiveKit (`wlk --model large-v3 --language en`)
			
 
				+- Whisper WebSocket: `ws://localhost:8000/asr`
			
 
				+
			
 
				+## ESP32 Environment
			
 
				+
			
 
				+- Board: ESP32-WROOM-32 or ESP32-S3
			
 
				+- Framework: Arduino (via PlatformIO)
			
 
				+- Display: Waveshare 7.5" V2 (800×480 pixels, black/white)
			
 
				+- Display library: GxEPD2
			
 
				+- MQTT library: PubSubClient
			
 
				+- Build tool: PlatformIO (VSCode)
			
 
				+
			
 
				+### SPI Wiring (Waveshare 7.5" V2 to ESP32)
			
 
				+
			
 
				+| Display Pin | ESP32 Pin |
			
 
				+|---|---|
			
 
				+| BUSY | GPIO 4 |
			
 
				+| RST | GPIO 16 |
			
 
				+| DC | GPIO 17 |
			
 
				+| CS | GPIO 5 |
			
 
				+| CLK | GPIO 18 |
			
 
				+| DIN | GPIO 23 |
			
 
				+| GND | GND |
			
 
				+| VCC | 3.3V |
			
 
				+
			
 
				+## MQTT Topics
			
 
				+
			
 
				+| Topic | Direction | Payload |
			
 
				+|---|---|---|
			
 
				+| `display/text` | PC → ESP32 | JSON: `{"lines": ["line1", "line2", "line3"]}` |
			
 
				+| `display/clear` | PC → ESP32 | Empty / any |
			
 
				+| `display/status` | ESP32 → PC | JSON: `{"ready": true}` |
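
As an illustration of the `display/text` payload in the table above, the following is a minimal sketch of how the bridge might build and check that JSON. The helper names `encode_lines` and `decode_lines` and the `MAX_LINES` limit are hypothetical and not part of the project yet.

```python
import json

TEXT_TOPIC = "display/text"   # topic name taken from the table above
MAX_LINES = 3                 # assumption: matches the 3-line display layout


def encode_lines(lines):
    """Serialise display lines into the display/text JSON payload."""
    if len(lines) > MAX_LINES:
        raise ValueError(f"expected at most {MAX_LINES} lines, got {len(lines)}")
    # Compact separators keep the payload small for the ESP32's MQTT buffer.
    return json.dumps({"lines": list(lines)}, separators=(",", ":")).encode("utf-8")


def decode_lines(payload):
    """Parse a display/text payload back into a list of strings."""
    data = json.loads(payload.decode("utf-8"))
    lines = data["lines"]
    if not all(isinstance(line, str) for line in lines):
        raise ValueError("every entry in 'lines' must be a string")
    return lines
```

The same round trip can be used in a unit test before any hardware is connected, for example `decode_lines(encode_lines(["line1", "line2"]))`.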
			
 
				+
			
 
				+## Key Files
			
 
				+
			
 
				+- `bridge/bridge.py` — Main Python bridge. Connects to Whisper WS, buffers text, publishes to MQTT.
			
 
				+- `esp32/src/main.cpp` — ESP32 firmware. WiFi + MQTT client, renders text to e-ink.
			
 
				+- `esp32/platformio.ini` — Board and library config.
			
 
				+
			
 
				+## Design Constraints & Decisions
			
 
				+
			
 
				+### Refresh Strategy
			
 
				+- Full e-ink refresh: ~1.5–2 seconds with flash. Acceptable for sentence-level updates.
			
 
				+- Partial refresh: ~300ms, some ghosting. Use for rapid updates if needed.
			
 
				+- **Current approach**: buffer until sentence boundary or 4-second silence, then push full screen update.
			
 
				+- Display shows 3 lines of text. New text pushes old text up; oldest line drops off.
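
The buffering rule above (flush on a sentence boundary or after 4 seconds of silence) could be expressed as a small decision function. This is a sketch only: `should_flush`, the punctuation set, and the use of the last update time as a proxy for silence are all assumptions.

```python
SENTENCE_END = (".", "!", "?")   # assumption: punctuation marks a sentence boundary
SILENCE_TIMEOUT_S = 4.0          # the 4-second silence window described above


def should_flush(buffer_text, last_update_time, now):
    """Return True when the buffered text should be pushed to the display."""
    text = buffer_text.strip()
    if not text:
        return False                      # nothing to show yet
    if text.endswith(SENTENCE_END):
        return True                       # sentence boundary reached
    # No boundary yet: flush only if no new text has arrived for the timeout.
    return (now - last_update_time) >= SILENCE_TIMEOUT_S
```

In the bridge, `now` and `last_update_time` would typically come from `time.monotonic()` so that system clock changes do not trigger a spurious flush.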
			
 
				+
			
 
				+### Text Formatting
			
 
				+- Target font size: large enough to read at 3–5 metres (approx 36–48px equivalent at 800px wide)
			
 
				+- At ~800px wide with a large font: approximately 35–45 characters per line
			
 
				+- Lines wrap at word boundaries
			
 
				+- All caps optional for readability (configurable)
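
A minimal sketch of the wrapping rules above, using Python's standard `textwrap` module. The function name `wrap_for_display` and the 40-character default are assumptions (40 is simply the midpoint of the 35–45 estimate).

```python
import textwrap

MAX_CHARS = 40  # assumption: midpoint of the 35-45 character estimate


def wrap_for_display(text, width=MAX_CHARS, all_caps=False):
    """Wrap text at word boundaries; optionally upper-case it."""
    if all_caps:
        text = text.upper()
    # break_long_words=True splits any single word longer than `width`,
    # so no returned line can overflow the display.
    return textwrap.wrap(text, width=width, break_long_words=True)
```

For example, `wrap_for_display("The Lord is my shepherd; I shall not want.", width=20)` yields three lines, each at most 20 characters long.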
			
 
				+
			
 
				+### Audio Input
			
 
				+- Preferred: direct feed from church mixing desk (line-in or USB audio interface)
			
 
				+- Fallback: USB condenser microphone near pulpit/lectern
			
 
				+- Whisper performs best with clean, low-noise input
			
 
				+- VAD (Voice Activity Detection) in WhisperLiveKit handles silence automatically
			
 
				+
			
 
				+### Network
			
 
				+- All on local WiFi (church LAN or dedicated hotspot)
			
 
				+- MQTT broker on Windows PC
			
 
				+- ESP32 connects to same WiFi network
			
 
				+- Static IP recommended for ESP32 to avoid reconnection delays
			
 
				+
			
 
				+## Bridge Script Logic (bridge.py)
			
 
				+
			
 
				+```
			
 
				+1. Connect to Mosquitto MQTT broker
			
 
				+2. Connect to WhisperLiveKit WebSocket (ws://localhost:8000/asr)
			
 
				+3. Receive partial transcription updates
			
 
				+4. Accumulate words into a sentence buffer
			
 
				+5. On sentence-end signal (or timeout):
			
 
				+   a. Word-wrap text into lines (max ~40 chars each)
			
 
				+   b. Maintain a rolling 3-line buffer
			
 
				+   c. Publish JSON payload to MQTT topic display/text
			
 
				+6. On reconnect events: re-establish WS and MQTT connections
			
 
				+```
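
Steps 3 to 5 above can be sketched without any network code, which makes them easy to test in isolation. The class below is a hypothetical starting point, not the actual `bridge.py`. It assumes incoming messages carry `text` and `is_final` fields and treats `is_final` as the sentence-end signal; the timeout path and the WebSocket/MQTT connections are left out.

```python
import json
import textwrap
from collections import deque


class DisplayBuffer:
    """Network-free core of steps 3-5: take final text, wrap it, keep 3 lines."""

    def __init__(self, max_lines=3, width=40):
        self.width = width
        # A deque with maxlen drops the oldest line automatically.
        self.lines = deque(maxlen=max_lines)

    def handle_message(self, raw):
        """Process one WebSocket message; return a JSON payload or None."""
        msg = json.loads(raw)
        # Assumption: partial hypotheses are skipped and only final text is shown.
        if not msg.get("is_final"):
            return None
        return self.push_sentence(msg.get("text", ""))

    def push_sentence(self, sentence):
        """Wrap a sentence, roll the line buffer, and build the MQTT payload."""
        wrapped = textwrap.wrap(sentence, width=self.width)
        if not wrapped:
            return None                      # empty text: nothing to publish
        self.lines.extend(wrapped)
        return json.dumps({"lines": list(self.lines)})
```

The returned string is what step 5c would publish to `display/text`; a `None` result means there is nothing new to send.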
			
 
				+
			
 
				+## Known Issues / Open Questions
			
 
				+
			
 
				+- [ ] Partial refresh ghosting threshold — how many partial refreshes before forcing a full clear?
			
 
				+- [ ] Whisper latency with large-v3 model — may need to test medium or distil-large-v3 for lower latency
			
 
				+- [ ] Line-wrapping edge cases with long words (e.g. proper nouns, scripture references)
			
 
				+- [ ] ESP32 RAM: WROOM-32 has 520 KB of SRAM and no PSRAM; large font bitmaps may require PSRAM (use an S3 module that includes PSRAM)
			
 
				+- [ ] WiFi reconnection handling in firmware — need watchdog/retry logic
			
 
				+
			
 
				+## Development Notes
			
 
				+
			
 
				+- WhisperLiveKit WebSocket returns incremental JSON with `text` and `is_final` fields
			
 
				+- GxEPD2 supports both full and partial refresh; partial requires `setPartialWindow()`
			
 
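				+- PubSubClient's default packet buffer is 256 bytes (128 bytes in releases before 2.8) and must hold the MQTT header and topic as well as the payload; increase it to handle JSON payloads (~200 bytes) with headroom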
			
 
				+- Use `client.setBufferSize(512)` in PubSubClient setup
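
To see how close a ~200-byte payload comes to the client's buffer limit, the on-the-wire size of an MQTT 3.1.1 PUBLISH packet can be estimated as below. The function names are hypothetical, and PubSubClient may reserve a few extra header bytes internally, so treat the result as a lower bound and leave headroom.

```python
def remaining_length_bytes(n):
    """Number of bytes MQTT 3.1.1 uses to encode a remaining length of n."""
    if not 0 <= n <= 268_435_455:
        raise ValueError("remaining length out of range")
    count = 1
    while n >= 128:          # each encoded byte carries 7 bits of the length
        n //= 128
        count += 1
    return count


def publish_packet_size(topic, payload, qos=0):
    """Total on-the-wire size of an MQTT 3.1.1 PUBLISH packet, in bytes."""
    topic_bytes = topic.encode("utf-8")
    # Variable header: 2-byte topic length prefix + topic, then the payload.
    remaining = 2 + len(topic_bytes) + len(payload)
    if qos > 0:
        remaining += 2       # packet identifier is present only for QoS 1 and 2
    # Fixed header: 1 control byte + the variable-length "remaining length".
    return 1 + remaining_length_bytes(remaining) + remaining
```

With the topic `display/text` and a 200-byte payload at QoS 0 this gives 217 bytes, comfortably inside a 512-byte buffer.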
			
 
				+
			
 
				+## Testing Approach
			
 
				+
			
 
				+1. Test Whisper server standalone: speak into mic, verify text in browser at `http://localhost:8000`
			
 
				+2. Test MQTT: use MQTT Explorer or `mosquitto_sub` to verify bridge publishes correctly
			
 
				+3. Test ESP32 display: send static MQTT messages manually before connecting bridge
			
 
				+4. End-to-end: full pipeline test with recorded sermon audio
			
 
				+5. In-situ trial: 1–2 Sunday services with a volunteer congregant providing feedback
			
--- a/README.md
+++ b/README.md
@@ -0,0 +1,94 @@
 
				+# Church Live Transcription Display
			
 
				+
			
 
				+A live speech-to-text system for deaf and hard-of-hearing congregants, displaying real-time transcriptions on an e-ink screen driven by an ESP32 microcontroller.
			
 
				+
			
 
				+## Overview
			
 
				+
			
 
				+Audio from the church service is captured on a Windows PC, transcribed locally using a Whisper-based model, and the resulting text is pushed over WiFi/MQTT to an ESP32 that drives a large e-ink display. The display is readable in any lighting condition and requires no screen brightness — ideal for a church environment.
			
 
				+
			
 
				+```
			
 
				+[Microphone / Mixer] → [Windows PC: Whisper transcription]
			
 
				+                              ↓ MQTT over WiFi
			
 
				+                       [ESP32 + e-ink display]
			
 
				+```
			
 
				+
			
 
				+## Goals
			
 
				+
			
 
				+- Real-time captions with minimal latency (target: < 3 seconds end-to-end)
			
 
				+- Runs entirely on local network — no cloud dependency
			
 
				+- Readable at distance with large font (36–48pt equivalent)
			
 
				+- Displays 3–4 lines of rolling text, clearing as new content arrives
			
 
				+- Low cost, low complexity hardware
			
 
				+
			
 
				+## System Components
			
 
				+
			
 
				+### PC Side (Windows)
			
 
				+- **WhisperLiveKit** — local GPU-accelerated speech-to-text server with WebSocket output
			
 
				+- **Mosquitto** — lightweight MQTT broker running on the same PC
			
 
				+- **Python bridge script** — subscribes to Whisper WebSocket, buffers sentences, publishes to MQTT
			
 
				+
			
 
				+### ESP32 Side
			
 
				+- **ESP32 (WROOM or S3)** — WiFi-enabled microcontroller
			
 
				+- **Waveshare e-ink display** — 7.5" V2 (800×480) or larger
			
 
				+- **GxEPD2 / Adafruit GFX** — display driver library
			
 
				+- **PubSubClient** — MQTT client library for Arduino
			
 
				+
			
 
				+## Hardware
			
 
				+
			
 
				+| Component | Model | Notes |
			
 
				+|---|---|---|
			
 
				+| Microcontroller | ESP32-WROOM-32 or ESP32-S3 | S3 preferred; modules with PSRAM provide more RAM |
			
 
				+| Display | Waveshare 7.5" V2 e-Paper | 800×480, supports partial refresh |
			
 
				+| PC | Windows 10/11 with NVIDIA GPU | RTX series recommended for real-time Whisper |
			
 
				+| Microphone | USB condenser or mixer feed | Direct mixer feed preferred for clean audio |
			
 
				+
			
 
				+## Key Design Decisions
			
 
				+
			
 
				+### Text Buffering Strategy
			
 
				+E-ink full refresh takes ~1–2 seconds. Rather than updating word-by-word, the bridge script accumulates text until a natural pause (sentence boundary or ~5 seconds of speech), then pushes a complete "screen's worth" as a single MQTT message. Partial refresh mode can be used for faster but ghosting-prone updates.
			
 
				+
			
 
				+### Display Layout
			
 
				+- 3–4 lines of large text
			
 
				+- Most recent line at bottom, scrolling upward
			
 
				+- Simple black-on-white, no graphics
			
 
				+- Font size prioritises readability at 3–5 metres
			
 
				+
			
 
				+### Network
			
 
				+- All traffic stays on local WiFi network
			
 
				+- MQTT broker on PC (port 1883)
			
 
				+- No internet required during operation
			
 
				+
			
 
				+## Repository Structure
			
 
				+
			
 
				+```
			
 
				+/
			
 
				+├── README.md               — This file
			
 
				+├── CLAUDE.md               — AI assistant context for development sessions
			
 
				+├── bridge/
			
 
				+│   └── bridge.py           — Python: Whisper WebSocket → MQTT publisher
			
 
				+├── esp32/
			
 
				+│   ├── src/
			
 
				+│   │   └── main.cpp        — ESP32 Arduino firmware
			
 
				+│   └── platformio.ini      — PlatformIO build config
			
 
				+└── docs/
			
 
				+    ├── hardware-wiring.md  — SPI pin connections for display
			
 
				+    └── setup.md            — Installation and configuration guide
			
 
				+```
			
 
				+
			
 
				+## Reference Projects
			
 
				+
			
 
				+- [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit) — real-time Whisper server with WebSocket API
			
 
				+- [reriiasu/speech-to-text](https://github.com/reriiasu/speech-to-text) — faster-whisper with VAD and WebSocket output
			
 
				+- [denwilliams/mqtt-epaper](https://github.com/denwilliams/mqtt-epaper) — ESP32 e-paper display driven by MQTT JSON
			
 
				+- [cuci90/epaper_mqtt_esp32](https://github.com/cuci90/epaper_mqtt_esp32) — ESP32 Waveshare display MQTT template
			
 
				+
			
 
				+## Status
			
 
				+
			
 
				+🟡 **Planning / Research phase**
			
 
				+
			
 
				+- [x] Architecture defined
			
 
				+- [ ] Python bridge script
			
 
				+- [ ] ESP32 firmware
			
 
				+- [ ] Hardware wiring and test
			
 
				+- [ ] End-to-end integration test
			
 
				+- [ ] Church deployment trial