1 month ago · 6d417b9941
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,129 @@
 
															+# CLAUDE.md — AI Development Context
														
 
															+
														
 
															+This file provides context for AI-assisted development sessions on the Church Live Transcription Display project.
														
 
															+
														
 
															+## Project Summary
														
 
															+
														
 
															+A live captioning system for deaf/hard-of-hearing church congregants. A Windows PC captures audio, transcribes it locally using Whisper (GPU-accelerated), and sends rolling text over MQTT to an ESP32 driving a large e-ink display. No cloud services. No internet required during operation.
														
 
															+
														
 
															+## Architecture
														
 
															+
														
 
															+```
														
 
															+[Audio source]
														
 
															+     ↓ (USB mic or mixer line-in)
														
 
															+[Windows PC]
														
 
															+  ├── WhisperLiveKit (local Whisper server, WebSocket on port 8000)
														
 
															+  ├── Mosquitto MQTT broker (port 1883)
														
 
															+  └── bridge.py (Python: WS subscriber → sentence buffer → MQTT publisher)
														
 
															+     ↓ (WiFi / MQTT topic: display/text)
														
 
															+[ESP32-WROOM or S3]
														
 
															+  └── Waveshare 7.5" V2 e-ink display (SPI, GxEPD2 library)
														
 
															+```
														
 
															+
														
 
															+## PC Environment
														
 
															+
														
 
															+- OS: Windows 10/11
														
 
															+- GPU: NVIDIA RTX series (tested with RTX 4070 Super)
														
 
															+- Python: 3.11+
														
 
															+- MQTT broker: Mosquitto (localhost:1883)
														
 
															+- Whisper server: WhisperLiveKit (`wlk --model large-v3 --language en`)
														
 
															+- Whisper WebSocket: `ws://localhost:8000/asr`
														
 
															+
														
 
															+## ESP32 Environment
														
 
															+
														
 
															+- Board: ESP32-WROOM-32 or ESP32-S3
														
 
															+- Framework: Arduino (via PlatformIO)
														
 
															+- Display: Waveshare 7.5" V2 (800×480 pixels, black/white)
														
 
															+- Display library: GxEPD2
														
 
															+- MQTT library: PubSubClient
														
 
															+- Build tool: PlatformIO (VSCode)
														
 
															+
														
 
															+### SPI Wiring (Waveshare 7.5" V2 to ESP32)
														
 
															+
														
 
															+| Display Pin | ESP32 Pin |
														
 
															+|---|---|
														
 
															+| BUSY | GPIO 4 |
														
 
															+| RST | GPIO 16 |
														
 
															+| DC | GPIO 17 |
														
 
															+| CS | GPIO 5 |
														
 
															+| CLK | GPIO 18 |
														
 
															+| DIN | GPIO 23 |
														
 
															+| GND | GND |
														
 
															+| VCC | 3.3V |
														
 
															+
														
 
															+## MQTT Topics
														
 
															+
														
 
															+| Topic | Direction | Payload |
														
 
															+|---|---|---|
														
 
															+| `display/text` | PC → ESP32 | JSON: `{"lines": ["line1", "line2", "line3"]}` |
														
 
															+| `display/clear` | PC → ESP32 | Empty / any |
														
 
															+| `display/status` | ESP32 → PC | JSON: `{"ready": true}` |
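
As a rough illustration of the payloads in the table above, the bridge side could encode and decode them as follows. This is a minimal sketch: the helper names `make_text_payload` and `is_display_ready` and the `MAX_LINES` constant are hypothetical and not taken from `bridge.py`.

```python
import json

MAX_LINES = 3  # assumption: matches the three-line display layout


def make_text_payload(lines):
    """Encode up to MAX_LINES strings as a display/text JSON payload."""
    if len(lines) > MAX_LINES:
        raise ValueError(f"expected at most {MAX_LINES} lines, got {len(lines)}")
    return json.dumps({"lines": list(lines)}).encode("utf-8")


def is_display_ready(payload):
    """Return True only if a display/status payload is {"ready": true}."""
    try:
        return json.loads(payload).get("ready") is True
    except (ValueError, AttributeError):
        # Not valid JSON, or valid JSON that is not an object.
        return False
```

Treating malformed status messages as "not ready" keeps the bridge from publishing to a display whose state it cannot confirm.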
														
 
															+
														
 
															+## Key Files
														
 
															+
														
 
															+- `bridge/bridge.py` — Main Python bridge. Connects to Whisper WS, buffers text, publishes to MQTT.
														
 
															+- `esp32/src/main.cpp` — ESP32 firmware. WiFi + MQTT client, renders text to e-ink.
														
 
															+- `esp32/platformio.ini` — Board and library config.
														
 
															+
														
 
															+## Design Constraints & Decisions
														
 
															+
														
 
															+### Refresh Strategy
														
 
															+- Full e-ink refresh: ~1.5–2 seconds with flash. Acceptable for sentence-level updates.
														
 
															+- Partial refresh: ~300ms, some ghosting. Use for rapid updates if needed.
														
 
															+- **Current approach**: buffer until sentence boundary or 4-second silence, then push full screen update.
														
 
															+- Display shows 3 lines of text. New text pushes old text up; oldest line drops off.
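
The flush rule in the current approach could be sketched as below. The class name `SentenceBuffer` is hypothetical, and detecting a sentence boundary by trailing `.`, `!` or `?` is an assumption; abbreviations such as "St." would trigger an early flush under this rule.

```python
SENTENCE_END = (".", "!", "?")  # assumption: punctuation marks a sentence boundary
SILENCE_TIMEOUT_S = 4.0         # the 4-second silence from the current approach


class SentenceBuffer:
    """Collects words until a sentence boundary or a silence timeout."""

    def __init__(self, silence_timeout=SILENCE_TIMEOUT_S):
        self.silence_timeout = silence_timeout
        self.words = []
        self.last_word_at = None

    def add(self, text, now):
        """Append the words in text and record when they arrived."""
        words = text.split()
        if words:
            self.words.extend(words)
            self.last_word_at = now

    def should_flush(self, now):
        """True when the buffer ends a sentence or has gone quiet too long."""
        if not self.words:
            return False
        if self.words[-1].endswith(SENTENCE_END):
            return True
        return now - self.last_word_at >= self.silence_timeout

    def flush(self):
        """Return the buffered text as one string and reset the buffer."""
        sentence = " ".join(self.words)
        self.words = []
        self.last_word_at = None
        return sentence
```

Passing `now` in explicitly, for example from `time.monotonic()`, keeps the timing logic testable without real delays.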
														
 
															+
														
 
															+### Text Formatting
														
 
															+- Target font size: large enough to read at 3–5 metres (approx 36–48px equivalent at 800px wide)
														
 
															+- At ~800px wide with a large font: approximately 35–45 characters per line
														
 
															+- Lines wrap at word boundaries
														
 
															+- All caps optional for readability (configurable)
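
Word-boundary wrapping with an optional all-caps mode can be done with Python's standard `textwrap` module. The sketch below assumes a 40-character line, the midpoint of the estimate above; the function name `wrap_for_display` is hypothetical.

```python
import textwrap


def wrap_for_display(text, width=40, upper=False):
    """Split text into lines of at most `width` characters at word boundaries.

    Words longer than `width` are split across lines so that no line
    overflows the display.
    """
    if upper:
        text = text.upper()
    return textwrap.wrap(text, width=width, break_long_words=True)
```

For example, `wrap_for_display("The Lord is my shepherd; I shall not want.", width=20)` returns `["The Lord is my", "shepherd; I shall", "not want."]`. A fixed character count only approximates pixel width unless the firmware uses a monospaced font.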
														
 
															+
														
 
															+### Audio Input
														
 
															+- Preferred: direct feed from church mixing desk (line-in or USB audio interface)
														
 
															+- Fallback: USB condenser microphone near pulpit/lectern
														
 
															+- Whisper performs best with clean, low-noise input
														
 
															+- VAD (Voice Activity Detection) in WhisperLiveKit handles silence automatically
														
 
															+
														
 
															+### Network
														
 
															+- All on local WiFi (church LAN or dedicated hotspot)
														
 
															+- MQTT broker on Windows PC
														
 
															+- ESP32 connects to same WiFi network
														
 
															+- Static IP recommended for ESP32 to avoid reconnection delays
														
 
															+
														
 
															+## Bridge Script Logic (bridge.py)
														
 
															+
														
 
															+```
														
 
															+1. Connect to Mosquitto MQTT broker
														
 
															+2. Connect to WhisperLiveKit WebSocket (ws://localhost:8000/asr)
														
 
															+3. Receive partial transcription updates
														
 
															+4. Accumulate words into a sentence buffer
														
 
															+5. On sentence-end signal (or timeout):
														
 
															+   a. Word-wrap text into lines (max ~40 chars each)
														
 
															+   b. Maintain a rolling 3-line buffer
														
 
															+   c. Publish JSON payload to MQTT topic display/text
														
 
															+6. On reconnect events: re-establish WS and MQTT connections
														
 
															+```
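
Steps 3 to 5 above might look roughly like this in Python. It is a sketch under stated assumptions: the incoming message shape (`text` and `is_final` fields) follows the Development Notes in this file and has not been checked against WhisperLiveKit, and the names `RollingDisplay` and `handle_message` are hypothetical.

```python
import json
import textwrap
from collections import deque


class RollingDisplay:
    """Keeps the newest `max_lines` wrapped lines; older lines drop off."""

    def __init__(self, max_lines=3, width=40):
        self.width = width
        self.lines = deque(maxlen=max_lines)

    def push_sentence(self, sentence):
        """Wrap a sentence, append its lines, and return the JSON payload."""
        for line in textwrap.wrap(sentence, width=self.width):
            self.lines.append(line)
        return json.dumps({"lines": list(self.lines)})


def handle_message(raw, display):
    """Return a display/text payload for final results, else None."""
    msg = json.loads(raw)
    text = msg.get("text", "").strip()
    if msg.get("is_final") and text:
        return display.push_sentence(text)
    return None
```

Using `deque(maxlen=3)` gives the "new text pushes old text up" behaviour without any explicit trimming code.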
														
 
															+
														
 
															+## Known Issues / Open Questions
														
 
															+
														
 
															+- [ ] Partial refresh ghosting threshold — how many partial refreshes before forcing a full clear?
														
 
															+- [ ] Whisper latency with large-v3 model — may need to test medium or distil-large-v3 for lower latency
														
 
															+- [ ] Line-wrapping edge cases with long words (e.g. proper nouns, scripture references)
														
 
															+- [ ] ESP32 RAM: WROOM has 520 KB of SRAM and no PSRAM; large font bitmaps may require PSRAM (e.g. an ESP32-S3 module that includes it)
														
 
															+- [ ] WiFi reconnection handling in firmware — need watchdog/retry logic
														
 
															+
														
 
															+## Development Notes
														
 
															+
														
 
															+- WhisperLiveKit WebSocket returns incremental JSON with `text` and `is_final` fields
														
 
															+- GxEPD2 supports both full and partial refresh; partial requires `setPartialWindow()`
														
 
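															+- PubSubClient's packet buffer defaults to 256 bytes in v2.8+ (128 bytes in older releases), and that limit includes the MQTT header and topic; increase it so JSON payloads (~200 bytes) fit with headroom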
														
 
															+- Use `client.setBufferSize(512)` in PubSubClient setup
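
To check that a worst-case message fits the 512-byte buffer, the size of an MQTT 3.1.1 PUBLISH packet can be computed on the PC side. This is a sketch; the function names are hypothetical, and the worst case assumes three 40-character ASCII lines.

```python
import json


def remaining_length_size(n):
    """Bytes needed for MQTT's variable-length 'remaining length' field."""
    if not 0 <= n <= 268_435_455:
        raise ValueError("remaining length out of range for MQTT")
    size = 1
    while n >= 128:  # each byte carries 7 bits of length
        n //= 128
        size += 1
    return size


def publish_packet_size(topic, payload, qos=0):
    """Total on-the-wire size of a PUBLISH packet, fixed header included."""
    remaining = 2 + len(topic.encode("utf-8")) + len(payload)
    if qos > 0:
        remaining += 2  # packet identifier is present only for QoS 1 and 2
    return 1 + remaining_length_size(remaining) + remaining


# Worst case under the stated assumptions: three full 40-character lines.
worst_case = json.dumps({"lines": ["x" * 40] * 3}).encode("utf-8")
packet_bytes = publish_packet_size("display/text", worst_case)
```

Here `worst_case` is 143 bytes and `packet_bytes` is 160, well within 512. Non-ASCII text grows the payload, since `json.dumps` escapes it as `\uXXXX` by default.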
														
 
															+
														
 
															+## Testing Approach
														
 
															+
														
 
															+1. Test Whisper server standalone: speak into mic, verify text in browser at `http://localhost:8000`
														
 
															+2. Test MQTT: use MQTT Explorer or `mosquitto_sub` to verify bridge publishes correctly
														
 
															+3. Test ESP32 display: send static MQTT messages manually before connecting bridge
														
 
															+4. End-to-end: full pipeline test with recorded sermon audio
														
 
															+5. In-situ trial: 1–2 Sunday services with a volunteer congregant providing feedback
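
For step 3, a small script can push static messages to the display before the bridge exists. The sketch below assumes the `paho-mqtt` package (not named elsewhere in this file) and a broker on `localhost:1883`; the function names and the test messages are illustrative only.

```python
import json

TOPIC_TEXT = "display/text"
TOPIC_CLEAR = "display/clear"


def build_test_messages():
    """Return (topic, payload) pairs that exercise the display firmware."""
    return [
        (TOPIC_CLEAR, ""),
        (TOPIC_TEXT, json.dumps({"lines": ["TEST LINE ONE"]})),
        (TOPIC_TEXT, json.dumps({"lines": ["line one", "line two", "line three"]})),
        # Full-width lines to check clipping with wide and narrow glyphs.
        (TOPIC_TEXT, json.dumps({"lines": ["W" * 40, "i" * 40, "0123456789" * 4]})),
    ]


def send_all(host="localhost", port=1883, delay_s=5.0):
    """Publish each test message, pausing so each refresh can be observed."""
    # Imported here so the message list can be inspected without paho-mqtt.
    import time
    import paho.mqtt.publish as publish

    for topic, payload in build_test_messages():
        publish.single(topic, payload, hostname=host, port=port)
        time.sleep(delay_s)
```

Calling `send_all()` with Mosquitto running sends the four messages five seconds apart, which leaves time for a full refresh between them.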
														
--- a/README.md
+++ b/README.md
@@ -0,0 +1,94 @@
 
															+# Church Live Transcription Display
														
 
															+
														
 
															+A live speech-to-text system for deaf and hard-of-hearing congregants, displaying real-time transcriptions on an e-ink screen driven by an ESP32 microcontroller.
														
 
															+
														
 
															+## Overview
														
 
															+
														
 
															+Audio from the church service is captured on a Windows PC, transcribed locally using a Whisper-based model, and the resulting text is pushed over WiFi/MQTT to an ESP32 that drives a large e-ink display. The display is readable in any lighting condition and requires no screen brightness — ideal for a church environment.
														
 
															+
														
 
															+```
														
 
															+[Microphone / Mixer] → [Windows PC: Whisper transcription]
														
 
															+                              ↓ MQTT over WiFi
														
 
															+                       [ESP32 + e-ink display]
														
 
															+```
														
 
															+
														
 
															+## Goals
														
 
															+
														
 
															+- Real-time captions with minimal latency (target: < 3 seconds end-to-end)
														
 
															+- Runs entirely on local network — no cloud dependency
														
 
															+- Readable at distance with large font (36–48pt equivalent)
														
 
															+- Displays 3–4 lines of rolling text, clearing as new content arrives
														
 
															+- Low cost, low complexity hardware
														
 
															+
														
 
															+## System Components
														
 
															+
														
 
															+### PC Side (Windows)
														
 
															+- **WhisperLiveKit** — local GPU-accelerated speech-to-text server with WebSocket output
														
 
															+- **Mosquitto** — lightweight MQTT broker running on the same PC
														
 
															+- **Python bridge script** — subscribes to Whisper WebSocket, buffers sentences, publishes to MQTT
														
 
															+
														
 
															+### ESP32 Side
														
 
															+- **ESP32 (WROOM or S3)** — WiFi-enabled microcontroller
														
 
															+- **Waveshare e-ink display** — 7.5" V2 (800×480) or larger
														
 
															+- **GxEPD2 / Adafruit GFX** — display driver library
														
 
															+- **PubSubClient** — MQTT client library for Arduino
														
 
															+
														
 
															+## Hardware
														
 
															+
														
 
															+| Component | Model | Notes |
														
 
															+|---|---|---|
														
 
															+| Microcontroller | ESP32-WROOM-32 or ESP32-S3 | S3 preferred for more RAM |
														
 
															+| Display | Waveshare 7.5" V2 e-Paper | 800×480, supports partial refresh |
														
 
															+| PC | Windows 10/11 with NVIDIA GPU | RTX series recommended for real-time Whisper |
														
 
															+| Microphone | USB condenser or mixer feed | Direct mixer feed preferred for clean audio |
														
 
															+
														
 
															+## Key Design Decisions
														
 
															+
														
 
															+### Text Buffering Strategy
														
 
															+E-ink full refresh takes ~1–2 seconds. Rather than updating word-by-word, the bridge script accumulates text until a natural pause (sentence boundary or ~5 seconds of speech), then pushes a complete "screen's worth" as a single MQTT message. Partial refresh mode can be used for faster but ghosting-prone updates.
														
 
															+
														
 
															+### Display Layout
														
 
															+- 3–4 lines of large text
														
 
															+- Most recent line at bottom, scrolling upward
														
 
															+- Simple black-on-white, no graphics
														
 
															+- Font size prioritises readability at 3–5 metres
														
 
															+
														
 
															+### Network
														
 
															+- All traffic stays on local WiFi network
														
 
															+- MQTT broker on PC (port 1883)
														
 
															+- No internet required during operation
														
 
															+
														
 
															+## Repository Structure
														
 
															+
														
 
															+```
														
 
															+/
														
 
															+├── README.md               — This file
														
 
															+├── CLAUDE.md               — AI assistant context for development sessions
														
 
															+├── bridge/
														
 
															+│   └── bridge.py           — Python: Whisper WebSocket → MQTT publisher
														
 
															+├── esp32/
														
 
															+│   ├── src/
														
 
															+│   │   └── main.cpp        — ESP32 Arduino firmware
														
 
															+│   └── platformio.ini      — PlatformIO build config
														
 
															+└── docs/
														
 
															+    ├── hardware-wiring.md  — SPI pin connections for display
														
 
															+    └── setup.md            — Installation and configuration guide
														
 
															+```
														
 
															+
														
 
															+## Reference Projects
														
 
															+
														
 
															+- [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit) — real-time Whisper server with WebSocket API
														
 
															+- [reriiasu/speech-to-text](https://github.com/reriiasu/speech-to-text) — faster-whisper with VAD and WebSocket output
														
 
															+- [denwilliams/mqtt-epaper](https://github.com/denwilliams/mqtt-epaper) — ESP32 e-paper display driven by MQTT JSON
														
 
															+- [cuci90/epaper_mqtt_esp32](https://github.com/cuci90/epaper_mqtt_esp32) — ESP32 Waveshare display MQTT template
														
 
															+
														
 
															+## Status
														
 
															+
														
 
															+🟡 **Planning / Research phase**
														
 
															+
														
 
															+- [x] Architecture defined
														
 
															+- [ ] Python bridge script
														
 
															+- [ ] ESP32 firmware
														
 
															+- [ ] Hardware wiring and test
														
 
															+- [ ] End-to-end integration test
														
 
															+- [ ] Church deployment trial