Benjamin Harris, 1 month ago
Commit
6d417b9941
2 files changed, 223 additions, 0 deletions
  1. CLAUDE.md (+129, -0)
  2. README.md (+94, -0)

+ 129 - 0
CLAUDE.md

@@ -0,0 +1,129 @@
+# CLAUDE.md — AI Development Context
+
+This file provides context for AI-assisted development sessions on the Church Live Transcription Display project.
+
+## Project Summary
+
+A live captioning system for deaf/hard-of-hearing church congregants. A Windows PC captures audio, transcribes it locally using Whisper (GPU-accelerated), and sends rolling text over MQTT to an ESP32 driving a large e-ink display. No cloud services. No internet required during operation.
+
+## Architecture
+
+```
+[Audio source]
+     ↓ (USB mic or mixer line-in)
+[Windows PC]
+  ├── WhisperLiveKit (local Whisper server, WebSocket on port 8000)
+  ├── Mosquitto MQTT broker (port 1883)
+  └── bridge.py (Python: WS subscriber → sentence buffer → MQTT publisher)
+     ↓ (WiFi / MQTT topic: display/text)
+[ESP32-WROOM or S3]
+  └── Waveshare 7.5" V2 e-ink display (SPI, GxEPD2 library)
+```
+
+## PC Environment
+
+- OS: Windows 10/11
+- GPU: NVIDIA RTX series (tested with RTX 4070 Super)
+- Python: 3.11+
+- MQTT broker: Mosquitto (localhost:1883)
+- Whisper server: WhisperLiveKit (`wlk --model large-v3 --language en`)
+- Whisper WebSocket: `ws://localhost:8000/asr`
+
+## ESP32 Environment
+
+- Board: ESP32-WROOM-32 or ESP32-S3
+- Framework: Arduino (via PlatformIO)
+- Display: Waveshare 7.5" V2 (800×480 pixels, black/white)
+- Display library: GxEPD2
+- MQTT library: PubSubClient
+- Build tool: PlatformIO (VSCode)
+
+### SPI Wiring (Waveshare 7.5" V2 to ESP32)
+
+| Display Pin | ESP32 Pin |
+|---|---|
+| BUSY | GPIO 4 |
+| RST | GPIO 16 |
+| DC | GPIO 17 |
+| CS | GPIO 5 |
+| CLK | GPIO 18 |
+| DIN | GPIO 23 |
+| GND | GND |
+| VCC | 3.3V |
+
+## MQTT Topics
+
+| Topic | Direction | Payload |
+|---|---|---|
+| `display/text` | PC → ESP32 | JSON: `{"lines": ["line1", "line2", "line3"]}` |
+| `display/clear` | PC → ESP32 | Empty / any |
+| `display/status` | ESP32 → PC | JSON: `{"ready": true}` |
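
As a rough illustration of the payload shapes in the table above, the Python sketch below builds a `display/text` message and checks a `display/status` reply. The helper names `build_text_payload` and `parse_status_payload` and the `MAX_LINES` constant are hypothetical (the bridge script does not exist yet), and the compact JSON separators are an assumption made to keep messages small.

```python
import json

MAX_LINES = 3  # assumption: matches the 3-line display described in this file


def build_text_payload(lines):
    """Serialise the newest MAX_LINES lines as a display/text JSON payload."""
    recent = [str(line) for line in lines][-MAX_LINES:]
    # Compact separators drop the optional spaces to save a few bytes.
    text = json.dumps({"lines": recent}, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def parse_status_payload(raw):
    """Return True only for a JSON object whose "ready" field is exactly true."""
    try:
        message = json.loads(raw)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    return isinstance(message, dict) and message.get("ready") is True


print(build_text_payload(["a", "b", "c", "d"]))  # b'{"lines":["b","c","d"]}'
```

Treating anything other than a well-formed `{"ready": true}` as "not ready" keeps the bridge from publishing to a display that has not finished booting.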
+
+## Key Files
+
+- `bridge/bridge.py` — Main Python bridge. Connects to Whisper WS, buffers text, publishes to MQTT.
+- `esp32/src/main.cpp` — ESP32 firmware. WiFi + MQTT client, renders text to e-ink.
+- `esp32/platformio.ini` — Board and library config.
+
+## Design Constraints & Decisions
+
+### Refresh Strategy
+- Full e-ink refresh: ~1.5–2 seconds with flash. Acceptable for sentence-level updates.
+- Partial refresh: ~300ms, some ghosting. Use for rapid updates if needed.
+- **Current approach**: buffer until sentence boundary or 4-second silence, then push full screen update.
+- Display shows 3 lines of text. New text pushes old text up; oldest line drops off.
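
The scrolling behaviour in the last bullet (new text pushes old text up, oldest line drops off) could be sketched with a bounded deque. The class name `RollingLines` is hypothetical, and the default of 3 lines is taken from the bullet above.

```python
from collections import deque


class RollingLines:
    """Keeps only the newest `max_lines` lines, ordered top of screen first."""

    def __init__(self, max_lines=3):
        # A deque with maxlen discards from the left when it overflows.
        self._lines = deque(maxlen=max_lines)

    def push(self, new_lines):
        """Append new lines and return the lines that should now be on screen."""
        self._lines.extend(new_lines)
        return list(self._lines)


screen = RollingLines()
screen.push(["one", "two"])
print(screen.push(["three", "four"]))  # ['two', 'three', 'four']
```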
+
+### Text Formatting
+- Target font size: large enough to read at 3–5 metres (approx 36–48px equivalent at 800px wide)
+- At ~800px wide with a large font: approximately 35–45 characters per line
+- Lines wrap at word boundaries
+- All caps optional for readability (configurable)
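
One way the wrapping rules above might look on the PC side, using Python's standard `textwrap` module. The function name `wrap_for_display` and the default width of 40 (the middle of the 35–45 range) are assumptions, not settled project values.

```python
import textwrap


def wrap_for_display(text, width=40, upper=False):
    """Wrap text at word boundaries into lines of at most `width` characters."""
    if upper:
        text = text.upper()  # optional all-caps mode
    # break_long_words=True splits any single word longer than `width`
    # so that no line can overflow the display.
    return textwrap.wrap(text, width=width, break_long_words=True)


print(wrap_for_display("the quick brown fox jumps over the lazy dog", width=15))
# ['the quick brown', 'fox jumps over', 'the lazy dog']
```

Because over-long words are split rather than left to overflow, a long proper noun costs an awkward line break but never produces a line wider than the screen.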
+
+### Audio Input
+- Preferred: direct feed from church mixing desk (line-in or USB audio interface)
+- Fallback: USB condenser microphone near pulpit/lectern
+- Whisper performs best with clean, low-noise input
+- VAD (Voice Activity Detection) in WhisperLiveKit handles silence automatically
+
+### Network
+- All on local WiFi (church LAN or dedicated hotspot)
+- MQTT broker on Windows PC
+- ESP32 connects to same WiFi network
+- Static IP recommended for ESP32 to avoid reconnection delays
+
+## Bridge Script Logic (bridge.py)
+
+```
+1. Connect to Mosquitto MQTT broker
+2. Connect to WhisperLiveKit WebSocket (ws://localhost:8000/asr)
+3. Receive partial transcription updates
+4. Accumulate words into a sentence buffer
+5. On sentence-end signal (or timeout):
+   a. Word-wrap text into lines (max ~40 chars each)
+   b. Maintain a rolling 3-line buffer
+   c. Publish JSON payload to MQTT topic display/text
+6. On connection loss: re-establish the WS and MQTT connections
+```
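
Steps 4 and 5 of the outline above can be sketched without any network code. This is only an illustration under stated assumptions: each Whisper message is a dict with the `text` and `is_final` fields mentioned under Development Notes, each message carries only new text, and the timeout is the 4-second silence from the Refresh Strategy section. The `CaptionBridge` class and its `publish` callback (a stand-in for an MQTT client's publish call) are hypothetical names.

```python
import json
import textwrap
from collections import deque


class CaptionBridge:
    """Buffers text until a sentence ends or speech pauses, then publishes."""

    def __init__(self, publish, width=40, max_lines=3, silence_s=4.0):
        self._publish = publish            # callable(topic, payload_bytes)
        self._width = width
        self._silence_s = silence_s
        self._pending = []                 # fragments of the current sentence
        self._last_text_at = None          # time the newest fragment arrived
        self._screen = deque(maxlen=max_lines)

    def on_update(self, message, now):
        """Handle one decoded message; `now` is a timestamp in seconds."""
        text = str(message.get("text", "")).strip()
        if text:
            self._pending.append(text)
            self._last_text_at = now
        if message.get("is_final"):
            self.flush()

    def tick(self, now):
        """Call periodically; flushes once nothing new arrived for silence_s."""
        if self._pending and now - self._last_text_at >= self._silence_s:
            self.flush()

    def flush(self):
        """Wrap the buffered sentence, scroll the screen, publish the result."""
        if not self._pending:
            return
        sentence = " ".join(self._pending)
        self._pending.clear()
        self._screen.extend(textwrap.wrap(sentence, width=self._width))
        payload = json.dumps({"lines": list(self._screen)}).encode("utf-8")
        self._publish("display/text", payload)
```

Passing the clock in as `now` rather than reading it inside the class keeps the flush logic testable with recorded sermon audio and fixed timestamps.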
+
+## Known Issues / Open Questions
+
+- [ ] Partial refresh ghosting threshold — how many partial refreshes before forcing a full clear?
+- [ ] Whisper latency with large-v3 model — may need to test medium or distil-large-v3 for lower latency
+- [ ] Line-wrapping edge cases with long words (e.g. proper nouns, scripture references)
+- [ ] ESP32 RAM: the WROOM-32 has 520 KB of internal SRAM, not all of it available to the sketch; large font bitmaps may require PSRAM (use an S3 module that includes it)
+- [ ] WiFi reconnection handling in firmware — need watchdog/retry logic
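
For the RAM question above, a back-of-envelope calculation of the display buffer may help. It assumes a full-frame, 1-bit-per-pixel buffer for the 800×480 panel and compares it with the headline 520 KB figure; the function name is hypothetical, and the memory actually free to a sketch will be lower.

```python
def mono_framebuffer_bytes(width_px, height_px):
    """Bytes for a 1-bit-per-pixel buffer, each row padded to a whole byte."""
    bytes_per_row = (width_px + 7) // 8
    return bytes_per_row * height_px


FRAME_BYTES = mono_framebuffer_bytes(800, 480)  # panel size from this file
SRAM_BYTES = 520 * 1024                         # headline WROOM figure

print(FRAME_BYTES)                                  # 48000
print(round(100 * FRAME_BYTES / SRAM_BYTES, 1))     # percentage of SRAM
```

On these assumptions the frame buffer alone is under a tenth of the headline SRAM. If it still proves too large in practice, GxEPD2's paged drawing mode is an option, since it renders the screen in smaller slices.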
+
+## Development Notes
+
+- WhisperLiveKit WebSocket returns incremental JSON with `text` and `is_final` fields
+- GxEPD2 supports both full and partial refresh; partial requires `setPartialWindow()`
+- PubSubClient's default packet buffer is small (256 bytes in v2.8+, 128 bytes in older releases) and has to hold the MQTT header and topic as well as the JSON payload (~200 bytes), so increase it
+- Use `client.setBufferSize(512)` in PubSubClient setup
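
To sanity-check the 512-byte buffer, the sketch below estimates the size of a worst-case `display/text` PUBLISH packet. It assumes MQTT 3.1.1 framing at QoS 0 and three full lines of 45 ASCII characters (the top of the range under Text Formatting); the function name is hypothetical.

```python
import json


def publish_packet_size(topic, payload):
    """Total bytes of an MQTT 3.1.1 QoS 0 PUBLISH packet (no packet id)."""
    # Variable header: 2-byte topic length prefix plus the topic itself.
    remaining = 2 + len(topic.encode("utf-8")) + len(payload)
    length_field = 1                     # remaining-length field is 1 to 4 bytes
    value = remaining
    while value > 127:                   # each byte carries 7 bits of length
        value //= 128
        length_field += 1
    return 1 + length_field + remaining  # leading 1 = fixed-header control byte


# Assumed worst case: three full lines at 45 characters each.
worst_payload = json.dumps({"lines": ["x" * 45] * 3}).encode("utf-8")
packet_bytes = publish_packet_size("display/text", worst_payload)
print(len(worst_payload), packet_bytes)  # 158 175
```

Under these assumptions the packet fits in 512 bytes with room to spare. Non-ASCII characters such as curly quotes take more than one byte each once encoded, so the margin is worth keeping.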
+
+## Testing Approach
+
+1. Test Whisper server standalone: speak into mic, verify text in browser at `http://localhost:8000`
+2. Test MQTT: use MQTT Explorer or `mosquitto_sub` to verify bridge publishes correctly
+3. Test ESP32 display: send static MQTT messages manually before connecting bridge
+4. End-to-end: full pipeline test with recorded sermon audio
+5. In-situ trial: 1–2 Sunday services with a volunteer congregant providing feedback

+ 94 - 0
README.md

@@ -0,0 +1,94 @@
+# Church Live Transcription Display
+
+A live speech-to-text system for deaf and hard-of-hearing congregants, displaying real-time transcriptions on an e-ink screen driven by an ESP32 microcontroller.
+
+## Overview
+
+Audio from the church service is captured on a Windows PC and transcribed locally using a Whisper-based model; the resulting text is pushed over WiFi/MQTT to an ESP32 that drives a large e-ink display. The display is reflective, so it needs no backlight and stays readable in bright ambient light, which suits a church environment.
+
+```
+[Microphone / Mixer] → [Windows PC: Whisper transcription]
+                              ↓ MQTT over WiFi
+                       [ESP32 + e-ink display]
+```
+
+## Goals
+
+- Real-time captions with minimal latency (target: < 3 seconds end-to-end)
+- Runs entirely on local network — no cloud dependency
+- Readable at distance with large font (36–48pt equivalent)
+- Displays 3–4 lines of rolling text; the oldest line drops off as new content arrives
+- Low cost, low complexity hardware
+
+## System Components
+
+### PC Side (Windows)
+- **WhisperLiveKit** — local GPU-accelerated speech-to-text server with WebSocket output
+- **Mosquitto** — lightweight MQTT broker running on the same PC
+- **Python bridge script** — subscribes to Whisper WebSocket, buffers sentences, publishes to MQTT
+
+### ESP32 Side
+- **ESP32 (WROOM or S3)** — WiFi-enabled microcontroller
+- **Waveshare e-ink display** — 7.5" V2 (800×480) or larger
+- **GxEPD2 / Adafruit GFX** — display driver library
+- **PubSubClient** — MQTT client library for Arduino
+
+## Hardware
+
+| Component | Model | Notes |
+|---|---|---|
+| Microcontroller | ESP32-WROOM-32 or ESP32-S3 | S3 preferred for more RAM |
+| Display | Waveshare 7.5" V2 e-Paper | 800×480, supports partial refresh |
+| PC | Windows 10/11 with NVIDIA GPU | RTX series recommended for real-time Whisper |
+| Microphone | USB condenser or mixer feed | Direct mixer feed preferred for clean audio |
+
+## Key Design Decisions
+
+### Text Buffering Strategy
+E-ink full refresh takes ~1–2 seconds. Rather than updating word-by-word, the bridge script accumulates text until a natural pause (sentence boundary or ~5 seconds of speech), then pushes a complete "screen's worth" as a single MQTT message. Partial refresh mode can be used for faster but ghosting-prone updates.
+
+### Display Layout
+- 3–4 lines of large text
+- Most recent line at bottom, scrolling upward
+- Simple black-on-white, no graphics
+- Font size prioritises readability at 3–5 metres
+
+### Network
+- All traffic stays on local WiFi network
+- MQTT broker on PC (port 1883)
+- No internet required during operation
+
+## Repository Structure
+
+```
+/
+├── README.md               — This file
+├── CLAUDE.md               — AI assistant context for development sessions
+├── bridge/
+│   └── bridge.py           — Python: Whisper WebSocket → MQTT publisher
+├── esp32/
+│   ├── src/
+│   │   └── main.cpp        — ESP32 Arduino firmware
+│   └── platformio.ini      — PlatformIO build config
+└── docs/
+    ├── hardware-wiring.md  — SPI pin connections for display
+    └── setup.md            — Installation and configuration guide
+```
+
+## Reference Projects
+
+- [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit) — real-time Whisper server with WebSocket API
+- [reriiasu/speech-to-text](https://github.com/reriiasu/speech-to-text) — faster-whisper with VAD and WebSocket output
+- [denwilliams/mqtt-epaper](https://github.com/denwilliams/mqtt-epaper) — ESP32 e-paper display driven by MQTT JSON
+- [cuci90/epaper_mqtt_esp32](https://github.com/cuci90/epaper_mqtt_esp32) — ESP32 Waveshare display MQTT template
+
+## Status
+
+🟡 **Planning / Research phase**
+
+- [x] Architecture defined
+- [ ] Python bridge script
+- [ ] ESP32 firmware
+- [ ] Hardware wiring and test
+- [ ] End-to-end integration test
+- [ ] Church deployment trial