# Church Live Transcription Display

A live speech-to-text system for deaf and hard-of-hearing congregants, displaying real-time transcriptions on an e-ink screen driven by an ESP32 microcontroller.

## Overview

Audio from the church service is captured on a Windows PC and transcribed locally using a Whisper-based model. The resulting text is pushed over WiFi/MQTT to an ESP32 that drives a large e-ink display. Because e-ink is reflective, the display is readable in any lighting condition and needs no backlight, which suits a church environment.

```
[Microphone / Mixer] → [Windows PC: Whisper transcription]
                                    ↓ MQTT over WiFi
                          [ESP32 + e-ink display]
```

## Goals

- Real-time captions with minimal latency (target: < 3 seconds end-to-end)
- Runs entirely on the local network, with no cloud dependency
- Readable at a distance, with a large font (36–48pt equivalent)
- Displays 3–4 lines of rolling text, clearing as new content arrives
- Low-cost, low-complexity hardware

## System Components

### PC Side (Windows)

- **WhisperLiveKit** — local GPU-accelerated speech-to-text server with WebSocket output
- **Mosquitto** — lightweight MQTT broker running on the same PC
- **Python bridge script** — subscribes to the Whisper WebSocket, buffers sentences, and publishes them to MQTT

### ESP32 Side

- **ESP32 (WROOM or S3)** — WiFi-enabled microcontroller
- **Waveshare e-ink display** — 7.5" V2 (800×480) or larger
- **GxEPD2 / Adafruit GFX** — display driver library
- **PubSubClient** — MQTT client library for Arduino

## Hardware

| Component | Model | Notes |
|---|---|---|
| Microcontroller | ESP32-WROOM-32 or ESP32-S3 | S3 preferred for more RAM |
| Display | Waveshare 7.5" V2 e-Paper | 800×480, supports partial refresh |
| PC | Windows 10/11 with NVIDIA GPU | RTX series recommended for real-time Whisper |
| Microphone | USB condenser or mixer feed | Direct mixer feed preferred for clean audio |

## Key Design Decisions

### Text Buffering Strategy

E-ink full refresh takes ~1–2 seconds.
Rather than updating word-by-word, the bridge script accumulates text until a natural pause (a sentence boundary or ~5 seconds of speech), then pushes a complete "screen's worth" as a single MQTT message. Partial refresh can be used for faster updates, at the cost of ghosting.

### Display Layout

- 3–4 lines of large text
- Most recent line at the bottom, scrolling upward
- Simple black-on-white, no graphics
- Font size prioritises readability at 3–5 metres

### Network

- All traffic stays on the local WiFi network
- MQTT broker on the PC (port 1883)
- No internet required during operation

## Repository Structure

```
/
├── README.md              — This file
├── CLAUDE.md              — AI assistant context for development sessions
├── bridge/
│   └── bridge.py          — Python: Whisper WebSocket → MQTT publisher
├── esp32/
│   ├── src/
│   │   └── main.cpp       — ESP32 Arduino firmware
│   └── platformio.ini     — PlatformIO build config
└── docs/
    ├── hardware-wiring.md — SPI pin connections for display
    └── setup.md           — Installation and configuration guide
```

## Reference Projects

- [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit) — real-time Whisper server with WebSocket API
- [reriiasu/speech-to-text](https://github.com/reriiasu/speech-to-text) — faster-whisper with VAD and WebSocket output
- [denwilliams/mqtt-epaper](https://github.com/denwilliams/mqtt-epaper) — ESP32 e-paper display driven by MQTT JSON
- [cuci90/epaper_mqtt_esp32](https://github.com/cuci90/epaper_mqtt_esp32) — ESP32 Waveshare display MQTT template

## Status

🟡 **Planning / Research phase**

- [x] Architecture defined
- [ ] Python bridge script
- [ ] ESP32 firmware
- [ ] Hardware wiring and test
- [ ] End-to-end integration test
- [ ] Church deployment trial
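
## Bridge Buffering Sketch

The text-buffering strategy under Key Design Decisions (flush at a sentence boundary or after ~5 seconds, keep only the newest 3–4 lines) could be prototyped for `bridge/bridge.py` roughly as below. This is a minimal sketch under stated assumptions, not the planned implementation: the `CaptionBuffer` class, its `tick()` method, and the 28-characters-per-line width are hypothetical, and MQTT publishing is left as an injected `publish` callback so the logic can be exercised without a broker or WhisperLiveKit running.

```python
import textwrap
import time
from typing import Callable, List, Optional

SENTENCE_END = (".", "?", "!")


class CaptionBuffer:
    """Hypothetical helper: buffers transcribed words and publishes a screen's worth.

    Assumptions (not from WhisperLiveKit or any library): ~28 characters fit on one
    line of an 800 px wide display at the chosen font size, and 4 lines fit on screen.
    """

    def __init__(
        self,
        publish: Callable[[str], None],
        max_wait_s: float = 5.0,
        chars_per_line: int = 28,
        max_lines: int = 4,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.publish = publish              # e.g. a wrapper around an MQTT publish call
        self.max_wait_s = max_wait_s        # flush after this long without a sentence end
        self.chars_per_line = chars_per_line
        self.max_lines = max_lines
        self.clock = clock                  # injectable so tests can fake time
        self.pending: List[str] = []        # words received but not yet shown
        self.shown: List[str] = []          # wrapped lines currently on screen
        self.started_at: Optional[float] = None

    def add(self, text: str) -> None:
        """Add a chunk of transcribed text; flush early if it ends a sentence."""
        words = text.split()
        if not words:
            return
        if self.started_at is None:
            self.started_at = self.clock()
        self.pending.extend(words)
        if self.pending[-1].endswith(SENTENCE_END):
            self.flush()
        else:
            self.tick()

    def tick(self) -> None:
        """Call periodically so an unpunctuated run still flushes after max_wait_s."""
        if self.started_at is not None and self.clock() - self.started_at >= self.max_wait_s:
            self.flush()

    def flush(self) -> None:
        """Wrap pending words, keep only the newest lines, and publish one message."""
        if not self.pending:
            return
        new_lines = textwrap.wrap(" ".join(self.pending), width=self.chars_per_line)
        # Older lines scroll off the top; the most recent line stays at the bottom.
        self.shown = (self.shown + new_lines)[-self.max_lines:]
        self.pending = []
        self.started_at = None
        self.publish("\n".join(self.shown))
```

In the bridge, `publish` might wrap paho-mqtt's `client.publish(topic, payload)` on a topic such as `church/captions` (a hypothetical name), with `tick()` called about once a second from the WebSocket loop. Sending the whole screen as newline-separated lines keeps the ESP32 side simple: it can split on `\n` and redraw each line, rather than tracking scroll state itself. Each flush starts on a fresh line, which wastes a little width but avoids rewrapping text that is already on screen.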