A live speech-to-text system for deaf and hard-of-hearing congregants, displaying real-time transcriptions on an e-ink screen driven by an ESP32 microcontroller.
Audio from the church service is captured on a Windows PC and transcribed locally with a Whisper-based model. The resulting text is published over MQTT on the local WiFi network to an ESP32 that drives a large e-ink display. E-ink is reflective, so the display needs no backlight and stays readable under ordinary room light or daylight, which suits a church environment.
[Microphone / Mixer] → [Windows PC: Whisper transcription] → (MQTT over WiFi) → [ESP32 + e-ink display]
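
The PC-to-ESP32 hop in the diagram could be sketched roughly as follows. This is a minimal illustration, not the project's actual `bridge.py`: the JSON message shape, the `seq` field, the topic name `church/captions/screen`, and the broker address are all assumptions made for the example. It uses `publish.single` from the `paho-mqtt` package, which opens a connection, sends one message, and disconnects.

```python
import json


def make_payload(text: str, seq: int) -> bytes:
    """Encode one screen of text as a UTF-8 JSON message (hypothetical format)."""
    return json.dumps({"seq": seq, "text": text}, ensure_ascii=False).encode("utf-8")


def publish_screen(text: str, seq: int, host: str = "192.168.1.10",
                   topic: str = "church/captions/screen") -> None:
    """Send one message to the broker; host and topic are placeholders."""
    # Imported lazily so make_payload can be used without paho-mqtt installed.
    import paho.mqtt.publish as mqtt_publish

    mqtt_publish.single(topic, payload=make_payload(text, seq), qos=1, hostname=host)
```

A sequence number is included here so the firmware could, if desired, detect a repeated or out-of-order message before spending a slow refresh on it. Plain text payloads would work as well and would spare the ESP32 a JSON parser.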
| Component | Model | Notes |
|---|---|---|
| Microcontroller | ESP32-WROOM-32 or ESP32-S3 | S3 preferred for more RAM |
| Display | Waveshare 7.5" V2 e-Paper | 800×480, supports partial refresh |
| PC | Windows 10/11 with NVIDIA GPU | RTX series recommended for real-time Whisper |
| Microphone | USB condenser or mixer feed | Direct mixer feed preferred for clean audio |
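
To put rough numbers on the display row above, the arithmetic below estimates the size of one full frame and how much text fits on it. The 1-bit-per-pixel mode and the 20×40 pixel character cell are assumptions chosen only for illustration; the real font and layout are not decided here.

```python
WIDTH, HEIGHT = 800, 480          # panel resolution from the hardware table
BITS_PER_PIXEL = 1                # assumption: plain black/white mode

# One full frame held in RAM on the microcontroller.
framebuffer_bytes = WIDTH * HEIGHT * BITS_PER_PIXEL // 8

# Hypothetical fixed-width font cell, chosen only for illustration.
CELL_W, CELL_H = 20, 40
columns = WIDTH // CELL_W
rows = HEIGHT // CELL_H

print(framebuffer_bytes)  # 48000
print(columns, rows)      # 40 12
```

Under these assumptions a frame is about 47 KiB and holds 12 lines of 40 characters, which gives a first estimate of what one "screen's worth" of text means for the bridge.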
A full e-ink refresh takes ~1–2 seconds, so updating word by word is impractical. Instead, the bridge script accumulates text until a natural pause (a sentence boundary or ~5 seconds of speech), then pushes a complete "screen's worth" as a single MQTT message. Partial refresh mode is faster but prone to ghosting.
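
The batching rule described above might look something like the sketch below. It is an illustration under stated assumptions rather than the actual bridge logic: the class name `ScreenBatcher`, the punctuation-based sentence test, and the injectable `clock` are all hypothetical, and `publish` stands in for whatever sends the MQTT message.

```python
import re
import time
from typing import Callable, List, Optional

# A fragment "ends a sentence" if it finishes with . ! or ?, optionally
# followed by a closing quote or bracket. This is a deliberately simple test.
SENTENCE_END = re.compile(r"""[.!?]["')\]]?$""")


class ScreenBatcher:
    """Collects transcript fragments and emits them as complete batches."""

    def __init__(self, publish: Callable[[str], None], max_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.publish = publish          # called once per completed batch
        self.max_seconds = max_seconds  # flush even without a sentence end
        self.clock = clock              # injectable so timing can be tested
        self.parts: List[str] = []
        self.started: Optional[float] = None

    def add(self, fragment: str) -> None:
        """Add one transcript fragment and flush if a boundary was reached."""
        fragment = fragment.strip()
        if not fragment:
            return
        if self.started is None:
            self.started = self.clock()  # time the first fragment arrived
        self.parts.append(fragment)
        timed_out = self.clock() - self.started >= self.max_seconds
        if SENTENCE_END.search(fragment) or timed_out:
            self.flush()

    def flush(self) -> None:
        """Publish whatever has accumulated and start a new batch."""
        if self.parts:
            self.publish(" ".join(self.parts))
        self.parts = []
        self.started = None
```

Calling `add("The Lord is")` and then `add("my shepherd.")` on a `ScreenBatcher(print)` would print the joined sentence once, after the second call. Note that this sketch only checks the time limit when a new fragment arrives; a real bridge would probably also need a timer that calls `flush()` when the speaker goes silent mid-sentence.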
/
├── README.md — This file
├── CLAUDE.md — AI assistant context for development sessions
├── bridge/
│ └── bridge.py — Python: Whisper WebSocket → MQTT publisher
├── esp32/
│ ├── src/
│ │ └── main.cpp — ESP32 Arduino firmware
│ └── platformio.ini — PlatformIO build config
└── docs/
├── hardware-wiring.md — SPI pin connections for display
└── setup.md — Installation and configuration guide
🟡 Planning / Research phase