Church Live Transcription Display

A live speech-to-text system for deaf and hard-of-hearing congregants, displaying real-time transcriptions on an e-ink screen driven by an ESP32 microcontroller.

Overview

Audio from the church service is captured on a Windows PC, transcribed locally using a Whisper-based model, and the resulting text is pushed over WiFi/MQTT to an ESP32 that drives a large e-ink display. The display is reflective and has no backlight, so it stays readable under normal and bright room lighting without emitting light of its own, which suits a church environment.

[Microphone / Mixer] → [Windows PC: Whisper transcription]
                              ↓ MQTT over WiFi
                       [ESP32 + e-ink display]

Goals

Real-time captions with minimal latency (target: < 3 seconds end-to-end)
Runs entirely on local network — no cloud dependency
Readable at distance with large font (36–48pt equivalent)
Displays 3–4 lines of rolling text, clearing as new content arrives
Low cost, low complexity hardware

System Components

PC Side (Windows)

WhisperLiveKit — local GPU-accelerated speech-to-text server with WebSocket output
Mosquitto — lightweight MQTT broker running on the same PC
Python bridge script — subscribes to Whisper WebSocket, buffers sentences, publishes to MQTT

ESP32 Side

ESP32 (WROOM or S3) — WiFi-enabled microcontroller
Waveshare e-ink display — 7.5" V2 (800×480) or larger
GxEPD2 / Adafruit GFX — display driver library
PubSubClient — MQTT client library for Arduino

Hardware

Component	Model	Notes
Microcontroller	ESP32-WROOM-32 or ESP32-S3	S3 preferred for more RAM
Display	Waveshare 7.5" V2 e-Paper	800×480, supports partial refresh
PC	Windows 10/11 with NVIDIA GPU	RTX series recommended for real-time Whisper
Microphone	USB condenser or mixer feed	Direct mixer feed preferred for clean audio

Key Design Decisions

Text Buffering Strategy

E-ink full refresh takes ~1–2 seconds. Rather than updating word-by-word, the bridge script accumulates text until a natural pause (sentence boundary or ~5 seconds of speech), then pushes a complete "screen's worth" as a single MQTT message. Partial refresh mode can be used for faster but ghosting-prone updates.
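A minimal sketch of this buffering rule, assuming the bridge receives transcript fragments as plain strings. The class name CaptionBuffer, its method names, and the use of elapsed wall-clock time since the first buffered fragment as a stand-in for "~5 seconds of speech" are illustrative assumptions, not the project's actual bridge.py.

```python
import re
import time
from typing import Callable, List, Optional

# A fragment that ends in ., ! or ? is treated as a sentence boundary.
_SENTENCE_END = re.compile(r"[.!?]$")


class CaptionBuffer:
    """Accumulate transcript fragments and release them as whole chunks."""

    def __init__(self, max_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_seconds = max_seconds   # assumed stand-in for "~5 seconds of speech"
        self._clock = clock              # injectable so the timing can be tested
        self._parts: List[str] = []
        self._started: Optional[float] = None

    def add(self, fragment: str) -> Optional[str]:
        """Add a fragment; return the buffered text if it should be sent now."""
        fragment = fragment.strip()
        if not fragment:
            return None
        if self._started is None:
            self._started = self._clock()
        self._parts.append(fragment)
        timed_out = self._clock() - self._started >= self.max_seconds
        if _SENTENCE_END.search(fragment) or timed_out:
            return self.flush()
        return None

    def flush(self) -> Optional[str]:
        """Return everything buffered so far and reset, or None if empty."""
        if not self._parts:
            return None
        text = " ".join(self._parts)
        self._parts = []
        self._started = None
        return text
```

Each non-None return value from add() would become one MQTT message. The time limit is only checked when a fragment arrives, so a caller that wants to flush during silence would need to call flush() from its own timer.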

Display Layout

3–4 lines of large text
Most recent line at bottom, scrolling upward
Simple black-on-white, no graphics
Font size prioritises readability at 3–5 metres
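The rolling layout can be sketched on the PC side as a fixed-length window of wrapped lines. This is an illustration only: RollingDisplay is a hypothetical name, and the 28-characters-per-line default is an assumed figure. With a proportional font the real limit depends on glyph widths measured on the ESP32.

```python
import textwrap
from collections import deque
from typing import Deque, List


class RollingDisplay:
    """Keep the last few wrapped lines, newest at the bottom."""

    def __init__(self, max_lines: int = 4, chars_per_line: int = 28) -> None:
        self.chars_per_line = chars_per_line          # assumed, font-dependent
        # A bounded deque drops the oldest line when a new one is appended,
        # which gives the "scrolling upward" behaviour.
        self._lines: Deque[str] = deque(maxlen=max_lines)

    def push(self, text: str) -> List[str]:
        """Wrap new text, append it, and return the lines to show top to bottom."""
        for line in textwrap.wrap(text, width=self.chars_per_line):
            self._lines.append(line)
        return list(self._lines)


if __name__ == "__main__":
    display = RollingDisplay(max_lines=3, chars_per_line=10)
    print(display.push("one two three"))   # ['one two', 'three']
    print(display.push("four five six"))   # ['three', 'four five', 'six']
```

Wrapping on the PC keeps the firmware simple, at the cost of tying the bridge to one font size. Wrapping on the ESP32 by measured pixel width is the alternative.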

Network

All traffic stays on local WiFi network
MQTT broker on PC (port 1883)
No internet required during operation
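As a rough sketch of the publishing side, the bridge could send each screen update to the local broker with the paho-mqtt package. The broker address, topic name, JSON payload shape, and payload size limit below are all assumptions chosen for illustration. Only port 1883 comes from this document.

```python
import json
from typing import List

BROKER_HOST = "192.168.1.10"   # hypothetical LAN address of the Windows PC
BROKER_PORT = 1883             # broker port listed above
TOPIC = "church/captions"      # hypothetical topic name
MAX_PAYLOAD_BYTES = 200        # assumed limit; must fit the ESP32 client's receive buffer


def build_payload(lines: List[str]) -> bytes:
    """Encode the display lines as a compact UTF-8 JSON message."""
    payload = json.dumps({"lines": lines}, ensure_ascii=False).encode("utf-8")
    if len(payload) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {len(payload)} bytes, limit is {MAX_PAYLOAD_BYTES}")
    return payload


def publish_lines(lines: List[str]) -> None:
    """Publish one screen update to the local broker (requires paho-mqtt)."""
    # Imported here so build_payload() can be used without paho-mqtt installed.
    import paho.mqtt.publish as publish

    publish.single(TOPIC, payload=build_payload(lines),
                   hostname=BROKER_HOST, port=BROKER_PORT, qos=0)
```

publish.single() opens a connection, sends one message, and disconnects. A long-running bridge would more likely keep a persistent client connection open, but the one-shot helper is enough to check that the broker and the ESP32 subscription work.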

Repository Structure

/
├── README.md               — This file
├── CLAUDE.md               — AI assistant context for development sessions
├── bridge/
│   └── bridge.py           — Python: Whisper WebSocket → MQTT publisher
├── esp32/
│   ├── src/
│   │   └── main.cpp        — ESP32 Arduino firmware
│   └── platformio.ini      — PlatformIO build config
└── docs/
    ├── hardware-wiring.md  — SPI pin connections for display
    └── setup.md            — Installation and configuration guide

Reference Projects

WhisperLiveKit — real-time Whisper server with WebSocket API
reriiasu/speech-to-text — faster-whisper with VAD and WebSocket output
denwilliams/mqtt-epaper — ESP32 e-paper display driven by MQTT JSON
cuci90/epaper_mqtt_esp32 — ESP32 Waveshare display MQTT template

Status

🟡 Planning / Research phase

Architecture defined
Python bridge script
ESP32 firmware
Hardware wiring and test
End-to-end integration test
Church deployment trial
