Benjamin Harris 6d417b9941 first commit 1 month ago
CLAUDE.md 6d417b9941 first commit 1 month ago
README.md 6d417b9941 first commit 1 month ago

README.md

Church Live Transcription Display

A live speech-to-text system for deaf and hard-of-hearing congregants, displaying real-time transcriptions on an e-ink screen driven by an ESP32 microcontroller.

Overview

Audio from the church service is captured on a Windows PC and transcribed locally using a Whisper-based model. The resulting text is pushed via MQTT over WiFi to an ESP32 that drives a large e-ink display. The display is reflective, so it stays readable under ordinary room lighting or daylight and emits no light of its own, which suits a church environment.

[Microphone / Mixer] → [Windows PC: Whisper transcription]
                              ↓ MQTT over WiFi
                       [ESP32 + e-ink display]

Goals

  • Real-time captions with minimal latency (target: < 3 seconds end-to-end)
  • Runs entirely on local network — no cloud dependency
  • Readable at distance with large font (36–48pt equivalent)
  • Displays 3–4 lines of rolling text, clearing as new content arrives
  • Low cost, low complexity hardware

System Components

PC Side (Windows)

  • WhisperLiveKit — local GPU-accelerated speech-to-text server with WebSocket output
  • Mosquitto — lightweight MQTT broker running on the same PC
  • Python bridge script — subscribes to Whisper WebSocket, buffers sentences, publishes to MQTT

ESP32 Side

  • ESP32 (WROOM or S3) — WiFi-enabled microcontroller
  • Waveshare e-ink display — 7.5" V2 (800×480) or larger
  • GxEPD2 / Adafruit GFX — e-paper display driver and the graphics library it builds on
  • PubSubClient — MQTT client library for Arduino

Hardware

Component        Model                            Notes
Microcontroller  ESP32-WROOM-32 or ESP32-S3       S3 preferred for more RAM
Display          Waveshare 7.5" V2 e-Paper        800×480, supports partial refresh
PC               Windows 10/11 with NVIDIA GPU    RTX series recommended for real-time Whisper
Microphone       USB condenser or mixer feed      Direct mixer feed preferred for clean audio
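
As a rough sanity check on the 36–48pt goal, the sketch below estimates how many lines and characters fit on the 800×480, 7.5" panel. It assumes "pt" means physical points (1/72 inch), a line spacing of 1.2 and an average glyph width of 0.55 em. Those last two values are illustrative guesses, not measurements of any particular font.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density derived from the resolution and the diagonal size."""
    return math.hypot(width_px, height_px) / diagonal_in


def layout_estimate(points, width_px=800, height_px=480, diagonal_in=7.5,
                    line_spacing=1.2, avg_glyph_width_em=0.55):
    """Return (font height in px, lines per screen, characters per line).

    line_spacing and avg_glyph_width_em are assumed values for illustration.
    """
    ppi = pixels_per_inch(width_px, height_px, diagonal_in)
    em_px = round(points / 72 * ppi)          # 1 pt = 1/72 inch
    lines = int(height_px // (em_px * line_spacing))
    chars = int(width_px // (em_px * avg_glyph_width_em))
    return em_px, lines, chars


for pt in (36, 48):
    em_px, lines, chars = layout_estimate(pt)
    print(f"{pt}pt: {em_px}px tall, about {lines} lines of {chars} characters")
```

Under these assumptions the panel works out to roughly 124 pixels per inch, so 48pt text is about 83 pixels tall and gives around 4 lines of 17 characters, while 36pt gives around 6 lines of 23 characters. The real figures depend on the font chosen for GxEPD2.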

Key Design Decisions

Text Buffering Strategy

E-ink full refresh takes ~1–2 seconds. Rather than updating word-by-word, the bridge script accumulates text until a natural pause (sentence boundary or ~5 seconds of speech), then pushes a complete "screen's worth" as a single MQTT message. Partial refresh mode can be used for faster but ghosting-prone updates.
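
A minimal sketch of this buffering rule, assuming the transcription server delivers text as short string fragments. The class name `SentenceBuffer`, the 5-second default and the simple punctuation test are illustrative choices, not the project's actual implementation.

```python
import re
import time

# Treat ".", "!" or "?" at the end of the buffered text as a sentence boundary.
# This is deliberately naive: abbreviations such as "Mr." also trigger it.
SENTENCE_END = re.compile(r"[.!?]\s*$")


class SentenceBuffer:
    """Accumulate transcript fragments until a sentence ends or time runs out."""

    def __init__(self, max_seconds=5.0, clock=time.monotonic):
        self.max_seconds = max_seconds
        self.clock = clock          # injectable so the logic can be tested
        self._parts = []
        self._started = None        # time the first buffered fragment arrived

    def add(self, fragment):
        """Add a fragment; return the text to display if a flush is due, else None."""
        fragment = fragment.strip()
        if not fragment:
            return None
        if self._started is None:
            self._started = self.clock()
        self._parts.append(fragment)
        text = " ".join(self._parts)
        timed_out = self.clock() - self._started >= self.max_seconds
        if SENTENCE_END.search(text) or timed_out:
            self._parts.clear()
            self._started = None
            return text
        return None
```

Each non-`None` return value would become one MQTT message. Note that the timeout is only checked when a new fragment arrives, so a real bridge would probably also need a periodic timer to flush text left over after the speaker stops.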

Display Layout

  • 3–4 lines of large text
  • Most recent line at bottom, scrolling upward
  • Simple black-on-white, no graphics
  • Font size prioritises readability at 3–5 metres
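
The rolling layout above can be modelled on the PC side before any firmware exists. This is a sketch only: `RollingDisplay` is a hypothetical name, and the default of 20 characters per line is an assumed value that depends on the font actually used.

```python
import textwrap
from collections import deque


class RollingDisplay:
    """Keep the most recent lines of wrapped text, newest at the bottom."""

    def __init__(self, max_lines=4, chars_per_line=20):
        self.chars_per_line = chars_per_line
        # A bounded deque drops the oldest (top) line when a new one is appended.
        self._lines = deque(maxlen=max_lines)

    def push(self, text):
        """Wrap text to the line width, append it, and return the visible lines."""
        for line in textwrap.wrap(text, width=self.chars_per_line):
            self._lines.append(line)
        return list(self._lines)


display = RollingDisplay()
display.push("The Lord is my shepherd, I shall not want.")
for line in display.push("He makes me lie down in green pastures."):
    print(line)
```

Doing the wrapping in the bridge rather than on the ESP32 is one possible design choice. It keeps the firmware simple, at the cost of tying the bridge to the display's font metrics.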

Network

  • All traffic stays on local WiFi network
  • MQTT broker on PC (port 1883)
  • No internet required during operation
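
A sketch of how the bridge might hand one screen of text to the broker. The topic name, the broker address and the newline-joined payload format are assumptions made for illustration. The `client` argument is any object with a paho-mqtt style `publish(topic, payload, qos)` method. The size limit is also an assumed, conservative figure: PubSubClient's default packet buffer is small (256 bytes in recent releases, as far as I know), so it is worth checking against the library version in use.

```python
BROKER_HOST = "192.168.1.10"   # hypothetical LAN address of the Windows PC
BROKER_PORT = 1883             # MQTT port named in this README
TOPIC = "church/captions"      # hypothetical topic name
MAX_PAYLOAD_BYTES = 200        # assumed headroom below the ESP32 client's buffer


def encode_screen(lines):
    """Join display lines with newlines and encode them as UTF-8 bytes."""
    payload = "\n".join(lines).encode("utf-8")
    if len(payload) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {len(payload)} bytes, limit is {MAX_PAYLOAD_BYTES}")
    return payload


def publish_screen(client, lines, topic=TOPIC):
    """Publish one screen of text; returns the payload that was sent."""
    payload = encode_screen(lines)
    client.publish(topic, payload, qos=1)
    return payload
```

In a real bridge, `client` would be a paho-mqtt client already connected to `BROKER_HOST` on `BROKER_PORT`. A plain-text payload means the firmware can split on newlines without needing a JSON parser.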

Repository Structure

/
├── README.md               — This file
├── CLAUDE.md               — AI assistant context for development sessions
├── bridge/
│   └── bridge.py           — Python: Whisper WebSocket → MQTT publisher
├── esp32/
│   ├── src/
│   │   └── main.cpp        — ESP32 Arduino firmware
│   └── platformio.ini      — PlatformIO build config
└── docs/
    ├── hardware-wiring.md  — SPI pin connections for display
    └── setup.md            — Installation and configuration guide

Reference Projects

Status

🟡 Planning / Research phase

  • Architecture defined
  • Python bridge script
  • ESP32 firmware
  • Hardware wiring and test
  • End-to-end integration test
  • Church deployment trial