# Church Live Transcription Display

A live speech-to-text system for deaf and hard-of-hearing congregants, displaying real-time transcriptions on an e-ink screen driven by an ESP32 microcontroller.

## Overview

Audio from the church service is captured on a Windows PC and transcribed locally using a Whisper-based model. The resulting text is published via MQTT over WiFi to an ESP32 that drives a large e-ink display. Because e-ink is reflective, the display needs no backlight and emits no light of its own, which suits a church environment; it does rely on ambient light to be read.

```
[Microphone / Mixer] → [Windows PC: Whisper transcription]
                              ↓ MQTT over WiFi
                       [ESP32 + e-ink display]
```

## Goals

- Real-time captions with minimal latency (target: < 3 seconds end-to-end)
- Runs entirely on local network — no cloud dependency
- Readable at distance with large font (36–48pt equivalent)
- Displays 3–4 lines of rolling text, with the oldest line dropping off as new content arrives
- Low cost, low complexity hardware
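
As a rough check on the font-size goal, point sizes can be converted to pixels for the 7.5" 800×480 panel this README targets. The sketch below derives pixel density from the resolution and diagonal, then estimates how many text lines fit. The helper names and the 1.2 line-spacing factor are assumptions for illustration, not measured values.

```python
import math


def panel_dpi(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixel density derived from resolution and diagonal size."""
    return math.hypot(width_px, height_px) / diagonal_in


def pt_to_px(points: float, dpi: float) -> int:
    """Convert a font size in points (1 pt = 1/72 inch) to whole pixels."""
    return round(points * dpi / 72)


DPI = panel_dpi(800, 480, 7.5)   # roughly 124 dpi
LINE_SPACING = 1.2               # assumed leading factor

for pt in (36, 48):
    glyph_px = pt_to_px(pt, DPI)
    line_px = round(glyph_px * LINE_SPACING)
    print(f"{pt} pt -> {glyph_px} px tall, {480 // line_px} lines fit")
```

Under these assumptions, 48 pt text is about 83 px tall and four lines fit in the 480 px height, which lines up with the 3–4 line goal.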

## System Components

### PC Side (Windows)
- **WhisperLiveKit** — local GPU-accelerated speech-to-text server with WebSocket output
- **Mosquitto** — lightweight MQTT broker running on the same PC
- **Python bridge script** — connects to the WhisperLiveKit WebSocket, buffers sentences, and publishes them to MQTT
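
The bridge's core loop might look like the sketch below. It is written against stand-ins so it runs without a server: `messages` would be the WebSocket connection in practice (for example from the `websockets` package) and `publish` an MQTT client's publish method (for example from `paho-mqtt`). The JSON `text` field and the topic name are hypothetical; WhisperLiveKit's actual message schema should be checked before relying on this shape.

```python
import asyncio
import json
from typing import AsyncIterator, Callable

TOPIC = "church/captions"  # hypothetical topic name


async def run_bridge(messages: AsyncIterator[str],
                     publish: Callable[[str, str], None]) -> None:
    """Forward each new transcript text to the publisher, skipping repeats."""
    last_text = ""
    async for raw in messages:
        try:
            # "text" is an assumed field name, not a confirmed schema.
            text = json.loads(raw).get("text", "").strip()
        except (json.JSONDecodeError, AttributeError):
            continue  # ignore malformed or unexpected frames
        if text and text != last_text:
            publish(TOPIC, text)
            last_text = text


async def fake_source() -> AsyncIterator[str]:
    """Stand-in for the WebSocket stream so the sketch runs offline."""
    for raw in ('{"text": "Good morning"}', '{"text": "Good morning"}',
                'not json', '{"text": "Please be seated."}'):
        yield raw


sent = []  # collects (topic, payload) pairs instead of publishing
asyncio.run(run_bridge(fake_source(), lambda topic, payload: sent.append((topic, payload))))
print(sent)
```

Injecting the source and the publisher keeps the loop testable without a microphone, a Whisper server, or a broker.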

### ESP32 Side
- **ESP32 (WROOM or S3)** — WiFi-enabled microcontroller
- **Waveshare e-ink display** — 7.5" V2 (800×480) or larger
- **GxEPD2 + Adafruit GFX** — e-paper display driver (GxEPD2) and the graphics library it builds on (Adafruit GFX)
- **PubSubClient** — MQTT client library for Arduino

## Hardware

| Component | Model | Notes |
|---|---|---|
| Microcontroller | ESP32-WROOM-32 or ESP32-S3 | S3 preferred; its modules are commonly available with PSRAM |
| Display | Waveshare 7.5" V2 e-Paper | 800×480, supports partial refresh |
| PC | Windows 10/11 with NVIDIA GPU | RTX series recommended for real-time Whisper |
| Microphone | USB condenser or mixer feed | Direct mixer feed preferred for clean audio |
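
RAM matters on the microcontroller mainly because of the display frame buffer. The arithmetic below estimates its size for the 800×480 panel, assuming a 1-bit black-and-white buffer; the function name is illustrative only.

```python
def framebuffer_bytes(width_px: int, height_px: int, bits_per_pixel: int = 1) -> int:
    """Bytes needed to hold one full frame, rounding each row up to whole bytes."""
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


full = framebuffer_bytes(800, 480)
print(f"Full frame: {full} bytes ({full / 1024:.1f} KiB)")
```

A full 1-bit frame comes to 48,000 bytes. GxEPD2 also supports paged drawing, which trades extra redraw passes for a smaller buffer if memory is tight.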

## Key Design Decisions

### Text Buffering Strategy
An e-ink full refresh takes ~1–2 seconds, so updating word-by-word is impractical. Instead, the bridge script accumulates text until a natural pause (a sentence boundary or ~5 seconds of continuous speech), then pushes a complete "screen's worth" as a single MQTT message. Partial refresh mode allows faster updates at the cost of ghosting. Buffering time counts toward end-to-end latency, so the ~5 second cap will need tuning against the < 3 second target.
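
A minimal sketch of this buffering rule follows. It assumes transcript text arrives as short string fragments and treats `.`, `!`, or `?` at the end of a fragment as a sentence boundary. The class name, the 5-second default, and the injectable clock are illustrative choices, not part of any existing code.

```python
import time
from typing import Callable, Optional


class SentenceBuffer:
    """Accumulate fragments; release them at a sentence end or after a timeout."""

    def __init__(self, max_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_seconds = max_seconds
        self.clock = clock          # injectable so tests need not sleep
        self.parts = []             # fragments collected since the last flush
        self.started: Optional[float] = None

    def add(self, fragment: str) -> Optional[str]:
        """Add a fragment; return the buffered text if it should be sent now."""
        fragment = fragment.strip()
        if not fragment:
            return None
        if self.started is None:
            self.started = self.clock()
        self.parts.append(fragment)
        timed_out = self.clock() - self.started >= self.max_seconds
        if fragment.endswith((".", "!", "?")) or timed_out:
            return self.flush()
        return None

    def flush(self) -> Optional[str]:
        """Return and clear whatever is buffered (None if empty)."""
        text = " ".join(self.parts)
        self.parts, self.started = [], None
        return text or None


now = [0.0]  # fake clock value, advanced manually
buf = SentenceBuffer(max_seconds=5.0, clock=lambda: now[0])
print(buf.add("The Lord is"))    # nothing to send yet
print(buf.add("my shepherd."))   # sentence boundary triggers a flush
```

The timeout is only checked when a new fragment arrives, so a real bridge would also want a periodic `flush()` for speech that simply stops mid-sentence.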

### Display Layout
- 3–4 lines of large text
- Most recent line at bottom, scrolling upward
- Simple black-on-white, no graphics
- Font size prioritises readability at 3–5 metres
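
The rolling layout can be modelled on the PC side before any firmware exists. The sketch below wraps incoming text to a fixed width and keeps only the newest lines, with the most recent at the bottom. The 24-character line width is an assumption that depends on the chosen font, and the class name is hypothetical.

```python
import textwrap
from collections import deque


class RollingDisplay:
    """Keep the last `max_lines` wrapped lines; newest text sits at the bottom."""

    def __init__(self, max_lines: int = 4, chars_per_line: int = 24) -> None:
        self.chars_per_line = chars_per_line
        # A bounded deque discards the oldest line automatically when full.
        self.lines = deque(maxlen=max_lines)

    def push(self, text: str) -> list:
        """Wrap new text and append it, letting old lines scroll off the top."""
        self.lines.extend(textwrap.wrap(text, width=self.chars_per_line))
        return list(self.lines)


screen = RollingDisplay(max_lines=4, chars_per_line=24)
screen.push("Welcome to our service this morning.")
for line in screen.push("Please stand for the opening hymn."):
    print(line)
```

Doing the wrapping in the bridge rather than on the ESP32 would keep the firmware simple, at the cost of tying the bridge to one font and panel size.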

### Network
- All traffic stays on local WiFi network
- MQTT broker on PC (port 1883)
- No internet required during operation
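
One possible shape for the MQTT message is compact JSON carrying the lines to display. The sketch below builds such a payload and enforces a size budget, since PubSubClient uses a fixed-size packet buffer that is small by default. The host address, topic name, `seq` field, and 200-byte budget are all assumptions for illustration.

```python
import json

BROKER_HOST = "192.168.1.10"   # hypothetical LAN address of the PC
BROKER_PORT = 1883             # broker port listed above
TOPIC = "church/captions"      # hypothetical topic name
MAX_PAYLOAD_BYTES = 200        # assumed budget; tune to the client's buffer size


def build_payload(lines: list, seq: int) -> bytes:
    """Encode one screen update as compact UTF-8 JSON within the size budget."""
    payload = json.dumps({"seq": seq, "lines": lines},
                         ensure_ascii=False, separators=(",", ":")).encode("utf-8")
    if len(payload) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {len(payload)} bytes, limit is {MAX_PAYLOAD_BYTES}")
    return payload


payload = build_payload(["Please stand for the", "opening hymn."], seq=1)
print(TOPIC, payload.decode("utf-8"))
```

A sequence number lets the ESP32 ignore duplicate or out-of-order messages after a reconnect.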

## Repository Structure

```
/
├── README.md               — This file
├── CLAUDE.md               — AI assistant context for development sessions
├── bridge/
│   └── bridge.py           — Python: Whisper WebSocket → MQTT publisher
├── esp32/
│   ├── src/
│   │   └── main.cpp        — ESP32 Arduino firmware
│   └── platformio.ini      — PlatformIO build config
└── docs/
    ├── hardware-wiring.md  — SPI pin connections for display
    └── setup.md            — Installation and configuration guide
```

## Reference Projects

- [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit) — real-time Whisper server with WebSocket API
- [reriiasu/speech-to-text](https://github.com/reriiasu/speech-to-text) — faster-whisper with VAD and WebSocket output
- [denwilliams/mqtt-epaper](https://github.com/denwilliams/mqtt-epaper) — ESP32 e-paper display driven by MQTT JSON
- [cuci90/epaper_mqtt_esp32](https://github.com/cuci90/epaper_mqtt_esp32) — ESP32 Waveshare display MQTT template

## Status

🟡 **Planning / Research phase**

- [x] Architecture defined
- [ ] Python bridge script
- [ ] ESP32 firmware
- [ ] Hardware wiring and test
- [ ] End-to-end integration test
- [ ] Church deployment trial