This file provides context for AI-assisted development sessions on the Church Live Transcription Display project.
A live captioning system for deaf/hard-of-hearing church congregants. A Windows PC captures audio, transcribes it locally using Whisper (GPU-accelerated), performs real-time speaker diarization, maps anonymous speaker IDs to real names, and sends speaker-tagged rolling text over MQTT to an ESP32 driving a large e-ink display. No cloud services. No internet required during operation.
[Audio source]
    ↓ (USB mic or mixer line-in)
[Windows PC]
    ├── WhisperLiveKit
    │   ├── Whisper large-v3 (transcription)
    │   └── Streaming Sortformer (real-time speaker diarization)
    │   WebSocket output: ws://localhost:8000/asr
    │
    ├── Mosquitto MQTT broker (port 1883)
    │
    ├── bridge.py
    │   ├── Subscribes to Whisper WebSocket
    │   ├── Receives: {text, speaker_id, is_final, ...}
    │   ├── Resolves speaker_id → name via speaker_registry
    │   ├── Buffers text to sentence boundary
    │   └── Publishes JSON payload to MQTT topic display/text
    │
    └── admin_ui.py (Tkinter)
        ├── Shows "New speaker detected" prompts
        ├── Operator types name once per unknown speaker
        └── Updates speaker_registry in real time
    ↓ WiFi / MQTT
[ESP32-S3]
    └── Waveshare 7.5" V2 e-ink display (SPI, GxEPD2 library)
- WhisperLiveKit runs with the --diarization flag: whisperlivekit-server --model large-v3 --diarization --language en
- WebSocket endpoint: ws://localhost:8000/asr
- Install: pip install whisperlivekit[diarization-sortformer]
- pyannote models: pyannote/segmentation, pyannote/segmentation-3.0, pyannote/embedding
- Hugging Face login: huggingface-cli login
- ESP32 MQTT client buffer: client.setBufferSize(512)

| Display Pin | ESP32 Pin |
|---|---|
| BUSY | GPIO 4 |
| RST | GPIO 16 |
| DC | GPIO 17 |
| CS | GPIO 5 |
| CLK | GPIO 18 |
| DIN | GPIO 23 |
| GND | GND |
| VCC | 3.3V |

| Topic | Direction | Payload |
|---|---|---|
| display/text | PC → ESP32 | JSON: see payload schema below |
| display/clear | PC → ESP32 | Empty / any value |
| display/status | ESP32 → PC | JSON: {"ready": true} |
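
As a rough illustration of the topic layout above, the PC side could wrap the three topics in small helpers. This is a sketch, not the project's code: the function names are hypothetical, and `client` is assumed to be any object exposing `publish(topic, payload)`, such as a connected paho-mqtt client.

```python
import json

# Topic names taken from the table above.
TOPIC_TEXT = "display/text"
TOPIC_CLEAR = "display/clear"
TOPIC_STATUS = "display/status"


def publish_text(client, payload: dict) -> None:
    """Publish a display payload as compact JSON on display/text."""
    body = json.dumps(payload, separators=(",", ":"))
    client.publish(TOPIC_TEXT, body)


def publish_clear(client) -> None:
    """Ask the display to clear; the payload content is ignored."""
    client.publish(TOPIC_CLEAR, "")


def parse_status(raw: bytes) -> bool:
    """Return True only if a display/status message reports ready."""
    try:
        return bool(json.loads(raw).get("ready", False))
    except (ValueError, AttributeError):
        # Malformed JSON, or JSON that is not an object.
        return False
```

Compact separators are used only to keep the message small; whether that matters depends on the buffer size configured on the ESP32.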
{
  "speaker": "PASTOR JOHN",
  "speaker_changed": true,
  "lines": [
    "...and He said unto them, go",
    "into all the world and preach"
  ]
}
- speaker: resolved name string, or null if unknown/unnamed
- speaker_changed: true triggers full display refresh + speaker header redraw
- lines: array of pre-wrapped strings, max 40 chars each, max 3 items

bridge/bridge.py

Main orchestrator. Connects to Whisper WebSocket and Mosquitto. Receives incremental diarized transcription. Buffers text. Resolves speaker names. Publishes MQTT payloads.
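
One way the bridge might produce a payload that satisfies the schema limits (40 characters per line, at most 3 lines) is with the standard-library `textwrap` module. The helper name `build_payload` is hypothetical, and keeping the last three wrapped lines (so the newest text stays visible) is an assumption about how the rolling text should behave.

```python
import textwrap

MAX_CHARS = 40  # per-line limit from the payload schema
MAX_LINES = 3   # maximum number of items in "lines"


def build_payload(text, speaker, speaker_changed):
    """Wrap text to the display limits and return a payload dict."""
    wrapped = textwrap.wrap(text, width=MAX_CHARS)
    return {
        "speaker": speaker,  # resolved name, or None if unknown
        "speaker_changed": speaker_changed,
        # Assumption: when text overflows, keep the most recent lines.
        "lines": wrapped[-MAX_LINES:],
    }


payload = build_payload(
    "...and He said unto them, go into all the world and preach",
    "PASTOR JOHN",
    True,
)
```

`textwrap.wrap` breaks over-long words by default, so every returned line stays within the 40-character limit.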
WebSocket message fields from WhisperLiveKit (with diarization):
{
  "text": "and He said unto them",
  "speaker": "SPEAKER_0",
  "is_final": true,
  "start": 12.4,
  "end": 15.1
}
Bridge logic:
- For each is_final segment, extract text and speaker
- Resolve speaker → name via speaker_registry
- If the speaker is unknown, notify admin_ui (via queue or callback)
- Set speaker_changed: true if speaker differs from last published segment

bridge/speaker_registry.py

Manages the session-persistent mapping of SPEAKER_N IDs to real names.
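
The bridge logic listed above can be sketched without the real registry class. In this illustration a plain dict stands in for the registry, a `queue.Queue` stands in for the hand-off to admin_ui, and `BridgeState` and `handle_message` are hypothetical names. The message fields follow the WebSocket example shown earlier.

```python
import json
import queue


class BridgeState:
    """Mutable state the bridge keeps between segments."""

    def __init__(self):
        self.last_speaker_id = None  # speaker of the last published segment
        self.prompted = set()        # unknown IDs already sent to the UI


def handle_message(raw, names, state, unknown_queue):
    """Turn one WebSocket message into a segment dict, or None to skip it."""
    msg = json.loads(raw)
    if not msg.get("is_final"):
        return None  # ignore interim results
    text = (msg.get("text") or "").strip()
    if not text:
        return None
    speaker_id = msg.get("speaker")
    name = names.get(speaker_id)  # None if unknown/unnamed
    if name is None and speaker_id is not None and speaker_id not in state.prompted:
        # Notify the admin UI once per unknown speaker.
        state.prompted.add(speaker_id)
        unknown_queue.put(speaker_id)
    changed = speaker_id != state.last_speaker_id
    state.last_speaker_id = speaker_id
    return {"speaker": name, "speaker_changed": changed, "text": text}
```

A queue is used here because it is safe to share between the bridge and a UI thread; a callback would work equally well if the UI handles its own threading.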
# Core interface
registry = SpeakerRegistry()
registry.assign(speaker_id="SPEAKER_0", name="Pastor John")
name = registry.resolve("SPEAKER_0") # Returns "Pastor John" or None
registry.is_known("SPEAKER_1") # Returns False
registry.save_session() # Persist to JSON for the session
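
A minimal class satisfying the interface above might look like the following. It is a sketch under assumptions: the `session_dir` parameter, the date-based filename, and the flat ID-to-name JSON layout are illustrative choices, not confirmed details of the project.

```python
import json
from datetime import date
from pathlib import Path


class SpeakerRegistry:
    """In-memory map of SPEAKER_N IDs to names, saved once per session."""

    def __init__(self, session_dir="bridge/sessions"):
        self._names = {}
        self._session_dir = Path(session_dir)

    def assign(self, speaker_id, name):
        """Attach a real name to an anonymous speaker ID."""
        self._names[speaker_id] = name

    def resolve(self, speaker_id):
        """Return the assigned name, or None if the ID is unknown."""
        return self._names.get(speaker_id)

    def is_known(self, speaker_id):
        return speaker_id in self._names

    def save_session(self):
        """Write the mapping to <session_dir>/<today>.json and return the path."""
        self._session_dir.mkdir(parents=True, exist_ok=True)
        path = self._session_dir / f"{date.today().isoformat()}.json"
        path.write_text(json.dumps(self._names, indent=2), encoding="utf-8")
        return path
```

If the admin UI and the bridge touch the registry from different threads, guarding `_names` with a `threading.Lock` would be a reasonable addition.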
Session file: bridge/sessions/YYYY-MM-DD.json

bridge/admin_ui.py

Lightweight Tkinter window. Runs in a separate thread alongside bridge.py.
Behaviour:
- When a new SPEAKER_N is detected, shows a prompt: "New speaker detected. Who is this?"
- The operator's entry is passed to registry.assign() and the display updates immediately

esp32/src/main.cpp

ESP32 firmware. WiFi + MQTT client. Receives JSON payloads and renders to e-ink.
Display rendering logic:
- speaker_changed: true: full refresh, print speaker name in large CAPS on line 1, then print text lines below
- speaker_changed: false: partial refresh, overwrite text lines only (speaker header stays)

┌────────────────────────────────────────────────┐ ← full width
│ PASTOR JOHN │ ← speaker name, top ~80px, bold/large
│────────────────────────────────────────────────│
│ ...and He said unto them, go into all the │ ← text line 1
│ world and preach the gospel to every │ ← text line 2
│ creature. He that believeth and is baptised │ ← text line 3
└────────────────────────────────────────────────┘
- SPEAKER_N appears
- SpeakerEmbedding pipeline
- bridge/profiles/<name>.npy
- SPEAKER_N embedding to stored profiles
- http://localhost:8000
- SPEAKER_0 / SPEAKER_1 labels in WS output
- mosquitto_sub -t display/#
- --diarization flag, confirm WS output includes speaker labels
- bridge.py (transcription only, no diarization yet) → verify MQTT publish works
- speaker_registry.py and admin_ui.py → test name mapping loop
- speaker_changed logic