A live speech-to-text system for deaf and hard-of-hearing congregants, displaying real-time transcriptions with speaker identification on an e-ink screen driven by an ESP32 microcontroller.
Audio from the church service is captured on a Windows PC, transcribed and speaker-diarized locally using WhisperLiveKit, and the resulting speaker-tagged text is pushed over WiFi/MQTT to an ESP32 that drives a large e-ink display. The display shows who is speaking alongside what they are saying, updates in real time, and requires no internet connection.
[Microphone / Mixer]
↓
[Windows PC]
├── WhisperLiveKit (transcription + speaker diarization)
├── Mosquitto MQTT broker
├── bridge.py (WebSocket → name mapping → MQTT)
└── Speaker Admin UI (operator names speakers live)
↓ MQTT / WiFi
[ESP32 + e-ink display]
WhisperLiveKit includes Streaming Sortformer, a real-time speaker diarization model developed by NVIDIA and listed by the project as state of the art as of 2025. It runs alongside Whisper transcription and tags each segment of speech with an anonymous speaker label (SPEAKER_0, SPEAKER_1, etc.).
A name mapping layer in the bridge script translates these anonymous labels into real names, which are then included in the MQTT payload sent to the display.
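A minimal sketch of this mapping step is shown here, assuming the bridge keeps a plain dictionary from diarization labels to display names and serialises each message as JSON. The helper names (`resolve_name`, `build_payload`) and the payload fields (`speaker`, `text`) are illustrative assumptions, not the confirmed schema used by bridge.py.

```python
import json

# Hypothetical session mapping; in practice the operator would fill this in.
SPEAKER_NAMES = {"SPEAKER_0": "PASTOR JOHN", "SPEAKER_1": "MARY (READER)"}


def resolve_name(label: str, names: dict[str, str]) -> str:
    """Return the display name for a diarization label, or the raw label if unnamed."""
    return names.get(label, label)


def build_payload(label: str, text: str, names: dict[str, str]) -> str:
    """Build the JSON string to publish over MQTT (assumed schema)."""
    return json.dumps({"speaker": resolve_name(label, names), "text": text})
```

Falling back to the raw label keeps text flowing to the display even before a speaker has been named.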
When the speaker changes, the new speaker's name is shown as a header line above their words. The name is not repeated on every line; it appears only at a change of speaker.
┌─────────────────────────────────┐
│ PASTOR JOHN │
│ ...and He said unto them, go │
│ into all the world and preach │
└─────────────────────────────────┘
┌─────────────────────────────────┐
│ MARY (READER) │
│ A reading from Luke chapter 4, │
│ verse 18... │
└─────────────────────────────────┘
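The header-on-change rule illustrated above can be sketched in a few lines. This is a Python illustration of the logic only; the actual rendering would live in the ESP32 firmware, and the `render_lines` name and the 31-character line width are assumptions chosen to resemble the mock-up.

```python
import textwrap


def render_lines(segments, width=31):
    """Turn (speaker, text) segments into display lines.

    A speaker name is emitted as an upper-case header only when it differs
    from the speaker of the previous segment.
    """
    lines = []
    previous = None
    for speaker, text in segments:
        if speaker != previous:
            lines.append(speaker.upper())
            previous = speaker
        # Wrap the spoken text to the assumed display width.
        lines.extend(textwrap.wrap(text, width=width))
    return lines
```

Two consecutive segments from the same speaker therefore produce one header followed by both blocks of text.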
v1 — Operator-Assisted Naming (implemented first)
A simple admin UI runs on the PC alongside the bridge script. When a new unknown speaker is detected, the operator sees a prompt ("New speaker detected — who is this?") and types the name once. That name is stored for the session and used every time that speaker is detected again.
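The prompt-once-then-remember behaviour could look roughly like the sketch below. The `SessionRegistry` class and its `ask_operator` callback are hypothetical names; the callback stands in for whatever prompt the admin UI shows, and a real UI would probably ask asynchronously so that transcription is not blocked.

```python
from typing import Callable, Optional


class SessionRegistry:
    """Session-scoped mapping from diarization labels to operator-supplied names."""

    def __init__(self, ask_operator: Callable[[str], Optional[str]]):
        self._ask = ask_operator
        self._names: dict[str, str] = {}

    def name_for(self, label: str) -> str:
        """Return the stored name, prompting the operator the first time a label is seen."""
        if label not in self._names:
            answer = (self._ask(label) or "").strip()
            if not answer:
                # No name given: show the raw label and ask again next time.
                return label
            self._names[label] = answer
        return self._names[label]
```

Because names live only in memory, they are forgotten when the bridge restarts, which matches the per-session scope described above.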
v2 — Voice Enrolment (future upgrade)
Before the service, a short voice sample (10–30 seconds) is recorded for each expected speaker. The bridge script compares incoming speaker embeddings against enrolled voices and automatically assigns the correct name without operator input.
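One common way to compare embeddings is cosine similarity with a minimum-score threshold, sketched below. This assumes a fixed-length embedding vector is available for each speech segment and for each enrolled voice; the source does not say which model would produce them. The function names and the 0.7 threshold are illustrative and would need tuning on real recordings.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speaker(
    embedding: Sequence[float],
    enrolled: dict[str, Sequence[float]],
    threshold: float = 0.7,  # assumed value, not taken from the project
) -> Optional[str]:
    """Return the best-matching enrolled name, or None if no score reaches the threshold."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Returning `None` for a weak match leaves room to fall back to operator naming, for example for a visiting speaker who was never enrolled.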
| Role | Frequency | Notes |
|---|---|---|
| Pastor / Preacher | Every service | Primary speaker, longest segments |
| Worship leader | Most services | May overlap with congregation response |
| Reader / Scripture | Weekly | Short, distinct segments |
| Visiting speaker | Occasionally | New enrolment or operator naming needed |
| Announcements | Weekly | Often the same person each week |
| Component | Model | Notes |
|---|---|---|
| Microcontroller | ESP32-S3 | PSRAM required for large font bitmaps |
| Display | Waveshare 7.5" V2 e-Paper | 800×480, supports partial refresh |
| PC | Windows 10/11 with NVIDIA GPU | RTX series recommended |
| Microphone | USB condenser or direct mixer feed | Mixer feed preferred for clean diarization |
The bridge script accumulates text until it reaches a sentence boundary or a natural pause (about 4 seconds), then checks whether the speaker has changed. If the speaker is unchanged, only the new text lines are pushed. If the speaker has changed, a full new payload is sent, including the speaker name header, which triggers a full display refresh.
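The buffering rule can be sketched as a small state machine. Everything below is an assumption for illustration: the `Accumulator` class, the `refresh` field with its `"full"` and `"partial"` values, and the simple punctuation-based sentence test. Timestamps are passed in explicitly so the logic can be exercised without a real clock.

```python
import re
from dataclasses import dataclass
from typing import Optional

SENTENCE_END = re.compile(r"[.!?]\s*$")  # naive sentence-boundary test
PAUSE_SECONDS = 4.0                       # pause length taken from the text above


@dataclass
class Accumulator:
    buffer: str = ""
    last_input: float = 0.0
    current_speaker: Optional[str] = None
    last_sent_speaker: Optional[str] = None

    def add(self, speaker: str, text: str, now: float) -> list[dict]:
        """Feed one transcript fragment; return any payloads ready to publish."""
        out = []
        if self.buffer and speaker != self.current_speaker:
            out.append(self._flush())  # speaker switched before the sentence ended
        self.current_speaker = speaker
        self.buffer = f"{self.buffer} {text}".strip()
        self.last_input = now
        if SENTENCE_END.search(self.buffer):
            out.append(self._flush())
        return out

    def tick(self, now: float) -> list[dict]:
        """Call periodically; flushes buffered text after a pause."""
        if self.buffer and now - self.last_input >= PAUSE_SECONDS:
            return [self._flush()]
        return []

    def _flush(self) -> dict:
        changed = self.current_speaker != self.last_sent_speaker
        payload = {"text": self.buffer, "refresh": "full" if changed else "partial"}
        if changed:
            payload["speaker"] = self.current_speaker  # header only on change
        self.last_sent_speaker = self.current_speaker
        self.buffer = ""
        return payload
```

In use, `add` would be called for each fragment arriving from the WebSocket and `tick` from a timer, with every returned payload published to the MQTT topic the display subscribes to.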
/
├── README.md — This file
├── CLAUDE.md — AI assistant context for development sessions
├── bridge/
│ ├── bridge.py — Main bridge: Whisper WS → name map → MQTT
│ ├── speaker_registry.py — Speaker ID ↔ name mapping and voice enrolment
│ └── admin_ui.py — Operator UI for live speaker naming (Tkinter)
├── esp32/
│ ├── src/
│ │ └── main.cpp — ESP32 Arduino firmware
│ └── platformio.ini — PlatformIO build config
└── docs/
    ├── hardware-wiring.md — SPI pin connections for Waveshare display
    ├── setup.md — Installation and configuration guide
    └── speaker-enrolment.md — Guide for recording and enrolling voice samples (v2)
🟡 Planning / Research phase