A live speech-to-text system for deaf and hard-of-hearing congregants, displaying real-time transcriptions with speaker identification on any tablet, screen, or browser-capable device.
Audio from the church service is captured on a Windows PC, transcribed and speaker-diarized locally using WhisperLiveKit, and the resulting speaker-tagged text is served to a fullscreen browser page. Any tablet, TV, or device with a browser on the same WiFi can act as the display — no custom hardware required.
[Microphone / Mixer]
↓
[Windows PC]
├── WhisperLiveKit (transcription + speaker diarization)
├── Mosquitto MQTT broker (internal message bus)
├── bridge.py (WebSocket → name mapping → MQTT)
└── admin.py (speaker management + fullscreen display page)
↓ WiFi (browser)
[Tablet / TV / Any device with a browser]
└── http://[PC-IP]:8001/display ← fullscreen display page
Multiple display devices can be open at the same time, which is useful for covering both the front of the church and the hearing-loop desk.
WhisperLiveKit includes built-in real-time speaker diarization via diart (pyannote.audio). It runs alongside Whisper transcription and tags each segment of speech with an anonymous speaker label (SPEAKER_00, SPEAKER_01, etc.).
A name mapping layer in the bridge script translates these labels into real names, which are then pushed to the display page.
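The name mapping step can be pictured as a dictionary lookup with a fallback. The sketch below is illustrative only: `resolve_speaker`, `format_header`, and the sample names are hypothetical and are not taken from `bridge.py`. Falling back to the raw label for unmapped speakers is an assumption.

```python
# Hypothetical mapping from anonymous diarization labels to real names.
speaker_names = {
    "SPEAKER_00": "Pastor John",
    "SPEAKER_01": "Mary (Reader)",
}


def resolve_speaker(label, names):
    """Return the real name for a diarization label.

    Unmapped labels fall back to the raw label (an assumed behaviour),
    so the display still shows something for a new speaker.
    """
    return names.get(label, label)


def format_header(label, names):
    """Build the upper-case header line shown above a speaker's words."""
    return resolve_speaker(label, names).upper()


print(format_header("SPEAKER_01", speaker_names))
print(format_header("SPEAKER_07", speaker_names))
```

An unmapped label such as `SPEAKER_07` passes through unchanged, which gives the operator something to recognise and rename.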
When a speaker changes, their name is shown as a header line above their words.
┌─────────────────────────────────────────┐
│ PASTOR JOHN │
│ │
│ ...and He said unto them, go into │
│ all the world and preach the gospel │
│ to every creature. │
└─────────────────────────────────────────┘
(speaker changes)
┌─────────────────────────────────────────┐
│ MARY (READER) │
│ │
│ A reading from Luke chapter 4, │
│ verse 18... │
└─────────────────────────────────────────┘
The speaker admin page at http://[PC-IP]:8001 shows all detected speakers. When a new SPEAKER_XX appears, the operator types the name once. That name is stored persistently and used for every future session.
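Persistent storage of operator-entered names could look roughly like the sketch below. It assumes `speakers.json` holds a flat JSON object of label-to-name pairs; the actual file layout is not documented here, and `load_names` and `save_name` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_names(path):
    """Load the label-to-name mapping, or start empty if the file is missing."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_name(path, label, name):
    """Record one operator-entered name and write the whole mapping back."""
    names = load_names(path)
    names[label] = name.strip()
    path.write_text(
        json.dumps(names, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return names


if __name__ == "__main__":
    # Assumed file name, matching the speakers.json listed in the repo layout.
    store = Path("speakers.json")
    save_name(store, "SPEAKER_00", "Pastor John")
    print(load_names(store))
```

Because the mapping is re-read from disk on every save, a name typed once is available to later sessions without further operator input.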
Before the service, a short voice sample (10–30 seconds) is uploaded for each expected speaker via the admin page. The bridge compares incoming speaker embeddings against enrolled voices and automatically assigns the correct name without operator input.
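One common way to compare a speaker embedding against enrolled voices is cosine similarity with a minimum-score threshold. The README does not say which metric or threshold the bridge uses, so the sketch below is an assumption throughout: `match_speaker`, the `0.7` threshold, and the three-dimensional toy vectors are all illustrative (real embeddings have many more dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speaker(embedding, enrolled, threshold=0.7):
    """Return the best-matching enrolled name, or None if nothing is close enough.

    `enrolled` maps a real name to a reference embedding. The threshold
    is an arbitrary illustrative value, not one taken from the project.
    """
    best_name, best_score = None, -1.0
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Toy enrolled voices for illustration only.
enrolled = {
    "Pastor John": [1.0, 0.0, 0.0],
    "Mary (Reader)": [0.0, 1.0, 0.0],
}
print(match_speaker([0.9, 0.1, 0.0], enrolled))
print(match_speaker([0.0, 0.0, 1.0], enrolled))
```

Returning `None` for a weak match leaves room to fall back to the manual naming flow instead of assigning the wrong name.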
/display: fullscreen display page for tablets and screens

Open http://[PC-IP]:8001/display and go fullscreen (F11)

| Component | Notes |
|---|---|
| Windows PC with NVIDIA GPU | RTX series recommended; RTX 4070 Super tested |
| Microphone or mixer line-in | USB condenser or direct mixer feed; mixer preferred for clean diarization |
| Display device | Any tablet, TV, or spare PC with a browser on the same WiFi |
No microcontroller, no custom firmware, no hardware assembly required.
The bridge script accumulates text until a sentence boundary or natural pause (~4s), then checks whether the speaker has changed. On speaker change, a full new payload is pushed to the display including the speaker name header. Otherwise only the new text lines are pushed.
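The flush logic described above can be sketched as a small buffer class. This is a simplified illustration under stated assumptions, not the code in `bridge.py`: `DisplayBuffer`, the payload keys (`type`, `text`, `speaker`), and the punctuation-based sentence test are hypothetical, and the sketch assumes all buffered text belongs to one speaker.

```python
import time

SENTENCE_END = (".", "!", "?")
PAUSE_SECONDS = 4.0  # the "~4s" natural pause


class DisplayBuffer:
    """Accumulate text and decide when and what to push to the display."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the timing can be tested
        self._chunks = []
        self._speaker = None         # speaker of the buffered text
        self._shown_speaker = None   # speaker whose header is on screen
        self._last_update = None

    def add(self, speaker, text):
        """Buffer a piece of transcribed text for the given speaker."""
        # Simplification: a real bridge would flush before switching speakers.
        self._speaker = speaker
        self._chunks.append(text.strip())
        self._last_update = self._clock()

    def poll(self):
        """Return a payload at a sentence boundary or pause, else None."""
        if not self._chunks:
            return None
        text = " ".join(self._chunks)
        paused = self._clock() - self._last_update >= PAUSE_SECONDS
        if not (text.endswith(SENTENCE_END) or paused):
            return None
        changed = self._speaker != self._shown_speaker
        payload = {"type": "full" if changed else "append", "text": text}
        if changed:
            # Speaker change: include the name so the header is redrawn.
            payload["speaker"] = self._speaker
            self._shown_speaker = self._speaker
        self._chunks.clear()
        return payload
```

A caller would invoke `add()` for each transcription update and `poll()` on a short timer, publishing any non-`None` payload to the display.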
/
├── README.md — This file
├── CLAUDE.md — AI assistant context for development sessions
├── SETUP.md — Installation and configuration guide
├── install.bat — One-time setup script
├── start.bat — Launch script (double-click to start)
└── bridge/
    ├── bridge.py — Main bridge: Whisper WS → name map → MQTT
    ├── admin.py — Speaker admin + display web server (port 8001)
    ├── whisper_launcher.py — WhisperLiveKit startup wrapper (diart patch)
    ├── requirements.txt — Python dependencies
    ├── speakers.json — Persistent speaker name mappings (auto-created)
    ├── recordings/ — Per-speaker voice samples (auto-created)
    └── test_recordings/ — Full-service recordings for pipeline testing
🟡 In development — core pipeline functional
- /display fullscreen browser display page
- (/api/display/stream)