Latest commit: Benjamin Harris, 2a98de29d5 (System Sound), 1 month ago

Church Live Transcription Display

A live speech-to-text system for deaf and hard-of-hearing congregants, displaying real-time transcriptions with speaker identification on any tablet, screen, or browser-capable device.

Overview

Audio from the church service is captured on a Windows PC, transcribed and speaker-diarized locally using WhisperLiveKit, and the resulting speaker-tagged text is served to a fullscreen browser page. Any tablet, TV, or device with a browser on the same WiFi can act as the display — no custom hardware required.

[Microphone / Mixer]
        ↓
[Windows PC]
  ├── WhisperLiveKit  (transcription + speaker diarization)
  ├── Mosquitto MQTT broker (internal message bus)
  ├── bridge.py       (WebSocket → name mapping → MQTT)
  └── admin.py        (speaker management + fullscreen display page)
        ↓ WiFi (browser)
[Tablet / TV / Any device with a browser]
  └── http://[PC-IP]:8001/display   ← fullscreen display page

Several display devices can be open at once, for example one at the front of the church and one at the hearing-loop desk.

Goals

Real-time captions with minimal latency (target: < 3 seconds end-to-end)
Speaker identification — display who is speaking when the speaker changes
Named speakers — operator maps anonymous speaker IDs to real names during service
Future: voice enrolment so names are matched automatically from pre-recorded samples
Runs entirely on local network — no cloud dependency
Readable at distance — large, auto-scaling font
No custom hardware — any spare tablet or screen works as the display

Speaker Identification

How It Works

WhisperLiveKit includes built-in real-time speaker diarization via diart (built on pyannote.audio). It runs alongside Whisper transcription and tags each speech segment with an anonymous speaker label (SPEAKER_00, SPEAKER_01, and so on).

A name mapping layer in the bridge script translates these labels into real names, which are then pushed to the display page.
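
The mapping step can be pictured as a plain dictionary lookup with a fallback. The sketch below is illustrative only: the function name `display_name` and the sample data are assumptions, not code taken from bridge.py.

```python
def display_name(label: str, names: dict[str, str]) -> str:
    """Return the assigned name for a diarization label.

    Falls back to the raw label (for example "SPEAKER_02") while no name
    has been assigned, so the display always shows a consistent tag.
    """
    name = names.get(label, "").strip()
    return name or label


# Illustrative mapping; real names would come from the operator.
names = {"SPEAKER_00": "Pastor John", "SPEAKER_01": "Mary (Reader)"}

print(display_name("SPEAKER_00", names))  # Pastor John
print(display_name("SPEAKER_02", names))  # SPEAKER_02
```

Falling back to the anonymous label keeps captions attributable even before the operator has typed a name.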

Display Format

When the speaker changes, the new speaker's name is shown as a header line above their words.

┌─────────────────────────────────────────┐
│ PASTOR JOHN                             │
│                                         │
│ ...and He said unto them, go into       │
│ all the world and preach the gospel     │
│ to every creature.                      │
└─────────────────────────────────────────┘

  (speaker changes)

┌─────────────────────────────────────────┐
│ MARY (READER)                           │
│                                         │
│ A reading from Luke chapter 4,          │
│ verse 18...                             │
└─────────────────────────────────────────┘

Speaker Naming — Two Approaches

v1 — Operator-Assisted Naming (implemented)

The speaker admin page at http://[PC-IP]:8001 lists all detected speakers. When a new SPEAKER_XX label appears, the operator types the name once. The name is saved to speakers.json and reused in every later session.

No setup required before the service
Works from the very first Sunday
Operator (e.g. sound desk volunteer) assigns names as speakers appear
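
A minimal sketch of how persistent name storage could work, using the speakers.json file listed under Repository Structure. The flat label-to-name JSON layout and the helper names (`load_names`, `save_names`, `assign_name`) are assumptions for illustration; the real file format may differ.

```python
import json
import os
import tempfile
from pathlib import Path


def load_names(path: Path) -> dict[str, str]:
    """Load the label -> name mapping, or an empty mapping if the file is absent."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def save_names(path: Path, names: dict[str, str]) -> None:
    """Write to a temp file first, then swap it in, so a crash cannot leave a half-written file."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(names, fh, indent=2, ensure_ascii=False)
    os.replace(tmp, path)  # replaces the old file in one step


def assign_name(path: Path, label: str, name: str) -> dict[str, str]:
    """Record the name the operator typed for a label and persist it."""
    names = load_names(path)
    names[label] = name.strip()
    save_names(path, names)
    return names
```

Calling `assign_name(Path("speakers.json"), "SPEAKER_00", "Pastor John")` once would make the name available to every later `load_names` call.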

v2 — Voice Enrolment (planned)

Before the service, a short voice sample (10–30 seconds) is uploaded for each expected speaker via the admin page. The bridge compares incoming speaker embeddings against enrolled voices and automatically assigns the correct name without operator input.

No operator intervention during the service
More accurate for recurring speakers (pastor, regular readers)
Enrolled voice profiles persist week to week
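
Since v2 is only planned, the README does not say how embeddings would be compared. One common approach is cosine similarity with a minimum-score threshold; the sketch below assumes that approach, and the function names, the 0.7 threshold, and the toy 3-dimensional vectors are all hypothetical.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speaker(
    embedding: list[float],
    enrolled: dict[str, list[float]],
    threshold: float = 0.7,  # assumed value; would need tuning on real voices
) -> Optional[str]:
    """Return the best-matching enrolled name, or None if nothing is similar enough."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy data: real speaker embeddings have hundreds of dimensions.
enrolled = {"Pastor John": [1.0, 0.0, 0.0], "Mary (Reader)": [0.0, 1.0, 0.0]}
print(match_speaker([0.9, 0.1, 0.0], enrolled))  # Pastor John
print(match_speaker([0.0, 0.0, 1.0], enrolled))  # None
```

Returning None for an unrecognised voice would let the bridge fall back to operator-assisted naming for guest speakers.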

System Components

PC Side (Windows)

WhisperLiveKit — local GPU-accelerated transcription + diarization server (port 8000)
Mosquitto — lightweight MQTT broker (internal message bus, port 1883)
bridge.py — WebSocket subscriber, speaker name mapper, MQTT publisher
admin.py — web server (port 8001) providing:
- Speaker name management (table for 40–50 speakers)
- Voice sample upload and playback
- Test recording playback for offline pipeline testing
- /display — fullscreen display page for tablets and screens
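
With three services on fixed ports, a quick reachability check can help when something fails to start. This is a hedged sketch, not part of the repository: the ports are the ones listed above, while `port_open` and `check_services` are hypothetical helper names.

```python
import socket

# Ports as listed above; the keys are only labels for the report.
SERVICES = {"WhisperLiveKit": 8000, "Mosquitto": 1883, "admin.py": 8001}


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1") -> dict[str, bool]:
    """Check every known service port on the given host."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:15} {'up' if up else 'DOWN'}")
```

A successful TCP connection only shows that the port is listening, not that the service behind it is healthy.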

Display Side

Any device with a browser — tablet, phone, Smart TV, laptop, HDMI monitor + PC
Navigate to http://[PC-IP]:8001/display and go fullscreen (F11)
Font and layout scale automatically to the screen size
Multiple display devices can be open simultaneously

Hardware

Component	Notes
Windows PC with NVIDIA GPU	RTX series recommended; RTX 4070 Super tested
Microphone or mixer line-in	USB condenser or direct mixer feed; mixer preferred for clean diarization
Display device	Any tablet, TV, or spare PC with a browser on the same WiFi

No microcontroller, no custom firmware, no hardware assembly required.

Key Design Decisions

Text & Speaker Buffering

The bridge script accumulates text until a sentence boundary or natural pause (~4s), then checks whether the speaker has changed. On speaker change, a full new payload is pushed to the display including the speaker name header. Otherwise only the new text lines are pushed.
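
The buffering rule can be sketched as a small class. This is an illustration under assumptions, not the code in bridge.py: the class name `CaptionBuffer`, the payload keys `text` and `speaker`, and the choice to flush early when a different speaker starts are all hypothetical.

```python
import re
import time

SENTENCE_END = re.compile(r"[.!?]\s*$")
PAUSE_SECONDS = 4.0  # the "~4s" natural pause described above


class CaptionBuffer:
    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the logic can be tested
        self._parts = []             # text fragments not yet sent
        self._speaker = None         # speaker of the buffered text
        self._shown_speaker = None   # speaker whose header was last sent
        self._last_add = clock()

    def add(self, speaker, text):
        """Buffer a fragment; return the payloads that are ready to publish."""
        out = []
        if self._parts and speaker != self._speaker:
            out.append(self._flush())  # never mix two speakers in one payload
        self._speaker = speaker
        self._parts.append(text.strip())
        self._last_add = self._clock()
        if SENTENCE_END.search(text):
            out.append(self._flush())
        return out

    def poll(self):
        """Call periodically; flushes buffered text once the pause has elapsed."""
        if self._parts and self._clock() - self._last_add >= PAUSE_SECONDS:
            return [self._flush()]
        return []

    def _flush(self):
        payload = {"text": " ".join(self._parts)}
        if self._speaker != self._shown_speaker:
            payload["speaker"] = self._speaker  # header only on speaker change
            self._shown_speaker = self._speaker
        self._parts = []
        return payload
```

Each returned payload would then be published to MQTT; a payload without a `speaker` key means "append these lines under the current header".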

Display Layout

Speaker name — shown in CAPS at the top, updated only when the speaker changes
Rolling text — 3 lines of word-wrapped transcription text below
Font scales to the display device's screen size
No refresh artifacts — instant update (unlike e-ink)
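
The layout rules above can be modelled in a few lines. The real display page does this in the browser, so the sketch below is only a text-mode illustration: `RollingDisplay`, the fixed character width, and clearing the lines when the speaker changes are assumptions.

```python
import textwrap
from collections import deque


class RollingDisplay:
    def __init__(self, width: int = 40, max_lines: int = 3):
        self.width = width                    # characters per line (assumed)
        self.header = ""                      # current speaker, in caps
        self.lines = deque(maxlen=max_lines)  # oldest line drops off automatically

    def update(self, text: str, speaker=None) -> None:
        """Append wrapped text; a non-None speaker starts a fresh block."""
        if speaker is not None:
            self.header = speaker.upper()
            self.lines.clear()
        self.lines.extend(textwrap.wrap(text, self.width))

    def render(self) -> str:
        """Header, a blank line, then the visible text lines."""
        return "\n".join([self.header, "", *self.lines])


display = RollingDisplay(width=20)
display.update("A reading from Luke chapter 4, verse 18", speaker="Mary (Reader)")
display.update("and the Spirit of the Lord is upon me today")
print(display.render())
```

A bounded `deque` gives the rolling behaviour for free: once three lines are shown, each new line pushes the oldest one out.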

Network

All traffic on local WiFi (church LAN or dedicated hotspot)
MQTT broker on Windows PC (port 1883, internal use)
The display page connects to the PC's admin server (port 8001) via the local network
PC static IP recommended to avoid having to update the URL on tablets
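
To find the URL to type into a tablet, the PC's LAN address is needed. The following sketch shows one common way to discover it from Python; `lan_ip` and `display_url` are hypothetical helpers, and only the port and the /display path come from this README.

```python
import socket
from typing import Optional

ADMIN_PORT = 8001  # admin.py port, as listed above


def lan_ip() -> str:
    """Best-effort guess at this PC's LAN address; falls back to loopback."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            # Connecting a UDP socket sends no packets; it only makes the OS
            # choose the outgoing interface. 192.0.2.1 is a reserved test address.
            s.connect(("192.0.2.1", 9))
            return s.getsockname()[0]
        except OSError:
            return "127.0.0.1"


def display_url(host: Optional[str] = None) -> str:
    """Build the display page URL for the given host, or for this PC."""
    return f"http://{host or lan_ip()}:{ADMIN_PORT}/display"


if __name__ == "__main__":
    print("Open this on the tablet:", display_url())
```

On a PC with several network adapters the guess may pick the wrong interface, which is another reason to prefer a static IP.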

Repository Structure

/
├── README.md                     — This file
├── CLAUDE.md                     — AI assistant context for development sessions
├── SETUP.md                      — Installation and configuration guide
├── install.bat                   — One-time setup script
├── start.bat                     — Launch script (double-click to start)
└── bridge/
    ├── bridge.py                 — Main bridge: Whisper WS → name map → MQTT
    ├── admin.py                  — Speaker admin + display web server (port 8001)
    ├── whisper_launcher.py       — WhisperLiveKit startup wrapper (diart patch)
    ├── requirements.txt          — Python dependencies
    ├── speakers.json             — Persistent speaker name mappings (auto-created)
    ├── recordings/               — Per-speaker voice samples (auto-created)
    └── test_recordings/          — Full-service recordings for pipeline testing

Reference Projects

WhisperLiveKit — real-time Whisper + speaker diarization
pyannote.audio / diart — streaming speaker diarization
NVIDIA Streaming Sortformer — alternative diarization backend

Status

🟡 In development — core pipeline functional

Architecture defined
WhisperLiveKit + diart diarization working
bridge.py — transcription → MQTT pipeline
admin.py — web speaker management (40–50 speakers)
Persistent speaker name storage (speakers.json)
Per-speaker voice sample upload
Test recording playback for offline pipeline testing
install.bat / start.bat — double-click operation
/display fullscreen browser display page
SSE push from admin.py to display page (/api/display/stream)
Voice enrolment v2 (auto name matching from voice samples), planned
Church deployment trial
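
For reference, the SSE push listed above (/api/display/stream) relies on the standard text/event-stream framing, where each message is one or more `field: value` lines followed by a blank line. The helper below is a minimal sketch of that framing; `sse_event` and the JSON payload shape are assumptions, not the actual admin.py code.

```python
import json


def sse_event(payload: dict, event: str = "") -> str:
    """Encode one Server-Sent Events message."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines inside strings, so one data: line is enough.
    lines.append("data: " + json.dumps(payload, ensure_ascii=False))
    return "\n".join(lines) + "\n\n"  # the blank line terminates the event


print(sse_event({"speaker": "Pastor John", "text": "Amen."}), end="")
```

In the browser, an `EventSource` pointed at the stream URL would receive each message and could parse its data with `JSON.parse`.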