No description

Benjamin Harris 2a98de29d5 System Sound 1 month ago
.claude 232411b4e7 Further Accuracy Improvments 1 month ago
bridge 2a98de29d5 System Sound 1 month ago
docs ee90055248 Inital Hardare Commit 1 month ago
esp32 ee90055248 Inital Hardare Commit 1 month ago
.gitignore 7da8915ce6 Updates 1 month ago
CLAUDE.md 950b800388 CUDA Toolkit update 1 month ago
Embedding.md 980e1df655 Embedding.md 1 month ago
README.md 09f1d397c4 Initial Display Version 1 month ago
SETUP.md 950b800388 CUDA Toolkit update 1 month ago
install.bat 7da8915ce6 Updates 1 month ago
start.bat aaea386eee Fix Dependencies 1 month ago

README.md

Church Live Transcription Display

A live speech-to-text system for deaf and hard-of-hearing congregants, displaying real-time transcriptions with speaker identification on any tablet, screen, or browser-capable device.

Overview

Audio from the church service is captured on a Windows PC, transcribed and speaker-diarized locally using WhisperLiveKit, and the resulting speaker-tagged text is served to a fullscreen browser page. Any tablet, TV, or device with a browser on the same WiFi can act as the display — no custom hardware required.

[Microphone / Mixer]
        ↓
[Windows PC]
  ├── WhisperLiveKit  (transcription + speaker diarization)
  ├── Mosquitto MQTT broker (internal message bus)
  ├── bridge.py       (WebSocket → name mapping → MQTT)
  └── admin.py        (speaker management + fullscreen display page)
        ↓ WiFi (browser)
[Tablet / TV / Any device with a browser]
  └── http://[PC-IP]:8001/display   ← fullscreen display page

Several display devices can be open at once, for example one at the front of the church and one at the hearing-loop desk.

Goals

  • Real-time captions with minimal latency (target: < 3 seconds end-to-end)
  • Speaker identification — display who is speaking when the speaker changes
  • Named speakers — operator maps anonymous speaker IDs to real names during service
  • Future: voice enrolment so names are matched automatically from pre-recorded samples
  • Runs entirely on local network — no cloud dependency
  • Readable at distance — large, auto-scaling font
  • No custom hardware — any spare tablet or screen works as the display

Speaker Identification

How It Works

WhisperLiveKit includes built-in real-time speaker diarization via diart (pyannote.audio). It runs alongside Whisper transcription and tags each segment of speech with an anonymous speaker label (SPEAKER_00, SPEAKER_01, etc.).

A name mapping layer in the bridge script translates these labels into real names, which are then pushed to the display page.
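The name mapping step can be pictured as a dictionary lookup with a fallback for speakers who have not been named yet. The following is a minimal sketch, not the code in bridge.py: the function name `display_name`, the sample names, and the "SPEAKER 8" placeholder style are all assumptions made for illustration.

```python
import re

# Hypothetical mapping; in practice the names come from the operator.
SPEAKER_NAMES = {
    "SPEAKER_00": "Pastor John",
    "SPEAKER_01": "Mary (Reader)",
}

# Matches the anonymous labels described above (SPEAKER_00, SPEAKER_01, ...).
LABEL_PATTERN = re.compile(r"^SPEAKER_(\d+)$")


def display_name(label: str, names: dict[str, str]) -> str:
    """Return the header text to show for a diarization label."""
    name = names.get(label)
    if name:
        return name.upper()
    match = LABEL_PATTERN.match(label)
    if match:
        # Unnamed speaker: show a neutral, one-based placeholder (assumed style).
        return f"SPEAKER {int(match.group(1)) + 1}"
    # Unknown label format: pass it through rather than hide it.
    return label.upper()
```

Falling back to a placeholder rather than dropping the header keeps speaker changes visible on the display even before the operator has typed a name.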

Display Format

When a speaker changes, their name is shown as a header line above their words.

┌─────────────────────────────────────────┐
│ PASTOR JOHN                             │
│                                         │
│ ...and He said unto them, go into       │
│ all the world and preach the gospel     │
│ to every creature.                      │
└─────────────────────────────────────────┘

  (speaker changes)

┌─────────────────────────────────────────┐
│ MARY (READER)                           │
│                                         │
│ A reading from Luke chapter 4,          │
│ verse 18...                             │
└─────────────────────────────────────────┘

Speaker Naming — Two Approaches

v1 — Operator-Assisted Naming (implemented)

The speaker admin page at http://[PC-IP]:8001 shows all detected speakers. When a new SPEAKER_XX appears, the operator types the name once. That name is stored persistently and used for every future session.

  • No setup required before the service
  • Works from the very first Sunday
  • Operator (e.g. sound desk volunteer) assigns names as speakers appear
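Persisting operator-assigned names could look like the sketch below. It assumes speakers.json is a flat JSON object mapping a speaker label to a name; the README only says the file holds persistent speaker name mappings, so the real schema may differ. The helper names `load_names` and `save_name` are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def load_names(path: Path) -> dict[str, str]:
    """Read the label-to-name mapping, or return an empty one if the file is absent."""
    if not path.exists():
        return {}
    with path.open("r", encoding="utf-8") as fh:
        return json.load(fh)


def save_name(path: Path, label: str, name: str) -> dict[str, str]:
    """Store one operator-assigned name and rewrite the file atomically."""
    names = load_names(path)
    names[label] = name.strip()
    # Write to a temp file in the same directory, then swap it into place,
    # so a crash mid-write cannot leave a truncated file behind.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(names, fh, indent=2, ensure_ascii=False)
    os.replace(tmp, path)
    return names
```

The write-then-replace pattern matters here because the file is updated live during a service, when an interrupted write would otherwise lose every stored name.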

v2 — Voice Enrolment (planned)

Before the service, a short voice sample (10–30 seconds) is uploaded for each expected speaker via the admin page. The bridge compares incoming speaker embeddings against enrolled voices and automatically assigns the correct name without operator input.

  • No operator intervention during the service
  • More accurate for recurring speakers (pastor, regular readers)
  • Enrolled voice profiles persist week to week
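Since v2 is only planned, the matching method is not specified. One common way to compare speaker embeddings is cosine similarity with a rejection threshold, sketched below. The choice of cosine similarity, the 0.7 threshold, and the function names are assumptions for illustration, and the three-dimensional vectors in the usage note stand in for real embeddings, which are much longer.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speaker(
    embedding: list[float],
    enrolled: dict[str, list[float]],
    threshold: float = 0.7,  # assumed value; would need tuning on real audio
) -> Optional[str]:
    """Return the best-matching enrolled name, or None if nothing is close enough."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

With `enrolled = {"Pastor John": [1.0, 0.0, 0.0], "Mary": [0.0, 1.0, 0.0]}`, the call `match_speaker([0.9, 0.1, 0.0], enrolled)` picks "Pastor John", while a vector far from both profiles returns `None`, in which case the operator-assisted naming from v1 would still apply.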

System Components

PC Side (Windows)

  • WhisperLiveKit — local GPU-accelerated transcription + diarization server (port 8000)
  • Mosquitto — lightweight MQTT broker (internal message bus, port 1883)
  • bridge.py — WebSocket subscriber, speaker name mapper, MQTT publisher
  • admin.py — web server (port 8001) providing:
    • Speaker name management (table sized for 40–50 speakers)
    • Voice sample upload and playback
    • Test recording playback for offline pipeline testing
    • /display — fullscreen display page for tablets and screens

Display Side

  • Any device with a browser — tablet, phone, Smart TV, laptop, HDMI monitor + PC
  • Navigate to http://[PC-IP]:8001/display and go fullscreen (F11)
  • Font and layout scale automatically to the screen size
  • Multiple display devices can be open simultaneously

Hardware

Component                      Notes
-----------------------------  -------------------------------------------------------------------------
Windows PC with NVIDIA GPU     RTX series recommended; RTX 4070 Super tested
Microphone or mixer line-in    USB condenser or direct mixer feed; mixer preferred for clean diarization
Display device                 Any tablet, TV, or spare PC with a browser on the same WiFi

No microcontroller, no custom firmware, no hardware assembly required.


Key Design Decisions

Text & Speaker Buffering

The bridge script accumulates text until a sentence boundary or natural pause (~4s), then checks whether the speaker has changed. On speaker change, a full new payload is pushed to the display including the speaker name header. Otherwise only the new text lines are pushed.
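The buffering rule above can be sketched as a small state machine. This is an illustration under assumptions, not the logic in bridge.py: the class name `CaptionBuffer`, the payload keys (`type`, `speaker`, `text`), and the simple punctuation test for a sentence boundary are all hypothetical. Timestamps are passed in explicitly so the behaviour is easy to test.

```python
import re
from typing import Optional

SENTENCE_END = re.compile(r"[.!?]\s*$")  # crude sentence-boundary test (assumption)
PAUSE_SECONDS = 4.0                      # the "~4s" natural pause described above


class CaptionBuffer:
    def __init__(self) -> None:
        self.pending = ""                            # text not yet pushed
        self.pending_speaker: Optional[str] = None   # who said the pending text
        self.last_text_at = 0.0                      # time of the latest fragment
        self.shown_speaker: Optional[str] = None     # header currently displayed

    def add(self, speaker: str, text: str, now: float) -> list[dict]:
        """Buffer a fragment; return any payloads that are ready to push."""
        out = []
        if self.pending and speaker != self.pending_speaker:
            # A different speaker starts: push what the previous one said first.
            out.append(self._flush())
        self.pending = f"{self.pending} {text.strip()}".strip()
        self.pending_speaker = speaker
        self.last_text_at = now
        if SENTENCE_END.search(self.pending):
            out.append(self._flush())
        return out

    def tick(self, now: float) -> list[dict]:
        """Call periodically; pushes pending text after a natural pause."""
        if self.pending and now - self.last_text_at >= PAUSE_SECONDS:
            return [self._flush()]
        return []

    def _flush(self) -> dict:
        changed = self.pending_speaker != self.shown_speaker
        # Full payload (with header) on speaker change, text-only otherwise.
        payload = {"type": "full" if changed else "append", "text": self.pending}
        if changed:
            payload["speaker"] = self.pending_speaker
            self.shown_speaker = self.pending_speaker
        self.pending = ""
        return payload
```

A caller would feed each transcription fragment to `add()` and call `tick()` on a timer, publishing whatever payloads come back.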

Display Layout

  • Speaker name — shown in CAPS at the top, updated only when the speaker changes
  • Rolling text — 3 lines of word-wrapped transcription text below
  • Font scales to the display device's screen size
  • No refresh artifacts — instant update (unlike e-ink)
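The rolling three-line window can be illustrated with the standard library's `textwrap`. This is only a sketch of the idea: in a browser the wrapping would more likely be handled by the page's own layout, and the class name `RollingText` and the fixed character width are assumptions.

```python
import textwrap


class RollingText:
    """Keep only the last few word-wrapped lines of the current speaker's text."""

    def __init__(self, width: int = 40, max_lines: int = 3) -> None:
        self.width = width          # characters per line (assumed fixed here)
        self.max_lines = max_lines  # the layout above shows 3 lines
        self.text = ""              # text that is currently visible

    def append(self, new_text: str) -> list[str]:
        """Add text, re-wrap, and return the lines that should be visible."""
        combined = f"{self.text} {new_text}".strip()
        lines = textwrap.wrap(combined, width=self.width)[-self.max_lines:]
        # Keep only what is visible so the buffer does not grow without bound.
        self.text = " ".join(lines)
        return lines

    def reset(self) -> None:
        """Clear the window, for example when the speaker changes."""
        self.text = ""
```

Re-wrapping the retained text together with each new fragment lets a short fragment fill the remainder of the last visible line instead of always starting a new one.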

Network

  • All traffic on local WiFi (church LAN or dedicated hotspot)
  • MQTT broker on Windows PC (port 1883, internal use)
  • The display page connects to the PC's admin server (port 8001) via the local network
  • PC static IP recommended to avoid having to update the URL on tablets
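The Status list mentions SSE push from admin.py to the display page at /api/display/stream. Server-Sent Events are plain text frames, so the wire format can be sketched without any web framework. The event name `caption`, the payload keys, and the function names below are assumptions; only the frame syntax itself (`event:` and `data:` lines ended by a blank line, and comment lines starting with a colon) comes from the SSE format.

```python
import json


def sse_event(payload: dict, event: str = "caption") -> str:
    """Format one Server-Sent Events frame carrying a JSON payload."""
    # Compact json.dumps output contains no raw newlines (they are escaped
    # inside strings), so a single "data:" line is enough.
    data = json.dumps(payload, ensure_ascii=False)
    return f"event: {event}\ndata: {data}\n\n"


def sse_keepalive() -> str:
    """Comment frame that clients ignore; useful to keep idle connections open."""
    return ": keepalive\n\n"
```

On the display page, a browser `EventSource` pointed at the stream URL would receive each frame and could parse the `data` field as JSON.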

Repository Structure

/
├── README.md                     — This file
├── CLAUDE.md                     — AI assistant context for development sessions
├── SETUP.md                      — Installation and configuration guide
├── install.bat                   — One-time setup script
├── start.bat                     — Launch script (double-click to start)
└── bridge/
    ├── bridge.py                 — Main bridge: Whisper WS → name map → MQTT
    ├── admin.py                  — Speaker admin + display web server (port 8001)
    ├── whisper_launcher.py       — WhisperLiveKit startup wrapper (diart patch)
    ├── requirements.txt          — Python dependencies
    ├── speakers.json             — Persistent speaker name mappings (auto-created)
    ├── recordings/               — Per-speaker voice samples (auto-created)
    └── test_recordings/          — Full-service recordings for pipeline testing

Reference Projects


Status

🟡 In development — core pipeline functional

  • Architecture defined
  • WhisperLiveKit + diart diarization working
  • bridge.py — transcription → MQTT pipeline
  • admin.py — web speaker management (40–50 speakers)
  • Persistent speaker name storage (speakers.json)
  • Per-speaker voice sample upload
  • Test recording playback for offline pipeline testing
  • install.bat / start.bat — double-click operation
  • /display fullscreen browser display page
  • SSE push from admin.py to display page (/api/display/stream)
  • Planned: voice enrolment v2 (auto name matching from voice samples)
  • Church deployment trial