1 month ago · 980e1df655
--- a/Embedding.md
+++ b/Embedding.md
@@ -0,0 +1,626 @@
 
				+The README shows voice enrolment v2 is the next planned feature, and per-speaker voice sample upload already exists in admin.py. That's the right foundation to build on.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### Voiceprint / Speaker Embedding Design
			
 
				+
			
 
				+**To answer your core question first:** No, don't store the embedding in `speakers.json`. A speaker embedding is a float32 numpy array of 512 dimensions — about 2KB of binary data. JSON can technically hold it as a list of floats but it becomes unwieldy and slow to load. The clean approach is:
			
 
				+
			
 
				+```
			
 
				+bridge/
			
 
				+├── speakers.json          ← names, metadata (text only, fast to load)
			
 
				+└── embeddings/
			
 
				+    ├── SPEAKER_00.npy     ← numpy binary, one file per speaker
			
 
				+    ├── SPEAKER_01.npy
			
 
				+    └── SPEAKER_02.npy
			
 
				+```
			
 
				+
			
 
				+When you later move to a remote DB, the `.npy` files become `BLOB` columns or vector embeddings in something like pgvector. The `speakers.json` structure stays identical; you just add a `"has_embedding": true` flag.
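As a rough illustration of the layout above, the storage half could look like the sketch below. The function names mirror the interface proposed for `embeddings.py`, but the `EMBEDDINGS_DIR` default and the `root` parameter are assumptions made for this example, and the real module may differ.

```python
from pathlib import Path
from typing import Optional

import numpy as np

# Hypothetical default location, sitting next to speakers.json.
EMBEDDINGS_DIR = Path("bridge") / "embeddings"


def save_embedding(speaker_id: str, embedding, root: Path = EMBEDDINGS_DIR) -> Path:
    """Write one speaker's vector to <root>/<speaker_id>.npy as float32."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{speaker_id}.npy"
    np.save(path, np.asarray(embedding, dtype=np.float32))
    return path


def load_embedding(speaker_id: str, root: Path = EMBEDDINGS_DIR) -> Optional[np.ndarray]:
    """Return the stored vector, or None if the speaker has no voiceprint."""
    path = root / f"{speaker_id}.npy"
    if not path.exists():
        return None
    return np.load(path)
```

A 512-dimensional float32 vector holds 512 × 4 = 2048 bytes of data, which is where the "about 2KB" figure comes from; the `.npy` header adds a little on top.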
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### Extracting a Voiceprint From a 10-Second Snip
			
 
				+
			
 
				+This is the right approach — far better than asking someone to "say a sentence". Here's the full picture:
			
 
				+
			
 
				+**The tool:** `pyannote.audio`'s `SpeakerEmbedding` pipeline — already in your venv as a dependency of diart. It takes an audio segment and returns a 512-dimensional embedding vector that acts as a voice fingerprint.
			
 
				+
			
 
				+**The process for extracting from an existing recording:**
			
 
				+
			
 
				+1. Operator opens admin.py, navigates to a speaker
			
 
				+2. Selects a timestamp range from a test recording (e.g. "00:45 → 00:55")
			
 
				+3. Admin server extracts that 10s segment using miniaudio or ffmpeg
			
 
				+4. Passes it to `SpeakerEmbedding` pipeline
			
 
				+5. Saves the resulting `.npy` file to `embeddings/SPEAKER_XX.npy`
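Steps 2 and 3 above boil down to turning an operator-entered timestamp range into sample offsets. The sketch below shows one way to do that with the standard library only; `parse_timestamp` and `sample_range` are hypothetical helper names, and the real extraction would go through miniaudio or ffmpeg as described.

```python
def parse_timestamp(ts: str) -> float:
    """Convert "SS", "MM:SS" or "HH:MM:SS" into seconds."""
    seconds = 0.0
    for part in ts.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def sample_range(start_ts: str, end_ts: str, sample_rate: int,
                 total_samples: int) -> tuple[int, int]:
    """Return (first_sample, end_sample) for a timestamp range, clamped to the recording."""
    start_sec = parse_timestamp(start_ts)
    end_sec = parse_timestamp(end_ts)
    if end_sec <= start_sec:
        raise ValueError("end must be after start")
    start = int(start_sec * sample_rate)
    end = min(int(end_sec * sample_rate), total_samples)
    if start >= end:
        raise ValueError("segment starts beyond the end of the recording")
    return start, end
```

For a one-minute recording at 16 kHz, `sample_range("00:45", "00:55", 16000, 960000)` selects samples 720000 up to 880000, which is the 10-second window from the example.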
			
 
				+
			
 
				+**At runtime during a service:**
			
 
				+
			
 
				+When diart assigns a `SPEAKER_XX` label, bridge.py checks if an embedding exists for any unmatched speaker ID. If the current speaker's embedding (accumulated over ~5 seconds of live speech) cosine-similarity matches a stored embedding above threshold (~0.82), it auto-assigns the name without operator input.
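The matching step described above can be sketched as a plain cosine-similarity search over the enrolled vectors. This is a minimal illustration, assuming the enrolled embeddings are already loaded into a dict; the `enrolled` argument is an assumption of this example, and the 0.82 default simply reuses the threshold quoted in the text.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    if denom == 0.0:
        return 0.0
    return float(np.dot(a, b) / denom)


def find_best_match(live: np.ndarray, enrolled: dict, threshold: float = 0.82):
    """Return (speaker_id, score) for the closest enrolled voice, or None below threshold."""
    best_id, best_score = None, -1.0
    for speaker_id, stored in enrolled.items():
        score = cosine_similarity(live, stored)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_id is None or best_score < threshold:
        return None
    return best_id, best_score
```

The threshold is a tuning knob rather than a fixed constant: a higher value means fewer wrong auto-assignments but more operator prompts.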
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### The `speakers.json` Schema Change
			
 
				+
			
 
				+Extend it from the current simple `{id: name}` to:
			
 
				+
			
 
			
 
				+
			
 
				+```json
			
 
				+{
			
 
				+  "SPEAKER_00": {
			
 
				+    "name": "Pastor John",
			
 
				+    "has_embedding": true,
			
 
				+    "embedding_updated": "2026-05-04T09:32:00",
			
 
				+    "colour": "#2563eb",
			
 
				+    "notes": "Senior pastor"
			
 
				+  },
			
 
				+  "SPEAKER_01": {
			
 
				+    "name": "Mary (Reader)",
			
 
				+    "has_embedding": false,
			
 
				+    "embedding_updated": null,
			
 
				+    "colour": "#16a34a",
			
 
				+    "notes": ""
			
 
				+  }
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+The bridge reads `has_embedding` to decide whether to attempt auto-matching. The actual vector lives in `embeddings/SPEAKER_XX.npy`.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### Implementation Plan
			
 
				+
			
 
				+This is a 3-part addition:
			
 
				+
			
 
				+**Part 1 —** `embeddings.py` (new file, ~80 lines) A small module that handles extraction, storage and matching:
			
 
				+
			
 
			
 
				+
			
 
				+```python
			
 
				+# Core interface (signatures only; bodies omitted)
				+import numpy as np
				+
				+def extract_embedding(audio_path, start_sec, end_sec) -> np.ndarray: ...
				+def save_embedding(speaker_id, embedding) -> None: ...
				+def load_embedding(speaker_id) -> np.ndarray | None: ...
				+def find_best_match(live_embedding, threshold=0.82) -> tuple[str, float] | None: ...
			
 
				+```
			
 
				+
			
 
				+**Part 2 — Admin UI additions** On each speaker row, add an "Extract from recording" button that lets the operator:
			
 
				+
			
 
				+- Pick an uploaded test recording from a dropdown
			
 
				+- Set a start/end time (or use a simple "pick a 10s window" slider)
			
 
				+- Click "Extract" — server processes it and saves the `.npy`
			
 
				+- Row updates to show "✓ Voiceprint saved"
			
 
				+
			
 
				+**Part 3 — Bridge auto-matching** In `push_final()`, when a new unknown speaker appears, compare their accumulated audio embedding against all stored profiles before prompting the operator. If a match is found above threshold, auto-assign the name silently.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+Want me to write `embeddings.py` first, then the admin UI changes, then the bridge matching logic? The `embeddings.py` module is the cleanest starting point — it's self-contained and we can test it standalone before wiring it into the UI.
			
 
				+
			
 
				+## You said:
				+
				+OK, let's get `embeddings.py` under way.
			
 
				+
			
 
				+3:30 PM
			
 
				+
			
 
				+## Claude responded:
			
 
				+
			
 
				+Let me check the current bridge structure first before writing it.
			
 
				+
			
 
				+Searched the web
			
 
				+
			
 
			
 
				+
			
 
				+The repo is requiring login for raw file access now. No matter — I have enough context from the session and the README. Let me write the module.
			
 
				+
			
 
				+Viewed a file, created a file, read a file
			
 
				+
			
 
			
 
				+
			
 
				+Here's what's in it and why each piece is designed the way it is.
			
 
				+
			
 
				+**Storage** — `.npy` files in `bridge/embeddings/`, one per speaker. `speakers.json` gains `has_embedding`, `embedding_updated`, `colour`, and `notes` fields but the schema is backward compatible — existing flat `{"SPEAKER_00": "Pastor John"}` entries are migrated transparently via `_normalise_speaker_entry()`.
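The backward-compatible migration mentioned above could work roughly as follows. The actual `_normalise_speaker_entry()` is not shown in this conversation, so this is only a guess at its shape: the function names without the leading underscore and the default colour value are assumptions.

```python
from typing import Any

# Defaults for fields that old flat entries do not carry.
DEFAULT_ENTRY = {
    "name": "",
    "has_embedding": False,
    "embedding_updated": None,
    "colour": "#6b7280",  # hypothetical default colour
    "notes": "",
}


def normalise_speaker_entry(value: Any) -> dict:
    """Accept either the old flat form ("Pastor John") or the new dict form."""
    entry = dict(DEFAULT_ENTRY)
    if isinstance(value, str):
        entry["name"] = value
    elif isinstance(value, dict):
        entry.update(value)
    else:
        raise TypeError(f"unsupported speaker entry: {value!r}")
    return entry


def normalise_speakers(raw: dict) -> dict:
    """Normalise a whole speakers.json mapping in one pass."""
    return {speaker_id: normalise_speaker_entry(v) for speaker_id, v in raw.items()}
```

Because missing keys are filled from `DEFAULT_ENTRY`, a file that mixes old and new entries loads without any manual migration step.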
			
 
				+
			
 
				+**Extraction** — `extract()` uses miniaudio (already in your venv) to pull a segment by start/end seconds, writes a temp WAV, feeds it to pyannote's `SpeakerEmbedding` pipeline, then deletes the temp file. The pipeline is lazy-loaded on first call so import is fast.
			
 
				+
			
 
				+`LiveEmbeddingAccumulator` — this is the piece that plugs into bridge.py. When diart announces a new `SPEAKER_XX`, bridge creates an accumulator, feeds it PCM chunks via `push()`, and once `ready()` returns True (5 seconds by default) calls `extract_embedding()` + `registry.find_match()`. If it matches an enrolled speaker, the name is auto-assigned without operator prompt.
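To make the `push()` / `ready()` / reset cycle concrete, here is a stripped-down buffer with the same rhythm. It is not the real `LiveEmbeddingAccumulator`: the class name `LiveAudioBuffer`, the 16 kHz sample rate and the 2-byte (int16) sample width are all assumptions made for this sketch, and the embedding extraction itself is left out.

```python
class LiveAudioBuffer:
    """Collects raw PCM bytes until enough audio is available for an embedding."""

    def __init__(self, min_seconds: float = 5.0, sample_rate: int = 16_000,
                 bytes_per_sample: int = 2) -> None:
        self._bytes_per_second = sample_rate * bytes_per_sample
        self._needed = int(min_seconds * self._bytes_per_second)
        self._buffer = bytearray()

    def push(self, chunk: bytes) -> None:
        """Append one chunk of PCM audio."""
        self._buffer.extend(chunk)

    def seconds(self) -> float:
        """Duration of audio collected so far."""
        return len(self._buffer) / self._bytes_per_second

    def ready(self) -> bool:
        """True once at least min_seconds of audio has been pushed."""
        return len(self._buffer) >= self._needed

    def take(self) -> bytes:
        """Return the collected audio and empty the buffer."""
        data = bytes(self._buffer)
        self._buffer.clear()
        return data

    def reset(self) -> None:
        """Discard everything collected so far."""
        self._buffer.clear()
```

The caller pushes each chunk, checks `ready()`, and only then pays the cost of running the embedding model on the bytes returned by `take()`.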
			
 
				+
			
 
				+**CLI** — the file is executable standalone for testing without starting the full stack:
			
 
				+
			
 
			
 
				+
			
 
				+```bash
			
 
				+# Enrol from seconds 45–55 of last Sunday's recording
			
 
				+python bridge\embeddings.py enrol SPEAKER_00 test_recordings\service.mp3 --start 45 --end 55
			
 
				+
			
 
				+# Check what it matched against all enrolled speakers
			
 
				+python bridge\embeddings.py scores test_recordings\service.mp3 --start 45 --end 55
			
 
				+
			
 
				+# List who's enrolled
			
 
				+python bridge\embeddings.py list
			
 
				+```
			
 
				+
			
 
				+Next steps are the admin UI additions (a segment picker per speaker row) and wiring `LiveEmbeddingAccumulator` into `bridge.py`'s `push_final()` path.
			
 
				+
			
 
				+
			
 
				+### What a Voiceprint Actually Is
			
 
				+
			
 
				+The embedding captures **acoustic characteristics** of a voice — things like fundamental frequency, formant patterns, speaking rhythm, vocal tract shape. It has nothing to do with the words spoken. So a person saying "the quick brown fox" and saying "let us pray" would produce very similar embeddings, because it's the _voice_ not the _words_ being fingerprinted.
			
 
				+
			
 
				+So transcription accuracy doesn't directly improve the embedding quality.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### What _Does_ Improve Embedding Quality
			
 
				+
			
 
				+The factors that matter are:
			
 
				+
			
 
				+| Factor | Impact |
				+| --- | --- |
			
 
				+| **Audio cleanliness** | Highest — noise, reverb, music bleed all degrade it |
			
 
				+| **Segment length** | 10–20s is the sweet spot; diminishing returns after 30s |
			
 
				+| **Single speaker** | Must be isolated — no overlapping voices in the segment |
			
 
				+| **Consistent mic distance** | Close-mic (pulpit mic) is better than room mic |
			
 
				+| **Multiple samples averaged** | Enrolling 2–3 separate clips and averaging the embeddings is more robust than one clip |
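The last row of the table, averaging several enrolment clips, can be sketched like this. It is one common way to combine embeddings (normalise each to unit length, take the mean, normalise again) and is offered as an assumption, not as what `embeddings.py` necessarily does; `average_embeddings` is a hypothetical name.

```python
import numpy as np


def average_embeddings(embeddings) -> np.ndarray:
    """Combine several clips of one speaker into a single unit-length voiceprint."""
    stacked = np.stack([np.asarray(e, dtype=np.float32) for e in embeddings])
    norms = np.linalg.norm(stacked, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("cannot average a zero-length embedding")
    # Normalise first so a loud or long clip does not dominate the mean.
    mean = (stacked / norms).mean(axis=0)
    mean_norm = np.linalg.norm(mean)
    if mean_norm == 0:
        raise ValueError("embeddings cancel out; nothing to average")
    return mean / mean_norm
```

Since cosine similarity ignores vector length, the final normalisation does not change match scores; it just keeps stored voiceprints on a common scale.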
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### Where Transcription _Does_ Help — Indirectly
			
 
				+
			
 
				+Here's the indirect benefit you're probably intuiting:
			
 
				+
			
 
				+**If you know what was said, you can verify the segment is clean.** For example:
			
 
				+
			
 
				+- If the transcript shows `[SPEAKER_00] "Let us open in prayer..."` — you know that's isolated speech, no congregation response, no overlap. That's a good enrolment segment.
			
 
				+- If the transcript shows `[SPEAKER_00] "[inaudible] ...mmm... [SPEAKER_01]"` — overlapping speakers, bad segment, skip it.
			
 
				+
			
 
				+So the workflow becomes:
			
 
				+
			
 
				+```
			
 
				+1. Run a test recording through the pipeline
			
 
				+2. Read the transcript output to find clean, isolated segments per speaker
			
 
				+3. Note the timestamps of those clean segments
			
 
				+4. Use embeddings.py enrol with those specific timestamps
			
 
				+```
			
 
				+
			
 
				+This is actually a **much better enrolment strategy** than asking someone to record a sample — because the transcript lets you cherry-pick the cleanest real-world speech from an actual service.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### Practical Addition to embeddings.py
			
 
				+
			
 
				+We could add an `enrol_from_transcript()` helper that takes the transcript log (which bridge.py already produces) and automatically picks the best segment per speaker, meaning the longest clean run with no speaker change within it:
			
 
				+
			
 
			
 
				+
			
 
				+```python
			
 
				+def enrol_from_transcript(
				+    self,  # assumed to be a method of the class that provides extract_and_save()
				+    speaker_id: str,
			
 
				+    audio_path: Path,
			
 
				+    transcript_segments: list[dict],  # [{speaker, start, end, text}, ...]
			
 
				+    min_duration: float = 8.0,
			
 
				+) -> np.ndarray:
			
 
				+    """
			
 
				+    Find the longest single-speaker segment in a transcript and use it
				+    for enrolment. Only segments of at least min_duration seconds qualify.
			
 
				+    """
			
 
				+    candidates = [
			
 
				+        s for s in transcript_segments
			
 
				+        if s["speaker"] == speaker_id
			
 
				+        and (s["end"] - s["start"]) >= min_duration
			
 
				+    ]
			
 
				+    if not candidates:
			
 
				+        raise ValueError(f"No clean segment >= {min_duration}s found for {speaker_id}")
			
 
				+    
			
 
				+    # Pick longest segment
			
 
				+    best = max(candidates, key=lambda s: s["end"] - s["start"])
			
 
				+    return self.extract_and_save(speaker_id, audio_path, best["start"], best["end"])
			
 
				+```
			
 
				+
			
 
				+This would mean after every test recording run, bridge.py logs timestamped segments, and the admin UI could show a "Use best segment for enrolment" button per speaker — fully automatic, no timestamp hunting needed.
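One gap in the simple "pick the longest segment" approach is that a speaker's continuous speech is often logged as several short consecutive segments. A possible refinement, sketched below under the assumption that segments use the `{speaker, start, end, text}` shape from the helper above, is to merge back-to-back segments from the same speaker before choosing. `merge_runs`, `pick_best_run` and the `max_gap` parameter are hypothetical.

```python
def merge_runs(segments: list, max_gap: float = 0.5) -> list:
    """Join consecutive same-speaker segments separated by at most max_gap seconds."""
    runs = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        last = runs[-1] if runs else None
        if (last is not None and last["speaker"] == seg["speaker"]
                and seg["start"] - last["end"] <= max_gap):
            last["end"] = max(last["end"], seg["end"])
        else:
            runs.append({"speaker": seg["speaker"],
                         "start": seg["start"], "end": seg["end"]})
    return runs


def pick_best_run(segments: list, speaker_id: str,
                  min_duration: float = 8.0, max_gap: float = 0.5):
    """Return the longest uninterrupted run for speaker_id, or None if none is long enough."""
    candidates = [
        run for run in merge_runs(segments, max_gap)
        if run["speaker"] == speaker_id
        and run["end"] - run["start"] >= min_duration
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda run: run["end"] - run["start"])
```

Because a run ends as soon as another speaker's segment starts, the chosen window never spans a speaker change.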
			
 
				+
			
 
				+Want me to add that to `embeddings.py` and also add the transcript segment logging to `bridge.py` so the timestamps are captured?
			
 
				+
			
 
				+## You said:
				+
				+OK, let's update them both. Let me know where I need to put the blocks in bridge.py and embeddings.py.
			
 
				+
			
 
				+3:47 PM
			
 
				+
			
 
				+## Claude responded:
			
 
				+
			
 
				+Let me fetch the current bridge.py first so I place the changes exactly right.
			
 
				+
			
 
				+Failed to fetch: https://git.tazz.com.au/benjamin.harris/Deaf_Transcription_Service/src/main/bridge/bridge.py
			
 
				+
			
 
			
 
				+
			
 
				+The repo is still requiring login for file access. Can you paste the current `bridge.py` content here? Specifically I need to see:
			
 
				+
			
 
				+1. The `push_final()` method in `BridgeState`
			
 
				+2. The `_receive_results()` async function
			
 
				+3. The `main()` function
			
 
				+
			
 
				+Those are the three places I need to insert code. While you grab those, here's the complete updated `embeddings.py` with the transcript-based enrolment added:
			
 
				+
			
 
				+Ran a command, created a file, read a file
			
 
				+
			
 
			
 
				+
			
 
				+Good. Now for the `bridge.py` changes — here are the **exact three insertions** you need to make. Since I can't see the current file I'll give you find/replace targets that should be unambiguous:
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### bridge.py — Change 1: Add import at the top
			
 
				+
			
 
				+Find the block of imports at the top (near `import json`, `import re` etc). Add after the last import line:
			
 
				+
			
 
			
 
				+
			
 
				+```python
			
 
				+from embeddings import (
			
 
				+    EmbeddingRegistry,
			
 
				+    LiveEmbeddingAccumulator,
			
 
				+    log_transcript_segment,
			
 
				+)
			
 
				+```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### bridge.py — Change 2: Add registry + accumulator tracking to `BridgeState.__init__`
			
 
				+
			
 
				+Find:
			
 
				+
			
 
			
 
				+
			
 
				+```python
			
 
				+self._last_final_time = time.monotonic()
			
 
				+```
			
 
				+
			
 
				+Add immediately after it:
			
 
				+
			
 
			
 
				+
			
 
				+```python
			
 
				+# Voiceprint matching
			
 
				+self._embedding_registry = EmbeddingRegistry()
			
 
				+# Per-speaker accumulators: speaker_id -> LiveEmbeddingAccumulator
			
 
				+self._accumulators: dict[str, LiveEmbeddingAccumulator] = {}
			
 
				+# Speaker IDs confirmed this session (skip re-matching)
			
 
				+self._confirmed_ids: set[str] = set()
			
 
				+# Session ID for transcript logging (date string)
			
 
				+self._session_id: str = datetime.now(timezone.utc).strftime("%Y-%m-%d")
			
 
				+```
			
 
				+
			
 
				+You'll also need to add `from datetime import datetime, timezone` to the imports if it's not already there.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### bridge.py — Change 3: Add segment logging + embedding accumulation to `push_final()`
			
 
				+
			
 
				+Find the line inside `push_final()` that reads something like:
			
 
				+
			
 
			
 
				+
			
 
				+```python
			
 
				+self._last_final_time = time.monotonic()
			
 
				+```
			
 
				+
			
 
				+Just after that line, add:
			
 
				+
			
 
			
 
				+
			
 
				+```python
			
 
				+# ── Transcript segment logging ──────────────────────────────────────────
			
 
				+# Log every finalised segment for later voiceprint enrolment
			
 
				+# start_sec/end_sec from FrontData if available, else estimate from time
			
 
				+seg_start = getattr(response, "start", None)
			
 
				+seg_end   = getattr(response, "end",   None)
			
 
				+if seg_start is None:
			
 
				+    seg_end   = time.monotonic() - self._last_final_time
			
 
				+    seg_start = max(0.0, seg_end - len(text.split()) * 0.4)
			
 
				+log_transcript_segment(
			
 
				+    speaker_id = speaker_id or "UNKNOWN",
			
 
				+    text       = text,
			
 
				+    start_sec  = seg_start or 0.0,
			
 
				+    end_sec    = seg_end   or 0.0,
			
 
				+    session_id = self._session_id,
			
 
				+)
			
 
				+
			
 
				+# ── Live embedding accumulation for auto speaker matching ───────────────
			
 
				+if speaker_id and speaker_id not in self._confirmed_ids:
			
 
				+    if speaker_id not in self._accumulators:
			
 
				+        self._accumulators[speaker_id] = \
			
 
				+            self._embedding_registry.make_accumulator(min_seconds=5.0)
			
 
				+    # We don't have raw PCM here — accumulation happens in _send_audio
			
 
				+    # This is a placeholder; see Change 4 for the PCM-level accumulation
			
 
				+```
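For context, a segment logger of the kind called above could be as small as an append-only JSON Lines file. The sketch below is an assumption about how such a helper might look, not the actual `log_transcript_segment()` from `embeddings.py`; the function names, the explicit `log_path` parameter and the record keys are chosen for this example.

```python
import json
from pathlib import Path


def append_transcript_segment(log_path: Path, speaker_id: str, text: str,
                              start_sec: float, end_sec: float,
                              session_id: str) -> None:
    """Append one finalised segment as a single JSON line."""
    record = {
        "session_id": session_id,
        "speaker": speaker_id,
        "start": start_sec,
        "end": end_sec,
        "text": text,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_transcript_segments(log_path: Path, session_id: str) -> list:
    """Load every logged segment that belongs to one session."""
    if not log_path.exists():
        return []
    segments = []
    with log_path.open("r", encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            record = json.loads(line)
            if record.get("session_id") == session_id:
                segments.append(record)
    return segments
```

One JSON object per line keeps writes cheap during a live service and lets the enrolment step read the file back without parsing one large document.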
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### bridge.py — Change 4: PCM-level accumulation in `_send_audio`
			
 
				+
			
 
				+This is the most important change — accumulating actual audio per speaker. Find inside `audio_processor_loop` the `_send_audio` coroutine. Find the line that calls:
			
 
				+
			
 
			
 
				+
			
 
				+```python
			
 
				+await _audio_processor.process_audio(chunk)
			
 
				+```
			
 
				+
			
 
				+Replace it with:
			
 
				+
			
 
			
 
				+
			
 
				+```python
			
 
				+await _audio_processor.process_audio(chunk)
			
 
				+
			
 
				+# Feed chunk to the accumulator for the current speaker
			
 
				+# _current_speaker_id is set by _receive_results as segments arrive
			
 
				+current_spk = state.get_current_speaker_id()
			
 
				+if current_spk and current_spk not in state._confirmed_ids:
			
 
				+    if current_spk not in state._accumulators:
			
 
				+        state._accumulators[current_spk] = \
			
 
				+            state._embedding_registry.make_accumulator(min_seconds=5.0)
			
 
				+    state._accumulators[current_spk].push(chunk)
			
 
				+    if state._accumulators[current_spk].ready():
			
 
				+        try:
			
 
				+            live_emb = state._accumulators[current_spk].extract_embedding()
			
 
				+            match    = state._embedding_registry.find_match(live_emb)
			
 
				+            if match:
			
 
				+                matched_id, score = match
			
 
				+                resolved_name = state.speaker_names.get(matched_id, matched_id)
			
 
				+                print(
			
 
				+                    f"[Embeddings] Auto-matched {current_spk} → "
			
 
				+                    f"{resolved_name} (score={score:.3f})"
			
 
				+                )
			
 
				+                state.set_speaker_name(current_spk, resolved_name)
			
 
				+                state._confirmed_ids.add(current_spk)
			
 
				+            else:
			
 
				+                # No match — reset accumulator to try again with more audio
			
 
				+                state._accumulators[current_spk].reset()
			
 
				+        except Exception as exc:
			
 
				+            logger.warning(f"[Embeddings] Accumulator error: {exc}")
			
 
				+```
			
 
				+
			
 
				+And add this method to `BridgeState`:
			
 
				+
			
 
			
 
				+
			
 
				+```python
			
 
				+def get_current_speaker_id(self) -> str | None:
			
 
				+    """Return the raw diarization ID (SPEAKER_XX) of the current speaker."""
			
 
				+    with self._lock:
			
 
				+        return self._raw_speaker_id  # see below
			
 
				+```
			
 
				+
			
 
				+And add `self._raw_speaker_id: str | None = None` to `__init__`, then in `push_final()` add:
			
 
				+
			
 
			
 
				+
			
 
				+```python
			
 
				+self._raw_speaker_id = speaker_id  # track raw ID for accumulator
			
 
				+```
			
 
				+
			
 
				+at the top of the `with self._lock:` block.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+That's all four changes. The key file to copy into `bridge/` is `embeddings.py`.
			
 
				+
			
 
				+
			
 
				+Here are the five precise changes:
			
 
				+
			
 
				+
			
 
				+
			
 
				+### Change 1: Imports (top of file)
			
 
				+
			
 
				+Find:
			
 
				+
			
 
				+```python
				+from collections import Counter
				+from pathlib import Path
				+```
			
 
				+
			
 
				+Replace with:
			
 
				+
			
 
				+```python
				+from collections import Counter
				+from datetime import datetime, timezone
				+from pathlib import Path
				+```
			
 
				+
			
 
				+Then find:
			
 
				+
			
 
				+```python
				+from whisperlivekit import AudioProcessor, TranscriptionEngine
				+```
			
 
				+
			
 
				+Replace with:
			
 
				+
			
 
				+```python
				+from whisperlivekit import AudioProcessor, TranscriptionEngine
				+from embeddings import (
				+    EmbeddingRegistry,
				+    log_transcript_segment,
				+)
				+```
			
 
				+
			
 
				+
			
 
				+
			
 
				+### Change 2: `BridgeState.__init__`
			
 
				+
			
 
				+Find:
			
 
				+
			
 
				+```python
				+        self._last_final_time              = time.monotonic()
				+```
			
 
				+
			
 
				+Replace with:
			
 
				+
			
 
				+```python
				+        self._last_final_time              = time.monotonic()
				+        # Raw diarization ID of current speaker (SPEAKER_XX)
				+        self._raw_speaker_id: str | None   = None
				+        # Voiceprint matching
				+        self._embedding_registry           = EmbeddingRegistry()
				+        self._accumulators: dict[str, object] = {}
				+        self._confirmed_ids: set[str]      = set()
				+        self._session_id: str              = datetime.now(timezone.utc).strftime("%Y-%m-%d")
				+```
			
 
				+
			
 
				+
			
 
				+
			
 
				+### Change 3: `push_final()` method
			
 
				+
			
 
				+Find:
			
 
				+
			
 
				+```python
				+    def push_final(self, text: str, speaker_id: str | None, mqtt_client: mqtt.Client) -> None:
				+        """Accept a finalised segment; flush on sentence boundary or speaker change."""
				+        with self._lock:
				+            if speaker_id:
				+                self._seen.add(speaker_id)
				+
				+            resolved = self._resolve(speaker_id)
				+```
			
 
				+
			
 
				+Replace with:
			
 
				+
			
 
				+```python
				+    def push_final(self, text: str, speaker_id: str | None, mqtt_client: mqtt.Client,
				+                   seg_start: float = 0.0, seg_end: float = 0.0) -> None:
				+        """Accept a finalised segment; flush on sentence boundary or speaker change."""
				+        with self._lock:
				+            # Track raw diarization ID for PCM accumulator
				+            self._raw_speaker_id = speaker_id
				+
				+            if speaker_id:
				+                self._seen.add(speaker_id)
				+
				+            # Log segment to transcript_segments.jsonl for later enrolment
				+            log_transcript_segment(
				+                speaker_id = speaker_id or "UNKNOWN",
				+                text       = text,
				+                start_sec  = seg_start,
				+                end_sec    = seg_end,
				+                session_id = self._session_id,
				+            )
				+
				+            resolved = self._resolve(speaker_id)
				+```
			
 
				+
			
 
				+
			
 
				+
			
 
				+### Change 4: `_receive_results()`, pass timestamps to `push_final()`
			
 
				+
			
 
				+Find:
			
 
				+
			
 
				+```python
				+    async def _receive_results():
				+        async for response in results_generator:
				+            # response is a FrontData dataclass, not a dict
				+            text     = (getattr(response, "text", None) or getattr(response, "buffer_transcription", None) or "").strip()
				+            is_final = getattr(response, "is_final", False) or getattr(response, "end_of_segment", False)
				+            speaker  = getattr(response, "speaker", None)
				+            if is_final and text:
				+                print(f"[Whisper] ({speaker or '?'}) {text}")
				+                state.push_final(text, speaker, mqtt_client)
				+```
			
 
				+
			
 
				+Replace with:
			
 
				+
			
 
				+```python
				+    async def _receive_results():
				+        async for response in results_generator:
				+            text     = (getattr(response, "text", None) or
				+                        getattr(response, "buffer_transcription", None) or "").strip()
				+            is_final = getattr(response, "is_final", False) or \
				+                       getattr(response, "end_of_segment", False)
				+            speaker  = getattr(response, "speaker", None)
				+            # Grab timestamps if FrontData provides them
				+            seg_start = float(getattr(response, "start", 0.0) or 0.0)
				+            seg_end   = float(getattr(response, "end",   0.0) or 0.0)
				+            if is_final and text:
				+                print(f"[Whisper] ({speaker or '?'}) {text}")
				+                state.push_final(text, speaker, mqtt_client, seg_start, seg_end)
				+```
			
 
				+
			
 
				+
			
 
				+
			
 
				+### Change 5: `_send_audio()`, PCM accumulation for live matching
			
 
				+
			
 
				+Find:
			
 
				+
			
 
				+```python
				+            while True:
				+                # Drain test audio injection first if available
				+                try:
				+                    chunk = test_audio_queue.get_nowait()
				+                except asyncio.QueueEmpty:
				+                    chunk = await audio_queue.get()
				+                await audio_processor.process_audio(chunk)
				+```
			
 
				+
			
 
				+Replace with:
			
 
				+
			
 
				+```python
				+            while True:
				+                # Drain test audio injection first if available
				+                try:
				+                    chunk = test_audio_queue.get_nowait()
				+                except asyncio.QueueEmpty:
				+                    chunk = await audio_queue.get()
				+                await audio_processor.process_audio(chunk)
				+
				+                # Accumulate PCM for live speaker matching
				+                current_spk = state._raw_speaker_id
				+                if current_spk and current_spk not in state._confirmed_ids:
				+                    if current_spk not in state._accumulators:
				+                        state._accumulators[current_spk] = \
				+                            state._embedding_registry.make_accumulator(min_seconds=5.0)
				+                    state._accumulators[current_spk].push(chunk)
				+                    if state._accumulators[current_spk].ready():
				+                        try:
				+                            live_emb = state._accumulators[current_spk].extract_embedding()
				+                            match    = state._embedding_registry.find_match(live_emb)
				+                            if match:
				+                                matched_id, score = match
				+                                resolved = state.speaker_names.get(matched_id, matched_id)
				+                                print(
				+                                    f"[Embeddings] Auto-matched {current_spk} → "
				+                                    f"{resolved} (score={score:.3f})"
				+                                )
				+                                with state._lock:
				+                                    state.speaker_names[current_spk] = resolved
				+                                    state._confirmed_ids.add(current_spk)
				+                                    _write_speakers(state.speaker_names)
				+                            else:
				+                                # No match yet — reset and try again with more audio
				+                                state._accumulators[current_spk].reset()
				+                        except Exception as exc:
				+                            print(f"[Embeddings] Accumulator error: {exc}")
				+                            state._accumulators[current_spk].reset()
				+```
			
 
				+
			
 
				+
			
 
				+
			
 
				+That's all five changes. One caveat: `audio_callback` currently produces float32 samples, while `LiveEmbeddingAccumulator.push()` and `extract_embedding()` write a WAV file assuming int16, and Change 5 pushes each chunk unconverted. Convert the chunk to int16 first by replacing the `state._accumulators[current_spk].push(chunk)` line in Change 5 with the following (this assumes `numpy` is imported as `np` in bridge.py):
				+
				+```python
				+                    # Convert float32 → int16 for the embedding accumulator
				+                    # (clip first so out-of-range samples cannot wrap around)
				+                    samples   = np.clip(np.frombuffer(chunk, dtype=np.float32), -1.0, 1.0)
				+                    chunk_i16 = (samples * 32767).astype(np.int16).tobytes()
				+                    state._accumulators[current_spk].push(chunk_i16)
				+```
			
 
				+
			
 
				+Once those are in, drop embeddings.py into the bridge/ folder alongside bridge.py and it will be importable. The transcript log will start filling up from the first run, ready for auto-enrol once you have a recording to pull segments from.
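The float32 to int16 conversion is easy to check in isolation before wiring it into the bridge. The helper below is a standalone sketch of that step; the name `float32_to_int16_bytes` is hypothetical, and it assumes the incoming chunk is raw little-endian float32 PCM in the range -1.0 to 1.0, as produced by typical audio callbacks.

```python
import numpy as np


def float32_to_int16_bytes(chunk: bytes) -> bytes:
    """Convert raw float32 PCM (-1.0..1.0) into raw int16 PCM."""
    samples = np.frombuffer(chunk, dtype=np.float32)
    # Clip so that samples outside the nominal range saturate instead of wrapping.
    clipped = np.clip(samples, -1.0, 1.0)
    return (clipped * 32767).astype(np.int16).tobytes()
```

The output is half the size of the input, since each sample shrinks from 4 bytes to 2; any byte-count thresholds in the accumulator need to be computed on the int16 data.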
			
 
				+
			
 
				+What needs to be added:
			
 
				+
			
 
				+- A **Voiceprint** column to the speaker table with status badge
			
 
				+- A **voiceprint modal** per speaker with three tabs: auto-enrol from transcript, manual segment picker, and similarity test
			
 
				+- New API endpoints for all embedding operations
			
 
				+- A **Voiceprints** summary card showing all enrolled speakers