1 month ago · 49e65a4568
--- a/SETUP.md
+++ b/SETUP.md
@@ -0,0 +1,310 @@
 
				+# Setup Guide — Church Live Transcription Display
			
 
				+
			
 
				+This guide walks through everything needed to get the system running on a
			
 
				+Windows 11 PC from scratch. Follow each section in order.
			
 
				+
			
 
				+**Total setup time: approximately 30–60 minutes** (most of that is download time).
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Why no installer / executable?
			
 
				+
			
 
				+The transcription engine (WhisperLiveKit) depends on PyTorch and CUDA — the
			
 
				+combined download is ~4–5 GB and requires NVIDIA GPU drivers to be installed
			
 
				+natively on the host machine regardless. Packaging everything into a single
			
 
				+`.exe` is not practical for software of this type.
			
 
				+
			
 
				+Instead this guide provides:
			
 
				+- `install.bat` — run **once** to set everything up
			
 
				+- `start.bat` — run each time to launch the full system
			
 
				+
			
 
				+After setup, operation is a double-click.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Part 1 — System Requirements
			
 
				+
			
 
				+Before starting, confirm your PC meets these requirements:
			
 
				+
			
 
				+| Requirement | Minimum | Recommended |
			
 
				+|---|---|---|
			
 
				+| OS | Windows 10 64-bit | Windows 11 |
			
 
				+| GPU | NVIDIA GTX 1060 6 GB | NVIDIA RTX 3070 or better |
			
 
				+| VRAM | 6 GB | 8 GB+ |
			
 
				+| RAM | 16 GB | 32 GB |
			
 
				+| Storage | 10 GB free | 20 GB free |
			
 
				+| Internet | Required for setup | Not needed during services |
			
 
				+
			
 
				+> The RTX 4070 Super (tested hardware) runs `large-v3` in real time comfortably.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Part 2 — NVIDIA Driver
			
 
				+
			
 
				+You need an up-to-date NVIDIA driver. You do **not** need to install the CUDA
			
 
				+Toolkit separately — PyTorch bundles everything it needs.
			
 
				+
			
 
				+1. Open **GeForce Experience** (if installed) → Drivers → Check for updates.
			
 
				+
			
 
				+   **Or** visit [nvidia.com/drivers](https://www.nvidia.com/drivers), enter your
			
 
				+   GPU model, download and run the installer.
			
 
				+
			
 
				+2. Choose **Express Installation**.
			
 
				+
			
 
				+3. Restart the PC when prompted.
			
 
				+
			
 
				+4. Verify the driver is working:
			
 
				+   - Press `Win + R`, type `cmd`, press Enter.
			
 
				+   - Type `nvidia-smi` and press Enter.
			
 
				+   - You should see a table with your GPU name and driver version.
			
 
				+
			
 
				+   ```
			
 
				+   +-----------------------------------------------------------------------------+
			
 
				+   | NVIDIA-SMI 560.x   Driver Version: 560.x   CUDA Version: 12.6              |
			
 
				+   +-----------------------------------------------------------------------------+
			
 
				+   | RTX 4070 Super ...
			
 
				+   ```
			
 
				+
			
 
				+   If this command is not found, the driver did not install correctly.
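
The same driver check can be scripted. The following is a minimal Python sketch, not part of the project: the function names `parse_smi_header` and `check_driver` are made up for illustration, and it assumes the `nvidia-smi` header keeps the `Driver Version: ... CUDA Version: ...` layout shown above.

```python
import re
import shutil
import subprocess


def parse_smi_header(text):
    """Return (driver_version, cuda_version) found in nvidia-smi output, or None."""
    match = re.search(r"Driver Version:\s*(\S+)\s+CUDA Version:\s*(\S+)", text)
    return match.groups() if match else None


def check_driver():
    """Run nvidia-smi if it is on PATH and return the versions it reports."""
    if shutil.which("nvidia-smi") is None:
        # Same situation as "command not found" in Command Prompt.
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return parse_smi_header(result.stdout)
```

Calling `check_driver()` returns `None` when the command is missing or its header cannot be parsed, which corresponds to the "driver did not install correctly" case.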
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Part 3 — Python 3.11
			
 
				+
			
 
				+1. Go to [python.org/downloads](https://www.python.org/downloads/release/python-3119/)
			
 
				+   and download **Python 3.11.x** (Windows installer, 64-bit).
			
 
				+
			
 
				+   > Use Python **3.11** specifically. Some ML libraries have known issues with
			
 
				+   > Python 3.13 on Windows.
			
 
				+
			
 
				+2. Run the installer. On the first screen:
			
 
				+   - **Tick "Add python.exe to PATH"** (some installer versions label it "Add Python to PATH"; do this before clicking Install Now)
			
 
				+   - Click **Install Now**
			
 
				+
			
 
				+3. Once complete, verify in a new Command Prompt window:
			
 
				+
			
 
				+   ```
			
 
				+   python --version
			
 
				+   ```
			
 
				+
			
 
				+   Expected output: `Python 3.11.x`
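
A script can make the same check before doing any work. This is a small illustrative sketch, not code from the project; `is_supported` is a hypothetical helper that accepts only the 3.11 series recommended above.

```python
import sys


def is_supported(version_info=sys.version_info):
    """Return True only for Python 3.11.x, the series this guide recommends."""
    return tuple(version_info[:2]) == (3, 11)
```

For example, `is_supported((3, 13, 0))` is false, so a launcher could stop early with a clear message instead of failing later inside an ML library.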
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Part 4 — Mosquitto (MQTT Broker)
			
 
				+
			
 
				+Mosquitto is the message relay between the PC and the display.
			
 
				+
			
 
				+1. Download the Windows installer from
			
 
				+   [mosquitto.org/download](https://mosquitto.org/download/) — choose the
			
 
				+   `.exe` installer for Windows.
			
 
				+
			
 
				+2. Run the installer, accept all defaults.
			
 
				+
			
 
				+3. Start Mosquitto as a Windows service (run Command Prompt **as Administrator**):
			
 
				+
			
 
				+   ```
			
 
				+   net start mosquitto
			
 
				+   ```
			
 
				+
			
 
				+4. Set it to start automatically with Windows:
			
 
				+
			
 
				+   ```
			
 
				+   sc config mosquitto start= auto
			
 
				+   ```
			
 
				+
			
 
				+5. Verify it's running:
			
 
				+
			
 
				+   ```
			
 
				+   mosquitto_sub -h localhost -t test -v
			
 
				+   ```
			
 
				+
			
 
				+   The command prints nothing and waits for messages. If it shows no errors,
				+   Mosquitto is working. Press `Ctrl+C` to stop the test.
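
A quick way to confirm that something is listening for MQTT connections is to open a plain TCP connection. The sketch below is an illustration only. It assumes Mosquitto is using the standard MQTT port 1883, which this guide does not state explicitly, and `broker_reachable` is a hypothetical helper name.

```python
import socket


def broker_reachable(host="localhost", port=1883, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not found.
        return False
```

This only shows that the port accepts connections. It does not prove the listener is Mosquitto, so the `mosquitto_sub` test above remains the more complete check.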
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Part 5 — HuggingFace Account (required for speaker diarization)
			
 
				+
			
 
				+The automatic speaker detection uses a model from HuggingFace that requires
			
 
				+accepting its licence terms. This is free — it just needs an account.
			
 
				+
			
 
				+1. Go to [huggingface.co](https://huggingface.co) and create a free account.
			
 
				+
			
 
				+2. Accept the licence for the diarization model:
			
 
				+   - Visit [huggingface.co/pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
			
 
				+   - Click **"Agree and access repository"**
			
 
				+   - Also visit [huggingface.co/pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0)
			
 
				+   - Click **"Agree and access repository"**
			
 
				+
			
 
				+   > If you skip this step, the server will fail to start with a 403 error.
			
 
				+
			
 
				+3. Create an access token:
			
 
				+   - Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
			
 
				+   - Click **New token**
			
 
				+   - Name: `church-transcription` (or anything you like)
			
 
				+   - Role: **Read**
			
 
				+   - Click **Generate token**
			
 
				+   - Copy the token — it starts with `hf_`
			
 
				+
			
 
				+4. **Save this token somewhere safe** (Notepad or a password manager). You will
			
 
				+   paste it into `start.bat` in Part 7.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Part 6 — Run install.bat
			
 
				+
			
 
				+The `install.bat` script in this folder does the following automatically:
			
 
				+- Creates a Python virtual environment in `.venv\`
			
 
				+- Installs PyTorch with CUDA support
			
 
				+- Installs WhisperLiveKit
			
 
				+- Installs the bridge script dependencies
			
 
				+
			
 
				+**Steps:**
			
 
				+
			
 
				+1. Open File Explorer and navigate to this project folder.
			
 
				+
			
 
				+2. Double-click **`install.bat`**.
			
 
				+
			
 
				+   A Command Prompt window will open. You will see packages downloading and
			
 
				+   installing. This will take **10–20 minutes** depending on your internet speed.
			
 
				+   The PyTorch download alone is ~2.5 GB.
			
 
				+
			
 
				+3. Near the end you will see the Whisper model downloading for the first time:
			
 
				+
			
 
				+   ```
			
 
				+   Downloading model large-v3 (~3 GB) ...
			
 
				+   ```
			
 
				+
			
 
				+   Wait for this to complete. The model is cached after the first download.
			
 
				+
			
 
				+4. When you see `Installation complete.` the window will pause. Press any key
			
 
				+   to close it.
			
 
				+
			
 
				+> **If install.bat fails** — see the Troubleshooting section at the bottom.
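
After installation it can be useful to confirm that PyTorch sees the GPU. The following is a hedged sketch to run inside the activated `.venv`. `cuda_status` is a hypothetical helper, and `torch.cuda.is_available()` is the only PyTorch call it uses.

```python
import importlib.util


def cuda_status():
    """Report whether PyTorch is installed and whether it can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        return "torch-missing"
    import torch  # imported lazily so the check works without PyTorch installed

    return "cuda-ok" if torch.cuda.is_available() else "cuda-unavailable"
```

A result of `cuda-unavailable` usually points to a CPU-only PyTorch build or a driver problem, while `torch-missing` suggests the script ran outside the virtual environment.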
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Part 7 — Configure start.bat
			
 
				+
			
 
				+Before running the system for the first time, you need to add your HuggingFace
			
 
				+token to the startup script.
			
 
				+
			
 
				+1. Right-click **`start.bat`** → **Edit** (opens in Notepad).
			
 
				+
			
 
				+2. Find this line near the top:
			
 
				+
			
 
				+   ```
			
 
				+   set HF_TOKEN=PASTE_YOUR_TOKEN_HERE
			
 
				+   ```
			
 
				+
			
 
				+3. Replace `PASTE_YOUR_TOKEN_HERE` with the token you copied in Part 5.
			
 
				+   Example:
			
 
				+
			
 
				+   ```
			
 
				+   set HF_TOKEN=hf_aBcDeFgHiJkLmNoPqRsTuVwXyZ
			
 
				+   ```
			
 
				+
			
 
				+4. Save the file (`Ctrl+S`).
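
Common mistakes when pasting the token are leaving the placeholder in place or adding a stray space at the end of the line, which `set` stores as part of the value. The sketch below shows the kind of sanity check that catches these. It is illustrative only: `token_problem` is a hypothetical helper, and the only format rule it assumes is the `hf_` prefix mentioned in Part 5.

```python
PLACEHOLDER = "PASTE_YOUR_TOKEN_HERE"


def token_problem(token):
    """Return a short description of what looks wrong with the token, or None."""
    if token == PLACEHOLDER:
        return "placeholder not replaced"
    if token != token.strip() or " " in token:
        return "contains whitespace"
    if not token.startswith("hf_"):
        return "does not start with hf_"
    return None
```

A result of `None` only means the token looks plausible. Whether HuggingFace accepts it is decided when the server starts.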
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Part 8 — First run
			
 
				+
			
 
				+1. Double-click **`start.bat`**.
			
 
				+
			
 
				+   Two windows will open:
			
 
				+   - **Window 1 — Whisper Server**: shows the transcription engine loading.
			
 
				+     On first run this downloads the speaker diarization model (~500 MB).
			
 
				+     Wait until you see `Server running on ws://0.0.0.0:8000`.
			
 
				+   - **Window 2 — Bridge**: the speaker name mapping window appears, and the
			
 
				+     Command Prompt behind it shows connection status.
			
 
				+
			
 
				+2. Verify the Whisper server is working:
			
 
				+   - Open a browser and go to `http://localhost:8000`
			
 
				+   - You should see a simple web interface. Speak into the microphone — text
			
 
				+     should appear.
			
 
				+
			
 
				+3. Verify the display:
			
 
				+   - With the ESP32 powered on and connected to the same WiFi, send a test
			
 
				+     message. Open a third Command Prompt and run:
			
 
				+
			
 
				+     ```
			
 
				+     mosquitto_pub -h localhost -t display/text -m "{\"lines\":[\"Test line 1\",\"Test line 2\",\"Ready\"]}"
			
 
				+     ```
			
 
				+
			
 
				+   - The e-ink display should refresh within 2 seconds showing those three lines.
			
 
				+
			
 
				+4. Full pipeline test:
			
 
				+   - Speak naturally into the microphone.
			
 
				+   - After a sentence or natural pause, text should appear on the display within
			
 
				+     3–5 seconds.
			
 
				+   - If two people take turns speaking, a `[PASTOR]` / `[READER]` label line
			
 
				+     should appear between their sections.
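
The test message in step 3 is a JSON object with a `lines` array, published to the `display/text` topic. As a rough sketch of how that payload could be built from Python without hand-escaping quotes (the helper names here are hypothetical, and only the `-h`, `-t`, and `-m` options already shown above are used):

```python
import json


def build_display_payload(lines):
    """Serialise display lines into the JSON shape used by the test message."""
    return json.dumps({"lines": list(lines)})


def mosquitto_pub_args(payload, host="localhost", topic="display/text"):
    """Build the argument list for mosquitto_pub; pass it to subprocess.run()."""
    return ["mosquitto_pub", "-h", host, "-t", topic, "-m", payload]
```

Passing the list to `subprocess.run(mosquitto_pub_args(payload), check=True)` avoids the `\"` escaping that the Command Prompt version needs.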
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Part 9 — Assigning speaker names
			
 
				+
			
 
				+The bridge window shows a **Speaker Name Mapping** panel. The system
			
 
				+automatically detects different speakers and labels them SPEAKER_00,
			
 
				+SPEAKER_01, etc.
			
 
				+
			
 
				+- The defaults (Pastor, Reader, Guest, Choir) are applied immediately when the
			
 
				+  bridge starts.
			
 
				+- If a different person is speaking than expected, type their name in the
			
 
				+  matching row and click **Apply**.
			
 
				+- Speaker labels appear on the display as a short heading line (e.g. `[PASTOR]`)
			
 
				+  whenever the speaker changes.
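
The labelling behaviour described above can be sketched in a few lines of Python. This is not the bridge's actual code. It assumes the defaults are assigned to `SPEAKER_00` through `SPEAKER_03` in the order listed, and that transcription arrives as `(speaker, text)` pairs. Both are assumptions made for illustration.

```python
# Assumed default mapping, in the order the guide lists the names.
DEFAULT_NAMES = {
    "SPEAKER_00": "Pastor",
    "SPEAKER_01": "Reader",
    "SPEAKER_02": "Guest",
    "SPEAKER_03": "Choir",
}


def with_speaker_headings(segments, names=DEFAULT_NAMES):
    """Insert a [NAME] heading line each time the speaker changes."""
    lines = []
    previous = None
    for speaker, text in segments:
        if speaker != previous:
            # Fall back to the raw diarization label if no name is mapped.
            label = names.get(speaker, speaker)
            lines.append(f"[{label.upper()}]")
            previous = speaker
        lines.append(text)
    return lines
```

Consecutive segments from the same speaker share one heading, so the display is not cluttered with repeated labels.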
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Ongoing use (every Sunday)
			
 
				+
			
 
				+1. Double-click `start.bat`.
			
 
				+2. Wait ~30 seconds for both windows to show "ready" status.
			
 
				+3. The display will show `DISPLAY READY` when the ESP32 connects.
			
 
				+4. Begin the service — transcription runs automatically.
			
 
				+5. Close both windows when done.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## Troubleshooting
			
 
				+
			
 
				+### `nvidia-smi` not found
			
 
				+The NVIDIA driver is not installed or not in PATH. Re-run the driver installer
			
 
				+and restart the PC.
			
 
				+
			
 
				+### `python --version` shows wrong version or "not found"
			
 
				+Python was not added to PATH. Re-run the Python installer, choose "Modify",
			
 
				+and tick "Add Python to environment variables".
			
 
				+
			
 
				+### install.bat fails with "torch" errors
			
 
				+PyTorch may have failed to download. Delete the `.venv` folder and run
			
 
				+`install.bat` again with a stable internet connection.
			
 
				+
			
 
				+### Whisper server fails with `401` or `403`
			
 
				+Your HuggingFace token is incorrect, or you have not accepted the model licence
			
 
				+terms. Re-check Part 5 — both model pages must have "Agree and access
			
 
				+repository" clicked while logged into the same account that generated the token.
			
 
				+
			
 
				+### Whisper server starts but no text appears
			
 
				+Check that the correct audio input device is selected:
			
 
				+- Open Windows **Sound Settings** → Input → ensure the microphone or audio
			
 
				+  interface is set as the default device.
			
 
				+- The bridge uses the Windows default input device.
			
 
				+
			
 
				+### Display does not update
			
 
				+- Check the ESP32 Serial Monitor for WiFi/MQTT connection messages.
			
 
				+- Verify `MQTT_HOST` in `main.cpp` matches the PC's IP address (`ipconfig` →
			
 
				+  look for the WiFi adapter IPv4 address).
			
 
				+- Confirm Mosquitto is running: `sc query mosquitto`
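
The `sc query mosquitto` check can also be read by a script, in the same way `start.bat` pipes it through `find "RUNNING"`. The sketch below is illustrative only. `service_running` is a hypothetical helper, and it assumes the usual `STATE : 4 RUNNING` line that `sc query` prints for a running service.

```python
def service_running(sc_output):
    """Return True if sc query output contains a STATE line reporting RUNNING."""
    return any(
        "STATE" in line and "RUNNING" in line
        for line in sc_output.splitlines()
    )
```

The text to check could come from `subprocess.run(["sc", "query", "mosquitto"], capture_output=True, text=True).stdout` on the Windows PC.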
			
 
				+
			
 
				+### `large-v3` is too slow (display lags more than 5–6 seconds)
			
 
				+Switch to a faster model by editing `start.bat`:
			
 
				+
			
 
				+```
			
 
				+set WHISPER_MODEL=distil-large-v3
			
 
				+```
			
 
				+
			
 
				+`distil-large-v3` is ~50% faster with only a small accuracy reduction.
			
--- a/install.bat
+++ b/install.bat
@@ -0,0 +1,122 @@
 
				+@echo off
			
 
				+setlocal enabledelayedexpansion
			
 
				+title Church Transcription — Installation
			
 
				+
			
 
				+echo.
			
 
				+echo ============================================================
			
 
				+echo  Church Live Transcription Display — One-time Setup
			
 
				+echo ============================================================
			
 
				+echo.
			
 
				+echo This will install all required software into a local
			
 
				+echo virtual environment (.venv). It will NOT affect other
			
 
				+echo Python programs on this computer.
			
 
				+echo.
			
 
				+echo Estimated time: 10-20 minutes (depends on internet speed).
			
 
				+echo.
			
 
				+pause
			
 
				+
			
 
				+:: ── Check Python ────────────────────────────────────────────────────────────
			
 
				+
			
 
				+echo [1/6] Checking Python version...
			
 
				+python --version >nul 2>&1
			
 
				+if errorlevel 1 (
			
 
				+    echo.
			
 
				+    echo ERROR: Python is not installed or not in PATH.
			
 
				+    echo Please install Python 3.11 from https://python.org
			
 
				+    echo Make sure you tick "Add Python to PATH" during install.
			
 
				+    echo.
			
 
				+    pause
			
 
				+    exit /b 1
			
 
				+)
			
 
				+
			
 
				+for /f "tokens=2 delims= " %%v in ('python --version 2^>^&1') do set PYVER=%%v
			
 
				+echo Found Python %PYVER%
			
 
				+
			
 
				+:: ── Create virtual environment ───────────────────────────────────────────────
			
 
				+
			
 
				+echo.
			
 
				+echo [2/6] Creating virtual environment in .venv\ ...
			
 
				+if exist .venv (
			
 
				+    echo .venv already exists — skipping creation.
			
 
				+) else (
			
 
				+    python -m venv .venv
			
 
				+    if errorlevel 1 (
			
 
				+        echo ERROR: Failed to create virtual environment.
			
 
				+        pause
			
 
				+        exit /b 1
			
 
				+    )
			
 
				+)
			
 
				+
			
 
				+call .venv\Scripts\activate.bat
			
 
				+
			
 
				+:: ── Upgrade pip ──────────────────────────────────────────────────────────────
			
 
				+
			
 
				+echo.
			
 
				+echo [3/6] Upgrading pip...
			
 
				+python -m pip install --upgrade pip --quiet
			
 
				+
			
 
				+:: ── Install PyTorch with CUDA ─────────────────────────────────────────────────
			
 
				+
			
 
				+echo.
			
 
				+echo [4/6] Installing PyTorch with CUDA support (~2.5 GB download)...
			
 
				+echo This is the longest step. Please wait.
			
 
				+echo.
			
 
				+pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
			
 
				+if errorlevel 1 (
			
 
				+    echo.
			
 
				+    echo ERROR: PyTorch installation failed.
			
 
				+    echo Check your internet connection and try again.
			
 
				+    pause
			
 
				+    exit /b 1
			
 
				+)
			
 
				+
			
 
				+:: ── Install WhisperLiveKit ────────────────────────────────────────────────────
			
 
				+
			
 
				+echo.
			
 
				+echo [5/6] Installing WhisperLiveKit and dependencies...
			
 
				+echo.
			
 
				+pip install whisperlivekit pyannote.audio
			
 
				+if errorlevel 1 (
			
 
				+    echo.
			
 
				+    echo ERROR: WhisperLiveKit installation failed.
			
 
				+    pause
			
 
				+    exit /b 1
			
 
				+)
			
 
				+
			
 
				+:: ── Install bridge dependencies ───────────────────────────────────────────────
			
 
				+
			
 
				+echo.
			
 
				+echo [6/6] Installing bridge script dependencies...
			
 
				+pip install -r bridge\requirements.txt
			
 
				+if errorlevel 1 (
			
 
				+    echo.
			
 
				+    echo ERROR: Bridge dependencies failed to install.
			
 
				+    pause
			
 
				+    exit /b 1
			
 
				+)
			
 
				+
			
 
				+:: ── Pre-download Whisper model ────────────────────────────────────────────────
			
 
				+
			
 
				+echo.
			
 
				+echo Downloading Whisper large-v3 model (~3 GB) — this only happens once.
			
 
				+echo.
			
 
				+python -c "from faster_whisper import WhisperModel; WhisperModel('large-v3', device='cuda', compute_type='float16')"
			
 
				+if errorlevel 1 (
			
 
				+    echo.
			
 
				+    echo WARNING: Model pre-download failed. It will download on first start instead.
			
 
				+    echo This is not critical — continuing.
			
 
				+)
			
 
				+
			
 
				+:: ── Done ─────────────────────────────────────────────────────────────────────
			
 
				+
			
 
				+echo.
			
 
				+echo ============================================================
			
 
				+echo  Installation complete.
			
 
				+echo ============================================================
			
 
				+echo.
			
 
				+echo Next steps:
			
 
				+echo   1. Edit start.bat and add your HuggingFace token
			
 
				+echo      (see SETUP.md Part 7 for instructions)
			
 
				+echo   2. Double-click start.bat to launch the system
			
 
				+echo.
			
 
				+pause
			
--- a/start.bat
+++ b/start.bat
@@ -0,0 +1,88 @@
 
				+@echo off
			
 
				+setlocal enabledelayedexpansion
			
 
				+title Church Transcription — Launcher
			
 
				+
			
 
				+:: ════════════════════════════════════════════════════════════════════════════
			
 
				+::  CONFIGURATION — edit these lines before first use
			
 
				+:: ════════════════════════════════════════════════════════════════════════════
			
 
				+
			
 
				+:: Your HuggingFace access token (required for speaker diarization)
			
 
				+:: Get one at https://huggingface.co/settings/tokens
			
 
				+set HF_TOKEN=PASTE_YOUR_TOKEN_HERE
			
 
				+
			
 
				+:: Whisper model to use:
			
 
				+::   large-v3          — most accurate, needs ~6 GB VRAM, ~3 s latency
			
 
				+::   distil-large-v3   — faster (~2 s latency), very slightly less accurate
			
 
				+::   medium            — fallback if VRAM is limited (~4 GB VRAM)
			
 
				+set WHISPER_MODEL=large-v3
			
 
				+
			
 
				+:: ════════════════════════════════════════════════════════════════════════════
			
 
				+
			
 
				+:: Check the token has been set
			
 
				+if "%HF_TOKEN%"=="PASTE_YOUR_TOKEN_HERE" (
			
 
				+    echo.
			
 
				+    echo ERROR: HuggingFace token not configured.
			
 
				+    echo.
			
 
				+    echo Open start.bat in Notepad and replace PASTE_YOUR_TOKEN_HERE
			
 
				+    echo with your token from https://huggingface.co/settings/tokens
			
 
				+    echo.
			
 
				+    echo See SETUP.md Part 7 for full instructions.
			
 
				+    echo.
			
 
				+    pause
			
 
				+    exit /b 1
			
 
				+)
			
 
				+
			
 
				+:: Check virtual environment exists
			
 
				+if not exist .venv\Scripts\activate.bat (
			
 
				+    echo.
			
 
				+    echo ERROR: Virtual environment not found.
			
 
				+    echo Please run install.bat first.
			
 
				+    echo.
			
 
				+    pause
			
 
				+    exit /b 1
			
 
				+)
			
 
				+
			
 
				+:: Check Mosquitto is running
			
 
				+sc query mosquitto | find "RUNNING" >nul 2>&1
			
 
				+if errorlevel 1 (
			
 
				+    echo Starting Mosquitto MQTT broker...
			
 
				+    net start mosquitto >nul 2>&1
			
 
				+    if errorlevel 1 (
			
 
				+        echo WARNING: Could not start Mosquitto. Is it installed?
			
 
				+        echo See SETUP.md Part 4.
			
 
				+        pause
			
 
				+        exit /b 1
			
 
				+    )
			
 
				+)
			
 
				+
			
 
				+echo.
			
 
				+echo ============================================================
			
 
				+echo  Church Live Transcription Display
			
 
				+echo ============================================================
			
 
				+echo.
			
 
				+echo Starting Whisper server in a new window...
			
 
				+echo Starting bridge in a new window...
			
 
				+echo.
			
 
				+echo Both windows must stay open during the service.
			
 
				+echo Close both of the other windows to shut down.
			
 
				+echo.
			
 
				+
			
 
				+:: Activate venv and launch WhisperLiveKit in its own window
			
 
				+:: cmd does not honour ^ line continuation inside a quoted string, so the
				+:: whole command is on one line. The new window inherits HF_TOKEN.
				+start "Whisper Transcription Server" cmd /k "call .venv\Scripts\activate.bat && echo Starting WhisperLiveKit (%WHISPER_MODEL%) with diarization... && wlk --model %WHISPER_MODEL% --language en --diarization --hf-token %HF_TOKEN%"
			
 
				+
			
 
				+:: Brief pause so Whisper can begin loading before the bridge connects
			
 
				+timeout /t 5 /nobreak >nul
			
 
				+
			
 
				+:: Activate venv and launch the bridge (speaker UI opens in this process)
			
 
				+:: Kept on one line: ^ continuation does not work inside a quoted string.
				+start "Transcription Bridge" cmd /k "call .venv\Scripts\activate.bat && echo Starting bridge... && python bridge\bridge.py"
			
 
				+
			
 
				+echo Both windows launched. You can minimise this window.
			
 
				+echo.
			
 
				+pause