
Add Setup Instructions

Benjamin Harris 1 month ago
parent
commit
49e65a4568
3 changed files with 520 additions and 0 deletions
  1. SETUP.md (+310, -0)
  2. install.bat (+122, -0)
  3. start.bat (+88, -0)

+ 310 - 0
SETUP.md

@@ -0,0 +1,310 @@
+# Setup Guide — Church Live Transcription Display
+
+This guide walks through everything needed to get the system running on a
+Windows 10 or 11 PC from scratch. Follow each section in order.
+
+**Total setup time: approximately 30–60 minutes** (most of that is download time).
+
+---
+
+## Why no installer / executable?
+
+The transcription engine (WhisperLiveKit) depends on PyTorch and CUDA — the
+combined download is ~4–5 GB and requires NVIDIA GPU drivers to be installed
+natively on the host machine regardless. Packaging everything into a single
+`.exe` is not practical for software of this type.
+
+Instead this guide provides:
+- `install.bat` — run **once** to set everything up
+- `start.bat` — run each time to launch the full system
+
+After setup, operation is a double-click.
+
+---
+
+## Part 1 — System Requirements
+
+Before starting, confirm your PC meets these requirements:
+
+| Requirement | Minimum | Recommended |
+|---|---|---|
+| OS | Windows 10 64-bit | Windows 11 |
+| GPU | NVIDIA GTX 1060 6 GB | NVIDIA RTX 3070 or better |
+| VRAM | 6 GB | 8 GB+ |
+| RAM | 16 GB | 32 GB |
+| Storage | 10 GB free | 20 GB free |
+| Internet | Required for setup | Not needed during services |
+
+> The RTX 4070 Super (tested hardware) runs `large-v3` in real time comfortably.
+
+---
+
+## Part 2 — NVIDIA Driver
+
+You need an up-to-date NVIDIA driver. You do **not** need to install the CUDA
+Toolkit separately — PyTorch bundles everything it needs.
+
+1. Open **GeForce Experience** (if installed) → Drivers → Check for updates.
+
+   **Or** visit [nvidia.com/drivers](https://www.nvidia.com/drivers), enter your
+   GPU model, download and run the installer.
+
+2. Choose **Express Installation**.
+
+3. Restart the PC when prompted.
+
+4. Verify the driver is working:
+   - Press `Win + R`, type `cmd`, press Enter.
+   - Type `nvidia-smi` and press Enter.
+   - You should see a table with your GPU name and driver version.
+
+   ```
+   +-----------------------------------------------------------------------------+
+   | NVIDIA-SMI 560.x   Driver Version: 560.x   CUDA Version: 12.6              |
+   +-----------------------------------------------------------------------------+
+   | RTX 4070 Super ...
+   ```
+
+   If this command is not found, the driver did not install correctly.
+
+---
+
+## Part 3 — Python 3.11
+
+1. Go to [python.org/downloads](https://www.python.org/downloads/release/python-3119/)
+   and download **Python 3.11.x** (Windows installer, 64-bit).
+
+   > Use Python **3.11** specifically. Some ML libraries have known issues with
+   > Python 3.13 on Windows.
+
+2. Run the installer. On the first screen:
+   - **Tick "Add python.exe to PATH"** (important: do this before clicking Install Now)
+   - Click **Install Now**
+
+3. Once complete, verify in a new Command Prompt window:
+
+   ```
+   python --version
+   ```
+
+   Expected output: `Python 3.11.x`
+
+---
+
+## Part 4 — Mosquitto (MQTT Broker)
+
+Mosquitto is the message relay between the PC and the display.
+
+1. Download the Windows installer from
+   [mosquitto.org/download](https://mosquitto.org/download/) — choose the
+   `.exe` installer for Windows.
+
+2. Run the installer, accept all defaults.
+
+3. Start Mosquitto as a Windows service (run Command Prompt **as Administrator**):
+
+   ```
+   net start mosquitto
+   ```
+
+4. Set it to start automatically with Windows:
+
+   ```
+   sc config mosquitto start= auto
+   ```
+
+5. Verify it's running:
+
+   ```
+   mosquitto_sub -h localhost -t test -v
+   ```
+
+   The command waits silently for messages; if it shows no errors, Mosquitto is
+   working. Press `Ctrl+C` to stop the test.
+
+---
+
+## Part 5 — HuggingFace Account (required for speaker diarization)
+
+The automatic speaker detection uses a model from HuggingFace that requires
+accepting its licence terms. This is free — it just needs an account.
+
+1. Go to [huggingface.co](https://huggingface.co) and create a free account.
+
+2. Accept the licence for the diarization model:
+   - Visit [huggingface.co/pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
+   - Click **"Agree and access repository"**
+   - Also visit [huggingface.co/pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0)
+   - Click **"Agree and access repository"**
+
+   > If you skip this step, the server will fail to start with a 403 error.
+
+3. Create an access token:
+   - Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
+   - Click **New token**
+   - Name: `church-transcription` (or anything you like)
+   - Role: **Read**
+   - Click **Generate token**
+   - Copy the token — it starts with `hf_`
+
+4. **Save this token somewhere safe** (Notepad or a password manager). You will
+   paste it into `start.bat` in Part 7.
+
+---
+
+## Part 6 — Run install.bat
+
+The `install.bat` script in this folder does the following automatically:
+- Creates a Python virtual environment in `.venv\`
+- Installs PyTorch with CUDA support
+- Installs WhisperLiveKit
+- Installs the bridge script dependencies
+
+**Steps:**
+
+1. Open File Explorer and navigate to this project folder.
+
+2. Double-click **`install.bat`**.
+
+   A Command Prompt window will open. You will see packages downloading and
+   installing. This will take **10–20 minutes** depending on your internet speed.
+   The PyTorch download alone is ~2.5 GB.
+
+3. Near the end you will see the Whisper model downloading for the first time:
+
+   ```
+   Downloading Whisper large-v3 model (~3 GB) ...
+   ```
+
+   Wait for this to complete. The model is cached after the first download.
+
+4. When you see `Installation complete.` the window will pause. Press any key
+   to close it.
+
+> **If install.bat fails** — see the Troubleshooting section at the bottom.
+
+---
+
+## Part 7 — Configure start.bat
+
+Before running the system for the first time, you need to add your HuggingFace
+token to the startup script.
+
+1. Right-click **`start.bat`** → **Edit** (opens in Notepad).
+
+2. Find this line near the top:
+
+   ```
+   set HF_TOKEN=PASTE_YOUR_TOKEN_HERE
+   ```
+
+3. Replace `PASTE_YOUR_TOKEN_HERE` with the token you copied in Part 5.
+   Example:
+
+   ```
+   set HF_TOKEN=hf_aBcDeFgHiJkLmNoPqRsTuVwXyZ
+   ```
+
+4. Save the file (`Ctrl+S`).
+
+---
+
+## Part 8 — First run
+
+1. Double-click **`start.bat`**.
+
+   Two windows will open:
+   - **Window 1 — Whisper Server**: shows the transcription engine loading.
+     On first run this downloads the speaker diarization model (~500 MB).
+     Wait until you see `Server running on ws://0.0.0.0:8000`.
+   - **Window 2 — Bridge**: the speaker name mapping window appears, and the
+     Command Prompt behind it shows connection status.
+
+2. Verify the Whisper server is working:
+   - Open a browser and go to `http://localhost:8000`
+   - You should see a simple web interface. Speak into the microphone — text
+     should appear.
+
+3. Verify the display:
+   - With the ESP32 powered on and connected to the same WiFi, send a test
+     message. Open a third Command Prompt and run:
+
+     ```
+     mosquitto_pub -h localhost -t display/text -m "{\"lines\":[\"Test line 1\",\"Test line 2\",\"Ready\"]}"
+     ```
+
+   - The e-ink display should refresh within 2 seconds showing those three lines.
+
+4. Full pipeline test:
+   - Speak naturally into the microphone.
+   - After a sentence or natural pause, text should appear on the display within
+     3–5 seconds.
+   - If two people take turns speaking, a `[PASTOR]` / `[READER]` label line
+     should appear between their sections.
+
+---
+
+## Part 9 — Assigning speaker names
+
+The bridge window shows a **Speaker Name Mapping** panel. The system
+automatically detects different speakers and labels them SPEAKER_00,
+SPEAKER_01, etc.
+
+- The defaults (Pastor, Reader, Guest, Choir) are applied immediately when the
+  bridge starts.
+- If a different person is speaking than expected, type their name in the
+  matching row and click **Apply**.
+- Speaker labels appear on the display as a short heading line (e.g. `[PASTOR]`)
+  whenever the speaker changes.
+
+---
+
+## Ongoing use (every Sunday)
+
+1. Double-click `start.bat`.
+2. Wait ~30 seconds for both windows to show "ready" status.
+3. The display will show `DISPLAY READY` when the ESP32 connects.
+4. Begin the service — transcription runs automatically.
+5. Close both windows when done.
+
+---
+
+## Troubleshooting
+
+### `nvidia-smi` not found
+The NVIDIA driver is not installed or not in PATH. Re-run the driver installer
+and restart the PC.
+
+### `python --version` shows wrong version or "not found"
+Python was not added to PATH. Re-run the Python installer, choose "Modify",
+and tick "Add Python to environment variables".
+
+### install.bat fails with "torch" errors
+PyTorch may have failed to download. Delete the `.venv` folder and run
+`install.bat` again with a stable internet connection.
+
+### Whisper server fails with `401` or `403`
+Your HuggingFace token is incorrect, or you have not accepted the model licence
+terms. Re-check Part 5 — both model pages must have "Agree and access
+repository" clicked while logged into the same account that generated the token.
+
+### Whisper server starts but no text appears
+Check that the correct audio input device is selected:
+- Open Windows **Sound Settings** → Input → ensure the microphone or audio
+  interface is set as the default device.
+- The bridge uses the Windows default input device.
+
+### Display does not update
+- Check the ESP32 Serial Monitor for WiFi/MQTT connection messages.
+- Verify `MQTT_HOST` in `main.cpp` matches the PC's IP address (`ipconfig` →
+  look for the WiFi adapter IPv4 address).
+- Confirm Mosquitto is running: `sc query mosquitto`
+
+### `large-v3` is too slow (display lags more than 5–6 seconds)
+Switch to a faster model by editing `start.bat`:
+
+```
+set WHISPER_MODEL=distil-large-v3
+```
+
+`distil-large-v3` is ~50% faster with only a small accuracy reduction.
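
Parts 8 and 9 of SETUP.md describe two bridge behaviours: diarization IDs such as `SPEAKER_00` are mapped to names, a `[NAME]` heading line appears whenever the speaker changes, and text reaches the display as a JSON object with a `lines` array. The following is a minimal Python sketch of that logic, not the contents of `bridge.py` (which this commit does not show). The function names, the `(speaker_id, text)` segment shape, and the order in which the default names map to speaker IDs are all assumptions made for illustration.

```python
import json

# Assumed default mapping; the guide lists the names but not their order.
DEFAULT_NAMES = {
    "SPEAKER_00": "Pastor",
    "SPEAKER_01": "Reader",
    "SPEAKER_02": "Guest",
    "SPEAKER_03": "Choir",
}


def label_lines(segments, names=DEFAULT_NAMES):
    """Return display lines, inserting a [NAME] heading on each speaker change."""
    lines = []
    previous = None
    for speaker_id, text in segments:
        if speaker_id != previous:
            # Unknown IDs fall back to the raw diarization label.
            name = names.get(speaker_id, speaker_id)
            lines.append(f"[{name.upper()}]")
            previous = speaker_id
        lines.append(text)
    return lines


def build_payload(lines):
    """Serialize lines in the {"lines": [...]} shape used by the Part 8 test."""
    return json.dumps({"lines": lines}, separators=(",", ":"))


# Hypothetical sample input: two segments from one speaker, then another speaker.
segments = [
    ("SPEAKER_00", "Good morning."),
    ("SPEAKER_00", "Please be seated."),
    ("SPEAKER_01", "A reading from the Psalms."),
]
payload = build_payload(label_lines(segments))
print(payload)
```

Building the message with `json.dumps` avoids the hand-escaped quotes that the `mosquitto_pub` test command needs when typed into Command Prompt.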

+ 122 - 0
install.bat

@@ -0,0 +1,122 @@
+@echo off
+setlocal enabledelayedexpansion
+title Church Transcription - Installation
+
+echo.
+echo ============================================================
+echo  Church Live Transcription Display - One-time Setup
+echo ============================================================
+echo.
+echo This will install all required software into a local
+echo virtual environment (.venv). It will NOT affect other
+echo Python programs on this computer.
+echo.
+echo Estimated time: 10-20 minutes (depends on internet speed).
+echo.
+pause
+
+:: ── Check Python ────────────────────────────────────────────────────────────
+
+echo [1/6] Checking Python version...
+python --version >nul 2>&1
+if errorlevel 1 (
+    echo.
+    echo ERROR: Python is not installed or not in PATH.
+    echo Please install Python 3.11 from https://python.org
+    echo Make sure you tick "Add Python to PATH" during install.
+    echo.
+    pause
+    exit /b 1
+)
+
+for /f "tokens=2 delims= " %%v in ('python --version 2^>^&1') do set PYVER=%%v
+echo Found Python %PYVER%
+
+:: ── Create virtual environment ───────────────────────────────────────────────
+
+echo.
+echo [2/6] Creating virtual environment in .venv\ ...
+if exist .venv (
+    echo .venv already exists - skipping creation.
+) else (
+    python -m venv .venv
+    if errorlevel 1 (
+        echo ERROR: Failed to create virtual environment.
+        pause
+        exit /b 1
+    )
+)
+
+call .venv\Scripts\activate.bat
+
+:: ── Upgrade pip ──────────────────────────────────────────────────────────────
+
+echo.
+echo [3/6] Upgrading pip...
+python -m pip install --upgrade pip --quiet
+
+:: ── Install PyTorch with CUDA ─────────────────────────────────────────────────
+
+echo.
+echo [4/6] Installing PyTorch with CUDA support (~2.5 GB download)...
+echo This is the longest step. Please wait.
+echo.
+pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
+if errorlevel 1 (
+    echo.
+    echo ERROR: PyTorch installation failed.
+    echo Check your internet connection and try again.
+    pause
+    exit /b 1
+)
+
+:: ── Install WhisperLiveKit ────────────────────────────────────────────────────
+
+echo.
+echo [5/6] Installing WhisperLiveKit and dependencies...
+echo.
+pip install whisperlivekit pyannote.audio
+if errorlevel 1 (
+    echo.
+    echo ERROR: WhisperLiveKit installation failed.
+    pause
+    exit /b 1
+)
+
+:: ── Install bridge dependencies ───────────────────────────────────────────────
+
+echo.
+echo [6/6] Installing bridge script dependencies...
+pip install -r bridge\requirements.txt
+if errorlevel 1 (
+    echo.
+    echo ERROR: Bridge dependencies failed to install.
+    pause
+    exit /b 1
+)
+
+:: ── Pre-download Whisper model ────────────────────────────────────────────────
+
+echo.
+echo Downloading Whisper large-v3 model (~3 GB) - this only happens once.
+echo.
+python -c "from faster_whisper import WhisperModel; WhisperModel('large-v3', device='cuda', compute_type='float16')"
+if errorlevel 1 (
+    echo.
+    echo WARNING: Model pre-download failed. It will download on first start instead.
+    echo This is not critical - continuing.
+)
+
+:: ── Done ─────────────────────────────────────────────────────────────────────
+
+echo.
+echo ============================================================
+echo  Installation complete.
+echo ============================================================
+echo.
+echo Next steps:
+echo   1. Edit start.bat and add your HuggingFace token
+echo      (see SETUP.md Part 7 for instructions)
+echo   2. Double-click start.bat to launch the system
+echo.
+pause
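
Step [1/6] of `install.bat` confirms that `python` runs and prints the version it finds, but it does not stop when that version differs from the 3.11 that SETUP.md Part 3 asks for. Below is a small Python sketch of a stricter check. The function name `version_problem` is hypothetical and nothing like it is part of this commit; it only illustrates one way the version could be enforced.

```python
import sys

REQUIRED = (3, 11)  # SETUP.md Part 3 asks for Python 3.11 specifically


def version_problem(version_info=sys.version_info, required=REQUIRED):
    """Return an error message if the major.minor version is not the required one."""
    found = tuple(version_info[:2])
    if found != required:
        return (
            f"Python {required[0]}.{required[1]} is required, "
            f"but found {found[0]}.{found[1]}."
        )
    return None


# Illustrative calls with explicit version tuples instead of the live interpreter.
print(version_problem((3, 11, 9)))
print(version_problem((3, 13, 0)))
```

A script built around this could call `sys.exit(1)` whenever a message is returned, so that the existing `if errorlevel 1` pattern in the batch file would catch the failure.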

+ 88 - 0
start.bat

@@ -0,0 +1,88 @@
+@echo off
+setlocal enabledelayedexpansion
+title Church Transcription - Launcher
+
+:: ════════════════════════════════════════════════════════════════════════════
+::  CONFIGURATION — edit these lines before first use
+:: ════════════════════════════════════════════════════════════════════════════
+
+:: Your HuggingFace access token (required for speaker diarization)
+:: Get one at https://huggingface.co/settings/tokens
+set HF_TOKEN=PASTE_YOUR_TOKEN_HERE
+
+:: Whisper model to use:
+::   large-v3          — most accurate, needs ~6 GB VRAM, ~3 s latency
+::   distil-large-v3   — faster (~2 s latency), very slightly less accurate
+::   medium            — fallback if VRAM is limited (~4 GB VRAM)
+set WHISPER_MODEL=large-v3
+
+:: ════════════════════════════════════════════════════════════════════════════
+
+:: Check the token has been set
+if "%HF_TOKEN%"=="PASTE_YOUR_TOKEN_HERE" (
+    echo.
+    echo ERROR: HuggingFace token not configured.
+    echo.
+    echo Open start.bat in Notepad and replace PASTE_YOUR_TOKEN_HERE
+    echo with your token from https://huggingface.co/settings/tokens
+    echo.
+    echo See SETUP.md Part 7 for full instructions.
+    echo.
+    pause
+    exit /b 1
+)
+
+:: Check virtual environment exists
+if not exist .venv\Scripts\activate.bat (
+    echo.
+    echo ERROR: Virtual environment not found.
+    echo Please run install.bat first.
+    echo.
+    pause
+    exit /b 1
+)
+
+:: Check Mosquitto is running
+sc query mosquitto | find "RUNNING" >nul 2>&1
+if errorlevel 1 (
+    echo Starting Mosquitto MQTT broker...
+    net start mosquitto >nul 2>&1
+    if errorlevel 1 (
+        echo ERROR: Could not start Mosquitto. Is it installed?
+        echo Starting a service needs an Administrator prompt. See SETUP.md Part 4.
+        pause
+        exit /b 1
+    )
+)
+
+echo.
+echo ============================================================
+echo  Church Live Transcription Display
+echo ============================================================
+echo.
+echo Starting Whisper server in a new window...
+echo Starting bridge in a new window...
+echo.
+echo Both windows must stay open during the service.
+echo Close both of those windows to shut down.
+echo.
+
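+:: Activate venv and launch WhisperLiveKit in its own window.
+:: The new window inherits HF_TOKEN and WHISPER_MODEL from this script,
+:: so they do not need to be set again inside it.
+:: The quoted command is kept on one line because cmd treats ^ inside
+:: double quotes as a literal character, not as a line continuation.
+start "Whisper Transcription Server" cmd /k "call .venv\Scripts\activate.bat && echo Starting WhisperLiveKit (%WHISPER_MODEL%) with diarization... && wlk --model %WHISPER_MODEL% --language en --diarization --hf-token %HF_TOKEN%"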
+
+:: Brief pause so Whisper can begin loading before the bridge connects
+timeout /t 5 /nobreak >nul
+
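+:: Activate venv and launch the bridge (speaker UI opens in this process).
+:: The quoted command is kept on one line because cmd treats ^ inside
+:: double quotes as a literal character, so it cannot continue a line
+:: from within the quoted string.
+start "Transcription Bridge" cmd /k "call .venv\Scripts\activate.bat && echo Starting bridge... && python bridge\bridge.py"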
+
+echo Both windows launched. You can minimise this window.
+echo.
+pause
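
`start.bat` waits a fixed 5 seconds before launching the bridge. One possible alternative, sketched here in Python, is to poll until the Whisper server's port accepts connections. This assumes the server listens on TCP port 8000, the port shown in SETUP.md Part 8. `wait_for_port` is a hypothetical helper; `bridge.py` is not shown in this commit and may handle reconnection differently.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or the timeout elapses.

    Returns True once a connection is accepted, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A bridge could call `wait_for_port("localhost", 8000, timeout=120)` before opening its WebSocket connection, which tolerates slow model loads better than a fixed delay. An open port only shows that the server process is listening, not that the model has finished loading.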