# Setup Guide — Live Transcription Display

This guide walks through everything needed to get the system running on a Windows 11 PC from scratch. Follow each section in order.

**Total setup time: approximately 30–60 minutes** (most of that is download time).

---

## Why no installer / executable?

The transcription engine (WhisperLiveKit) depends on PyTorch and CUDA. The combined download is ~4–5 GB, and the NVIDIA GPU drivers must be installed natively on the host machine regardless. Packaging everything into a single `.exe` is not practical for software of this type.

Instead, this guide provides:

- `install.bat`: run **once** to set everything up
- `start.bat`: run each time to launch the full system

After setup, operation is a double-click.

---

## Part 1 — System Requirements

Before starting, confirm your PC meets these requirements:

| Requirement | Minimum | Recommended |
| --- | --- | --- |
| OS | Windows 10 64-bit | Windows 11 |
| GPU | NVIDIA GTX 1060 6 GB | NVIDIA RTX 3070 or better |
| VRAM | 6 GB | 8 GB+ |
| RAM | 16 GB | 32 GB |
| Storage | 10 GB free | 20 GB free |
| Internet | Required for setup | Not needed during services |

> The RTX 4070 Super (tested hardware) runs `large-v3` in real time comfortably.
> The RTX 5060 Ti (production hardware) has also been confirmed working.

---

## Part 2 — NVIDIA Driver

You need an up-to-date NVIDIA driver. You will also need the CUDA Toolkit (Part 2b below), because the driver alone is not sufficient for all components.

1. Open **GeForce Experience** or the newer **NVIDIA App** (if installed) → Drivers → Check for updates.
   **Or** visit [nvidia.com/drivers](https://www.nvidia.com/drivers), enter your GPU model, then download and run the installer.
2. Choose **Express Installation**.
3. Restart the PC when prompted.
4. Verify the driver is working:
   - Press `Win + R`, type `cmd`, and press Enter.
   - Type `nvidia-smi` and press Enter.
   - You should see a table with your GPU name and driver version.
   ```text
   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 595.x            Driver Version: 595.x        CUDA Version: 13.x  |
   +-----------------------------------------------------------------------------+
   | RTX 5060 Ti ...
   ```

   If this command is not found, the driver did not install correctly.

---

## Part 2b — CUDA Toolkit

The NVIDIA driver alone is not enough for all GPU components. The CUDA Toolkit provides the compiler tools (`nvcc`) and low-level libraries used by WhisperLiveKit.

> `nvidia-smi` showing "CUDA Version: 13.x" means your *driver supports* up
> to that version. It does **not** mean the toolkit is installed.

1. Go to [developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads).
2. Select: **Windows → x86_64 → 11** (or **10**, to match your PC) **→ exe (local)**.
3. Download and run the installer. Choose **Custom install** and make sure **CUDA Runtime** and **cuBLAS** are ticked.
4. Restart the PC after installation.
5. Verify:

   ```cmd
   nvcc --version
   ```

   The output should end with something like `release 13.x, V13.x.xxx` (the exact version will match whatever you downloaded).

> If `nvcc` is not found, add the toolkit's `bin` folder to your system
> PATH (e.g. `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\bin`)
> using the same method as the Mosquitto PATH fix in Part 4.

**Triton kernel warning:** even after installing the CUDA Toolkit, you will still see this at bridge startup:

```text
Failed to launch Triton kernels, likely due to missing CUDA toolkit;
falling back to a slower median kernel implementation...
```

This message is **misleading**. The `triton` Python package does not support Windows (there is no official Windows build), so the fallback is expected and has no practical effect on transcription quality.

---

## Part 3 — Python 3.12

Python 3.12 is the required version. PyTorch (the AI engine that powers WhisperLiveKit) does not yet publish pre-built packages for Python 3.13 or 3.14, so newer versions will fail at the PyTorch install step.
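Because both older interpreters (f-string syntax errors, see Troubleshooting) and newer ones (no PyTorch packages) break the setup, the requirement is an exact version rather than a minimum. A minimal sketch of that rule as a guard a Python script could call at startup; the constant `REQUIRED` and the function `check_python_version` are hypothetical and are not known to exist in `install.bat`, the bridge or WhisperLiveKit:

```python
import sys
from typing import Optional, Sequence

# Hypothetical constant: the one interpreter version this guide supports.
REQUIRED = (3, 12)


def check_python_version(version_info: Sequence[int] = sys.version_info) -> Optional[str]:
    """Return an error message if the interpreter is not Python 3.12, else None."""
    major_minor = tuple(version_info[:2])  # e.g. (3, 12) from (3, 12, 4, 'final', 0)
    if major_minor == REQUIRED:
        return None
    found = ".".join(str(part) for part in major_minor)
    return (
        f"Python {found} detected; this setup needs Python 3.12. "
        "Install 3.12 alongside it, delete .venv and re-run install.bat."
    )
```

Calling `check_python_version()` with no arguments checks the interpreter that is currently running; passing a tuple such as `(3, 11, 9)` shows the message a `.venv` built with 3.11 would produce.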
> If you already have Python 3.13 or 3.14 installed, **do not uninstall it**.
> Just install 3.12 alongside it: Windows supports multiple Python versions
> at the same time, and `install.bat` will automatically pick the right one.

1. Go to [python.org/downloads](https://www.python.org/downloads/) and look for the latest **Python 3.12.x** release. Download the **Windows installer (64-bit)**.
2. Run the installer. On the first screen:
   - **Tick "Add Python to PATH"** (important: do this before clicking Install Now).
   - Click **Install Now**.
3. Once complete, verify in a new Command Prompt window:

   ```cmd
   py -3.12 --version
   ```

   Expected output: `Python 3.12.x`

---

## Part 4 — Mosquitto (MQTT Broker)

Mosquitto is the message relay between the transcription bridge and the display.

1. Download the Windows installer from [mosquitto.org/download](https://mosquitto.org/download/) (choose the `.exe` installer for Windows).
2. Run the installer and accept all defaults.
3. **Add Mosquitto to the system PATH** (the installer does not do this automatically). Open Command Prompt **as Administrator** and run:

   ```cmd
   setx /M PATH "%PATH%;C:\Program Files\mosquitto"
   ```

   Close and reopen the Command Prompt window afterwards; PATH changes don't take effect in the window where you ran `setx`.

   > `setx` truncates values longer than 1024 characters, and `%PATH%` here expands to the combined system and user PATH. If your PATH is already long, add the folder through **System Properties → Environment Variables** instead.

4. Start Mosquitto as a Windows service (in a Command Prompt opened **as Administrator**):

   ```cmd
   net start mosquitto
   ```

5. Set it to start automatically with Windows:

   ```cmd
   sc config mosquitto start= auto
   ```

   The space after `start=` is required by `sc`.

6. Verify the tools are working:

   ```cmd
   mosquitto_sub -h localhost -t test -v
   ```

   The command connects and then waits silently for messages. If no error appears, Mosquitto is working. Press `Ctrl+C` to stop the test.

---

## Part 5 — HuggingFace Account (required for speaker diarization)

The automatic speaker detection uses pyannote models from HuggingFace that require accepting their licence terms. This is free; it just needs an account.

1. Go to [huggingface.co](https://huggingface.co) and create a free account.
2. Accept the licence for the diarization models:
   - Visit [huggingface.co/pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1) and click **"Agree and access repository"**.
   - Also visit [huggingface.co/pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) and click **"Agree and access repository"**.

   > If you skip this step, the server will fail to start with a 403 error.

3. Create an access token:
   - Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens).
   - Click **New token**.
   - Name: `church-transcription` (or anything you like).
   - Role: **Read**.
   - Click **Generate token**.
   - Copy the token; it starts with `hf_`.
4. **Save this token somewhere safe** (Notepad or a password manager). You will paste it into `start.bat` in Part 7.

---

## Part 6 — Run install.bat

The `install.bat` script in this folder does the following automatically:

- Creates a Python virtual environment in `.venv\`
- Installs PyTorch with CUDA support
- Installs WhisperLiveKit
- Installs the bridge script dependencies

**Steps:**

1. Open File Explorer and navigate to this project folder.
2. Double-click **`install.bat`**. A Command Prompt window will open, and you will see packages downloading and installing. This takes **10–20 minutes** depending on your internet speed; the PyTorch download alone is ~2.5 GB.
3. Near the end you will see the Whisper model downloading for the first time:

   ```text
   Downloading model large-v3 (~3 GB) ...
   ```

   Wait for this to complete. The model is cached after the first download.
4. When you see `Installation complete.` the window will pause. Press any key to close it.

> **If install.bat fails**, see the Troubleshooting section at the bottom.

---

## Part 7 — Configure start.bat

Before running the system for the first time, you need to add your HuggingFace token to the startup script.
The token is passed as an **environment variable**: `start.bat` sets it before launching WhisperLiveKit so that pyannote can download the diarization models.

1. Right-click **`start.bat`** → **Edit** to open it in Notepad (on Windows 11 this may be under **Show more options**).
2. Find this line near the top:

   ```bat
   set HF_TOKEN=PASTE_YOUR_TOKEN_HERE
   ```

3. Replace `PASTE_YOUR_TOKEN_HERE` with the token you copied in Part 5. Do not add spaces around `=` or quotes around the token, because `set` treats them as part of the name or value. Example:

   ```bat
   set HF_TOKEN=hf_aBcDeFgHiJkLmNoPqRsTuVwXyZ
   ```

4. Save the file (`Ctrl+S`).

---

## Part 8 — First run

1. Double-click **`start.bat`**. Two Command Prompt windows will open:
   - **Window 1 (Bridge)**: the transcription pipeline. Wait until you see `Audio pipeline running`.
   - **Window 2 (Admin)**: the web server. Wait until it shows `Application startup complete`.
2. Open the speaker admin page:
   - Open a browser and go to `http://localhost:8001`.
   - You should see the Speaker Admin table.
3. Open the display page on a tablet or spare screen:
   - On any device on the same Wi-Fi network, open `http://[PC-IP]:8001/display`, where `[PC-IP]` is the PC's IPv4 address (shown by running `ipconfig` on the PC).
   - Press `F11` for fullscreen. A green dot in the bottom-right corner means the page is connected and receiving updates.
4. Send a test message to verify the path from Mosquitto to the display. Open a Command Prompt and run:

   ```cmd
   mosquitto_pub -h localhost -t display/text -m "{\"lines\":[\"Test line 1\",\"Test line 2\",\"Ready\"]}"
   ```

   The display page should update immediately with those three lines.
5. Full pipeline test:
   - Speak naturally into the microphone.
   - After a sentence or natural pause, text should appear on the display within 3–5 seconds.
   - If two people take turns speaking, a speaker label line (e.g. `PASTOR` or `READER`) should appear between their sections.

---

## Part 9 — Assigning speaker names

The speaker admin page at `http://localhost:8001` shows all detected speakers. The system automatically labels them `SPEAKER_00`, `SPEAKER_01`, etc.

- The defaults (Pastor, Reader, Guest, Choir) are loaded on first run.
- When a new speaker appears, click their name in the table and type the correct name.
  Changes take effect within 5 seconds.
- Speaker labels appear on the display as a gold heading line (e.g. `PASTOR`) whenever the speaker changes.

---

## Ongoing use (every Sunday)

1. Double-click `start.bat`.
2. Wait ~30 seconds, until the Bridge window shows `Audio pipeline running` and the Admin window shows `Application startup complete`.
3. Open `http://[PC-IP]:8001/display` on the tablet and press `F11`.
4. Begin the service; transcription runs automatically.
5. Close both windows when done.

---

## Troubleshooting

### `SyntaxError: f-string expression part cannot include a backslash`

WhisperLiveKit requires Python 3.12 or newer, but your virtual environment was built with an older version (e.g. Python 3.11). To fix:

1. Install Python 3.12 from python.org/downloads (older versions can stay; they coexist).
2. Delete the `.venv` folder in the project directory.
3. Run `install.bat` again. It will detect and use Python 3.12.

### `mosquitto_sub` or `mosquitto_pub` is not recognised

The Mosquitto installer sets up the Windows service but does not add its tools to the system PATH. Open Command Prompt **as Administrator** and run:

```cmd
setx /M PATH "%PATH%;C:\Program Files\mosquitto"
```

Close and reopen the Command Prompt, then retry the command.

### `nvidia-smi` not found

The NVIDIA driver is not installed or is not on the PATH. Re-run the driver installer and restart the PC.

### `python --version` shows the wrong version or "not found"

Python was not added to PATH. Re-run the Python installer, choose **Modify**, and tick **"Add Python to environment variables"**.

### install.bat fails with "torch" errors: `No matching distribution found`

PyTorch does not publish pre-built packages for the newest Python versions (such as 3.14). Install **Python 3.12** from python.org alongside your current version; they coexist safely. Then delete `.venv` and re-run `install.bat`, which will automatically select Python 3.12.

If the error occurs on Python 3.12, the PyTorch download may have failed mid-way. Delete `.venv` and re-run `install.bat` on a stable connection.
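Once `install.bat` finishes, it can be worth confirming that the `.venv` really contains a CUDA-enabled PyTorch that sees the GPU, rather than a CPU-only build. A minimal sketch, assuming you save it as a hypothetical `check_gpu.py` and run it with `.venv\Scripts\python.exe check_gpu.py`; the function `describe_gpu` is not part of this project, and the 6 GB threshold is taken from the minimum in Part 1:

```python
def describe_gpu(min_vram_gb: float = 6.0) -> dict:
    """Report whether PyTorch is installed with CUDA support and can see a GPU."""
    report = {
        "torch": None,           # PyTorch version string, or None if not importable
        "cuda_build": None,      # CUDA version PyTorch was built with (None = CPU-only)
        "cuda_available": False,
        "device": None,
        "vram_gb": None,
        "meets_minimum": False,
    }
    try:
        import torch  # present only inside the project's .venv
    except (ImportError, OSError):
        # ImportError: torch not installed; OSError: its DLLs failed to load.
        return report

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda
    if not torch.cuda.is_available():
        return report

    props = torch.cuda.get_device_properties(0)  # first GPU only
    report["cuda_available"] = True
    report["device"] = props.name
    report["vram_gb"] = round(props.total_memory / 1024**3, 1)
    report["meets_minimum"] = report["vram_gb"] >= min_vram_gb
    return report


if __name__ == "__main__":
    for key, value in describe_gpu().items():
        print(f"{key}: {value}")
```

If `torch` shows a version but `cuda_build` is `None`, pip installed the CPU-only PyTorch build; delete `.venv` and re-run `install.bat`.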
### Whisper server fails with `401` or `403`

Your HuggingFace token is incorrect, or you have not accepted the model licence terms. Re-check Part 5: **both** model pages must have "Agree and access repository" clicked while logged into the same account that generated the token.

### Bridge starts but no text appears on the display

Check that the correct audio input device is selected:

- Open Windows **Sound Settings** → Input and make sure the microphone or audio interface is set as the default device.
- Or set `AUDIO_DEVICE` to a specific device index in `bridge/bridge.py`.

### Display page does not update

- Check the green/red dot in the bottom-right corner of the display page. Red means the browser has lost its connection to the admin server.
- Confirm `admin.py` is running and reachable at `http://[PC-IP]:8001`.
- Confirm Mosquitto is running: `sc query mosquitto`
- Check that the PC's IP address has not changed. Tablets store the URL, so update it if the PC was assigned a new address.

### `large-v3` is too slow (display lags more than 5–6 seconds)

Switch to a faster model by editing `bridge/bridge.py`:

```python
# "..." stands for the existing arguments in bridge.py; leave them unchanged.
engine = TranscriptionEngine(model_size="distil-large-v3", ...)
```

`distil-large-v3` is ~50% faster with only a small accuracy reduction, but note that it only transcribes English.
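Before switching models, it can help to measure whether the current one keeps up. A common yardstick is the real-time factor (RTF): processing time divided by audio duration, where values well below 1.0 mean the model transcribes faster than the audio plays. The sketch below times any transcription callable on a clip of known length; `real_time_factor` is a hypothetical helper, and `fake_transcribe` is a stand-in you would replace with a call into your own test harness, not a WhisperLiveKit function:

```python
import time
from typing import Callable


def real_time_factor(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Time one call to `transcribe` and return processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    transcribe()  # assumption: transcribes a clip that is audio_seconds long
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


def fake_transcribe() -> str:
    # Stand-in for a real model call: pretend transcription takes 0.2 s.
    time.sleep(0.2)
    return "test"


rtf = real_time_factor(fake_transcribe, audio_seconds=10.0)
print(f"RTF = {rtf:.2f}")
```

As a rough rule, if the RTF on this PC stays close to or above 1.0 for `large-v3`, the model cannot keep up with live speech and the display will fall further behind, which is when `distil-large-v3` is worth trying.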