# Setup Guide — Live Transcription Display

This guide walks through everything needed to get the system running on a Windows 11 PC from scratch. Follow each section in order.

**Total setup time: approximately 30–60 minutes** (most of that is download time).

---

## Why no installer / executable?

The transcription engine (WhisperLiveKit) depends on PyTorch and CUDA. The combined download is ~4–5 GB, and the NVIDIA GPU drivers must be installed natively on the host machine regardless. Packaging everything into a single `.exe` is not practical for software of this type.

Instead this guide provides:

- `install.bat`: run **once** to set everything up
- `start.bat`: run each time to launch the full system

After setup, operation is a double-click.

---

## Part 1 — System Requirements

Before starting, confirm your PC meets these requirements:

| Requirement | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 64-bit | Windows 11 |
| GPU | NVIDIA GTX 1060 6 GB | NVIDIA RTX 3070 or better |
| VRAM | 6 GB | 8 GB+ |
| RAM | 16 GB | 32 GB |
| Storage | 10 GB free | 20 GB free |
| Internet | Required for setup | Not needed during services |

> The RTX 4070 Super (tested hardware) runs `large-v3` in real time comfortably.

---

## Part 2 — NVIDIA Driver

You need an up-to-date NVIDIA driver. PyTorch bundles the CUDA libraries it needs, but the transcription backend does not, so the CUDA Toolkit is installed separately in Part 2b.

1. Open **GeForce Experience** (if installed) → Drivers → Check for updates.
   **Or** visit [nvidia.com/drivers](https://www.nvidia.com/drivers), enter your GPU model, download and run the installer.
2. Choose **Express Installation**.
3. Restart the PC when prompted.
4. Verify the driver is working:
   - Press `Win + R`, type `cmd`, press Enter.
   - Type `nvidia-smi` and press Enter.
   - You should see a table with your GPU name and driver version.
   ```
   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 560.x       Driver Version: 560.x       CUDA Version: 12.6       |
   +-----------------------------------------------------------------------------+
   | RTX 4070 Super ...
   ```

   If this command is not found, the driver did not install correctly.

---

## Part 2b — CUDA Toolkit 12.x

The NVIDIA driver alone is not enough. WhisperLiveKit uses **faster-whisper** (via ctranslate2) for inference, which requires the CUDA runtime libraries to be installed separately. Without them you will see `cublas64_12.dll not found` and the server will fall back to CPU-only mode, making transcription too slow for live use.

> `nvidia-smi` showing "CUDA Version: 12.6" means your *driver supports* up
> to that version. It does **not** mean the toolkit is installed.

1. Go to [developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)
2. Select: **Windows → x86_64 → 11 → exe (local)**
3. Download and run the installer. Choose **Custom install** and ensure **CUDA Runtime** and **cuBLAS** are ticked.
4. Restart the PC after installation.
5. Verify:
   ```
   nvcc --version
   ```
   Expected: `release 12.x, V12.x.xxx`

> If `nvcc` is not found, add `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.x\bin`
> to your system PATH (same method as the Mosquitto PATH fix in Part 4).

---

## Part 3 — Python 3.12

Python 3.12 is the required version. PyTorch (the AI engine that powers WhisperLiveKit) does not yet publish pre-built packages for Python 3.13 or 3.14, so newer versions will fail at the PyTorch install step.

> If you already have Python 3.13 or 3.14 installed, **do not uninstall it**;
> just install 3.12 alongside it. Windows supports multiple Python versions
> at the same time and `install.bat` will automatically pick the right one.

1. Go to [python.org/downloads](https://www.python.org/downloads/) and look for the latest **Python 3.12.x** release. Download the **Windows installer (64-bit)**.
2. Run the installer. On the first screen:
   - **Tick "Add Python to PATH"** (important: do this before clicking Install Now)
   - Click **Install Now**
3. Once complete, verify in a new Command Prompt window:
   ```
   py -3.12 --version
   ```
   Expected output: `Python 3.12.x`

---

## Part 4 — Mosquitto (MQTT Broker)

Mosquitto is the message relay between the PC and the display.

1. Download the Windows installer from [mosquitto.org/download](https://mosquitto.org/download/) and choose the `.exe` installer for Windows.
2. Run the installer, accept all defaults.
3. **Add Mosquitto to the system PATH** (the installer does not do this automatically). Run Command Prompt **as Administrator**:
   ```
   setx /M PATH "%PATH%;C:\Program Files\mosquitto"
   ```
   Close and reopen the Command Prompt window after running this. PATH changes don't take effect in the current window.

   > `setx` truncates values longer than 1024 characters. If your PATH is already long,
   > add the folder through **System Properties → Environment Variables** instead.
4. Start Mosquitto as a Windows service (still as Administrator):
   ```
   net start mosquitto
   ```
5. Set it to start automatically with Windows (the space after `start=` is required on older Windows builds):
   ```
   sc config mosquitto start= auto
   ```
6. Verify the tools are working:
   ```
   mosquitto_sub -h localhost -t test -v
   ```
   The command stays running and waits for messages. If it shows no errors, Mosquitto is working. Press `Ctrl+C` to stop the test.

---

## Part 5 — HuggingFace Account (required for speaker diarization)

The automatic speaker detection uses a model from HuggingFace that requires accepting its licence terms. This is free; it just needs an account.

1. Go to [huggingface.co](https://huggingface.co) and create a free account.
2. Accept the licence for the diarization model:
   - Visit [huggingface.co/pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
   - Click **"Agree and access repository"**
   - Also visit [huggingface.co/pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0)
   - Click **"Agree and access repository"**

   > If you skip this step, the server will fail to start with a 403 error.
3. Create an access token:
   - Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
   - Click **New token**
   - Name: `church-transcription` (or anything you like)
   - Role: **Read**
   - Click **Generate token**
   - Copy the token (it starts with `hf_`)
4. **Save this token somewhere safe** (Notepad or a password manager). You will paste it into `start.bat` in Part 7.

---

## Part 6 — Run install.bat

The `install.bat` script in this folder does the following automatically:

- Creates a Python virtual environment in `.venv\`
- Installs PyTorch with CUDA support
- Installs WhisperLiveKit
- Installs the bridge script dependencies

**Steps:**

1. Open File Explorer and navigate to this project folder.
2. Double-click **`install.bat`**. A Command Prompt window will open and you will see packages downloading and installing. This takes **10–20 minutes** depending on your internet speed. The PyTorch download alone is ~2.5 GB.
3. Near the end you will see the Whisper model downloading for the first time:
   ```
   Downloading model large-v3 (~3 GB) ...
   ```
   Wait for this to complete. The model is cached after the first download.
4. When you see `Installation complete.` the window will pause. Press any key to close it.

> **If install.bat fails**, see the Troubleshooting section at the bottom.

---

## Part 7 — Configure start.bat

Before running the system for the first time, you need to add your HuggingFace token to the startup script.

The token is passed as an **environment variable**: `start.bat` sets it automatically before launching WhisperLiveKit, so pyannote can download the diarization model.

1. Right-click **`start.bat`** → **Edit** (opens in Notepad).
2. Find this line near the top:
   ```bat
   set HF_TOKEN=PASTE_YOUR_TOKEN_HERE
   ```
3. Replace `PASTE_YOUR_TOKEN_HERE` with the token you copied in Part 5. Example:
   ```bat
   set HF_TOKEN=hf_aBcDeFgHiJkLmNoPqRsTuVwXyZ
   ```
4. Save the file (`Ctrl+S`).

---

## Part 8 — First run

1. Double-click **`start.bat`**.
   Two windows will open:
   - **Window 1 — Whisper Server**: shows the transcription engine loading. On first run this downloads the speaker diarization model (~500 MB). Wait until you see `Server running on ws://0.0.0.0:8000`.
   - **Window 2 — Bridge**: the speaker name mapping window appears, and the Command Prompt behind it shows connection status.
2. Verify the Whisper server is working:
   - Open a browser and go to `http://localhost:8000`
   - You should see a simple web interface. Speak into the microphone and text should appear.
3. Verify the display:
   - With the ESP32 powered on and connected to the same WiFi, send a test message. Open a third Command Prompt and run:
     ```
     mosquitto_pub -h localhost -t display/text -m "{\"lines\":[\"Test line 1\",\"Test line 2\",\"Ready\"]}"
     ```
   - The e-ink display should refresh within 2 seconds showing those three lines.
4. Full pipeline test:
   - Speak naturally into the microphone.
   - After a sentence or natural pause, text should appear on the display within 3–5 seconds.
   - If two people take turns speaking, a `[PASTOR]` / `[READER]` label line should appear between their sections.

---

## Part 9 — Assigning speaker names

The bridge window shows a **Speaker Name Mapping** panel. The system automatically detects different speakers and labels them SPEAKER_00, SPEAKER_01, etc.

- The defaults (Pastor, Reader, Guest, Choir) are applied immediately when the bridge starts.
- If a different person is speaking than expected, type their name in the matching row and click **Apply**.
- Speaker labels appear on the display as a short heading line (e.g. `[PASTOR]`) whenever the speaker changes.

---

## Ongoing use (every Sunday)

1. Double-click `start.bat`.
2. Wait ~30 seconds for both windows to show "ready" status.
3. The display will show `DISPLAY READY` when the ESP32 connects.
4. Begin the service. Transcription runs automatically.
5. Close both windows when done.
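
The wait in step 2 can also be checked from a script instead of by eye. The following is a minimal sketch and not part of the project: it assumes Mosquitto listens on its default port 1883 and that the Whisper server uses port 8000 as shown in Part 8. The names `SERVICES`, `port_open` and `report` are hypothetical.

```python
import socket

# Assumed endpoints: 1883 is Mosquitto's default port; 8000 is the
# Whisper server port shown in Part 8. Adjust if your setup differs.
SERVICES = {
    "Mosquitto (MQTT)": ("localhost", 1883),
    "Whisper server": ("localhost", 8000),
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(services: dict) -> list:
    """Build one status line per service."""
    lines = []
    for name, (host, port) in services.items():
        state = "ready" if port_open(host, port) else "NOT reachable"
        lines.append(f"{name} on {host}:{port}: {state}")
    return lines


if __name__ == "__main__":
    print("\n".join(report(SERVICES)))
```

Run it with the virtual environment's Python (or any Python 3) after `start.bat` has launched. A successful TCP connection only shows that something is listening on the port; it does not confirm that transcription itself is working.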

---

## Troubleshooting

### `SyntaxError: f-string expression part cannot include a backslash`

WhisperLiveKit requires Python 3.12. Your virtual environment was built with Python 3.11 or earlier. To fix:

1. Install Python 3.12 from python.org/downloads (older versions can stay; they coexist).
2. Delete the `.venv` folder in the project directory.
3. Run `install.bat` again. It will detect and use the newest compatible version.

### `mosquitto_sub` or `mosquitto_pub` is not recognised

The Mosquitto installer sets up the Windows service but does not add its tools to the system PATH. Run Command Prompt **as Administrator** and execute:

```bat
setx /M PATH "%PATH%;C:\Program Files\mosquitto"
```

Close and reopen the Command Prompt, then retry the command.

### `nvidia-smi` not found

The NVIDIA driver is not installed or not in PATH. Re-run the driver installer and restart the PC.

### `python --version` shows wrong version or "not found"

Python was not added to PATH. Re-run the Python installer, choose "Modify", and tick "Add Python to environment variables".

### install.bat fails with "torch" errors — `No matching distribution found`

PyTorch does not publish pre-built packages for Python 3.14 (or very new versions). Install **Python 3.12** from python.org alongside your current version; they coexist safely. Then delete `.venv` and re-run `install.bat`. It will automatically select Python 3.12.

If the error occurs on Python 3.12, the PyTorch download may have failed mid-way. Delete `.venv` and re-run `install.bat` with a stable connection.

### Whisper server fails with `401` or `403`

Your HuggingFace token is incorrect, or you have not accepted the model licence terms. Re-check Part 5: both model pages must have "Agree and access repository" clicked while logged into the same account that generated the token.
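
Before re-checking the licence pages, it can help to rule out a simple paste mistake in `start.bat`. The following is a minimal sketch, assuming the file contains the unquoted `set HF_TOKEN=...` line shown in Part 7. `check_token_line` is a hypothetical helper, not part of the project. It only inspects the shape of the token locally and does not contact HuggingFace, so a token that "looks plausible" can still be rejected by the server.

```python
import re
from pathlib import Path

PLACEHOLDER = "PASTE_YOUR_TOKEN_HERE"  # placeholder text from Part 7


def check_token_line(bat_text: str) -> str:
    """Return a short diagnosis of the HF_TOKEN line in start.bat text."""
    match = re.search(r"^\s*set\s+HF_TOKEN=(.*)$", bat_text,
                      flags=re.IGNORECASE | re.MULTILINE)
    if match is None:
        return "no HF_TOKEN line found"
    # Drop the carriage return left over from Windows line endings.
    raw = match.group(1).rstrip("\r")
    # In a batch file, spaces around the value become part of the variable.
    if raw != raw.strip():
        return "token has leading or trailing spaces"
    if raw == PLACEHOLDER or raw == "":
        return "token not set"
    if not raw.startswith("hf_"):
        return "token does not start with hf_"
    return "token looks plausible"


if __name__ == "__main__":
    bat = Path("start.bat")  # assumed to be in the current folder
    if bat.exists():
        text = bat.read_text(encoding="utf-8", errors="replace")
        print(check_token_line(text))
    else:
        print("start.bat not found in this folder")
```

Run it from the project folder. If it reports that the token looks plausible and the server still returns `401` or `403`, the licence acceptance in Part 5 is the more likely cause.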

### Whisper server starts but no text appears

Check that the correct audio input device is selected:

- Open Windows **Sound Settings** → Input → ensure the microphone or audio interface is set as the default device.
- The bridge uses the Windows default input device.

### Display does not update

- Check the ESP32 Serial Monitor for WiFi/MQTT connection messages.
- Verify `MQTT_HOST` in `main.cpp` matches the PC's IP address (`ipconfig` → look for the WiFi adapter IPv4 address).
- Confirm Mosquitto is running: `sc query mosquitto`

### `large-v3` is too slow (display lags more than 5–6 seconds)

Switch to a faster model by editing `start.bat`:

```
set WHISPER_MODEL=distil-large-v3
```

`distil-large-v3` is ~50% faster with only a small accuracy reduction.