
Setup Guide — Live Transcription Display

This guide walks through everything needed to get the system running on a Windows 11 PC from scratch. Follow each section in order.

Total setup time: approximately 30–60 minutes (most of that is download time).


Why no installer / executable?

The transcription engine (WhisperLiveKit) depends on PyTorch and CUDA — the combined download is ~4–5 GB and requires NVIDIA GPU drivers to be installed natively on the host machine regardless. Packaging everything into a single .exe is not practical for software of this type.

Instead this guide provides:

  • install.bat — run once to set everything up
  • start.bat — run each time to launch the full system

After setup, operation is a double-click.


Part 1 — System Requirements

Before starting, confirm your PC meets these requirements:

Requirement   Minimum                 Recommended
OS            Windows 10 64-bit       Windows 11
GPU           NVIDIA GTX 1060 6 GB    NVIDIA RTX 3070 or better
VRAM          6 GB                    8 GB+
RAM           16 GB                   32 GB
Storage       10 GB free              20 GB free
Internet      Required for setup      Not needed during services

The RTX 4070 Super (tested hardware) runs large-v3 in real time comfortably. The RTX 5060 Ti (production hardware) is also confirmed working.


Part 2 — NVIDIA Driver

You need an up-to-date NVIDIA driver. You will also need the CUDA Toolkit (Part 2b below) — the driver alone is not sufficient for all components.

  1. Open GeForce Experience (if installed) → Drivers → Check for updates. Alternatively, visit nvidia.com/drivers, enter your GPU model, then download and run the installer.

  2. Choose Express Installation.

  3. Restart the PC when prompted.

  4. Verify the driver is working:

    • Press Win + R, type cmd, press Enter.
    • Type nvidia-smi and press Enter.
    • You should see a table with your GPU name and driver version.

      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 595.x   Driver Version: 595.x   CUDA Version: 13.x              |
      +-----------------------------------------------------------------------------+
      | RTX 5060 Ti ...
      

If this command is not found, the driver did not install correctly.
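Once Python is installed (Part 3), the same driver check can be scripted. The following is a minimal sketch, not part of the project: the function names are hypothetical, and it assumes the nvidia-smi header keeps the "Driver Version: ... CUDA Version: ..." layout shown above.

```python
import re
import shutil
import subprocess

# Matches the header line printed by nvidia-smi (layout assumed from the sample above).
HEADER_RE = re.compile(
    r"Driver Version:\s*(?P<driver>\S+)\s+CUDA Version:\s*(?P<cuda>\S+)"
)


def parse_smi_header(text: str):
    """Return (driver_version, cuda_version) from nvidia-smi output, or None."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    return match.group("driver"), match.group("cuda")


def query_driver():
    """Run nvidia-smi if it is on PATH and return the parsed versions, else None."""
    if shutil.which("nvidia-smi") is None:
        return None  # driver missing or not on PATH
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return parse_smi_header(result.stdout)


if __name__ == "__main__":
    print(query_driver())
```

A result of None corresponds to the "command is not found" case described above.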


Part 2b — CUDA Toolkit

The NVIDIA driver alone is not enough for all GPU components. The CUDA Toolkit provides compiler tools (nvcc) and low-level libraries used by WhisperLiveKit.

nvidia-smi showing "CUDA Version: 13.x" means your driver supports up to that version — it does not mean the toolkit is installed.

  1. Go to developer.nvidia.com/cuda-downloads

  2. Select: Windows → x86_64 → 11 → exe (local)

  3. Download and run the installer. Choose Custom install and ensure CUDA Runtime and cuBLAS are ticked.

  4. Restart the PC after installation.

  5. Verify:

    nvcc --version
    

Expected output ends with something like release 13.x, V13.x.xxx (the exact version will match whatever you downloaded).

If nvcc is not found, add the toolkit's bin folder to your system PATH (e.g. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\bin) using the same method as the Mosquitto PATH fix in Part 4.
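The difference between the driver's "CUDA Version" and the installed toolkit version can be made concrete with a short comparison. This is an illustrative sketch only: the helper names are hypothetical, and the sample version strings in the usage note are made up for the example.

```python
import re


def parse_nvcc_release(nvcc_output: str):
    """Return the toolkit release (e.g. '12.4') from `nvcc --version` text, or None."""
    match = re.search(r"release\s+(\d+(?:\.\d+)*)", nvcc_output)
    return match.group(1) if match else None


def version_tuple(version: str):
    """Turn '12.4' into (12, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_supports_toolkit(driver_cuda: str, toolkit_release: str) -> bool:
    """True when the toolkit is no newer than the driver's maximum CUDA version."""
    return version_tuple(toolkit_release) <= version_tuple(driver_cuda)
```

For example, a driver reporting CUDA 13.0 in nvidia-smi can run a 12.4 toolkit, so driver_supports_toolkit("13.0", "12.4") is True. A toolkit newer than the driver's reported version calls for a driver update first.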

Triton kernel warning — after installing the CUDA Toolkit you will still see this at bridge startup:

Failed to launch Triton kernels, likely due to missing CUDA toolkit;
falling back to a slower median kernel implementation...

This message is misleading. The triton Python package does not support Windows — there is no Windows build. The fallback is expected and has no practical effect on transcription quality.


Part 3 — Python 3.12

Python 3.12 is the required version. PyTorch (the AI engine that powers WhisperLiveKit) does not yet publish pre-built packages for Python 3.14 or 3.13, so newer versions will fail at the PyTorch install step.

If you already have Python 3.13 or 3.14 installed, do not uninstall it — just install 3.12 alongside it. Windows supports multiple Python versions at the same time and install.bat will automatically pick the right one.

  1. Go to python.org/downloads and look for the latest Python 3.12.x release. Download the Windows installer (64-bit).

  2. Run the installer. On the first screen:

    • Tick "Add Python to PATH" (important — do this before clicking Install Now)
    • Click Install Now
  3. Once complete, verify in a new Command Prompt window:

    py -3.12 --version
    

Expected output: Python 3.12.x
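If several Python versions are installed, it can help to confirm which one a given interpreter actually is. The sketch below is an assumption about how such a check might look (install.bat's own selection logic is not shown in this guide); run it with py -3.12 to test the interpreter that the launcher picks.

```python
import sys

REQUIRED = (3, 12)  # major, minor version this guide calls for


def is_required_version(version_info=sys.version_info, required=REQUIRED) -> bool:
    """True when the interpreter's major.minor matches the required version."""
    return tuple(version_info[:2]) == required


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "OK" if is_required_version() else "not 3.12"
    print(f"Python {found}: {verdict}")
```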


Part 4 — Mosquitto (MQTT Broker)

Mosquitto is the message relay between the transcription bridge and the display.

  1. Download the Windows installer from mosquitto.org/download — choose the .exe installer for Windows.

  2. Run the installer, accept all defaults.

  3. Add Mosquitto to the system PATH (the installer does not do this automatically). Run Command Prompt as Administrator:

    setx /M PATH "%PATH%;C:\Program Files\mosquitto"
    

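Close and reopen the Command Prompt window after running this; PATH changes don't take effect in the current window. Note that setx truncates values longer than 1024 characters, so if your PATH is already long, add the folder through System Properties → Environment Variables instead.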

  4. Start Mosquitto as a Windows service (still as Administrator):

    net start mosquitto
    
  5. Set it to start automatically with Windows (note the space after start=, which sc expects):

    sc config mosquitto start= auto
    
  6. Verify the tools are working:

    mosquitto_sub -h localhost -t test -v
    

The command connects to the broker and then waits silently for messages. If it shows no errors, Mosquitto is working. Press Ctrl+C to stop the test.
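As an alternative to the mosquitto_sub test, a plain TCP connection attempt shows whether anything is listening on the broker port. This is a rough sketch that assumes Mosquitto's default port 1883 and an unmodified configuration; the function name is hypothetical.

```python
import socket


def broker_reachable(host: str = "localhost", port: int = 1883, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("Broker reachable:", broker_reachable())
```

This only confirms that the port accepts connections; it does not exercise the MQTT protocol itself, so the mosquitto_sub test remains the more complete check.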


Part 5 — HuggingFace Account (required for speaker diarization)

The automatic speaker detection uses a model from HuggingFace that requires accepting its licence terms. This is free — it just needs an account.

  1. Go to huggingface.co and create a free account.

  2. Accept the licence for the diarization model: while logged in, open each model page and click "Agree and access repository".

If you skip this step, the server will fail to start with a 403 error.

  3. Create an access token:

    • Go to huggingface.co/settings/tokens
    • Click New token
    • Name: church-transcription (or anything you like)
    • Role: Read
    • Click Generate token
    • Copy the token — it starts with hf_
  4. Save this token somewhere safe (Notepad or a password manager). You will paste it into start.bat in Part 7.


Part 6 — Run install.bat

The install.bat script in this folder does the following automatically:

  • Creates a Python virtual environment in .venv\
  • Installs PyTorch with CUDA support
  • Installs WhisperLiveKit
  • Installs the bridge script dependencies

Steps:

  1. Open File Explorer and navigate to this project folder.

  2. Double-click install.bat.

A Command Prompt window will open. You will see packages downloading and installing. This will take 10–20 minutes depending on your internet speed. The PyTorch download alone is ~2.5 GB.

  3. Near the end you will see the Whisper model downloading for the first time:

    Downloading model large-v3 (~3 GB) ...
    

Wait for this to complete. The model is cached after the first download.

  4. When you see "Installation complete." the window will pause. Press any key to close it.

If install.bat fails — see the Troubleshooting section at the bottom.
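After installation, one way to confirm that PyTorch can see the GPU is to run a short check with the virtual environment's interpreter (.venv\Scripts\python.exe). The script below is a sketch rather than part of the project, and the status strings are arbitrary labels chosen for this example.

```python
def cuda_status() -> str:
    """Report whether PyTorch is installed and whether it can use CUDA."""
    try:
        import torch
    except ImportError:
        return "torch-missing"  # wrong interpreter, or install.bat did not finish
    if torch.cuda.is_available():
        return "cuda-ok"
    return "cpu-only"  # PyTorch installed, but no usable GPU or driver


if __name__ == "__main__":
    status = cuda_status()
    print(status)
    if status == "cuda-ok":
        import torch
        print(torch.cuda.get_device_name(0))
```

A result of cpu-only usually points back to the driver steps in Part 2, while torch-missing suggests the script was run outside the virtual environment.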


Part 7 — Configure start.bat

Before running the system for the first time, you need to add your HuggingFace token to the startup script. The token is passed as an environment variable: start.bat sets it automatically before launching WhisperLiveKit, so pyannote can download the diarization model.

  1. Right-click start.bat → Edit (opens in Notepad).

  2. Find this line near the top:

    set HF_TOKEN=PASTE_YOUR_TOKEN_HERE
    
  3. Replace PASTE_YOUR_TOKEN_HERE with the token you copied in Part 5. Example:

    set HF_TOKEN=hf_aBcDeFgHiJkLmNoPqRsTuVwXyZ
    
  4. Save the file (Ctrl+S).
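A forgotten placeholder or a stray character in the token line is an easy mistake to make. The following sketch shows one possible way to sanity-check the file's contents; it assumes the line has exactly the set HF_TOKEN=... form shown above, and the function names are hypothetical.

```python
import re

PLACEHOLDER = "PASTE_YOUR_TOKEN_HERE"


def read_hf_token(batch_text: str):
    """Return the value of the first `set HF_TOKEN=` line, or None if absent."""
    match = re.search(
        r"^\s*set\s+HF_TOKEN=(.*)$", batch_text, re.IGNORECASE | re.MULTILINE
    )
    return match.group(1).strip() if match else None


def token_problem(batch_text: str):
    """Return a short description of what is wrong, or None if the line looks fine."""
    token = read_hf_token(batch_text)
    if token is None:
        return "no set HF_TOKEN line found"
    if token == PLACEHOLDER:
        return "placeholder not replaced"
    if not token.startswith("hf_"):
        return "token does not start with hf_"
    return None
```

To use it, read the file with open("start.bat", encoding="utf-8").read() and pass the text to token_problem. The check only looks at the format; it cannot tell whether HuggingFace will accept the token.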


Part 8 — First run

  1. Double-click start.bat.

Two Command Prompt windows will open:

  • Window 1 — Bridge: the transcription pipeline. Wait until you see Audio pipeline running.
  • Window 2 — Admin: the web server. Wait until it shows Application startup complete.
  2. Open the speaker admin page:

    • Open a browser and go to http://localhost:8001
    • You should see the Speaker Admin table.
  3. Open the display page on a tablet or spare screen:

    • On any device on the same WiFi, open http://[PC-IP]:8001/display
    • Press F11 for fullscreen. A green dot in the corner means it is connected and receiving updates.
  4. Send a test message to verify the full pipeline. Open a Command Prompt and run:

    mosquitto_pub -h localhost -t display/text -m "{\"lines\":[\"Test line 1\",\"Test line 2\",\"Ready\"]}"
    

The display page should update immediately with those three lines.

  5. Full pipeline test:

    • Speak naturally into the microphone.
    • After a sentence or natural pause, text should appear on the display within 3–5 seconds.
    • If two people take turns speaking, a [PASTOR] / [READER] label line should appear between their sections.
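The mosquitto_pub test message above needs escaped quotes because the JSON is typed directly on the command line. Building the payload in Python avoids hand-escaping. This sketch assumes only what the test command shows (topic display/text and a JSON object with a "lines" array); the helper names are hypothetical.

```python
import json

TOPIC = "display/text"  # topic used by the test command in this guide


def build_payload(lines) -> str:
    """Serialise display lines into the JSON shape used by the test message."""
    return json.dumps({"lines": list(lines)}, ensure_ascii=False)


def publish_command(lines, host: str = "localhost"):
    """Return the mosquitto_pub argument list for publishing the given lines."""
    return ["mosquitto_pub", "-h", host, "-t", TOPIC, "-m", build_payload(lines)]
```

Passing the list to subprocess.run(publish_command(["Test line 1", "Test line 2", "Ready"]), check=True) hands each argument to mosquitto_pub as-is, so no shell quoting is involved.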

Part 9 — Assigning speaker names

The speaker admin page at http://localhost:8001 shows all detected speakers. The system automatically labels them SPEAKER_00, SPEAKER_01, etc.

  • The defaults (Pastor, Reader, Guest, Choir) are loaded on first run.
  • When a new speaker appears, click their name in the table and type the correct name. Changes take effect within 5 seconds.
  • Speaker labels appear on the display as a gold heading line (e.g. PASTOR) whenever the speaker changes.
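The labelling behaviour described above can be pictured as a lookup from detected speaker IDs to display names, with a heading emitted only when the speaker changes. The sketch below illustrates that idea; it is not the bridge's actual code, and the particular ID-to-name pairing is an assumption made for the example.

```python
# Hypothetical pairing of detected IDs with names from the default list.
EXAMPLE_NAMES = {"SPEAKER_00": "Pastor", "SPEAKER_01": "Reader"}


def with_speaker_headings(segments, names):
    """Insert an upper-case heading line whenever the speaker changes.

    segments is an iterable of (speaker_id, text) pairs in spoken order.
    Unknown IDs fall back to the raw ID (e.g. SPEAKER_02).
    """
    lines = []
    previous = None
    for speaker_id, text in segments:
        if speaker_id != previous:
            lines.append(names.get(speaker_id, speaker_id).upper())
            previous = speaker_id
        lines.append(text)
    return lines
```

For instance, two consecutive segments from SPEAKER_00 followed by one from SPEAKER_01 produce a single PASTOR heading, the two lines of text, then a READER heading and its line.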

Ongoing use (every Sunday)

  1. Double-click start.bat.
  2. Wait ~30 seconds until the Bridge window shows Audio pipeline running and the Admin window shows Application startup complete.
  3. Open http://[PC-IP]:8001/display on the tablet and press F11.
  4. Begin the service — transcription runs automatically.
  5. Close both windows when done.

Troubleshooting

SyntaxError: f-string expression part cannot include a backslash

WhisperLiveKit uses f-string syntax that was introduced in Python 3.12, and your virtual environment was built with an older version (such as Python 3.11). To fix:

  1. Install Python 3.12 from python.org/downloads as described in Part 3 (3.11 can stay; they coexist).
  2. Delete the .venv folder in the project directory.
  3. Run install.bat again; it will automatically select Python 3.12.

mosquitto_sub or mosquitto_pub is not recognised

The Mosquitto installer sets up the Windows service but does not add its tools to the system PATH. Run Command Prompt as Administrator and execute:

setx /M PATH "%PATH%;C:\Program Files\mosquitto"

Close and reopen the Command Prompt, then retry the command.

nvidia-smi not found

The NVIDIA driver is not installed or not in PATH. Re-run the driver installer and restart the PC.

python --version shows wrong version or "not found"

Python was not added to PATH. Re-run the Python installer, choose "Modify", and tick "Add Python to environment variables".

install.bat fails with "torch" errors — No matching distribution found

PyTorch does not publish pre-built packages for Python 3.14 (or very new versions). Install Python 3.12 from python.org alongside your current version — they coexist safely. Then delete .venv and re-run install.bat; it will automatically select Python 3.12.

If the error occurs on Python 3.12, the PyTorch download may have failed mid-way. Delete .venv and re-run install.bat with a stable connection.

Whisper server fails with 401 or 403

Your HuggingFace token is incorrect, or you have not accepted the model licence terms. Re-check Part 5 — both model pages must have "Agree and access repository" clicked while logged into the same account that generated the token.

Bridge starts but no text appears on the display

Check that the correct audio input device is selected:

  • Open Windows Sound Settings → Input → ensure the microphone or audio interface is set as the default device.
  • Or set AUDIO_DEVICE to a specific device index in bridge/bridge.py.

Display page does not update

  • Check the green/red dot in the bottom-right corner of the display page. Red means the browser lost its connection to the admin server.
  • Confirm admin.py is running and accessible at http://[PC-IP]:8001.
  • Confirm Mosquitto is running: sc query mosquitto
  • Verify the PC's IP has not changed — tablets store the URL, so update it if the PC was assigned a new address.
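To find the address to type on the tablet, the PC's current LAN IP can be looked up from Python. This is a common sketch rather than anything the project ships: it opens a UDP socket towards a reserved documentation address (no packets are sent) and reads back the local address the system chose. Port 8001 and the /display path are taken from this guide; the function names are hypothetical.

```python
import ipaddress
import socket


def local_ip() -> str:
    """Return the IPv4 address of the interface used for outbound traffic."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket only selects a route; nothing is transmitted.
        sock.connect(("192.0.2.1", 80))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no network route available
    finally:
        sock.close()


def display_url(ip: str, port: int = 8001) -> str:
    """Build the display page URL for the given PC address."""
    ipaddress.ip_address(ip)  # raises ValueError if ip is not a valid address
    return f"http://{ip}:{port}/display"


if __name__ == "__main__":
    print(display_url(local_ip()))
```

If the printed address differs from the one saved on the tablet, update the tablet's bookmark. A result of 127.0.0.1 means the PC currently has no network connection.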

large-v3 is too slow (display lags more than 5–6 seconds)

Switch to a faster model by editing bridge/bridge.py:

engine = TranscriptionEngine(model_size="distil-large-v3", ...)

distil-large-v3 is ~50% faster with only a small accuracy reduction.