This guide walks through everything needed to get the system running on a Windows 11 PC from scratch. Follow each section in order.
Total setup time: approximately 30–60 minutes (most of that is download time).
The transcription engine (WhisperLiveKit) depends on PyTorch and CUDA — the
combined download is ~4–5 GB and requires NVIDIA GPU drivers to be installed
natively on the host machine regardless. Packaging everything into a single
.exe is not practical for software of this type.
Instead this guide provides:

- `install.bat`: run once to set everything up
- `start.bat`: run each time to launch the full system

After setup, operation is a double-click.
Before starting, confirm your PC meets these requirements:
| Requirement | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 64-bit | Windows 11 |
| GPU | NVIDIA GTX 1060 6 GB | NVIDIA RTX 3070 or better |
| VRAM | 6 GB | 8 GB+ |
| RAM | 16 GB | 32 GB |
| Storage | 10 GB free | 20 GB free |
| Internet | Required for setup | Not needed during services |
The RTX 4070 Super (tested hardware) runs `large-v3` in real time comfortably.
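The storage row of the table can be checked from a script. The following is a minimal sketch using only the Python standard library; the helper names `free_gb` and `meets_storage_requirement` are illustrative and are not part of `install.bat`. It treats 1 GB as 10^9 bytes.

```python
import shutil

GB = 10 ** 9  # decimal gigabyte, matching the "GB" figures in the table


def free_gb(path="."):
    """Return the free space, in GB, on the drive that contains `path`."""
    return shutil.disk_usage(path).free / GB


def meets_storage_requirement(path=".", minimum_gb=10):
    """True if the drive holding `path` has at least `minimum_gb` free."""
    return free_gb(path) >= minimum_gb


print(f"Free space: {free_gb():.1f} GB")
print("Meets the 10 GB minimum:", meets_storage_requirement())
```

Run it from the project folder so that the drive being measured is the one the models will be downloaded to.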
You need an up-to-date NVIDIA driver. PyTorch bundles the CUDA libraries it needs, but the inference backend does not, so the CUDA Toolkit is installed separately in the next part.
Or visit nvidia.com/drivers, enter your GPU model, download and run the installer.
Choose Express Installation.
Restart the PC when prompted.
Verify the driver is working:

1. Press Win + R, type `cmd`, press Enter.
2. Type `nvidia-smi` and press Enter.

You should see a table with your GPU name and driver version.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 560.x Driver Version: 560.x CUDA Version: 12.6 |
+-----------------------------------------------------------------------------+
| RTX 4070 Super ...
If this command is not found, the driver did not install correctly.
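The same check can be scripted. This is a sketch only, assuming the header line of `nvidia-smi` keeps the `Driver Version: ... CUDA Version: ...` layout shown above; `parse_smi_header` and `query_driver` are hypothetical helper names, not part of this project.

```python
import re
import shutil
import subprocess

# Matches the header line printed by nvidia-smi (layout assumed from the sample above).
HEADER_RE = re.compile(
    r"Driver Version:\s*(?P<driver>\S+)\s+CUDA Version:\s*(?P<cuda>\S+)"
)


def parse_smi_header(text):
    """Return the driver version and highest supported CUDA version, or None."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    return {"driver": match.group("driver"), "cuda": match.group("cuda")}


def query_driver():
    """Run nvidia-smi if it is on PATH and parse its output; None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    return parse_smi_header(result.stdout)


# Demonstration on the sample header from this guide.
sample = "| NVIDIA-SMI 560.x Driver Version: 560.x CUDA Version: 12.6 |"
print(parse_smi_header(sample))
```

On the target PC, calling `query_driver()` returns `None` when the command is missing, which corresponds to the failed-install case described above.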
The NVIDIA driver alone is not enough. WhisperLiveKit uses faster-whisper
(via ctranslate2) for inference, which requires the CUDA runtime libraries to
be installed separately. Without this you will see cublas64_12.dll not found
and the server will fall back to CPU-only mode, making transcription too slow
for live use.
`nvidia-smi` showing "CUDA Version: 12.6" means your driver supports up to that version; it does not mean the toolkit is installed.
Select: Windows → x86_64 → 11 → exe (local)
Download and run the installer. Choose Custom install and ensure CUDA Runtime and cuBLAS are ticked.
Restart the PC after installation.
Verify:
nvcc --version
Expected: release 12.x, V12.x.xxx
If `nvcc` is not found, add `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.x\bin` to your system PATH (same method as the Mosquitto PATH fix in Part 4).
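Because the failure mode described above is `cublas64_12.dll not found`, it can help to confirm that the DLL is reachable through PATH. The sketch below only searches PATH directories, which is one of several places Windows looks for DLLs, so a miss here is a hint rather than proof. `find_on_path` is an illustrative helper name.

```python
import os
from pathlib import Path


def find_on_path(filename, path_value=None):
    """Return the first PATH directory that contains `filename`, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for entry in path_value.split(os.pathsep):
        # Skip empty entries produced by stray separators.
        if entry and (Path(entry) / filename).is_file():
            return Path(entry)
    return None


location = find_on_path("cublas64_12.dll")
print(location if location else "cublas64_12.dll is not in any PATH directory")
```

Open a new Command Prompt before running it, so that any PATH changes made by the installer are visible to the process.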
Python 3.12 is the required version. PyTorch (the AI engine that powers WhisperLiveKit) does not yet publish pre-built packages for Python 3.14 or 3.13, so newer versions will fail at the PyTorch install step.
If you already have Python 3.13 or 3.14 installed, do not uninstall it; just install 3.12 alongside it. Windows supports multiple Python versions at the same time, and `install.bat` will automatically pick the right one.
Go to python.org/downloads and look for the latest Python 3.12.x release. Download the Windows installer (64-bit).
Run the installer. On the first screen:
Once complete, verify in a new Command Prompt window:
py -3.12 --version
Expected output: Python 3.12.x
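If you want a script to confirm which interpreter it is running under, the standard `sys.version_info` tuple is enough. This is a small sketch; `is_required_version` is an illustrative name, and the check is deliberately strict about 3.12 because that is the version this guide requires.

```python
import sys

REQUIRED = (3, 12)  # major, minor version required by this guide


def is_required_version(version_info=sys.version_info):
    """True if the major and minor version match REQUIRED exactly."""
    return tuple(version_info[:2]) == REQUIRED


print("Running Python", ".".join(str(part) for part in sys.version_info[:3]))
print("Matches 3.12:", is_required_version())
```

Saved as a file, it can be run with `py -3.12` followed by the file name to confirm that the launcher selects the intended version.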
Mosquitto is the message relay between the PC and the display.
Download the Windows installer from
mosquitto.org/download — choose the
.exe installer for Windows.
Run the installer, accept all defaults.
Add Mosquitto to the system PATH (the installer does not do this automatically). Run Command Prompt as Administrator:
setx /M PATH "%PATH%;C:\Program Files\mosquitto"
Close and reopen the Command Prompt window after running this — PATH changes don't take effect in the current window.
Start Mosquitto as a Windows service (still as Administrator):
net start mosquitto
Set it to start automatically with Windows:
sc config mosquitto start=auto
Verify the tools are working:
mosquitto_sub -h localhost -t test -v
The command waits silently for messages. If it shows no errors, Mosquitto is
working. Press Ctrl+C to stop the test.
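A quick way to confirm the broker is accepting connections, without the Mosquitto command-line tools, is to open a TCP connection to it. The sketch below assumes the broker uses the standard MQTT port 1883; `broker_is_listening` is an illustrative name. It only tests that the port is open and does not speak MQTT.

```python
import socket


def broker_is_listening(host="localhost", port=1883, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


print("Mosquitto reachable on port 1883:", broker_is_listening())
```

A result of `False` right after installation usually points back to the `net start mosquitto` step above.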
The automatic speaker detection uses a model from HuggingFace that requires accepting its licence terms. This is free — it just needs an account.
Go to huggingface.co and create a free account.
Accept the licence for the diarization model:
If you skip this step, the server will fail to start with a 403 error.
Create an access token:

- Name: church-transcription (or anything you like)
- The generated token starts with `hf_`
Save this token somewhere safe (Notepad or a password manager). You will
paste it into start.bat in Part 7.
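The install.bat script in this folder does the following automatically:

- Creates a Python virtual environment in `.venv\`
- Downloads and installs the Python packages, including PyTorch
- Downloads the Whisper model

Steps: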
Open File Explorer and navigate to this project folder.
Double-click install.bat.
A Command Prompt window will open. You will see packages downloading and installing. This will take 10–20 minutes depending on your internet speed. The PyTorch download alone is ~2.5 GB.
Near the end you will see the Whisper model downloading for the first time:
Downloading model large-v3 (~3 GB) ...
Wait for this to complete. The model is cached after the first download.
When you see `Installation complete.`, the window will pause. Press any key
to close it.

If `install.bat` fails, see the Troubleshooting section at the bottom.
Before running the system for the first time, you need to add your HuggingFace
token to the startup script. The token is passed as an environment variable
— start.bat sets it automatically before launching WhisperLiveKit, so
pyannote can download the diarization model.
Right-click start.bat → Edit (opens in Notepad).
Find this line near the top:
set HF_TOKEN=PASTE_YOUR_TOKEN_HERE
Replace PASTE_YOUR_TOKEN_HERE with the token you copied in Part 5.
Example:
set HF_TOKEN=hf_aBcDeFgHiJkLmNoPqRsTuVwXyZ
Save the file (Ctrl+S).
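A forgotten placeholder is an easy mistake to make here, so the edit can be checked with a few lines of Python. This is a sketch that assumes the `set HF_TOKEN=` line format shown above; `check_token_line` and its return labels are illustrative. It inspects the text only and does not contact HuggingFace.

```python
import re

PLACEHOLDER = "PASTE_YOUR_TOKEN_HERE"
# Finds a line of the form: set HF_TOKEN=<value>
TOKEN_LINE = re.compile(r"^\s*set\s+HF_TOKEN=(.*)$", re.IGNORECASE | re.MULTILINE)


def check_token_line(bat_text):
    """Classify the HF_TOKEN line as 'missing', 'placeholder', 'unexpected-format' or 'ok'."""
    match = TOKEN_LINE.search(bat_text)
    if match is None:
        return "missing"
    value = match.group(1).strip()  # strip() also removes a trailing carriage return
    if value in ("", PLACEHOLDER):
        return "placeholder"
    if not value.startswith("hf_"):
        return "unexpected-format"
    return "ok"


print(check_token_line("set HF_TOKEN=PASTE_YOUR_TOKEN_HERE"))
```

To check the real file, pass it the contents of `start.bat`, for example `check_token_line(open("start.bat").read())` from the project folder.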
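Double-click `start.bat`. Two windows will open:

- The Whisper server window. Wait until it shows `Server running on ws://0.0.0.0:8000`.
- The bridge window.

Verify the Whisper server is working:

- Open http://localhost:8000 in a browser.

Verify the display: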
With the ESP32 powered on and connected to the same WiFi, send a test message. Open a third Command Prompt and run:
mosquitto_pub -h localhost -t display/text -m "{\"lines\":[\"Test line 1\",\"Test line 2\",\"Ready\"]}"
The e-ink display should refresh within 2 seconds showing those three lines.
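The escaped quotes in the command above are easy to get wrong by hand. As a sketch, the same payload can be built with Python's `json` module and handed to `mosquitto_pub` as an argument list, which avoids shell quoting altogether. The payload shape (`{"lines": [...]}`) and the topic `display/text` are taken from the command above; `build_payload` and `publish` are illustrative helper names.

```python
import json
import subprocess


def build_payload(lines):
    """Serialise display lines into the JSON shape used in the test command."""
    return json.dumps({"lines": list(lines)})


def publish(lines, host="localhost", topic="display/text"):
    """Send the payload with mosquitto_pub (raises FileNotFoundError if it is not on PATH)."""
    payload = build_payload(lines)
    # Passing a list means no manual escaping of the quotes is needed.
    return subprocess.run(
        ["mosquitto_pub", "-h", host, "-t", topic, "-m", payload],
        check=False,
    )


print(build_payload(["Test line 1", "Test line 2", "Ready"]))
```

Calling `publish(["Test line 1", "Test line 2", "Ready"])` on the PC sends the same three-line test message as the command above.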
Full pipeline test:

- When the speaker changes, a `[PASTOR]` / `[READER]` label line should appear between the speakers' sections.

The bridge window shows a Speaker Name Mapping panel. The system automatically detects different speakers and labels them SPEAKER_00, SPEAKER_01, etc.
The display then shows the mapped label (e.g. `[PASTOR]`) whenever the speaker changes.

To run the system on later occasions, double-click `start.bat`. Wait for `DISPLAY READY`, which appears when the ESP32 connects.

Troubleshooting

`SyntaxError: f-string expression part cannot include a backslash`

WhisperLiveKit requires Python 3.12+. Your virtual environment was built with Python 3.11. To fix:

1. Delete the `.venv` folder in the project directory.
2. Run `install.bat` again; it will detect and use the newest compatible version.

`mosquitto_sub` or `mosquitto_pub` is not recognised

The Mosquitto installer sets up the Windows service but does not add its tools to the system PATH. Run Command Prompt as Administrator and execute:
setx /M PATH "%PATH%;C:\Program Files\mosquitto"
Close and reopen the Command Prompt, then retry the command.
`nvidia-smi` not found

The NVIDIA driver is not installed or not in PATH. Re-run the driver installer and restart the PC.
`python --version` shows wrong version or "not found"

Python was not added to PATH. Re-run the Python installer, choose "Modify", and tick "Add Python to environment variables".
`No matching distribution found`

PyTorch does not publish pre-built packages for Python 3.14 (or very new
versions). Install Python 3.12 from python.org alongside your current
version; they coexist safely. Then delete `.venv` and re-run `install.bat`;
it will automatically select Python 3.12.
If the error occurs on Python 3.12, the PyTorch download may have failed
mid-way. Delete .venv and re-run install.bat with a stable connection.
401 or 403

Your HuggingFace token is incorrect, or you have not accepted the model licence terms. Re-check Part 5: both model pages must have "Agree and access repository" clicked while logged into the same account that generated the token.
Check that the correct audio input device is selected:

Check that `MQTT_HOST` in `main.cpp` matches the PC's IP address (run `ipconfig` and look for the WiFi adapter IPv4 address).

Confirm the Mosquitto service is running with `sc query mosquitto`.

`large-v3` is too slow (display lags more than 5–6 seconds)

Switch to a faster model by editing `start.bat`:
set WHISPER_MODEL=distil-large-v3
`distil-large-v3` is ~50% faster with only a small accuracy reduction.