Skip to main content

Build a self hosted speech to text server using Whisper

Whisper is a general-purpose speech recognition model introduced by OpenAi. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

Whisper github

In this article, we are going to host a local spe ech recognition server using Whishper powered by whisper

INFO: Github - pluja/whishper > Documentation

Installation

We will be using docker and docker-compose to host the server. Make sure you have both installed on your machine.

We can simply run the shell script which automatically set up all necessary configurations and dependencies.

Get the script

curl -fsSL -o get-whishper.sh https://raw.githubusercontent.com/pluja/whishper/main/get-whishper.sh

INFO: script brackdown

  • [check docker installation](# Check if docker is installed). If not, try install.
  • [Confirm the docker volume location](# Ask if user wants to get everything in the current directory or in a new directory)
  • [If .env not exist, create a default one](# check if .env exists)
  • Finally generate all the dependencies and run docker compose

Then we can access the server on http://ip-address:8082

danger

You may run into the issue below if you try to access the server from different origin because by default, it allows only local host only accessable.

cat transcription.err.log
An error occured while synchronizing the model Systran/faster-whisper-tiny from the Hugging Face Hub:
Cannot find an appropriate cached snapshot folder for the specified revision on the local disk and outgoing traffic has been disabled. To enable repo look-ups and downloads online, pass 'local_files_only=False' as input.
Trying to load the model directly from the local cache, if it exists.
/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py:1204: UserWarning: `local_dir_use_symlinks` parameter is deprecated and will be ignored. The process to download files to a local folder has been updated and do not rely on symlinks anymore. You only need to pass a destination folder as`local_dir`.
For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
warnings.warn(
An error occured while synchronizing the model Systran/faster-whisper-small from the Hugging Face Hub:
Cannot find an appropriate cached snapshot folder for the specified revision on the local disk and outgoing traffic has been disabled. To enable repo look-ups and downloads online, pass 'local_files_only=False' as input.

To fix it, we will need to change the WHISHPER_HOST in the .env file to 0.0.0.0:8082 and restart the server.

WHISHPER_HOST=your-whishper-server-ip:8082
WHISHPER_HOST=127.0.0.1:8082

And recreate the docker container.

sudo docker compose up -d
# Do not use restart, restart does not apply the docker-compose file change

Option 2

We can also set up everything by ourself

First, clone the repository and navigate into the directory using the following command

git clone https://github.com/pluja/whishper.git
cd whishper

Now, create a .env file and edit it

# Libretranslate Configuration
## Check out https://github.com/LibreTranslate/LibreTranslate#configuration-parameters for more libretranslate configuration options
LT_LOAD_ONLY=es,en,fr

# Whisper Configuration
WHISPER_MODELS=tiny,small
WHISHPER_HOST=http://0.0.0.1:8082

# Database Configuration
DB_USER=whishper
DB_PASS=whishper

You can change the port and model as per your requirement. The model can be one of tiny, base, small, medium, large.

After the setup is done, we can simply start the server using docker-compose

docker-compose up -d