Transcribe and Translate Videos for Free Using Local AI

Learn how to transcribe and translate any video or audio file for free using a local AI setup with Whisper: fast, private, and easy to install.

Editorial Staff | June 9, 2026 | 3 min read

In this walkthrough, you’ll learn how to process video or audio files locally to generate transcripts and translations. Skip unreliable online services and produce accurate results directly from your terminal in just a few minutes.

Who This Is For

  • Editors working with video or podcast content
  • Users who prioritize privacy when handling recordings
  • Anyone managing large volumes of media for transcription or translation

What You’ll Create

A local setup that allows you to transcribe any media file using a single command. You’ll also be able to automatically convert non-English speech into English. After setup, the system is reusable with no ongoing cost.

What You Need

  • A Mac or Windows machine
  • Python 3 installed (pre-installed on most Macs)
  • Homebrew (Mac) or Chocolatey (Windows) to install ffmpeg
  • A media file to process
  • Around five minutes for setup

Step 1: Install ffmpeg and Whisper

Launch your terminal. On Mac, press Cmd + Space, search “Terminal,” and open it. Start by installing ffmpeg, which allows Whisper to read media files.

Mac:
brew install ffmpeg

If Homebrew isn’t installed, install it first. The official instructions are at brew.sh, or you can ask an AI assistant to walk you through it.

Windows:
choco install ffmpeg

Next, confirm Python is installed:

python3 --version

If a version number appears, you’re set. On Windows, the command may be python or py instead of python3. Otherwise, install Python 3.

Now install Whisper:

pip3 install -U openai-whisper

That’s all. These commands only need to be run once.

Tip: If pip3 doesn’t work, try: python3 -m pip install -U openai-whisper.
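
To confirm the setup before moving on, a short Python check can look for both tools. This is a minimal sketch and not part of Whisper itself: the name check_setup is made up for illustration, and it only tests that ffmpeg is on your PATH and that the current Python can find the whisper module.

```python
import importlib.util
import shutil


def check_setup():
    """Report whether ffmpeg and the whisper package are visible to this Python."""
    return {
        # shutil.which returns the executable's path, or None if it is not on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec returns None when the module cannot be found
        "whisper": importlib.util.find_spec("whisper") is not None,
    }


for tool, found in check_setup().items():
    print(f"{tool}: {'found' if found else 'missing'}")
```

If either line says missing, rerun the matching install command from this step.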

Step 2: Transcribe a File

Choose a video or audio file and copy its file path.

Run the following command:

python3 -m whisper "[your file path]" --model base

Example:

python3 -m whisper "/Users/you/Downloads/video.mp4" --model base

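Whisper will automatically detect the language and begin generating a transcript with timestamps. The first run also downloads the selected model, which adds a short one-time wait. After that, a short video typically takes just a couple of minutes, depending on your hardware.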
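
The same transcription can also be driven from Python instead of the terminal. The sketch below assumes the openai-whisper package from Step 1 is installed. load_model and transcribe come from that package, while transcribe_file and format_segments are helper names invented here. The returned dictionary includes the detected language and a list of timestamped segments.

```python
def transcribe_file(media_path, model_name="base"):
    """Run Whisper on one file and return its result dictionary."""
    import whisper  # imported here so format_segments works without the package

    model = whisper.load_model(model_name)  # downloads the model on first use
    return model.transcribe(media_path)


def format_segments(result):
    """Turn Whisper's segments into '[start -> end] text' lines."""
    lines = []
    for seg in result["segments"]:
        lines.append(f"[{seg['start']:.1f}s -> {seg['end']:.1f}s] {seg['text'].strip()}")
    return lines
```

For example, result = transcribe_file("/Users/you/Downloads/video.mp4") followed by print(result["language"]) would show the language Whisper detected, and format_segments(result) gives the timestamped lines.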

Once complete, several files will appear in the folder where you ran the command:

  • .txt for plain text
  • .srt for subtitles (ideal for video editors)
  • .vtt, .tsv, and .json for other formats

To control output location and format:

python3 -m whisper "[your file]" --model base --output_dir "/Users/you/Downloads" --output_format txt

The base model balances speed and accuracy well. Larger models (small, medium, large) are more accurate but slower and need more memory; pass one with --model if needed.
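
If you call Whisper from a script, it can help to assemble the command as a list of arguments, which avoids quoting problems with file paths that contain spaces. This is a sketch under the assumption that Whisper is installed for the same Python that runs the script. build_whisper_command and run_whisper are illustrative names, and only the flags shown in this step are used.

```python
import subprocess
import sys


def build_whisper_command(media_path, model="base", output_dir=None, output_format=None):
    """Assemble the argument list for `python -m whisper`."""
    cmd = [sys.executable, "-m", "whisper", media_path, "--model", model]
    if output_dir is not None:
        cmd += ["--output_dir", output_dir]
    if output_format is not None:
        cmd += ["--output_format", output_format]
    return cmd


def run_whisper(media_path, **options):
    """Run the command and raise CalledProcessError if Whisper exits with an error."""
    subprocess.run(build_whisper_command(media_path, **options), check=True)
```

Calling run_whisper("/Users/you/Downloads/video.mp4", output_dir="/Users/you/Downloads", output_format="txt") would mirror the terminal command above.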

Tip: Use the .srt file for captions. It works directly with most editing software.
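
If you want to search or post-process the captions yourself, the .srt file is simple to read: each cue is a number, a line with start and end times, and the caption text, separated from the next cue by a blank line. The parser below is a minimal sketch that assumes this standard layout. parse_srt and to_seconds are names chosen for this example.

```python
import re

TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp):
    """Convert an SRT timestamp such as 00:01:02,500 to seconds."""
    hours, minutes, seconds, millis = map(int, TIMESTAMP.fullmatch(stamp.strip()).groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, caption) tuples."""
    cues = []
    # Cues are separated by blank lines
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip anything that is not a complete cue
        start, end = lines[1].split("-->")
        cues.append((to_seconds(start), to_seconds(end), " ".join(lines[2:]).strip()))
    return cues
```

Reading a file with open(path, encoding="utf-8").read() and passing the text to parse_srt gives you the cues as plain Python tuples.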

Step 3: Translate Audio to English

To translate non-English speech, use the same command and add a translation flag:

python3 -m whisper "[your file path]" --model base --task translate

Without this flag, the output remains in the original language. With it, you’ll get an English version automatically.
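
If you want both the original-language transcript and the English translation, keep in mind that Whisper names its output files after the input file, so two runs into the same folder would overwrite each other. One way around this, sketched here with a made-up helper named commands_for_both_versions, is to give each task its own output folder. The folder names are assumptions you can change.

```python
import sys
from pathlib import Path


def commands_for_both_versions(media_path, output_root):
    """Build one transcribe command and one translate command with separate output folders."""
    commands = {}
    for task in ("transcribe", "translate"):
        out_dir = Path(output_root) / task  # e.g. output_root/translate
        commands[task] = [
            sys.executable, "-m", "whisper", media_path,
            "--model", "base",
            "--task", task,
            "--output_dir", str(out_dir),
        ]
    return commands
```

Each command list can then be passed to subprocess.run(cmd, check=True) to run it.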

Take It Further

To process multiple files at once on a Mac (or in any bash or zsh shell), navigate to the folder containing your videos and run:

for f in *.mp4; do python3 -m whisper "$f" --model base; done

This loop transcribes every .mp4 file in the directory, one after another, so you can handle bulk content efficiently.
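
The same batch idea can be sketched in Python, which behaves the same on Mac and Windows and can pick up more than one file type. The extension list and the names select_media and transcribe_folder are assumptions for this example. The script simply runs the Whisper command once per file and collects any files that failed.

```python
import subprocess
import sys
from pathlib import Path

# Assumed list of extensions; adjust to match your own media
MEDIA_EXTENSIONS = {".mp4", ".mov", ".mkv", ".mp3", ".wav", ".m4a"}


def select_media(paths):
    """Keep only paths whose extension looks like audio or video, sorted by name."""
    return sorted(p for p in map(Path, paths) if p.suffix.lower() in MEDIA_EXTENSIONS)


def transcribe_folder(folder, model="base"):
    """Run Whisper on each media file in turn and return the files that failed."""
    failed = []
    files = (p for p in Path(folder).iterdir() if p.is_file())
    for media in select_media(files):
        cmd = [sys.executable, "-m", "whisper", str(media),
               "--model", model, "--output_dir", str(folder)]
        if subprocess.run(cmd).returncode != 0:
            failed.append(media)
    return failed
```

Calling transcribe_folder("/Users/you/Downloads") would write the transcripts next to the source files and return a list of any that could not be processed.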

Editorial Staff

The Editorial Staff at AIChief is a team of professional content writers with extensive experience in AI and marketing. Founded in 2025, AIChief has quickly grown to become the largest free AI resource hub in the industry. Stay connected with them on Facebook, Instagram, and X for the latest updates.
