To run the Whisper model locally for audio transcription, you can use the official OpenAI Whisper package or a community-maintained alternative such as faster-whisper (generally faster and more memory-efficient). Here's a step-by-step guide:
First, make sure pip is up to date:

```bash
pip install --upgrade pip
```

Whisper also requires FFmpeg to decode audio. Install it with your system's package manager:

```bash
# macOS (Homebrew)
brew install ffmpeg

# Debian/Ubuntu
sudo apt install ffmpeg
```

Then choose one of the following options:
**Option 1: Official Whisper**

```bash
pip install -U openai-whisper
```
**Option 2: faster-whisper**

```bash
pip install faster-whisper
```
With the official package installed, you can transcribe directly from the command line:

```bash
whisper audio_file.mp3 --model medium --language English
```
- Replace `audio_file.mp3` with your audio file.
- `--model` can be `tiny`, `base`, `small`, `medium`, or `large` (larger = more accurate but slower).
- `--language` is optional (e.g., `English`, `French`).

If you installed faster-whisper instead, use it from Python:

```python
from faster_whisper import WhisperModel

model_size = "medium"
model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio_file.mp3", beam_size=5)

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```
Save the script as `transcribe.py` and run it with `python transcribe.py`.

If you have an NVIDIA GPU, you can install faster-whisper with CUDA support for faster processing:
```bash
pip install faster-whisper --extra-index-url https://download.pytorch.org/whl/cu118
```
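With the CUDA build installed, the only change needed in the Python script is the `WhisperModel` constructor arguments. A minimal sketch, assuming an NVIDIA GPU with the CUDA runtime and cuDNN available (it will raise an error on CPU-only machines):

```python
from faster_whisper import WhisperModel

# device="cuda" moves inference to the GPU; float16 is the compute type
# commonly used there (int8 is the typical CPU choice).
model = WhisperModel("medium", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio_file.mp3", beam_size=5)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```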
A couple of tips:

- The official `whisper` CLI can save transcriptions as `.txt`, `.srt`, `.vtt`, or `.json` files; use `--output_format` to specify which.
- Switch to a smaller model (`tiny` or `base`) if you run out of RAM.

To extract a segment from 30 to 40 minutes of an MP3 file using FFmpeg, run the following command:
```bash
ffmpeg -i input.mp3 -ss 00:30:00 -to 00:40:00 -c copy output.mp3
```
- `-i input.mp3`: specifies the input file.
- `-ss 00:30:00`: sets the start time to 30 minutes.
- `-to 00:40:00`: sets the end time to 40 minutes.
- `-c copy`: copies the audio stream without re-encoding (fast and lossless).
- `output.mp3`: the output file name.

If you need to re-encode the audio instead of stream-copying (for example, to guarantee clean cut points), use:

```bash
ffmpeg -i input.mp3 -ss 00:30:00 -to 00:40:00 -c:a libmp3lame -q:a 2 output.mp3
```
- `-c:a libmp3lame`: forces MP3 encoding.
- `-q:a 2`: sets the audio quality (2 = high quality, on a 0-9 scale where lower is better).

When using faster-whisper, the Python script you run does not automatically save the transcription to a file; by default it only prints the results to the console. To save the transcription as a text file, you need to modify the script slightly.
Here's a modified version of the script that saves the transcription to a `.txt` file:
```python
from faster_whisper import WhisperModel

model_size = "medium"
model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio_file.mp3", beam_size=5)

# Open a file to write the transcription
with open("transcription.txt", "w", encoding="utf-8") as f:
    for segment in segments:
        line = f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}\n"
        print(line.strip())  # Print to console
        f.write(line)        # Write to file

print("\nTranscription saved to 'transcription.txt'")
```
Save the script as `transcribe.py` and run it:
```bash
python transcribe.py
```
The transcription will be saved as `transcription.txt` in the same directory.

If you want subtitles (`.srt` format), use this script instead:
```python
from faster_whisper import WhisperModel

model_size = "medium"
model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio_file.mp3", beam_size=5)

with open("transcription.srt", "w", encoding="utf-8") as f:
    for i, segment in enumerate(segments, start=1):
        # SRT timestamps use the form HH:MM:SS,mmm
        start_time = f"{int(segment.start // 3600):02d}:{int((segment.start % 3600) // 60):02d}:{int(segment.start % 60):02d}"
        end_time = f"{int(segment.end // 3600):02d}:{int((segment.end % 3600) // 60):02d}:{int(segment.end % 60):02d}"
        f.write(f"{i}\n{start_time},000 --> {end_time},000\n{segment.text}\n\n")

print("Transcription saved to 'transcription.srt'")
```
Make sure `audio_file.mp3` is in the same directory as the script.
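Note that the SRT script pads every cue with `,000`, discarding sub-second precision. If you want millisecond-accurate timestamps, a small standalone helper can replace the inline formatting (a sketch; `srt_timestamp` is my own name, not part of faster-whisper):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    whole_seconds = int(seconds % 60)
    millis = int((seconds - int(seconds)) * 1000)  # keep the fractional part
    return f"{hours:02d}:{minutes:02d}:{whole_seconds:02d},{millis:03d}"

# Inside the writing loop, each cue would then become:
# f.write(f"{i}\n{srt_timestamp(segment.start)} --> {srt_timestamp(segment.end)}\n{segment.text}\n\n")
```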