how to run whisper model locally to transcribe audio
Nov 13, 10:09pm

To run the Whisper model locally for audio transcription, you can use the official OpenAI package (openai-whisper) or the community-maintained faster-whisper, which is optimized for speed and memory use. Here’s a step-by-step guide:


1. Install Required Tools

  • Python 3.8 or later (recommended: 3.9+)
  • FFmpeg (for audio processing)

Install Python

  • Download from python.org
  • Ensure pip is updated: pip install --upgrade pip

Install FFmpeg

  • Windows: Download from FFmpeg’s official site and add to PATH.
  • Mac: brew install ffmpeg
  • Linux: sudo apt install ffmpeg
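Once both are installed, you can sanity-check that they are visible on your PATH. This is a minimal sketch; `check_tool` is a helper name introduced here, not part of any library:

```python
import shutil

def check_tool(name: str) -> bool:
    """Return True if `name` resolves to an executable on the PATH."""
    return shutil.which(name) is not None

if __name__ == "__main__":
    for tool in ("ffmpeg", "python3"):
        status = "found" if check_tool(tool) else "MISSING"
        print(f"{tool}: {status}")
```

If FFmpeg shows as missing even though you installed it, restart your terminal so the updated PATH is picked up.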

2. Install Whisper

Choose one of the following options:

Option A: Official Whisper (OpenAI)

```bash
pip install -U openai-whisper
```
  • This installs the official Whisper package; model weights are downloaded automatically on first use.

Option B: Faster-Whisper (Optimized)

```bash
pip install faster-whisper
```
  • This is a faster, optimized reimplementation of Whisper.

3. Transcribe Audio

Using Official Whisper

```bash
whisper audio_file.mp3 --model medium --language English
```
  • Replace audio_file.mp3 with your audio file.
  • --model can be tiny, base, small, medium, or large (larger = more accurate but slower).
  • --language is optional (e.g., English, French); if omitted, Whisper auto-detects the language.

Using Faster-Whisper (Python Script)

```python
from faster_whisper import WhisperModel

model_size = "medium"
model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio_file.mp3", beam_size=5)

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```
  • Save this as transcribe.py and run it with python transcribe.py.

4. Notes

  • GPU Acceleration: If you have an NVIDIA GPU, faster-whisper can use it via CTranslate2 (it does not go through PyTorch). Pass device="cuda" (and, e.g., compute_type="float16") to WhisperModel, and install the CUDA libraries it needs:
    ```bash
    pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
    ```
  • Output Formats: Whisper can output .txt, .srt, .vtt, or .json files. Use --output_format to specify.
  • Batch Processing: Use a loop in Python to process multiple files.
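The batch-processing note above can be sketched as follows, driving the official CLI (this assumes the `whisper` command from step 2 is on your PATH; `find_audio_files`, `transcribe_folder`, and `AUDIO_DIR` are names introduced here for illustration):

```python
import subprocess
from pathlib import Path

def find_audio_files(folder, exts=(".mp3", ".wav", ".m4a")):
    """Return audio files in `folder` with a matching extension, sorted for stable order."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in exts)

def transcribe_folder(folder, model="medium"):
    """Run the whisper CLI once per audio file; transcripts land in the working directory."""
    for audio in find_audio_files(folder):
        subprocess.run(["whisper", str(audio), "--model", model], check=True)

# transcribe_folder("AUDIO_DIR")  # uncomment and point at your own folder
```

Each file is processed sequentially; for large batches, a smaller model (see the table of sizes above) keeps total runtime manageable.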

5. Troubleshooting

  • FFmpeg Errors: Ensure FFmpeg is installed and in your PATH.
  • CUDA Errors: Update your NVIDIA drivers and PyTorch.
  • Memory Issues: Use smaller models (tiny or base) if you run out of RAM.

use ffmpeg to extract audio from 30 to 40 minutes in an mp3
Nov 13, 10:38pm

To extract a segment from 30 to 40 minutes in an MP3 file using FFmpeg, use the following command:

```bash
ffmpeg -i input.mp3 -ss 00:30:00 -to 00:40:00 -c copy output.mp3
```

Explanation of the command:

  • -i input.mp3: Specifies the input file.
  • -ss 00:30:00: Sets the start time to 30 minutes.
  • -to 00:40:00: Sets the end time to 40 minutes.
  • -c copy: Copies the audio stream without re-encoding (fast and lossless).
  • output.mp3: The output file name.
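If you are scripting several cuts, a small helper can build the HH:MM:SS strings ffmpeg expects from plain second counts. This is a sketch; `hms` and the file names are placeholders introduced here:

```python
def hms(seconds: int) -> str:
    """Format a whole-second count as ffmpeg's HH:MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

# Build the same cut as above: 30 to 40 minutes.
cmd = [
    "ffmpeg", "-i", "input.mp3",
    "-ss", hms(30 * 60),   # "00:30:00"
    "-to", hms(40 * 60),   # "00:40:00"
    "-c", "copy", "output.mp3",
]
```

The list form can be passed directly to subprocess.run, avoiding shell-quoting issues with file names.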

Notes:

  • If you encounter sync issues or imprecise cut points (with -c copy, cuts can only land on frame boundaries), re-encode the audio instead:
    ```bash
    ffmpeg -i input.mp3 -ss 00:30:00 -to 00:40:00 -c:a libmp3lame -q:a 2 output.mp3
    ```
    • -c:a libmp3lame: Forces MP3 re-encoding.
    • -q:a 2: Sets VBR audio quality (range 0-9; lower = higher quality).
  • For large files, placing -ss before -i (input seeking) is much faster; combine it with a duration, e.g. -t 00:10:00, instead of -to.
i used faster-whisper - was expecting it to output text somehow?
Nov 13, 10:52pm

By default, the faster-whisper script only prints the results to the console; it does not save the transcription to a file. To save it as a text file, you need to modify the script slightly.


How to Save the Transcription to a Text File

1. Update the Script

Here’s a modified version of the script that saves the transcription to a .txt file:

```python
from faster_whisper import WhisperModel

model_size = "medium"
model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio_file.mp3", beam_size=5)

# Open a file to write the transcription
with open("transcription.txt", "w", encoding="utf-8") as f:
    for segment in segments:
        line = f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}\n"
        print(line.strip())  # Print to console
        f.write(line)        # Write to file

print("\nTranscription saved to 'transcription.txt'")
```

2. Run the Script

Save the script as transcribe.py and run it:

```bash
python transcribe.py
```

3. Check the Output

  • The transcription will be printed to the console and saved to transcription.txt in the same directory.

Optional: Save as SRT (Subtitle File)

If you want subtitles (.srt format), use this script:

```python
from faster_whisper import WhisperModel

def format_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    secs = int(seconds % 60)
    millis = int((seconds - int(seconds)) * 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

model_size = "medium"
model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio_file.mp3", beam_size=5)

with open("transcription.srt", "w", encoding="utf-8") as f:
    for i, segment in enumerate(segments, start=1):
        f.write(f"{i}\n")
        f.write(f"{format_timestamp(segment.start)} --> {format_timestamp(segment.end)}\n")
        f.write(f"{segment.text.strip()}\n\n")

print("Transcription saved to 'transcription.srt'")
```

Troubleshooting

  • If you get an error about the file not being found, ensure audio_file.mp3 is in the same directory as the script.
  • If the script runs but the output is empty, check if the audio file is valid and contains speech.
