To run the Whisper model locally for audio transcription, you can use the official OpenAI Whisper package or a community-maintained alternative such as faster-whisper (generally faster and more memory-efficient). Here's a step-by-step guide:
First, make sure pip is up to date:

```bash
pip install --upgrade pip
```

Whisper also requires FFmpeg to decode audio. Install it with your system's package manager:

```bash
# macOS (Homebrew)
brew install ffmpeg

# Debian/Ubuntu
sudo apt install ffmpeg
```

Then choose one of the following options:
**Option 1: Official Whisper**

```bash
pip install -U openai-whisper
```
**Option 2: faster-whisper**

```bash
pip install faster-whisper
```
With the official package installed, you can transcribe directly from the command line:

```bash
whisper audio_file.mp3 --model medium --language English
```
- Replace `audio_file.mp3` with your audio file.
- `--model` can be `tiny`, `base`, `small`, `medium`, or `large` (larger = more accurate but slower).
- `--language` is optional (e.g., `English`, `French`).

If you installed faster-whisper instead, use it from Python:

```python
from faster_whisper import WhisperModel

model_size = "medium"
model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio_file.mp3", beam_size=5)

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```
Save the script as `transcribe.py` and run it with `python transcribe.py`.

If you have an NVIDIA GPU, you can install faster-whisper with CUDA support for faster processing:
```bash
pip install faster-whisper --extra-index-url https://download.pytorch.org/whl/cu118
```
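With the CUDA build installed, the only change needed in the Python script is the `WhisperModel` constructor arguments. A minimal sketch, assuming an NVIDIA GPU with the CUDA runtime and cuDNN available (it will raise an error on CPU-only machines):

```python
from faster_whisper import WhisperModel

# device="cuda" moves inference to the GPU; float16 is the compute type
# commonly used there (int8 is the typical CPU choice).
model = WhisperModel("medium", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio_file.mp3", beam_size=5)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```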
A couple of tips:

- The official `whisper` CLI can save transcriptions as `.txt`, `.srt`, `.vtt`, or `.json` files; use `--output_format` to specify which.
- Switch to a smaller model (`tiny` or `base`) if you run out of RAM.

To extract a segment from 30 to 40 minutes of an MP3 file using FFmpeg, run the following command:
```bash
ffmpeg -i input.mp3 -ss 00:30:00 -to 00:40:00 -c copy output.mp3
```
- `-i input.mp3`: specifies the input file.
- `-ss 00:30:00`: sets the start time to 30 minutes.
- `-to 00:40:00`: sets the end time to 40 minutes.
- `-c copy`: copies the audio stream without re-encoding (fast and lossless).
- `output.mp3`: the output file name.

If you need to re-encode the audio instead of stream-copying (for example, to guarantee clean cut points), use:

```bash
ffmpeg -i input.mp3 -ss 00:30:00 -to 00:40:00 -c:a libmp3lame -q:a 2 output.mp3
```
- `-c:a libmp3lame`: forces MP3 encoding.
- `-q:a 2`: sets the audio quality (2 = high quality, on a 0-9 scale where lower is better).

When using faster-whisper, the Python script you run does not automatically save the transcription to a file; by default it only prints the results to the console. To save the transcription as a text file, you need to modify the script slightly.
Here's a modified version of the script that saves the transcription to a `.txt` file:
```python
from faster_whisper import WhisperModel

model_size = "medium"
model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio_file.mp3", beam_size=5)

# Open a file to write the transcription
with open("transcription.txt", "w", encoding="utf-8") as f:
    for segment in segments:
        line = f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}\n"
        print(line.strip())  # Print to console
        f.write(line)        # Write to file

print("\nTranscription saved to 'transcription.txt'")
```
Save the script as `transcribe.py` and run it:
```bash
python transcribe.py
```
The transcription will be saved as `transcription.txt` in the same directory.

If you want subtitles (`.srt` format), use this script instead:
```python
from faster_whisper import WhisperModel

model_size = "medium"
model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio_file.mp3", beam_size=5)

with open("transcription.srt", "w", encoding="utf-8") as f:
    for i, segment in enumerate(segments, start=1):
        # SRT timestamps use the form HH:MM:SS,mmm
        start_time = f"{int(segment.start // 3600):02d}:{int((segment.start % 3600) // 60):02d}:{int(segment.start % 60):02d}"
        end_time = f"{int(segment.end // 3600):02d}:{int((segment.end % 3600) // 60):02d}:{int(segment.end % 60):02d}"
        f.write(f"{i}\n{start_time},000 --> {end_time},000\n{segment.text}\n\n")

print("Transcription saved to 'transcription.srt'")
```
Make sure `audio_file.mp3` is in the same directory as the script.
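Note that the SRT script pads every cue with `,000`, discarding sub-second precision. If you want millisecond-accurate timestamps, a small standalone helper can replace the inline formatting (a sketch; `srt_timestamp` is my own name, not part of faster-whisper):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    whole_seconds = int(seconds % 60)
    millis = int((seconds - int(seconds)) * 1000)  # keep the fractional part
    return f"{hours:02d}:{minutes:02d}:{whole_seconds:02d},{millis:03d}"

# Inside the writing loop, each cue would then become:
# f.write(f"{i}\n{srt_timestamp(segment.start)} --> {srt_timestamp(segment.end)}\n{segment.text}\n\n")
```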