transcribe: Audio/video to text

Transcribe any audio or video file with word-level timestamps.

Usage

praisonai-editor transcribe INPUT [OPTIONS]

Options

| Option | Short | Default | Description |
|---|---|---|---|
| INPUT | | | Audio or video file |
| --output | -o | stdout | Output file path |
| --format | -f | srt | Output format: srt, txt, json |
| --local | | False | Use offline faster-whisper |
| --language | | auto | Language code, e.g. en, ta, es |

Examples

Write SRT subtitles to a file:

praisonai-editor transcribe video.mp4 --format srt --output video.srt

Output (video.srt):

1
00:00:00,000 --> 00:00:03,240
Welcome everyone to today's session.

2
00:00:03,800 --> 00:00:07,100
We'll be covering the main topics.

Print plain text to stdout:

praisonai-editor transcribe podcast.mp3 --format txt

Write JSON with word-level timestamps:

praisonai-editor transcribe audio.mp3 --format json --output words.json

Output (words.json):

{
  "text": "Welcome everyone to today's session.",
  "words": [
    {"text": "Welcome", "start": 0.0, "end": 0.52, "confidence": 0.99},
    {"text": "everyone", "start": 0.58, "end": 1.10, "confidence": 0.99}
  ],
  "language": "en",
  "duration": 1823.4
}

Set the language explicitly (here Tamil, ta) instead of relying on auto-detection:

praisonai-editor transcribe audio.mp3 --language ta

Transcribe offline with faster-whisper (install the local extra first):

pip install "praisonai-editor[local]"
praisonai-editor transcribe audio.mp3 --local

How it works

flowchart LR
    A[audio/video] --> B[Extract MP3]
    B --> C{File > 10 min?}
    C -->|Yes| D[Split into\n10-min chunks]
    C -->|No| E[Single API call]
    D --> F[Whisper API\n× N chunks]
    E --> F
    F --> G[Merge + offset\ntimestamps]
    G --> H[SRT / TXT / JSON]
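
The split and merge steps in the flowchart can be illustrated with a small sketch. This is not the tool's actual implementation: the names `Word`, `plan_chunks`, and `merge_chunks` are hypothetical, and the only detail taken from the diagram is the 10-minute chunk size. It shows why timestamps need an offset: each chunk is transcribed with times relative to its own start, so the chunk's start time is added back when the results are merged.

```python
from dataclasses import dataclass
from typing import List, Tuple

CHUNK_SECONDS = 600.0  # 10 minutes, the chunk size shown in the flowchart


@dataclass
class Word:
    """Hypothetical word record with times in seconds."""
    text: str
    start: float
    end: float


def plan_chunks(duration: float,
                chunk_seconds: float = CHUNK_SECONDS) -> List[Tuple[float, float]]:
    """Return (start, end) windows that cover the whole file."""
    if duration <= chunk_seconds:
        return [(0.0, duration)]  # short file: a single call is enough
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_seconds, duration)
        windows.append((start, end))
        start = end
    return windows


def merge_chunks(chunks: List[Tuple[float, List[Word]]]) -> List[Word]:
    """Merge per-chunk words, shifting chunk-relative times by the chunk offset."""
    merged = []
    for offset, words in sorted(chunks, key=lambda chunk: chunk[0]):
        for w in words:
            merged.append(Word(w.text, w.start + offset, w.end + offset))
    return merged
```

For the 1823.4-second file from the JSON example, `plan_chunks(1823.4)` yields four windows: three full 10-minute chunks and a final 23.4-second remainder.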

Transcript cache

After transcription, the result is cached at ~/.praisonai/editor/{file}/transcript.json. Running edit on the same file loads the transcript from this cache, so no API calls are repeated.
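
A cache lookup of this kind might look roughly like the sketch below. It is an illustration under assumptions, not the package's code: the helper names are hypothetical, and because the page does not say how the `{file}` path component is derived, the sketch takes it as a caller-supplied `key`.

```python
import json
from pathlib import Path
from typing import Optional

# Cache root taken from the path documented above.
DEFAULT_ROOT = Path.home() / ".praisonai" / "editor"


def cache_path(key: str, root: Path = DEFAULT_ROOT) -> Path:
    """Build <root>/<key>/transcript.json. How `key` is derived is an assumption."""
    return root / key / "transcript.json"


def load_cached_transcript(key: str, root: Path = DEFAULT_ROOT) -> Optional[dict]:
    """Return the cached transcript, or None when nothing is cached yet."""
    path = cache_path(key, root)
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def save_transcript(key: str, transcript: dict, root: Path = DEFAULT_ROOT) -> Path:
    """Write the transcript so a later run can skip transcription."""
    path = cache_path(key, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(transcript), encoding="utf-8")
    return path
```

A caller would try `load_cached_transcript` first and only transcribe (then call `save_transcript`) when it returns `None`.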

Python API

from praisonai_editor.transcribe import transcribe_audio

result = transcribe_audio("podcast.mp3", language="en")
print(result.text)       # full text
print(result.to_srt())   # SRT subtitles
for w in result.words:
    print(w.start, w.text)
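
When working from the JSON output file rather than the result object, the word entries can be turned into SRT cues by hand. The sketch below is a hedged illustration: `srt_timestamp` and `words_to_srt` are hypothetical helpers, and grouping a fixed number of words per cue is an assumption made for simplicity, not how the tool segments its own subtitles. The input is assumed to match the `words` list from the JSON example above.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp style used in SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7) -> str:
    """Group words into numbered cues of at most `max_words` words each."""
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        index = i // max_words + 1
        start = srt_timestamp(group[0]["start"])   # cue starts with its first word
        end = srt_timestamp(group[-1]["end"])      # and ends with its last word
        text = " ".join(w["text"] for w in group)
        cues.append(f"{index}\n{start} --> {end}\n{text}")
    return "\n\n".join(cues) + ("\n" if cues else "")
```

To use it with a file written by `--format json --output words.json`, load the file with `json.load` and pass its `"words"` list to `words_to_srt`.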