    egaki transcribe recording.mp3
    egaki transcribe recording.mp3 --model whisper-1

| Model | Provider | Notes |
| --- | --- | --- |
| whisper-1 | OpenAI | Reliable, good accuracy |
| ink-whisper | Cartesia | Cheapest option |
| scribe_v1 | ElevenLabs | High accuracy |
| nova-3 | Deepgram | Fast |
| whisper-large-v3 | Groq | Fast, open weights |
| whisper-large-v3-turbo | Groq | Fastest |
| distil-whisper-large-v3-en | Groq | English only, very fast |

`gpt-4o-transcribe` and `gpt-4o-mini-transcribe` do not support word timestamps. The OpenAI API rejects `verbose_json` for these models.

    egaki transcribe recording.mp3 -o transcript.json
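
If the model name comes from configuration, a script can check it before invoking the CLI. The following is a minimal sketch of such a guard. The helper name `supportsWordTimestamps` is hypothetical and not part of egaki, and it assumes that any model not listed as unsupported does return word timestamps.

```typescript
// Models that the text above says cannot return word timestamps.
const NO_WORD_TIMESTAMPS: ReadonlySet<string> = new Set([
  "gpt-4o-transcribe",
  "gpt-4o-mini-transcribe",
]);

// Hypothetical helper: returns false only for the models listed above.
// Assumption: every other model name is treated as supporting word timestamps.
function supportsWordTimestamps(model: string): boolean {
  return !NO_WORD_TIMESTAMPS.has(model);
}

const requestedModel = "gpt-4o-transcribe";
if (!supportsWordTimestamps(requestedModel)) {
  console.warn(`${requestedModel} does not return word timestamps; pick another model.`);
}
```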
    # 1. Generate TTS narration
    egaki speech "Your narration text." --voice <id> -m sonic-3.5 -o public/narration.mp3

    # 2. Transcribe to get word timestamps
    egaki transcribe public/narration.mp3 --model whisper-1

    # 3. Use the timestamps in your MDX video

Convert each word's `startSecond` to a frame delay by multiplying by `FPS`:

    <Caption words={[
      { word: "Your", delay: 0 },
      { word: "narration", delay: 0.26 * FPS },
      { word: "text.", delay: 0.48 * FPS },
    ]} />
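
For longer narrations, the conversion can be done in code rather than by hand. The sketch below assumes each transcribed word is available as an object with `word` and `startSecond` fields. The `TimedWord` shape, the `toCaptionWords` helper, and the `FPS` value of 30 are illustrative assumptions, not part of egaki. It also rounds to whole frames, which the hand-written example above does not do.

```typescript
// Assumed shape of one transcribed word; adjust the field names to match
// the actual transcript output.
interface TimedWord {
  word: string;
  startSecond: number;
}

// Shape of the entries passed to <Caption words={...} /> in the example above.
interface CaptionWord {
  word: string;
  delay: number; // delay in frames
}

// Assumed frame rate; use your video's real FPS.
const FPS = 30;

// Hypothetical helper: converts start times in seconds to frame delays.
// Rounding to whole frames is a choice made in this sketch.
function toCaptionWords(words: TimedWord[], fps: number = FPS): CaptionWord[] {
  return words.map(({ word, startSecond }) => ({
    word,
    delay: Math.round(startSecond * fps),
  }));
}

const captionWords = toCaptionWords([
  { word: "Your", startSecond: 0 },
  { word: "narration", startSecond: 0.26 },
  { word: "text.", startSecond: 0.48 },
]);

console.log(captionWords);
```

The resulting array can then be passed directly as `<Caption words={captionWords} />`.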