```bash
# 1. Generate TTS narration
egaki speech "Your narration text." --voice <id> -m sonic-3.5 -o public/narration.mp3

# 2. Transcribe for word timestamps
egaki transcribe public/narration.mp3 --model whisper-1

# 3. Use timestamps in MDX
```
Convert each word's `startSecond` to a frame delay using `FPS`:

```tsx
<Caption words={[
  { word: "Just", delay: 0 },
  { word: "quit", delay: 0.26 * FPS },
  { word: "your", delay: 0.48 * FPS },
  { word: "job", delay: 0.62 * FPS },
]} />
```
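For longer narrations, the same seconds-to-frames conversion can be done in code instead of by hand. The sketch below is a minimal helper under stated assumptions: the `TranscriptWord` shape (a `word` string plus the `startSecond` field mentioned above) and the name `toCaptionWords` are hypothetical, and the actual transcript output may use different field names.

```typescript
// Hypothetical shape of one transcribed word; only `startSecond` is named
// in the text above, the rest is an assumption for illustration.
type TranscriptWord = { word: string; startSecond: number }

// Shape expected by the `words` prop of Caption.
type CaptionWord = { word: string; delay: number }

// Convert word start times (seconds) into frame delays.
// Delays are left unrounded, matching the hand-written `0.26 * FPS` style.
function toCaptionWords(transcript: TranscriptWord[], fps: number): CaptionWord[] {
  return transcript.map((t) => ({
    word: t.word,
    delay: t.startSecond * fps,
  }))
}
```

The result can be passed directly as `<Caption words={toCaptionWords(transcript, FPS)} />`, which keeps the delays tied to whatever transcript is currently loaded.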
```tsx
function Caption({ words }: { words: { word: string; delay: number }[] }) {
  const frame = useCurrentFrame()
  return (
    <AbsoluteFill style={{
      display: 'flex',
      alignItems: 'flex-end',
      justifyContent: 'center',
      padding: '0 80px 120px',
    }}>
      <span style={{
        fontSize: 42,
        fontWeight: 400,
        color: '#f5d442',
        fontFamily: '"Georgia", serif',
        textAlign: 'center',
        lineHeight: 1.4,
        maxWidth: '70%',
      }}>
        {words.map((w, i) => (
          <span key={i} style={{ opacity: frame >= w.delay ? 1 : 0 }}>
            {i > 0 ? ' ' : ''}{w.word}
          </span>
        ))}
      </span>
    </AbsoluteFill>
  )
}
```
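Use opacity, not conditional rendering: toggling `opacity: 0/1` keeps every word in the layout, so the line does not reflow as words appear. Re-transcribe and update the `delay` values whenever the narration audio changes.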
Stale timestamps from a previous audio file cause words to appear out of sync.
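One cheap way to catch some stale timestamps is to check that every delay falls inside the current composition. This is only a heuristic sketch: `findOutOfRangeWords` is a hypothetical helper, `durationInFrames` is assumed to be the length of the current narration in frames, and timestamps that are stale but still in range will not be flagged.

```typescript
type CaptionWord = { word: string; delay: number }

// Return the words whose delay is negative or lands at/after the end of the clip.
// A non-empty result suggests the timestamps belong to a different audio file.
function findOutOfRangeWords(words: CaptionWord[], durationInFrames: number): CaptionWord[] {
  return words.filter((w) => w.delay < 0 || w.delay >= durationInFrames)
}
```

An empty result does not prove the captions are in sync, so re-transcribing after every audio change remains the reliable fix.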