Captions

Add word-by-word captions synced to narration audio. The workflow is: transcribe audio for timestamps, convert to frame delays, render a Caption component.


Workflow

# 1. Generate TTS narration
egaki speech "Your narration text." --voice <id> -m sonic-3.5 -o public/narration.mp3

# 2. Transcribe for word timestamps
egaki transcribe public/narration.mp3 --model whisper-1

# 3. Use timestamps in MDX

Using timestamps in MDX

Convert each word's startSecond to a frame delay by multiplying it by FPS:

<Caption words={[
  { word: "Just", delay: 0 },
  { word: "quit", delay: 0.26 * FPS },
  { word: "your", delay: 0.48 * FPS },
  { word: "job", delay: 0.62 * FPS },
]} />
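
Writing each delay by hand gets tedious for longer narration. The helper below is a minimal sketch of the same seconds-to-frames conversion. The `TranscribedWord` shape and the `toCaptionWords` name are assumptions made for illustration: this page only names the `startSecond` field, so adjust the `text` field to match what your transcript actually contains.

```typescript
// Assumed shape of one transcribed word. Only `startSecond` is named on this
// page; the `text` field name is a guess and may differ in your transcript.
interface TranscribedWord {
  text: string
  startSecond: number
}

// The shape the Caption component expects.
interface CaptionWord {
  word: string
  delay: number
}

// Hypothetical helper: multiply each start time (seconds) by the frame rate
// to get the frame at which the word should become visible.
function toCaptionWords(words: TranscribedWord[], fps: number): CaptionWord[] {
  return words.map((w) => ({
    word: w.text.trim(), // drop stray whitespace so spacing stays consistent
    delay: w.startSecond * fps,
  }))
}
```

With a helper like this, the MDX usage could become `<Caption words={toCaptionWords(transcribedWords, FPS)} />`, where `transcribedWords` is whatever array you load from the transcript.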

Caption component pattern

The default style uses film-style subtitles: Georgia serif, soft yellow, bottom-positioned.

function Caption({ words }: { words: { word: string; delay: number }[] }) {
  const frame = useCurrentFrame()
  return (
    <AbsoluteFill style={{
      display: 'flex',
      alignItems: 'flex-end',
      justifyContent: 'center',
      padding: '0 80px 120px',
    }}>
      <span style={{
        fontSize: 42,
        fontWeight: 400,
        color: '#f5d442',
        fontFamily: '"Georgia", serif',
        textAlign: 'center',
        lineHeight: 1.4,
        maxWidth: '70%',
      }}>
        {words.map((w, i) => (
          <span key={i} style={{ opacity: frame >= w.delay ? 1 : 0 }}>
            {i > 0 ? ' ' : ''}{w.word}
          </span>
        ))}
      </span>
    </AbsoluteFill>
  )
}

Key rules:

Render ALL words always; toggle visibility with opacity, not conditional rendering. This prevents layout shift as words appear.
No fade animation; instant opacity: 0/1
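
The rules above can be checked without rendering anything. The sketch below pulls the visibility logic out of the component into a plain function; `captionOpacities` is a hypothetical name used only for illustration. It returns one opacity per word on every frame, so the number of rendered words never changes and each value is exactly 0 or 1.

```typescript
interface CaptionWord {
  word: string
  delay: number
}

// Hypothetical helper mirroring the component's rule: every word always gets
// an entry, and the entry flips from 0 to 1 once the frame reaches its delay.
function captionOpacities(words: CaptionWord[], frame: number): number[] {
  return words.map((w) => (frame >= w.delay ? 1 : 0))
}

// Example delays computed at 30 frames per second from the timestamps above.
const sampleWords: CaptionWord[] = [
  { word: "Just", delay: 0 },
  { word: "quit", delay: 7.8 },
  { word: "your", delay: 14.4 },
  { word: "job", delay: 18.6 },
]

// At frame 10 the first two words are visible and the last two are hidden.
const atFrame10 = captionOpacities(sampleWords, 10)
```

Because hidden words still occupy their space in the line, the text block keeps the same width and position from the first frame to the last.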

When you regenerate TTS audio, always re-transcribe and update all delay values. Stale timestamps from a previous audio file cause words to appear out of sync.