Drop a wav or mp3. The browser detects tempo, builds the beat grid, finds the downbeats (your cut points), and traces the energy curve — all locally, before any cloud call.
tempo via autocorrelation · downbeats via 4/4 energy heuristic
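As a rough illustration of tempo via autocorrelation, the sketch below scores every candidate beat period against an energy envelope and picks the strongest one. The function name, the 60–180 BPM search range, and the envelope frame rate are assumptions for this example, not the app's actual code.

```typescript
// Hypothetical helper: estimate tempo (BPM) from an energy/onset envelope.
// Assumption: `envelope` holds one value per frame at `frameRate` frames per second.
function estimateTempo(
  envelope: number[],
  frameRate: number,
  minBpm: number = 60,
  maxBpm: number = 180
): number {
  const n = envelope.length;
  if (n < 2) throw new Error("envelope too short");

  // Remove the mean so a constant energy level does not dominate the score.
  let mean = 0;
  for (let i = 0; i < n; i++) mean += envelope[i];
  mean /= n;

  // Convert the BPM search range into a lag range, measured in frames.
  const minLag = Math.max(1, Math.round((60 * frameRate) / maxBpm));
  const maxLag = Math.min(n - 1, Math.round((60 * frameRate) / minBpm));

  let bestLag = minLag;
  let bestScore = -Infinity;
  for (let lag = minLag; lag <= maxLag; lag++) {
    // Autocorrelation at this lag: how well the envelope matches a shifted copy of itself.
    let score = 0;
    for (let i = 0; i + lag < n; i++) {
      score += (envelope[i] - mean) * (envelope[i + lag] - mean);
    }
    if (score > bestScore) {
      bestScore = score;
      bestLag = lag;
    }
  }
  // A lag of `bestLag` frames is one beat period.
  return (60 * frameRate) / bestLag;
}
```

An envelope with a pulse every 0.5 s at 100 frames per second should come out at 120 BPM. Real audio usually also needs onset detection and some handling of half-tempo and double-tempo ambiguity, which this sketch leaves out.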
Step 02 · The director · gemini-3.5-flash
Set the brief.
Meta and style anchor every scene prompt. Lyrics let the director sync visuals to words; leave blank for instrumental. The LLM cuts on your downbeats and sets shot length by energy.
Title · Artist · Visual style (appended to every prompt) · Mood
Lyrics — line per cut, or leave blank
Generation models
9:16 · 15s
Keyframe model
~$0.04 / image
Video model
8s clip · 720×1280 · 24fps
Snap cuts to beat
≤52–56% F1 · editable after
POST /api/storyboard · temperature 0.2 · thinkingBudget 0
Step 03 · Storyboard · gapless scenes on the downbeat
The cut list.
Five gapless scenes covering 0–15s, boundaries snapped to downbeats. Edit any visual prompt before you spend a cent on rendering. Timings are a suggestion — adjust them.
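One way to picture "gapless scenes snapped to downbeats" is the sketch below: proposed cut times are moved to the nearest downbeat, duplicates are dropped, and consecutive boundaries become scenes that share their edges. All names here are hypothetical, and the grid assumes a steady 4/4 tempo with the first downbeat at 0 s.

```typescript
interface Scene { start: number; end: number; }

// Assumption: constant tempo, 4 beats per bar, first downbeat at t = 0.
function downbeatGrid(bpm: number, durationS: number, beatsPerBar: number = 4): number[] {
  const barS = (60 / bpm) * beatsPerBar;
  const grid: number[] = [];
  for (let k = 0; k * barS < durationS; k++) grid.push(k * barS);
  return grid;
}

// Move each proposed cut to its nearest downbeat, keeping only interior, unique cuts.
function snapCuts(cuts: number[], grid: number[], startS: number, endS: number): number[] {
  const snapped: number[] = [];
  for (let i = 0; i < cuts.length; i++) {
    let best = grid[0];
    for (let j = 1; j < grid.length; j++) {
      if (Math.abs(grid[j] - cuts[i]) < Math.abs(best - cuts[i])) best = grid[j];
    }
    if (best > startS && best < endS) snapped.push(best);
  }
  snapped.sort((a, b) => a - b);
  // Drop cuts that collapsed onto the same downbeat.
  return snapped.filter((t, i) => i === 0 || t !== snapped[i - 1]);
}

// Each scene ends exactly where the next begins, so there are no gaps.
function buildScenes(startS: number, endS: number, cuts: number[]): Scene[] {
  const bounds = [startS].concat(cuts, [endS]);
  const scenes: Scene[] = [];
  for (let i = 0; i + 1 < bounds.length; i++) {
    scenes.push({ start: bounds[i], end: bounds[i + 1] });
  }
  return scenes;
}
```

For example, with a 120 BPM grid over 15 s, proposed cuts at 2.9, 6.2, 9.1 and 12.4 s snap to 2, 6, 10 and 12 s, giving five scenes. This sketch does not enforce a minimum or maximum scene length; that would be an extra pass.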
Visual anchor · appended to every prompt
Desert noir music video, neon-lit gas station at midnight, 35mm anamorphic film grain, deep shadows, magenta & amber practicals, lonesome dark-trap atmosphere.
Timeline
0:00 → 0:15 · cuts on 94 BPM
0s · 3s · 6s · 9s · 12s · 15s
5 scenes · gapless · 2–5s each
Step 04 · Keyframes → Veo clips
Render the scenes.
Each scene gets a keyframe (Nano Banana) then an image-to-video clip (Veo). Trial image quota is ~2/min, so the server backs off on 429 — generating all takes a few minutes.
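The back-off on 429 could look roughly like the sketch below: retry only on HTTP 429 and wait longer after each failed attempt, up to a cap. The 30 s base (chosen to match a quota of about two requests per minute), the 120 s cap, and the function names are assumptions for illustration, not the server's actual settings.

```typescript
// Hypothetical policy: only a 429 (rate limited) response is worth retrying.
function shouldRetry(status: number): boolean {
  return status === 429;
}

// Exponential back-off schedule: baseMs, 2x, 4x, ... capped at capMs.
// Assumed defaults: 30 s base, 120 s cap.
function backoffDelaysMs(
  maxRetries: number,
  baseMs: number = 30000,
  capMs: number = 120000
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(capMs, baseMs * Math.pow(2, attempt)));
  }
  return delays;
}
```

A caller would sleep for `delays[i]` after the i-th rate-limited response and give up once the list is exhausted. Adding random jitter to each delay is a common refinement when several requests retry at once.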
keyframes 0/5 · clips 0/5 · est. $5.04
POST /api/keyframe · POST /api/clip · Veo polls ~8s × up to 60
Step 05 · ffmpeg merge → finished MP4
Ship the clip.
Each Veo clip is trimmed to its scene length, concatenated, then muxed with your original audio. Out comes a 720×1280 vertical MP4 — beat-synced, cuts on the downbeat.
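The trim, concatenate and mux step can be sketched as a single ffmpeg invocation. The helper below only builds the argument list; the function name, file paths, and codec choices (libx264 video, AAC audio) are assumptions, and the app's real command may differ.

```typescript
interface SceneClip { path: string; durationS: number; }

// Hypothetical helper: build ffmpeg arguments that trim each clip to its scene
// length, join the clips in order, and add the original audio track.
function buildMergeArgs(clips: SceneClip[], audioPath: string, outPath: string): string[] {
  const args: string[] = [];
  const filters: string[] = [];
  let labels = "";

  for (let i = 0; i < clips.length; i++) {
    args.push("-i", clips[i].path);
    // Keep only the first durationS seconds of clip i and reset its timestamps.
    filters.push(
      "[" + i + ":v]trim=duration=" + clips[i].durationS + ",setpts=PTS-STARTPTS[v" + i + "]"
    );
    labels += "[v" + i + "]";
  }

  // The audio file is the last input, so its index equals clips.length.
  args.push("-i", audioPath);
  filters.push(labels + "concat=n=" + clips.length + ":v=1:a=0[v]");

  args.push(
    "-filter_complex", filters.join(";"),
    "-map", "[v]",
    "-map", clips.length + ":a",
    "-c:v", "libx264",
    "-c:a", "aac",
    "-shortest", // stop at the shorter of the joined video and the audio
    outPath
  );
  return args;
}
```

The resulting array can be passed to a process spawner as the arguments for `ffmpeg`. Doing the trim inside the filter graph re-encodes the video, which keeps cuts frame-accurate at the cost of encoding time.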