Comprehensive knowledge base for Google Flow (labs.google/fx/tools/flow), Google's AI filmmaking platform powered by Veo 3.1. Use this skill whenever the user asks about Google Flow, Veo 3/3.1, AI video generation with Google tools, or needs help creating cinematic video content, building a multi-platform AI video pipeline, prompting for video generation, or integrating Flow with Gemini image gen, Midjourney, Kling AI, CapCut, Pika Labs, or Runway. Trigger for questions like "how do I use Google Flow", "Veo 3 prompts", "AI video pipeline", "Flow vs Kling", "ingredients to video", "Flow audio generation", "how to get consistent characters in Flow", "how do I use Midjourney images in Flow", or any mention of AI video creation for dance, fashion, hip-hop, or urban content. Also use when designing any professional AI content creation workflow that includes video generation tools.
From maycrest-create

npx claudepluginhub coreymaypray/sloth-skill-tree --plugin maycrest-create

This skill uses the workspace's default tool permissions.
Google Flow is Google's consolidated AI filmmaking platform at labs.google/fx/tools/flow, powered by Veo 3.1 from DeepMind. It is the first major AI video tool to generate synchronized dialogue, SFX, ambient audio, and music natively alongside video. The platform absorbed Google Whisk and ImageFX in a February 2026 redesign, making it a unified image → video → scene workspace.
Access: Public (149+ countries), web-only (Chrome recommended), 18+, no mobile app.
Status as of March 2026: Generally available, actively being updated. Scene Builder and some audio features still show instability.
Three DeepMind models power Flow in one interface:
Projects contain: asset library, generation interface with multiple modes, and Scene Builder timeline.
| Tool | Function |
|---|---|
| Extend | Continue a clip from its last frame; up to ~60s total (coherence degrades after 16–24s) |
| Insert Object | Add new elements to existing scenes via text prompt + drag-box placement |
| Remove Object | AI video inpainting — remove unwanted elements |
| Camera Controls | Post-generation camera redirection: pan, tilt, zoom, dolly, tracking, crane |
| Drawing Instructions | Draw directly on reference images to indicate desired movement direction |
Drag-and-drop timeline for multi-clip sequencing. Key features:
Known limitations (as of March 2026):
Veo 3/3.1 natively generates synchronized audio — this is Flow's primary competitive differentiator vs. all other AI video tools.
Audio types generated:
Prompting audio effectively:
- She says, "We need to leave now."
- SFX: sneakers squeaking on hardwood floor
- Upbeat Chicago footwork beat, 808 bass, hi-hat patterns

Nano Banana Pro is built directly into Flow, which makes it the most seamless image-to-video pipeline available.
Recommended two-stage workflow:
Ingredients workflow:
External images: Flow accepts uploads from any source — Midjourney, Photoshop, photography. Drag into workspace or use Add button.
1. Gemini → Script, prompt brainstorming, storyboard outlines
2. Midjourney → Hero character images, distinctive aesthetic frames
   Nano Banana Pro → Supporting references, ecosystem-consistent images
3. Flow/Veo 3.1 → Dialogue-heavy scenes, audio-native narrative content
4. Kling AI → Fast B-roll, native 4K close-ups, motion-heavy dance/action
5. Pika Labs → Viral effects (explode, melt, inflate), stylized social content
6. CapCut → Final assembly, color normalization, captions, publishing
Estimated cost: $77–356/month depending on tier mix.
| | Flow/Veo 3.1 | Kling 3.0 |
|---|---|---|
| Audio | ✅ Native dialogue + SFX + music | ❌ Audio synthesis less mature |
| Resolution | Upscaled 4K (Ultra) | ✅ True native 4K |
| Motion quality | 24 FPS cinematic | ✅ 30 FPS, superior motion tracking |
| Clip length | 8s base, ~60s chained | 15s standard (5 min avatar) |
| Character consistency | Ingredients-dependent | ✅ AI Director system |
| Price (Pro) | $19.99/month | ~$37/month |
| Best for | Dialogue, audio-native content | Dance, fast B-roll, high-volume |
[Cinematography] + [Subject] + [Action] + [Context] + [Style & Ambiance]
Example:
"Medium tracking shot of a dancer in streetwear performing explosive hip-hop footwork on a gritty urban sidewalk at dusk. Camera follows low, emphasizing rapid foot movements. Neon signs reflect off wet pavement. Cinematic, shallow depth of field. Heavy bass beat and sneaker scuffs on concrete."
Movement: dolly in/out, tracking shot, crane shot, aerial view, slow pan left/right, tilt up/down, POV shot, static shot, arc shot
Composition: wide shot, close-up, extreme close-up, low angle, two-shot, over-the-shoulder, medium shot
Lens: shallow depth of field, wide-angle lens, soft focus, macro lens, rack focus, 35mm lens look
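The five-part formula above can be treated as a fill-in template. The following is a minimal sketch, assuming a hypothetical `ShotPrompt` helper (the class and field names are illustrative, not part of Flow); it only assembles text and does not call any Flow or Veo API.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """One field per slot of the five-part formula (names are illustrative)."""
    cinematography: str
    subject: str
    action: str
    context: str
    style: str

    def render(self) -> str:
        # Opening sentence: cinematography + subject + action + context.
        lead = f"{self.cinematography} of {self.subject} {self.action} {self.context}."
        # Style and ambiance, including any audio cues, close the prompt.
        return f"{lead} {self.style}"


prompt = ShotPrompt(
    cinematography="Medium tracking shot",
    subject="a dancer in streetwear",
    action="performing explosive hip-hop footwork",
    context="on a gritty urban sidewalk at dusk",
    style="Cinematic, shallow depth of field. Heavy bass beat and sneaker scuffs on concrete.",
)
print(prompt.render())
```

Keeping each slot separate makes it easy to swap a single element, such as a movement or lens term from the lists above, while leaving the rest of the prompt unchanged.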
- cinematic: #1 quality trigger (professional lighting, color grading, composition)
- film grain: analog texture
- golden hour: warm flattering light
- neon-lit: urban/cyberpunk
- warm orange-teal grade: popular cinematic color grading
- 35mm lens look: classic street photography aesthetic
- wet pavement: dramatically improves urban scene quality (reflections are a model strength)

[00:00-00:02] Medium shot from behind a young explorer approaching a glowing cave entrance...
[00:02-00:04] Reverse shot of the explorer's face, eyes widening with wonder...
[00:04-00:06] Tracking shot following the explorer into the cave...
[00:06-00:08] Wide, high-angle crane shot revealing the vast crystal cavern...
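The timestamped layout above can be generated from a plain list of shot descriptions. This is a minimal sketch, assuming equal-length segments and the 8-second base clip length; `timestamped_prompt` and its parameters are hypothetical names chosen for illustration.

```python
def timestamped_prompt(shots, seconds_per_shot=2, clip_seconds=8):
    """Render shot descriptions as [MM:SS-MM:SS] segments, one per line."""
    total = len(shots) * seconds_per_shot
    if total > clip_seconds:
        # Assumption: segments should fit inside a single base clip.
        raise ValueError(f"{total}s of shots exceeds the {clip_seconds}s clip")

    def stamp(total_seconds):
        minutes, seconds = divmod(total_seconds, 60)
        return f"{minutes:02d}:{seconds:02d}"

    lines = []
    for index, description in enumerate(shots):
        start = index * seconds_per_shot
        end = start + seconds_per_shot
        lines.append(f"[{stamp(start)}-{stamp(end)}] {description}")
    return "\n".join(lines)


shots = [
    "Medium shot from behind a young explorer approaching a glowing cave entrance...",
    "Reverse shot of the explorer's face, eyes widening with wonder...",
    "Tracking shot following the explorer into the cave...",
    "Wide, high-angle crane shot revealing the vast crystal cavern...",
]
print(timestamped_prompt(shots))
```

The length check is a design choice for this sketch: it fails early instead of emitting a prompt whose segments run past the clip.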
Add "No subtitles. No text overlays." when you don't want on-screen text.

Dance/Footwork:
Fashion Showcases:
silk swirling, linen draping, leather catching light

Urban Street Scenes:
35mm lens look + wet pavement + neon-lit = reliable urban aesthetic
SFX: traffic, distant sirens, crowd murmur

Cinematic Hip-Hop Visuals:
| Limitation | Severity | Workaround |
|---|---|---|
| Character consistency | High | Ingredients to Video + verbatim character descriptions in every prompt |
| Human emotions (robotic expressions) | Medium | Lean into style over realism; use close-ups sparingly |
| 8-second clip ceiling | High | Chain + Extend; budget coherence for ~3–4 extensions max |
| Scene Builder resets on exit | High | Download clips immediately; reassemble in CapCut |
| Audio strips on Scene Builder export | High | Download individual clips; stitch externally |
| Unexpected cuts mid-generation | Medium | Regenerate; adjust pacing in prompt |
| Unwanted text/captions in output | Medium | Add "No subtitles. No text overlays." to every prompt |
| Jump To / Extend only on Veo 2 (no audio) | Medium | Accept silent extended sequences; add audio in CapCut |
| Interface bugs (stuck at 99%, slow load) | Medium | Refresh; retry; avoid peak hours |
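Two of the workarounds in the table are pure prompt hygiene: repeating the character description verbatim and appending the no-text instruction. A minimal sketch of applying both automatically follows; `harden_prompt` is a hypothetical helper and the character description is invented for illustration.

```python
NO_TEXT_SUFFIX = "No subtitles. No text overlays."


def harden_prompt(prompt, character_description=None):
    """Prefix a verbatim character description and append the no-text suffix."""
    parts = []
    if character_description:
        # Reuse the exact same wording in every prompt for consistency.
        parts.append(character_description.strip())
    parts.append(prompt.strip())
    if NO_TEXT_SUFFIX not in prompt:
        # Avoid duplicating the suffix if the prompt already carries it.
        parts.append(NO_TEXT_SUFFIX)
    return " ".join(parts)


# Hypothetical character description, reused unchanged across scenes.
HERO = "A dancer in a red bomber jacket and white sneakers."

print(harden_prompt("Wide shot of the dancer crossing a rain-slick street.", HERO))
```

Storing the character description in one constant means it cannot drift between scenes.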
| Tier | Price | Credits | Resolution | Watermark |
|---|---|---|---|---|
| Free | $0 | 100 + 50 daily | 720p | Visible + SynthID |
| Google AI Pro | $19.99/month | 1,000/month | 1080p upscale | Visible + SynthID |
| Google AI Ultra | $249.99/month | 25,000/month | 4K upscale | SynthID only |
Credit costs: T2V ~10 credits, F2V ~100 credits, Extend ~10 credits
Pro breakdown: 1,000 credits ≈ 100 T2V generations/month ≈ 3/day
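The Pro breakdown above is simple division, and the same arithmetic works for other modes and tiers. This is a rough sketch using the approximate credit costs and monthly allowances quoted here; the function names and the sample plan are illustrative, and actual costs may differ.

```python
# Approximate per-generation costs and monthly allowances quoted above.
CREDIT_COST = {"T2V": 10, "F2V": 100, "Extend": 10}
MONTHLY_CREDITS = {"Pro": 1_000, "Ultra": 25_000}


def generations_per_month(tier, mode):
    """How many generations of one mode a tier's monthly credits cover."""
    return MONTHLY_CREDITS[tier] // CREDIT_COST[mode]


def credits_needed(plan):
    """Total credits for a plan given as {mode: number_of_generations}."""
    return sum(CREDIT_COST[mode] * count for mode, count in plan.items())


print(generations_per_month("Pro", "T2V"))  # 100
print(generations_per_month("Pro", "F2V"))  # 10

# Hypothetical monthly plan mixing all three modes.
plan = {"T2V": 20, "F2V": 3, "Extend": 10}
print(credits_needed(plan))  # 600
```

At F2V rates the Pro allowance covers only about 10 generations a month, so the mode mix matters more than the headline credit count.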
Commercial use: GA features allow commercial use; Veo 3 Pre-GA features are explicitly prohibited from commercial use regardless of payment tier — verify current GA/Pre-GA status directly with Google before any commercial deployment.
All outputs carry invisible SynthID watermarks regardless of tier.
| Use Case | Best Tool |
|---|---|
| Dialogue-driven narrative content | Flow |
| Audio-native cinematic scenes | Flow |
| Complex choreography / dance | Kling AI |
| High-volume social B-roll | Kling AI |
| Character-consistent multi-scene | Runway Gen-4.5 |
| Viral transformation effects | Pika Labs |
| Distinctive aesthetic hero images | Midjourney |
| Ecosystem reference image generation | Nano Banana Pro (in Flow) |
| Final assembly + publishing | CapCut |