Question 1

Which Gemini Omni model should I use?

Accepted Answer

Use Veo 3.1 for general text-to-video and image-to-video. Use Gemini Omni Flash for fast video drafts from a prompt or up to 3 image references. Use GPT Image 2 when you need still images or image edits.
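The guidance above can be written as a small lookup. This is only an illustrative sketch: the `Task` names and the `recommend_model` helper are hypothetical and are not part of any published API. Only the model names come from this page.

```python
from enum import Enum


class Task(Enum):
    # Hypothetical task labels, chosen for illustration only.
    TEXT_TO_VIDEO = "text-to-video"
    IMAGE_TO_VIDEO = "image-to-video"
    FAST_VIDEO_DRAFT = "fast-video-draft"
    STILL_IMAGE = "still-image"
    IMAGE_EDIT = "image-edit"


# Mapping taken from the accepted answer above.
MODEL_FOR_TASK = {
    Task.TEXT_TO_VIDEO: "Veo 3.1",
    Task.IMAGE_TO_VIDEO: "Veo 3.1",
    Task.FAST_VIDEO_DRAFT: "Gemini Omni Flash",
    Task.STILL_IMAGE: "GPT Image 2",
    Task.IMAGE_EDIT: "GPT Image 2",
}


def recommend_model(task: Task) -> str:
    """Return the model name this page recommends for the given task."""
    return MODEL_FOR_TASK[task]


if __name__ == "__main__":
    for task in Task:
        print(f"{task.value}: {recommend_model(task)}")
```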

Question 2

Does Gemini Omni Flash accept audio or video files as input?

Accepted Answer

No. The current workflow accepts a required text prompt and optional image URLs. Audio generation is optional output, not a direct audio-file input.
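As a rough sketch of what that input shape implies, the check below validates a request before submission. The `OmniFlashRequest` class, its field names, and the `validate_request` helper are assumptions made for illustration and do not reflect the real request format. The limit of 3 image references is taken from the comparison table on this page, and the http(s) scheme check is an extra assumption.

```python
from dataclasses import dataclass, field
from typing import List

# "Up to 3 image references" per the comparison table on this page.
MAX_IMAGE_REFERENCES = 3


@dataclass
class OmniFlashRequest:
    # Hypothetical field names; the real request format may differ.
    prompt: str
    image_urls: List[str] = field(default_factory=list)
    generate_audio: bool = False  # audio is an optional output, not an input


def validate_request(req: OmniFlashRequest) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    errors = []
    if not req.prompt.strip():
        errors.append("prompt is required")
    if len(req.image_urls) > MAX_IMAGE_REFERENCES:
        errors.append(f"at most {MAX_IMAGE_REFERENCES} image URLs are allowed")
    for url in req.image_urls:
        # Assumption: image references are plain http(s) URLs.
        if not url.lower().startswith(("http://", "https://")):
            errors.append(f"not an http(s) URL: {url}")
    return errors
```

A request with only a prompt passes, while an empty prompt or a fourth image URL is reported as an error.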

Question 3

Which models support audio generation?

Accepted Answer

Both video models, Veo 3.1 and Gemini Omni Flash, offer optional audio generation. The current video workflows expose it through the official Veo 3.1 integration. GPT Image 2 does not generate audio.

Question 4

Which model costs the fewest credits?

Accepted Answer

GPT Image 2 costs the fewest credits, starting at 3 for still images. Video starts at 60 credits, and the final cost depends on duration, resolution, and audio. Preview the exact credit cost in the generator before submitting.

Model	Input	Output	Max Resolution	Duration	Audio	Credits	Best For
Veo 3.1	Text / Image	Video	4K where supported	4 / 6 / 8s	Yes	60+	Prompt-led video, image-to-video, optional audio, and higher-resolution final clips
Gemini Omni Flash	Text / Image	Video	4K where supported	4 / 6 / 8s	Yes	60+	Fast Gemini Omni video drafts from prompts or up to 3 image references
GPT Image 2	Text / Image	Image	Up to 4K	—	—	3+	AI images, reference frames, product visuals, and prompt-guided image edits
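For readers who want to query the table programmatically, the rows can be held as simple records. This is a sketch only: the `ModelSpec` type and the helper names are hypothetical, and `min_credits` is the starting value shown in the table ("60+", "3+"), not an exact price, which still has to be previewed in the generator.

```python
from typing import List, NamedTuple, Tuple


class ModelSpec(NamedTuple):
    name: str
    output: str                   # "Video" or "Image"
    durations_s: Tuple[int, ...]  # allowed clip lengths; empty for images
    audio: bool                   # optional audio output available
    min_credits: int              # starting credits from the table


# Values copied from the comparison table above.
MODELS = [
    ModelSpec("Veo 3.1", "Video", (4, 6, 8), True, 60),
    ModelSpec("Gemini Omni Flash", "Video", (4, 6, 8), True, 60),
    ModelSpec("GPT Image 2", "Image", (), False, 3),
]


def models_with_audio() -> List[str]:
    """Names of models whose table row lists audio output."""
    return [m.name for m in MODELS if m.audio]


def lowest_starting_cost(output: str) -> ModelSpec:
    """Model with the lowest starting credits for an output type.

    Ties go to the first matching row in table order.
    """
    candidates = [m for m in MODELS if m.output == output]
    return min(candidates, key=lambda m: m.min_credits)
```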

Which Gemini Omni Model Should You Use?

Browse Gemini Omni Models

Gemini Omni Video Models

Veo 3.1

Gemini Omni Flash

Gemini Omni Image Models

GPT Image 2

Model Comparison Table

Choose a Model by Task

Veo 3.1

Gemini Omni Flash

GPT Image 2

Frequently Asked Questions