xAI · video

Complete Guide to Using Grok Imagine

xAI's text-to-video model with configurable modes, duration up to 10 seconds, and multi-resolution output.

Overview

Grok Imagine Text-to-Video is xAI's entry into generative video, bringing the creative flexibility of the Grok platform to motion content. It generates videos from text prompts with configurable aspect ratios, generation modes, duration, and resolution settings.

The model offers three generation modes (fun, normal, and spicy), each producing a different creative interpretation of the same prompt. This makes it easy to explore a range of visual approaches without rewriting your description. Duration options of 6 or 10 seconds cover both short social clips and longer narrative scenes.

Resolution options include 480p for rapid previews and 720p for higher fidelity output. With five aspect ratio choices spanning portrait, landscape, and square formats, Grok Imagine T2V adapts to virtually any distribution platform or content format.

Capabilities

  • Text-to-video generation with rich prompt support up to 5000 characters
  • Three creative modes (fun, normal, spicy) for varied interpretations
  • Configurable duration of 6 or 10 seconds
  • Multiple resolution options (480p, 720p)
  • Five aspect ratio presets including portrait and landscape

Use Cases

  1. Social media video content with platform-specific aspect ratios
  2. Creative exploration using different generation modes
  3. Marketing video concepts and motion storyboards
  4. Short narrative clips and scene previews
  5. Animated visual content for presentations

Input Parameters

prompt
textarea, required

The text prompt describing the desired video motion (max 5000 chars).

Describe the scene, including action, setting, and mood. Add camera movement cues such as 'slow dolly forward' or 'aerial tracking shot'.

aspect_ratio
select

Specifies the width-to-height ratio of the generated content.

Choose based on your target platform. 9:16 for mobile-first content, 16:9 for YouTube/TV, 2:3 for Pinterest-style formats, 1:1 for Instagram.

Options
2:3, 3:2, 1:1, 9:16, 16:9
Default: 2:3
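
The platform guidance above can be written down as a small lookup. This is a minimal sketch, not part of any xAI SDK: the helper name `pick_aspect_ratio` and the platform keys are hypothetical, and only the ratio values and the 2:3 default come from this guide.

```python
# Ratio values and the default are taken from this guide.
ASPECT_RATIOS = ("2:3", "3:2", "1:1", "9:16", "16:9")
DEFAULT_ASPECT_RATIO = "2:3"

# Platform keys are illustrative assumptions, not identifiers the model knows.
PLATFORM_TO_RATIO = {
    "mobile": "9:16",
    "youtube": "16:9",
    "tv": "16:9",
    "pinterest": "2:3",
    "instagram": "1:1",
}


def pick_aspect_ratio(platform: str) -> str:
    """Return the suggested ratio for a platform, or the documented default."""
    key = platform.strip().lower()
    ratio = PLATFORM_TO_RATIO.get(key, DEFAULT_ASPECT_RATIO)
    # Guard against typos in the table above.
    if ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {ratio}")
    return ratio
```

An unrecognized platform falls back to 2:3, mirroring the default listed for this parameter.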

mode
select

Generation mode.

'Normal' produces balanced, faithful interpretations. 'Fun' adds creative flair and stylization. 'Spicy' pushes the creative envelope with more experimental outputs.

Options
fun, normal, spicy
Default: normal

duration
slider

Length of the generated video in seconds (6–30).

6 seconds for short loops and social clips. 10 seconds for scenes that need more time to develop.

Min: 6, Max: 30, Default: 6

resolution
select

Resolution of the generated video.

480p for fast iteration and previews. 720p for production-quality output.

Options
480p, 720p
Default: 480p

NSFW Filter
toggle

Enable NSFW content filtering.

Default: false
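
The parameters above can be checked locally before a request is submitted. The following is a sketch under stated assumptions: `VideoParams` is a hypothetical container, the `nsfw_filter` key name is a guess (this guide only shows the label "NSFW Filter"), and the duration check uses the slider bounds of 6 to 30 listed for the parameter, even though the overview mentions 6 and 10 seconds. No network call is made.

```python
from dataclasses import asdict, dataclass

MAX_PROMPT_CHARS = 5000
ASPECT_RATIOS = {"2:3", "3:2", "1:1", "9:16", "16:9"}
MODES = {"fun", "normal", "spicy"}
RESOLUTIONS = {"480p", "720p"}
MIN_DURATION, MAX_DURATION = 6, 30  # slider bounds as listed in this guide


@dataclass(frozen=True)
class VideoParams:
    """Hypothetical parameter container; defaults mirror the guide."""

    prompt: str
    aspect_ratio: str = "2:3"
    mode: str = "normal"
    duration: int = 6
    resolution: str = "480p"
    nsfw_filter: bool = False  # assumed key name for the "NSFW Filter" toggle

    def __post_init__(self) -> None:
        if not self.prompt:
            raise ValueError("prompt is required")
        if len(self.prompt) > MAX_PROMPT_CHARS:
            raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unknown aspect_ratio: {self.aspect_ratio}")
        if self.mode not in MODES:
            raise ValueError(f"unknown mode: {self.mode}")
        if not MIN_DURATION <= self.duration <= MAX_DURATION:
            raise ValueError(f"duration must be {MIN_DURATION}-{MAX_DURATION} s")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"unknown resolution: {self.resolution}")


# Build a plain dict that could be serialized as a request body.
payload = asdict(
    VideoParams(
        prompt="Aerial tracking shot over a foggy pine forest at dawn",
        aspect_ratio="16:9",
        duration=10,
    )
)
```

Validating up front gives a clear error for a mistyped mode or an overlong prompt before any generation time is spent.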

Tips & Best Practices

Try all three modes
Front-load your prompt
Start with 480p for iteration
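
Two of these tips combine naturally: generate one low-cost preview per mode, then rerun the preferred variant at 720p. The sketch below only builds the parameter dicts; `preview_sweep` and `finalize` are hypothetical helper names, and how the dicts are submitted depends on the client you use.

```python
from typing import Dict, List

MODES = ("fun", "normal", "spicy")


def preview_sweep(prompt: str, aspect_ratio: str = "2:3") -> List[Dict[str, object]]:
    """Return one 480p, 6-second parameter set per generation mode."""
    return [
        {
            "prompt": prompt,
            "aspect_ratio": aspect_ratio,
            "mode": mode,
            "duration": 6,
            "resolution": "480p",
        }
        for mode in MODES
    ]


def finalize(params: Dict[str, object], duration: int = 6) -> Dict[str, object]:
    """Copy a preview parameter set and raise it to 720p for the final render."""
    final = dict(params)  # leave the preview dict untouched
    final["resolution"] = "720p"
    final["duration"] = duration
    return final


previews = preview_sweep("Slow dolly forward through a neon-lit night market")
chosen = finalize(previews[0], duration=10)
```

Keeping the prompt fixed across the sweep isolates the effect of the mode setting, so differences between the three previews come from the mode alone.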
