Master every AI creation tool in your GEN arsenal. From text to talking avatars, this is your complete reference guide.
Every video starts with a creation card. These are GEN's AI-powered content generators that turn simple inputs into professional media assets. Whether you're generating images from prompts, creating talking avatar videos, or adding captions, the creation cards are designed to work together in your video creation workflow.
💡 Did you know?
GEN users average 38+ videos per week using these creation cards, replacing what used to take a full content team.
Shared Features Across All Cards
Before diving into individual cards, here are the universal features that make creation cards powerful:
Variables: Fields marked with ♾️ accept ####{{variable_name}} syntax for dynamic content
Auto-update: When this checkbox is enabled, content regenerates automatically whenever a referenced variable changes
AI Text Fields: Switch a field from Text to AI text to generate its text from a prompt
Input History: All previous settings are saved and restored when reopening cards
Output History: View all generated content versions at the bottom of your screen
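The variable and auto-update behavior described above can be pictured with a small Python sketch. It is an illustration only, not GEN's implementation: it assumes the placeholder is the double-brace {{variable_name}} part of the syntax, and the function names (render, referenced_variables, needs_regeneration) are hypothetical.

```python
import re

# Assumption: a placeholder is a name wrapped in double braces,
# e.g. {{product_name}}. This is an illustrative pattern, not GEN's parser.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def referenced_variables(template: str) -> set:
    """Return the set of variable names a field refers to."""
    return set(PLACEHOLDER.findall(template))


def render(template: str, values: dict) -> str:
    """Replace each known placeholder; leave unknown ones untouched."""
    def substitute(match):
        name = match.group(1)
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template)


def needs_regeneration(template: str, changed: set) -> bool:
    """Hypothetical auto-update check: True if a referenced variable changed."""
    return bool(referenced_variables(template) & changed)


prompt = "A studio photo of {{product_name}} on a {{background}} table"
values = {"product_name": "a ceramic mug", "background": "marble"}

print(render(prompt, values))
# A studio photo of a ceramic mug on a marble table
print(needs_regeneration(prompt, {"background"}))  # True
print(needs_regeneration(prompt, {"voice"}))       # False
```

The point of the sketch is the dependency idea: a field only needs to regenerate when one of the variables it actually references has changed.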
Content Creation Cards
1. Text
The simplest creation card: direct text input that syncs in real time with your cell. Perfect for headlines, descriptions, or any text content that doesn't need AI generation.
Key Inputs:
Text content (syncs with cell display)
What to expect: As you type in the creation card, text appears simultaneously in the cell. No AI processing, just clean, direct text input.
Quick tip: Use variables to pull in dynamic content instead of retyping it.
Quick tip: Switch to AI text to have AI generate the content from a prompt.
2. Media
Upload files or select from your assets library. Supports drag-and-drop for quick file additions and handles all media types: images, videos, audio, and documents.
Key Inputs:
File upload area (drag-and-drop supported)
Assets panel selection
What to expect: Clean file handling with immediate preview in your cell. Clicking the file area opens your assets panel for organized media selection.
Quick tip: Just drag and drop an audio or video file into the vidsheet to load it into a cell or layer.
3. Text Overlay
Add styled text overlays to your video content. Essential for calls-to-action, product names, or any text that needs to appear over your video content with custom positioning and styling.
Key Inputs:
Text content to display
Positioning settings
Styling options
What to expect: Text appears as a layer over your video with full control over placement, font, colors, and background styling.
Quick tip: Use the AI Text option to generate the title from the script or video.
AI Image Generation
4. Image from Text
Generate professional images from text descriptions using two powerful models. Nanobanana offers quick, basic generation while Midjourney provides advanced options with image references and style controls.
Available models:
Nanobanana: Fast generation with prompt input only
Midjourney: Advanced model supporting image references, style references, and omni references
Key Inputs:
Prompt text (required)
Aspect ratio configuration
What to expect: High-quality images generated in your chosen aspect ratio (default 9:16). Generation time varies by model: Nanobanana is faster, while Midjourney offers more control.
Quick tip: For product shots, use Midjourney with style references to maintain consistent visual branding across all your generated images.
5. Image from Avatar
Generate images featuring your trained person avatars. Perfect for consistent brand representation when you need the same person appearing across multiple pieces of content.
Key Inputs:
Person avatar selection (required)
Descriptive prompt ♾️
Noise scale adjustment
Skin blend opacity
Aspect ratio configuration
What to expect: Generated images that incorporate your trained avatar's features according to your prompt guidance. Consistent facial features and characteristics across all generations.
Quick tip: Train avatars of key team members or brand ambassadors for consistent human representation in your content.
AI Video Generation
6. Video from Text
Create videos entirely from text descriptions. Three powerful models offer different strengths for various video styles and complexity needs.
Available models:
Veo3: Advanced video generation with high quality output
Kling: Alternative generation style with unique motion characteristics
Wan 2.2: Optimized for specific video types and faster generation
Key Inputs:
Prompt text describing video content ♾️
Negative prompt (exclude unwanted elements) ♾️
Aspect ratio configuration
What to expect: Generated videos in your specified aspect ratio (default 9:16). Generation takes several minutes but produces professional-quality video content from just text descriptions.
Quick tip: Be specific in your prompts: describe camera movements, lighting, and actions for better results. Use negative prompts to avoid unwanted elements.
How a faceless YouTube creator uses this: One creator launched a history channel using GEN's Image from Text, voice clone, and auto-captions. They posted 45 Shorts in 30 days with zero filming and zero face on camera. Result: 28K subscribers and monetization in 47 days.
7. Video from Image
Animate still images into dynamic video content. Perfect for bringing product photos, artwork, or any static image to life with realistic motion.
Available models:
Kling: Versatile image-to-video with smooth motion
Veo3: High-quality animation with advanced motion understanding
Seedance Lite/Pro: Fast generation with different quality tiers
Sora 2: Premium image animation with exceptional quality
Key Inputs:
First frame image
Last frame image (optional)
Motion description prompt
Aspect ratio (varies by model)
What to expect: Smooth video animation based on your input images. When you provide both a first and a last frame, the AI creates realistic motion between them. With a single image, it generates natural movement starting from that frame.
Quick tip: Point the first frame to another column as a variable to work faster.
8. Video from Ingredients
Combine multiple images into a cohesive video sequence. Perfect for storytelling that requires multiple visual elements or product collections.
Available models:
Pika: Fixed 9:16 aspect ratio, optimized for mobile content
Kling: Configurable aspect ratio with smooth transitions
Veo 3.1: Advanced multi-image processing
Seedance Lite: Fast generation for quick iterations
Key Inputs:
Multiple images (required)
Descriptive prompt for combination
Aspect ratio (where available)
What to expect: A cohesive video that intelligently combines your images according to your prompt instructions. The AI determines transitions, timing, and visual flow.
Quick tip: Point the ingredients to other parts of the vidsheet as a variable to work faster!
9. Video from Talking Avatar
Create lip-synced videos using AI avatars and custom voices. Perfect for product explanations, testimonials, or any content that needs a human presenter without filming.
Options:
Create a new talking avatar on the spot or pick an existing one.
Clone a voice within the vidsheet or pick an existing voice.
Key Inputs:
Talking avatar selection (public library + your custom avatars)
Voice from voice library
Script text
"Enhance Voice" option (ElevenLabs only)
What to expect: Professional-quality videos with realistic lip-sync and natural facial expressions. Avatar thumbnails auto-play on hover for easy selection.
Quick tip: Connect your ElevenLabs API key to pull in voices from ElevenLabs.
Quick tip: Create a talking avatar with a new look every day so your characters can change outfits.
Audio Generation
10. Speech from Text
Convert text scripts to natural-sounding speech in 23 languages. Create custom voices or use the extensive voice library for consistent audio branding.
Options:
Select an existing voice
Design a voice from scratch
Clone a voice
Key Inputs:
Voice selection (from library or create new)
Script text
Language selection (defaults to English)
"Enhance Voice" for ElevenLabs (adds audio tags like [laughing], [sighs])
What to expect: High-quality MP3 audio files with natural speech patterns. New voices automatically save to your library for future use.
Quick tip: For brand consistency, create a custom voice that matches your brand's personality and use it across all content.
💡 Did you know? GEN supports 23 languages: Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, and Turkish.
11. Captions
Auto-generate synchronized text captions from any audio or video content. Essential for accessibility and social media engagement where most videos are watched without sound.
Key Inputs:
Audio file, video file, or video layer reference
Font and background styling options
Positioning and padding settings
What to expect: Perfectly synchronized text overlays with professional "liquid" background styling (5px spread, 10px roundness). Captions appear exactly when words are spoken.
Quick tip: Reference video layers by name to create dynamic captioning that updates automatically when you change the source content.
12. Lipsync
Synchronize lip movements in existing video with any audio track. Perfect for dubbing content, changing voice-overs, or matching lip movements to new speech.
Key Inputs:
Source video file
Audio file to sync with
What to expect: Your original video with lip movements precisely matched to the new audio track. The AI analyzes speech patterns and adjusts mouth movements naturally.
Quick tip: Use clear face shots in good lighting for optimal results. Higher quality audio leads to better synchronization accuracy.
Ready to see these creation cards in action? Start with our step-by-step video creation guide to build your first piece of content in minutes.
⚡ Pro Move: Bookmark this page. You'll reference specific creation cards as you build more complex video workflows. Each card links directly to its section for quick access.
Ready to start creating?
Browse our library of ready-made templates and launch your first video in minutes.