
Creation Cards in Vidsheets

Every creation card in GEN explained: image, video, voice, avatars, captions, and more.

Updated over a week ago

Master every AI creation tool in your GEN arsenal. From text to talking avatars, this is your complete reference guide.

Every video starts with a creation card. These are GEN's AI-powered content generators that turn simple inputs into professional media assets. Whether you're generating images from prompts, creating talking avatar videos, or adding captions, each creation card is designed to work seamlessly together in your video creation workflow.

💡 Did you know?
GEN users average 38+ videos per week using these creation cards, replacing what used to take a full content team.

Shared Features Across All Cards

Before diving into individual cards, here are the universal features that make creation cards powerful:

  • Variables: Fields marked with ♾️ accept {{variable_name}} syntax for dynamic content

  • Auto-update: Checkbox automatically regenerates content when referenced variables change

  • AI Text Fields: Switch a field from Text to AI text to generate its text from a prompt

  • Input History: All previous settings are saved and restored when reopening cards

  • Output History: View all generated content versions at the bottom of your screen
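
To make the Variables and Auto-update features above more concrete, here is a minimal Python sketch of how placeholder substitution and change detection could work. It is an illustration only, not GEN's actual implementation: the function names, the sample prompt, and the assumption that a placeholder is exactly the double-brace form {{variable_name}} are all hypothetical.

```python
import re

# Assumed placeholder form: {{variable_name}} (hypothetical, for illustration).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def referenced_variables(template: str) -> set[str]:
    """Return the names of all variables a field refers to."""
    return set(PLACEHOLDER.findall(template))


def render(template: str, values: dict[str, str]) -> str:
    """Replace each placeholder with its value; unknown names are left as-is."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template)


def needs_regeneration(template: str, changed: set[str]) -> bool:
    """Mimic Auto-update: regenerate only if a referenced variable changed."""
    return bool(referenced_variables(template) & changed)


prompt = "Studio photo of {{product}} on a {{color}} background"
print(render(prompt, {"product": "running shoes", "color": "teal"}))
print(needs_regeneration(prompt, {"color"}))     # a referenced variable changed
print(needs_regeneration(prompt, {"headline"}))  # an unrelated variable changed
```

In this sketch, a field is regenerated only when one of the variables it references changes, which mirrors the behavior the Auto-update checkbox describes.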

Content Creation Cards

1. Text

The simplest creation card: direct text input that syncs in real time with your cell. Perfect for headlines, descriptions, or any text content that doesn't need AI generation.

Key Inputs:

  • Text content (syncs with cell display)

What to expect: As you type in the creation card, text appears simultaneously in the cell. No AI processing, just clean, direct text input.

Quick tip: Include variables to go faster.

Quick tip: Use AI text to have AI create content from a prompt.

2. Media

Upload files or select from your assets library. Supports drag-and-drop for quick file additions and handles all media types: images, videos, audio, and documents.

Key Inputs:

  • File upload area (drag-and-drop supported)

  • Assets panel selection

What to expect: Clean file handling with immediate preview in your cell. Clicking the file area opens your assets panel for organized media selection.

Quick tip: Just drag and drop an audio or video file into the vidsheet to load it into a cell or layer.

3. Text Overlay

Add styled text overlays to your video content. Essential for calls-to-action, product names, or any text that needs to appear over your video content with custom positioning and styling.

Key Inputs:

  • Text content to display

  • Positioning settings

  • Styling options

What to expect: Text appears as a layer over your video with full control over placement, font, colors, and background styling.

Quick tip: Use the AI Text option to generate the title from the script or video.

AI Image Generation

4. Image from Text

Generate professional images from text descriptions using two powerful models. Nanobanana offers quick, basic generation while Midjourney provides advanced options with image references and style controls.

Available models:

  • Nanobanana: Fast generation with prompt input only

  • Midjourney: Advanced model supporting image references, style references, and omni references

Key Inputs:

  • Prompt text (required)

  • Aspect ratio configuration

What to expect: High-quality images generated in your chosen aspect ratio (default 9:16). Generation time varies by model: Nanobanana is faster, Midjourney offers more control.

Quick tip: For product shots, use Midjourney with style references to maintain consistent visual branding across all your generated images.

5. Image from Avatar

Generate images featuring your trained person avatars. Perfect for consistent brand representation when you need the same person appearing across multiple pieces of content.

Key Inputs:

  • Person avatar selection (required)

  • Descriptive prompt ♾️

  • Noise scale adjustment

  • Skin blend opacity

  • Aspect ratio configuration

What to expect: Generated images that incorporate your trained avatar's features according to your prompt guidance. Consistent facial features and characteristics across all generations.

Quick tip: Train avatars of key team members or brand ambassadors for consistent human representation in your content.

AI Video Generation

6. Video from Text

Create videos entirely from text descriptions. Three powerful models offer different strengths for various video styles and complexity needs.

Available models:

  • Veo3: Advanced video generation with high quality output

  • Kling: Alternative generation style with unique motion characteristics

  • Wan 2.2: Optimized for specific video types and faster generation

Key Inputs:

  • Prompt text describing video content ♾️

  • Negative prompt (exclude unwanted elements) ♾️

  • Aspect ratio configuration

What to expect: Generated videos in your specified aspect ratio (default 9:16). Generation takes several minutes but produces professional-quality video content from just text descriptions.

Quick tip: Be specific in your prompts. Describe camera movements, lighting, and actions for better results. Use negative prompts to avoid unwanted elements.

📈 How a faceless YouTube creator uses this: One creator launched a history channel using GEN's Image from Text, voice clone, and auto-captions. They posted 45 Shorts in 30 days with zero filming and zero face on camera. Result: 28K subscribers and monetized in 47 days.

7. Video from Image

Animate still images into dynamic video content. Perfect for bringing product photos, artwork, or any static image to life with realistic motion.

Available models:

  • Kling: Versatile image-to-video with smooth motion

  • Veo3: High-quality animation with advanced motion understanding

  • Seedance Lite/Pro: Fast generation with different quality tiers

  • Sora 2: Premium image animation with exceptional quality

Key Inputs:

  • First frame image

  • Last frame image

  • Motion description prompt

  • Aspect ratio (varies by model)

What to expect: Smooth video animation based on your input images. When you provide both a first and a last frame, the AI creates realistic motion between them. When you provide a single image, the AI animates it with natural movement.

Quick tip: Point the first frame to another column as a variable to work faster.

8. Video from Ingredients

Combine multiple images into a cohesive video sequence. Perfect for storytelling that requires multiple visual elements or product collections.

Available models:

  • Pika: Fixed 9:16 aspect ratio, optimized for mobile content

  • Kling: Configurable aspect ratio with smooth transitions

  • Veo 3.1: Advanced multi-image processing

  • Seedance Lite: Fast generation for quick iterations

Key Inputs:

  • Multiple images (required)

  • Descriptive prompt for combination

  • Aspect ratio (where available)

What to expect: A cohesive video that intelligently combines your images according to your prompt instructions. The AI determines transitions, timing, and visual flow.

Quick tip: Point the ingredients to other parts of the vidsheet as a variable to work faster!

9. Video from Talking Avatar

Create lip-synced videos using AI avatars and custom voices. Perfect for product explanations, testimonials, or any content that needs a human presenter without filming.

Options:

  • Create a new talking avatar on the spot or pick an existing one.

  • Clone a voice within the vidsheet or pick an existing voice.

Key Inputs:

  • Talking avatar selection (public library + your custom avatars)

  • Voice from voice library

  • Script text

  • "Enhance Voice" option (ElevenLabs only)

What to expect: Professional-quality videos with realistic lip-sync and natural facial expressions. Avatar thumbnails auto-play on hover for easy selection.

Quick tip: Connect your ElevenLabs API key to pull in voices from ElevenLabs.

Quick tip: Create a talking avatar with a new look every day so your characters can change outfits.

Audio Generation

10. Speech from Text

Convert text scripts to natural-sounding speech in 23 languages. Create custom voices or use the extensive voice library for consistent audio branding.

Options:

  • Select an existing voice

  • Design a voice from scratch

  • Clone a voice

Key Inputs:

  • Voice selection (from library or create new)

  • Script text

  • Language selection (defaults to English)

  • "Enhance Voice" for ElevenLabs (adds audio tags like [laughing], [sighs])

What to expect: High-quality MP3 audio files with natural speech patterns. New voices automatically save to your library for future use.

Quick tip: For brand consistency, create a custom voice that matches your brand's personality and use it across all content.

πŸ’‘ Did you know? GEN supports 23 languages including Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, and Turkish.

11. Captions

Auto-generate synchronized text captions from any audio or video content. Essential for accessibility and social media engagement where most videos are watched without sound.

Key Inputs:

  • Audio file, video file, or video layer reference

  • Font and background styling options

  • Positioning and padding settings

What to expect: Perfectly synchronized text overlays with professional "liquid" background styling (5px spread, 10px roundness). Captions appear exactly when words are spoken.

Quick tip: Reference video layers by name to create dynamic captioning that updates automatically when you change the source content.

12. Lipsync

Synchronize lip movements in existing video with any audio track. Perfect for dubbing content, changing voice-overs, or matching lip movements to new speech.

Key Inputs:

  • Source video file

  • Audio file to sync with

What to expect: Your original video with lip movements precisely matched to the new audio track. The AI analyzes speech patterns and adjusts mouth movements naturally.

Quick tip: Use clear face shots in good lighting for optimal results. Higher quality audio leads to better synchronization accuracy.

Ready to see these creation cards in action? Start with our step-by-step video creation guide to build your first piece of content in minutes.

⚑ Pro Move: Bookmark this page β€” you'll reference specific creation cards as you build more complex video workflows. Each card links directly to its section for quick access.


Ready to start creating?

Browse our library of ready-made templates and launch your first video in minutes.
