Wan 2.6 transforms text and images into videos with lip-sync, multi-character dialogue, and custom personas.
Wan is an open-source AI video generation model series developed by Alibaba Group's Tongyi Lab. The Wan family represents Alibaba's flagship effort in multimodal AI, designed to transform text prompts, images, and reference videos into high-quality video content with realistic motion and visual consistency.
Current Version: Wan 2.6 (December 2025)
Last updated: December 2025
Wan 2.6 launched shortly after version 2.5, focusing on tighter multimodal integration and expanded creative controls. This release addresses key limitations in earlier versions while introducing features designed for more complex content creation workflows.
Native audio generation upgraded: Audio quality has improved substantially compared to Wan 2.5, with more natural-sounding output, though it still trails behind premium competitors like Veo 3 and Sora 2 in voice realism
Extended duration: Support for up to 15-second clips at 1080P, with the ability to combine multiple clips for longer sequences (see the clip-joining sketch after this list)
Character reference system: Upload up to three character references from video to maintain consistency across generations (Note: This feature is not yet available on Somake)
Personal avatar creation: Record your own face from multiple angles and voice samples to create a consistent AI persona (Note: This feature is not yet available on Somake)
Multi-character dialogue: Clean handling of conversations between multiple characters without speech overlap
Environment and wardrobe control: Change character clothing and scene environments through prompts
Fluid motion quality: Smooth, natural movement with convincing camera effects such as zooms and motion blur
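Because single clips top out at 15 seconds, longer sequences are built by generating clips separately and joining them. Below is a minimal sketch of lossless joining with ffmpeg's concat demuxer, driven from Python; the file names are placeholders, ffmpeg must be installed separately, and all clips must share the same codec, resolution, and frame rate.

```python
import subprocess

def join_clips(clip_paths, output_path, list_file="clips.txt"):
    """Losslessly concatenate same-format MP4 clips with ffmpeg's concat demuxer."""
    # The concat demuxer reads a text file with one "file '<path>'" line per clip.
    with open(list_file, "w") as f:
        f.writelines(f"file '{p}'\n" for p in clip_paths)
    # -c copy skips re-encoding; this only works when all clips share
    # codec, resolution, and frame rate (true for clips from the same model run).
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_file,
         "-c", "copy", output_path],
        check=True,
    )

join_clips(["shot1.mp4", "shot2.mp4", "shot3.mp4"], "sequence.mp4")
```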
Character resemblance and voice matching can be inconsistent—faces and voices sometimes differ from reference material
Complex action sequences with multiple characters (such as fight scenes) may produce visual artifacts and distortions
Anime-style video generation produces weaker visual quality compared to realistic styles
Some feature inconsistencies may occur, including occasional language mismatches in output
Unexpected elements or surreal outputs can appear, a common challenge in current text-to-video AI
| Version | Key Capabilities | Max Duration | Max Resolution | Audio Support |
|---|---|---|---|---|
| Wan 2.1 | Text-to-video, Image-to-video, Visual text generation | 5 seconds | 720P | No |
| Wan 2.2 | Improved efficiency, VACE integration, Open-source | 5 seconds | 720P | No |
| Wan 2.5 | Audio-visual sync introduced, Enhanced motion | 10 seconds | 1080P | Basic |
| Wan 2.6 | Multi-shot narratives, Character references, Custom personas | 15 seconds | 1080P | Improved native A/V |
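Because of the family's open-source lineage, the weights for earlier releases such as Wan 2.1 are published on Hugging Face and can be run locally through the diffusers library. A minimal sketch, assuming the Wan 2.1 text-to-video Diffusers checkpoint and a CUDA GPU (the model ID and parameters come from that release, not from Wan 2.6):

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
# The Wan VAE is kept in float32 for numerical stability; the rest runs in bfloat16.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="A dynamic shot of a sneaker splashing through a puddle, cinematic, high-energy",
    height=480,
    width=832,
    num_frames=81,      # roughly 5 seconds, matching Wan 2.1's duration limit
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "output.mp4", fps=16)
```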
Quick Social Media Ads: Need a catchy 10-second video for Instagram? Just type, "A dynamic shot of our new sneaker splashing through a puddle, cinematic, high-energy," and get a professional-looking ad in minutes.
Product Visualizations: Create videos showing your product in any setting imaginable. "Our new coffee mug on a desk in a cozy, rain-swept Parisian cafe, steam rising."
Visualizing History: A teacher could generate a clip of "Roman soldiers marching through a forest, seen from a low angle" to make lessons more engaging.
Explaining Science: A student could create a video to explain a complex topic, like "An animated journey through a plant cell, showing the mitochondria at work."
Rapid Prototyping: Quickly visualize a scene from your script to test if the mood and composition work, saving valuable time and resources.
Unique Visual Effects (VFX): Generate surreal, dream-like sequences or abstract background visuals that would be difficult or impossible to film in real life.
Multi-Shot Storytelling Prompt Template
A cinematic [genre] scene.
Shot 1: [Wide/Medium/Close-up] shot, [describe scene, character, and action].
Shot 2: [Camera angle], [describe transition and new focus].
Shot 3: [Camera angle], [describe resolution or final moment].
Style: [realistic/cinematic/stylized]. Lighting: [natural/dramatic/soft].
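For example, the template filled in for a short thriller beat:

A cinematic thriller scene.
Shot 1: Wide shot, a courier weaves a bicycle through rain-slicked city traffic at dusk.
Shot 2: Low-angle close-up, cut to the courier's face as headlights flare behind her.
Shot 3: Medium shot, she skids to a stop beneath a flickering neon sign and glances back.
Style: cinematic. Lighting: dramatic.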
Character Reference Best Practices
Use front-facing footage with clear lighting for character references
Record reference videos showing multiple angles when creating personal avatars
Limit to 3 character references maximum for best consistency
For voice matching, provide clear audio samples without background noise
Expect some variation in face and voice reproduction—plan for multiple generations
Works well: Dialogue scenes, talking heads, single-character focus, simple interactions, conversational multi-character scenes
Use caution: Action sequences with multiple characters, fight choreography, rapid movement
Avoid or expect artifacts: Complex anime styles, highly dynamic group scenes
Enable prompt expansion when your input is simple or you want richer visual detail. The system adds descriptive elements to improve composition, style consistency, and visual coherence in the output.
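As an illustration (the exact rewrite varies between generations), a simple input such as "a cat on a windowsill" might be expanded to something like "A fluffy orange tabby cat dozing on a sunlit wooden windowsill, soft morning light through sheer curtains, shallow depth of field, warm cinematic color grading."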
Problem: Voice sounds robotic or unnatural → Solution: This is a current limitation of Wan 2.6. For projects requiring highly realistic voices, consider using the video output with separately generated or recorded audio (see the audio-replacement sketch after this list).
Problem: Unexpected characters or surreal elements appear → Solution: AI artifacts are common in text-to-video generation. Simplify your prompt, reduce the number of characters or elements, and regenerate. Review outputs carefully before use.
Problem: Action scenes have visual distortions → Solution: Complex action sequences with multiple characters are a known weakness. Break dynamic scenes into simpler shots, focus on one or two characters per clip, and avoid choreographed fight sequences.
Problem: Anime-style output looks poor → Solution: Wan 2.6's anime generation is notably weak. For anime content, consider alternative models or use realistic style prompts instead.
Problem: Language mismatch in generated content → Solution: Some language inconsistencies may occur. Specify your target language clearly in the prompt and regenerate if output doesn't match expectations.
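For the voice workaround in the first item above, one practical option is to mux a separately generated or recorded voice track over the video output. A minimal sketch using ffmpeg from Python (the paths are placeholders; ffmpeg must be installed separately):

```python
import subprocess

def replace_audio(video_path, audio_path, output_path):
    """Mux a separately recorded voice track over a generated clip."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-i", audio_path,
         "-map", "0:v:0",   # keep the video stream from the generated clip
         "-map", "1:a:0",   # take audio from the replacement file instead
         "-c:v", "copy",    # no video re-encode
         "-shortest",       # stop at the shorter of the two inputs
         output_path],
        check=True,
    )

replace_audio("generated.mp4", "voiceover.wav", "final.mp4")
```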
The intuitive interface lets anyone create professional visuals—just describe what you want and generate in seconds.
Handle both image and video generation on a single platform, streamlining your workflow from concept to final output.
Paid subscribers get full commercial rights to their creations, making it easy to use outputs in ads, campaigns, and client projects.
Not at all! That's the main benefit of our platform. We manage all the complex processing on our servers. All you need is a device with a web browser.
Yes! Any video you generate on our platform is yours to use, whether in marketing campaigns, as content on your monetized YouTube channel, or for any other business purpose.
Wan 2.6 is an open-source AI video generation model developed by Alibaba that creates videos from text, images, or reference videos. It features multi-shot storytelling, native audio synchronization, and character consistency tools, with output up to 15 seconds at 1080P resolution.
Audio quality has improved significantly from Wan 2.5, though voices can still sound noticeably robotic compared to premium models like Veo 3 and Sora 2.