Average Ratings 0 Ratings
Average Ratings 0 Ratings
Description
GLM-4.5V is an evolution of the GLM-4.5-Air model, incorporating a Mixture-of-Experts (MoE) framework that boasts a remarkable total of 106 billion parameters, with 12 billion specifically dedicated to activation. This model stands out by delivering top-tier performance among open-source vision-language models (VLMs) of comparable scale, demonstrating exceptional capabilities across 42 public benchmarks in diverse contexts such as images, videos, documents, and GUI interactions. It offers an extensive array of multimodal functionalities, encompassing image reasoning tasks like scene understanding, spatial recognition, and multi-image analysis, alongside video comprehension tasks that include segmentation and event recognition. Furthermore, it excels in parsing complex charts and lengthy documents, facilitating GUI-agent workflows through tasks like screen reading and desktop automation, while also providing accurate visual grounding by locating objects and generating bounding boxes. Additionally, the introduction of a "Thinking Mode" switch enhances user experience by allowing the selection of either rapid responses or more thoughtful reasoning based on the situation at hand. This innovative feature makes GLM-4.5V not only versatile but also adaptable to various user needs.
Description
Wan2.2 marks a significant enhancement to the Wan suite of open video foundation models by incorporating a Mixture-of-Experts (MoE) architecture that separates the diffusion denoising process into high-noise and low-noise pathways, allowing for a substantial increase in model capacity while maintaining low inference costs. This upgrade leverages carefully labeled aesthetic data that encompasses various elements such as lighting, composition, contrast, and color tone, facilitating highly precise and controllable cinematic-style video production. With training on over 65% more images and 83% more videos compared to its predecessor, Wan2.2 achieves exceptional performance in the realms of motion, semantic understanding, and aesthetic generalization. Furthermore, the release features a compact TI2V-5B model that employs a sophisticated VAE and boasts a remarkable 16×16×4 compression ratio, enabling both text-to-video and image-to-video synthesis at 720p/24 fps on consumer-grade GPUs like the RTX 4090. Additionally, prebuilt checkpoints for T2V-A14B, I2V-A14B, and TI2V-5B models are available, ensuring effortless integration into various projects and workflows. This advancement not only enhances the capabilities of video generation but also sets a new benchmark for the efficiency and quality of open video models in the industry.
API Access
Has API
API Access
Has API
Integrations
AIReel
Claude Code
Cline
ComfyUI
Fuser
Kilo Code
Lucy Edit AI
OpenRouter
Roo Code
SiliconFlow
Integrations
AIReel
Claude Code
Cline
ComfyUI
Fuser
Kilo Code
Lucy Edit AI
OpenRouter
Roo Code
SiliconFlow
Pricing Details
Free
Free Trial
Free Version
Pricing Details
Free
Free Trial
Free Version
Deployment
Web-Based
On-Premises
iPhone App
iPad App
Android App
Windows
Mac
Linux
Chromebook
Deployment
Web-Based
On-Premises
iPhone App
iPad App
Android App
Windows
Mac
Linux
Chromebook
Customer Support
Business Hours
Live Rep (24/7)
Online Support
Customer Support
Business Hours
Live Rep (24/7)
Online Support
Types of Training
Training Docs
Webinars
Live Training (Online)
In Person
Types of Training
Training Docs
Webinars
Live Training (Online)
In Person
Vendor Details
Company Name
Zhipu AI
Founded
2023
Country
China
Website
chat.z.ai/
Vendor Details
Company Name
Alibaba
Founded
1999
Country
China
Website
wan.video