ModelScope Description
This model is based on a multi-stage text-to-video generation diffusion model. It takes a text description as input and returns a video that matches it. Only English input is supported.
The text-to-video generation diffusion model is composed of three sub-networks: text feature extraction, a diffusion model that maps text features to the video latent space, and a decoder that maps the video latent space to the video visual space. The model has approximately 1.7 billion parameters in total. It uses a Unet3D structure and generates video by iteratively denoising a video that starts as pure Gaussian noise.
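The three-stage flow described above can be illustrated with a toy sketch. This is not the ModelScope API or the actual model: every function name here (encode_text, predict_noise, denoise_latent, decode_latent, text_to_video) is hypothetical, the "noise predictor" is a trivial stand-in for the Unet3D, and the update rule is a simplified illustration of iterative denoising rather than a real diffusion sampler.

```python
import random

def encode_text(prompt, dim=8):
    """Stage 1 (toy): map a prompt to a fixed-length feature vector."""
    # Seed from the prompt so the same text always yields the same features.
    rng = random.Random(sum(ord(c) for c in prompt))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def predict_noise(latent, text_features):
    """Hypothetical stand-in for the Unet3D noise predictor.

    Assumption for illustration: the 'clean' latent of every frame is
    simply the text feature vector, so the noise is the difference.
    """
    return [[x - t for x, t in zip(frame, text_features)] for frame in latent]

def denoise_latent(text_features, num_frames=4, num_steps=20, seed=0):
    """Stage 2 (toy): start from pure Gaussian noise and denoise step by step."""
    rng = random.Random(seed)
    latent = [[rng.gauss(0.0, 1.0) for _ in text_features]
              for _ in range(num_frames)]
    for step in range(num_steps, 0, -1):
        noise = predict_noise(latent, text_features)
        # Remove a 1/step share of the predicted noise; the last step removes all of it.
        latent = [[x - n / step for x, n in zip(frame, noise_frame)]
                  for frame, noise_frame in zip(latent, noise)]
    return latent

def decode_latent(latent):
    """Stage 3 (toy): map latent values in [-1, 1] to pixel intensities in [0, 255]."""
    return [[max(0, min(255, round((x + 1.0) * 127.5))) for x in frame]
            for frame in latent]

def text_to_video(prompt, num_frames=4):
    """Chain the three stages: text features -> video latent -> video frames."""
    features = encode_text(prompt)
    latent = denoise_latent(features, num_frames=num_frames)
    return decode_latent(latent)
```

Calling text_to_video("a panda eating bamboo") returns a list of frames, each a short list of integer pixel values. In the real model each stage is a learned neural network and the frames are full images; the sketch only shows how the stages hand data to one another.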