Imagen is a text-to-image model from Google Research. It uses deep learning to generate photorealistic images from natural-language descriptions. Its core innovation is combining a large Transformer-based language model, of the kind used in Google's NLP research, with diffusion models, a class of generative models that create detailed images by starting from random noise and gradually refining it.
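To make "gradually refining noise" concrete, here is a toy sketch of a DDPM-style reverse (denoising) loop. It is not Imagen's implementation. The function names are hypothetical, the linear noise schedule is an assumption, and an oracle that already knows the target stands in for the trained, text-conditioned noise-prediction network.

```python
import math
import random


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule: per-step betas and cumulative products of (1 - beta)."""
    betas = [
        beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        for i in range(num_steps)
    ]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def oracle_noise_predictor(x_t, t, target, alpha_bars):
    """Hypothetical stand-in for a trained network.

    A real model would predict the noise from the noisy image and a text
    embedding; this oracle cheats by using the known target directly.
    """
    scale = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [(x - scale * x0) / sigma for x, x0 in zip(x_t, target)]


def sample(target, num_steps=50, seed=0):
    """Start from Gaussian noise and apply the reverse update step by step."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise
    for t in range(num_steps - 1, -1, -1):
        eps_hat = oracle_noise_predictor(x, t, target, alpha_bars)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        inv_sqrt_alpha = 1.0 / math.sqrt(1.0 - betas[t])
        step_sigma = math.sqrt(betas[t])
        x = [
            # Remove the predicted noise, then re-inject a little fresh
            # noise on every step except the last one.
            inv_sqrt_alpha * (xi - coef * ei)
            + (step_sigma * rng.gauss(0.0, 1.0) if t > 0 else 0.0)
            for xi, ei in zip(x, eps_hat)
        ]
    return x


if __name__ == "__main__":
    target = [0.25, -0.5, 1.0, 0.0]  # a four-value stand-in for an "image"
    print(sample(target))
```

Because the oracle is exact, the loop ends at the target up to floating-point error. In a real text-to-image system the predictor is a learned network, so the result is only as good as its noise estimates, and the text prompt steers which image the noise is refined toward.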
Imagen produces highly detailed images and textures, often from complex text prompts. It builds on advances in image generation made by models such as DALL-E, with a particular emphasis on fine visual detail and on semantic understanding of the prompt.