Text-to-image generation is the algorithmic process of the moment, with OpenAI’s Craiyon (formerly DALL-E mini) and Google’s Imagen unleashing tidal waves of wonderfully weird, procedurally generated art synthesized from the imaginations of humans and computers. On Tuesday, Meta announced that it too has developed an AI image-generation engine, one it hopes will help build immersive worlds in the metaverse and create high-quality digital art.
Creating an image from nothing more than the phrase “There’s a horse in the hospital” takes a lot of work for an AI generator. First, the phrase is fed through a transformer model, a neural network that parses the words of the sentence and develops a contextual understanding of their relationships to one another. Once it has the gist of what the user is describing, the AI synthesizes a new image using a series of GANs (generative adversarial networks).
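That two-stage pipeline — encode the text into an embedding, then expand the embedding into pixels — can be illustrated with a heavily simplified, stdlib-only sketch. Both functions here are hypothetical stand-ins: a real transformer encoder and a real generator network are learned from data, not hashed or tiled like this.

```python
import hashlib

def encode_text(prompt: str) -> list[float]:
    """Toy stand-in for a transformer text encoder: deterministically maps
    a prompt to a fixed-length embedding (real models learn this mapping)."""
    digest = hashlib.sha256(prompt.encode()).digest()
    # Normalize the first 8 bytes to floats in [0, 1) as a stand-in embedding.
    return [b / 256 for b in digest[:8]]

def generate_image(embedding: list[float], size: int = 4) -> list[list[float]]:
    """Toy stand-in for a generator network: expands the embedding into a
    size x size grid of pixel intensities by tiling its values."""
    return [[embedding[(r * size + c) % len(embedding)] for c in range(size)]
            for r in range(size)]

embedding = encode_text("There's a horse in the hospital")
image = generate_image(embedding)  # a 4x4 grid of values in [0, 1)
```

The key structural point the sketch preserves is the separation of concerns: the text encoder only produces a vector, and the image generator only consumes one.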
Thanks to efforts over the last few years to train ML models on increasingly rich, high-resolution image sets with well-curated text descriptions, today’s cutting-edge AIs can create photorealistic images of almost any nonsense you feed them. The specific creation process differs between AIs.
For example, Google’s Imagen uses a diffusion model “that learns to turn a pattern of random dots into images,” according to a June blog post. “These images start out at low resolution and then gradually increase in resolution.” Google’s Parti AI, on the other hand, “first converts a collection of images into a sequence of code entries, much like puzzle pieces. A given text prompt is then translated into these code entries and a new image is created.”
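The diffusion idea — start from pure noise and repeatedly nudge it toward a coherent image — can be sketched in a few lines. This is a toy illustration only: in a real diffusion model the per-step “prediction” comes from a trained neural network, not from a known target as assumed here.

```python
import random

def denoise_step(pixels, target, strength=0.2):
    """One reverse-diffusion step (toy): move each noisy pixel a fraction of
    the way toward the value the model predicts; here the 'prediction' is
    simply the known target image."""
    return [p + strength * (t - p) for p, t in zip(pixels, target)]

random.seed(0)
target = [0.0, 0.25, 0.5, 0.75]             # stand-in for the learned image
pixels = [random.random() for _ in target]  # start from pure random noise
for _ in range(50):
    pixels = denoise_step(pixels, target)
# After many small steps, the noise has converged toward the image.
```

Each step removes only a little noise, which is why real diffusion samplers run dozens to hundreds of iterations before the low-resolution result is upscaled.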
While these systems can create almost anything described to them, the user has no control over the specific aspects of the output image. “To realize AI’s potential to push creative expression forward,” Meta CEO Mark Zuckerberg explained in Tuesday’s blog post, “people should be able to shape and control the content a system generates.”
The company’s “exploratory AI research concept” called Make-A-Scene does just that, incorporating user-created sketches into its text-based image generation and outputting a 2,048 x 2,048 pixel image. This combination allows the user not only to describe what they want in the image, but also to dictate the overall composition of the image. “It shows how people can use both text and simple drawings to convey their vision with greater specificity, using a variety of elements, shapes, arrangements, depth, compositions, and textures,” Zuckerberg said.
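The division of labor Make-A-Scene describes — the sketch fixes *where* things go, the text decides *what* fills those regions — can be shown with a minimal toy. The function name and scalar “fill value” are hypothetical simplifications; the real system conditions a full generative model on both inputs.

```python
def sketch_conditioned_fill(sketch_mask, object_value=0.9, background=0.0):
    """Toy sketch-conditioned generation: a user-drawn binary mask dictates
    the composition (which cells belong to the object), while the 'text'
    determines the content (here reduced to a single fill intensity)."""
    return [[object_value if cell else background for cell in row]
            for row in sketch_mask]

# A crude user sketch: a blob in the upper middle of the canvas.
mask = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
scene = sketch_conditioned_fill(mask, object_value=0.9)
```

However simplified, this captures why the combination is more controllable than text alone: the layout is taken directly from the user rather than sampled by the model.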
In tests, a panel of human evaluators overwhelmingly preferred the text-and-sketch images over text-only ones, judging them better aligned with the original sketch 99.54 percent of the time and better aligned with the original text description 66 percent of the time. To further develop the technology, Meta has shared its Make-A-Scene demo with prominent AI artists including Sofia Crespo, Scott Eaton, Alexander Reben and Refik Anadol, who will use the system and provide feedback. There’s no word on when the AI will be made available to the public.