What is the Function and Mechanism of Action?

When Google announced PaLM 2 and Gemini language models in mid-2023, the search giant emphasized that its AI was multimodal. This meant it could generate text, images, audio, and even video. Traditionally, language models like ChatGPT’s GPT-4 have only excelled at reproducing text. Google’s latest VideoPoet model challenges that notion, however, as it can convert text-based prompts into AI-generated videos.

With VideoPoet, Google has become the first tech giant to announce an AI capable of generating videos. And unlike prior attempts, Google says it can also generate scenes with lots of motion rather than just subtle movements. So what’s the magic behind VideoPoet and what can it do? Here’s everything you need to know.

What is Google VideoPoet?

Google VideoPoet is an experimental large language model that can generate videos from a text-based prompt. You can describe a fictional scene, even one as ridiculous as “A robot cat eating spaghetti,” and have a video ready to watch within seconds. If you’ve ever used an AI image generator like Midjourney or DALL-E 3, you already know what to expect from VideoPoet.

Google has invested in startups like Runway working on AI video generation, but VideoPoet comes courtesy of the company’s internal efforts. The VideoPoet technical paper enlists as many as 31 researchers from Google Research.

How does Google VideoPoet work?

In the aforementioned paper, Google’s researchers explained that VideoPoet differs from conventional text-to-image and text-to-video generators. Unlike Midjourney, for example, VideoPoet does not use a diffusion model to generate images from random noise. That approach works well for individual images but falls flat for videos where the model needs to account for motion and consistency over time.

VideoPoet has a robust foundation thanks to its training that allows it to perform tasks beyond text-to-video generation as well. For example, it can apply styles to existing videos, perform edits like adding background effects, change the look of an existing video with filters, and change the motion of a moving object in an existing video. Google demonstrated the latter with a raccoon dancing in various styles.

VideoPoet vs. rival AI video generators: What’s the difference?

Google’s VideoPoet differs from most of its rivals that rely on diffusion models to turn text into videos. However, it’s not exactly the first – a smaller number of Google Brain researchers presented Phenaki last year. Likewise, Meta’s Make-A-Video project made waves in the AI community for generating diverse videos without training on video-text pairs beforehand. However, neither models were publicly released.

So given that we don’t have access to any video-generating models, we can only rely on the information Google has provided about VideoPoet. With that in mind, the paper’s authors assert that “In many cases, even the current leading models either generate small motion or, when producing larger motions, exhibit noticeable artifacts.” VideoPoet, on the other hand, can handle more motion. VideoPoet can generate longer videos and handle motion more gracefully than the competition.

Google VideoPoet availability: Is it free?

While Google has published dozens of example videos to demonstrate the strengths of VideoPoet, it stopped short of announcing a public rollout. In other words, we don’t know when we’ll be able to use VideoPoet, if at all. Google hasn’t announced a product or release date for VideoPoet yet. As for pricing, we may have to take the hint from AI image generators like Midjourney that are only available via a subscription. Indeed, AI-generated images and videos are computationally expensive so opening up access to everyone may not be feasible, even for Google.

Related Articles

Latest Updates