The Astounding Development Rate Of nVidia's Text-To-Video AI
The new video generator from nVidia was unveiled at the IEEE Conference on Computer Vision and Pattern Recognition 2023.
It starts from a Latent Diffusion Model (LDM) trained to create images from text, then adds an extra step in which it tries to animate the image using what it has learned from studying thousands of existing videos.
nVidia's new text-to-video AI
With time added as a tracked dimension, the LDM is tasked with determining what is likely to change in each area of an image over a specific period.
It first creates a number of keyframes throughout the sequence.
A second LDM is then used to interpolate the frames between the keyframes, producing images of comparable quality for every frame in the series.
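The keyframe-then-interpolate idea can be illustrated with a toy sketch. This is not nVidia's method: the real system uses a second learned LDM to generate the in-between frames, whereas the sketch below swaps in plain linear blending purely to show the structure. The function name `interpolate_keyframes`, the `frames_between` parameter, and the tiny two-value "frames" are all assumptions made for illustration.

```python
def interpolate_keyframes(keyframes, frames_between):
    """Fill the gaps between consecutive keyframes.

    Linear blending is a stand-in here; the system described in the
    article uses a learned model for this step, not a simple blend.
    Each frame is a flat list of floats (a hypothetical toy format).
    """
    if not keyframes:
        return []
    sequence = []
    for start, end in zip(keyframes, keyframes[1:]):
        sequence.append(list(start))  # keep the keyframe itself
        for step in range(1, frames_between + 1):
            t = step / (frames_between + 1)  # position between the two keyframes
            sequence.append([(1 - t) * a + t * b for a, b in zip(start, end)])
    sequence.append(list(keyframes[-1]))  # final keyframe closes the sequence
    return sequence


# Three toy keyframes with three interpolated frames in each gap.
keyframes = [[0.0, 0.0], [1.0, 2.0], [2.0, 0.0]]
frames = interpolate_keyframes(keyframes, frames_between=3)
print(len(frames))
```

A simple blend like this moves at a constant rate between keyframes and changes direction abruptly at each one, which hints at why motion around keyframes can look uneven when the in-between frames are not perfectly matched to them.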
The system was put to the test using low-quality dash cam-style video, and nVidia discovered that it was capable of producing several minutes’ worth of this kind of video in a “temporally coherent” manner, at 512 x 1024-pixel resolution.
That is a remarkable achievement in this quickly developing sector.
The system can also work at far higher resolutions and in a huge variety of other visual styles.
The team used the system to create a large number of example videos, all at a resolution of 1280 x 2048 pixels and using only text prompts.
These clips each have 113 frames and are rendered at 24 frames per second, making them roughly 4.7 seconds long.
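The clip length follows directly from dividing the frame count by the frame rate. A minimal check of that arithmetic, with the helper name `clip_duration_seconds` chosen here only for illustration:

```python
def clip_duration_seconds(frame_count, frames_per_second):
    """Return the playback length of a clip in seconds."""
    return frame_count / frames_per_second


# Figures quoted in the article: 113 frames at 24 frames per second.
duration = clip_duration_seconds(113, 24)
print(f"{duration:.2f} s")
```

The result rounds to the 4.7 seconds quoted above.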
Pushing much beyond that duration appears to break things and introduce a lot of strangeness.
There are still many weird errors to be found, and the videos are still unmistakably AI-generated. In many of them, the keyframes are also rather evident because of the strange speeding and slowing of motion that occurs around them.
nVidia's text-to-video AI is developing
Currently, nVidia is approaching this system more like a research endeavor than a finished consumer product.
The likely reason is that the company has little interest in footing the bill for an open system's processing costs, which are almost certainly high.
Beyond any copyright concerns arising from the training dataset, there are also risks to manage once these algorithms start producing realistic videos of events that never happened.