Ever since ChatGPT hit exponential user growth in late 2022, it has been the hottest topic on the Internet. Naturally, the broader field of generative AI as a whole has received far more public attention, resources and users than it did in the past. Generative AI was not developed overnight. Nor will it extend to more complex media types such as movies overnight, whatever one may assume from the hype. But there are vivid signs that it is closer to OTT than we thought last year. In this article we survey the available tools in generative photography and video and compare them with the current OTT offering. We conclude with potential fields of application for generative AI in OTT.
From a short photorealistic video to a long one
If, at the time of writing, there is an OTT platform that runs on AI-generated video, we are not aware of it. So far, attempts at generative video remain at the level of Meta’s Make-A-Video from September 2022.
Another well-known attempt is “Nothing, Forever”, a low-resolution AI-generated copy of the sitcom “Seinfeld” that ran “infinitely”, 24/7, on Twitch for weeks until it was removed for violating the Twitch Community Guidelines. The show made a comeback in March 2023 and runs on the latest version of GPT (GPT-4 at the time of writing). Its March audience has not climbed back above the 10,000 concurrent sessions it reached in February, but it is on a growing trajectory again. The video is decoupled from the dialogue, and its chat stream is full of users commenting that a joke was not funny, but the number of active users is the best indicator of whether this is an early precursor of OTT based on generative AI. Imagine your favourite video franchise with 8K graphics, serving new content whenever you opt for more, content you can watch simultaneously with others… it sounds similar to what the largest studios publish year after year.
Due to the high interest in the topic, a top-level figure at Microsoft Germany publicly stated, shortly before the public demo of GPT-4 in March, that its multimodality would include video. But GPT-4, as of writing, only accepts images in the prompt and offers no video generation whatsoever. Midjourney, which has the largest user base in AI image generation, limits its video output to a few seconds built around the images it generates. Although its images are astounding, Midjourney-generated video demonstrates how much innovation is still required to generate more complex videos from a prompt. One possibility is to iterate related prompts from GPT-4 to Midjourney via an integration, keeping the same theme, e.g. the same seed instance. A good example of such an approach is this simple All About AI video with some API integration, similar to the one “Nothing, Forever” uses. After generation, images can be interpolated to produce more images between them, so that all of them can be joined into a video from the set of related pictures. Such attempts are still done manually, because the same theme and seed sometimes do not yield related images. Also, interpolation sometimes produces too sharp a transition between images that are too far apart. And even when video sequences start to emerge, adding realistic audio from a prompt is an epic issue that still awaits a successful solution.
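To make this manual pipeline concrete, here is a minimal sketch in Python of the keyframe-and-interpolation step. The generate_image helper is a hypothetical placeholder for whatever image API is used (Midjourney has no official public API), and the linear cross-fade is the naive form of interpolation, which is exactly where the too-sharp transitions mentioned above come from:

```python
# Sketch: chain prompt-generated keyframes into a short video via
# naive cross-fade interpolation. Assumptions (not from the article):
# generate_image() wraps whatever image API you use and returns a
# BGR uint8 numpy array; a fixed seed keeps related prompts on-theme.

import cv2
import numpy as np

def generate_image(prompt: str, seed: int) -> np.ndarray:
    """Hypothetical placeholder for a real image-generation API call."""
    raise NotImplementedError("wire this to your image model of choice")

def interpolate(a: np.ndarray, b: np.ndarray, steps: int) -> list[np.ndarray]:
    # Naive linear cross-fade; real pipelines use optical-flow or
    # learned frame interpolation (e.g. FILM, RIFE) for smoother motion.
    return [cv2.addWeighted(a, 1 - t, b, t, 0)
            for t in np.linspace(0, 1, steps, endpoint=False)]

def prompts_to_video(prompts: list[str], seed: int, out_path: str,
                     fps: int = 24, steps: int = 12) -> None:
    keyframes = [generate_image(p, seed) for p in prompts]
    h, w = keyframes[0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (w, h))
    for a, b in zip(keyframes, keyframes[1:]):
        for frame in interpolate(a, b, steps):
            writer.write(frame)
    writer.write(keyframes[-1])
    writer.release()
```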
But how does all of this affect OTT?
So far, the only effects are in shrinking time to delivery and the related costs.
For example, in March The Coca-Cola Company launched a campaign inviting contestants to create its ads using GPT-4 and DALL-E 2. The company is also backing this with preliminary internal research through audience testing: using prompts grounded in its historical library of commercials, it received positive audience-test results for new, prompt-generated commercials, yielding substantial savings in time and budget on this early example.
Wait, what about OTT?
Large global OTT platforms are experimenting with adapting to GPT-like queries, and that takes time. One reason is that search functions inside an OTT platform are highly complex; another is that they have to sustain heavy load in a production environment.
Generative AI endpoints see large swings in usage load and, in general, are not always available, which complicates the situation for OTT. Still, probably one of the best places where OTT could employ generative AI is in searching for an asset to watch.
On one side, it would enable the OTT platform to make recommendations based on a prompt, much like the search already used in OTT. Since a typical OTT platform holds many assets (as many as it finds feasible to keep), search results are indexed, sometimes even on multiple levels, so that the numerous searches stay fast. Even some recommender-engine results are periodically indexed in order to deliver them more quickly, as sketched below.
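As an illustration of that layering, here is a minimal sketch; the names and data structures are purely illustrative assumptions, not any vendor’s API. A token-level inverted index serves searches, while a TTL cache keeps recommender output off the hot request path:

```python
# Sketch: two-level lookup an OTT search path might use.
import time
from collections import defaultdict
from typing import Callable

class AssetSearchIndex:
    def __init__(self, assets: dict[str, str], recs_ttl: int = 3600):
        # Level 1: inverted index over asset metadata, built once.
        self.inverted: dict[str, set[str]] = defaultdict(set)
        for asset_id, description in assets.items():
            for token in description.lower().split():
                self.inverted[token].add(asset_id)
        # Level 2: periodically refreshed cache of recommender output.
        self.recs_ttl = recs_ttl
        self._recs_cache: dict[str, tuple[float, list[str]]] = {}

    def search(self, query: str) -> set[str]:
        # Intersect posting lists; real systems add ranking, fuzziness, etc.
        sets = [self.inverted.get(t, set()) for t in query.lower().split()]
        return set.intersection(*sets) if sets else set()

    def recommendations(self, user_id: str,
                        compute: Callable[[str], list[str]]) -> list[str]:
        # Serve cached recommender results until the TTL expires, so
        # heavy model inference stays out of the request hot path.
        now = time.time()
        cached = self._recs_cache.get(user_id)
        if cached and now - cached[0] < self.recs_ttl:
            return cached[1]
        recs = compute(user_id)
        self._recs_cache[user_id] = (now, recs)
        return recs
```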
On the other side, if fed with asset transcripts, a Large Language Model transformer can not only produce a brief of each asset; GPT-4 can do it from a more subtle instruction set. For example, it can use the user’s preferences to deliver the asset description in the style the user prefers: short, funny, Borg-style, matching the mood of the outside weather, factual, or another more complex style that proves better for that user.
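A minimal sketch of how this could look, assuming the pre-1.0 openai Python client that was current in 2023; the style presets and the describe_asset helper are illustrative assumptions, not a product API:

```python
# Sketch: style-conditioned asset brief generated from a transcript.
import openai  # pre-1.0 client; openai.api_key must be set by the caller

STYLE_HINTS = {
    "short":   "Summarize in one sentence.",
    "funny":   "Summarize in two sentences with a light joke.",
    "borg":    "Summarize in two sentences, in the voice of the Borg.",
    "factual": "Summarize in two neutral, factual sentences.",
}

def describe_asset(transcript: str, style: str = "factual") -> str:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You write spoiler-free descriptions of video "
                        "assets from their transcripts. "
                        + STYLE_HINTS.get(style, STYLE_HINTS["factual"])},
            # Crude length cap to stay within the context window.
            {"role": "user", "content": transcript[:12000]},
        ],
    )
    return response.choices[0].message.content
```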
An example of a more complex feature achievable with existing technology is summarizing a missed part of the video stream as a short time-lapse or a spoken or written two-sentence recap. If a viewer has to answer a call or is urgently needed elsewhere, then upon returning, instead of rewinding the asset and forcing the other viewers to rewatch the missed part, or disturbing them with questions, the affected user can prompt the OTT platform and get a quick brief directed only at them.
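A rough sketch of that catch-up brief, building on the hypothetical describe_asset helper above; the timestamped-transcript format is also an assumption:

```python
# Sketch: "what did I miss?" recap for the span between pause and resume.

def missed_segment(transcript_lines: list[tuple[float, str]],
                   paused_at: float, resumed_at: float) -> str:
    """Join the dialogue lines whose timestamps fall in the missed span."""
    return " ".join(text for ts, text in transcript_lines
                    if paused_at <= ts <= resumed_at)

def catch_up_brief(transcript_lines: list[tuple[float, str]],
                   paused_at: float, resumed_at: float) -> str:
    # Two neutral sentences about just the missed span, via the
    # (hypothetical) describe_asset helper sketched earlier.
    segment = missed_segment(transcript_lines, paused_at, resumed_at)
    return describe_asset(segment, style="factual")
```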
Would generative AI bypass OTT in 2023? Based on everything we are seeing, no. Although it will take time to roll such production features out to large audiences, it is worth the wait.