May 10, 2023

Key Observations from NAB 2023: AI Will Make Us All Content Creators

We are inundated daily with images and stories about deepfakes and voice cloning and their possible contribution to misinformation and disinformation. However, there are very promising ways in which implementations of AI can have profoundly positive impacts on the content creation process. At NAB 2023, we saw the gradual evolution of applications incorporating technologies that are loosely labeled AI, so you will have to bear with me, as the definition is not necessarily mine. With that safe harbor statement made, consider the following:

Evolution of Video Editing: From Word Processor to Text-Driven Systems

In 1995, in one of my textbooks, I wrote about a day when we would be able to edit video by editing text, instead of “marking in” and “marking out” on a digital representation of a frame. In essence, a “word processor for picture and sound,” a phrase that played on the name of the Montage Picture Processor, a system used to edit motion pictures and television shows and a precursor to digital nonlinear systems.

At NAB 2023, automatic speech-to-text (STT) had found its way into a variety of product offerings, the most mature of which have been automated closed-captioning products. At the same time, the “gradual” evolution of digital nonlinear editing systems has arrived at text-driven editing. On the show floor, you could see this evolution: a video was digitized, STT was applied, and the result was a transcription of what was spoken in that video. From there, the editing interface enabled the user to cut, copy and paste text from place to place, with the synchronized video and audio following dutifully along.
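To make the mechanics concrete, here is a minimal sketch of how a text-driven editor can map transcript edits back to the media. It assumes the STT step produced word-level timestamps; the data layout and function name are illustrative, not any vendor’s API:

```python
# Sketch: derive video cuts from an edited transcript.
# Assumes STT produced word-level timestamps: (word, start_sec, end_sec).

TRANSCRIPT = [
    ("welcome", 0.0, 0.4), ("to", 0.4, 0.5), ("the", 0.5, 0.6),
    ("show", 0.6, 1.0), ("um", 1.0, 1.3), ("let's", 1.3, 1.6),
    ("begin", 1.6, 2.1),
]

def cuts_from_kept_words(kept_indices, transcript, gap=0.05):
    """Merge the time ranges of kept words into contiguous clips.

    Adjacent words whose ranges touch (within `gap` seconds) merge
    into a single clip, so deleting text in the transcript deletes
    the corresponding picture and sound.
    """
    clips = []
    for i in kept_indices:
        _, start, end = transcript[i]
        if clips and start - clips[-1][1] <= gap:
            clips[-1][1] = end          # extend the current clip
        else:
            clips.append([start, end])  # start a new clip
    return [(s, e) for s, e in clips]

# The user deletes the filler word "um" (index 4) in the text editor;
# the system turns that into two clips around the removed span.
kept = [0, 1, 2, 3, 5, 6]
print(cuts_from_kept_words(kept, TRANSCRIPT))
# -> [(0.0, 1.0), (1.3, 2.1)]
```

Deleting words simply removes their time ranges from the cut list, which is what lets someone work on picture and sound the way they work on a document.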

What Text-Driven Editing Means for Content Creation

What does this mean for the content creation process? It carries an implied promise: more and more individuals who are not classically trained editors can now interact with and manipulate content by doing what they already do daily with word processing software.

The Impact of Generative AI in Media Production

AI was present in many other implementations. “Generative AI” covers a large number of topics and a wide variety of media types. At NAB 2023, several examples stood out as helpful, evolving and impactful to the day-to-day content creation process. Voice synthesis can be very useful as an alternative to traditional automatic dialogue replacement (ADR). Think of someone today watching a film whose original language is English but which is now being watched in, say, Italian. A combination of voice synthesis and lip-sync generation driven by the spoken model can vastly improve the viewing experience. Think of how quickly multiple languages can be delivered to meet the ever-decreasing time allotted for the content creation process.
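As a rough illustration, a synthetic dubbing pipeline of the kind described above chains three stages: transcription, translation and voice synthesis, with the rendered audio fitted back to the original timing. The stage functions below are hypothetical stand-ins, stubbed so the sketch runs; they are not any product’s API:

```python
# Sketch of a synthetic dubbing pipeline: STT -> translation -> TTS.
# transcribe(), translate() and synthesize_speech() are hypothetical
# stand-ins for real models; each is stubbed so the script executes.
from dataclasses import dataclass

@dataclass
class Line:
    text: str
    start: float  # seconds into the program
    end: float

def transcribe(audio_path: str) -> list[Line]:
    # Stand-in for an STT model returning timed dialogue lines.
    return [Line("Good evening and welcome.", 1.0, 2.8)]

def translate(line: Line, target_lang: str) -> Line:
    # Stand-in for machine translation; the timing is preserved.
    translated = {"it": "Buonasera e benvenuti."}[target_lang]
    return Line(translated, line.start, line.end)

def synthesize_speech(line: Line, voice: str) -> bytes:
    # Stand-in for voice synthesis; a real system would also time-fit
    # the audio to (line.end - line.start) so lip sync can be applied.
    return f"[{voice}] {line.text}".encode()

def dub(audio_path: str, target_lang: str, voice: str):
    dubbed = []
    for line in transcribe(audio_path):
        localized = translate(line, target_lang)
        dubbed.append((localized.start, synthesize_speech(localized, voice)))
    return dubbed

print(dub("feature_en.wav", "it", "it-voice-1"))
```

Because each downstream stage needs only the timed lines, every additional target language is another pass over the same transcription, which is where the speed gain over traditional ADR comes from.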

STT was only the beginning of what is now evolving into the incorporation of a variety of methods to better address the indexing and classification of content. Again, let me take you, via the M&E time machine, to the late 1980s. Why the journey? Because in the late ’80s, logging content was a manual, time-consuming, laborious process.

Pros and Cons of Manual Logging

There were pros and cons to this process. It was beneficial in the sense that you had to look at the footage, determine what it was and decide how to classify it. In that sense, you had time with the footage and could make mental, as well as physical, notes about the raw material. The well-known editor Walter Murch explained this to me when I was writing my book, “The Making of a Motion Picture Editor.” He said that when he edited on film, he would see things while running the footage backward that he didn’t see while running it forward. Much of that experience, of course, is somewhat outdated now that digital systems are the norm.

The Drawbacks of Manual Content Indexing and Classification

The cons were substantial. Conservatively, it would take four human hours to adequately log an hour of footage—and all of that before you could realistically begin working with the content.

In 2019, I was in a broadcaster’s facility in Latin America. This was a massive facility. In one section, there were rows and rows (and more rows) of cubicles, and in each was a person logging content that had been digitized. When a logger couldn’t identify a person, they would launch a search engine and try to cross-reference the date they knew against website articles and newspaper headlines to identify the people, locations and situations in a video. It was absolutely mind-boggling when you take into account that, of 200,000 hours of original content, only 10 percent had been digitized. In other words, that’s a lot of work to be done.
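The arithmetic makes the scale plain. Using the conservative figure above of four human hours per footage hour, a back-of-the-envelope calculation for that archive looks like this (the logging rate, a 40-hour work week, and treating the undigitized 90 percent as still to be logged are the assumptions):

```python
# Back-of-the-envelope: human effort to log the remaining archive,
# assuming the conservative rate of 4 logging hours per footage hour.
archive_hours = 200_000    # total original content
digitized_share = 0.10     # only 10% digitized so far
hours_per_hour = 4         # human hours to log one footage hour

remaining = archive_hours * (1 - digitized_share)  # 180,000 footage hours
effort = remaining * hours_per_hour                # 720,000 human hours
person_years = effort / (40 * 52)                  # full-time logger years

print(f"{remaining:,.0f} footage hours still to log")
print(f"{effort:,.0f} human hours, roughly {person_years:,.0f} person-years")
```

Roughly 346 person-years of cubicle work, which is exactly the kind of backlog automatic indexing is positioned to absorb.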

How Can Large Language Models (LLMs) Improve Content Indexing and Discovery?

Let’s now consider technologies such as Large Language Models (LLMs), which can be utilized for automatic content indexing and tagging and to assist in classification, search and discovery. If a user has a video of a boat on a body of water with people on the boat, a few automatic derivations become possible and can be suggested to the user. These models are very promising in that they combine text, image, video, audio, etc., in a meaningful way, which then allows the user to issue a simple instruction: “Show me all of the footage I have that has water and people.”
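Here is a minimal sketch of how such a query could be answered, assuming a multimodal encoder that places footage and text in one embedding space. The embedding function below is a deterministic toy stand-in (a real system would use a CLIP-style or similar model) so the sketch runs end to end:

```python
# Sketch: cross-modal footage search. _toy_vector() is a hypothetical
# stand-in for a multimodal encoder that places frames and text in the
# same vector space; it hashes descriptive terms so the script runs.
import hashlib
import numpy as np

def _toy_vector(terms: list[str]) -> np.ndarray:
    """Deterministic stand-in embedding built from descriptive terms."""
    v = np.zeros(64)
    for t in terms:
        v[int(hashlib.md5(t.encode()).hexdigest(), 16) % 64] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

# Index: one embedding per logged clip (a real system embeds frames).
FOOTAGE_INDEX = {
    "reel_017": _toy_vector(["boat", "water", "people"]),
    "reel_052": _toy_vector(["desert", "car"]),
    "reel_090": _toy_vector(["beach", "water", "people"]),
}

def search(query_terms: list[str], top_k: int = 2):
    """Rank clips by cosine similarity to the text query."""
    q = _toy_vector(query_terms)
    scored = [(float(v @ q), clip) for clip, v in FOOTAGE_INDEX.items()]
    return sorted(scored, reverse=True)[:top_k]

# "Show me all of the footage I have that has water and people."
print(search(["water", "people"]))
```

The point is the single vector space: once frames and words land in it together, “water and people” becomes a nearest-neighbor lookup rather than a manual logging pass.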

In February 2023, YouTube® reported that an average of 271,330 hours of video is uploaded to its site each day. Finding what is important to use as you work with content, and finding what is relevant as you search for and consume content, requires a system: one enhanced and enabled by these technologies.

How Does the Combination of Generative AI, LLMs and Visual Analysis Impact Content Consumption?

What happens when we combine Generative AI, LLMs and the incorporation of visual analysis and data integration? We get the current state of the industry where, as of April 2023, we, as viewers, can watch a live sporting event in which the main audio commentary is provided by human announcers. Nothing has changed there. However, we now have the ability to consume different angles and different streams of that live event where, approximately 35 seconds behind live, we are listening to artificially generated audio commentary describing the action taking place.
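One way to picture the plumbing: events detected in the live feed are buffered, and a commentary generator consumes them on a fixed delay, so the synthetic track trails the action by roughly 35 seconds. The event format and generate_commentary() below are illustrative assumptions, not a description of any broadcaster’s system:

```python
# Sketch: a fixed-delay buffer feeding a synthetic commentary track.
# The event format and generate_commentary() are illustrative stand-ins.
import heapq
from itertools import count

DELAY_SEC = 35.0     # synthetic commentary trails the live action
_tiebreak = count()  # keeps the heap stable for same-time events

def generate_commentary(event: dict) -> str:
    # Stand-in for an LLM + visual-analysis commentary pipeline.
    return f"{event['player']} with the {event['action']}!"

class DelayedCommentary:
    def __init__(self):
        self._buffer = []  # min-heap keyed on event timestamp

    def ingest(self, t: float, event: dict):
        heapq.heappush(self._buffer, (t, next(_tiebreak), event))

    def emit_ready(self, now: float) -> list[str]:
        """Return voice lines for all events at least DELAY_SEC old."""
        lines = []
        while self._buffer and now - self._buffer[0][0] >= DELAY_SEC:
            _, _, event = heapq.heappop(self._buffer)
            lines.append(generate_commentary(event))
        return lines

feed = DelayedCommentary()
feed.ingest(10.0, {"player": "No. 9", "action": "shot on goal"})
print(feed.emit_ready(now=20.0))  # [] -- still inside the delay window
print(feed.emit_ready(now=46.0))  # commentary emitted at live + 36 seconds
```

The fixed delay is the design choice that makes this workable: it gives the generation pipeline a guaranteed window to analyze the pictures before it has to speak.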

One could ask why this is necessary. But consider that there is far more content available than could possibly be covered by, say, just two human commentators; the only way to adequately cover it all is to implement another method. Also, consider those individuals who want to interact with content but may require assistance with what they can hear and/or see. Much more content can be made available, and that content will be substantially enriched.

Next Entry: The Role of Cloud Technology in Virtual Production

Advances in virtual production, virtual sets and real-time performance capture mapped to real-time animation have helped us to create more economically achievable content, yet the flexibility of the Cloud will further reduce costs and drive efficiency to a new level. In my next entry, I will cover the advantages of virtual production, the role of the Cloud, and how it can effectively solve real-time interactivity issues.


The National Association of Broadcasters (NAB) is an annual conference and trade show that brings together professionals from the broadcast and media industries. The conference features keynote speeches, panel discussions, and exhibits showcasing the latest technology and trends in broadcasting. NAB provides a platform for industry leaders to network, exchange ideas, and learn about the latest developments in the field. Whether you are a broadcaster, content creator, technology provider, or a consulting firm like Alvarez & Marsal, NAB is the ultimate destination for anyone involved in the world of broadcasting (media & entertainment). 

Read the other blogs in the series:
In the first installment of "Key Observations from NAB 2023," A&M Director Thomas Ohanian reminisces on his 32nd straight NAB and shares his thoughts on how IP has dramatically changed the M&E industry.
In the third installment of "Key Observations from NAB 2023," A&M Director Thomas Ohanian shares his insights on the advantages of virtual production, the role of the Cloud in virtual production, and how it effectively improves performance and real-time interactivity in broadcasting.
In the final installment of "Key Observations from NAB 2023," A&M Director Thomas Ohanian shares his insights into the nature of NextGen TV, its advantages and disadvantages, how NextGen can benefit local broadcasters, and its role in IP transmission.
Authors

Thomas Ohanian

Director
United States