Generative AI (GenAI) is one of the most exciting and rapidly evolving fields in artificial intelligence. By teaching machines to create new content, generative AI is opening up a world of possibilities that were once the exclusive domain of human creativity.
Today, we’re at an important juncture in the development of GenAI. Recent advances in the field have been remarkable — with deep learning algorithms generating increasingly realistic images, text and audio — and the rapid pace of development is transforming the way we think about technology.
GenAI has not only opened up a world of creative possibilities, but it has also made many everyday tasks easier and more efficient.
In this article, we will take a closer look at the current state of GenAI, talk about the latest advancements in the field, and explore some predictions about what we can expect to see in the future.
Every day, we spend much of our time reading, writing or searching for information.
Text generation models can assist in these tasks. The good news is that text generation is one of the areas where generative AI is the furthest along in terms of development and usage by real people.
The language models available today can:
- Answer questions.
- Summarize blocks of text.
- Auto-complete text.
- Generate responses to natural language queries, like chatbot conversations or voice assistants.
- Translate text into different languages.
The leading model that is accessible to developers and others is OpenAI’s GPT-3, a 175-billion-parameter model.
It is rumored that OpenAI’s GPT-4 will launch sometime this year, and experts believe its output will be even more “human-like,” which will further advance the use cases below.
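In practice, developers consume these models through an HTTP API. Here is a minimal sketch of what a request to an OpenAI-style completions endpoint might look like — the endpoint URL, model name and parameter names follow OpenAI’s public API at the time of writing, so treat them as assumptions and check the current documentation:

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/completions"  # OpenAI's text-completion endpoint

def build_completion_request(prompt, api_key, model="text-davinci-003",
                             max_tokens=128, temperature=0.7):
    """Build an HTTP request for a text-completion call.

    `model`, `max_tokens` and `temperature` are typical parameters for
    OpenAI-style completion APIs; verify current values in the provider docs.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,     # upper bound on generated tokens
        "temperature": temperature,   # higher = more varied output
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# Sending the request (requires a real API key):
# with urllib.request.urlopen(build_completion_request("Summarize: ...", KEY)) as r:
#     print(json.load(r)["choices"][0]["text"])
```

The same request shape covers most of the bulleted capabilities above — question answering, summarization and translation differ only in the prompt text.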
Use cases for text generation AI
A few common use cases of text generation that are seeing customer traction include:
1. Text generation for marketing and sales
This includes tasks like generating copy for websites, blog posts for content marketing, captions for social media or ads, and cold outreach emails for sales.
2. Reading and writing assistants
This use case focuses on helping users accelerate their research, and read and write better or faster (e.g., editing emails and articles, conducting research for writing, and auto-completing text). Many of these are essentially building new forms of text editors.
3. Text generation for specific situations and specialties
This includes tasks like drafting contracts, legal documents, product requirements or other specialized text, which often require additional structure and knowledge.
While GPT-3 can generate this text on its own, using the model for tasks like this typically requires some fine-tuning, or it may require writing-specific prompts — which is where today’s generative AI companies are trying to add value. For example, Spellbook by Rally is an AI-assisted plugin that helps lawyers draft contracts.
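Much of the value these companies add is in prompt construction. A minimal sketch of how a few-shot drafting prompt might be assembled — the function name and prompt wording are illustrative, not any particular vendor’s approach:

```python
def build_drafting_prompt(document_type, requirements, examples=()):
    """Assemble a few-shot prompt for a specialized drafting task.

    `examples` are (requirements, draft) pairs from prior documents; large
    language models tend to imitate the structure these examples establish.
    """
    parts = [f"You draft {document_type} documents. Follow the style of the examples."]
    for req, draft in examples:
        parts.append(f"Requirements:\n{req}\nDraft:\n{draft}")
    # End with the new requirements and a trailing cue for the model to complete.
    parts.append(f"Requirements:\n{requirements}\nDraft:")
    return "\n\n".join(parts)
```

Fine-tuning follows the same logic, but bakes the example pairs into the model’s weights instead of the prompt.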
4. Search, triage and synthesis
Individuals and companies can use generative AI for horizontal use cases (like product development, customer support and more), as well as vertical use cases (such as finance or healthcare).
There are opportunities to build products that can summarize and synthesize text, or generate appropriate responses to questions — which can automate a lot of manual work.
For example, companies can use AI-based tools to make customer support more efficient or analyze customer feedback to inform decision-making.
Most LLMs that can create text can also generate code, but some versions of models are specifically trained and fine-tuned for code generation.
For example, OpenAI Codex, a descendant of GPT-3 fine-tuned on code, is a general-purpose programming model.
Use cases for code generation AI
The use cases for AI-enabled code generation include:
1. AI-assisted programming
GitHub Copilot — an AI pair programmer that offers autocomplete-style suggestions as you code — became publicly available in 2022. It is the flagship product in code generation, and it is transforming the way developers work. GitHub reports that, in files where it is enabled, Copilot can write up to 40% of the code.
There are also AI-supported code documentation products on the market, including Mintlify.
2. SQL generation and data analysis
SQL generation is just a subset of AI-assisted programming, but it’s worth calling out as a separate use case. When users have questions about the data in their organizations, SQL generation can make data analysis faster and make business analytics available to people who don’t know SQL.
There are several plugins that use GPT-3 and other tools to pull and analyze data from Google Sheets and other spreadsheet applications.
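These tools typically feed the model the table schema along with the user’s question, then sanity-check the generated query before running it. A minimal sketch — the prompt wording and the safety check are illustrative assumptions, not any specific plugin’s implementation:

```python
def build_sql_prompt(schema, question):
    """Build a natural-language-to-SQL prompt: schema, question, then a 'SQL:' cue."""
    return (
        "Given the following tables:\n"
        f"{schema}\n"
        f"Write a SQL query that answers: {question}\n"
        "SQL:"
    )

def is_safe_select(sql):
    """Cheap guard: only run read-only SELECT statements returned by the model."""
    s = sql.strip().rstrip(";").lower()
    return s.startswith("select") and not any(
        kw in s for kw in ("insert", "update", "delete", "drop", "alter"))
```

Gating model output behind a check like this matters because generated SQL runs against real business data; production tools use proper SQL parsing rather than keyword matching.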
3. App builders
The holy grail in code generation is being able to use natural language prompts to build custom software or apps — essentially, low-code or no-code on steroids.
While AI code generation is unlikely to work for highly complex use cases, or as a way to maintain and build upon a large app in production, it could be a useful tool for simple apps that stitch together a bunch of existing products. Perhaps AI-based app builders could replace tools like Zapier in the future.
OpenAI’s DALL-E launched way back in January 2021, but image generation models truly exploded in 2022.
Here is what the prompt for “kitten” produced for different versions of Midjourney, as it evolved.
Use cases for image generation AI
These models can be used directly for a number of different use cases, and companies will also be able to customize and fine-tune image generation tools and products for specific situations and workflows.
Let’s discuss some of the things we’ve started to see in the world of AI-assisted image generation.
1. Consumer social
Given the nature of social media, images are sometimes more appealing than text — so there are several consumer social use cases where image generation models have taken off.
Lensa, an AI avatar generation tool, surged in popularity at the tail end of 2022, and at one point it was generating more than $2 million a day through app stores (that number has now settled down to approximately $200,000 a day).
I expect more use cases like this, including a better version of Bitmoji, iMessage apps that help visualize text conversations based on avatars, and other similar products.
I also predict we’ll see more digital influencers, like Lil Miquela, who has 2.6 million followers on Instagram but doesn’t exist in real life.
2. Marketing and sales
Images are a key storytelling tool for websites, presentations and advertisements — and image generation is a great fit for these use cases.
Image generation products have already popped up to help people create images for marketing and sales, including tools that help you:
- Design ads for social media or magazines.
- Create product shots for e-commerce websites.
- Locate the perfect assets for presentations or sales collateral.
AI-generated images can also replace the need for stock images or photos in many cases. Some platforms, such as Adobe Stock and Shutterstock, have embraced AI and now allow their creators to sell AI-generated images. Canva also has a text-to-image feature you can use to create images to drop into your posters, ads and social media assets.
3. Graphic design and UI
Image models can also generate effective UI designs. Consider this AI-generated website UI, created by Midjourney.
It’s easy to see how image generation tools can be used to assist designers and non-designers by providing inspiration or starting points for graphic and UI design, video game assets, character portraits, interior design and even architecture.
One area where generative AI shows particular promise is audio and video synthesis. Thanks to advances in machine learning algorithms and the availability of large amounts of training data, generative models can now create highly realistic audio and video content that was previously impossible to produce with traditional methods.
With tools like generative adversarial networks (GANs), researchers can train models to create realistic speech, music and even environmental sounds like rain or traffic. This has numerous potential applications, from creating new music and sound effects for movies to helping people with speech impairments communicate more effectively.
First, let’s discuss where we are in terms of the state of audio models. Broadly, I think of audio as music and non-music/voice.
On the voice side, Microsoft recently announced VALL-E, a model that can take a three-second sample of someone’s voice and synthesize that voice for use in any application. It takes any text prompt and outputs that text as audio in the person’s voice, and users will be able to control the emotion and tone of the output.
There are also several companies working on models that generate music. Some use Stable Diffusion to create “images” of spectrograms that are then converted to audio, while other models can be fine-tuned on particular albums to output songs in specific styles.
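To make the spectrogram trick concrete: a magnitude spectrogram stores, per time frame, how loud each frequency is, and audio can be crudely resynthesized by summing sinusoids at those frequencies. A toy sketch, assuming a simple list-of-frames representation (real systems also recover phase, e.g., with the Griffin-Lim algorithm):

```python
import math

def spectrogram_to_audio(spectrogram, frame_len=256, sample_rate=8000):
    """Naively resynthesize audio from a magnitude spectrogram.

    `spectrogram` is a list of frames; each frame is a list of bin magnitudes.
    Each bin contributes one sinusoid at its center frequency (additive
    synthesis). Phase is lost in a magnitude spectrogram, so this is only
    a sketch of the idea, not production-quality inversion.
    """
    audio = []
    for frame in spectrogram:
        for n in range(frame_len):
            t = n / sample_rate
            sample = sum(
                mag * math.sin(2 * math.pi * (k * sample_rate / (2 * len(frame))) * t)
                for k, mag in enumerate(frame)
            )
            audio.append(sample)
    return audio
```

An image-generation model that outputs a plausible spectrogram image therefore implicitly outputs audio — that is the core insight behind the spectrogram-based music tools.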
Use cases for audio generation AI
So what are the use cases for audio synthesis?
1. Media and advertising
The ability to generate distinct voices marks a significant breakthrough in the media industry, with audio synthesis offering a wide range of applications in video games, movies and television.
AI can facilitate dubbing into various languages using an actor’s voice, or enable the creation of new characters with specific voices in video games. Audio synthesis can also streamline filming and editing processes in movies and television, potentially saving time and resources.
2. Call centers
Call centers rely on having a set of high-quality voices with low latency for real-time use cases. Previously, they only had a few voices to choose from, typically from big vendors like Google and Microsoft.
With audio generation, call centers can potentially create voices and accents local to a customer's region, and use those voices to create customized responses to support requests.
3. Narration and accessibility
Creators and tech companies can use synthetic voices generated by AI for audiobooks, hardware devices like smart speakers, text-to-speech accessibility options in web browsers, and educational tools.
Many companies that generate voices have a text-to-speech API that users can harness for different practical uses.
4. Music generation
There are a number of companies building applications that enable users to generate and edit music.
Some tools help users create and upload music to streaming platforms and earn money from their creations, and others allow creators to specify music attributes such as mood and genre to generate customized songs. AI can assist musicians throughout their entire creative process, from separating vocals to modifying beats to changing pitch.
5. Audio transcription
Audio transcription isn’t technically “generative AI,” but in many cases, transcription platforms are using Large Language Models (LLMs) to convert spoken words into text.
Using machine learning algorithms and speech recognition technology, these models can transcribe audio with high levels of accuracy — which has led to far more efficient and cost-effective transcription solutions.
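For a sense of how the recognition step works: many speech-to-text models emit per-frame probabilities over a character alphabet, and a CTC-style decoder collapses those frames into text. A minimal greedy decoder sketch — the alphabet and probabilities here are toy assumptions, and real systems use beam search plus a language model:

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Greedy CTC decoding, the simplest decoder used in speech-to-text.

    `frame_probs` holds one probability row per audio frame over `alphabet`
    (index `blank` is the CTC blank symbol). Pick the best label per frame,
    then collapse repeats and drop blanks to produce the transcript.
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], blank
    for idx in best:
        if idx != blank and idx != prev:  # drop blanks and repeated labels
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)
```

The blank symbol is what lets the model transcribe doubled letters: "a, blank, a" decodes to "aa" while "a, a" collapses to a single "a".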
The application of audio generation AI in transcription also has the potential to significantly improve accessibility by enabling real-time captioning for live events.
Generative AI has also made significant strides in the video space.
Today, there are no widely available models that can generate full video from images and text, but several companies are making progress on this technology (e.g., tools that can create simple videos from text prompts or images, and AI-assisted video editing software).
With continued advances in generative AI, it's likely that we'll see impressive video applications in the years to come.
OpenAI has confirmed that they are working on a video generation model, but the company has not announced a definitive timeline.
Use cases for video generation AI
While full video generation may not be available to the public yet, there are still several video applications where AI is being used — and the list of companies emerging to solve video-related challenges will continue to expand.
Let’s take a look at some situations where generative video AI can be helpful.
1. Sales, training and support
Today, we can already generate videos of human avatars “speaking” vocal tracks. There are a number of companies working to make it simple to generate these avatar videos for use cases such as sales, training, customer support and more. Want personalized outreach at scale? AI-generated videos might be the perfect solution.
2. Ads
Increasingly, the default ad format on channels like Facebook, Instagram and TikTok is video. However, video ads are currently difficult and expensive to create, especially for SMBs. As generative video evolves, companies will be able to autogenerate a 10- to 30-second video ad from a simple image of their product or service. That ad will likely be a big upgrade from a video of a talking avatar.
While I’ve not come across any services that have this full functionality yet, we can see how Meta’s “Make-A-Video” service or Google’s Imagen might be leveraged to convert an image into a short video. Other companies are using AI to understand what causes certain ads to perform better than others — and they’ll use those analytics to help advertisers create highly profitable video assets.
3. Insight extraction
A lot of knowledge and insights are contained in videos, including meetings and other conversations that can be hard to parse, search or scan. Fortunately, there are companies working directly on making use of the knowledge in these types of files. While not technically “generative video,” these companies do leverage LLMs to summarize, synthesize and extract insights from video files.
Imagine coming out of a Zoom team meeting with an AI-generated summary and list of action items. That kind of functionality will be coming soon!
4. Consumer social
While we’re not quite there yet, it’s not hard to imagine a world where many of the videos we see on TikTok and Instagram Reels will be generated by AI. In particular, videos that tend to have a specific formula and are more constrained in what they show (i.e., the avatar use cases above) will be easy to pick off first.
As one data point, the hashtag #deepfake has 1.3 billion views, and multiple deepfake videos go viral every day. Here is one example of a deepfake video of Harry Styles.
To close, I want to touch on multimodality — the use of more than one mode of communication to create meaning. In one sense, video is already multimodal, because it requires stitching together audio, images and (likely) text, but there are a few other things worth considering when we’re talking about multiple modes in generative AI.
1. Multimodal image, text and video
Today, most image models are unable to add text to their image output. But in many cases — like in design or marketing — a user might want to overlay text that incorporates the style of the image. Or the model may need to understand the components of the image, so it knows where to place the text. A similar thing may be needed in videos.
For now, companies working on generative AI applications will likely have to plug in gaps with post-processing layers on their image or video models.
Storytelling is another example — most stories and presentations contain a mix of text and images in context.
Tome is an AI-powered storytelling platform that can generate presentations of text and images from text prompts. I gave it a prompt to create a presentation about the State of Generative AI and this is what it created:
Caption: Check out the full tome, which I generated from a simple text prompt.
Another interesting multimodal use case is chat interfaces. While ChatGPT has shown a lot of people what is possible with AI, today ChatGPT responses are text-only. In future versions of chat interfaces, we might see models that can respond with images or videos as well, depending on the questions and responses.
2. Actions and tasks
Generating images, text and videos is fine — but wouldn’t it be great if AI could also interact with other tech programs and take action on our behalf?
AI could potentially handle simple personal tasks like:
- Canceling or changing travel plans.
- Setting up daily, weekly or monthly reminders.
- Purchasing products on a grocery list.
- Mapping out driving routes.
Assistants like Siri and Alexa can already accomplish some of these tasks, with varying degrees of accuracy and success. And we’re now consistently seeing AI assistants in movies and pop culture (Iron Man’s Jarvis is a great example).
But AI assistants could get more powerful, and in the future could have the ability to take action on any interface — even those that aren’t pre-built with integrations. We could type in a prompt as text or speak a command, and the output might be an action.
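At its core, such a system is a mapping from natural language to action handlers. A toy rule-based sketch of the idea — the handlers and phrasings are hypothetical, and an LLM-based agent would let the model choose the handler and arguments instead of matching patterns:

```python
import re

# Hypothetical action handlers; a real assistant would call product APIs here.
def set_reminder(what):
    return f"reminder set: {what}"

def cancel_trip(destination):
    return f"trip to {destination} canceled"

# Map language patterns to actions. This is the part an LLM replaces: instead
# of regexes, the model decides which handler fits and extracts the arguments.
INTENTS = [
    (re.compile(r"remind me to (.+)", re.I), set_reminder),
    (re.compile(r"cancel my trip to (.+)", re.I), cancel_trip),
]

def dispatch(command):
    """Turn a natural-language command into an action (or report no match)."""
    for pattern, handler in INTENTS:
        m = pattern.search(command)
        if m:
            return handler(m.group(1))
    return "no matching action"
```

The hard part is not the dispatch loop but acting on interfaces that were never built with integrations — which is exactly where generative models could extend this pattern.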
I predict that a lot of companies will be developing tools in this area, either directly by aiming for artificial general intelligence, or by working to create more useful AI assistants (no offense to Siri).
The future is bright for GenAI
With continued research and development, we can expect to see further improvements in the quality and diversity of GenAI outputs, leading to new and exciting applications and use cases. While there are still technical challenges to overcome, the future of generative AI looks promising, and it will unquestionably play an increasingly important role in our lives.