Google’s launch of Gemini is the latest advance in generative AI, and it highlights a shift toward multimodality.

At launch, ChatGPT (GPT-3.5) revolutionized content production, and subsequent large multimodal models (LMMs) such as GPT-4 and Gemini have the potential to transform sectors such as manufacturing, e-commerce, and agriculture.

These new LMMs are trained on images and code, rather than on text alone. Gemini adds audio and video, allowing the AI to directly perceive the physical world.

The race is on among tech companies and open source communities to add new modalities that enhance LMMs’ industrial applications.


The So What

Such multimodal capability will be transformational for industry, says Leonid Zhukov, director of the BCG Global AI Institute.

Traditional AI is constrained by preset rules—users decide what they want the AI to do and train it for that task. While GenAI models break free from this constraint, LMMs go even further. They can take in so many forms of data that they could respond to seemingly unlimited situations in the physical world, including those that users can’t predict, Zhukov explains.

The 10% to 20% efficiency gains companies currently see from GenAI bots could expand into new domains with LMMs, he says.

And this is just the beginning. “Today’s LMMs can see and hear the world. Tomorrow they could also be trained on digital signals from equipment, IoT sensors, or customer transaction data—to create a complete picture of your enterprise’s health on its own, without explicit instruction,” Zhukov says.

Here are just a few potential industrial applications:

  • Predictive maintenance and plant optimization. Instead of simply flagging known fault points, LMMs could take in video, sounds, and vibrations throughout the production line—independently monitoring for subtle changes and identifying unexpected signs of deterioration.
  • Digesting visual data to drive understanding. At a sorting plant, algorithms can already be tasked with detecting individual items, such as plastic bottles for recycling. LMMs could independently see and analyze all waste, filter large mixes of objects, and identify unpredicted items.
  • Medical advances. LMMs could improve the accuracy of AI models that analyze scans such as MRIs, CTs, and X-rays by layering in sound data such as heartbeats, and then use natural language to engage with doctors on personalized treatment plans.
  • Accessible shopping experiences. LMMs could convert data from a retailer’s physical and digital presence into the best source of real-time information for a customer’s needs—for instance, visual or auditory support—providing a more inclusive shopping experience.
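The predictive-maintenance idea above rests on fusing several sensor streams into one health signal. A minimal sketch of that pattern, with stand-in encoders (the `embed`, `fuse`, and `anomaly_score` functions here are hypothetical illustrations, not any vendor's API; a real system would call an LMM's per-modality encoders):

```python
# Hypothetical sketch: late fusion of video, audio, and vibration signals
# into a single anomaly score for predictive maintenance.
from statistics import fmean

def embed(signal):
    # Stand-in encoder: min-max normalize a raw signal to [0, 1].
    lo, hi = min(signal), max(signal)
    return [(x - lo) / ((hi - lo) or 1) for x in signal]

def fuse(*embeddings):
    # Simple late fusion: average aligned features across modalities.
    return [fmean(vals) for vals in zip(*embeddings)]

def anomaly_score(fused, baseline):
    # Mean absolute deviation from a "healthy" baseline profile.
    return fmean(abs(f - b) for f, b in zip(fused, baseline))

video_motion = [0.1, 0.1, 0.9, 0.1]   # frame-to-frame motion magnitude
audio_level  = [0.2, 0.2, 0.8, 0.2]   # microphone RMS per time window
vibration    = [0.1, 0.2, 0.9, 0.1]   # accelerometer RMS per time window

fused = fuse(embed(video_motion), embed(audio_level), embed(vibration))
baseline = [0.1, 0.1, 0.1, 0.1]       # profile learned from healthy runs
print(round(anomaly_score(fused, baseline), 3))
```

The spike all three modalities register in the third window dominates the score, which is the point of fusion: an anomaly corroborated across senses stands out even when each individual signal is noisy.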

Now What

Firms need to prepare to integrate multimodal models. According to Zhukov, leaders should:

  • Thoroughly revisit your data strategy and operations. LMMs promise to deliver enormous value from underutilized (or uncollected) data. This is significant because, according to a study by Seagate, companies currently underutilize up to 70% of the data they collect. Companies also need to make sure the data has the right features, such as time stamps, to be fed into the models.
  • Decide whether to build or partner. AI services will likely evolve from a few large models toward many smaller industrial ones. And unlike pure text models, multimodal models are unlikely to offer out-of-the-box solutions right away, because industrial data is not publicly available. Some large industry players may choose to build their own models and offer them as a service to others; smaller firms will need to find the right partners. That choice will determine the type of training and hiring needed to support and integrate the models.
  • Monitor GenAI’s jagged frontier. LMMs have the potential to become the brains of autonomous agents—which don’t just sense but also act on their environment—in the next 3 to 5 years. This could pave the way for fully automated workflows, Zhukov believes.
