Senior Partner & Managing Director
Ten building blocks are essential to designing and assembling AI systems. Vendors provide the basic functionality that each building block possesses, but companies often modify blocks to create customized applications. The simplest AI use cases often consist of a single building block, but over time they typically evolve to combine two or more blocks. The exhibit below organizes building blocks according to whether they pertain primarily to data, to processing, or to action.
Machine vision is the classification and tracking of real-world objects based on visual, x-ray, laser, or other signals. Optical character recognition was an early success of machine vision, but deciphering handwritten text remains a work in progress.
The quality of machine vision depends on human labeling of a large quantity of reference images. The simplest way for machines to start learning is through access to this labeled data. Within the next five years, video-based computer vision will be able to recognize actions and predict motion—for example, in surveillance systems.
Speech recognition involves the transformation of auditory signals into text. In a relatively quiet environment, such applications as Siri and Alexa can identify most words in a general vocabulary. As vocabulary becomes more specific, tailored programs such as Nuance’s PowerScribe for radiologists become necessary. We are still a few years away from producing a virtual assistant that can take accurate notes in noisy environments with many people speaking at the same time.
Natural-language processing (NLP) is the parsing and semantic interpretation of text. This capability can detect spam, fake news, and even sentiments such as happiness, sadness, and aggression. Today, NLP can provide basic summaries of text and, in some instances, infer intent. For example, chatbots attempt to categorize callers according to their perceived intent. NLP is likely to improve significantly over the next several years, but a full understanding of complex texts remains one of the holy grails of artificial intelligence.
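To make the chatbot example concrete, here is a minimal sketch of intent categorization using a multinomial naive Bayes classifier, one classic and simple NLP technique (not necessarily what any particular chatbot vendor uses). The intent labels ("cancel", "billing") and the training utterances are hypothetical, and the whitespace tokenizer is a deliberate simplification.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesIntent:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # intent -> word frequencies
        self.intent_counts = Counter()           # intent -> number of examples
        self.vocab = set()

    def fit(self, examples):
        for text, intent in examples:
            self.intent_counts[intent] += 1
            for word in tokenize(text):
                self.word_counts[intent][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total = sum(self.intent_counts.values())
        best, best_score = None, -math.inf
        for intent, n in self.intent_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each word.
            score = math.log(n / total)
            denom = sum(self.word_counts[intent].values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[intent][word] + 1) / denom)
            if score > best_score:
                best, best_score = intent, score
        return best


# Hypothetical labeled utterances; real systems train on far more data.
training = [
    ("I want to cancel my subscription", "cancel"),
    ("please cancel my account", "cancel"),
    ("how do I stop my plan", "cancel"),
    ("my bill is wrong", "billing"),
    ("why was I charged twice", "billing"),
    ("question about my invoice", "billing"),
]
model = NaiveBayesIntent().fit(training)
print(model.predict("cancel my plan"))
print(model.predict("I was charged twice on my bill"))
```

The classifier only counts word co-occurrence with each label, which is why it can route a caller but cannot be said to understand the request.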
Information processing covers all methods of search, knowledge extraction, and unstructured text processing for the purpose of providing answers to queries. Closely related to NLP, this building block involves searching billions of documents or constructing rudimentary knowledge graphs that identify relationships in text. (Using data from the Wikipedia entry for Angela Merkel, such a graph can tag Merkel as a woman, the chancellor of Germany, and someone who has met Donald Trump.) It might also involve semantic reasoning, for example, determining that Trump is president of the US from the sentence "Trump is the Merkel of the US." Despite the rapid growth of knowledge databases, this type of learning based on reasoning is likely to remain rudimentary for the next few years.
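A knowledge graph of this kind is often stored as (subject, relation, object) triples. The toy sketch below records the Merkel facts and resolves the "Trump is the Merkel of the US" analogy by copying Merkel's leadership role to the named country and then looking up that country's title for the role. The relation names (`is_a`, `leader_of`, `has_met`, `leader_title`) and the hand-entered title facts are hypothetical; real systems extract such triples automatically and reason far less mechanically.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A toy store of (subject, relation, object) triples."""

    def __init__(self):
        self._edges = defaultdict(set)  # subject -> {(relation, object), ...}

    def add(self, subject, relation, obj):
        self._edges[subject].add((relation, obj))

    def facts_about(self, subject):
        return sorted(self._edges[subject])

    def objects(self, subject, relation):
        return sorted(o for r, o in self._edges[subject] if r == relation)


kg = KnowledgeGraph()
kg.add("Angela Merkel", "is_a", "woman")
kg.add("Angela Merkel", "leader_of", "Germany")
kg.add("Angela Merkel", "has_met", "Donald Trump")
# Hypothetical background facts the analogy step relies on.
kg.add("Germany", "leader_title", "chancellor")
kg.add("US", "leader_title", "president")


def resolve_analogy(kg, person, reference, place):
    """Interpret '<person> is the <reference> of <place>'.

    If the reference figure leads some country, infer that the person leads
    `place` and return that country's title for its leader (or None).
    """
    if kg.objects(reference, "leader_of"):
        kg.add(person, "leader_of", place)
        titles = kg.objects(place, "leader_title")
        return titles[0] if titles else None
    return None


print(kg.facts_about("Angela Merkel"))
print(resolve_analogy(kg, "Donald Trump", "Angela Merkel", "US"))
```

Note how much the inference depends on hand-supplied background facts; that dependence is one reason reasoning of this type remains rudimentary.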
Learning from data is essentially machine learning—the ability to predict values or classify information on the basis of historic data. While machine learning is an element in other building blocks, such as machine vision and NLP, it is also a building block in its own right. It is the basis of systems such as Netflix’s movie recommendations, cybersecurity programs that employ anomaly detection, and standard regression models for predicting customer churn, given previous churn data.
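As an illustration of the churn example, the sketch below fits a logistic regression (a standard model for a yes/no outcome such as churn) by batch gradient descent, using only the Python standard library. The two features (support calls and years as a customer) and the eight customer records are made up for illustration; a production model would use a library such as scikit-learn, far more data, and held-out evaluation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this customer: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def churn_probability(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical history: [support_calls, years_as_customer] -> churned (1) or stayed (0).
X = [[5, 0.5], [4, 1.0], [6, 0.2], [3, 0.8], [0, 4.0], [1, 3.0], [0, 5.0], [1, 2.5]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_logistic(X, y)
print(round(churn_probability(w, b, [5, 0.5]), 3))  # frequent caller, new customer
print(round(churn_probability(w, b, [0, 4.0]), 3))  # quiet, long-tenured customer
```

The fitted weights are also a simple form of transparency: their signs show which features push the predicted churn risk up or down.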
One challenge in business applications involves removing human bias from data. Systems designed to identify fraud, predict crime, or calculate credit scores, for example, are trained on historical decisions that can encode the implicit biases of the agents, police officers, and bank officials who made them. Cleaning the data can be challenging.
Finally, many machine learning models today are inherently black boxes. Data scientists may need to design transparency into such systems, especially in regulated environments, even if doing so involves some tradeoffs in performance. Because of the intensive ongoing research in this field, transparency is likely to improve in the next five years.
Planning and exploring agents can help identify the best sequence of actions to achieve a goal. Self-driving cars rely heavily on this building block for navigation. Identifying the best sequence of actions becomes vastly more difficult as additional agents and actions enter the picture. A fast-growing subfield, reinforcement learning, emphasizes receiving an occasional hint or reward rather than explicit instructions. Reinforcement learning was instrumental in Google DeepMind’s success in the game of Go and is also closely associated with the way the human brain learns through trial and error.
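The idea of learning from an occasional reward can be shown with tabular Q-learning, one textbook reinforcement learning algorithm (a deliberately tiny stand-in, not the method behind DeepMind's Go program). In this hypothetical five-cell corridor, the agent is never told which way to go; it receives a reward of 1 only on reaching the last cell and must discover the best sequence of moves by trial and error. The corridor, rewards, and parameter values are illustrative assumptions.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal (hypothetical setup)
ACTIONS = (-1, 1)   # step left or step right


def step(state, action):
    """Move within the corridor; reward 1.0 only when the goal is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit current estimates, breaking ties at random.
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Nudge the estimate toward reward plus discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # learned move for each non-goal cell
```

With these settings the learned policy steps right in every cell. Adding more agents or actions enlarges the table of state-action pairs very quickly, which is one way to see why planning becomes vastly harder as the problem grows.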
Image generation is the opposite of machine vision; it creates images based on models. Still in its infancy, this building block can complete images in which the background is missing, for example, or can alter a photograph to render it in the style of, say, Vincent van Gogh. Image generation is the engine behind virtual- and augmented-reality tools such as Snapchat's masks. Companies developing it are currently active M&A targets for large tech companies.
Speech generation covers both data-based text generation and text-based speech synthesis. Alexa exemplifies the capabilities of text-to-speech generation today. This building block is starting to allow news organizations to automate the writing of basic sports and earnings reports, such as game summaries and financial news releases. Within the next five years, speech generation will likely be able to incorporate the rhythm, stress, and intonation that make speech sound natural. Music generation will become more personalized in the near future, too.
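Basic automated sports summaries can be produced with data-to-text templates, one common and simple approach (newsrooms may use more elaborate systems). The sketch below turns a box score into a sentence and picks a verb from the winning margin. The `GameResult` fields, the margin thresholds, and the team and player names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GameResult:
    home: str
    away: str
    home_score: int
    away_score: int
    top_scorer: str
    top_points: int


def summarize(game: GameResult) -> str:
    """Render a one-sentence game summary from structured data."""
    if game.home_score == game.away_score:
        return f"{game.home} and {game.away} finished tied at {game.home_score}."
    if game.home_score > game.away_score:
        winner, loser = game.home, game.away
        w, l = game.home_score, game.away_score
    else:
        winner, loser = game.away, game.home
        w, l = game.away_score, game.home_score
    margin = w - l
    # Hypothetical thresholds for choosing a more vivid verb.
    verb = "edged" if margin <= 3 else "beat" if margin <= 15 else "routed"
    return f"{winner} {verb} {loser} {w}-{l}, led by {game.top_scorer} with {game.top_points} points."


print(summarize(GameResult("Hawks", "Bears", 101, 99, "J. Doe", 31)))
```

Because every sentence comes from a template, output stays accurate but formulaic, which is why this approach suits routine reports rather than analysis.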
Handling and control refers to interactions with real-world objects. For example, robots already learn from humans on the factory floor, but they have trouble with novel or fluid tasks such as slicing bread or feeding elderly people. As companies globally pour money into this field, robots should become much better at picking up novel items in warehouses and displaying fluid, humanlike motion and flexibility.
Navigating and movement covers the ways in which robots move through a given physical environment. Self-driving cars and drones do reasonably well with their wheels and rotors, but walking on legs—especially a single pair of legs—is a much more difficult challenge. Robots that can fluidly climb stairs or open doors will not arrive for a few more years. Four-legged robots require less balance, however, and current models are already able to navigate environments that are effectively inaccessible to wheeled vehicles.
The BCG Henderson Institute is The Boston Consulting Group's internal think tank, dedicated to exploring and developing valuable new insights from business, technology, and science by embracing the powerful technology of ideas. The Institute engages leaders in provocative discussion and experimentation to expand the boundaries of business theory and practice and to translate innovative ideas from within and beyond business.