Artificial intelligence is no longer a single thing. It’s a collection of distinct capabilities, each solving a different class of problem. That distinction matters for product leaders. Knowing what each capability does, and what it can’t do on its own, is what allows you to move from “we should use AI” to a clear answer about which capabilities, in what combination, address the specific problem your users are facing.
Here’s a working map of the capabilities most relevant to modern product development, plus three examples of how familiar products combine them to solve real user problems.
The Core Capabilities
Natural Language Processing (NLP) enables machines to read, interpret, and respond to human language. It underpins everything from search and summarization to customer service automation and contract analysis.
Translation converts text or speech from one language to another while preserving meaning and, ideally, tone. Automated translation has improved substantially in recent years, though high-stakes professional translation still benefits from human review.
Semantic search retrieves results based on meaning rather than exact keyword matching. It understands that “how to reduce employee turnover” and “improving staff retention” are asking the same thing. This is how enterprise knowledge tools and modern search products surface relevant results even when the user’s phrasing doesn’t match the source document.
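To make the idea concrete, here is a minimal sketch of ranking by meaning rather than by shared words. The three-dimensional vectors are hand-assigned, hypothetical stand-ins for what a trained embedding model would produce, and the document titles are illustrative. The function names are assumptions, not any particular product’s API.

```python
import math

# Hypothetical, hand-assigned "embeddings". A real system would get these
# vectors from a trained embedding model; the numbers here are made up
# purely to illustrate the ranking step.
DOCS = {
    "improving staff retention": [0.9, 0.1, 0.0],
    "quarterly revenue report": [0.1, 0.9, 0.1],
    "office parking policy": [0.0, 0.2, 0.9],
}
QUERY_TEXT = "how to reduce employee turnover"
QUERY_VECTOR = [0.8, 0.2, 0.1]  # assumed output of the same hypothetical model


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_overlap(query, doc):
    """Count the words two strings share, as a keyword matcher would."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def semantic_search(query_vector, docs):
    """Return document titles ordered from most to least similar in meaning."""
    ranked = sorted(
        docs.items(),
        key=lambda pair: cosine_similarity(query_vector, pair[1]),
        reverse=True,
    )
    return [title for title, _ in ranked]


results = semantic_search(QUERY_VECTOR, DOCS)
best_match = results[0]
shared_words = keyword_overlap(QUERY_TEXT, best_match)
print(best_match, "| shared keywords:", shared_words)
```

With these assumed vectors, the top result shares no words with the query, which is the case keyword matching would miss.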

Speech recognition and synthesis convert spoken language to text and text back to speech. Recognition powers voice assistants and real-time transcription. Synthesis generates spoken audio from text, including AI-generated voices that approximate human intonation. Combined, these capabilities form the audio interface layer for a growing number of applications.
Speaker identification goes beyond transcription to determine who is speaking. It distinguishes between individuals based on the unique characteristics of their voice. This is what enables meeting transcription tools to attribute statements to the right person, or security systems to authenticate users by voice.
Computer vision enables machines to extract meaning from images and video. It covers tasks including identifying objects, reading documents, detecting defects, and understanding scenes. Industrial quality control, autonomous navigation, and medical imaging analysis are all built on this capability.
Face recognition is a specialized application of computer vision that identifies or verifies a person based on the geometric features of their face. It is used in device authentication, building access, photo organization, and law enforcement, and carries more regulatory and ethical weight than most other capabilities on this list.
Content generation produces original text, images, audio, video, or code in response to a prompt or set of instructions. Practical applications range from first-draft copywriting and code generation to synthetic data creation and personalized customer communications.
Content prediction anticipates what a user wants to see, hear, or do next, based on their past behavior and the behavior of similar users. Recommendation engines are the clearest example. The distinction from semantic search is intent: semantic search answers a query, while content prediction surfaces relevant content before a query is even formed.
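One simple way to act on “the behavior of similar users” is user-based collaborative filtering. The sketch below is an illustration under stated assumptions, not how any specific recommendation engine works: the listening history, user names, and track names are hypothetical, and Jaccard overlap is just one of many possible similarity measures.

```python
# Hypothetical listening history: user -> set of items they have played.
HISTORY = {
    "ana": {"track_a", "track_b", "track_c"},
    "ben": {"track_a", "track_b", "track_d"},
    "cy": {"track_e", "track_f"},
}


def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_next(user, history):
    """Suggest unseen items, weighted by how similar their fans are to `user`."""
    seen = history[user]
    scores = {}
    for other, items in history.items():
        if other == user:
            continue
        weight = jaccard(seen, items)
        if weight == 0:
            continue  # ignore users with nothing in common
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + weight
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))


suggestions = predict_next("ana", HISTORY)
print(suggestions)
```

No query is involved at any point: the suggestion comes entirely from past behavior, which is the distinction from semantic search drawn above.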
Anomaly detection identifies data points that deviate meaningfully from an established baseline. While content prediction asks “what comes next?” anomaly detection asks “what doesn’t belong?” It’s used in fraud detection, cybersecurity, equipment monitoring, and supply chain integrity.
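As a rough illustration of “deviating from an established baseline,” the sketch below flags values that sit more than a chosen number of standard deviations from the baseline mean. The transaction amounts and the threshold of 3.0 are assumptions chosen for the example; production systems typically use richer models than a single z-score.

```python
import statistics


def find_anomalies(baseline, new_values, threshold=3.0):
    """Return the new values whose z-score against the baseline exceeds threshold."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        # A perfectly flat baseline: anything different is out of pattern.
        return [v for v in new_values if v != mean]
    return [v for v in new_values if abs(v - mean) / stdev > threshold]


# Hypothetical card transaction amounts for one customer.
baseline_amounts = [42.0, 38.5, 45.0, 40.0, 41.5, 39.0, 44.0, 43.0]
new_amounts = [41.0, 37.0, 980.0, 46.0]

flagged = find_anomalies(baseline_amounts, new_amounts)
print(flagged)
```

The ordinary amounts pass through untouched; only the value far outside the customer’s usual range is flagged for review.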

Forecasting and predictive analytics uses historical patterns to generate quantitative predictions about future outcomes, such as demand, churn, revenue, or equipment failure timelines. Unlike content prediction, which is about personalization, forecasting produces actionable forward-looking estimates from structured data. It’s a core capability in operations, finance, and supply chain product development.
Sensor data analysis interprets real-time data streams from physical devices such as GPS modules, temperature sensors, cameras, and accelerometers. Predictive maintenance in manufacturing, patient monitoring in healthcare, and real-time traffic inference in navigation apps all depend on this capability.
Emotion recognition attempts to infer emotional states from facial expressions, voice, or written language. It’s used in customer experience research, mental health support tools, and human-computer interaction design. Accuracy and cultural generalizability remain active areas of research, and the ethical considerations around consent and inference are significant.
Three Examples from Products You Know
1. Apple Face ID: Face Recognition + Sensor Data Analysis

Passwords are easy to forget, easy to steal, and slow to type. Apple built Face ID to eliminate that friction entirely, making device authentication invisible to the user.
When you glance at your phone, the camera system reads a stream of data about the physical geometry of your face. That is sensor data analysis: converting a real-time input from hardware into something the system can work with. According to Apple’s Face ID security documentation, a separate neural network then takes that representation and compares it against the template created when you enrolled your face. That is face recognition.
Sensor data analysis produces the input. Face recognition makes the decision. The user experiences neither. Just a phone that unlocks when they look at it.
2. Google Maps: Sensor Data Analysis + Forecasting + Content Prediction

Driving is unpredictable. Traffic jams appear without warning, conditions change mid-trip, and the fastest route at departure may not be the fastest route ten minutes later. Google added AI-driven traffic prediction to Maps to take that uncertainty off the driver’s plate.
The foundation is sensor data analysis. GPS speed and location signals from millions of phones moving through road networks tell the system where traffic is moving and where it isn’t. But knowing current conditions isn’t enough to give you an accurate arrival time, because the road you’ll be on in 20 minutes may look nothing like it does right now. This is where forecasting comes in. According to Google’s account of the system, Maps combines live conditions with years of historical traffic patterns to estimate what roads will look like further along your route. In a partnership with DeepMind, Google improved ETA accuracy in a number of cities, as described in DeepMind’s published account of that work.
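The idea of combining live conditions with historical patterns can be sketched in a few lines. This is not Google’s or DeepMind’s method, which the source does not detail. It is a toy model built on one assumption: the further ahead in time you reach a road segment, the less today’s live speed tells you and the more the typical speed for that time of day should count. The `Segment` type, the `half_life_min` parameter, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    length_km: float
    live_speed_kmh: float        # speed observed on this segment right now
    historical_speed_kmh: float  # typical speed for this time of day


def estimate_eta_minutes(segments, half_life_min=15.0):
    """Estimate travel time, trusting live data less for segments reached later."""
    elapsed_min = 0.0
    for seg in segments:
        # Weight on live data halves every `half_life_min` minutes of travel.
        live_weight = 0.5 ** (elapsed_min / half_life_min)
        expected_speed = (
            live_weight * seg.live_speed_kmh
            + (1 - live_weight) * seg.historical_speed_kmh
        )
        elapsed_min += seg.length_km / expected_speed * 60
    return elapsed_min


# Hypothetical route: both segments are clear right now, but the second
# one is usually slow by the time a driver would reach it.
route = [
    Segment(length_km=10.0, live_speed_kmh=60.0, historical_speed_kmh=40.0),
    Segment(length_km=10.0, live_speed_kmh=60.0, historical_speed_kmh=30.0),
]

eta = estimate_eta_minutes(route)
live_only = sum(s.length_km / s.live_speed_kmh * 60 for s in route)
print(f"blended ETA: {eta:.1f} min, live-only ETA: {live_only:.1f} min")
```

In this example the blended estimate comes out longer than the live-only one, because the model expects the second segment to have slowed toward its historical pattern by the time the driver arrives.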
Content prediction handles the last part of the problem. Maps doesn’t wait for you to ask for an alternate route. It surfaces one before you think to look, based on changing conditions ahead. The mid-trip reroute suggestion is content prediction: the system anticipating what you’d want next without you asking for it.
3. Spotify: Content Prediction + NLP + Semantic Search + Content Generation

Users want to find music they’ll love without spending time looking for it, and they want variety without irrelevance. Spotify developed its recommendation system to solve that problem by surfacing the right music for each listener automatically.
Content prediction is the core of Discover Weekly and similar features. It analyzes listening behavior across users to identify patterns and infer what you’re likely to want next, before you’ve searched for anything. But content prediction built on listening behavior alone tends to skew toward popular tracks rather than personally relevant ones. This is where natural language processing adds depth: NLP is applied to playlist titles, editorial descriptions, and social text to understand the cultural meaning of a song, not just its audio characteristics.
Semantic search plays a supporting role as well. When Spotify users describe what they want in natural language, such as “something like a late-night drive playlist,” the system needs to match that intent to actual tracks. That requires understanding meaning, not just matching keywords to metadata.
Spotify’s own research describes how the company has integrated large language models to generate personalized explanations for recommendations, giving listeners context for why a track might resonate with them beyond “listeners like you also enjoyed.” The AI DJ feature extends this further: it combines content prediction to select tracks with content generation to produce spoken commentary that contextualizes the music in real time. The DJ doesn’t just play songs; it narrates a listening session.
What these examples have in common is that the user problem drove the capability combination, not the other way around. Choosing the right combination starts with knowing the landscape. Understanding what each capability does, and what it can’t do, is what allows you to assess whether AI belongs in a solution at all, and if it does, which capabilities to reach for.
