Traditional demand planning suffers from several structural problems. It relies on siloed data sources, such as ERP systems and historical sales, and struggles to integrate real-time or unstructured data. Forecasts are typically updated on a fixed cadence (weekly or monthly), which means planners often react to demand shifts too late. And trying to explore “what-if” scenarios can test the patience of even the most seasoned planner. Bring generative AI (GenAI) into the process, though, and things get faster and more flexible.
For example, at a large distributor, we developed an Abnormal Margin Co-Analyst that automatically flags, prioritizes, and diagnoses unusual margin patterns, surfacing root causes and actionable insights in real time. This innovation is expected to generate $15–20 million in annualized savings by helping teams focus on what truly matters. We also built an AI-powered chat assistant to support planners in constructing econometric forecasts. The assistant can answer questions about forecast builds, explain model configurations, trigger specific pipelines, and make targeted adjustments, dramatically reducing preparation time and improving agility in decision-making.
Together, these use cases illustrate how GenAI can reimagine demand planning, moving it from a reactive, rigid process to one that’s dynamic, data-rich, and continuously learning.
This post proposes a production-ready framework for integrating the best elements of GenAI into demand-planning workflows. By combining deterministic calculations, AI-guided analysis, and a traceable state model, organizations can go far beyond traditional planning to deliver precision, consistency, and explainability.
Challenges in Demand Planning
The analytics of demand planning rely on several interlocking requirements that make it both technically demanding and operationally critical. The analytics depend, first of all, on the ability to compare forecasts across different time periods or against actual outcomes and then identify significant deviations. They also involve planning data that may span millions of rows across multiple dimensions, requiring efficient filtering to surface meaningful variances, all of which is difficult to accomplish with traditional planning tools.
Successful analytics demand the ability to use both statistical methods and domain knowledge to identify what constitutes “significant” variance (is a 10% error on a high-volume product more important than a 50% error on a low-volume product?). Once the analytics have identified the variances, the demand planners are obliged to trace these variances to their sources using multiple datasets and causal reasoning. When they’ve concluded their planning process, they have to communicate the results to stakeholders in a format that is consistent and conforms to their organization’s standards.
Limitations of Purely Prompt-Based GenAI
Not that GenAI by itself is the perfect solution. While it can conceptually handle all these tasks, it has real limitations. It’s not designed for high-precision calculations and may, for example, inconsistently round error-concentration calculations up or down, or simply make arithmetic mistakes. GenAI is also prone to producing sui generis outputs: ask it to “find forecast errors and explain them” and you will get a different answer on each run, a serious problem for business users who need stable, auditable insights. Add to that the fact that there’s still no way to see how the AI’s black-box reasoning arrives at its conclusions. Then consider that using simple prompts for complex calculations like error-concentration metrics may not end well, and that LLMs have a reputation for generating convincing-sounding explanations for whatever answers they deliver, and it might be tempting for some demand planners to revert to traditional methods. The rest of this article explains how to remove or minimize these issues so you can begin using GenAI to take your demand planning to an entirely new level.
A Structured Approach: Graph-Constrained Agent Workflows with State Management
Rather than relying solely on prompts, our BCG X team has found that a more effective approach is to use a structured workflow with clearly defined roles:
- Workflow Graph with Validation Cycles and Logic Loopbacks
Implement a cyclic workflow in which each stage validates its own outputs and can loop back to an earlier stage when those checks fail.
This pattern enables powerful control-loop mechanisms: agents self-correct through feedback by evaluating their own outputs and then initiating remedial actions when quality thresholds aren’t met. The process includes dynamically implementing adaptive sampling and adjusting data granularity based on detected error patterns, establishing validation checkpoints that must be passed before proceeding to subsequent analysis stages, and creating learning loops that track successful analysis patterns and improve performance over time.
For example, if a data query returns unexpected results, the system can retry with clarifications:
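The sketch below is a minimal illustration of such a retry loop. The llm.generate_sql and db.run_query interfaces, and the validation rules, are hypothetical stand-ins for whatever query-generation and execution layer you use.

```python
import pandas as pd

MAX_RETRIES = 3

def validate_results(df: pd.DataFrame) -> str:
    """Return a description of any data-quality problem, or an empty string if the result looks usable."""
    if df.empty:
        return "query returned no rows"
    if df.isna().all(axis=None):
        return "query returned only null values"
    return ""

def query_with_retry(question: str, llm, db) -> pd.DataFrame:
    """Generate and run a SQL query, feeding validation failures back to the LLM on each retry."""
    feedback = ""
    for attempt in range(1, MAX_RETRIES + 1):
        sql = llm.generate_sql(question, extra_context=feedback)  # hypothetical: LLM drafts the query
        result = db.run_query(sql)                                # hypothetical: deterministic execution
        problem = validate_results(result)
        if not problem:
            return result
        # Clarify the request with what went wrong before retrying
        feedback += f"\nAttempt {attempt} failed validation: {problem}. Please adjust the query."
    raise RuntimeError(f"Data still failing validation after {MAX_RETRIES} attempts: {problem}")
```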
- Global State Management
To maintain context and track execution, we recommend creating a well-structured global state management system. The following example uses an AnalysisState type with these key components:
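A minimal sketch of what such a state object might look like, assuming a TypedDict-style definition; the field names are illustrative rather than a prescribed schema:

```python
from typing import Any, TypedDict

class AnalysisState(TypedDict, total=False):
    """Global state carried through the workflow; every node reads from and writes to it."""
    request: str                          # the planner's original question
    raw_data: dict[str, Any]              # query results keyed by hierarchy level (brand, category, SKU)
    metrics: dict[str, Any]               # deterministically calculated error metrics
    significant_contributors: list[dict]  # records flagged by the threshold logic
    context_data: dict[str, Any]          # supplementary sources (e.g. Nielsen, DAR)
    validation_log: list[str]             # checkpoint results, kept for auditability
    retries: dict[str, int]               # retry counts per stage
    insights: str                         # the final natural-language analysis
```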
This state object enables planners to comprehensively track the analysis process, making the system’s reasoning transparent and debuggable. Note that this approach to state management requires careful consideration of memory tradeoffs, such as raw versus consolidated data, summary versus full data, and in-memory versus persistent storage. But with the correct design decisions, the approach ensures that the workflow remains responsive—even with large datasets—while maintaining a complete audit trail of the analysis process.
- Task Separation
Rather than using the AI as a monolithic system, we recommend delegating key tasks to specialized components:
- SQL Generation: The LLM translates structured prompts into precise SQL queries, executes them, and performs initial validation to ensure completeness and correctness.
- Metric Calculation: All statistical computations are handled deterministically using pandas, ensuring precision and repeatability without relying on the LLM.
- Threshold Application: The LLM applies domain-informed, subjective thresholds to algorithmically flag significant contributors based on calculated metrics.
- Market Context Integration: Leveraging the existing output, the LLM dynamically determines which supplementary data sources to query (e.g. Nielsen, DAR) and how to enrich the analysis with them.
- Natural Language Insights: The LLM interprets the structured data and generates contextualized, human-readable explanations to clearly communicate root causes and patterns.
For example, when calculating forecast error metrics, the system uses deterministic functions rather than asking the LLM to perform math:
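A simplified sketch of what such deterministic functions can look like; the column names and the exact metric definitions (here a weighted MAPE) are illustrative:

```python
import pandas as pd

def calculate_error_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Compute row-level forecast-error measures with pandas; no LLM arithmetic involved."""
    out = df.copy()
    out["abs_error"] = (out["actual"] - out["forecast"]).abs()
    # Mask zero actuals so the percentage error becomes NaN instead of dividing by zero
    out["ape"] = out["abs_error"] / out["actual"].where(out["actual"] != 0)
    out["bias"] = out["forecast"] - out["actual"]
    return out

def summarize_metrics(df: pd.DataFrame, level: str) -> pd.DataFrame:
    """Aggregate errors to the requested hierarchy level (brand, category, SKU)."""
    grouped = df.groupby(level, as_index=False).agg(
        actual=("actual", "sum"),
        forecast=("forecast", "sum"),
        abs_error=("abs_error", "sum"),
    )
    grouped["mape"] = grouped["abs_error"] / grouped["actual"]  # weighted MAPE
    grouped["bias_pct"] = (grouped["forecast"] - grouped["actual"]) / grouped["actual"]
    return grouped
```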
- Data-First Approach
Our approach prioritizes data collection and structuring before analysis. This data-first approach ensures that instead of speculating on incomplete data, the AI has all relevant information available when forming conclusions (a minimal orchestration sketch follows the list below):
- Query base forecast vs. actual data
- Calculate error metrics (MAPE, bias, etc.)
- Identify significant contributors based on statistical thresholds
- Pull in contextual data (Nielsen market share, DAR drivers)
- Only then perform analysis and generate insights
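As an orchestration sketch (the node names are illustrative and correspond to the stage sketches later in this post; wiring details such as arguments and error handling are omitted), the ordering can be as simple as:

```python
def run_analysis(state: AnalysisState) -> AnalysisState:
    """Data-first orchestration: collect and structure everything before any AI reasoning."""
    state = collect_forecast_vs_actual(state)   # 1. query base forecast vs. actual data
    state = calculate_metrics(state)            # 2. deterministic MAPE, bias, etc.
    state = identify_contributors(state)        # 3. flag significant contributors via thresholds
    state = fetch_context_data(state)           # 4. Nielsen market share, DAR drivers
    state = generate_insights(state)            # 5. only now ask the LLM to reason and explain
    return state
```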
- Analysis at the Right Stage
Only after the data is properly aggregated and structured is it time to apply AI’s strengths in pattern recognition and natural language generation.
Putting analysis at the end of the workflow is a good idea for several reasons:
- Rapid iteration on analysis logic: With all data pre-collected and structured, analysts can modify prompts to quickly generate new insights without having to re-run the entire data pipeline.
- Complex analysis without complex code: The LLM, for its part, can generate sophisticated multi-factor analyses by examining relationships across various data sources.
- Contextual adaptation: Because different business scenarios require different analysis approaches, our structured system can adjust prompts based on detected patterns.
- Prompt-engineering efficiency: Since all relevant data is already in the context, prompt engineers can focus their attention on analysis quality rather than extraction logic. This enables non-technical business experts to refine analysis prompts without having to ask more technically minded staff to change the underlying data pipeline.
- Maximum leverage of context windows: And because modern LLMs have such expansive context windows, they can process all relevant data simultaneously, enabling cross-referencing between metrics and hierarchy levels that in traditional programming would require complex logic and that would be tough for humans to pull off manually.
Here’s a practical example of the benefit of this approach: when planners need to incorporate new logic, such as mapping pipeline volume into the error calculation, all they have to do is update the prompt template to include this dimension, without modifying the data collection or calculation code. The system leverages the already-collected metrics to immediately generate new insights.
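As a hedged illustration (the prompt text here is hypothetical, not our production template), the change can be as small as appending one instruction to the existing analysis prompt:

```python
# Hypothetical existing analysis prompt, unchanged
BASE_ANALYSIS_PROMPT = (
    "Using the error metrics and market context below, explain the main "
    "drivers of forecast error for the flagged products."
)

# The only change needed for the new requirement: one extra instruction
PIPELINE_VOLUME_INSTRUCTION = (
    "Also assess how pipeline volume contributes to the error and state "
    "how much of the variance it explains."
)

analysis_prompt = BASE_ANALYSIS_PROMPT + "\n" + PIPELINE_VOLUME_INSTRUCTION
```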
Prescriptive Prompting for Structured Reasoning
We’re also excited about the way we’ve implemented structured causal reasoning frameworks within prompts that guide the AI through domain-specific analytical pathways. Just as our approach uses carefully designed workflows to process data, it applies equally rigorous structure to the reasoning process itself. Our prescriptive approach ensures that AI analysis follows expert human reasoning patterns, rather than taking those all-too-familiar, unpredictable paths that might seem plausible but lack grounding in domain expertise. For example, when analyzing forecast errors, our system doesn’t just ask the AI to “explain the variance.” Instead, it provides a step-by-step causal reasoning framework that mirrors how an experienced human demand planner would approach the problem:
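A condensed sketch of what such a framework can look like inside the prompt; the specific steps and wording below are illustrative, not our production prompt:

```python
CAUSAL_REASONING_FRAMEWORK = """
Analyze the forecast errors below by working through these steps in order:
1. Quantify: state the size and direction of the error at each hierarchy level provided.
2. Localize: identify which brands, categories, or SKUs contribute most to the error.
3. Correlate: check the supplied market context (e.g. Nielsen share shifts, DAR drivers)
   for movements that coincide with the error.
4. Attribute: propose the most likely root cause, citing only the data provided.
5. Qualify: state your confidence and what additional data would confirm or refute the cause.
Do not introduce explanations that are not supported by the data above.
"""
```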
In effect, this approach guides the AI to follow the same reasoning processes as a human analyst, while staying grounded in the actual data.
The Workflow in Action: A Demand-Planning Example
Now let’s walk through how this structured approach works in practice.
- Data Collection and Validation
The workflow begins by gathering forecast vs. actual data at various hierarchy levels. The graph’s first stage queries the database for each level (brand, category, SKU), validates the results, and stores the raw data in the state:
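A simplified sketch of this first stage, reusing the hypothetical query_with_retry helper and AnalysisState type from the earlier sketches:

```python
HIERARCHY_LEVELS = ["brand", "category", "sku"]

def collect_forecast_vs_actual(state: AnalysisState, llm, db) -> AnalysisState:
    """Query forecast vs. actual data at each hierarchy level, validate it, and store it in the state."""
    state.setdefault("raw_data", {})
    state.setdefault("validation_log", [])
    for level in HIERARCHY_LEVELS:
        question = f"Forecast vs. actual volume by {level} for the analysis period"
        df = query_with_retry(question, llm, db)   # retry loop from the earlier sketch
        state["raw_data"][level] = df
        state["validation_log"].append(f"{level}: {len(df)} rows collected and validated")
    return state
```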
NOTE: Each level is processed systematically, with validation to ensure data quality.
- Metric Calculation and Contributor Identification
Once the raw data is available, the workflow calculates forecast-error metrics deterministically, using the same kind of pandas functions sketched earlier.
For significant-contributor identification, the system first calculates a concentration ratio to understand error distribution, then dynamically adjusts thresholds based on this concentration, and finally selects records meeting either coverage or contribution criteria.
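A minimal sketch of that selection logic; the 10% window and the threshold values are illustrative, not tuned recommendations:

```python
import pandas as pd

def identify_contributors(errors: pd.DataFrame) -> pd.DataFrame:
    """Flag the records that explain most of the error, adapting thresholds to how concentrated it is."""
    ranked = errors.sort_values("abs_error", ascending=False).copy()
    ranked["error_share"] = ranked["abs_error"] / ranked["abs_error"].sum()
    ranked["cumulative_share"] = ranked["error_share"].cumsum()

    # Concentration ratio: how much of the total error sits in the top 10% of records
    top_n = max(1, int(len(ranked) * 0.10))
    concentration = ranked["error_share"].head(top_n).sum()

    # Dynamically adjust thresholds: concentrated errors need fewer records to explain them
    coverage_target = 0.80 if concentration > 0.5 else 0.60
    min_contribution = 0.05 if concentration > 0.5 else 0.02

    # Keep records that fall within the cumulative coverage target OR individually contribute enough
    keep = (ranked["cumulative_share"] <= coverage_target) | (ranked["error_share"] >= min_contribution)
    return ranked[keep]
```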
- Contextual Data Integration
To enable root cause analysis, the workflow conditionally fetches additional data sources, including real-time or unstructured data:
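A simplified sketch of this conditional enrichment step; llm.select_sources is a hypothetical interface that asks the model to choose from a whitelist of known sources rather than invent its own:

```python
def fetch_context_data(state: AnalysisState, llm, db) -> AnalysisState:
    """Let the LLM decide which supplementary sources are relevant, then fetch them deterministically."""
    contributors = state["significant_contributors"]
    sources = llm.select_sources(                      # hypothetical: the LLM picks from known options
        contributors=contributors,
        available=["nielsen_market_share", "dar_drivers", "promotions"],
    )
    state["context_data"] = {}
    for source in sources:
        question = f"{source} records for the flagged products over the analysis period"
        state["context_data"][source] = query_with_retry(question, llm, db)
    return state
```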
These retry loops ensure robustness against temporary data issues, implementing automatic retries with increasingly detailed context.
- Analysis with Structured Reasoning
Finally, when all data is available, the analysis step applies structured reasoning:
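A simplified sketch of that final step, reusing the state fields and reasoning framework from the earlier sketches; the naive str() serialization and the llm.complete interface are placeholders:

```python
def generate_insights(state: AnalysisState, llm) -> AnalysisState:
    """Assemble the already-validated data into a single prompt and let the LLM interpret it."""
    sections = [
        CAUSAL_REASONING_FRAMEWORK,                # structured reasoning steps from the prompt sketch above
        "Error metrics by hierarchy level:",
        str(state["metrics"]),
        "Significant contributors:",
        str(state["significant_contributors"]),
        "Market context:",
        str(state["context_data"]),
    ]
    # The LLM interprets and explains; it never recalculates the numbers
    state["insights"] = llm.complete("\n\n".join(sections))
    return state
```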
The AI is given precise instructions on how to approach the reasoning, following the causal framework described earlier.
This structured approach ensures the reasoning follows business logic and remains consistent across analyses.
Conclusion
Integrating GenAI into demand planning isn’t just a matter of adding prompts to an existing workflow. It requires the creation of an architecture that separates deterministic calculation from AI interpretation, enforces validation cycles to guarantee data quality, and maintains comprehensive state tracking for transparency and debugging. Equally important is the prioritization of data structuring before AI reasoning, ensuring that the model operates on validated facts rather than on assumptions.
By combining the precision of traditional analytics with the interpretive power of GenAI, organizations can produce demand-planning analyses that are both consistent and insightful. The right balance is all about taking what AI does best (recognizing patterns, interpreting context, and communicating insights) and combining it with deterministic methods for the calculations that underpin trustworthy analysis. This structured approach transforms GenAI from a novelty into a production-grade tool for enterprise demand planning, one that delivers insights that are not only intelligent but also explainable, repeatable, and actionable.