Eight Tips to Make GenAI Do What You Want

Article · 5 min read

Key Takeaways

An AI-first consultant with a PhD in astrophysics shares what he has learned from querying GenAI chatbots up to 100 times a day. Three big takeaways:
  • Try to write specific prompts that solve specific problems or answer specific questions.
  • By trying to do less with each prompt, you improve reliability.
  • These small building blocks can then become the foundation for tremendous productivity enhancement.

I’ve been accused of being a futurist. My background as an astrophysicist speaks to my deep interest in the forces driving space and time. In my PhD research, I looked at clouds of gas billions of light years from Earth with the world’s largest optical telescopes to test whether the fundamental constants of physics differ depending on time and space. This work led me into the fields of statistics and programming that set me up for my career at BCG, deep in the AI revolution.

During the height of the pandemic, I led BCG’s epidemiology modeling work supporting the firm’s response to COVID-19. I’m a founding member of BCG X, the firm’s tech design and build unit, where I work to realize the potential from AI agents.

Given the tremendous leap in AI capabilities in recent years, I’m benefiting greatly from an AI-first approach to work and productivity. I always consider AI as the first option for most tasks. Can I do it better or faster with AI? Can I achieve more? On a typical day, I may have 30 to 100 conversations with a GenAI service.

If AI will reduce my cognitive load from, say, ten minutes of complex thinking to two, I’ll put a large language model (LLM) to work in the background while I solve other problems. If I’m in a taxi, I’ll converse with a GenAI chat service to sharpen my thinking as I prepare for an upcoming meeting.

But AI will not automatically do your work for you. Getting great answers from LLMs requires a rigorous approach. The following eight best practices break down my most useful insights from the last few years:

Always ask twice. If you query ChatGPT or another GenAI tool and the output feels satisfying, resist the impulse to be done. Make a habit of prompting the model a second time, every time. “Answer/refine” is the name of the game.

Mathematically, your chances of an accurate output improve substantially after two queries. Suppose your first prompt generates an answer with an error rate of 30%, and your refining second prompt has a 20% error rate. If the two passes fail independently, the combined error rate falls to 30% × 20% = 6%, not bad for a single extra query.
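Under the (strong) assumption that the two passes err independently, the arithmetic is a one-liner. The rates below are the illustrative figures from the example above:

```python
# Chance that an error survives BOTH passes, assuming the passes
# fail independently (an idealized, illustrative assumption).
first_pass_error = 0.30   # error rate of the initial answer
second_pass_error = 0.20  # error rate of the refining pass

combined_error = first_pass_error * second_pass_error
print(f"{combined_error:.0%}")  # prints "6%"
```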

The lesson: people expect computer levels of precision from what is essentially a virtual person, and people are not perfect. People rework, edit, and improve most of their work output. Why should we not expect the same from a virtual person?

Keep your prompts simple. Just like people, LLMs cannot readily multitask. If you feed a model too many tasks at once, the quality of the output will suffer.

This observation is closely related to studies on air traffic controllers showing that their performance can degrade notably even if cognitive load increases only incrementally. Plan your work and implement it through a logical sequence of interactions.

Sequence your questions properly. The order in which you query the model significantly affects the answers it provides.
LLMs are autoregressive: they generate text one token at a time, conditioned only on the context that precedes it. Because the model operates in a strictly left-to-right fashion, it does not know what comes next—it can only make a best guess based on the past. That’s another reason why a generate-and-refine or generate-critique-update approach makes sense.
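A generate-critique-update loop can be sketched in a few lines. Here `ask` is a hypothetical stand-in for whatever chat-completion call your LLM provider exposes, and the prompt wording is illustrative, not a prescribed template:

```python
def generate_critique_update(ask, task: str) -> str:
    """Three passes: draft, critique, rewrite. `ask(prompt)` is a
    hypothetical callable that sends one prompt to an LLM and
    returns its text response."""
    draft = ask(f"Complete this task:\n{task}")
    critique = ask(
        f"Task:\n{task}\n\nDraft answer:\n{draft}\n\n"
        "List any errors, gaps, or weaknesses in the draft."
    )
    return ask(
        f"Task:\n{task}\n\nDraft:\n{draft}\n\nCritique:\n{critique}\n\n"
        "Rewrite the draft, fixing every issue the critique raises."
    )
```

Each pass sees everything the previous passes produced, so the final rewrite is conditioned on both the draft and its critique.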

Specify your tasks clearly. LLMs cannot read your mind. If your request is imprecise, you might get an unexpected output. When this happens, the most common cause is failure to provide sufficiently precise instructions and information.

To generate reliable outputs, be clear and precise in what you are asking. Don’t contradict yourself by, for example, asking for more detail and brevity at the same time. Define what is in and out of scope. Provide sample outputs. It’s okay if your prompt is several paragraphs long. By formalizing your task, you will improve the quality of the output.

Prompt for reasoning before recommending. You must ask the model to provide reasoning before making a recommendation, not after. If the recommendation comes first, the reasoning will be post hoc justification.

When AI first considers all potential choices and provides reasoning for them, its ultimate recommendation will be more grounded in logic and evidence. The latest “reasoning models” are built to do exactly this: think before acting.

Consider the difference: “Recommend a vendor, then justify your choice” invites post hoc justification, while “Weigh the pros and cons of each vendor, then recommend one” forces the reasoning to come first.

Prioritize output, then structure. If you impose strong constraints that focus on the structure of the output, the actual output is less likely to be accurate. The LLM is so focused on producing syntactically valid code or answers that it has less cognitive capacity to solve the actual task.

The key is to separate content from form. Get the content right, then fine-tune your overall output with a follow-up request for structural elegance.

LLMs tend to reason less effectively when asked to think and produce strict formats like JSON (a format for storing and exchanging data) at the same time. To avoid this trap, ask for a natural language response, and then query the model a second time to convert it to JSON.
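The same two-pass idea, sketched in code. As before, `ask` is a hypothetical stand-in for your provider's chat call, and the key names are illustrative:

```python
import json

def answer_then_json(ask, question: str) -> dict:
    """Pass 1: reason in plain language. Pass 2: convert to JSON."""
    prose = ask(question)  # unconstrained, so the model can think freely
    raw = ask(
        "Convert the following answer into a JSON object with the keys "
        f'"answer" and "rationale". Output JSON only:\n{prose}'
    )
    return json.loads(raw)  # the second pass only reformats, so it parses
```

The first prompt carries all the cognitive load; the second is a near-mechanical conversion the model rarely gets wrong.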

Ask for bite-size answers. All models have limits on the length of output they can generate. Many of the key commercial LLMs will not output more than about 1,000 words, no matter how hard you beg. And quality tends to fall as output length grows.

To combat this reality, break down the writing or coding into manageable chunks. You can still create long documents by stitching together a series of shorter ones.
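Stitching can be as simple as a loop over an outline, carrying recent sections forward as context. Again, `ask` is a hypothetical LLM call and the prompt wording is illustrative:

```python
def write_in_chunks(ask, title: str, outline: list[str]) -> str:
    """Generate one section per prompt, then join the results."""
    sections = []
    for heading in outline:
        # Pass only the most recent sections as context, to keep
        # each prompt short and each response focused.
        context = "\n\n".join(sections[-2:])
        sections.append(ask(
            f"Document: {title}\nSection to write: {heading}\n"
            f"It follows these sections:\n{context}\n"
            "Write this section in under 500 words."
        ))
    return "\n\n".join(sections)
```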

Check for hallucinations. While LLMs are trained on vast amounts of data, they can still fabricate details or misinterpret inputs. Review all output carefully.

Hallucinations can often be detected by simply asking if the LLM has erred. Take the LLM to task by asking: Are you sure your answer is factually correct? Have you made a claim that is untrue?


As a scientist, I base my hypotheses on a series of small observations—bite-size inquiries—that reveal larger patterns. As consultants, we take large, amorphous problems and break them down into small, solvable problems. The art of prompting is not that different. Try to write specific prompts that solve specific problems or answer specific questions. By trying to do less with each prompt, you improve reliability. These small building blocks can then become the foundation for tremendous productivity enhancement.


Authors

Julian King
Partner and Director, BCG X
Sydney
