This is Article 6 of 9 in our series: Advanced Prompt Engineering Mastery. Previous: Mastering Multimodal Prompting. Next: Intelligent Prompt Chaining and Meta-Prompting.

From Answering to Acting

Every technique we have covered so far — context engineering, branching reasoning, self-reflection, hallucination control, multimodal analysis — has one thing in common: the model produces an output and stops. You take that output, decide what to do with it, and if needed start another conversation. The human stays in the loop between every model call.

Agentic AI breaks that pattern. Instead of producing one output and waiting, an agentic system receives a goal, breaks it into steps, decides which tools to use at each step, executes those tools, observes the results, and continues until the goal is reached — or until it determines it cannot proceed and asks for guidance. The human may not intervene at all between the start and the end.

This is not a distant future capability. As of 2026, Claude, GPT-4o, and Gemini all support tool-use APIs that allow developers to give models access to web search, code execution, file systems, databases, and external services. Frameworks like LangChain, LlamaIndex, and Anthropic’s own Claude tooling make it practical to build multi-step agent pipelines without deep engineering expertise. And increasingly, non-developer tools — Zapier AI, Make with AI, Notion AI — are building agentic workflows into products that freelancers and professionals use every day.

Understanding how to prompt agentic systems is now a practical skill, not an academic one.

What Makes a System Agentic

Three properties distinguish an agentic system from a standard model call:

Planning. The system decomposes a high-level goal into a sequence of subtasks and decides in what order to address them. This planning may happen in a single initial step (plan-then-execute) or incrementally as the task unfolds (dynamic planning).

Tool use. The system can call external functions — search engines, code interpreters, APIs, databases, file readers — and incorporate their outputs into its reasoning. The model does not pretend to know things it cannot know; it fetches what it needs.

Memory and state. The system tracks what it has done and what it has learned across multiple steps. Without memory, each step starts from zero. With memory — whether in the context window, an external database, or a structured scratchpad — the system can build on previous results and avoid repeating work.
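
As a minimal sketch of the scratchpad idea, the Python below keeps the goal, the completed steps, and the learned facts in one object and renders them as text that could be placed at the top of each model call. The class name Scratchpad and its fields are illustrative assumptions, not part of any framework.

```python
from dataclasses import dataclass, field


@dataclass
class Scratchpad:
    """Hypothetical structured memory carried across agent steps."""
    goal: str
    completed: list[str] = field(default_factory=list)
    facts: dict[str, str] = field(default_factory=dict)

    def record(self, step: str, **facts: str) -> None:
        """Log a finished step and any facts learned during it."""
        self.completed.append(step)
        self.facts.update(facts)

    def render(self) -> str:
        """Produce the text block to include in the next model call."""
        lines = [f"GOAL: {self.goal}", "COMPLETED:"]
        lines += [f"  {i}. {s}" for i, s in enumerate(self.completed, 1)] or ["  (nothing yet)"]
        lines.append("KNOWN FACTS:")
        lines += [f"  {k}: {v}" for k, v in self.facts.items()] or ["  (none)"]
        return "\n".join(lines)


pad = Scratchpad(goal="Summarise the attached report")
pad.record("Read the report", page_count="12")
print(pad.render())
```

Because the goal is rendered on every call, the same structure also helps keep a long task anchored to its original objective.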

The shift from prompting to agentic design is the shift from giving instructions to defining objectives. You stop telling the model what to do and start telling it what you want achieved.

The ReAct Framework: Reason, Act, Observe

The most widely adopted conceptual framework for agentic prompting is ReAct (Reason + Act), introduced by Yao et al. at Princeton and Google in a 2022 paper. ReAct structures each step of an agentic task as a three-phase cycle:

Thought: The model reasons about the current state — what has been done, what is needed next, what tool or action is appropriate.
Action: The model selects and calls a tool (search, calculator, API, code runner) or takes an action in the environment.
Observation: The model receives the result of the action and incorporates it into its reasoning before the next cycle.

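The paper reported that ReAct outperformed action-only prompting on question-answering and fact-verification benchmarks (HotpotQA, FEVER), and beat imitation-learning and reinforcement-learning baselines on interactive decision-making benchmarks (ALFWorld, WebShop). Against pure chain-of-thought reasoning the results were closer, and the strongest question-answering results came from combining the two approaches. The common thread is that transparent reasoning works best when it is grounded in real observations.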
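
The cycle can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: scripted_model is a hypothetical stand-in for a real model call, and multiply is a made-up tool. The point is only the control flow, in which the observation always comes from the tool and never from the model.

```python
from typing import Callable


def multiply(expression: str) -> str:
    """Toy tool: multiply two integers written as 'a * b'."""
    left, right = expression.split("*")
    return str(int(left) * int(right))


TOOLS: dict[str, Callable[[str], str]] = {"multiply": multiply}


def scripted_model(transcript: list[str]) -> dict:
    """Hypothetical stand-in for a model call: acts first, answers once it has an observation."""
    observations = [line for line in transcript if line.startswith("OBSERVATION: ")]
    if not observations:
        return {"thought": "I need the product of 17 and 23.",
                "action": ("multiply", "17 * 23")}
    result = observations[-1].removeprefix("OBSERVATION: ")
    return {"thought": "The tool returned the product.", "final": result}


def react_loop(goal: str, model, tools, max_steps: int = 5) -> tuple[str, list[str]]:
    """Run Thought -> Action -> Observation until a final answer or the step limit."""
    transcript = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        step = model(transcript)
        transcript.append(f"THOUGHT: {step['thought']}")
        if "final" in step:
            return step["final"], transcript
        name, arg = step["action"]
        transcript.append(f"ACTION: {name}({arg})")
        # The observation is produced by the tool, not written by the model.
        transcript.append(f"OBSERVATION: {tools[name](arg)}")
    return "stopped: step limit reached", transcript


answer, transcript = react_loop("What is 17 * 23?", scripted_model, TOOLS)
print("\n".join(transcript))
print("FINAL ANSWER:", answer)
```

In a real system the scripted function would be replaced by an API call, but the loop, the transcript, and the step limit would look much the same.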

Template 1: The ReAct Agent Prompt

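This is the foundational template for building a ReAct-style agent. It works with any model that supports tool definitions (Claude, GPT-4o, Gemini). In a plain chat interface with no tools, you can still use it by running each action yourself and pasting the real result back in as the OBSERVATION. Do not let the model invent tool output from its own knowledge; that is exactly the hallucinated-tool-call failure described later in this article.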

You are an autonomous agent working to achieve the following goal.

GOAL: [describe the end state you want achieved, not the steps]

AVAILABLE TOOLS:
- search(query): Returns top web search results for a query.
- run_code(code): Executes Python code and returns output.
- read_file(path): Reads the contents of a file.
- write_file(path, content): Writes content to a file.
[add or remove tools as available in your environment]

WORKING METHOD:
At each step, follow this structure exactly:

THOUGHT: [reason about the current state — what do you know,
          what do you still need, what should you do next?]
ACTION: [name the tool and the exact input you will pass to it]
OBSERVATION: [the result returned by the tool — do not fabricate;
              wait for the actual result before continuing]

Repeat THOUGHT → ACTION → OBSERVATION until the goal is achieved.

When the goal is complete, output:
FINAL ANSWER: [the result or deliverable]
STEPS TAKEN: [brief numbered list of what you did]
CONFIDENCE: [High / Medium / Low, with a one-sentence reason]
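
If you wire this template into your own code, something has to read the model's text and decide what to run. The sketch below shows one way to do that, assuming the model writes each action on a single line as ACTION: tool_name(input). That format is our assumption, since the template leaves it open. The function name parse_turn is hypothetical. Anything the model writes from an OBSERVATION: line onward is discarded, so a fabricated observation never reaches the next step.

```python
import re

ACTION_RE = re.compile(r"^ACTION:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^FINAL ANSWER:\s*(.*)$", re.MULTILINE)


def parse_turn(model_text: str) -> dict:
    """Classify one model turn as a tool action, a final answer, or invalid."""
    # Keep only the text before the first OBSERVATION: line; the real
    # observation must come from the tool, not from the model.
    head = re.split(r"^OBSERVATION:", model_text, maxsplit=1, flags=re.MULTILINE)[0]

    final = FINAL_RE.search(head)
    if final:
        return {"type": "final", "answer": final.group(1).strip()}

    action = ACTION_RE.search(head)
    if action:
        tool_input = action.group(2).strip().strip("\"'")
        return {"type": "action", "tool": action.group(1), "input": tool_input}

    return {"type": "invalid", "raw": head.strip()}


sample = (
    "THOUGHT: I need the original paper.\n"
    'ACTION: search("ReAct paper Yao 2022")\n'
    "OBSERVATION: (text the model should not have written)\n"
)
print(parse_turn(sample))
```

Many APIs also accept a stop sequence, which can end generation at OBSERVATION: before any fabricated text is produced; the truncation above is a fallback for when that is not available.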

Template 2: Plan-Then-Execute Agent

For complex tasks where upfront planning reduces wasted effort, use this two-phase structure. The planning phase produces a verifiable roadmap before any action is taken — which means you can review and correct the plan before the agent starts executing.

PHASE 1 — PLANNING:

You have been given the following goal:
GOAL: [describe what needs to be achieved]

CONSTRAINTS:
- [list any non-negotiable limits: time, tools available, scope]

Before taking any action, produce a complete plan:
1. Break the goal into sequential subtasks.
2. For each subtask, identify:
   - What tool or information source is needed.
   - What a successful output looks like.
   - What could go wrong and how you will handle it.
3. Identify any step where human review is required before continuing.

Output the plan as a numbered list. Do not begin execution.

---

[After human reviews and approves the plan:]

PHASE 2 — EXECUTION:

Execute the plan above step by step.
For each step, use the THOUGHT → ACTION → OBSERVATION format.
If a step produces an unexpected result, pause and explain
the discrepancy before deciding whether to continue or adapt.

The pause between phases is not optional. One of the most common agentic failures is an agent that misunderstands the goal at step one and then executes twenty subsequent steps perfectly — toward the wrong outcome. Human review of the plan costs minutes. Undoing twenty wrong steps costs much more.
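
In code, the pause can be enforced rather than merely requested. The following is a minimal sketch, assuming a hypothetical Plan object and a run_step callable that you supply; execution simply refuses to start until a reviewer has approved the plan.

```python
from dataclasses import dataclass
from typing import Callable


class PlanNotApproved(Exception):
    """Raised when execution is attempted before human review."""


@dataclass
class Plan:
    goal: str
    steps: list[str]
    approved: bool = False


def review(plan: Plan, reviewer_decision: str) -> Plan:
    """Record the human decision; only the exact word 'approve' unlocks execution."""
    plan.approved = reviewer_decision.strip().lower() == "approve"
    return plan


def execute(plan: Plan, run_step: Callable[[str], str]) -> list[str]:
    """Run every step in order, but only for an approved plan."""
    if not plan.approved:
        raise PlanNotApproved(f"Plan for goal {plan.goal!r} has not been approved.")
    return [run_step(step) for step in plan.steps]


plan = Plan(goal="Audit three pages", steps=["fetch pages", "compare to style guide"])
review(plan, "approve")
results = execute(plan, lambda step: f"done: {step}")
print(results)
```

Treating anything other than an explicit approval as a refusal is a deliberate choice here: an ambiguous reply from the reviewer should stop the agent, not start it.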

Defining Tools Effectively

The quality of an agent’s tool use depends heavily on how well the tools are defined. Vague tool definitions produce vague tool calls. A well-defined tool tells the model exactly what the tool does, what inputs it accepts, what it returns, and when it is and is not appropriate to use.

TOOL DEFINITIONS:

search(query: str) → list[str]
  Purpose: Find current information not in your training data.
  Use when: The task requires recent facts, prices, names, or
            events that may have changed since your knowledge cutoff.
  Do NOT use when: The answer is stable general knowledge
                   you are confident about.
  Returns: A list of text snippets from web results.
  Limit: Maximum 3 searches per task unless explicitly permitted.

run_code(code: str, language: str = "python") → str
  Purpose: Execute code and return the output or error message.
  Use when: The task requires computation, data processing,
            or file manipulation that text reasoning cannot do.
  Do NOT use when: A simpler text-based answer suffices.
  Returns: stdout output or error traceback as a string.
  Limit: Do not execute code that modifies files outside
         the designated working directory.

ask_user(question: str) → str
  Purpose: Request clarification or approval from the human.
  Use when: You reach a decision point where two valid paths exist
            and the correct one depends on human preference.
  Also use when: A planned action is irreversible (deleting files,
                 sending messages, making purchases).
  Returns: The human's text response.

Notice the ask_user tool. Building an explicit human-in-the-loop tool is one of the most important safety mechanisms in agentic design. Without it, agents may complete irreversible actions — sending emails, deleting files, posting content — without the human ever having a chance to review.
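
When tools are passed through an API rather than written into the prompt, the same information usually travels as a JSON Schema object. The sketch below expresses the search definition in the name / description / input_schema layout that, to the best of our knowledge, Anthropic's tool-use API expects; other providers use a similar structure under different keys, so check the current reference for yours. The helper missing_arguments is hypothetical and only illustrates checking a call before running it.

```python
SEARCH_TOOL = {
    "name": "search",
    "description": (
        "Find current information not in your training data. "
        "Use when the task requires recent facts, prices, names, or events "
        "that may have changed since your knowledge cutoff. "
        "Do NOT use when the answer is stable general knowledge you are confident about. "
        "Returns a list of text snippets from web results. "
        "Maximum 3 searches per task unless explicitly permitted."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query."},
        },
        "required": ["query"],
    },
}


def missing_arguments(tool: dict, arguments: dict) -> list[str]:
    """Return the required argument names that a proposed tool call left out."""
    return [name for name in tool["input_schema"]["required"] if name not in arguments]


print(missing_arguments(SEARCH_TOOL, {}))
print(missing_arguments(SEARCH_TOOL, {"query": "ReAct paper"}))
```

Note that the "use when" and "do not use when" guidance lives in the description field, since that is the text the model actually reads when choosing a tool.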

Agentic Failure Modes and How to Design Against Them

Agentic systems fail in ways that single-pass prompts do not. Understanding the failure modes lets you design prompts that prevent them.

Failure Mode	What Happens	Design Fix
Goal drift	Agent pursues a subtask so deeply it loses sight of the original goal	Include the goal in every cycle’s context; add a “check against goal” step after every 3 actions
Hallucinated tool calls	Agent fabricates tool outputs instead of waiting for real ones	Explicitly instruct: “Do not generate OBSERVATION content — wait for the actual tool return”
Infinite loops	Agent repeats the same action when it does not produce the expected result	Set a maximum step count; instruct the agent to escalate to ask_user after 2 failed attempts on any subtask
Scope creep	Agent takes actions beyond the defined task boundary	Define explicit out-of-scope actions in the system prompt; require ask_user for any action not listed as permitted
Irreversible action	Agent deletes, sends, or publishes before human review	Classify all tools as reversible or irreversible; require ask_user confirmation before any irreversible action
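
Several of these fixes can be enforced outside the prompt, in the code that sits between the model and its tools. Here is a minimal sketch, assuming hypothetical tool names (send_email, delete_file) and a simplified rule that treats an identical repeated call as a failed attempt. It covers three rows of the table: a maximum step count, escalation after repeated attempts, and confirmation before irreversible actions.

```python
from collections import Counter

# Hypothetical tool names; classify your own tools explicitly.
IRREVERSIBLE = {"send_email", "delete_file"}


class AgentGuard:
    """Decides whether a proposed tool call should run, escalate, or halt."""

    def __init__(self, max_steps: int = 20, max_repeats: int = 2):
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.seen: Counter = Counter()

    def check(self, tool: str, tool_input: str) -> str:
        """Return 'run', 'ask_user', or 'stop' for the proposed call."""
        self.steps += 1
        if self.steps > self.max_steps:
            return "stop"  # hard step limit prevents infinite loops
        self.seen[(tool, tool_input)] += 1
        if self.seen[(tool, tool_input)] > self.max_repeats:
            return "ask_user"  # same call tried too often: escalate
        if tool in IRREVERSIBLE:
            return "ask_user"  # irreversible actions always need confirmation
        return "run"


guard = AgentGuard(max_steps=5, max_repeats=2)
decisions = [
    guard.check("search", "x"),
    guard.check("search", "x"),
    guard.check("search", "x"),
    guard.check("delete_file", "a.txt"),
    guard.check("search", "y"),
    guard.check("search", "z"),
]
print(decisions)
```

Prompt instructions and code-level guards complement each other: the prompt tells the agent what is expected, and the guard holds even when the agent ignores it.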

Practical Agentic Use Cases for Professionals

Agentic workflows are not only for developers. Any professional who currently performs multi-step research, synthesis, or content production tasks can benefit from agentic design — even using off-the-shelf tools.

For freelance researchers and writers: A research agent can be instructed to search for sources on a topic, extract relevant claims from each, check them for consistency using the anti-hallucination techniques from Article 4, and produce a structured brief. What takes two hours of manual tab-switching and note-taking can run as a supervised agent workflow in a fraction of the time. (See our article: AI for Freelancers — 10 Tasks You Can Finish in Half the Time.)

For translators and localisation professionals: An agentic translation workflow can process a document section by section, applying terminology lookup at each step, running the anti-hallucination self-consistency check on ambiguous terms, and compiling a glossary of decisions made during the translation. Each step in the chain is a tool call; the human reviews the output at defined checkpoints rather than continuously.

For content managers: A content agent can be instructed to audit a set of published URLs against a style guide, flag non-compliant passages, suggest corrections, and produce a prioritised revision list — all without the human having to open each page manually.

Template 3: Minimal Agentic Workflow for Non-Developers

Not every agentic workflow requires code. This template uses only a chat interface but structures the interaction to produce agentic behaviour through careful prompting.

You are working through a multi-step task. Follow these rules strictly:

GOAL: [state the end result you need]

RULES:
1. Complete one step at a time. After each step, stop and output:
   STEP COMPLETE: [what you did and what you found]
   NEXT STEP: [what you plan to do next]
   WAITING FOR: [anything you need from me before continuing]

2. If you need information you do not have (a file, a URL,
   a number, a decision), ask for it explicitly before proceeding.
   Do not guess or fabricate.

3. Before any action that cannot be undone, state what you are
   about to do and wait for my confirmation.

4. After every 3 steps, output a PROGRESS CHECK:
   - Goal: [restate it]
   - Completed: [list what is done]
   - Remaining: [list what is left]
   - On track: Yes / No / Uncertain — explain if not Yes.

Begin with Step 1.

Model Capabilities for Agentic Tasks in 2026

Model	Native Tool Use	Multi-Step Planning	Best Agentic Scenario
Claude 3.7 Sonnet	Yes (tool_use API)	Excellent	Long research tasks, document workflows, careful reasoning under uncertainty
GPT-4o	Yes (function calling)	Strong	Broad task types; best with established frameworks like LangChain
Gemini 2.0 Pro	Yes	Strong	Tasks involving large documents, video, or search-heavy research
Llama 3.3 / Mistral	Via frameworks	Moderate	Self-hosted agentic pipelines with privacy requirements

Common Mistakes

Describing steps instead of goals. “Search for three articles, summarise each one, and combine the summaries” is a procedure, not a goal. It constrains the agent to your method even if a better one exists. Replace with: “Produce a well-sourced overview of [topic] that a professional non-specialist can act on.” Let the agent choose its approach.

No stop conditions. An agent without clear completion criteria will continue indefinitely, refining outputs past the point of diminishing returns or branching into unintended territory. Always define what “done” looks like.

Trusting agentic output without review. Agentic systems amplify both good and bad decisions across multiple steps. A misunderstood instruction in step one affects every subsequent step. Structured checkpoints — every three steps, or before any irreversible action — are not overhead; they are the mechanism that keeps agentic output trustworthy.

Building agents for tasks that do not need them. If a task has one or two steps, a standard prompt handles it better than an agent. Agentic complexity is justified when there are genuine decision points that depend on intermediate results — when what you do in step three depends on what you found in step two.
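
On the stop-conditions point above, one way to define "done" is as a short list of checks on the deliverable rather than as a feeling that the output is good enough. The sketch below is illustrative only: the function name is_done, the brief structure, and the thresholds are all assumptions you would replace with your own criteria.

```python
def is_done(brief: dict, min_sources: int = 3, max_words: int = 800) -> tuple[bool, list[str]]:
    """Check a research brief against explicit completion criteria.

    Returns (done, unmet) where unmet lists the criteria still failing.
    """
    unmet = []
    source_count = len(brief.get("sources", []))
    if source_count < min_sources:
        unmet.append(f"needs at least {min_sources} sources, has {source_count}")
    word_count = len(brief.get("text", "").split())
    if word_count == 0:
        unmet.append("brief is empty")
    elif word_count > max_words:
        unmet.append(f"brief is {word_count} words, limit is {max_words}")
    return (not unmet, unmet)


draft = {"text": "word " * 10, "sources": ["a", "b"]}
print(is_done(draft))
```

The list of unmet criteria can be fed back to the agent as its next instruction, and an empty list is the signal to stop.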

Exercises

ReAct simulation: Pick a research task you would normally do manually. Run Template 1 in a chat interface, manually providing “tool outputs” for the search and read steps (paste real search results or document excerpts). Compare the structured output to what you would have produced with a single-pass prompt.
Plan review: Use Template 2, Phase 1 only, on a complex task you are currently working on. Review the plan the agent produces before approving it. Note whether the plan matches how you would have approached the task — and where it diverges.
Failure mode audit: Review an agentic or multi-step AI workflow you have used recently (in any tool). Map each step against the failure mode table. Identify which failure modes are currently unguarded against in your workflow.

Next in the series: Article 7 — Intelligent Prompt Chaining and Meta-Prompting: Systems That Generate Superior Prompts.

References

Yao, S. et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. Princeton / Google. arxiv.org/abs/2210.03629
Anthropic (2024). Tool Use documentation. docs.anthropic.com
Zy Yazan Platform — AI for Freelancers: 10 Tasks in Half the Time. zyyazan.sy
Zy Yazan Platform — Advanced Context Engineering (series intro). zyyazan.sy

