AI Agents and agentic systems: definitions and patterns

We hear a lot about AI Agents. According to LangChain's State of AI Agents survey, 51% of respondents "have agents in production". Use cases range from research and summarization to streamlining tasks for personal productivity or assistance.

As Large Language Model usage evolves, new patterns and gray zones emerge, as Andrew Ng has pointed out:

More and more people are building systems that prompt a large language model multiple times using agent-like design patterns. But there’s a gray zone between what clearly is not an agent (prompting a model once) and what clearly is (say, an autonomous agent that, given high-level instructions, plans, uses tools, and carries out multiple, iterative steps of processing).

In effect, there are different kinds of "agents", and the term's definition is not fully shared among the creators and users of such systems.

Recently, Erik Schluntz and Barry Zhang at Anthropic released an essay, Building Effective Agents, that provides clear definitions and practical guidance for creating agentic systems.

They introduce "agentic systems" as an overarching term, then distinguish between "workflows", which orchestrate multiple LLMs along established patterns, and "agents", in which LLMs autonomously manage their own processes and tool usage.

Workflows are systems where LLMs and tools are orchestrated through predefined code paths.

Agents are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.

This distinction makes workflows the right choice when tasks are well-defined and predictability and consistency are required, while agents are the better option when flexibility and model-driven decision-making are needed. The essay expands the definition of agents:

Agents begin their work with either a command from, or interactive discussion with, the human user. Once the task is clear, agents plan and operate independently, potentially returning to the human for further information or judgement. During execution, it's crucial for the agents to gain “ground truth” from the environment at each step (such as tool call results or code execution) to assess its progress. Agents can then pause for human feedback at checkpoints or when encountering blockers. The task often terminates upon completion, but it’s also common to include stopping conditions (such as a maximum number of iterations) to maintain control.
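
To make that loop concrete, here is a minimal sketch in Python. The `call_model` and `execute_tool` functions are hypothetical stand-ins for a real model client and tool runtime, not any particular API; note the iteration cap acting as the stopping condition the essay recommends.

```python
# Minimal agent loop sketch. `call_model` and `execute_tool` are hypothetical
# stand-ins for a model client and a tool runtime, not a specific API.
MAX_ITERATIONS = 10  # stopping condition to maintain control

def call_model(messages: list[dict]) -> dict:
    """Returns either {'type': 'tool_call', 'name': ..., 'args': ...}
    or {'type': 'final', 'text': ...}. Replace with a real client."""
    raise NotImplementedError

def execute_tool(name: str, args: dict) -> str:
    """Runs a tool and returns its result as text ('ground truth')."""
    raise NotImplementedError

def run_agent(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(MAX_ITERATIONS):
        action = call_model(messages)
        if action["type"] == "final":  # the model decided the task is done
            return action["text"]
        # Gain ground truth from the environment at each step
        result = execute_tool(action["name"], action["args"])
        messages.append({"role": "tool", "content": result})
    return "Stopped: iteration limit reached"  # guardrail, not success
```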

Beyond definitions, the article names five workflow patterns and offers practical guidance on building systems that combine multiple LLM calls; a minimal sketch of each pattern follows the list:

  1. Prompt chaining: A sequence of LLM calls processes the user's input, each call consuming the output of the previous one, e.g., generating a document and then translating it into a different language.
  2. Routing: An initial LLM call decides which specialized follow-up path should handle the input, e.g., directing different types of customer service queries to different downstream prompts.
  3. Parallelization: A task is broken into independent parts that run in parallel (e.g., summarizing different sections of a document), or the same task is run several times and the outputs are combined by a voting mechanism.
  4. Orchestrator-workers: An orchestrator breaks a task down and triggers multiple LLM calls whose results are then synthesized together, for example, running searches against multiple sources and combining the results.
  5. Evaluator-optimizer: One LLM evaluates and refines another's output in a loop (clear evaluation criteria must be in place).
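
A minimal sketch of prompt chaining, assuming a hypothetical `llm(prompt) -> str` helper as a stand-in for a single model call (the same stub is reused in the sketches below):

```python
# Prompt chaining: each call consumes the output of the previous one.
def llm(prompt: str) -> str:
    """Hypothetical stand-in for one model call; wire up a real client here."""
    raise NotImplementedError

def generate_then_translate(topic: str, language: str) -> str:
    draft = llm(f"Write a short document about: {topic}")
    return llm(f"Translate the following document into {language}:\n\n{draft}")
```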
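Routing can be sketched the same way: a first call classifies the query, then a specialized prompt handles it. The labels and prompts here are illustrative, and `llm` is the stub above.

```python
# Routing: an initial call picks the specialized path for the query.
ROUTES = {
    "billing": "You are a billing specialist. Answer this query: {query}",
    "technical": "You are a support engineer. Answer this query: {query}",
    "general": "You are a helpful assistant. Answer this query: {query}",
}

def handle_query(query: str) -> str:
    label = llm(
        f"Classify this customer query as one of {sorted(ROUTES)}. "
        f"Reply with the label only.\n\n{query}"
    ).strip().lower()
    template = ROUTES.get(label, ROUTES["general"])  # fall back on bad labels
    return llm(template.format(query=query))
```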
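For parallelization, the sectioning variant might look like this: sections are summarized concurrently, then merged in a final call.

```python
# Parallelization (sectioning): independent sections run concurrently.
from concurrent.futures import ThreadPoolExecutor

def summarize_document(sections: list[str]) -> str:
    with ThreadPoolExecutor() as pool:
        summaries = list(
            pool.map(lambda s: llm(f"Summarize this section:\n\n{s}"), sections)
        )
    return llm(
        "Merge these section summaries into one:\n\n" + "\n\n".join(summaries)
    )
```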
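An orchestrator-workers sketch, where one call plans the subtasks, workers carry them out, and a final call synthesizes the results (the planning and search prompts are illustrative):

```python
# Orchestrator-workers: one call plans subtasks, workers run them,
# and a final call synthesizes the results.
def research(question: str) -> str:
    plan = llm(f"List, one per line, the searches needed to answer: {question}")
    subtasks = [line for line in plan.splitlines() if line.strip()]
    findings = [
        llm(f"Carry out this search and report findings: {task}")
        for task in subtasks
    ]
    return llm(
        "Synthesize a single answer from these findings:\n\n"
        + "\n\n".join(findings)
    )
```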
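Finally, an evaluator-optimizer loop, with a simple APPROVED check standing in for real evaluation criteria and a bounded number of revision rounds:

```python
# Evaluator-optimizer: one call drafts, another critiques, in a bounded loop.
MAX_ROUNDS = 3

def write_with_review(task: str) -> str:
    draft = llm(task)
    for _ in range(MAX_ROUNDS):
        verdict = llm(
            "Reply APPROVED if the draft meets the requirements; otherwise "
            f"list the problems.\n\nTask: {task}\n\nDraft:\n{draft}"
        )
        if verdict.strip().startswith("APPROVED"):
            break
        draft = llm(
            f"Revise the draft to fix these problems:\n{verdict}\n\nDraft:\n{draft}"
        )
    return draft
```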

When is it appropriate to implement full agents?

When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed. This might mean not building agentic systems at all.

Moving to full agents makes sense when problems are open-ended and it's difficult or impossible to predict the required path or number of steps in advance.

The LLM will potentially operate for many turns, and you must have some level of trust in its decision-making.

The autonomous nature of agents means higher costs and the potential for compounding errors. We recommend extensive testing in sandboxed environments, along with appropriate guardrails.