The Four Paradigms
The infographic from GenAI.works captures something practitioners often get confused about: these four terms are not competing products. They are four different architectural patterns for building AI-powered software, each adding a new capability layer on top of the previous.
Think of it as a stack. Every RAG system contains an LLM. Every AI Agent contains either a raw LLM or a RAG system. Every Agentic AI system is composed of multiple Agents. You cannot skip a tier — the tiers build upward.
This guide walks you through building each tier from scratch using a single running example: a customer support system for a fictional SaaS product, “Clarion”. Keeping the use case constant lets you see exactly what each tier adds — and why you might, or might not, need it.
0.1 Quick comparison
| Pattern | Uses external data? | Takes actions? | Multiple AI instances? | Complexity |
|---|---|---|---|---|
| LLM | No | No | No | ★☆☆☆ |
| RAG | Yes | No | No | ★★☆☆ |
| AI Agent | Optional | Yes | No | ★★★☆ |
| Agentic AI | Yes | Yes | Yes | ★★★★ |
Prerequisites: the code examples in this guide use the anthropic SDK for Python. Install it with pip install anthropic. You will need a Claude API key set as the environment variable ANTHROPIC_API_KEY.
Building with LLMs
The Thinker. An LLM is a neural network trained on vast amounts of text. You send it a message; it generates a response. That is the entire primitive. Everything else in this guide is built on top of this single interaction.
1.1 What it actually is
When you call Claude’s API, you are sending a structured HTTP request containing a list of messages and receiving back a generated completion. The model has no memory, no internet access, no tools — it only knows what you put in the request and what it learned during training.
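To make "a list of messages" concrete, the sketch below assembles one request body as a plain Python dictionary without calling the API. The field names (model, max_tokens, system, messages) are the same parameters the SDK call in this guide uses; the helper name build_request is hypothetical and exists only for illustration.

```python
import json

def build_request(system: str, history: list[dict], user_message: str) -> dict:
    """Assemble the body of one Messages API request (illustrative helper)."""
    return {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 1024,
        "system": system,
        # The model sees only what is listed here: prior turns plus the new one.
        "messages": history + [{"role": "user", "content": user_message}],
    }

body = build_request(
    system="You are a support assistant for Clarion.",
    history=[],
    user_message="How do I reset my password?",
)
print(json.dumps(body, indent=2))
```

Anything that is not in this dictionary, such as an earlier conversation or a customer record, is invisible to the model on this call.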
This makes it ideal for tasks that only require language reasoning: writing, summarising, classifying text, answering general knowledge questions, drafting, explaining concepts, and transforming text from one format to another.
Watch out for: The model has no knowledge of your internal data. It cannot look up a user’s account. It cannot tell you yesterday’s sales figures. When asked for specific facts it does not know, it may hallucinate them. For the Clarion support case, a raw LLM can only answer in generalities, because the details of your product were not in its training data.
1.2 Step-by-step build
We are building: A basic support chat widget for Clarion. The user types a question; Claude responds using only its general knowledge.
1. Get your API key. Sign up at console.anthropic.com. Copy the key and set it as the environment variable ANTHROPIC_API_KEY; never hardcode it in source files.
2. Write a system prompt. The system prompt defines the AI’s role, tone, and constraints. For support, you want it to be helpful, concise, and honest about what it does not know.
3. Call the Messages API. Pass the system prompt and the user’s message. Read back response.content[0].text.
4. Maintain conversation history. Claude is stateless. For multi-turn conversations you must accumulate all previous messages and pass them with every request.
5. Surface the response. Print it, stream it into a UI, or pipe it to your application layer.
1.3 Code walkthrough

    import anthropic

    # 1. Initialise client (reads ANTHROPIC_API_KEY from the environment)
    client = anthropic.Anthropic()

    # 2. System prompt: who the assistant is and how it should behave
    SYSTEM = """You are a friendly support assistant for Clarion, a project management SaaS.
    Answer questions clearly and concisely. If you do not know a specific answer,
    say so honestly rather than guessing. Never fabricate feature names or pricing details."""

    # 3. In-memory conversation history (list of message dicts)
    history = []

    def chat(user_message: str) -> str:
        history.append({"role": "user", "content": user_message})

        # 4. Call the Messages API with full history on every turn
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            system=SYSTEM,
            messages=history,
        )
        assistant_text = response.content[0].text

        # Append assistant turn so next call includes it
        history.append({"role": "assistant", "content": assistant_text})
        return assistant_text

    # 5. Simple interactive loop
    if __name__ == "__main__":
        print("Clarion Support (LLM only). Type 'quit' to exit.\n")
        while True:
            q = input("You: ")
            if q.lower() == "quit":
                break
            print(f"\nClarion: {chat(q)}\n")
Building RAG Systems
The Researcher. RAG is the pattern of retrieving relevant documents from a knowledge base and injecting them into the LLM’s context window before it generates a response. The model’s answer is grounded in your actual data, dramatically reducing hallucination on domain-specific facts.
2.1 What it actually is
RAG separates knowledge storage from reasoning. You maintain a searchable collection of documents. At query time, you retrieve the most relevant chunks and prepend them to the user’s question before sending it to Claude. Claude then answers based on that retrieved context.
The key insight: you are not fine-tuning the model. You are using in-context learning — stuffing relevant facts into the prompt at runtime. This means your knowledge base can be updated instantly, without retraining.
Watch out for: RAG quality is bounded by retrieval quality. If the relevant chunk is not in your knowledge base, or if your search fails to find it, the model will either hallucinate or say it does not know. A poorly curated knowledge base produces a poorly performing RAG system.
2.2 Step-by-step build
We are building: Clarion Support with product knowledge. The bot can now accurately answer “Does Clarion integrate with Jira?” because that answer lives in our help docs.
1. Prepare your knowledge base. Collect your documents: help articles, FAQs, product specs. Split them into chunks of 200–500 tokens each. Overlapping adjacent chunks (for example, by 50 tokens) improves retrieval continuity.
2. Embed your chunks. Convert each chunk to a vector using an embedding model. Store vectors in a vector database (Pinecone, Weaviate, Chroma, or an in-memory list for prototyping).
3. Embed the query at runtime. Convert the user’s question to a vector using the same embedding model you used at indexing time.
4. Retrieve top-K chunks. Compute cosine similarity between the query vector and all stored vectors. Return the top 3–5 most similar chunks.
5. Augment the prompt. Inject the retrieved chunks into the system or user message. Instruct Claude to answer from this context and to say when it does not have sufficient information.
6. Call Claude and return the grounded answer. The model’s response is now anchored to your actual documentation.
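Step 1 (chunking with overlap) can be sketched as follows. This is a minimal illustration, not a production splitter: it counts whitespace-separated words as a stand-in for tokens, which is an assumption, since real tokenizers count differently. The function name chunk_text and its default sizes are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks.

    Words are used as an approximation of tokens (assumption for this sketch).
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks

# A synthetic 700-word document: w0 w1 ... w699
sample = " ".join(f"w{i}" for i in range(700))
chunks = chunk_text(sample)
for i, c in enumerate(chunks):
    print(i, len(c.split()), "words")
```

Each chunk repeats the last 50 words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.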
2.3 Code walkthrough
For simplicity, this example uses a pure-Python in-memory vector store with a toy bag-of-words embedding and cosine similarity. In production, replace the embedding function with a real embedding model (from a provider such as OpenAI or Cohere, or a local model). The RAG logic stays identical.

    import math
    import re

    import anthropic

    client = anthropic.Anthropic()

    # ── STEP 1: Knowledge base (normally loaded from files or a DB) ──
    DOCS = [
        "Clarion integrates with Jira, Asana, Linear, and GitHub via OAuth.",
        "Clarion pricing: Starter $12/user/month, Pro $28/user/month, Enterprise custom.",
        "To reset your password: go to Settings > Security > Reset Password.",
        "Clarion supports SSO via SAML 2.0 on Pro and Enterprise plans.",
        "File upload limit is 50 MB per file on Starter; 500 MB on Pro.",
        "API rate limit is 1,000 requests/hour on Pro; 10,000/hour on Enterprise.",
    ]

    # ── STEP 2: Toy embedding (word counts; replace with a real model) ──
    def embed(text):
        # Strip punctuation so that "Jira?" in a query matches "Jira," in a doc
        words = re.findall(r"[a-z0-9]+", text.lower())
        vec = {}
        for w in words:
            vec[w] = vec.get(w, 0) + 1
        return vec

    def cosine(a, b):
        keys = set(a) & set(b)
        dot = sum(a[k] * b[k] for k in keys)
        mag = (math.sqrt(sum(v**2 for v in a.values()))
               * math.sqrt(sum(v**2 for v in b.values())))
        return dot / mag if mag else 0

    doc_vecs = [(embed(d), d) for d in DOCS]

    # ── STEP 3 + 4: Retrieve top-K chunks ──
    def retrieve(query, k=3):
        qvec = embed(query)
        scored = [(cosine(qvec, dv), doc) for dv, doc in doc_vecs]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
        return [doc for _, doc in scored[:k]]

    # ── STEP 5 + 6: Augment and call Claude ──
    def rag_chat(question):
        chunks = retrieve(question)
        context = "\n".join(f"- {c}" for c in chunks)
        prompt = f"""Answer using ONLY the context below.
    If the answer is not there, say: "I don't have that in my knowledge base."

    Context:
    {context}

    Question: {question}"""
        r = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
        return r.content[0].text

    if __name__ == "__main__":
        for q in [
            "Does Clarion integrate with Jira?",
            "What is the file upload limit on the Starter plan?",
            "Can I use my own email domain?",  # not in KB
        ]:
            print(f"Q: {q}\nA: {rag_chat(q)}\n")
Building AI Agents
The Doer. An AI Agent is an LLM that has been given tools — callable functions that can read and write to the world. The model decides which tool to call, calls it, reads the result, and continues reasoning until the task is complete. It acts, not just answers.
3.1 What it actually is
Claude’s tool use works like this: you describe your available tools as JSON schemas. When Claude determines a tool call is needed, it returns a tool_use content block. Your code executes the function and sends back a tool_result message. Claude continues reasoning — potentially calling more tools — until it produces a final text answer.
This loop — reason → call tool → observe result → reason again — is the agent loop. A single-agent system runs this loop until the task is complete or a maximum iteration limit is reached.
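The shape of one round trip through that loop can be illustrated with plain dictionaries, without calling the API. The block types (tool_use, tool_result) and the tool_use_id link are the ones described above; the id value and the helper name make_tool_result are made up for this sketch.

```python
import json

# What an assistant turn might contain when Claude asks for a tool call.
# The id value is a placeholder invented for this example.
assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Let me look up that account."},
        {
            "type": "tool_use",
            "id": "toolu_example_001",
            "name": "get_account_status",
            "input": {"email": "alice@example.com"},
        },
    ],
}

def make_tool_result(tool_use_block: dict, result: dict) -> dict:
    """Build the user turn that reports a tool's output back to Claude."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_block["id"],  # must echo the request's id
            "content": json.dumps(result),
        }],
    }

# Find the tool request, pretend we ran the tool, and package the answer.
tool_call = next(b for b in assistant_turn["content"] if b["type"] == "tool_use")
reply = make_tool_result(tool_call, {"plan": "Pro", "status": "active"})
print(json.dumps(reply, indent=2))
```

Both turns are appended to the message list before the next request, which is how Claude sees the tool's output on the following iteration.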
Watch out for: Tool definitions must be precise. Ambiguous descriptions lead to incorrect calls. Always cap the agent loop at a maximum iteration count, validate tool inputs server-side, and log every tool call and result.
3.2 Step-by-step build
We are building: A Clarion support agent that can look up account status and create tickets.
1. Define your tools as JSON schemas. Each tool needs a name, a description (what it does and when to call it), and an input_schema. The description is what Claude reads to decide whether to invoke it, so make it precise.
2. Implement the actual tool functions. These are regular Python functions that call your database, REST API, email service, or whatever else the agent needs.
3. Write the agent loop. Call Claude with tools=your_tools. If the response contains a tool_use block, execute it, append a tool_result, and call Claude again. Repeat until stop_reason == "end_turn" with only text blocks.
4. Set guardrails. Cap the loop at N iterations (10 is a safe start). Log every tool call. Validate inputs before destructive operations.
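The guardrails in step 4 could look roughly like the sketch below: a wrapper that validates inputs, logs the call, and returns validation failures to the model as data rather than raising. The names validate_inputs and guarded_call, the email pattern, and the stand-in tool function are all assumptions made for illustration.

```python
import json
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.tools")

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose check
ALLOWED_PRIORITIES = {"low", "medium", "high"}

def validate_inputs(name: str, inputs: dict) -> list[str]:
    """Return a list of problems; an empty list means the inputs look acceptable."""
    problems = []
    email = inputs.get("email")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        problems.append("email is missing or malformed")
    if name == "create_ticket":
        priority = inputs.get("priority")
        if not isinstance(priority, str) or priority not in ALLOWED_PRIORITIES:
            problems.append("priority must be low, medium, or high")
        if not str(inputs.get("subject", "")).strip():
            problems.append("subject must not be empty")
    return problems

def guarded_call(name: str, inputs: dict, tool_fn) -> dict:
    """Validate and log a tool call; rejected calls never reach the tool."""
    problems = validate_inputs(name, inputs)
    if problems:
        log.warning("rejected tool=%s inputs=%s problems=%s",
                    name, json.dumps(inputs), problems)
        return {"error": "; ".join(problems)}
    result = tool_fn(**inputs)
    log.info("tool=%s inputs=%s result=%s",
             name, json.dumps(inputs), json.dumps(result))
    return result

# Stand-in for a real ticketing call
def fake_create_ticket(email, subject, priority):
    return {"ticket_id": "TKT-0001", "status": "created"}

ok = guarded_call(
    "create_ticket",
    {"email": "alice@example.com", "subject": "Upload fails", "priority": "high"},
    fake_create_ticket,
)
bad = guarded_call(
    "create_ticket",
    {"email": "not-an-email", "subject": "", "priority": "urgent"},
    fake_create_ticket,
)
```

Returning the error as a tool result gives the model a chance to correct its own call on the next iteration, while the iteration cap still bounds how often it can retry.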
3.3 Code walkthrough

    import json

    import anthropic

    client = anthropic.Anthropic()

    # ── STEP 1: Tool definitions ──
    TOOLS = [
        {
            "name": "get_account_status",
            "description": "Look up a Clarion account by email. Returns plan, status, and last login.",
            "input_schema": {
                "type": "object",
                "properties": {"email": {"type": "string"}},
                "required": ["email"],
            },
        },
        {
            "name": "create_ticket",
            "description": "Create a support ticket when the issue cannot be resolved in chat.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "email": {"type": "string"},
                    "subject": {"type": "string"},
                    "priority": {"type": "string", "enum": ["low", "medium", "high"]},
                },
                "required": ["email", "subject", "priority"],
            },
        },
    ]

    # ── STEP 2: Tool implementations (mocked; replace with real calls) ──
    def get_account_status(email):
        db = {
            "alice@example.com": {"plan": "Pro", "status": "active", "last_login": "2026-05-02"},
            "bob@example.com": {"plan": "Starter", "status": "suspended", "last_login": "2026-03-10"},
        }
        return db.get(email, {"error": "Account not found"})

    def create_ticket(email, subject, priority):
        # hash() of a str varies between Python processes, so this mock ID is not stable
        ticket_id = f"TKT-{abs(hash(email + subject)) % 10000:04d}"
        print(f"  [TICKET CREATED] {ticket_id}: {subject} ({priority}) for {email}")
        return {"ticket_id": ticket_id, "status": "created"}

    def execute_tool(name, inputs):
        if name == "get_account_status":
            return get_account_status(**inputs)
        elif name == "create_ticket":
            return create_ticket(**inputs)
        raise ValueError(f"Unknown tool: {name}")

    # ── STEP 3: The agent loop ──
    def run_agent(user_message):
        messages = [{"role": "user", "content": user_message}]
        for step in range(10):  # STEP 4: max 10 iterations
            response = client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                tools=TOOLS,
                messages=messages,
                system="You are a Clarion support agent. Use get_account_status "
                       "before answering any account-specific questions.",
            )

            # Execute every tool call requested in this turn
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result),
                    })

            messages.append({"role": "assistant", "content": response.content})

            # No tool calls means Claude has produced its final answer
            if not tool_results:
                return "".join(b.text for b in response.content if b.type == "text")

            messages.append({"role": "user", "content": tool_results})

        return "Max steps reached without a final answer."

    if __name__ == "__main__":
        q = "My account is alice@example.com and I can't upload files larger than 10 MB."
        print(f"User: {q}\n")
        print(f"Agent: {run_agent(q)}")
Building Agentic AI
The Coordinator. Agentic AI is a system of multiple AI agents working together, orchestrated by a central planner. It spans workflows across teams, tools, and timeframes. It adapts when conditions change and recovers from sub-agent failures without human intervention.
4.1 What it actually is
An Agentic AI system has at minimum two layers: an Orchestrator that plans and delegates, and Sub-agents that execute specialised tasks. The orchestrator itself is an LLM whose “tools” are calls to other agents rather than calls to raw APIs.
Each sub-agent can be independently tuned — given its own system prompt, its own knowledge base, its own tools — while the orchestrator maintains the overall goal, handles routing, retries, and consolidates results into a single coherent response.
Watch out for: Agentic AI is the hardest pattern to debug and audit. Errors compound across agents. Instrument every agent call, maintain a full event log, and implement human-in-the-loop checkpoints for high-stakes actions. Start with two agents before building a fleet.
4.2 Step-by-step build
We are building: A Clarion multi-agent support system with an orchestrator that routes to a Billing Agent or a Technical Agent, then consolidates and responds.
1. Design your agent topology. Map out which specialised agents you need and what each owns. Keep agents focused: a billing agent should not know about bug reports.
2. Implement each sub-agent as a callable function. Each sub-agent is the single-agent loop from Section 3, packaged as a Python function that accepts a task string and returns a result string.
3. Build the orchestrator. The orchestrator is a Claude call where the “tools” are your sub-agent functions. It decides which agent(s) to invoke, in what order, and how to merge their outputs.
4. Add observability. Log every orchestrator decision and sub-agent invocation. Track wall time and token usage per request. Set per-agent timeouts.
5. Define human-in-the-loop checkpoints. For actions above a defined risk threshold (refunding over $X, deleting data, sending external communications), require explicit human confirmation before proceeding.
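Steps 4 and 5 might be combined along the lines of the sketch below: every proposed action passes through a gate that records an event and asks for human approval when the action is risky. The Action type, the REFUND_LIMIT value, the action names, and the approve callback are all hypothetical; in a real system the callback would be a review queue or a confirmation UI.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str                      # e.g. "refund", "delete_data"
    amount: float = 0.0
    details: dict = field(default_factory=dict)

REFUND_LIMIT = 50.0                # hypothetical threshold standing in for "$X"
ALWAYS_CONFIRM = {"delete_data", "send_external_email"}

def needs_human(action: Action) -> bool:
    """Decide whether an action is above the risk threshold."""
    if action.kind in ALWAYS_CONFIRM:
        return True
    return action.kind == "refund" and action.amount > REFUND_LIMIT

event_log: list[dict] = []         # stand-in for a real audit log

def execute_with_checkpoint(action: Action, run, approve) -> str:
    """Run `run(action)` only if it is low risk or `approve(action)` returns True."""
    started = time.perf_counter()
    if needs_human(action) and not approve(action):
        outcome = "blocked"
    else:
        run(action)
        outcome = "executed"
    event_log.append({
        "kind": action.kind,
        "amount": action.amount,
        "outcome": outcome,
        "seconds": time.perf_counter() - started,  # wall time for this action
    })
    return outcome

# A reviewer who rejects everything: small refunds still go through unprompted.
small = execute_with_checkpoint(Action("refund", 20.0),
                                run=lambda a: None, approve=lambda a: False)
large = execute_with_checkpoint(Action("refund", 200.0),
                                run=lambda a: None, approve=lambda a: False)
print(small, large, len(event_log))
```

Keeping the risk rule in one function (needs_human) means the threshold can be reviewed and changed without touching any agent prompt.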
4.3 Code walkthrough

    import json

    import anthropic

    client = anthropic.Anthropic()

    # ══ SUB-AGENT 1: Billing Agent ══
    BILLING_TOOLS = [{
        "name": "check_invoice",
        "description": "Check invoice and payment status for a Clarion account.",
        "input_schema": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"],
        },
    }]

    def check_invoice(email):
        # Mocked: returns the same invoice for every email
        return {"invoice": "INV-0042", "amount": "$28", "status": "paid", "due": "2026-05-01"}

    def billing_agent(task):
        print(f"  [BILLING AGENT] {task}")
        messages = [{"role": "user", "content": task}]
        for _ in range(5):
            r = client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=512,
                system="You are the Clarion billing specialist. Only handle billing and payment questions.",
                tools=BILLING_TOOLS,
                messages=messages,
            )
            tool_results = []
            for b in r.content:
                if b.type == "tool_use":
                    res = check_invoice(**b.input)
                    tool_results.append({"type": "tool_result", "tool_use_id": b.id,
                                         "content": json.dumps(res)})
            messages.append({"role": "assistant", "content": r.content})
            if not tool_results:
                return "".join(b.text for b in r.content if b.type == "text")
            messages.append({"role": "user", "content": tool_results})
        return "Billing agent could not resolve."

    # ══ SUB-AGENT 2: Technical Agent ══
    def technical_agent(task):
        print(f"  [TECH AGENT] {task}")
        r = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=512,
            system="You are Clarion's technical support specialist. Diagnose technical issues. "
                   "If you cannot resolve, recommend opening a bug report.",
            messages=[{"role": "user", "content": task}],
        )
        return r.content[0].text

    # ══ ORCHESTRATOR ══
    ORCH_TOOLS = [
        {
            "name": "route_to_billing",
            "description": "Route to the billing specialist for invoices, payments, or plan changes.",
            "input_schema": {"type": "object",
                             "properties": {"task": {"type": "string"}},
                             "required": ["task"]},
        },
        {
            "name": "route_to_technical",
            "description": "Route to the technical specialist for bugs, errors, integrations, or performance.",
            "input_schema": {"type": "object",
                             "properties": {"task": {"type": "string"}},
                             "required": ["task"]},
        },
    ]

    def orchestrate(user_message):
        messages = [{"role": "user", "content": user_message}]
        for _ in range(6):
            r = client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                system="You are the Clarion support orchestrator. Analyse the user's issue "
                       "and route to the appropriate specialist. You may call multiple agents "
                       "if the issue spans billing and technical concerns.",
                tools=ORCH_TOOLS,
                messages=messages,
            )
            tool_results = []
            for b in r.content:
                if b.type == "tool_use":
                    # Anything that is not route_to_billing goes to the technical agent
                    result = (billing_agent(b.input["task"])
                              if b.name == "route_to_billing"
                              else technical_agent(b.input["task"]))
                    tool_results.append({"type": "tool_result", "tool_use_id": b.id,
                                         "content": result})
            messages.append({"role": "assistant", "content": r.content})
            if not tool_results:
                return "".join(b.text for b in r.content if b.type == "text")
            messages.append({"role": "user", "content": tool_results})
        return "Orchestrator could not resolve."

    if __name__ == "__main__":
        msg = ("Hi, I'm alice@example.com. My last invoice looks wrong AND "
               "the Jira integration stopped syncing yesterday.")
        print(f"User: {msg}\n")
        print(f"System: {orchestrate(msg)}")
Which Should You Use?
The most common mistake practitioners make is reaching for Agentic AI when a plain LLM would do, or stopping at a raw LLM when the use case clearly needs tools. Use this decision framework:
| If your use case is… | Start with | Why |
|---|---|---|
| Summarise meeting notes | LLM | Pure language task, no domain data or action needed |
| Answer questions about your product docs | RAG | Needs private knowledge, no action required |
| Let users check their order status via chat | AI Agent | Needs to query a live database |
| Incident triage across Slack, PagerDuty, and Jira | Agentic AI | Multiple systems, specialised sub-tasks, coordination |
| Generate a personalised email from CRM data | RAG or Agent | Live API → Agent; static export → RAG |
| End-to-end employee onboarding | Agentic AI | Multi-step, multi-system, coordination required |
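The reasoning behind the table can be compressed into three yes/no questions, asked from the most demanding requirement down. The sketch below is one possible encoding of that logic, not a definitive rule; the function name choose_pattern and its parameter names are hypothetical.

```python
def choose_pattern(needs_private_data: bool,
                   needs_actions: bool,
                   needs_coordination: bool) -> str:
    """Pick the simplest tier that covers the stated requirements."""
    if needs_coordination:      # several specialised agents across systems
        return "Agentic AI"
    if needs_actions:           # must call live systems or change state
        return "AI Agent"
    if needs_private_data:      # must answer from your own documents
        return "RAG"
    return "LLM"                # pure language task

# Rows from the table above, expressed as requirement flags
print(choose_pattern(False, False, False))  # summarise meeting notes
print(choose_pattern(True, False, False))   # answer questions about product docs
print(choose_pattern(False, True, False))   # check order status via chat
print(choose_pattern(True, True, True))     # incident triage across systems
```

Checking the most demanding requirement first mirrors the stack described at the start of this guide: each tier already includes the capabilities of the tiers below it.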