The Four Paradigms

The infographic from GenAI.works captures something practitioners often get confused about: these four terms are not competing products. They are four different architectural patterns for building AI-powered software, each adding a new capability layer on top of the previous.

Think of it as a stack. Every RAG system contains an LLM. Every AI Agent contains either a raw LLM or a RAG system. Every Agentic AI system is composed of multiple Agents. You cannot skip a tier — the tiers build upward.

Agentic AI (Orchestrator)
  ↳ coordinates: Agent A · Agent B · Agent C
  ↳ each agent uses: RAG + Tools
  ↳ built on: LLM (Claude)

This guide walks you through building each tier from scratch using a single running example: a customer support system for a fictional SaaS product, “Clarion”. Keeping the use case constant lets you see exactly what each tier adds — and why you might, or might not, need it.

0.1 Quick comparison

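| Pattern    | Uses external data? | Takes actions? | Multiple AI instances? | Complexity |
|------------|---------------------|----------------|------------------------|------------|
| LLM        | No                  | No             | No                     | ★☆☆☆       |
| RAG        | Yes                 | No             | No                     | ★★☆☆       |
| AI Agent   | Optional            | Yes            | No                     | ★★★☆       |
| Agentic AI | Yes                 | Yes            | Yes                    | ★★★★       |
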
⚠️ Prerequisites: All code examples use Python 3.10+ and the anthropic SDK. Install with pip install anthropic. You will need a Claude API key set as the environment variable ANTHROPIC_API_KEY.

Building with LLMs

LLM · Large Language Model

The Thinker. An LLM is a neural network trained on vast amounts of text. You send it a message; it generates a response. That is the entire primitive. Everything else in this guide is built on top of this single interaction.

1.1 What it actually is

When you call Claude’s API, you are sending a structured HTTP request containing a list of messages and receiving back a generated completion. The model has no memory, no internet access, no tools — it only knows what you put in the request and what it learned during training.
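
To make the request shape concrete, here is a minimal sketch of the payload as a plain Python dictionary. The helper name `build_request` is hypothetical; the field names (`model`, `max_tokens`, `system`, `messages`) match the parameters used in the code walkthrough later in this section, and nothing is sent over the network.

```python
import json

def build_request(system: str, history: list[dict], user_message: str) -> dict:
    """Assemble the request body for one turn (hypothetical helper, no network call)."""
    return {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 1024,
        "system": system,
        # The model only sees what is in this list: earlier turns plus the new one.
        "messages": history + [{"role": "user", "content": user_message}],
    }

payload = build_request(
    system="You are a support assistant for Clarion.",
    history=[],
    user_message="How do I reset my password?",
)
print(json.dumps(payload, indent=2))
```

Anything the model should take into account, including earlier turns, has to appear in `messages` or `system`; there is no hidden state on the server side.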

This makes it ideal for tasks that only require language reasoning: writing, summarising, classifying text, answering general knowledge questions, drafting, explaining concepts, and transforming text from one format to another.

Typical use cases: Draft email · Summarise meeting notes · Answer general Q&A · Rewrite content · Classify intent · Explain a concept

Watch out for: The model has no knowledge of your internal data. It cannot look up a user’s account. It cannot tell you yesterday’s sales figures. It will hallucinate specific facts it does not know. For the Clarion support case, a raw LLM can only answer general product questions — and your specific product was not in its training data.

1.2 Step-by-step build

We are building: A basic support chat widget for Clarion. The user types a question; Claude responds using only its general knowledge.

  1. Get your API key

     Sign up at console.anthropic.com. Copy the key and set it as the environment variable ANTHROPIC_API_KEY; never hardcode it in source files.

  2. Write a system prompt

     The system prompt defines the AI’s role, tone, and constraints. For support, you want it to be helpful, concise, and honest about what it does not know.

  3. Call the Messages API

     Pass the system prompt and the user’s message. Read back response.content[0].text.

  4. Maintain conversation history

     Claude is stateless. For multi-turn conversations you must accumulate all previous messages and pass them with every request.

  5. Surface the response

     Print it, stream it into a UI, or pipe it to your application layer.
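
Because step 4 resends the full history on every call, request size grows with each turn. One common mitigation, sketched here as an assumption rather than something this guide prescribes, is to keep only the most recent turns. The `trim_history` helper and the `max_turns` values are hypothetical.

```python
def trim_history(history: list[dict], max_turns: int = 10) -> list[dict]:
    """Keep the last `max_turns` user/assistant pairs (hypothetical helper)."""
    if max_turns <= 0:
        return []
    trimmed = history[-2 * max_turns:]
    # Conversations are expected to start with a user turn, so drop a
    # leading assistant message left over from slicing.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed

history = []
for i in range(15):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

recent = trim_history(history, max_turns=3)
print(len(recent), recent[0]["content"])
```

Trimming loses older context, so a real system might summarise dropped turns instead; that trade-off is outside the scope of this sketch.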

1.3 Code walkthrough

Python · llm_basic.py
import anthropic

# 1. Initialise client — reads ANTHROPIC_API_KEY from environment
client = anthropic.Anthropic()

# 2. System prompt: who the assistant is and how it should behave
SYSTEM = """You are a friendly support assistant for Clarion,
a project management SaaS. Answer questions clearly and concisely.
If you do not know a specific answer, say so honestly rather than
guessing. Never fabricate feature names or pricing details."""

# 3. In-memory conversation history (list of message dicts)
history = []

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})

    # 4. Call the Messages API with full history on every turn
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system=SYSTEM,
        messages=history
    )

    assistant_text = response.content[0].text

    # Append assistant turn so next call includes it
    history.append({"role": "assistant", "content": assistant_text})
    return assistant_text

# 5. Simple interactive loop
if __name__ == "__main__":
    print("Clarion Support (LLM only). Type 'quit' to exit.\n")
    while True:
        q = input("You: ")
        if q.lower() == "quit": break
        print(f"\nClarion: {chat(q)}\n")
💡 What you have now: A working chatbot that can answer general questions about software and project management. It cannot answer “Does Clarion integrate with Jira?” accurately because it has no knowledge of your specific product. That is the problem RAG solves.

Building RAG Systems

RAG · Retrieval-Augmented Generation

The Researcher. RAG is the pattern of retrieving relevant documents from a knowledge base and injecting them into the LLM’s context window before it generates a response. The model’s answer is grounded in your actual data, dramatically reducing hallucination on domain-specific facts.

2.1 What it actually is

RAG separates knowledge storage from reasoning. You maintain a searchable collection of documents. At query time, you retrieve the most relevant chunks and prepend them to the user’s question before sending it to Claude. Claude then answers based on that retrieved context.

The key insight: you are not fine-tuning the model. You are using in-context learning — stuffing relevant facts into the prompt at runtime. This means your knowledge base can be updated instantly, without retraining.

User query → Vector search → Top-K chunks
↓ combined
Query + chunks → Claude (LLM) → Grounded answer

Typical use cases: Internal HR chatbot · Product docs Q&A · Legal document search · Support knowledge base · Research assistant

Watch out for: RAG quality depends entirely on retrieval quality. If the relevant chunk is not in your knowledge base, or if your search fails to find it, the model will either hallucinate or say it does not know. A poorly curated knowledge base produces a poorly performing RAG system.
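
One way to make retrieval failures visible, sketched here under assumptions, is to drop chunks whose similarity falls below a threshold, so that an empty result can trigger an explicit "I don't know" path instead of passing irrelevant text to the model. The `min_score` value of 0.2 is an arbitrary placeholder to tune on your own data, and the bag-of-words scoring is a toy stand-in for a real embedding model.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; punctuation is treated as whitespace."""
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return Counter(cleaned.split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values())) *
            math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 3, min_score: float = 0.2) -> list[str]:
    """Return up to k docs scoring at least min_score (placeholder threshold)."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(d)), d) for d in docs),
                    key=lambda s: s[0], reverse=True)
    return [d for score, d in scored[:k] if score >= min_score]

docs = [
    "Clarion integrates with Jira, Asana, Linear, and GitHub via OAuth.",
    "To reset your password: go to Settings > Security > Reset Password.",
]
print(retrieve("Does Clarion integrate with Jira?", docs))
print(retrieve("What is the weather today?", docs))
```

When `retrieve` returns an empty list, the application can answer with a fixed fallback message without calling the model at all.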

2.2 Step-by-step build

We are building: Clarion Support with product knowledge. The bot can now accurately answer “Does Clarion integrate with Jira?” because that answer lives in our help docs.

  1. Prepare your knowledge base

     Collect your documents: help articles, FAQs, product specs. Split them into chunks of 200–500 tokens each. Overlapping chunks (50-token overlap) improve retrieval continuity.

  2. Embed your chunks

     Convert each chunk to a vector using an embedding model. Store vectors in a vector database (Pinecone, Weaviate, Chroma, or an in-memory list for prototyping).

  3. Embed the query at runtime

     Convert the user’s question to a vector using the same embedding model you used at indexing time.

  4. Retrieve top-K chunks

     Compute cosine similarity between the query vector and all stored vectors. Return the top 3–5 most similar chunks.

  5. Augment the prompt

     Inject the retrieved chunks into the system or user message. Instruct Claude to answer from this context and to say when it does not have sufficient information.

  6. Call Claude and return the grounded answer

     The model’s response is now anchored to your actual documentation.
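
The chunking with overlap described in step 1 can be sketched as follows. This is a minimal illustration that treats whitespace-separated words as a stand-in for tokens; a real pipeline would count tokens with the tokenizer of its embedding model. The function name `chunk_text` and its defaults are hypothetical.

```python
def chunk_text(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks

doc = " ".join(f"w{i}" for i in range(700))
chunks = chunk_text(doc, size=300, overlap=50)
print([len(c.split()) for c in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated storage.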

2.3 Code walkthrough

For simplicity, this example uses a pure-Python in-memory vector store with word-overlap similarity. In production replace the embedding function with OpenAI, Cohere, or a local model. The RAG logic stays identical.
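
To illustrate why the RAG logic stays identical, the sketch below passes the embedding function in as a parameter. Both embedders here are toys (word counts and character-trigram counts) written for this example; a production `embed_fn` would wrap whichever embedding API or local model you choose, and would typically return dense vectors rather than sparse counts.

```python
import math
from collections import Counter
from typing import Callable

Vector = Counter
EmbedFn = Callable[[str], Vector]

def word_embed(text: str) -> Vector:
    return Counter("".join(c if c.isalnum() else " " for c in text.lower()).split())

def trigram_embed(text: str) -> Vector:
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values())) *
            math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_match(query: str, docs: list[str], embed_fn: EmbedFn) -> str:
    """Retrieval code is unchanged no matter which embed_fn is supplied."""
    q = embed_fn(query)
    return max(docs, key=lambda d: cosine(q, embed_fn(d)))

docs = [
    "Clarion supports SSO via SAML 2.0 on Pro and Enterprise plans.",
    "File upload limit is 50 MB per file on Starter; 500 MB on Pro.",
]
for fn in (word_embed, trigram_embed):
    print(fn.__name__, "->", top_match("What is the file upload limit?", docs, fn))
```
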

Python · rag_support.py
import anthropic, math

client = anthropic.Anthropic()

# ── STEP 1: Knowledge base (normally loaded from files or a DB) ──
DOCS = [
    "Clarion integrates with Jira, Asana, Linear, and GitHub via OAuth.",
    "Clarion pricing: Starter $12/user/month, Pro $28/user/month, Enterprise custom.",
    "To reset your password: go to Settings > Security > Reset Password.",
    "Clarion supports SSO via SAML 2.0 on Pro and Enterprise plans.",
    "File upload limit is 50 MB per file on Starter; 500 MB on Pro.",
    "API rate limit is 1,000 requests/hour on Pro; 10,000/hour on Enterprise.",
]

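# ── STEP 2: Toy embedding (bag-of-words counts; replace with a real model) ──
def embed(text):
    # Treat punctuation as whitespace so "Jira?" and "Jira," both match "jira"
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    vec = {}
    for w in cleaned.split():
        vec[w] = vec.get(w, 0) + 1
    return vec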

def cosine(a, b):
    keys = set(a) & set(b)
    dot  = sum(a[k] * b[k] for k in keys)
    mag  = (math.sqrt(sum(v**2 for v in a.values())) *
            math.sqrt(sum(v**2 for v in b.values())))
    return dot / mag if mag else 0

doc_vecs = [(embed(d), d) for d in DOCS]

# ── STEP 3 + 4: Retrieve top-K chunks ──
def retrieve(query, k=3):
    qvec   = embed(query)
    scored = [(cosine(qvec, dv), doc) for dv, doc in doc_vecs]
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:k]]

# ── STEP 5 + 6: Augment and call Claude ──
def rag_chat(question):
    chunks  = retrieve(question)
    context = "\n".join(f"- {c}" for c in chunks)

    prompt = f"""Answer using ONLY the context below.
If the answer is not there, say: "I don't have that in my knowledge base."

Context:
{context}

Question: {question}"""

    r = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}]
    )
    return r.content[0].text

if __name__ == "__main__":
    for q in [
        "Does Clarion integrate with Jira?",
        "What is the file upload limit on the Starter plan?",
        "Can I use my own email domain?",    # not in KB
    ]:
        print(f"Q: {q}\nA: {rag_chat(q)}\n")
💡 What you have now: A support bot that answers accurately about your product and replies “I don't have that in my knowledge base.” when a question is not covered by the retrieved context. But it still cannot take action: it cannot look up a user’s account, create a ticket, or send an email. That is what an AI Agent adds.

Building AI Agents

AI Agent · The Doer

The Doer. An AI Agent is an LLM that has been given tools — callable functions that can read and write to the world. The model decides which tool to call, calls it, reads the result, and continues reasoning until the task is complete. It acts, not just answers.

3.1 What it actually is

Claude’s tool use works like this: you describe your available tools as JSON schemas. When Claude determines a tool call is needed, it returns a tool_use content block. Your code executes the function and sends back a tool_result message. Claude continues reasoning — potentially calling more tools — until it produces a final text answer.

This loop — reason → call tool → observe result → reason again — is the agent loop. A single-agent system runs this loop until the task is complete or a maximum iteration limit is reached.
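
The control flow of this loop can be exercised without an API call. The sketch below replaces the model with a scripted stand-in (`fake_model`, entirely hypothetical) that first requests a tool and then answers, so the reason → call tool → observe → reason again cycle and the iteration cap are visible in isolation. The dictionary shapes are simplified and are not the SDK's response objects.

```python
def fake_model(messages: list[dict]) -> dict:
    """Scripted stand-in for the LLM: asks for a tool once, then answers."""
    last = messages[-1]
    if last["role"] == "user":
        return {"type": "tool_use", "name": "get_plan",
                "input": {"email": "alice@example.com"}}
    return {"type": "text", "text": f"Your plan is {last['content']}."}

TOOLS = {"get_plan": lambda email: "Pro"}  # hypothetical tool registry

def run_loop(user_message: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        step = fake_model(messages)                     # reason
        if step["type"] == "text":
            return step["text"]                         # final answer ends the loop
        result = TOOLS[step["name"]](**step["input"])   # call tool
        # Observe: simplified message shape, not the real tool_result format
        messages.append({"role": "tool", "content": result})
    return "Max steps reached without a final answer."

print(run_loop("What plan am I on?"))
```
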

User request → Claude (reason) → tool_use block
↓ your code runs the function
tool_result → Claude (reason again)
↓ loop until stop_reason = end_turn
Final answer (text)

Typical use cases: Look up order status · Create support ticket · Send email notification · Update CRM record · Run a database query

Watch out for: Tool definitions must be precise. Ambiguous descriptions lead to incorrect calls. Always cap the agent loop at a maximum iteration count, validate tool inputs server-side, and log every tool call and result.
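
As a minimal sketch of server-side validation, the function below checks the arguments of a ticket-creation tool call before anything is executed. The rules (a loose email pattern, a subject length cap of 200 characters) are illustrative assumptions, not Clarion requirements, and `validate_ticket_input` is a hypothetical helper.

```python
import re

ALLOWED_PRIORITIES = {"low", "medium", "high"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose check

def validate_ticket_input(inputs: dict) -> list[str]:
    """Return a list of problems; an empty list means the input is acceptable."""
    problems = []
    email = inputs.get("email")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        problems.append("email is missing or malformed")
    subject = inputs.get("subject")
    if not isinstance(subject, str) or not subject.strip():
        problems.append("subject is missing or empty")
    elif len(subject) > 200:  # assumed limit
        problems.append("subject exceeds 200 characters")
    priority = inputs.get("priority")
    if not isinstance(priority, str) or priority not in ALLOWED_PRIORITIES:
        problems.append("priority must be low, medium, or high")
    return problems

good = validate_ticket_input(
    {"email": "alice@example.com", "subject": "Upload fails", "priority": "high"})
bad = validate_ticket_input(
    {"email": "not-an-email", "subject": "", "priority": "urgent"})
print(good)
print(bad)
```

Returning the problems to the model as a tool result, rather than raising, gives it a chance to correct the call on the next iteration.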

3.2 Step-by-step build

We are building: A Clarion support agent that can look up account status and create tickets.

  1. Define your tools as JSON schemas

     Each tool needs a name, a description (what it does and when to call it), and an input_schema. The description is what Claude reads to decide whether to invoke it, so make it precise.

  2. Implement the actual tool functions

     These are regular Python functions that call your database, REST API, email service, or whatever else the agent needs.

  3. Write the agent loop

     Call Claude with tools=your_tools. If the response contains a tool_use block, execute it, append a tool_result, and call Claude again. Repeat until stop_reason == "end_turn" with only text blocks.

  4. Set guardrails

     Cap the loop at N iterations (10 is a safe start). Log every tool call. Validate inputs before destructive operations.
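
The logging guardrail might be sketched as a thin wrapper around tool execution. The wrapper name `safe_execute` and the registry are hypothetical; the `is_error` flag on a `tool_result` block is part of the Messages API tool-use format and lets the model see that a call failed instead of the loop crashing.

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("agent.tools")

def get_account_status(email: str) -> dict:
    return {"plan": "Pro", "status": "active"}  # mocked lookup

REGISTRY = {"get_account_status": get_account_status}  # hypothetical registry

def safe_execute(tool_use_id: str, name: str, inputs: dict) -> dict:
    """Run one tool call, log it, and always return a tool_result block."""
    log.info("tool call %s %s", name, json.dumps(inputs))
    try:
        result = REGISTRY[name](**inputs)
        block = {"type": "tool_result", "tool_use_id": tool_use_id,
                 "content": json.dumps(result)}
    except Exception as exc:  # unknown tool, bad arguments, or a failing backend
        log.warning("tool failed %s: %r", name, exc)
        block = {"type": "tool_result", "tool_use_id": tool_use_id,
                 "content": f"Tool error: {exc!r}", "is_error": True}
    return block

ok = safe_execute("toolu_1", "get_account_status", {"email": "alice@example.com"})
bad = safe_execute("toolu_2", "delete_everything", {})
print(ok)
print(bad)
```
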

3.3 Code walkthrough

Python · agent_support.py
import anthropic, json

client = anthropic.Anthropic()

# ── STEP 1: Tool definitions ──────────────────────────────────────
TOOLS = [
    {
        "name": "get_account_status",
        "description": "Look up a Clarion account by email. Returns plan, status, and last login.",
        "input_schema": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"]
        }
    },
    {
        "name": "create_ticket",
        "description": "Create a support ticket when the issue cannot be resolved in chat.",
        "input_schema": {
            "type": "object",
            "properties": {
                "email":    {"type": "string"},
                "subject":  {"type": "string"},
                "priority": {"type": "string", "enum": ["low", "medium", "high"]}
            },
            "required": ["email", "subject", "priority"]
        }
    }
]

# ── STEP 2: Tool implementations (mocked — replace with real calls) ─
def get_account_status(email):
    db = {
        "alice@example.com": {"plan": "Pro",     "status": "active",    "last_login": "2026-05-02"},
        "bob@example.com":   {"plan": "Starter", "status": "suspended", "last_login": "2026-03-10"},
    }
    return db.get(email, {"error": "Account not found"})

def create_ticket(email, subject, priority):
    # Mock ID only: Python salts str hashes per process, so this value changes between runs
    ticket_id = f"TKT-{abs(hash(email+subject)) % 10000:04d}"
    print(f"  [TICKET CREATED] {ticket_id}: {subject} ({priority}) for {email}")
    return {"ticket_id": ticket_id, "status": "created"}

def execute_tool(name, inputs):
    if   name == "get_account_status": return get_account_status(**inputs)
    elif name == "create_ticket":      return create_ticket(**inputs)
    raise ValueError(f"Unknown tool: {name}")

# ── STEP 3: The agent loop ────────────────────────────────────────
def run_agent(user_message):
    messages = [{"role": "user", "content": user_message}]

    for step in range(10):                        # STEP 4: max 10 iterations
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
            system="You are a Clarion support agent. Use get_account_status "
                   "before answering any account-specific questions."
        )

        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type":        "tool_result",
                    "tool_use_id": block.id,
                    "content":     json.dumps(result)
                })

        messages.append({"role": "assistant", "content": response.content})

        if not tool_results:
            # No tool calls requested: return the final text (empty string if there is none)
            return next((b.text for b in response.content if b.type == "text"), "")

        messages.append({"role": "user", "content": tool_results})

    return "Max steps reached without a final answer."

if __name__ == "__main__":
    q = "My account is alice@example.com and I can't upload files larger than 10 MB."
    print(f"User: {q}\n")
    print(f"Agent: {run_agent(q)}")
💡 What you have now: An agent that looks up Alice’s account, discovers she is on the Pro plan, reasons about the 10 MB limit she reports, and creates a ticket if needed, all within a single conversation turn. Note that the agent only knows Pro allows 500 MB uploads if you also give it the knowledge base, for example by exposing the retrieval step from Section 2 as a tool. But what if the support workload involves specialised sub-tasks, such as billing versus technical issues, that span multiple systems? That is where Agentic AI comes in.

Building Agentic AI

Agentic AI · The Coordinator

The Coordinator. Agentic AI is a system of multiple AI agents working together, orchestrated by a central planner. It spans workflows across teams, tools, and timeframes. A well-designed system adapts when conditions change and can recover from many sub-agent failures without human intervention.

4.1 What it actually is

An Agentic AI system has at minimum two layers: an Orchestrator that plans and delegates, and Sub-agents that execute specialised tasks. The orchestrator itself is an LLM whose “tools” are calls to other agents rather than calls to raw APIs.

Each sub-agent can be independently tuned — given its own system prompt, its own knowledge base, its own tools — while the orchestrator maintains the overall goal, handles routing, retries, and consolidates results into a single coherent response.

Inbound request → Orchestrator (Claude)
↓ routes to specialist agents
Billing Agent + Technical Agent + Escalation Agent
↑ results merged by orchestrator
Orchestrator → final unified response

Typical use cases: Multi-team incident response · End-to-end hiring pipeline · Automated research + report · Cross-system data migration · Autonomous QA pipeline

Watch out for: Agentic AI is the hardest pattern to debug and audit. Errors compound across agents. Instrument every agent call, maintain a full event log, and implement human-in-the-loop checkpoints for high-stakes actions. Start with two agents before building a fleet.
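
A minimal sketch of per-agent instrumentation, assuming each agent is a plain function that takes a task string: a decorator records the agent name, duration, and outcome in an in-memory event log. The names `EVENT_LOG` and `instrumented` are hypothetical, and a production system would write to durable storage or a tracing backend instead of a list.

```python
import time
from functools import wraps

EVENT_LOG: list[dict] = []  # hypothetical in-memory log

def instrumented(agent_name: str):
    """Decorator that appends one event per agent call, including failures."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(task: str) -> str:
            start = time.perf_counter()
            event = {"agent": agent_name, "task": task, "ok": False}
            try:
                result = fn(task)
                event["ok"] = True
                return result
            finally:
                event["seconds"] = round(time.perf_counter() - start, 4)
                EVENT_LOG.append(event)
        return wrapper
    return decorator

@instrumented("billing")
def billing_agent(task: str) -> str:
    return f"Checked invoice for: {task}"  # stand-in for a real agent loop

@instrumented("technical")
def technical_agent(task: str) -> str:
    raise RuntimeError("integration service unavailable")  # simulated failure

billing_agent("invoice INV-0042 looks wrong")
try:
    technical_agent("Jira sync stopped")
except RuntimeError:
    pass

for e in EVENT_LOG:
    print(e["agent"], e["ok"], e["seconds"])
```

Because the event is appended in a `finally` block, failed calls are logged too, which is what makes compounding errors traceable afterwards.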

4.2 Step-by-step build

We are building: A Clarion multi-agent support system with an orchestrator that routes to a Billing Agent or a Technical Agent, then consolidates and responds.

  1. Design your agent topology

     Map out which specialised agents you need and what each owns. Keep agents focused: a billing agent should not know about bug reports.

  2. Implement each sub-agent as a callable function

     Each sub-agent is the single-agent loop from Section 3, packaged as a Python function that accepts a task string and returns a result string.

  3. Build the orchestrator

     The orchestrator is a Claude call where the “tools” are your sub-agent functions. It decides which agent(s) to invoke, in what order, and how to merge their outputs.

  4. Add observability

     Log every orchestrator decision and sub-agent invocation. Track wall time and token usage per request. Set per-agent timeouts.

  5. Define human-in-the-loop checkpoints

     For actions above a defined risk threshold (refunding over $X, deleting data, sending external communications), require explicit human confirmation before proceeding.
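
A checkpoint of this kind can be as simple as a gate in front of the tool executor. The sketch below is an illustration under assumptions: the action names, the 100-dollar threshold, and the `approve` callback (which would be a review queue or chat prompt in practice) are all hypothetical.

```python
from typing import Callable

REFUND_THRESHOLD = 100.0  # assumed limit, in dollars
ALWAYS_REVIEW = {"delete_data", "send_external_email"}  # assumed high-risk actions

def needs_approval(action: str, params: dict) -> bool:
    if action in ALWAYS_REVIEW:
        return True
    return action == "refund" and params.get("amount", 0) > REFUND_THRESHOLD

def gated_execute(action: str, params: dict,
                  execute: Callable[[str, dict], str],
                  approve: Callable[[str, dict], bool]) -> str:
    """Run low-risk actions directly; ask a human before high-risk ones."""
    if needs_approval(action, params) and not approve(action, params):
        return f"{action} blocked: human approval was not granted"
    return execute(action, params)

def execute(action: str, params: dict) -> str:
    return f"{action} done"  # stand-in for the real side effect

def deny_all(action: str, params: dict) -> bool:
    return False  # simulates a reviewer who declines

def approve_all(action: str, params: dict) -> bool:
    return True  # simulates a reviewer who accepts

print(gated_execute("refund", {"amount": 20}, execute, deny_all))
print(gated_execute("refund", {"amount": 500}, execute, deny_all))
print(gated_execute("delete_data", {}, execute, approve_all))
```
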

4.3 Code walkthrough

Python · agentic_support.py
import anthropic, json

client = anthropic.Anthropic()

# ══ SUB-AGENT 1: Billing Agent ════════════════════════════════════
BILLING_TOOLS = [{
    "name": "check_invoice",
    "description": "Check invoice and payment status for a Clarion account.",
    "input_schema": {
        "type": "object",
        "properties": {"email": {"type": "string"}},
        "required": ["email"]
    }
}]

def check_invoice(email):
    return {"invoice": "INV-0042", "amount": "$28",
            "status": "paid", "due": "2026-05-01"}

def billing_agent(task):
    print(f"  [BILLING AGENT] {task}")
    messages = [{"role": "user", "content": task}]
    for _ in range(5):
        r = client.messages.create(
            model="claude-sonnet-4-20250514", max_tokens=512,
            system="You are the Clarion billing specialist. Only handle billing and payment questions.",
            tools=BILLING_TOOLS, messages=messages
        )
        tool_results = []
        for b in r.content:
            if b.type == "tool_use":
                res = check_invoice(**b.input)
                tool_results.append({"type": "tool_result",
                                     "tool_use_id": b.id, "content": json.dumps(res)})
        messages.append({"role": "assistant", "content": r.content})
        if not tool_results:
            return next((b.text for b in r.content if b.type == "text"), "")
        messages.append({"role": "user", "content": tool_results})
    return "Billing agent could not resolve."

# ══ SUB-AGENT 2: Technical Agent ══════════════════════════════════
def technical_agent(task):
    print(f"  [TECH AGENT] {task}")
    r = client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=512,
        system="You are Clarion's technical support specialist. Diagnose technical issues. "
               "If you cannot resolve, recommend opening a bug report.",
        messages=[{"role": "user", "content": task}]
    )
    return r.content[0].text

# ══ ORCHESTRATOR ══════════════════════════════════════════════════
ORCH_TOOLS = [
    {
        "name": "route_to_billing",
        "description": "Route to the billing specialist for invoices, payments, or plan changes.",
        "input_schema": {"type": "object",
                         "properties": {"task": {"type": "string"}},
                         "required": ["task"]}
    },
    {
        "name": "route_to_technical",
        "description": "Route to the technical specialist for bugs, errors, integrations, or performance.",
        "input_schema": {"type": "object",
                         "properties": {"task": {"type": "string"}},
                         "required": ["task"]}
    }
]

def orchestrate(user_message):
    messages = [{"role": "user", "content": user_message}]

    for _ in range(6):
        r = client.messages.create(
            model="claude-sonnet-4-20250514", max_tokens=1024,
            system="You are the Clarion support orchestrator. Analyse the user's issue "
                   "and route to the appropriate specialist. You may call multiple agents "
                   "if the issue spans billing and technical concerns.",
            tools=ORCH_TOOLS, messages=messages
        )
        tool_results = []
        for b in r.content:
            if b.type == "tool_use":
                result = (billing_agent(b.input["task"]) if b.name == "route_to_billing"
                          else technical_agent(b.input["task"]))
                tool_results.append({"type": "tool_result",
                                     "tool_use_id": b.id, "content": result})
        messages.append({"role": "assistant", "content": r.content})
        if not tool_results:
            return next((b.text for b in r.content if b.type == "text"), "")
        messages.append({"role": "user", "content": tool_results})
    return "Orchestrator could not resolve."

if __name__ == "__main__":
    msg = ("Hi, I'm alice@example.com. My last invoice looks wrong AND "
           "the Jira integration stopped syncing yesterday.")
    print(f"User: {msg}\n")
    print(f"System: {orchestrate(msg)}")
💡 What you have now: The orchestrator reads Alice’s message, recognises two distinct issues, routes to both the Billing Agent and the Technical Agent, collects both results, and synthesises a single coherent response. Claude supports parallel tool use, so it can request both sub-agents in the same turn; the loop shown here still executes those requests one after another. You have a genuine multi-agent system.
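
If the orchestrator returns several tool_use blocks in one response, the sub-agent calls can run concurrently rather than sequentially. The following is a sketch with stand-in agents and simplified block dictionaries (not the SDK's objects); it assumes the sub-agents are independent of each other and safe to run in threads. The `ROUTES` table and `run_parallel` helper are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def billing_agent(task: str) -> str:
    time.sleep(0.2)  # simulates the latency of a model call
    return f"billing: {task}"

def technical_agent(task: str) -> str:
    time.sleep(0.2)
    return f"technical: {task}"

ROUTES = {"route_to_billing": billing_agent, "route_to_technical": technical_agent}

def run_parallel(tool_calls: list[dict]) -> list[dict]:
    """Execute every requested sub-agent concurrently, preserving request order."""
    with ThreadPoolExecutor(max_workers=len(tool_calls) or 1) as pool:
        futures = [pool.submit(ROUTES[c["name"]], c["input"]["task"])
                   for c in tool_calls]
        return [{"type": "tool_result", "tool_use_id": c["id"], "content": f.result()}
                for c, f in zip(tool_calls, futures)]

calls = [
    {"id": "toolu_a", "name": "route_to_billing", "input": {"task": "check last invoice"}},
    {"id": "toolu_b", "name": "route_to_technical", "input": {"task": "Jira sync stopped"}},
]
start = time.perf_counter()
results = run_parallel(calls)
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Threads suit this case because each sub-agent spends most of its time waiting on network I/O; the explicit dispatch table also rejects unknown tool names with a `KeyError` instead of silently routing them to a default agent.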

Which Should You Use?

The most common mistake practitioners make is reaching for Agentic AI when a plain LLM would do, or stopping at a raw LLM when the use case clearly needs tools. Use this decision framework:

START HERE ↓
Does the task require acting on live systems or external data?
  YES → AI Agent (Tier 3); + multiple sub-tasks? → Agentic AI (Tier 4)
  NO ↓
Does the task require private or domain-specific knowledge?
  YES → RAG (Tier 2)
  NO ↓
Plain LLM (Tier 1)

| If your use case is…                              | Start with   | Why                                                   |
|---------------------------------------------------|--------------|-------------------------------------------------------|
| Summarise meeting notes                           | LLM          | Pure language task, no domain data or action needed   |
| Answer questions about your product docs          | RAG          | Needs private knowledge, no action required           |
| Let users check their order status via chat       | AI Agent     | Needs to query a live database                        |
| Incident triage across Slack, PagerDuty, and Jira | Agentic AI   | Multiple systems, specialised sub-tasks, coordination |
| Generate a personalised email from CRM data       | RAG or Agent | Live API → Agent; static export → RAG                 |
| End-to-end employee onboarding                    | Agentic AI   | Multi-step, multi-system, coordination required       |

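
The decision framework can be condensed into a few lines of code. This is only a restatement of the flowchart above as a hypothetical helper, `choose_tier`; real decisions will also weigh cost, latency, and risk.

```python
def choose_tier(acts_on_live_systems: bool,
                multiple_subtasks: bool,
                needs_private_knowledge: bool) -> str:
    """Return the simplest tier that satisfies the stated requirements."""
    if acts_on_live_systems:
        return "Agentic AI (Tier 4)" if multiple_subtasks else "AI Agent (Tier 3)"
    if needs_private_knowledge:
        return "RAG (Tier 2)"
    return "Plain LLM (Tier 1)"

examples = {
    "Summarise meeting notes": (False, False, False),
    "Answer questions about product docs": (False, False, True),
    "Check order status via chat": (True, False, False),
    "Incident triage across Slack, PagerDuty, and Jira": (True, True, False),
}
for use_case, flags in examples.items():
    print(f"{use_case}: {choose_tier(*flags)}")
```
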
🧭 The practitioner’s rule: Build the simplest tier that solves the problem. Every tier you add multiplies operational complexity, latency, and cost. A plain LLM call typically returns within a few seconds and costs a fraction of a cent. An Agentic AI workflow can take minutes and dozens of API calls. Match the architecture to the actual requirements, not to what sounds impressive.