LLM vs RAG vs AI Agent vs Agentic AI — Practitioner's Build Guide

Overview

The Four Paradigms

The infographic from GenAI.works captures something practitioners often get confused about: these four terms are not competing products. They are four different architectural patterns for building AI-powered software, each adding a new capability layer on top of the previous.

Think of it as a stack. Every RAG system contains an LLM. Every AI Agent wraps an LLM, with or without a RAG layer. Every Agentic AI system is composed of multiple Agents. The tiers build upward: retrieval is optional for an agent, but nothing in the stack works without the LLM at the bottom.

Agentic AI (Orchestrator)
  ↳ coordinates: Agent A · Agent B · Agent C
      ↳ each agent uses: RAG + Tools
          ↳ built on: LLM (Claude)

This guide walks you through building each tier from scratch using a single running example: a customer support system for a fictional SaaS product, “Clarion”. Keeping the use case constant lets you see exactly what each tier adds — and why you might, or might not, need it.

0.1 Quick comparison

Pattern	Uses external data?	Takes actions?	Multiple AI instances?	Complexity
LLM	No	No	No	★☆☆☆
RAG	Yes	No	No	★★☆☆
AI Agent	Optional	Yes	No	★★★☆
Agentic AI	Yes	Yes	Yes	★★★★

⚠️ Prerequisites: All code examples use Python 3.10+ and the anthropic SDK. Install with pip install anthropic. You will need a Claude API key set as the environment variable ANTHROPIC_API_KEY.

Tier 1 · Foundation

Building with LLMs

LLM · Large Language Model

The Thinker. An LLM is a neural network trained on vast amounts of text. You send it a message; it generates a response. That is the entire primitive. Everything else in this guide is built on top of this single interaction.

1.1 What it actually is

When you call Claude’s API, you are sending a structured HTTP request containing a list of messages and receiving back a generated completion. The model has no memory, no internet access, no tools — it only knows what you put in the request and what it learned during training.
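To make the shape of that request concrete, here is a minimal sketch of the JSON body that the SDK assembles for you. The helper name `build_payload` is hypothetical and exists only for illustration; the field names (`model`, `max_tokens`, `system`, `messages`) are the ones used by the code later in this guide, and the model ID is copied from it.

```python
import json

def build_payload(system: str, messages: list[dict], max_tokens: int = 1024) -> dict:
    """Assemble the body of a Messages API request (illustrative helper)."""
    return {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": max_tokens,
        "system": system,        # role, tone, and constraints
        "messages": messages,    # the full conversation so far
    }

payload = build_payload(
    system="You are a friendly support assistant for Clarion.",
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)
print(json.dumps(payload, indent=2))
```

Everything the model knows about this conversation is in that dictionary. If a fact is not in `system` or `messages`, the model can only fall back on its training data.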

This makes it ideal for tasks that only require language reasoning: writing, summarising, classifying text, answering general knowledge questions, drafting, explaining concepts, and transforming text from one format to another.

Draft email · Summarise meeting notes · Answer general Q&A · Rewrite content · Classify intent · Explain a concept

Watch out for: The model has no knowledge of your internal data. It cannot look up a user’s account. It cannot tell you yesterday’s sales figures. It will hallucinate specific facts it does not know. For the Clarion support case, a raw LLM can only answer general product questions — and your specific product was not in its training data.

1.2 Step-by-step build

We are building: A basic support chat widget for Clarion. The user types a question; Claude responds using only its general knowledge.

1. Get your API key
Sign up at console.anthropic.com. Copy the key and set it as the environment variable ANTHROPIC_API_KEY. Never hardcode it in source files.

2. Write a system prompt
The system prompt defines the AI’s role, tone, and constraints. For support, you want it to be helpful, concise, and honest about what it does not know.

3. Call the Messages API
Pass the system prompt and the user’s message. Read back response.content[0].text.

4. Maintain conversation history
Claude is stateless. For multi-turn conversations you must accumulate all previous messages and pass them with every request.

5. Surface the response
Print it, stream it into a UI, or pipe it to your application layer.

1.3 Code walkthrough

Python · llm_basic.py

import anthropic

# 1. Initialise client — reads ANTHROPIC_API_KEY from environment
client = anthropic.Anthropic()

# 2. System prompt: who the assistant is and how it should behave
SYSTEM = """You are a friendly support assistant for Clarion,
a project management SaaS. Answer questions clearly and concisely.
If you do not know a specific answer, say so honestly rather than
guessing. Never fabricate feature names or pricing details."""

# 3. In-memory conversation history (list of message dicts)
history = []

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})

    # 4. Call the Messages API with full history on every turn
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system=SYSTEM,
        messages=history
    )

    assistant_text = response.content[0].text

    # Append assistant turn so next call includes it
    history.append({"role": "assistant", "content": assistant_text})
    return assistant_text

# 5. Simple interactive loop
if __name__ == "__main__":
    print("Clarion Support (LLM only). Type 'quit' to exit.\n")
    while True:
        q = input("You: ")
        if q.lower() == "quit": break
        print(f"\nClarion: {chat(q)}\n")

💡 What you have now: A working chatbot that can answer general questions about software and project management. It cannot answer “Does Clarion integrate with Jira?” accurately because it has no knowledge of your specific product. That is the problem RAG solves.

Tier 2 · Knowledge

Building RAG Systems

RAG · Retrieval-Augmented Generation

The Researcher. RAG is the pattern of retrieving relevant documents from a knowledge base and injecting them into the LLM’s context window before it generates a response. The model’s answer is grounded in your actual data, dramatically reducing hallucination on domain-specific facts.

2.1 What it actually is

RAG separates knowledge storage from reasoning. You maintain a searchable collection of documents. At query time, you retrieve the most relevant chunks and prepend them to the user’s question before sending it to Claude. Claude then answers based on that retrieved context.

The key insight: you are not fine-tuning the model. You are using in-context learning — stuffing relevant facts into the prompt at runtime. This means your knowledge base can be updated instantly, without retraining.

User query → Vector search → Top-K chunks
↓ combined
Query + chunks → Claude (LLM) → Grounded answer

Internal HR chatbot · Product docs Q&A · Legal document search · Support knowledge base · Research assistant

Watch out for: RAG quality depends entirely on retrieval quality. If the relevant chunk is not in your knowledge base, or if your search fails to find it, the model will either hallucinate or say it does not know. A poorly curated knowledge base produces a poorly performing RAG system.
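One common way to soften the "search fails to find it" case is to drop weak matches, so that the model sees an empty context and falls back to "I don't know" instead of improvising from irrelevant chunks. The sketch below is an assumption layered on top of this guide, not part of its pipeline: `select_chunks` and the `min_score` cut-off are hypothetical, and the scores in the demo are hand-written placeholders rather than measured values.

```python
def select_chunks(scored, k=3, min_score=0.2):
    """Keep at most k chunks whose similarity clears min_score.

    scored: list of (score, chunk_text) pairs from any retriever.
    min_score is a hypothetical cut-off; tune it on your own data.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in ranked[:k] if score >= min_score]

# Hand-written example scores, for illustration only
scored = [
    (0.62, "Clarion integrates with Jira, Asana, Linear, and GitHub via OAuth."),
    (0.08, "To reset your password: go to Settings > Security > Reset Password."),
    (0.05, "Clarion supports SSO via SAML 2.0 on Pro and Enterprise plans."),
]

print(select_chunks(scored))                      # only the Jira chunk survives
print(select_chunks([(0.04, "unrelated text")]))  # nothing clears the bar
```

An empty result is a useful signal in its own right: you can log it to find gaps in the knowledge base.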

2.2 Step-by-step build

We are building: Clarion Support with product knowledge. The bot can now accurately answer “Does Clarion integrate with Jira?” because that answer lives in our help docs.

1. Prepare your knowledge base
Collect your documents: help articles, FAQs, product specs. Split them into chunks of 200–500 tokens each. Overlapping chunks (50-token overlap) improve retrieval continuity.

2. Embed your chunks
Convert each chunk to a vector using an embedding model. Store vectors in a vector database (Pinecone, Weaviate, Chroma, or an in-memory list for prototyping).

3. Embed the query at runtime
Convert the user’s question to a vector using the same embedding model you used at indexing time.

4. Retrieve top-K chunks
Compute cosine similarity between the query vector and all stored vectors. Return the top 3–5 most similar chunks.

5. Augment the prompt
Inject the retrieved chunks into the system or user message. Instruct Claude to answer from this context and to say when it does not have sufficient information.

6. Call Claude and return the grounded answer
The model’s response is now anchored to your actual documentation.
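Step 1 is the part the code walkthrough skips, because its knowledge base is already a list of one-sentence facts. As a rough sketch of overlapping chunking, the helper below splits on whitespace and treats each word as one "token". That is an assumption made to keep the example dependency-free; a real pipeline would count tokens with the tokenizer of its embedding model. The name `chunk_words` is hypothetical.

```python
def chunk_words(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap            # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break                    # this window already reached the end
    return chunks

# Tiny demo: 10 words, windows of 4 with 1 word of overlap
demo = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(demo, size=4, overlap=1):
    print(chunk)
```

The last word of each window reappears as the first word of the next one, which is what keeps a sentence that straddles a boundary retrievable from either side.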

2.3 Code walkthrough

For simplicity, this example uses a pure-Python in-memory vector store that scores cosine similarity over raw word counts. In production, replace the embedding function with a real embedding model from a provider such as OpenAI or Cohere, or with a local model. The RAG logic stays identical.

Python · rag_support.py

import anthropic, math

client = anthropic.Anthropic()

# ── STEP 1: Knowledge base (normally loaded from files or a DB) ──
DOCS = [
    "Clarion integrates with Jira, Asana, Linear, and GitHub via OAuth.",
    "Clarion pricing: Starter $12/user/month, Pro $28/user/month, Enterprise custom.",
    "To reset your password: go to Settings > Security > Reset Password.",
    "Clarion supports SSO via SAML 2.0 on Pro and Enterprise plans.",
    "File upload limit is 50 MB per file on Starter; 500 MB on Pro.",
    "API rate limit is 1,000 requests/hour on Pro; 10,000/hour on Enterprise.",
]

# ── STEP 2: Toy embedding (word-overlap; replace with a real model) ──
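def embed(text):
    # Bag-of-words counts; strip punctuation so "Jira?" matches "Jira,"
    words = [w.strip(".,:;?!") for w in text.lower().split()]
    vec = {}
    for w in words:
        if w:
            vec[w] = vec.get(w, 0) + 1
    return vec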

def cosine(a, b):
    keys = set(a) & set(b)
    dot  = sum(a[k] * b[k] for k in keys)
    mag  = (math.sqrt(sum(v**2 for v in a.values())) *
            math.sqrt(sum(v**2 for v in b.values())))
    return dot / mag if mag else 0

doc_vecs = [(embed(d), d) for d in DOCS]

# ── STEP 3 + 4: Retrieve top-K chunks ──
def retrieve(query, k=3):
    qvec   = embed(query)
    scored = [(cosine(qvec, dv), doc) for dv, doc in doc_vecs]
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:k]]

# ── STEP 5 + 6: Augment and call Claude ──
def rag_chat(question):
    chunks  = retrieve(question)
    context = "\n".join(f"- {c}" for c in chunks)

    prompt = f"""Answer using ONLY the context below.
If the answer is not there, say: "I don't have that in my knowledge base."

Context:
{context}

Question: {question}"""

    r = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}]
    )
    return r.content[0].text

if __name__ == "__main__":
    for q in [
        "Does Clarion integrate with Jira?",
        "What is the file upload limit on the Starter plan?",
        "Can I use my own email domain?",    # not in KB
    ]:
        print(f"Q: {q}\nA: {rag_chat(q)}\n")

💡 What you have now: A support bot that answers accurately about your product and replies “I don't have that in my knowledge base.” when a question isn’t covered by the docs. But it still cannot take action: it cannot look up a user’s account, create a ticket, or send an email. That is what an AI Agent adds.

Tier 3 · Action

Building AI Agents

AI Agent · The Doer

The Doer. An AI Agent is an LLM that has been given tools — callable functions that can read and write to the world. The model decides which tool to call, calls it, reads the result, and continues reasoning until the task is complete. It acts, not just answers.

3.1 What it actually is

Claude’s tool use works like this: you describe your available tools as JSON schemas. When Claude determines a tool call is needed, it returns a tool_use content block. Your code executes the function and sends back a tool_result message. Claude continues reasoning — potentially calling more tools — until it produces a final text answer.
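The exchange described above is easier to follow once you see the two message shapes side by side. The sketch below uses plain dictionaries so it runs offline; the SDK returns typed objects with the same fields (`block.type`, `block.id`, `block.name`, `block.input`). The id value, the helper `make_tool_result_turn`, and the stub `fake_tool` are all hypothetical placeholders.

```python
import json

# An assistant turn that requests a tool call. The id is a made-up
# placeholder; real ids are generated by the API.
assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Let me look up that account."},
        {
            "type": "tool_use",
            "id": "toolu_example_001",
            "name": "get_account_status",
            "input": {"email": "alice@example.com"},
        },
    ],
}

def make_tool_result_turn(assistant_turn: dict, run_tool) -> dict:
    """Build the user turn that answers every tool_use block in a reply."""
    results = []
    for block in assistant_turn["content"]:
        if block["type"] == "tool_use":
            output = run_tool(block["name"], block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],   # must echo the tool_use id
                "content": json.dumps(output),
            })
    return {"role": "user", "content": results}

def fake_tool(name, inputs):
    # Stand-in for a real account lookup
    return {"plan": "Pro", "status": "active"}

reply = make_tool_result_turn(assistant_turn, fake_tool)
print(json.dumps(reply, indent=2))
```

Note that the tool result travels back as a `user` message. From the model's point of view, your code is simply the other party in the conversation.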

This loop — reason → call tool → observe result → reason again — is the agent loop. A single-agent system runs this loop until the task is complete or a maximum iteration limit is reached.

User request → Claude (reason) → tool_use block
↓ your code runs the function
Claude (reason again) ← tool_result
↓ loop until stop_reason = end_turn
Final answer (text)

Look up order status · Create support ticket · Send email notification · Update CRM record · Run a database query

Watch out for: Tool definitions must be precise. Ambiguous descriptions lead to incorrect calls. Always cap the agent loop at a maximum iteration count, validate tool inputs server-side, and log every tool call and result.

3.2 Step-by-step build

We are building: A Clarion support agent that can look up account status and create tickets.

1. Define your tools as JSON schemas
Each tool needs a name, a description (what it does and when to call it), and an input_schema. The description is what Claude reads to decide whether to invoke it, so make it precise.

2. Implement the actual tool functions
These are regular Python functions that call your database, REST API, or email service, whatever the agent needs.

3. Write the agent loop
Call Claude with tools=your_tools. If the response contains a tool_use block, execute it, append a tool_result, and call Claude again. Repeat until stop_reason == "end_turn" with only text blocks.

4. Set guardrails
Cap the loop at N iterations (10 is a safe start). Log every tool call. Validate inputs before destructive operations.
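The guardrails in step 4 can be collected into one dispatch function that sits between the model and your tools. The sketch below is one possible arrangement, not a prescribed design: `validate_inputs`, `guarded_execute`, and the specific checks (an `@` in the email, a fixed priority set) are hypothetical and deliberately simple. Failures are returned as data, so the model can read the error and try again instead of crashing the loop.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("clarion.tools")

ALLOWED_PRIORITIES = {"low", "medium", "high"}

def validate_inputs(name: str, inputs: dict) -> list[str]:
    """Return a list of problems; an empty list means the call may proceed."""
    problems = []
    email = inputs.get("email")
    if not isinstance(email, str) or "@" not in email:
        problems.append("email is missing or malformed")
    if name == "create_ticket":
        if not inputs.get("subject"):
            problems.append("subject is required")
        if inputs.get("priority") not in ALLOWED_PRIORITIES:
            problems.append("priority must be low, medium, or high")
    return problems

def guarded_execute(name: str, inputs: dict, registry: dict) -> dict:
    """Validate, log, and run a tool; errors come back as data, not exceptions."""
    log.info("tool call: %s %s", name, json.dumps(inputs))
    if name not in registry:
        result = {"error": f"Unknown tool: {name}"}
    else:
        problems = validate_inputs(name, inputs)
        if problems:
            result = {"error": "; ".join(problems)}
        else:
            try:
                result = registry[name](**inputs)
            except Exception as exc:   # report failures to the model as data
                result = {"error": f"{type(exc).__name__}: {exc}"}
    log.info("tool result: %s %s", name, json.dumps(result))
    return result

def create_ticket_stub(email, subject, priority):
    # Stand-in for the real ticketing call
    return {"ticket_id": "TKT-0001", "status": "created"}

REGISTRY = {"create_ticket": create_ticket_stub}

bad = guarded_execute(
    "create_ticket",
    {"email": "alice@example.com", "subject": "Upload fails", "priority": "urgent"},
    REGISTRY,
)
print(bad)
```

The schema's `enum` already tells the model which priorities are valid, but the server-side check is what actually enforces it.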

3.3 Code walkthrough

Python · agent_support.py

import anthropic, json

client = anthropic.Anthropic()

# ── STEP 1: Tool definitions ──────────────────────────────────────
TOOLS = [
    {
        "name": "get_account_status",
        "description": "Look up a Clarion account by email. Returns plan, status, and last login.",
        "input_schema": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"]
        }
    },
    {
        "name": "create_ticket",
        "description": "Create a support ticket when the issue cannot be resolved in chat.",
        "input_schema": {
            "type": "object",
            "properties": {
                "email":    {"type": "string"},
                "subject":  {"type": "string"},
                "priority": {"type": "string", "enum": ["low", "medium", "high"]}
            },
            "required": ["email", "subject", "priority"]
        }
    }
]

# ── STEP 2: Tool implementations (mocked — replace with real calls) ─
def get_account_status(email):
    db = {
        "alice@example.com": {"plan": "Pro",     "status": "active",    "last_login": "2026-05-02"},
        "bob@example.com":   {"plan": "Starter", "status": "suspended", "last_login": "2026-03-10"},
    }
    return db.get(email, {"error": "Account not found"})

def create_ticket(email, subject, priority):
    ticket_id = f"TKT-{abs(hash(email+subject)) % 10000:04d}"
    print(f"  [TICKET CREATED] {ticket_id}: {subject} ({priority}) for {email}")
    return {"ticket_id": ticket_id, "status": "created"}

def execute_tool(name, inputs):
    if   name == "get_account_status": return get_account_status(**inputs)
    elif name == "create_ticket":      return create_ticket(**inputs)
    raise ValueError(f"Unknown tool: {name}")

# ── STEP 3: The agent loop ────────────────────────────────────────
def run_agent(user_message):
    messages = [{"role": "user", "content": user_message}]

    for step in range(10):                        # STEP 4: max 10 iterations
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
            system="You are a Clarion support agent. Use get_account_status "
                   "before answering any account-specific questions."
        )

        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type":        "tool_result",
                    "tool_use_id": block.id,
                    "content":     json.dumps(result)
                })

        messages.append({"role": "assistant", "content": response.content})

        if not tool_results:
            # Default to "" so a reply without a text block cannot raise StopIteration
            return next((b.text for b in response.content if b.type == "text"), "")

        messages.append({"role": "user", "content": tool_results})

    return "Max steps reached without a final answer."

if __name__ == "__main__":
    q = "My account is alice@example.com and I can't upload files larger than 10 MB."
    print(f"User: {q}\n")
    print(f"Agent: {run_agent(q)}")

💡 What you have now: An agent that looks up Alice’s account, learns she is on the Pro plan with an active account, and can create a ticket when the issue cannot be resolved in chat, all from a single user request. Note that this agent has no retrieval layer, so it only knows that Pro allows 500 MB uploads if you add that fact to its prompt or give it a knowledge-base tool. But what if the support workload involves specialised sub-tasks like billing vs. technical, spanning multiple systems? That is where Agentic AI comes in.

Tier 4 · Orchestration

Building Agentic AI

Agentic AI · The Coordinator

The Coordinator. Agentic AI is a system of multiple AI agents working together, orchestrated by a central planner. It spans workflows across teams, tools, and timeframes. A well-built system adapts when conditions change and can recover from many sub-agent failures on its own, while still pausing for human approval on high-stakes actions.

4.1 What it actually is

An Agentic AI system has at minimum two layers: an Orchestrator that plans and delegates, and Sub-agents that execute specialised tasks. The orchestrator itself is an LLM whose “tools” are calls to other agents rather than calls to raw APIs.

Each sub-agent can be independently tuned — given its own system prompt, its own knowledge base, its own tools — while the orchestrator maintains the overall goal, handles routing, retries, and consolidates results into a single coherent response.
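The "retries" responsibility mentioned above is not implemented in this guide's orchestrator code, so here is a minimal sketch of what it could look like. The wrapper name `call_with_retries`, the linear backoff, and the stub agent are assumptions for illustration. On final failure it returns a string rather than raising, so the orchestrator can pass the failure to the model as an ordinary tool result.

```python
import time

def call_with_retries(agent, task: str, attempts: int = 3, backoff_s: float = 0.0) -> str:
    """Call a sub-agent, retrying on any exception up to `attempts` times."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return agent(task)
        except Exception as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(backoff_s * attempt)   # linear backoff between tries
    return f"Sub-agent failed after {attempts} attempts: {last_error}"

# Demo with a stub agent that fails once, then succeeds
calls = {"n": 0}

def flaky_billing_agent(task):
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("billing service timed out")
    return f"Invoice checked for: {task}"

print(call_with_retries(flaky_billing_agent, "alice@example.com invoice"))
```

Retrying is only safe for idempotent work. A sub-agent that sends emails or issues refunds should not be retried blindly.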

Inbound request → Orchestrator (Claude)
↓ routes to specialist agents
Billing Agent + Technical Agent + Escalation Agent
↑ results merged by orchestrator
Orchestrator → final unified response

Multi-team incident response · End-to-end hiring pipeline · Automated research + report · Cross-system data migration · Autonomous QA pipeline

Watch out for: Agentic AI is the hardest pattern to debug and audit. Errors compound across agents. Instrument every agent call, maintain a full event log, and implement human-in-the-loop checkpoints for high-stakes actions. Start with two agents before building a fleet.
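As one way to "instrument every agent call", a small decorator can append a record to an event log each time a sub-agent runs, whether it succeeds or raises. This is a sketch under simple assumptions: the in-memory list `EVENT_LOG`, the decorator name `instrumented`, and the stub agent are hypothetical, and a real system would send these records to a logging or tracing backend.

```python
import functools
import time

EVENT_LOG = []   # in production, ship these records to your logging backend

def instrumented(agent_name: str):
    """Decorator that records one event per agent call, success or failure."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(task: str) -> str:
            start = time.perf_counter()
            event = {"agent": agent_name, "task": task, "ok": False}
            try:
                result = fn(task)
                event["ok"] = True
                return result
            finally:
                # Runs on both paths, so failed calls are logged too
                event["seconds"] = round(time.perf_counter() - start, 4)
                EVENT_LOG.append(event)
        return inner
    return wrap

@instrumented("technical")
def technical_agent_stub(task):
    # Stand-in for a real sub-agent
    return f"Diagnosed: {task}"

technical_agent_stub("Jira sync stopped")
print(EVENT_LOG)
```

Because the log entry is written in `finally`, a crashed sub-agent still leaves a trace with `ok` set to `False`, which is usually the entry you need most.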

4.2 Step-by-step build

We are building: A Clarion multi-agent support system with an orchestrator that routes to a Billing Agent or a Technical Agent, then consolidates and responds.

1. Design your agent topology
Map out which specialised agents you need and what each owns. Keep agents focused: a billing agent should not know about bug reports.

2. Implement each sub-agent as a callable function
Each sub-agent is the single-agent loop from Section 3, packaged as a Python function that accepts a task string and returns a result string.

3. Build the orchestrator
The orchestrator is a Claude call where the “tools” are your sub-agent functions. It decides which agent(s) to invoke, in what order, and how to merge their outputs.

4. Add observability
Log every orchestrator decision and sub-agent invocation. Track wall time and token usage per request. Set per-agent timeouts.

5. Define human-in-the-loop checkpoints
For actions above a defined risk threshold (refunding over $X, deleting data, sending external communications), require explicit human confirmation before proceeding.
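Step 5 can be sketched as a gate that every risky action must pass through. The rule set below is a hypothetical example: the $100 refund threshold stands in for the "$X" above, and the action names and the `approve` callback are assumptions. The callback defaults to denying, so an action is blocked unless a reviewer explicitly says yes.

```python
REFUND_APPROVAL_THRESHOLD = 100.00   # hypothetical limit, in dollars
ALWAYS_CONFIRM = {"delete_data", "send_external_email"}

def needs_human_approval(action: str, amount: float = 0.0) -> bool:
    """Decide whether an action crosses the risk threshold."""
    if action in ALWAYS_CONFIRM:
        return True
    return action == "refund" and amount > REFUND_APPROVAL_THRESHOLD

def run_action(action: str, amount: float = 0.0, approve=lambda summary: False) -> dict:
    """Run an action, pausing for a human decision when the risk rule fires.

    approve: callback that shows the summary to a person and returns True/False.
    The default denies, so a missing reviewer can never approve by accident.
    """
    if needs_human_approval(action, amount):
        summary = f"{action} (amount=${amount:.2f})"
        if not approve(summary):
            return {"status": "blocked", "reason": "awaiting human approval"}
    return {"status": "executed", "action": action}

print(run_action("refund", 20.0))                                  # below threshold
print(run_action("refund", 500.0))                                 # blocked by default
print(run_action("refund", 500.0, approve=lambda summary: True))   # reviewer said yes
```

Keeping the rule in plain code, outside the prompt, means the model cannot talk its way past it.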

4.3 Code walkthrough

Python · agentic_support.py

import anthropic, json

client = anthropic.Anthropic()

# ══ SUB-AGENT 1: Billing Agent ════════════════════════════════════
BILLING_TOOLS = [{
    "name": "check_invoice",
    "description": "Check invoice and payment status for a Clarion account.",
    "input_schema": {
        "type": "object",
        "properties": {"email": {"type": "string"}},
        "required": ["email"]
    }
}]

def check_invoice(email):
    return {"invoice": "INV-0042", "amount": "$28",
            "status": "paid", "due": "2026-05-01"}

def billing_agent(task):
    print(f"  [BILLING AGENT] {task}")
    messages = [{"role": "user", "content": task}]
    for _ in range(5):
        r = client.messages.create(
            model="claude-sonnet-4-20250514", max_tokens=512,
            system="You are the Clarion billing specialist. Only handle billing and payment questions.",
            tools=BILLING_TOOLS, messages=messages
        )
        tool_results = []
        for b in r.content:
            if b.type == "tool_use":
                res = check_invoice(**b.input)
                tool_results.append({"type": "tool_result",
                                     "tool_use_id": b.id, "content": json.dumps(res)})
        messages.append({"role": "assistant", "content": r.content})
        if not tool_results:
            # Default to "" so a reply without a text block cannot raise StopIteration
            return next((b.text for b in r.content if b.type == "text"), "")
        messages.append({"role": "user", "content": tool_results})
    return "Billing agent could not resolve."

# ══ SUB-AGENT 2: Technical Agent ══════════════════════════════════
def technical_agent(task):
    print(f"  [TECH AGENT] {task}")
    r = client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=512,
        system="You are Clarion's technical support specialist. Diagnose technical issues. "
               "If you cannot resolve, recommend opening a bug report.",
        messages=[{"role": "user", "content": task}]
    )
    return r.content[0].text

# ══ ORCHESTRATOR ══════════════════════════════════════════════════
ORCH_TOOLS = [
    {
        "name": "route_to_billing",
        "description": "Route to the billing specialist for invoices, payments, or plan changes.",
        "input_schema": {"type": "object",
                         "properties": {"task": {"type": "string"}},
                         "required": ["task"]}
    },
    {
        "name": "route_to_technical",
        "description": "Route to the technical specialist for bugs, errors, integrations, or performance.",
        "input_schema": {"type": "object",
                         "properties": {"task": {"type": "string"}},
                         "required": ["task"]}
    }
]

def orchestrate(user_message):
    messages = [{"role": "user", "content": user_message}]

    for _ in range(6):
        r = client.messages.create(
            model="claude-sonnet-4-20250514", max_tokens=1024,
            system="You are the Clarion support orchestrator. Analyse the user's issue "
                   "and route to the appropriate specialist. You may call multiple agents "
                   "if the issue spans billing and technical concerns.",
            tools=ORCH_TOOLS, messages=messages
        )
        tool_results = []
        for b in r.content:
            if b.type == "tool_use":
                result = (billing_agent(b.input["task"]) if b.name == "route_to_billing"
                          else technical_agent(b.input["task"]))
                tool_results.append({"type": "tool_result",
                                     "tool_use_id": b.id, "content": result})
        messages.append({"role": "assistant", "content": r.content})
        if not tool_results:
            # Default to "" so a reply without a text block cannot raise StopIteration
            return next((b.text for b in r.content if b.type == "text"), "")
        messages.append({"role": "user", "content": tool_results})
    return "Orchestrator could not resolve."

if __name__ == "__main__":
    msg = ("Hi, I'm alice@example.com. My last invoice looks wrong AND "
           "the Jira integration stopped syncing yesterday.")
    print(f"User: {msg}\n")
    print(f"System: {orchestrate(msg)}")

💡 What you have now: The orchestrator reads Alice’s message, recognises two distinct issues, routes to both the Billing Agent and the Technical Agent, collects both results, and synthesises a single coherent response. Claude supports parallel tool use, so it can request both sub-agents in the same turn; note that the loop above still executes those requests one after another. You have a genuine multi-agent system.
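When the model requests several sub-agents in one turn, your code is free to run them at the same time. The sketch below shows one way to do that with a thread pool, which fits here because each sub-agent spends most of its time waiting on network calls. The stub agents, the `ROUTES` table, the placeholder ids, and `run_tool_calls_concurrently` are all hypothetical. Tool calls are shown as plain dictionaries so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor

def billing_stub(task):
    # Stand-in for billing_agent(task)
    return f"[billing] handled: {task}"

def technical_stub(task):
    # Stand-in for technical_agent(task)
    return f"[technical] handled: {task}"

ROUTES = {"route_to_billing": billing_stub, "route_to_technical": technical_stub}

def run_tool_calls_concurrently(tool_calls: list[dict]) -> list[dict]:
    """Run every requested sub-agent in its own thread; keep request order."""
    with ThreadPoolExecutor(max_workers=max(1, len(tool_calls))) as pool:
        futures = [pool.submit(ROUTES[c["name"]], c["input"]["task"])
                   for c in tool_calls]
        return [
            {"type": "tool_result", "tool_use_id": c["id"], "content": f.result()}
            for c, f in zip(tool_calls, futures)
        ]

# Two tool_use requests from the same turn (ids are made-up placeholders)
tool_calls = [
    {"id": "toolu_example_a", "name": "route_to_billing",
     "input": {"task": "Check the last invoice"}},
    {"id": "toolu_example_b", "name": "route_to_technical",
     "input": {"task": "Jira integration stopped syncing"}},
]

results = run_tool_calls_concurrently(tool_calls)
for item in results:
    print(item["tool_use_id"], "->", item["content"])
```

With two sub-agents the wall time drops to roughly that of the slower one, at the cost of interleaved log output and the need for thread-safe tool code.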

Decision Guide

Which Should You Use?

The most common mistake practitioners make is reaching for Agentic AI when a plain LLM would do, or stopping at a raw LLM when the use case clearly needs tools. Use this decision framework:

START HERE ↓

Does the task require acting on live systems or external data?
  YES → AI Agent (Tier 3)
        + multiple sub-tasks? → Agentic AI (Tier 4)
  NO ↓

Does the task require private or domain-specific knowledge?
  YES → RAG (Tier 2)
  NO ↓

Plain LLM (Tier 1)
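The flowchart reduces to three yes/no questions, which can be written down as a small function. This is only a restatement of the diagram; the function and parameter names are hypothetical.

```python
def choose_tier(acts_on_live_systems: bool,
                multiple_subtasks: bool,
                needs_private_knowledge: bool) -> str:
    """Walk the decision flow from top to bottom and return the tier."""
    if acts_on_live_systems:
        return "Agentic AI (Tier 4)" if multiple_subtasks else "AI Agent (Tier 3)"
    if needs_private_knowledge:
        return "RAG (Tier 2)"
    return "Plain LLM (Tier 1)"

# Summarise meeting notes
print(choose_tier(False, False, False))
# Answer questions about your product docs
print(choose_tier(False, False, True))
# Let users check their order status via chat
print(choose_tier(True, False, False))
# Incident triage across several systems
print(choose_tier(True, True, True))
```

The action question comes first, so a task that needs both tools and private knowledge lands on an agent, which can then use retrieval as one of its tools.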

If your use case is…	Start with	Why
Summarise meeting notes	LLM	Pure language task, no domain data or action needed
Answer questions about your product docs	RAG	Needs private knowledge, no action required
Let users check their order status via chat	AI Agent	Needs to query a live database
Incident triage across Slack, PagerDuty, and Jira	Agentic AI	Multiple systems, specialised sub-tasks, coordination
Generate a personalised email from CRM data	RAG or Agent	Live API → Agent; static export → RAG
End-to-end employee onboarding	Agentic AI	Multi-step, multi-system, coordination required

🧭 The practitioner’s rule: Build the simplest tier that solves the problem. Every tier you add multiplies operational complexity, latency, and cost. A plain LLM call typically returns within seconds and costs a fraction of a cent. An Agentic AI workflow can take minutes and dozens of API calls. Match the architecture to the actual requirements, not to what sounds impressive.
