LangChain vs LangGraph vs Raw OpenAI: How to Choose Your RAG Stack
You're starting a new RAG project and face a decision that will shape your next 6-12 months: use a framework like LangChain, or build directly with the OpenAI API? The internet offers conflicting advice. Threads on X call LangChain "overkill" and "too much abstraction." Blog posts praise its mature patterns and ecosystem. Your team splits between "let's move fast with the framework" and "we should control our own code."
We faced this exact decision while building an AI document chatbot for a professional knowledge management platform. After evaluating both approaches and living with our choice for 6+ months in production, we have a clear perspective: we chose LangChain + LangGraph, and we'd make the same choice again.
In this article, I'll walk you through the framework decision dilemma and explain what LangChain and LangGraph actually provide. I'll also show when using the raw OpenAI API makes more sense. Then, I’ll share our honest production experience with all three technologies and provide a decision framework to help you choose confidently.
In this article:
- What makes choosing between LangChain, LangGraph, and raw OpenAI API so challenging?
- LangChain: what it actually provides (and costs)
- LangGraph: how does it improve workflow orchestration?
- LangSmith: how does it transform observability in your RAG stack?
- Raw OpenAI API: when it makes sense
- What did we learn after six months running LangChain and LangGraph in production?
- LangChain, LangGraph, or Raw OpenAI API for your RAG stack – conclusion
- Want to make smarter technology choices for your RAG stack?
What makes choosing between LangChain, LangGraph, and raw OpenAI API so challenging?
The choice between framework and raw API represents fundamentally different philosophies about software development, and the RAG community is deeply divided.
The "just use OpenAI API" argument
This camp values simplicity and control above all.
Maximum control
When you write raw API calls, you see exactly what's happening. No framework magic, no hidden abstractions, no surprise behaviors. Every embedding generation, every vector search, every LLM call is code you wrote and can debug.
Minimal dependencies
Your requirements.txt has 3-5 packages instead of 50. Deployment is lightweight. You're not vulnerable to framework bugs or breaking changes. When OpenAI updates their API, you update your code on your timeline.
Performance
No framework overhead or abstraction layers. Your code runs exactly what you tell it to run, nothing more. Some teams report faster execution without framework processing.
Community validation
Prominent voices in AI development criticize LangChain's complexity. A common criticism: the framework adds layers of abstraction that obscure rather than clarify—style over substance.
The "use a framework" argument
This camp values productivity and proven patterns.
Mature RAG patterns out-of-the-box
Don't reinvent document loading, text splitting, vector search integration, retrieval strategies, prompt management, and streaming responses. Use battle-tested implementations that thousands of developers have refined.
Faster time to production
Start with working RAG in hours, not weeks. Swap vector databases with configuration changes, not rewrites. Test different retrieval strategies without rebuilding your pipeline.
Ecosystem and community
Access hundreds of integrations: vector stores, LLM providers, document loaders, specialized retrievers. Learn from community patterns solving the same problems you'll face.
Built-in observability
Tools like LangSmith provide visibility into LLM interactions that would take weeks to build yourself. See every prompt, response, token usage, and latency in production.
Why does choosing the right RAG stack matter?
The choice of RAG stack sets the foundation for everything that follows in your project — from design decisions to how easily you can evolve later.
- Technical debt compounds: if you start with raw API but later need framework capabilities (routing, state management, observability), migration is painful. Conversely, if you adopt a framework but don't need its complexity, you've burdened your team with unnecessary abstractions.
- Team productivity impact: frameworks have learning curves but then accelerate development. Raw APIs give immediate clarity but accumulate maintenance burden. The wrong choice affects velocity for months.
- Production reliability: frameworks provide error handling, retry logic, and fallback patterns. Raw implementations need to build these. Your choice impacts system robustness and incident frequency.
LangChain: what it actually provides (and costs)
Let's examine what you get (and give up) with LangChain, based on our production experience.
What RAG components come ready-to-use in LangChain?
LangChain's core value is comprehensive RAG building blocks:
| Component | What it provides |
| --- | --- |
| Document loaders | Ready-made loaders for PDFs, Word files, HTML, Markdown, and other sources |
| Text splitters | Chunking strategies with configurable size and overlap that respect document structure |
| Embeddings | A unified interface to embedding providers such as OpenAI |
| Vector store integrations | Connectors for Elasticsearch, Pinecone, FAISS, Chroma, and many other stores |
| Retrievers | Similarity search, MMR, and other retrieval strategies behind one interface |
| Prompt templates and chains | Reusable prompt templates and composable chains for wiring the pipeline together |
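To make this concrete, here is a minimal sketch (not our production code) of how these building blocks compose into a retrieval pipeline. It assumes the langchain-openai, langchain-community, and pypdf packages and an OPENAI_API_KEY in the environment; the file name and parameters are illustrative.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Load and chunk a document (file name is illustrative)
docs = PyPDFLoader("handbook.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)

# Embed the chunks, index them, and expose an MMR retriever
vector_store = FAISS.from_documents(chunks, OpenAIEmbeddings())
retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 12})
Swapping FAISS for Elasticsearch or another store is a matter of changing the vector store class and its connection settings, not rewriting the pipeline.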
Why we chose LangChain
Our decision came down to four pragmatic factors:
1. Mature RAG capabilities: we didn't want to reinvent retrieval patterns. LangChain's implementations are production-tested and handle edge cases we wouldn't discover until production incidents.
2. Active development: LangChain releases updates weekly. Bugs get fixed quickly. New models and providers appear fast. The project has momentum that suggests longevity.
3. Financial backing: LangChain has strong company backing. This isn't a side project that might be abandoned. For production systems, framework stability matters.
4. LangSmith integration: LangSmith's free developer tier provides observability that would cost person-months to build. This alone nearly justifies the framework choice.

An example of a project in which we used LangChain and LangSmith
Read our full case study: AI Document Chatbot →
What overhead did we accept by using LangChain?
LangChain isn't free; adopting it means accepting complexity:
- Learning curve: understanding LangChain's abstractions (Documents, Retrievers, Chains, Runnables) takes weeks. New team members need onboarding time. Documentation sometimes lags implementation.
- Abstractions hide details: sometimes you need to dig into LangChain source code to understand behavior. The "magic" that accelerates development can obscure important details during debugging.
- Dependency on framework updates: breaking changes occasionally require code updates. OpenAI API changes flow through LangChain with some delay. You're coupled to the framework's release cycle.
- Community criticism: the "LangChain is too complex" sentiment is real. Some developers find the abstractions confusing. The framework has strong opinions that don't fit every use case.
- Some performance overhead: framework processing adds microseconds-to-milliseconds per operation. For most RAG applications this is negligible, but high-performance scenarios might notice.
LangGraph: how does it improve workflow orchestration?
LangGraph solves a specific problem that emerges in production RAG systems: complex conditional workflows are hard to build and debug with simple chains.
What LangGraph provides
LangGraph extends LangChain by adding structure and clarity to complex RAG workflows.
State management
Define typed state that flows through your workflow:
from typing import TypedDict, List
from langchain.schema import Document

class ChatbotState(TypedDict):
    question: str
    question_type: str
    confidence: float
    retrieved_docs: List[Document]
    graded_docs: List[Document]
    answer: str
Every node receives state, modifies it, returns updated state. Explicit and debuggable.
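As a hedged illustration (not our production code), a node is just a function over that state. Here, is_relevant is a hypothetical helper standing in for an LLM-based grading call:
def grade_all_documents(state: ChatbotState) -> dict:
    # Keep only documents judged relevant to the question;
    # is_relevant is a hypothetical placeholder for an LLM grading call
    relevant = [doc for doc in state["retrieved_docs"] if is_relevant(doc, state["question"])]
    # LangGraph merges the returned keys back into the shared state
    return {"graded_docs": relevant}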
Visual workflow definition
Express complex flows as graphs with nodes and edges:
from langgraph.graph import StateGraph, END

workflow = StateGraph(ChatbotState)

# Define nodes
workflow.add_node("classify_question", classify_question)
workflow.add_node("handle_generic", handle_generic)
workflow.add_node("handle_conversational", handle_conversational)
workflow.add_node("retrieve", retrieve_documents)
workflow.add_node("grade_documents", grade_all_documents)
workflow.add_node("generate", generate_answer)

workflow.set_entry_point("classify_question")

# Conditional routing: route_question inspects state and returns a route key
workflow.add_conditional_edges(
    "classify_question",
    route_question,
    {
        "generic": "handle_generic",
        "conversational": "handle_conversational",
        "document_search": "retrieve"
    }
)

# Linear flow for the document search path
workflow.add_edge("retrieve", "grade_documents")
workflow.add_edge("grade_documents", "generate")
workflow.add_edge("handle_generic", END)
workflow.add_edge("handle_conversational", END)
workflow.add_edge("generate", END)

app = workflow.compile()
The workflow structure is explicit and visual. Adding new nodes or routes is straightforward.
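Once compiled, the graph runs like any other LangChain runnable. A minimal invocation sketch (the question text is just an example):
result = app.invoke({"question": "What does GDPR require for data retention?"})
print(result["answer"])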
Built-in debugging
LangSmith visualizes LangGraph execution:
- See which path the query took.
- Inspect state at each node.
- Identify where failures occur.
- Measure latency per node.
What does our real production LangGraph workflow look like?
Here's our actual production flow (simplified a bit by removing caching):
Start
↓
classify_question
↓
route_question
├→ [generic] → handle_generic → END
├→ [conversational] → handle_conversational → END
└→ [document_search]
↓
generate_search_phrase
↓
retrieve (~20 docs)
↓
grade_documents (select top 12)
↓
generate_answer
↓
END
Why choose LangGraph instead of custom workflow orchestration?
LangGraph offers clear advantages over building orchestration logic from scratch, especially when you need transparency, maintainability, and rapid iteration.
Visual debugging
When a user reports "wrong answer," we open LangSmith and see the exact execution path:
- classify_question → "document_search" ✓
- retrieve → 20 docs found ✓
- grade_documents → 0 docs passed threshold ✗ (found the bug!)
Without LangGraph, this requires extensive logging and manual trace correlation.
Clean state management
State flows explicitly through nodes. No global variables, no implicit dependencies. Each node is a pure function: input state → output state.
Easy modifications
Adding new routing (e.g., "urgent" questions bypass grading) is a configuration change:
workflow.add_conditional_edges(
    "classify_question",
    route_with_urgency,
    {
        "generic": "handle_generic",
        "urgent": "generate_immediately",  # New path
        "document_search": "retrieve"
    }
)
Testable components
Test individual nodes in isolation:
def test_classify_question():
    state = {"question": "What is GDPR?"}
    result = classify_question(state)
    assert result["question_type"] == "document_search"
    assert result["confidence"] > 0.7
What are the pros and cons of using LangGraph in your RAG stack?
Like any framework, LangGraph introduces a balance of clear advantages and practical trade-offs.
| Pros | Cons |
| --- | --- |
| ✅ Visual debugging in LangSmith invaluable | ❌ Another abstraction to learn |
| ✅ Clean separation of concerns | ❌ Framework-specific patterns |
| ✅ Easy to add new routes/nodes | ❌ Some overhead vs. simple functions |
| ✅ State management explicit | ❌ Requires understanding graph concepts |
| ✅ Individual nodes testable | ❌ Limited documentation for advanced use cases |
LangSmith: how does it transform observability in your RAG stack?
LangSmith is why we'd choose LangChain again even if nothing else mattered. The free developer tier provides production-grade observability that would take person-months to build.
What LangSmith provides (free tier)
LangSmith brings production-grade observability to any RAG workflow, giving developers full visibility into how their LLM-powered systems behave.
Complete LLM interaction traces
Every call to any LLM is logged automatically:
- Exact prompt sent.
- Full response received.
- Model and parameters used.
- Token usage (input/output breakdown).
- Latency measurement.
- Success/error status.
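Turning this on requires no instrumentation code; tracing is enabled through environment variables. A minimal sketch, assuming you already have a LangSmith API key (the project name is illustrative):
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "rag-chatbot"  # illustrative project name

# From here on, every LLM call, retriever call, and LangGraph node execution
# in this process is traced to LangSmith automatically.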
Cost tracking
Based on model pricing and actual token usage, LangSmith estimates costs per query. See expensive operations immediately.
LangGraph visualization
Visual representation of workflow execution:
- Which nodes ran.
- Time spent in each node.
- State transitions.
- Conditional routing decisions.
Error investigation
Failed queries include full context:
- What inputs caused failure.
- Which node failed.
- Error message and stack trace.
- State at time of failure.
Read also: How We Improved RAG Chatbot Accuracy by 40% with Document Grading →
Production debugging example with LangSmith
User complaint:
"The chatbot gave me completely irrelevant information about authentication when I asked about data encryption."
Without LangSmith:
- Review application logs (generic request/response).
- Try to reproduce the issue.
- Add more logging.
- Deploy new version.
- Wait for it to happen again.
- Guess at root cause.
With LangSmith:
1. Search traces for the user's query text.
2. Open the trace, see full LangGraph execution:
- classify_question → classified as "document_search" ✓
- generate_search_phrase → generated: "authentication security" ✗ (wrong!).
- retrieve → returned authentication docs (technically correct for wrong search phrase).
- grade_documents → passed authentication docs as relevant.
- generate → answered about authentication (followed prompt correctly).
3. Root cause found in 2 minutes: the search phrase generation step misunderstood "data encryption" as "authentication."
4. Fix: improve search phrase generation prompt with better examples for encryption queries.
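To give a feel for the kind of fix this was (a sketch only, not our actual production prompt), few-shot examples that anchor the generated search phrase to the user's topic look roughly like this:
from langchain_core.prompts import ChatPromptTemplate

# Illustrative prompt: the examples steer the model away from drifting to
# related-but-wrong topics such as authentication for an encryption question
search_phrase_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Rewrite the user's question as a short search phrase. "
     "Stay on the user's topic; do not substitute a related topic.\n"
     "Examples:\n"
     "Q: How do we encrypt data at rest? -> data encryption at rest\n"
     "Q: How do users sign in with SSO? -> single sign-on authentication"),
    ("human", "{question}"),
])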
Token usage optimization
LangSmith shows token consumption per node:
Example query breakdown
- classify_question: 150 tokens ($0.000375)
- grade_documents: 20 calls × 250 tokens = 5,000 tokens ($0.0125)
- generate: 15,000 tokens ($0.0375)
- Total: 20,150 tokens ($0.050375)
We discovered document grading consumed 25% of costs. This insight led to optimizing grading prompts (shorter, more focused) and implementing caching for common questions.
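As a rough sketch of what "caching for common questions" can mean (hypothetical helper names, not our production implementation, which is covered in the caching article linked below):
from functools import lru_cache

@lru_cache(maxsize=1024)
def answer_common_question(normalized_question: str) -> str:
    # run_rag_pipeline is a hypothetical entry point into the compiled LangGraph app;
    # repeated questions skip the full retrieve/grade/generate flow entirely
    return run_rag_pipeline(normalized_question)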
Why does LangSmith justify choosing LangChain?
Building equivalent observability ourselves would require:
- LLM interaction logging system
- Token usage tracking
- Cost calculation per model
- Visual trace viewer
- Search and filtering UI
- State inspection tools
LangSmith provides this for free (developer tier) or for a low monthly fee (team tier). Even if LangChain had no other benefits, LangSmith justifies using the framework.
Raw OpenAI API: when it makes sense
Despite our decision, the raw API is absolutely the right choice for certain projects. Let's be honest about when framework overhead isn't justified.
Use cases where raw API shines
While frameworks simplify complex RAG pipelines, there are plenty of scenarios where working directly with the raw OpenAI API is faster, cleaner, and more practical.
Simple question-answering
If you're building straightforward Q&A without document retrieval:
from openai import OpenAI

client = OpenAI()

def simple_qa(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content
No framework needed. Clear, simple, maintainable.
Maximum control requirements
Research projects exploring novel RAG architectures benefit from complete control. Custom embedding strategies, specialized retrieval algorithms, experimental approaches—all easier without framework abstractions.
Minimal dependency constraints
Embedded systems, AWS Lambda functions with size limits, or organizations with strict dependency policies may prohibit frameworks. Raw API gives you exactly what you need, nothing more.
Learning and education
Understanding RAG from first principles means building without frameworks. Educational implementations benefit from seeing every step explicitly.
Read also: How to Speed Up AI Chatbot Responses with Intelligent Caching →
What are the main advantages of a raw OpenAI implementation?
Using the raw OpenAI API comes with several clear benefits that make it a strong choice for certain teams and projects.
- Complete transparency: every line of code is yours. No framework magic. No hidden behavior. Debugging means reading code you wrote.
- Zero framework overhead: no abstraction layers, no framework processing. Your code runs exactly what you specified.
- No learning curve: team members understand Python and OpenAI API. No framework-specific patterns to learn.
What are the disadvantages of using the raw OpenAI API without a framework?
While using the raw OpenAI API provides full control, it also comes with a number of challenges and limitations that teams need to plan for.
- Reinventing patterns:
- Document chunking strategies (respect semantic boundaries).
- Retrieval optimization (MMR, hybrid search).
- Streaming responses (async handling).
- Error handling and retries (see the sketch after this list).
- Prompt template management.
- Conversation memory.
- No built-in observability: you'll build logging, tracking, monitoring from scratch. Estimate 2-4 weeks for production-grade observability.
- Integration work: every vector database needs custom integration code. Switching providers means rewriting the database layer.
- Maintenance burden: all code is your code to maintain. OpenAI API changes require immediate updates. No community contributing fixes.
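To make the "error handling and retries" point concrete, here is a minimal sketch (assuming the OpenAI Python SDK v1 client used earlier) of the kind of wrapper you end up writing and maintaining yourself:
import time
from openai import OpenAI, APIError, RateLimitError

client = OpenAI()

def chat_with_retries(messages, model="gpt-4o", max_retries=3):
    # Simple exponential backoff on transient failures
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except (RateLimitError, APIError):
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)
Multiply this by streaming, conversation memory, prompt management, and observability, and the maintenance burden adds up quickly.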
What did we learn after six months running LangChain and LangGraph in production?
We've run LangChain + LangGraph in production for 6+ months serving real users. Time for brutal honesty.
Would we choose LangChain again?
YES, despite community criticism and its complexity.
Why would we choose LangChain again for our RAG stack?
After several months in production, certain aspects of LangChain proved consistently valuable enough that we’d confidently choose it again for our RAG stack.
1. LangSmith observability: saved us dozens of debugging hours. Cost optimization insights. Production visibility we'd never build ourselves. Worth the framework overhead alone.
2. Faster time to production: working RAG in days, not weeks. Elasticsearch integration immediate. MMR retrieval built-in. Streaming handled automatically. These accelerations compound.
3. Team productivity: new developers onboard faster with established patterns. Community solutions exist for common problems. Framework updates bring improvements we'd never implement.
4. Maintenance reality: we feared we'd fight framework limitations. In six months that happened twice, and both issues were resolved quickly. The framework has enabled us, not constrained us.
5. Cost optimization: LangSmith revealed expensive operations. Framework flexibility enabled optimizations (caching, routing) that reduced costs significantly.
Community criticism of LangChain: what we agree with
"LangChain is complex": TRUE
The learning curve is real. Documentation is sometimes confusing. Abstractions take time to understand. Framework is objectively complex.
"Overkill for simple cases": ABSOLUTELY
For basic Q&A or simple prompts, LangChain is unnecessary overhead. Raw API is objectively better for simple use cases.
Community criticism of LangChain: what we disagree with
"Just use OpenAI API": not for complex RAG
This advice applies to simple prompting. For document grading, conditional routing, state management, and observability, the framework provides real value.
"Framework will hold you back": hasn't in our experience
Six months later: every feature we needed was achievable. Framework has been flexible.
"Performance overhead significant": negligible in practice
Our metrics show the framework adds single-digit milliseconds. For RAG, where LLM calls dominate latency, framework overhead is irrelevant.
"Not production-ready": we're in production successfully
Our client is satisfied, the system is reliable, and it handles real user traffic. The framework has been production-grade for our needs.
LangChain, LangGraph, or Raw OpenAI API for your RAG stack – conclusion
After 6 months in production with LangChain, LangGraph, and LangSmith, our verdict is clear: for complex RAG systems, the framework overhead is justified by productivity gains and observability benefits. The community criticism is valid: LangChain is complex, its abstractions sometimes obscure important details, and the learning curve is real. But production realities favor proven patterns and built-in tooling over maximum simplicity.
Our key insight: LangSmith alone justifies using LangChain. Production-grade observability that would take person-months to build is included free. When debugging "why did this query fail?" takes 2 minutes instead of 2 hours, framework overhead becomes irrelevant. Add Elasticsearch integration, MMR retrieval, document chunking strategies, and streaming support, and the productivity gains compound.
The decision isn't ideological; it's pragmatic. Evaluate your complexity, team preferences, timeline pressure, and observability needs. Choose the approach that accelerates your specific project. Both paths lead to working systems.
Want to make smarter technology choices for your RAG stack?
This blog post is based on our production implementation using LangChain, LangGraph, and LangSmith for over six months of serving real users. You can read the AI document chatbot case study, which includes details on technology stack, document grading, response caching, real-time synchronization, and comprehensive observability.
Interested in building or optimizing production-grade RAG systems with the right tools for your use case? Our team specializes in creating AI solutions that combine flexibility, performance, and cost efficiency. Check out our AI development services to learn how we can help you choose and implement the best stack for your project.