CHAPTER 55 / 59

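Streaming and Real-Time Responses

Key Concepts

The LangChain streaming API, SSE, and token-by-token output, applied to a real-time FastAPI chatbot.

Main Text

The Value of Streaming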

Standard response:
User → request → ... 8-second wait ... → full response received
User perception: "slow"

Streaming:
User → request → token 1 (0.1 s) → token 2 (0.2 s) → ...
User perception: "fast, like ChatGPT"
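
The comparison above is really about two different numbers: the time until the first token appears and the time until the full response is done. The sketch below measures both on a simulated stream. `fake_token_stream` and `measure_latency` are hypothetical helpers written for this illustration, not part of LangChain, and the delay is an arbitrary assumption.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_token_stream(tokens: Iterable[str], delay: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a model stream: one token every `delay` seconds."""
    for token in tokens:
        time.sleep(delay)
        yield token


def measure_latency(stream: Iterator[str]) -> Tuple[float, float, str]:
    """Return (time to first token, total time, full text) for a token stream."""
    start = time.perf_counter()
    first_token_at = None
    parts = []
    for token in stream:
        if first_token_at is None:
            # This is the moment a streaming UI can start showing output
            first_token_at = time.perf_counter() - start
        parts.append(token)
    total = time.perf_counter() - start
    if first_token_at is None:  # empty stream
        first_token_at = total
    return first_token_at, total, "".join(parts)


ttft, total, text = measure_latency(
    fake_token_stream(["Str", "eam", "ing", " ", "helps"])
)
print(f"first token after {ttft:.3f}s, full response after {total:.3f}s: {text!r}")
```

Streaming does not shorten the total time; it only moves the first visible output earlier, which is what the user perceives as speed.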

LangChain Streaming Basics

PYTHON

from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(
    model="claude-sonnet-4-6",
    streaming=True,
)

# stream() returns an iterator that yields the response chunk by chunk
# (roughly token by token)
for chunk in llm.stream("Compare five kinds of Python loops"):
    print(chunk.content, end="", flush=True)
# Each chunk is printed as soon as it arrives (a ChatGPT-like experience)

astream(): Asynchronous

PYTHON

import asyncio

# Reuses the llm object created in the previous example
async def stream_response(query):
    async for chunk in llm.astream(query):
        print(chunk.content, end="", flush=True)


asyncio.run(stream_response("How do I get started with LangGraph?"))
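
One reason to prefer the asynchronous form on a server is that several streams can make progress at the same time on one event loop. The sketch below illustrates this with a simulated stream rather than a real model call; `fake_astream`, `consume`, and `run_two_streams` are hypothetical names introduced only for this example.

```python
import asyncio
from typing import AsyncIterator, List, Tuple


async def fake_astream(tokens: List[str]) -> AsyncIterator[str]:
    """Hypothetical stand-in for llm.astream(): gives up control before each token."""
    for token in tokens:
        await asyncio.sleep(0)  # a real stream would be waiting on the network here
        yield token


async def consume(name: str, tokens: List[str], log: List[Tuple[str, str]]) -> None:
    """Read one stream to the end, recording which stream produced each token."""
    async for token in fake_astream(tokens):
        log.append((name, token))


async def run_two_streams() -> List[Tuple[str, str]]:
    log: List[Tuple[str, str]] = []
    # Both consumers run concurrently on the same event loop
    await asyncio.gather(
        consume("A", ["a1", "a2", "a3"], log),
        consume("B", ["b1", "b2", "b3"], log),
    )
    return log


log = asyncio.run(run_two_streams())
print(log)
```

The log shows tokens from both streams interleaved, while each stream's own order is preserved. With the synchronous stream(), the second request would have to wait for the first one to finish.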

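FastAPI + SSE Streaming

PYTHON

# pip install fastapi uvicorn langchain-anthropic
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain_anthropic import ChatAnthropic

app = FastAPI()
llm = ChatAnthropic(model="claude-sonnet-4-6", streaming=True)


def sse_event(text: str) -> str:
    # SSE format: every line of the payload gets its own "data: " prefix,
    # and a blank line ends the event. Without this, a newline inside a
    # chunk would break the framing.
    return "".join(f"data: {line}\n" for line in text.split("\n")) + "\n"


@app.get("/chat")
async def chat(q: str):
    async def generate():
        async for chunk in llm.astream(q):
            # For text-only responses, chunk.content is a plain string
            if chunk.content:  # skip empty chunks
                yield sse_event(chunk.content)
        yield sse_event("[DONE]")

    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
    )


# Run (assuming this file is saved as app.py): uvicorn app:app --reload
# Client (browser):
# const evt = new EventSource("/chat?q=Hello");
# evt.onmessage = (e) => console.log(e.data);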
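
To make the SSE wire format concrete, here is a minimal sketch of an encoder and a simplified decoder for data-only events. `format_sse` and `parse_sse` are hypothetical helpers for illustration; the decoder only approximates what a browser's EventSource does and ignores fields such as `event:`, `id:`, and `retry:`.

```python
from typing import List


def format_sse(data: str) -> str:
    """Encode one event: a "data: " prefix on every payload line, then a blank line."""
    return "".join(f"data: {line}\n" for line in data.split("\n")) + "\n"


def parse_sse(raw: str) -> List[str]:
    """Decode a stream of data-only events (simplified EventSource behavior)."""
    events: List[str] = []
    data_lines: List[str] = []
    for line in raw.split("\n"):
        if line == "":
            # A blank line dispatches the event collected so far
            if data_lines:
                events.append("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Exactly one leading space after the colon is dropped
            data_lines.append(value[1:] if value.startswith(" ") else value)
    return events


wire = format_sse("Hello") + format_sse("line 1\nline 2") + format_sse("[DONE]")
print(repr(wire))
print(parse_sse(wire))
```

The multi-line case matters for chat output: model responses often contain newlines, and a payload written as a single `data:` line would be cut off at the first one.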

React Client

TYPESCRIPT

// Client-side streaming handler
'use client';
import { useState } from 'react';

export function ChatStream() {
  const [response, setResponse] = useState('');
  const [loading, setLoading] = useState(false);

  const send = async (q: string) => {
    setLoading(true);
    setResponse('');

    const evt = new EventSource(`/chat?q=${encodeURIComponent(q)}`);
    evt.onmessage = (e) => {
      // The server sends [DONE] as its final event
      if (e.data === '[DONE]') {
        evt.close();
        setLoading(false);
        return;
      }
      setResponse(prev => prev + e.data);
    };
    evt.onerror = () => {
      // Close explicitly; otherwise EventSource reconnects automatically
      // and sends the same request again
      evt.close();
      setLoading(false);
    };
  };

  return (
    <div>
      <button onClick={() => send('Hello')} disabled={loading}>
        {loading ? 'Responding...' : 'Start'}
      </button>
      <div>{response}</div>
    </div>
  );
}

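RAG Chain Streaming

PYTHON

import asyncio

from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("""
Context: {context}
Question: {question}
Answer:""")

# retriever and llm are assumed to be defined already
# (for example, a vector store retriever and the ChatAnthropic model above)
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)


# The chain itself also supports stream() and astream()
async def main():
    async for chunk in chain.astream("How do I get started with LangChain?"):
        print(chunk, end="", flush=True)


asyncio.run(main())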

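Tracking Token Usage While Streaming

PYTHON

import asyncio

from langchain_core.callbacks import AsyncCallbackHandler


class TokenCounter(AsyncCallbackHandler):
    def __init__(self):
        self.tokens = 0

    async def on_llm_new_token(self, token: str, **kwargs):
        # Called once per streamed chunk. A chunk is usually, but not
        # always, a single token, so treat this as an approximation.
        self.tokens += 1


counter = TokenCounter()


async def main():
    # Callbacks are passed through the config argument
    async for chunk in llm.astream(
        "What is LangChain?",
        config={"callbacks": [counter]},
    ):
        print(chunk.content, end="")

    print(f"\nApproximate token count: {counter.tokens}")


asyncio.run(main())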
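
The same bookkeeping can also be done without callbacks by wrapping the stream in another async generator. The sketch below is one possible shape, using a simulated stream; `StreamStats`, `fake_astream`, and `counted` are hypothetical names. Note that it counts chunks and characters, which is only a rough proxy: exact token usage for billing has to come from the provider's reported usage.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class StreamStats:
    chunks: int = 0      # number of non-empty chunks seen
    characters: int = 0  # total characters streamed


async def fake_astream(chunks: List[str]) -> AsyncIterator[str]:
    """Hypothetical stand-in for the text chunks coming out of llm.astream()."""
    for chunk in chunks:
        await asyncio.sleep(0)
        yield chunk


async def counted(stream: AsyncIterator[str], stats: StreamStats) -> AsyncIterator[str]:
    """Pass chunks through unchanged while updating `stats` as a side effect."""
    async for chunk in stream:
        if chunk:
            stats.chunks += 1
            stats.characters += len(chunk)
        yield chunk


async def main() -> StreamStats:
    stats = StreamStats()
    async for chunk in counted(fake_astream(["Lang", "", "Chain", " rocks"]), stats):
        print(chunk, end="", flush=True)
    print()
    return stats


stats = asyncio.run(main())
print(stats)
```

Because the wrapper is itself an async generator, it can sit between the model stream and an SSE response without changing either side.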

Error Handling

PYTHON

async def safe_stream(query):
    try:
        async for chunk in llm.astream(query):
            yield chunk.content
    except Exception as e:
        # Report the failure in the stream so the user sees why output stopped
        yield f"\n\n[Error: {e}]"


# Terminating the SSE stream in FastAPI
async def generate_safe(q):
    try:
        async for chunk in llm.astream(q):
            yield f"data: {chunk.content}\n\n"
        yield "data: [DONE]\n\n"
    except Exception as e:
        yield f"data: [ERROR: {e}]\n\n"
    finally:
        # Cleanup (release resources, write logs, and so on)
        pass
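
An exception message can itself contain newlines, which would break SSE framing if it were written into a `data:` line as-is. One way around this, sketched below under assumptions, is to send each event as a JSON payload. This is a variation on the chapter's plain-text format, not the same protocol: the client would need to `JSON.parse` each message. `flaky_stream` and `sse_with_errors` are hypothetical names, and the `type`/`text`/`message` keys are an arbitrary choice for this example.

```python
import asyncio
import json
from typing import AsyncIterator, List


async def flaky_stream() -> AsyncIterator[str]:
    """Hypothetical stream that fails partway through, standing in for llm.astream()."""
    yield "partial "
    yield "answer"
    raise RuntimeError("upstream connection\nreset")


async def sse_with_errors(stream: AsyncIterator[str]) -> AsyncIterator[str]:
    """Wrap text chunks as SSE events; report a failure as JSON, then end with [DONE]."""
    try:
        async for chunk in stream:
            payload = json.dumps({"type": "token", "text": chunk})
            yield f"data: {payload}\n\n"
    except Exception as exc:
        # json.dumps escapes newlines, so the message cannot break SSE framing
        payload = json.dumps({"type": "error", "message": str(exc)})
        yield f"data: {payload}\n\n"
    # Sent after both success and failure so the client always sees an end marker
    yield "data: [DONE]\n\n"


async def collect() -> List[str]:
    return [event async for event in sse_with_errors(flaky_stream())]


events = asyncio.run(collect())
for event in events:
    print(repr(event))
```

The end marker is yielded after the try/except rather than inside a `finally` block, because yielding from `finally` in an async generator can fail when the consumer closes the stream early.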

Next Chapter

CH.7 "Monitoring: LangSmith vs Langfuse": the eyes of a production LLM.

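AI Prompts

🤖 How to Ask AI Well: Prompts by Model and Strategy

Free Tier

Gemini 2.5 Flash (free) + Claude Sonnet 4.6 (free) + Grok 4.1 (free)

Explain how to build a real-time
streaming chatbot with a free LLM (Gemini Flash) + FastAPI
at zero cost.

Low-Budget Tier

Claude API + Cursor $20/mo + Make.com: 100,000 to 300,000 KRW per month

Show me a pattern for quickly building
a production streaming chatbot
with Claude API + FastAPI + Next.js.

Production Tier

Claude Opus + CrewAI + LangGraph: 1,000,000+ KRW per month

Design an enterprise streaming system
architecture with Claude Opus + LangServe
+ a Redis cache.

Stack Prompt

Stack comparison by stage: 0 KRW → $20/mo → $100/mo

Compare cost, complexity, and UX across three stages:
plain stream() → FastAPI SSE → LangServe.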

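⭐ Remember Just This

For streaming and real-time responses, make sure you have these three points down

1. Streaming outputs tokens as they are generated, so the response starts almost immediately; perceived speed improves by roughly 5 to 10 times

2. LangChain's stream()/astream() + FastAPI SSE = the standard real-time chatbot pattern

3. The next chapter, CH.7, covers monitoring: tracing LLM calls and evaluating quality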
