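Streaming Responses: Real-Time AI Output

Key Concepts

SSE, useChat, token-by-token rendering, and UX patterns: start showing the response immediately.

Main Content

Streaming = Immediate Response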


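[Non-streaming]
- The user asks a question
- 5-30 seconds of waiting while the entire response is generated
- The full answer appears all at once

[Streaming]
- The user asks a question
- The first token arrives after about 0.5 seconds
- Tokens are rendered incrementally as they arrive
- Output continues for the same 5-30 seconds in total


→ Perceived speed improves roughly 10x: the wait before anything appears drops from 5-30 seconds to about 0.5 seconds
→ The answer is visibly "already in progress", so users are more willing to wait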
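The comparison above comes down to which token the user has to wait for: the first one (streaming) or the last one (non-streaming). A minimal sketch of that arithmetic follows. The helper name `compareWait` and the sample arrival times are made up for illustration and are not measurements.

```typescript
// Hypothetical helper: compares how long a user looks at an empty screen
// with and without streaming, given when each token arrived.
interface WaitComparison {
  streamingWaitMs: number;    // time until the first token is visible
  nonStreamingWaitMs: number; // time until the whole response is visible
  speedup: number;            // nonStreamingWaitMs / streamingWaitMs
}

function compareWait(requestSentAtMs: number, tokenArrivalsMs: number[]): WaitComparison {
  if (tokenArrivalsMs.length === 0) {
    throw new Error('at least one token arrival time is required');
  }
  const first = Math.min(...tokenArrivalsMs);
  const last = Math.max(...tokenArrivalsMs);
  const streamingWaitMs = first - requestSentAtMs;
  const nonStreamingWaitMs = last - requestSentAtMs;
  return { streamingWaitMs, nonStreamingWaitMs, speedup: nonStreamingWaitMs / streamingWaitMs };
}

// Illustrative numbers only: first token at 0.5 s, last token at 5 s.
const wait = compareWait(0, [500, 1200, 3000, 5000]);
console.log(wait); // { streamingWaitMs: 500, nonStreamingWaitMs: 5000, speedup: 10 }
```

Total generation time is identical in both cases. Only the wait before the first visible output changes, which is why the gain is in perceived speed rather than actual speed.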

Streaming with the Vercel AI SDK


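// Server (route handler for /api/chat, the default useChat endpoint)
import { streamText } from 'ai';

// `model` is assumed to be a provider model instance created elsewhere,
// for example openai('gpt-4o') from '@ai-sdk/openai'.
export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = streamText({ model, messages });
  return result.toDataStreamResponse();
}


// Client: useChat handles the stream automatically
// (newer SDK releases export this hook from '@ai-sdk/react')
import { useChat } from 'ai/react';

const { messages, isLoading } = useChat();
// messages is updated automatically as each token arrives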
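useChat hides the wire format, so application code does not need any of what follows. For readers curious about what consuming a Server-Sent Events (SSE) stream by hand involves, here is a generic sketch. It is not the AI SDK's own parser or protocol. `SseParser` is a hypothetical name, and only `data:` fields are handled.

```typescript
// Minimal incremental SSE parser (sketch). Handles `data:` fields only;
// `event:`, `id:`, `retry:` and comment lines are ignored for brevity.
class SseParser {
  private buffer = '';

  // Feed a decoded text chunk; returns the data payload of every event
  // that this chunk completed. Incomplete events stay buffered.
  push(chunk: string): string[] {
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, '\n');
    const events: string[] = [];
    let boundary = this.buffer.indexOf('\n\n');
    while (boundary !== -1) {
      const rawEvent = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);
      const dataLines = rawEvent
        .split('\n')
        .filter(line => line.startsWith('data:'))
        .map(line => line.slice(5).replace(/^ /, ''));
      if (dataLines.length > 0) events.push(dataLines.join('\n'));
      boundary = this.buffer.indexOf('\n\n');
    }
    return events;
  }
}

// Network chunks rarely line up with event boundaries.
const parser = new SseParser();
console.log(parser.push('data: Hel'));       // []
console.log(parser.push('lo\n\ndata: wor')); // [ 'Hello' ]
console.log(parser.push('ld\n\n'));          // [ 'world' ]
```

The buffering is the main point: an event may arrive split across several chunks, so a parser has to hold partial input until it sees the blank line that ends the event.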

Custom Streaming


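// Handling the stream manually
// Assumes /api/chat returns a plain-text stream (for example via
// result.toTextStreamResponse()) rather than the SDK's data stream protocol.
'use client';
import { useState } from 'react';

// Placeholder caret; swap in your own component.
const Cursor = () => <span className="animate-pulse">▍</span>;

export function CustomChat() {
  const [response, setResponse] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const send = async (message: string) => {
    setIsStreaming(true);
    setResponse('');

    try {
      const res = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message }),
      });

      if (!res.ok) throw new Error(`Request failed: ${res.status}`);
      if (!res.body) throw new Error('No body');

      const reader = res.body.getReader();
      const decoder = new TextDecoder();

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // stream: true keeps multi-byte characters intact across chunk boundaries
        const text = decoder.decode(value, { stream: true });
        setResponse(prev => prev + text);
      }
    } finally {
      // Reset the flag even if the request or the read loop throws
      setIsStreaming(false);
    }
  };

  return (
    <>
      <div>{response}</div>
      {isStreaming && <Cursor />}
    </>
  );
}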
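One detail in a manual read loop is easy to miss: network chunks can split a multi-byte UTF-8 character in half, which matters for Korean, emoji, and most non-ASCII text. The sketch below compares decoding with and without the `stream: true` option of `TextDecoder.decode`. `decodeChunks` is a hypothetical helper written for this illustration.

```typescript
// Sketch: decode a sequence of byte chunks the way a stream reader loop would.
function decodeChunks(chunks: Uint8Array[], streaming: boolean): string {
  const decoder = new TextDecoder('utf-8');
  let text = '';
  for (const chunk of chunks) {
    text += streaming ? decoder.decode(chunk, { stream: true }) : decoder.decode(chunk);
  }
  // Flush any bytes still buffered inside the decoder.
  if (streaming) text += decoder.decode();
  return text;
}

// "안녕" is 6 bytes in UTF-8; cut it in the middle of the first character.
const bytes = new TextEncoder().encode('안녕');
const split = [bytes.slice(0, 2), bytes.slice(2)];

console.log(decodeChunks(split, true));                     // 안녕
console.log(decodeChunks(split, false).includes('\uFFFD')); // true (replacement characters appear)
```

With `stream: true` the decoder holds the incomplete bytes until the next chunk arrives. Without it, each call is treated as a complete input and the dangling bytes are turned into U+FFFD replacement characters.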

Typing Indicator


function TypingIndicator() {
  return (
    <div className="flex space-x-1">
      <span className="w-2 h-2 bg-muted-foreground rounded-full animate-bounce" style={{ animationDelay: '0ms' }} />
      <span className="w-2 h-2 bg-muted-foreground rounded-full animate-bounce" style={{ animationDelay: '150ms' }} />
      <span className="w-2 h-2 bg-muted-foreground rounded-full animate-bounce" style={{ animationDelay: '300ms' }} />
    </div>
  );
}


// Usage: show the indicator only while waiting for the first token
{isLoading && response === '' && <TypingIndicator />}
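The condition above combines two values: whether a request is in flight, and whether any text has arrived yet. One way to make those combinations explicit is to derive a single phase value. This is a small sketch, and the `StreamPhase` names are hypothetical.

```typescript
type StreamPhase = 'idle' | 'waiting' | 'streaming' | 'done';

// Derives the UI phase from the loading flag and the text received so far.
function getStreamPhase(isLoading: boolean, content: string): StreamPhase {
  if (isLoading) return content === '' ? 'waiting' : 'streaming';
  return content === '' ? 'idle' : 'done';
}

console.log(getStreamPhase(true, ''));     // waiting   -> show the typing indicator
console.log(getStreamPhase(true, 'Hel'));  // streaming -> show text plus a cursor
console.log(getStreamPhase(false, 'Hi!')); // done
```

Under this scheme the typing indicator corresponds to 'waiting', and a blinking cursor could be tied to 'streaming'.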

Rendering Streamed Markdown


import ReactMarkdown from 'react-markdown';
import remarkGfm from 'remark-gfm';

function StreamedMessage({ content }: { content: string }) {
  return (
    <ReactMarkdown
      remarkPlugins={[remarkGfm]}
      components={{
        code: ({ children, ...props }) => (
          <code className="bg-muted px-1 rounded" {...props}>{children}</code>
        ),
      }}
    >
      {content}
    </ReactMarkdown>
  );
}


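// Partial code blocks while streaming
// "```typescript\nconst x"  ← unfinished fence: CommonMark treats an unclosed fence as running to the end of the input, so react-markdown still renders it as a code block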
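If some other step in the pipeline does need balanced fences (a syntax highlighter, or a custom renderer that is stricter than CommonMark), the partial text could be patched before rendering. The following is a simplified sketch under that assumption. `hasOpenFence` and `closeOpenFence` are hypothetical helpers. They treat every line that starts with three backticks as a toggle, and they ignore tilde fences and longer fences.

```typescript
// Three backticks, built indirectly so this example stays easy to embed.
const FENCE = '`'.repeat(3);

// Returns true when the text ends inside an unclosed fenced code block.
function hasOpenFence(markdown: string): boolean {
  let open = false;
  for (const line of markdown.split('\n')) {
    if (line.trim().startsWith(FENCE)) open = !open;
  }
  return open;
}

// Appends a closing fence only when one is missing.
function closeOpenFence(markdown: string): string {
  return hasOpenFence(markdown) ? `${markdown}\n${FENCE}` : markdown;
}

const partial = `${FENCE}typescript\nconst x`;
console.log(hasOpenFence(partial));                   // true
console.log(hasOpenFence(closeOpenFence(partial)));   // false
```

The patched string is meant for rendering only. The accumulated message itself should stay unmodified, so that the real closing fence is not duplicated when it arrives.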

Stopping a Stream


'use client';
import { useChat } from 'ai/react';

const { messages, stop, isLoading } = useChat();

// Let the user stop generation
{isLoading && <button onClick={stop}>Stop</button>}


// Handle it on the server too (AbortSignal)
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = streamText({
    model,
    messages,
    abortSignal: req.signal,  // client abort → server stops generating as well
  });
  return result.toDataStreamResponse();
}
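The same abort mechanism can be exercised without the SDK. The sketch below uses a fake token source and the standard `AbortController` to show how a consumer keeps the text received so far and stops reading once the signal fires. `fakeTokens` and `collectUntilAborted` are hypothetical names and stand in for a real model stream.

```typescript
// Hypothetical token source standing in for a model stream.
async function* fakeTokens(tokens: string[]): AsyncGenerator<string> {
  for (const token of tokens) {
    yield token;
  }
}

// Consumes tokens until the source ends or the signal is aborted.
async function collectUntilAborted(
  source: AsyncIterable<string>,
  signal: AbortSignal,
  onToken: (token: string) => void = () => {},
): Promise<{ text: string; aborted: boolean }> {
  let text = '';
  for await (const token of source) {
    // Returning here also closes the generator, releasing the source.
    if (signal.aborted) return { text, aborted: true };
    text += token;
    onToken(token);
  }
  return { text, aborted: signal.aborted };
}

// Simulate the user pressing "Stop" right after the second token is shown.
const controller = new AbortController();
collectUntilAborted(fakeTokens(['a', 'b', 'c', 'd']), controller.signal, token => {
  if (token === 'b') controller.abort();
}).then(result => console.log(result)); // { text: 'ab', aborted: true }
```

The partial text is kept rather than discarded, which matches what users usually expect from a stop button: the answer so far remains on screen.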

Streaming Tool Calls


// Surface tool calls to the user while they run
const result = streamText({
  model,
  messages,
  tools: { searchDb, sendEmail },
  onStepFinish({ stepType, toolCalls }) {
    // Called when each step finishes
    console.log(`Step: ${stepType}`, toolCalls);
  },
});


// Client: render toolInvocations
{message.toolInvocations?.map(invocation => (
  <div key={invocation.toolCallId} className="bg-muted p-2 rounded">
    🔧 {invocation.toolName}({JSON.stringify(invocation.args)})
  </div>
))}
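A tool invocation usually passes through several states while it streams, so the label shown to the user can change as it progresses. The sketch below assumes three states ('partial-call', 'call', 'result') and defines a local `ToolInvocationLike` type instead of importing the SDK's own. Both the type and `describeInvocation` are hypothetical.

```typescript
// Local stand-in for a tool invocation (assumption: three states).
type ToolState = 'partial-call' | 'call' | 'result';

interface ToolInvocationLike {
  toolCallId: string;
  toolName: string;
  state: ToolState;
  args?: unknown;
  result?: unknown;
}

// Maps an invocation to a short, user-facing status line.
function describeInvocation(inv: ToolInvocationLike): string {
  switch (inv.state) {
    case 'partial-call':
      return `Preparing ${inv.toolName}...`;
    case 'call':
      return `Running ${inv.toolName}(${JSON.stringify(inv.args ?? {})})`;
    case 'result':
      return `${inv.toolName} finished: ${JSON.stringify(inv.result)}`;
  }
}

console.log(
  describeInvocation({ toolCallId: '1', toolName: 'searchDb', state: 'call', args: { query: 'refund' } }),
); // Running searchDb({"query":"refund"})
```

Showing a status line per state keeps the user informed during the gap between the model deciding to call a tool and the tool returning.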

Next Chapter

CH.32 "Cost Management: Token Usage Tracking + Per-User Limits".

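AI Prompts

🤖 How to ask AI well: prompts by model and strategy

Free

$0/month: validation and getting started

Tell me how to get started with
streaming responses using only free tools.

Low budget

$20-50/month: MVP and early operations

With a budget of $20-50 per month, what is the strategy
for taking streaming responses through validation to the MVP stage?

Production

$200-500/month: growth stage

What tools and operational setup are needed
to scale streaming responses to production?

Stack

Full stack: tool combination analysis

Recommend a stack that combines five tools
for streaming responses as of 2026.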

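⭐ Key Takeaways

For streaming responses and real-time AI output, make sure you have these three points down

1. Streaming = first token in about 0.5 seconds + incremental rendering = feels roughly 10x faster

2. useChat handles the stream automatically, so there is no need to work with SSE directly

3. A stop function gives users control over generation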
