February 6, 2026 · Architecture

Chapter 2: RESTful API Design Principles and Their Application to AI Services

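From the Richardson Maturity Model through resource design, HTTP methods, the OpenAPI 3.1 specification, and REST endpoint design for AI services, this chapter works through the core principles of RESTful APIs hands-on.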

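Learning Objectives

Understand the four levels of the Richardson Maturity Model and apply them in practice
Learn resource-oriented URL design and the correct use of HTTP methods
Become familiar with standard patterns for pagination, filtering, and sorting
Learn how to document an API with the OpenAPI 3.1 specification
Design concrete endpoints that apply REST to an AI service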

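Richardson Maturity Model

The REST maturity model proposed by Leonard Richardson rates how RESTful an API is on four levels, Level 0 through Level 3. Most production APIs sit at Level 2, and the HATEOAS of Level 3 is applied selectively in practice.

Level 0: Single Endpoint

Every request goes to a single URL, and the request body names the operation. SOAP is the typical Level 0 example.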

level-0-example.sh

bash

# Every request goes to the same endpoint
POST /api
{"action": "getUser", "userId": "42"}
 
POST /api
{"action": "createUser", "name": "Kreath"}

Level 1: Separate Resources

Each resource gets its own endpoint, but the API still does not distinguish between HTTP methods.

level-1-example.sh

bash

POST /api/users/42
{"action": "get"}
 
POST /api/users
{"action": "create", "name": "Kreath"}

Level 2: HTTP Methods

The API uses the semantics of the HTTP methods (GET, POST, PUT, DELETE, and so on) correctly. Most modern REST APIs are at this level.

level-2-example.sh

bash

GET    /api/users/42          # read
POST   /api/users             # create
PUT    /api/users/42          # full replacement
PATCH  /api/users/42          # partial update
DELETE /api/users/42          # delete

Level 3: HATEOAS

With HATEOAS (Hypermedia as the Engine of Application State), responses include links to related resources so that clients can navigate the API.

level-3-response.json

json

{
  "id": "42",
  "name": "Kreath",
  "email": "kreath@example.com",
  "_links": {
    "self": { "href": "/api/users/42" },
    "posts": { "href": "/api/users/42/posts" },
    "update": { "href": "/api/users/42", "method": "PUT" },
    "delete": { "href": "/api/users/42", "method": "DELETE" }
  }
}
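
To make the link-driven navigation concrete, here is a minimal Python sketch that is not part of the original article. The helper names `user_representation` and `follow` are hypothetical, and treating a link without a `method` field as a GET is an assumption of this sketch rather than a rule of any hypermedia format.

```python
def user_representation(user: dict, base: str = "/api/users") -> dict:
    """Return the user fields plus a `_links` section (hypothetical helper)."""
    href = f"{base}/{user['id']}"
    return {
        **user,
        "_links": {
            "self": {"href": href},
            "posts": {"href": f"{href}/posts"},
            "update": {"href": href, "method": "PUT"},
            "delete": {"href": href, "method": "DELETE"},
        },
    }


def follow(representation: dict, rel: str) -> tuple[str, str]:
    """Resolve a link relation to (method, href).

    Assumption of this sketch: a link without "method" is fetched with GET.
    """
    link = representation["_links"][rel]
    return link.get("method", "GET"), link["href"]
```

A client written this way looks up `follow(doc, "posts")` instead of building `/api/users/42/posts` itself, so the server can change URL layouts without breaking it.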

Tip

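In practice, it is realistic to make Level 2 the baseline target and to apply Level 3 HATEOAS selectively, in public APIs where discoverability matters. A consistent Level 2 design does more for developer productivity than a perfect HATEOAS implementation.

Resource Design Principles

The core of a REST API is resource-oriented design. A URL identifies a resource, and the HTTP method expresses the operation on it.

URL Design Rules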

url-design-rules.sh

bash

# Use plural nouns
GET /api/v1/users          # good
GET /api/v1/user           # avoid

# Express hierarchical relationships
GET /api/v1/users/42/posts
GET /api/v1/users/42/posts/101/comments

# Express actions with the HTTP method, not the URL
POST /api/v1/users         # good (create)
POST /api/v1/createUser    # avoid (verb in the URL)

# Use lowercase and hyphens
GET /api/v1/ai-models      # good
GET /api/v1/aiModels       # avoid
GET /api/v1/AI_Models      # avoid
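
Naming rules like these are easy to check mechanically. The following is a rough sketch of such a check; `lint_path` and its short verb list are illustrative assumptions, not an established tool. It is meant for collection names in route templates and would also flag identifiers that contain underscores.

```python
import re

# Lowercase letters and digits separated by single hyphens, e.g. "ai-models" or "v1".
_SEGMENT = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
# A short, illustrative verb list; a real project would extend it.
_VERB_PREFIX = re.compile(r"^(?:create|get|update|delete)(?=[A-Z_-])")


def lint_path(path: str) -> list[str]:
    """Return the naming problems found in a URL path (empty when clean)."""
    problems = []
    for segment in path.strip("/").split("/"):
        if not _SEGMENT.match(segment):
            problems.append(f"'{segment}': use lowercase letters, digits, and hyphens")
        if _VERB_PREFIX.match(segment):
            problems.append(f"'{segment}': express the action with the HTTP method")
    return problems
```

The plural-noun rule is left out on purpose, since it cannot be checked reliably with a pattern.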

HTTP Methods and Idempotency

Method	Purpose	Idempotent	Safe	Request body
GET	Read	Yes	Yes	None
POST	Create	No	No	Yes
PUT	Full replacement	Yes	No	Yes
PATCH	Partial update	No	No	Yes
DELETE	Delete	Yes	No	Optional

Info

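Idempotency means that sending the same request several times leaves the server in the same state as sending it once. GET, PUT, and DELETE are idempotent, so they can be retried safely after a network error. POST is not idempotent, so using an Idempotency-Key header is recommended. PATCH is not guaranteed to be idempotent either, so treat it like POST when retrying.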
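
As a rough illustration of how an Idempotency-Key makes POST safe to retry, the sketch below stores the first response under the key and replays it on a repeat. `UserService` is a hypothetical in-memory stand-in for a `POST /api/users` handler; a real service would persist the keys and expire them.

```python
import uuid
from typing import Optional


class UserService:
    """In-memory stand-in for a POST /api/users handler (hypothetical)."""

    def __init__(self) -> None:
        self.users = {}   # user id -> user record
        self._seen = {}   # Idempotency-Key -> response already returned

    def create_user(self, name: str, idempotency_key: Optional[str] = None) -> dict:
        # A repeated key means the client is retrying: replay the stored response.
        if idempotency_key is not None and idempotency_key in self._seen:
            return self._seen[idempotency_key]
        user = {"id": str(uuid.uuid4()), "name": name}
        self.users[user["id"]] = user
        if idempotency_key is not None:
            self._seen[idempotency_key] = user
        return user
```

Without a key, two identical calls create two users. With the same key, the second call returns the first result and creates nothing.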

HTTP Status Codes

Using the correct status code is essential for API clients to handle responses programmatically.

status-codes-reference.ts

typescript

// 2xx: success
// 200 OK: read or update succeeded
// 201 Created: resource created
// 202 Accepted: asynchronous job accepted
// 204 No Content: delete succeeded (no body)

// 4xx: client errors
// 400 Bad Request: malformed request
// 401 Unauthorized: authentication required
// 403 Forbidden: insufficient permissions
// 404 Not Found: resource does not exist
// 409 Conflict: conflict (duplicate creation, etc.)
// 422 Unprocessable Entity: validation failed
// 429 Too Many Requests: rate limit exceeded

// 5xx: server errors
// 500 Internal Server Error: internal server failure
// 502 Bad Gateway: upstream server error
// 503 Service Unavailable: service temporarily unavailable
// 504 Gateway Timeout: upstream timeout
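
Handling responses programmatically usually starts with sorting status codes into a few buckets. The sketch below shows one possible client-side policy. The `classify` helper and the choice of which codes count as retryable are assumptions of this example, not a standard.

```python
# Codes that usually signal a temporary condition (assumed policy for this sketch).
RETRYABLE_STATUS = {429, 502, 503, 504}


def classify(status: int) -> str:
    """Map an HTTP status code to a coarse client-side action."""
    if 200 <= status < 300:
        return "success"
    if status in RETRYABLE_STATUS:
        return "retry"          # back off, then send the request again
    if 400 <= status < 500:
        return "client_error"   # fix the request; retrying will not help
    if 500 <= status < 600:
        return "server_error"   # report it; retry only if the call is idempotent
    return "unexpected"
```

A retry is only safe when the request is idempotent or carries an idempotency key.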

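Pagination, Filtering, and Sorting

This section covers the standard patterns for delivering large amounts of data efficiently.

Cursor-Based Pagination

On large datasets, cursor-based pagination performs better than offset-based pagination. An offset query typically still has to walk past every skipped row, whereas a cursor lets the server resume directly after the last item it returned.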

cursor-pagination.ts

typescript

// Request
// GET /api/v1/models?limit=20&cursor=eyJpZCI6MTAwfQ

// Response
interface PaginatedResponse<T> {
  data: T[];
  pagination: {
    next_cursor: string | null;  // cursor for the next page, null on the last page
    has_more: boolean;           // whether another page exists
    total?: number;              // total item count (optional)
  };
}

// Example response
const response = {
  data: [
    { id: "model-1", name: "gpt-4o", provider: "openai" },
    { id: "model-2", name: "claude-4", provider: "anthropic" }
  ],
  pagination: {
    next_cursor: "eyJpZCI6MTIwfQ",
    has_more: true,
    total: 250
  }
};
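
On the server side, a cursor of this kind can be sketched as follows. The cursor `eyJpZCI6MTAwfQ` in the example request is the unpadded base64 encoding of `{"id":100}`, and the sketch assumes that format. The function names, and the requirement that rows are sorted by a positive integer `id`, are assumptions of this example.

```python
import base64
import json
from typing import Optional


def encode_cursor(last_id: int) -> str:
    """Encode the last returned id as unpadded URL-safe base64 JSON."""
    raw = json.dumps({"id": last_id}, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


def decode_cursor(cursor: str) -> int:
    """Restore the padding that encode_cursor removed, then read the id."""
    padded = cursor + "=" * (-len(cursor) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))["id"]


def list_page(rows: list[dict], limit: int, cursor: Optional[str] = None) -> dict:
    """Return one page of rows; rows must be sorted by ascending positive `id`."""
    after = decode_cursor(cursor) if cursor else 0
    # Fetch one extra row to learn whether another page exists. In SQL this is
    # roughly: WHERE id > :after ORDER BY id LIMIT :limit + 1
    window = [row for row in rows if row["id"] > after][: limit + 1]
    page, has_more = window[:limit], len(window) > limit
    return {
        "data": page,
        "pagination": {
            "next_cursor": encode_cursor(page[-1]["id"]) if has_more else None,
            "has_more": has_more,
        },
    }
```

Because the cursor names a position rather than a row count, rows inserted or deleted before that position do not shift later pages.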

Filtering and Sorting

filtering-and-sorting.sh

bash

# Filtering: use query parameters
GET /api/v1/models?provider=openai&category=chat&min_context=128000

# Sorting: sort parameter with a separate order parameter
GET /api/v1/models?sort=created_at&order=desc

# Multi-key sorting: a leading "-" marks a descending key
GET /api/v1/models?sort=provider,-created_at

# Field selection (sparse fieldsets)
GET /api/v1/models?fields=id,name,provider,pricing
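
The multi-key form `sort=provider,-created_at` needs a small parser on the server. A minimal sketch follows; the function names are hypothetical, and rejecting fields outside an allow-list is a design choice of this example.

```python
def parse_sort(sort: str, allowed: set[str]) -> list[tuple[str, bool]]:
    """Turn "provider,-created_at" into [(field, descending), ...]."""
    keys = []
    for token in sort.split(","):
        descending = token.startswith("-")
        field = token[1:] if descending else token
        if field not in allowed:
            # Reject unknown fields instead of passing them to the data layer.
            raise ValueError(f"unsupported sort field: {field!r}")
        keys.append((field, descending))
    return keys


def apply_sort(rows: list[dict], keys: list[tuple[str, bool]]) -> list[dict]:
    """Sort rows by several keys, most significant key first in `keys`."""
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last produces the combined ordering.
    for field, descending in reversed(keys):
        result.sort(key=lambda row: row[field], reverse=descending)
    return result
```

In a database-backed service, the parsed keys would normally become an ORDER BY clause instead of an in-memory sort.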

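OpenAPI 3.1 Specification

OpenAPI 3.1 is a standard specification for describing REST APIs. Its schema vocabulary is fully compatible with JSON Schema, and it serves as the foundation for API documentation, code generation, and test automation.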

openapi-ai-service.yaml

yaml

openapi: 3.1.0
info:
  title: AI Service API
  version: 1.0.0
  description: AI inference service API
 
servers:
  - url: https://api.example.com/v1
    description: Production
 
paths:
  /completions:
    post:
      summary: Request a text completion
      operationId: createCompletion
      tags:
        - Completions
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/CompletionRequest"
      responses:
        "200":
          description: Completion result
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/CompletionResponse"
        "429":
          description: Rate limit exceeded
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Error"
 
components:
  schemas:
    CompletionRequest:
      type: object
      required:
        - model
        - messages
      properties:
        model:
          type: string
          example: "gpt-4o"
        messages:
          type: array
          items:
            $ref: "#/components/schemas/Message"
        temperature:
          type: number
          minimum: 0
          maximum: 2
          default: 1
        max_tokens:
          type: integer
          minimum: 1
        stream:
          type: boolean
          default: false
 
    Message:
      type: object
      required:
        - role
        - content
      properties:
        role:
          type: string
          enum: [system, user, assistant]
        content:
          type: string
 
    CompletionResponse:
      type: object
      properties:
        id:
          type: string
        model:
          type: string
        choices:
          type: array
          items:
            type: object
            properties:
              index:
                type: integer
              message:
                $ref: "#/components/schemas/Message"
              finish_reason:
                type: string
                enum: [stop, length, tool_calls]
        usage:
          $ref: "#/components/schemas/Usage"
 
    Usage:
      type: object
      properties:
        prompt_tokens:
          type: integer
        completion_tokens:
          type: integer
        total_tokens:
          type: integer
 
    Error:
      type: object
      properties:
        error:
          type: object
          properties:
            type:
              type: string
            message:
              type: string
            code:
              type: string
 
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
 
security:
  - bearerAuth: []

Tip

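We recommend the spec-first approach: write the OpenAPI specification before implementing the API. The spec serves as the contract with frontend teams, partners, and QA, and it is the basis for the automatic SDK generation covered in Chapter 9.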
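
One way to see the spec working as a contract is to check a payload against it. The hand-written check below mirrors the constraints of the `CompletionRequest` schema above (required fields, the role enum, the numeric ranges). It is only an illustration under that assumption: in practice a JSON Schema validator or generated code would do this work, and `validate_completion_request` is a hypothetical name.

```python
ROLES = {"system", "user", "assistant"}


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_completion_request(body: dict) -> list[str]:
    """Return violations of the CompletionRequest schema (empty when valid)."""
    errors = []
    for name in ("model", "messages"):
        if name not in body:
            errors.append(f"{name}: required")
    if "model" in body and not isinstance(body["model"], str):
        errors.append("model: must be a string")
    for i, message in enumerate(body.get("messages", [])):
        if message.get("role") not in ROLES:
            errors.append(f"messages[{i}].role: must be system, user, or assistant")
        if not isinstance(message.get("content"), str):
            errors.append(f"messages[{i}].content: must be a string")
    temperature = body.get("temperature", 1)  # the schema default is 1
    if not _is_number(temperature) or not 0 <= temperature <= 2:
        errors.append("temperature: must be a number between 0 and 2")
    max_tokens = body.get("max_tokens")
    if max_tokens is not None and (
        not isinstance(max_tokens, int) or isinstance(max_tokens, bool) or max_tokens < 1
    ):
        errors.append("max_tokens: must be an integer >= 1")
    return errors
```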

Designing REST Endpoints for an AI Service

Let's design the main capabilities of an AI service as REST endpoints.

Endpoint Structure

ai-service-endpoints.sh

bash

# Model management
GET    /api/v1/models                    # list available models
GET    /api/v1/models/gpt-4o             # details for one model

# Text completion
POST   /api/v1/completions               # text completion (synchronous)
POST   /api/v1/chat/completions          # chat completion (synchronous or streaming)

# Embeddings
POST   /api/v1/embeddings                # create vector embeddings

# Images
POST   /api/v1/images/generations        # generate an image
POST   /api/v1/images/edits              # edit an image

# Batch jobs
POST   /api/v1/batches                   # create a batch job
GET    /api/v1/batches/batch_abc123      # get batch status
DELETE /api/v1/batches/batch_abc123      # cancel a batch

# Usage
GET    /api/v1/usage                     # token usage
GET    /api/v1/usage/costs               # cost breakdown

FastAPI Implementation

app/main.py

python

from fastapi import FastAPI, HTTPException, Depends
from pydantic import BaseModel, Field
from enum import Enum
 
app = FastAPI(
    title="AI Service API",
    version="1.0.0",
    docs_url="/docs",
    redoc_url="/redoc",
)
 
 
class Role(str, Enum):
    system = "system"
    user = "user"
    assistant = "assistant"
 
 
class Message(BaseModel):
    role: Role
    content: str
 
 
class CompletionRequest(BaseModel):
    model: str = Field(..., examples=["gpt-4o"])
    messages: list[Message]
    temperature: float = Field(default=1.0, ge=0, le=2)
    max_tokens: int | None = Field(default=None, ge=1)
    stream: bool = False
 
 
class Usage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
 
 
class Choice(BaseModel):
    index: int
    message: Message
    finish_reason: str
 
 
class CompletionResponse(BaseModel):
    id: str
    model: str
    choices: list[Choice]
    usage: Usage
 
 
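@app.post(
    "/api/v1/chat/completions",
    response_model=CompletionResponse,
    summary="Chat completion",
    tags=["Chat"],
)
async def create_chat_completion(
    request: CompletionRequest,
):
    """
    Generate a model response from the conversation messages.
    With stream=true, the endpoint returns an SSE streaming response.
    """
    # create_streaming_response, inference_service, and generate_id are
    # application helpers assumed to be defined elsewhere in the project.
    if request.stream:
        return create_streaming_response(request)

    # Non-streaming path: wait for the full completion, then respond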
    response = await inference_service.complete(request)
    return CompletionResponse(
        id=generate_id(),
        model=request.model,
        choices=[
            Choice(
                index=0,
                message=Message(
                    role=Role.assistant,
                    content=response.text,
                ),
                finish_reason="stop",
            )
        ],
        usage=Usage(
            prompt_tokens=response.prompt_tokens,
            completion_tokens=response.completion_tokens,
            total_tokens=response.total_tokens,
        ),
    )
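
The handler above delegates streaming to `create_streaming_response`, which the article does not show. As a hedged sketch of the wire format only, the generator below frames text chunks as Server-Sent Events. The `delta` payload shape and the `[DONE]` sentinel are assumptions modeled on OpenAI-style streaming, and `sse_events` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def sse_events(chunks: Iterable[str], completion_id: str, model: str) -> Iterator[str]:
    """Frame text chunks as Server-Sent Events, one event per chunk."""
    for text in chunks:
        payload = {
            "id": completion_id,
            "model": model,
            "choices": [{"index": 0, "delta": {"content": text}}],
        }
        # An SSE event is a "data: ..." line terminated by a blank line.
        yield f"data: {json.dumps(payload)}\n\n"
    # End-of-stream marker (assumed convention, borrowed from OpenAI-style APIs).
    yield "data: [DONE]\n\n"
```

In FastAPI, a generator like this would typically be wrapped in `StreamingResponse(..., media_type="text/event-stream")` from `fastapi.responses`.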

Standardizing Error Responses

A consistent error response format is central to API usability.

app/errors.py

python

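from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse


class APIError(Exception):
    """Application error that carries everything needed for the error body."""

    def __init__(
        self,
        status_code: int,
        error_type: str,
        message: str,
        code: str | None = None,
    ):
        super().__init__(message)
        self.status_code = status_code
        self.error_type = error_type
        self.message = message
        self.code = code


async def api_error_handler(request: Request, exc: APIError):
    # Serialize every APIError into the same {"error": {...}} envelope
    return JSONResponse(
        status_code=exc.status_code,
        content={
            "error": {
                "type": exc.error_type,
                "message": exc.message,
                "code": exc.code,
            }
        },
    )


def register_error_handlers(app: FastAPI) -> None:
    # `app` is not defined in this module, so app/main.py calls this helper
    # (a name introduced here) instead of using the @app.exception_handler decorator
    app.add_exception_handler(APIError, api_error_handler)


# Usage example
# raise APIError(
#     status_code=429,
#     error_type="rate_limit_exceeded",
#     message="Token limit exceeded. Please retry in one minute.",
#     code="tokens_exceeded",
# )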

Response Header Design

In AI APIs it is customary to expose token usage and rate limit information through custom headers in addition to the standard HTTP headers.

response-headers-example.sh

bash

HTTP/1.1 200 OK
Content-Type: application/json
X-Request-Id: req_abc123def456
X-Model-Id: gpt-4o-2026-01
X-RateLimit-Limit-Requests: 1000
X-RateLimit-Limit-Tokens: 100000
X-RateLimit-Remaining-Requests: 950
X-RateLimit-Remaining-Tokens: 85000
X-RateLimit-Reset-Requests: 2026-03-18T12:00:00Z
X-RateLimit-Reset-Tokens: 2026-03-18T12:00:00Z

Warning

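The X- header prefix was deprecated by RFC 6648, but within the AI API ecosystem, following the convention set by OpenAI is an advantage for compatibility. When designing a new API, also consider the RateLimit header fields proposed in the IETF HTTPAPI working group draft.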
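
On the client side, headers like the ones in the example response can drive a simple backoff. The sketch below assumes the header names and the ISO 8601 reset timestamp shown in that example. Real providers differ in both, and `seconds_until_reset` is a hypothetical helper.

```python
from datetime import datetime, timezone


def seconds_until_reset(headers: dict[str, str], now: datetime) -> float:
    """Return how long to wait before sending again (0 while quota remains)."""
    remaining = int(headers.get("X-RateLimit-Remaining-Requests", "1"))
    if remaining > 0:
        return 0.0
    # Normalize the trailing "Z" so fromisoformat accepts it on older Python versions.
    reset_text = headers["X-RateLimit-Reset-Requests"].replace("Z", "+00:00")
    reset = datetime.fromisoformat(reset_text)
    return max(0.0, (reset - now).total_seconds())


headers = {
    "X-RateLimit-Remaining-Requests": "0",
    "X-RateLimit-Reset-Requests": "2026-03-18T12:00:00Z",
}
now = datetime(2026, 3, 18, 11, 59, 30, tzinfo=timezone.utc)
print(seconds_until_reset(headers, now))  # 30.0
```

HTTP header names are case-insensitive, so a real client should read them through its HTTP library's header mapping rather than a plain dict.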

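Summary

This chapter walked systematically through the core design principles of RESTful APIs: the Richardson Maturity Model, resource design, HTTP methods, status codes, pagination, and the OpenAPI specification. In particular, it worked through the endpoint structure, error handling, and response header design for applying REST to an AI service, together with a FastAPI implementation.

REST is the de facto standard for public AI APIs in terms of accessibility and ecosystem support. The spec-first approach, in which you define the OpenAPI specification first and then build the server implementation and generate SDKs from it, is effective for team collaboration and long-term maintenance.

Next Chapter Preview

Chapter 3 covers gRPC, the backbone of internal microservice communication. It looks at how HTTP/2 and Protocol Buffers work, the four call modes (unary plus three streaming variants), how to implement an AI inference service with gRPC, and what makes performance of up to 10 times that of REST achievable.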
