March 7, 2026 · Architecture

Chapter 7: Failure Handling and Resilience

Failure scenarios and resilience patterns for AI systems: circuit breakers, fallbacks, retries, timeouts, model failover, and graceful degradation.


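AI Systems Fail Differently

Failures in traditional server applications are relatively predictable: a database connection drops, a disk fills up, or memory runs out. AI systems are exposed to all of these and also have failure modes of their own.

AI-Specific Failure Types

API provider outages are service interruptions at the LLM API provider. Because the provider is an external dependency, you cannot control these directly. Even major providers such as Anthropic and OpenAI experience intermittent outages.

Rate limiting occurs when the API call rate exceeds the provider's allowed limit and requests are rejected. It happens most often during traffic spikes.

Model quality degradation is when the API responds normally but response quality suddenly drops. It can be caused by model updates or infrastructure changes, and it is the hardest failure type to detect.

A hallucination spike is an abnormal increase in hallucination frequency under particular input patterns or system states. Because the model states false information with confidence, it reaches users unchanged unless automated detection is in place.

A cost explosion is a situation where a prompt injection attack or a bug causes token consumption to grow explosively. The token budget covered in Chapter 6 is the defense against this.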
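
The categories above could be encoded as a type so that later resilience code can branch on them. The following is only a sketch under stated assumptions: `AIFailureKind`, `ProviderSignal`, and `classifyFailure` are hypothetical names that do not come from this series, the status-code mapping assumes the provider follows standard HTTP semantics (429 for rate limiting, 5xx for server-side errors), and the 0.5 and 0.6 thresholds are arbitrary placeholders.

```typescript
// Hypothetical taxonomy mirroring the failure types described above.
type AIFailureKind =
  | "provider-outage"
  | "rate-limited"
  | "quality-degradation"
  | "hallucination-spike"
  | "cost-explosion"
  | "none";

// Signals assumed to be available per request; all field names are illustrative.
interface ProviderSignal {
  httpStatus: number; // HTTP status returned by the provider
  qualityScore?: number; // 0..1, from an assumed evaluator
  groundedRatio?: number; // 0..1, assumed share of claims backed by sources
  tokensUsed: number;
  tokenBudget: number; // per-request budget, in the spirit of Chapter 6
}

function classifyFailure(s: ProviderSignal): AIFailureKind {
  if (s.httpStatus === 429) return "rate-limited";
  if (s.httpStatus >= 500) return "provider-outage";
  if (s.tokensUsed > s.tokenBudget) return "cost-explosion";
  // From here on the call itself succeeded; only the content can be wrong.
  if (s.groundedRatio !== undefined && s.groundedRatio < 0.5) {
    return "hallucination-spike";
  }
  if (s.qualityScore !== undefined && s.qualityScore < 0.6) {
    return "quality-degradation";
  }
  return "none";
}
```

The first two branches are the cases ordinary HTTP monitoring already sees. The last two only fire if some evaluator produces the scores, which is why those failure types are the hard ones to detect.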

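Warning

The most dangerous failure type in AI systems is the "silent failure." When the API returns 200 OK but the response content is inaccurate, traditional monitoring cannot detect it. The observability framework covered in Chapter 8 addresses this problem.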
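
Some silent failures can be caught cheaply before a response reaches the user, at least when the model is expected to return structured output. The sketch below assumes the application asks the model for a JSON object. `validateStructuredResponse` and its parameters are hypothetical names, and the check is structural only: it cannot tell whether the content is factually correct.

```typescript
interface ValidationResult {
  ok: boolean;
  reason?: string;
}

// Structural sanity check for a response that arrived with 200 OK.
function validateStructuredResponse(
  raw: string,
  requiredKeys: string[]
): ValidationResult {
  if (raw.trim().length === 0) {
    return { ok: false, reason: "empty response" };
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "not valid JSON" };
  }

  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, reason: "expected a JSON object" };
  }

  const obj = parsed as Record<string, unknown>;
  const missing = requiredKeys.filter((key) => !(key in obj));
  if (missing.length > 0) {
    return { ok: false, reason: `missing keys: ${missing.join(", ")}` };
  }
  return { ok: true };
}
```

A failed validation could be counted as a failure by the resilience patterns that follow, so that a stream of malformed 200 OK responses is treated like any other outage.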

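Circuit Breaker Pattern

A circuit breaker is a core pattern for preventing cascading failures. Like a breaker in an electrical circuit, it temporarily blocks calls to a service that fails repeatedly, protecting the stability of the whole system.

State Transitions

A circuit breaker cycles through three states.

Closed (normal): All requests pass through, and failures are counted.

Open (blocked): All requests are rejected immediately and a fallback response is returned, avoiding unnecessary waiting and wasted resources.

Half-Open (trial): After a set time has elapsed, only a small number of requests are let through to check whether the service has recovered.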

src/resilience/circuit-breaker.ts

typescript

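type CircuitState = "closed" | "open" | "half-open";

interface CircuitBreakerConfig {
  // Consecutive failures in the closed state before the circuit opens.
  failureThreshold: number;
  // How long the circuit stays open before trial requests are allowed.
  resetTimeoutMs: number;
  // Successful trial requests required in half-open before closing again.
  halfOpenMaxAttempts: number;
  // Failure-counting window. Declared here but not read by this class as written.
  monitorWindowMs: number;
}

class CircuitBreaker {
  private state: CircuitState = "closed";
  private failureCount = 0;
  private lastFailureTime = 0;
  private halfOpenAttempts = 0;

  constructor(
    private readonly name: string,
    private readonly config: CircuitBreakerConfig
  ) {}

  async execute<T>(
    operation: () => Promise<T>,
    fallback: () => Promise<T>
  ): Promise<T> {
    if (this.state === "open") {
      if (this.shouldAttemptReset()) {
        // The reset timeout has elapsed: start letting trial requests through.
        this.state = "half-open";
        this.halfOpenAttempts = 0;
      } else {
        console.warn(
          `[CircuitBreaker:${this.name}] circuit open, running fallback`
        );
        return fallback();
      }
    }

    try {
      const result = await operation();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();

      // If this failure opened the circuit, degrade to the fallback;
      // otherwise surface the error to the caller.
      // The state is read through getState() so that the type narrowing
      // from the check at the top of this method does not apply here.
      if (this.getState() === "open") {
        return fallback();
      }
      throw error;
    }
  }

  private onSuccess(): void {
    if (this.state === "half-open") {
      this.halfOpenAttempts++;
      if (this.halfOpenAttempts >= this.config.halfOpenMaxAttempts) {
        this.state = "closed";
        this.failureCount = 0;
        console.info(`[CircuitBreaker:${this.name}] circuit recovered`);
      }
    } else {
      // A success in the closed state resets the consecutive-failure count.
      this.failureCount = 0;
    }
  }

  private onFailure(): void {
    this.failureCount++;
    this.lastFailureTime = Date.now();

    // Any failure during a half-open trial reopens the circuit immediately.
    if (
      this.state === "half-open" ||
      this.failureCount >= this.config.failureThreshold
    ) {
      this.state = "open";
      console.error(
        `[CircuitBreaker:${this.name}] circuit opened after ${this.failureCount} consecutive failures`
      );
    }
  }

  private shouldAttemptReset(): boolean {
    return Date.now() - this.lastFailureTime >= this.config.resetTimeoutMs;
  }

  getState(): CircuitState {
    return this.state;
  }
}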
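
The config above declares `monitorWindowMs`, but the breaker counts consecutive failures and never reads that field. One plausible use for it is to count only the failures that fall inside a trailing time window. The sketch below shows that idea in isolation; `SlidingWindowFailureCounter` is a hypothetical helper, not part of the original class, and the injectable clock exists only to make the behavior easy to check.

```typescript
// Counts failures that occurred within the last `windowMs` milliseconds.
class SlidingWindowFailureCounter {
  private timestamps: number[] = [];

  constructor(
    private readonly windowMs: number,
    private readonly now: () => number = () => Date.now()
  ) {}

  // Records a failure and returns how many failures are inside the window.
  recordFailure(): number {
    const t = this.now();
    this.timestamps.push(t);
    return this.prune(t);
  }

  // Returns the current number of failures inside the window.
  count(): number {
    return this.prune(this.now());
  }

  // Drops timestamps that are older than the window.
  private prune(t: number): number {
    const cutoff = t - this.windowMs;
    this.timestamps = this.timestamps.filter((ts) => ts > cutoff);
    return this.timestamps.length;
  }
}
```

With a counter like this, the breaker could compare `recordFailure()` against `failureThreshold` instead of a running total, so that failures spread thinly over a long period would not open the circuit.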

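Designing the Fallback Chain

The fallback chain decides what response the user gets once the circuit breaker has opened the circuit. For AI systems, a multi-stage fallback strategy works well.

A Four-Stage Fallback Strategy

Stage 1 - Primary model: Anthropic Claude is the primary model. Most requests are handled at this stage.

Stage 2 - Alternate model: If the primary provider fails, switch automatically to another provider's model, such as OpenAI GPT. Prompt compatibility must be established in advance.

Stage 3 - Cached response: When every LLM provider is unavailable, return the cached response to a similar earlier question. Accuracy may be lower, but this is better than a complete outage.

Stage 4 - Static response: As a last resort, return a static message such as "The AI service is currently unavailable." This tells the user plainly what is happening.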

src/resilience/fallback-chain.ts

typescript

// CircuitBreaker is the class from src/resilience/circuit-breaker.ts.

// Assumed minimal shape of the semantic cache: the chain only calls findSimilar.
interface SemanticCacheService {
  findSimilar(prompt: string): Promise<string | null>;
}

interface FallbackResult<T> {
  data: T;
  // Which stage of the chain produced the response.
  source: "primary" | "secondary" | "cache" | "static";
  // True whenever the response did not come from the primary model.
  degraded: boolean;
}

class LLMFallbackChain {
  constructor(
    private readonly primaryBreaker: CircuitBreaker,
    private readonly secondaryBreaker: CircuitBreaker,
    private readonly cacheService: SemanticCacheService,
    private readonly staticResponses: Map<string, string>
  ) {}

  async execute(
    prompt: string,
    category: string
  ): Promise<FallbackResult<string>> {
    // Stage 1: primary model. The breaker's own fallback rejects so that
    // control falls through to the next stage of this chain.
    try {
      const result = await this.primaryBreaker.execute(
        () => this.callPrimaryModel(prompt),
        () => Promise.reject(new Error("primary circuit open"))
      );
      return { data: result, source: "primary", degraded: false };
    } catch {
      // Stage 1 failed, continue to stage 2.
    }

    // Stage 2: alternate model
    try {
      const result = await this.secondaryBreaker.execute(
        () => this.callSecondaryModel(prompt),
        () => Promise.reject(new Error("secondary circuit open"))
      );
      return { data: result, source: "secondary", degraded: true };
    } catch {
      // Stage 2 failed, continue to stage 3.
    }

    // Stage 3: cache lookup. A cache error must not prevent stage 4.
    try {
      const cached = await this.cacheService.findSimilar(prompt);
      if (cached) {
        return { data: cached, source: "cache", degraded: true };
      }
    } catch {
      // Cache unavailable, continue to stage 4.
    }

    // Stage 4: static response
    const staticResponse =
      this.staticResponses.get(category) ??
      "The AI service is currently unavailable. Please try again shortly.";

    return { data: staticResponse, source: "static", degraded: true };
  }

  private async callPrimaryModel(prompt: string): Promise<string> {
    // Call the Anthropic Claude API here.
    throw new Error("not implemented");
  }

  private async callSecondaryModel(prompt: string): Promise<string> {
    // Call the OpenAI GPT API here (including prompt conversion).
    throw new Error("not implemented");
  }
}

Tip

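When implementing a multi-provider strategy, watch for differences in prompt format. Anthropic and OpenAI handle system prompts differently, so it is a good idea to implement a separate prompt conversion layer. With an abstraction layer in place, switching providers does not require changes to application code.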
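
A conversion layer of this kind might look like the sketch below. It assumes that Anthropic's Messages API takes the system prompt as a top-level `system` field while OpenAI's Chat Completions API takes it as a message with the role `system`. The request shapes are deliberately simplified (no model name, token limits, or other parameters), and `NeutralPrompt`, `toAnthropicStyle`, and `toOpenAIStyle` are hypothetical names. Check the current provider documentation before relying on these shapes.

```typescript
type Turn = { role: "user" | "assistant"; content: string };

// Provider-neutral prompt used by application code (hypothetical shape).
interface NeutralPrompt {
  system?: string;
  turns: Turn[];
}

// Simplified request bodies; real requests carry more fields.
interface AnthropicStyleRequest {
  system?: string;
  messages: Turn[];
}

type OpenAIStyleMessage = {
  role: "system" | "user" | "assistant";
  content: string;
};

interface OpenAIStyleRequest {
  messages: OpenAIStyleMessage[];
}

// The system prompt travels as a separate top-level field.
function toAnthropicStyle(prompt: NeutralPrompt): AnthropicStyleRequest {
  const messages = prompt.turns.map((t) => ({ ...t }));
  return prompt.system === undefined
    ? { messages }
    : { system: prompt.system, messages };
}

// The system prompt becomes the first entry of the message list.
function toOpenAIStyle(prompt: NeutralPrompt): OpenAIStyleRequest {
  const messages: OpenAIStyleMessage[] = [];
  if (prompt.system !== undefined) {
    messages.push({ role: "system", content: prompt.system });
  }
  for (const t of prompt.turns) {
    messages.push({ role: t.role, content: t.content });
  }
  return { messages };
}
```

Application code would build only `NeutralPrompt` values, and each model-calling function in the fallback chain would apply the converter for its own provider.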

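Retries and Exponential Backoff

Retries are effective against transient errors such as network timeouts and temporary rate limiting. Indiscriminate retries, however, add load to a service that is already overloaded. Combining exponential backoff with jitter lengthens the interval between retries progressively and spreads clients out so they do not retry in lockstep.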

src/resilience/retry.ts

typescript

interface RetryConfig {
  // Total number of attempts, including the first call.
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  // Substrings matched against error.message and error.name.
  retryableErrors: string[];
}

async function withRetry<T>(
  operation: () => Promise<T>,
  config: RetryConfig
): Promise<T> {
  let lastError: Error | undefined;

  for (let attempt = 0; attempt < config.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Normalize non-Error throwables so the checks below are safe.
      lastError = error instanceof Error ? error : new Error(String(error));

      // Non-retryable errors are rethrown immediately.
      if (!isRetryable(lastError, config.retryableErrors)) {
        throw lastError;
      }

      // Sleep only if another attempt will follow.
      if (attempt < config.maxAttempts - 1) {
        const delay = calculateBackoff(attempt, config);
        console.warn(
          `Retry ${attempt + 1}/${config.maxAttempts} in ${Math.round(delay)}ms`
        );
        await sleep(delay);
      }
    }
  }

  // Reached when every attempt failed, or when maxAttempts is less than 1.
  throw lastError ?? new Error("withRetry: maxAttempts must be at least 1");
}

function calculateBackoff(attempt: number, config: RetryConfig): number {
  // Exponential backoff plus jitter, capped at maxDelayMs.
  const exponentialDelay = config.baseDelayMs * Math.pow(2, attempt);
  const jitter = Math.random() * config.baseDelayMs;
  return Math.min(exponentialDelay + jitter, config.maxDelayMs);
}

function isRetryable(error: Error, retryableErrors: string[]): boolean {
  return retryableErrors.some(
    (code) => error.message.includes(code) || error.name.includes(code)
  );
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
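
To see what delays this formula produces, the sketch below recomputes it with the random source passed in as a parameter, so the result is reproducible. `backoffDelay` uses the same arithmetic as `calculateBackoff`; `fullJitterDelay` is a different, commonly described variant that is not in the original code and is included only for comparison. The config values are illustrative.

```typescript
interface BackoffConfig {
  baseDelayMs: number;
  maxDelayMs: number;
}

// Same formula as calculateBackoff, with the random source injected.
function backoffDelay(
  attempt: number,
  config: BackoffConfig,
  random: () => number = Math.random
): number {
  const exponentialDelay = config.baseDelayMs * Math.pow(2, attempt);
  const jitter = random() * config.baseDelayMs;
  return Math.min(exponentialDelay + jitter, config.maxDelayMs);
}

// "Full jitter" variant: a uniform draw from [0, capped exponential delay).
function fullJitterDelay(
  attempt: number,
  config: BackoffConfig,
  random: () => number = Math.random
): number {
  const capped = Math.min(
    config.baseDelayMs * Math.pow(2, attempt),
    config.maxDelayMs
  );
  return random() * capped;
}

const exampleConfig: BackoffConfig = { baseDelayMs: 1000, maxDelayMs: 10000 };

// With the random source fixed at 0.5, the jitter is always 500 ms.
const schedule = [0, 1, 2, 3, 4].map((attempt) =>
  backoffDelay(attempt, exampleConfig, () => 0.5)
);
// schedule is [1500, 2500, 4500, 8500, 10000]
```

Because the jitter term is bounded by `baseDelayMs`, it becomes a small fraction of the delay at later attempts (500 ms out of 8500 ms at attempt 3). The full-jitter variant spreads retries over the whole interval instead, at the cost of occasionally retrying very soon.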

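Timeouts and Bulkhead Isolation

Timeout Management

LLM API calls have high variance in response time. The same request may finish in 2 seconds one time and take 30 seconds another. Without an appropriate timeout, slow responses hold on to the system's threads and connections and trigger cascading failures.

Set timeouts at two levels. The connection timeout limits the time allowed to establish the TCP connection (typically 5 seconds), and the response timeout limits the time allowed to receive the full response (10 to 60 seconds, depending on the type of work).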
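
The response timeout can be enforced in application code with a small wrapper. The sketch below is one way to do it, assuming a runtime that provides the standard `AbortController`. `withTimeout` and `TimeoutError` are hypothetical names. The connection timeout is usually configured on the HTTP client itself and is not shown here.

```typescript
class TimeoutError extends Error {
  constructor(timeoutMs: number) {
    super(`operation timed out after ${timeoutMs}ms`);
    this.name = "TimeoutError";
  }
}

// Rejects with TimeoutError if `operation` has not settled within `timeoutMs`.
// The AbortSignal lets the operation cancel its underlying request.
function withTimeout<T>(
  operation: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number
): Promise<T> {
  const controller = new AbortController();

  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      controller.abort();
      reject(new TimeoutError(timeoutMs));
    }, timeoutMs);

    // Start the operation inside a promise chain so that a synchronous
    // throw is also routed to the rejection handler.
    Promise.resolve()
      .then(() => operation(controller.signal))
      .then(
        (value) => {
          clearTimeout(timer);
          resolve(value);
        },
        (error) => {
          clearTimeout(timer);
          reject(error);
        }
      );
  });
}
```

An operation that forwards the signal to its HTTP call stops consuming a connection once the deadline passes. An operation that ignores the signal keeps running in the background even though the caller has already received the `TimeoutError`.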

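Bulkhead Isolation

The bulkhead pattern takes its name from the partitions in a ship's hull. It divides a system's resources into isolated pools so that a failure in one component does not spread to the others.

In AI systems, it is effective to separate LLM call pools by purpose. For example, run independent pools for real-time chat, background analysis, and batch processing. Even if the chat pool is exhausted, analysis features keep working, and an overloaded batch job does not affect user-facing services.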
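
A pool of this kind can be approximated in-process with a concurrency limiter. The sketch below is a minimal version under stated assumptions: `Bulkhead` and `BulkheadRejectedError` are hypothetical names, and the limits in the example pools are placeholders rather than recommended values.

```typescript
class BulkheadRejectedError extends Error {
  constructor(pool: string) {
    super(`bulkhead "${pool}" is full`);
    this.name = "BulkheadRejectedError";
  }
}

// Limits how many tasks run at once and how many may wait for a slot.
class Bulkhead {
  private active = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(
    readonly name: string,
    private readonly maxConcurrent: number,
    private readonly maxQueue: number
  ) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      if (this.waiting.length >= this.maxQueue) {
        // Fail fast instead of letting the backlog grow without bound.
        throw new BulkheadRejectedError(this.name);
      }
      // Wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }

    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // the slot passes to the next waiter, so `active` is unchanged
      } else {
        this.active--;
      }
    }
  }
}

// One pool per purpose; a full batch pool cannot starve the chat pool.
const pools = {
  chat: new Bulkhead("chat", 20, 50),
  analysis: new Bulkhead("analysis", 5, 100),
  batch: new Bulkhead("batch", 2, 1000),
};
```

A call site would then pick the pool that matches its purpose, for example `pools.chat.run(() => callModel(prompt))`, where `callModel` stands for whatever function performs the LLM request.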

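Graceful Degradation Strategy

Scaling functionality back step by step is preferable to a complete service outage. Graceful degradation adjusts the range of functionality on offer according to system load or the severity of a failure.

System state	AI functionality	User experience
Normal	Full functionality	Opus-class reasoning, real-time responses
Minor failure	Model downgrade	Switch to Sonnet/Haiku class, slight drop in quality
Severe failure	Cache-based responses	Cached responses to similar questions, limited accuracy
Total failure	AI features disabled	Switch to rule-based processing or a manual queue

The key is to tell users plainly what state the system is in at each level. A notice such as "Currently running in simplified mode" helps set user expectations and maintain trust.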
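
The table can be turned into a small decision function that also produces the user-facing notice. The sketch below assumes three boolean health signals that map onto the table rows. `HealthSnapshot`, `planDegradation`, and the notice texts other than the simplified-mode message are hypothetical.

```typescript
type DegradationLevel = "normal" | "minor" | "severe" | "total";

// Assumed health signals, for example derived from circuit breaker states.
interface HealthSnapshot {
  primaryModelAvailable: boolean;
  fallbackModelAvailable: boolean;
  cacheAvailable: boolean;
}

interface DegradationPlan {
  level: DegradationLevel;
  aiFunction: string;
  userNotice: string | null; // null when there is nothing to announce
}

function planDegradation(health: HealthSnapshot): DegradationPlan {
  if (health.primaryModelAvailable) {
    return { level: "normal", aiFunction: "full functionality", userNotice: null };
  }
  if (health.fallbackModelAvailable) {
    return {
      level: "minor",
      aiFunction: "model downgrade",
      userNotice: "Currently running in simplified mode.",
    };
  }
  if (health.cacheAvailable) {
    return {
      level: "severe",
      aiFunction: "cache-based responses",
      userNotice: "Answers come from earlier similar questions and may be less accurate.",
    };
  }
  return {
    level: "total",
    aiFunction: "AI features disabled",
    userNotice: "The AI service is currently unavailable.",
  };
}
```

Keeping the notice next to the level makes it harder to ship a degraded mode that users are never told about.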

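This chapter identified the failure types specific to AI systems and examined the circuit breaker, fallback chain, retry, timeout, bulkhead, and graceful degradation patterns. Combined, these patterns make it possible to deliver a stable user experience even when external AI services are unreliable. The next chapter designs an observability framework for detecting and diagnosing these failures early.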

