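February 3, 2026 · AI / ML

Chapter 9: CI/CD Pipeline - Automating Model Deployment with GitHub Actions

This chapter builds a CI/CD pipeline that uses GitHub Actions to automate the build, test, and deployment of an AI service, and integrates model evaluation into that pipeline.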


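What Makes CI/CD for AI Services Different

A traditional software CI/CD pipeline is designed around code changes: code is built, tested, and deployed in a fairly linear sequence. CI/CD for an AI service, by contrast, has to handle several independent axes of change.

The first is a change to the serving code: the API server, pre- and post-processing logic, middleware, and other application code. This case resembles conventional CI/CD.

The second is a change to the model: swapping in a new model, or updating to a fine-tuned version of the existing one. Model changes can happen independently of code changes, and they require an extra verification step, model evaluation.

The third is a change to the infrastructure configuration: Kubernetes manifests, serving parameters, autoscaling settings, and so on.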

text

Change axes in AI service CI/CD:
 
  [Code change]   ----+
                      |
  [Model change]  ----+--> [Build] --> [Test] --> [Evaluate] --> [Deploy]
                      |
  [Infra change]  ----+
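
To make the three axes concrete, here is a minimal sketch that classifies a list of changed files by axis. The names `CHANGE_AXES` and `classify_changes` are hypothetical, and the patterns are borrowed from the `paths` filters of the workflows in this chapter; mapping `k8s/` to the infrastructure axis is an assumption. Python's `fnmatch` is not the same as GitHub's glob syntax, so `src/*` is used here where the workflows write `src/**`.

```python
from fnmatch import fnmatch

# Path patterns per change axis (assumed mapping, based on the workflow
# "paths" filters used in this chapter).
CHANGE_AXES = {
    "code": ["src/*", "Dockerfile*", "requirements*.txt"],
    "model": ["configs/model-config.yaml", "evals/*"],
    "infra": ["k8s/*"],
}


def classify_changes(changed_files):
    """Return the set of change axes touched by a list of file paths."""
    axes = set()
    for path in changed_files:
        for axis, patterns in CHANGE_AXES.items():
            # fnmatch's "*" also matches "/", so "src/*" covers nested files.
            if any(fnmatch(path, pattern) for pattern in patterns):
                axes.add(axis)
    return axes


if __name__ == "__main__":
    files = ["src/api/server.py", "k8s/base/hpa.yaml"]
    print(sorted(classify_changes(files)))
```

A change set can touch more than one axis at once, which is why the function returns a set rather than a single label.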

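Designing the GitHub Actions Pipeline

Overall Pipeline Structure

The CI/CD pipeline for the AI service is split into three workflows.

CI workflow: code validation, unit tests, image build
Model evaluation workflow: model quality verification
CD workflow: staging and production deployment

text

Pipeline flow:
 
  PR opened/updated
      |
      v
  [CI: lint + test + image build]
      |
      v
  [Model evaluation: benchmarks + quality checks]
      |
      v
  PR merged (main branch)
      |
      v
  [CD: deploy to staging]
      |
      v
  [Manual approval]
      |
      v
  [CD: deploy to production]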

CI Workflow

.github/workflows/ci.yml

yaml

name: CI
 
on:
  pull_request:
    branches: [main]
    paths:
      - "src/**"
      - "Dockerfile*"
      - "requirements*.txt"
      - ".github/workflows/ci.yml"
 
env:
  ECR_REGISTRY: 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com
  ECR_REPOSITORY: ai-serving
  IMAGE_TAG: ${{ github.sha }}
 
jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
 
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: "pip"
 
      - name: Install dependencies
        run: pip install -r requirements-dev.txt
 
      - name: Run linter
        run: |
          ruff check src/
          ruff format --check src/
 
      - name: Run type checker
        run: mypy src/ --strict
 
      - name: Run unit tests
        run: pytest tests/unit/ -v --cov=src --cov-report=xml
 
      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          file: coverage.xml
 
  build-image:
    runs-on: ubuntu-latest
    needs: lint-and-test
    permissions:
      id-token: write         # OIDC token for assuming the AWS role
      contents: read
      security-events: write  # required to upload the Trivy SARIF report
    steps:
      - uses: actions/checkout@v4
 
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions
          aws-region: ap-northeast-2
 
      - name: Login to ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2
 
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
 
      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          # Note: every PR build also moves the mutable "latest" tag.
          tags: |
            ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }}
            ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
 
      - name: Run Trivy vulnerability scan
        # Consider pinning this action to a release tag or commit SHA rather than master.
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: "${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }}"
          format: "sarif"
          output: "trivy-results.sarif"
          severity: "CRITICAL,HIGH"
 
      # Publish the report so findings appear in the repository's code scanning view.
      - name: Upload Trivy scan results
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-results.sarif

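Model Evaluation Workflow

This workflow automatically verifies model quality whenever the model changes (a new model is introduced, quantization is applied, the model is fine-tuned, and so on).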

.github/workflows/model-eval.yml

yaml

name: Model Evaluation
 
on:
  pull_request:
    branches: [main]
    paths:
      - "configs/model-config.yaml"
      - "evals/**"
 
  workflow_dispatch:
    inputs:
      model_name:
        description: "Model to evaluate"
        required: true
        default: "meta-llama/Llama-3.1-8B-Instruct"
 
jobs:
  evaluate:
    runs-on: [self-hosted, gpu]
    timeout-minutes: 60
    permissions:
      contents: read
      pull-requests: write  # needed to post the results comment on the PR
    steps:
      - uses: actions/checkout@v4
 
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
 
      - name: Install evaluation dependencies
        run: pip install -r requirements-eval.txt
 
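      - name: Start vLLM server
        env:
          # Pass the input through an environment variable rather than
          # interpolating it into the script, which avoids shell injection.
          MODEL_NAME: ${{ github.event.inputs.model_name || 'meta-llama/Llama-3.1-8B-Instruct' }}
        run: |
          vllm serve "$MODEL_NAME" \
            --host 0.0.0.0 \
            --port 8000 \
            --max-model-len 4096 &
 
          # Wait until the server is ready (up to 60 attempts, 5 seconds apart)
          echo "Waiting for vLLM server..."
          for i in $(seq 1 60); do
            if curl -sf http://localhost:8000/health; then
              echo "Server is ready"
              exit 0
            fi
            sleep 5
          done
 
          # Fail the step if the server never became healthy
          echo "vLLM server did not become ready in time" >&2
          exit 1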
 
      - name: Run evaluation suite
        run: |
          python evals/run_evaluation.py \
            --server-url http://localhost:8000/v1 \
            --eval-set evals/datasets/standard.jsonl \
            --output-dir eval-results/
 
      - name: Check quality gates
        run: |
          python evals/check_gates.py \
            --results eval-results/results.json \
            --thresholds evals/thresholds.yaml
 
      - name: Post evaluation results to PR
        # Run even when the quality gate step failed, as long as results exist.
        if: ${{ !cancelled() && github.event_name == 'pull_request' && hashFiles('eval-results/results.json') != '' }}
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const results = JSON.parse(
              fs.readFileSync('eval-results/results.json', 'utf8')
            );
            const body = [
              '## Model Evaluation Results',
              '',
              '| Metric | Score | Threshold | Status |',
              '|--------|-------|-----------|--------|',
              ...results.metrics.map(m =>
                `| ${m.name} | ${m.score.toFixed(3)} | ${m.threshold} | ${m.score >= m.threshold ? 'PASS' : 'FAIL'} |`
              ),
              '',
              `Overall: ${results.passed ? 'PASSED' : 'FAILED'}`,
            ].join('\n');
 
            // Await the call so that an API failure fails the step.
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: body
            });
 
      - name: Upload evaluation artifacts
        # Keep the results available for inspection when earlier steps failed.
        if: ${{ !cancelled() }}
        uses: actions/upload-artifact@v4
        with:
          name: eval-results
          path: eval-results/
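
The workflow calls `evals/check_gates.py` without showing it. The following is a minimal sketch of what the gate logic could look like, not the actual script: the function name `check_gates` and the metric names are hypothetical, and thresholds are passed as a plain dict instead of being read from `evals/thresholds.yaml`. The returned shape (`metrics` entries with `name`, `score`, `threshold`, plus a top-level `passed`) follows what the PR comment step above reads from `results.json`.

```python
import json


def check_gates(metrics, thresholds):
    """Compare each metric score with its minimum threshold.

    metrics:    {"accuracy": 0.91, ...}  scores produced by the eval run
    thresholds: {"accuracy": 0.85, ...}  minimum acceptable scores
    """
    rows = []
    for name, threshold in thresholds.items():
        score = metrics.get(name)
        # A metric that is missing from the results counts as a failure.
        passed = score is not None and score >= threshold
        rows.append(
            {"name": name, "score": score, "threshold": threshold, "passed": passed}
        )
    return {"metrics": rows, "passed": all(row["passed"] for row in rows)}


if __name__ == "__main__":
    # Hypothetical metric names and values, for illustration only.
    report = check_gates(
        metrics={"accuracy": 0.91, "format_compliance": 0.97},
        thresholds={"accuracy": 0.85, "format_compliance": 0.99},
    )
    print(json.dumps(report, indent=2))
```

In a real gate script the process would exit with a non-zero status when `passed` is false, so that the "Check quality gates" step fails and blocks the PR.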

CD Workflow

.github/workflows/cd.yml

yaml

name: CD
 
on:
  push:
    branches: [main]
    paths:
      - "src/**"
      - "Dockerfile*"
      - "k8s/**"
 
  workflow_dispatch:
    inputs:
      environment:
        description: "Target environment"
        required: true
        type: choice
        options:
          - staging
          - production
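      image_tag:
        description: "Image tag to deploy"
        required: true
 
# The deploy steps below reference $ECR_REGISTRY and $ECR_REPOSITORY,
# so they have to be defined in this workflow as well.
env:
  ECR_REGISTRY: 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com
  ECR_REPOSITORY: ai-serving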
 
jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    environment: staging
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
 
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions
          aws-region: ap-northeast-2
 
      - name: Update kubeconfig
        run: aws eks update-kubeconfig --name ai-serving-cluster
 
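      - name: Set image tag
        id: image
        env:
          INPUT_TAG: ${{ github.event.inputs.image_tag }}
        run: |
          # Use the manually supplied tag if present, otherwise the commit SHA.
          # This assumes an image with that tag has already been pushed to ECR.
          TAG="${INPUT_TAG:-$GITHUB_SHA}"
          echo "tag=$TAG" >> "$GITHUB_OUTPUT"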
 
      - name: Deploy to staging
        run: |
          cd k8s/overlays/staging
          kustomize edit set image \
            "ai-serving=$ECR_REGISTRY/$ECR_REPOSITORY:${{ steps.image.outputs.tag }}"
          kustomize build . | kubectl apply -f -
 
      - name: Wait for rollout
        run: |
          kubectl rollout status deployment/vllm-llama \
            -n ai-serving-staging \
            --timeout=600s
 
      - name: Run smoke tests
        run: |
          ENDPOINT=$(kubectl get svc vllm-service \
            -n ai-serving-staging \
            -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
          python tests/smoke/test_serving.py --endpoint "http://$ENDPOINT"
 
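  deploy-production:
    runs-on: ubuntu-latest
    needs: deploy-staging
    environment: production
    # Run on pushes to main, or on a manual run that explicitly targets production.
    if: github.event_name == 'push' || github.event.inputs.environment == 'production'
    permissions:
      id-token: write
      contents: read
      deployments: write  # needed by the "Create deployment record" step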
    steps:
      - uses: actions/checkout@v4
 
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions
          aws-region: ap-northeast-2
 
      - name: Update kubeconfig
        run: aws eks update-kubeconfig --name ai-serving-cluster
 
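      - name: Deploy to production (canary)
        env:
          INPUT_TAG: ${{ github.event.inputs.image_tag }}
        run: |
          # Deploy the same tag that went to staging.
          IMAGE="$ECR_REGISTRY/$ECR_REPOSITORY:${INPUT_TAG:-$GITHUB_SHA}"
          cd k8s/overlays/production
          kustomize edit set image "ai-serving=$IMAGE"
 
          # Canary deployment: update only one Pod first.
          # canary-deployment.yaml is not part of the kustomization, so the image
          # is set explicitly (assumes the container is named "vllm", as in the patches).
          kubectl apply -f canary-deployment.yaml
          kubectl set image deployment/vllm-llama-canary "vllm=$IMAGE" -n ai-serving
          kubectl rollout status deployment/vllm-llama-canary \
            -n ai-serving \
            --timeout=600s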
 
      - name: Monitor canary (5 minutes)
        run: |
          echo "Monitoring canary deployment for 5 minutes..."
          python scripts/monitor_canary.py \
            --namespace ai-serving \
            --canary-deployment vllm-llama-canary \
            --stable-deployment vllm-llama \
            --duration 300 \
            --error-threshold 0.01
 
      - name: Promote canary to stable
        run: |
          cd k8s/overlays/production
          kustomize build . | kubectl apply -f -
          kubectl rollout status deployment/vllm-llama \
            -n ai-serving \
            --timeout=600s
 
          # Clean up the canary
          kubectl delete deployment vllm-llama-canary \
            -n ai-serving \
            --ignore-not-found
 
      - name: Create deployment record
        uses: actions/github-script@v7
        with:
          script: |
            // Await the call so that an API failure fails the step.
            await github.rest.repos.createDeployment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              ref: context.sha,
              environment: 'production',
              auto_merge: false,
              required_contexts: [],
              description: 'AI serving deployment'
            });
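
The canary step delegates its decision to `scripts/monitor_canary.py`, which is not shown. As a rough sketch of the kind of check such a script might perform, the function below decides whether a canary can be promoted from request and error counts. Everything here is an assumption for illustration: the function names, the `min_requests` and `max_regression` parameters, and the idea of comparing against the stable deployment. Only the `0.01` error threshold comes from the workflow. Collecting the counts (from Prometheus, for example) is left out.

```python
def error_rate(errors, total):
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return errors / total if total else 0.0


def canary_is_healthy(canary, stable, error_threshold=0.01,
                      max_regression=0.005, min_requests=100):
    """Decide whether a canary may be promoted.

    canary / stable: (error_count, request_count) tuples for the window.
    """
    canary_errors, canary_total = canary
    if canary_total < min_requests:
        # Too little traffic to judge; do not promote yet.
        return False
    canary_rate = error_rate(canary_errors, canary_total)
    stable_rate = error_rate(*stable)
    # The canary must stay under the absolute threshold and must not be
    # noticeably worse than the stable deployment.
    within_threshold = canary_rate <= error_threshold
    no_regression = (canary_rate - stable_rate) <= max_regression
    return within_threshold and no_regression


if __name__ == "__main__":
    print(canary_is_healthy(canary=(2, 1000), stable=(10, 10000)))
```

Checking against both an absolute threshold and the stable baseline guards against two different failures: a canary that is simply broken, and one that is within limits but clearly worse than what is already running.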

Info

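For the production environment, configure GitHub Environments protection rules so that a deployment cannot proceed without approval from a designated reviewer. Enable "Required reviewers" under Settings > Environments > production.

Managing Per-Environment Configuration with Kustomize

Directory Structure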

text

k8s/
  base/
    deployment.yaml
    service.yaml
    hpa.yaml
    kustomization.yaml
  overlays/
    staging/
      kustomization.yaml
      patches/
        deployment-patch.yaml
    production/
      kustomization.yaml
      patches/
        deployment-patch.yaml
      canary-deployment.yaml

Base Configuration

k8s/base/kustomization.yaml

yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
 
resources:
  - deployment.yaml
  - service.yaml
  - hpa.yaml
 
commonLabels:
  app: vllm
  managed-by: kustomize

Staging Overlay

k8s/overlays/staging/kustomization.yaml

yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
 
namespace: ai-serving-staging
 
resources:
  - ../../base
 
patches:
  - path: patches/deployment-patch.yaml
 
images:
  - name: ai-serving
    newName: 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/ai-serving
    newTag: latest

k8s/overlays/staging/patches/deployment-patch.yaml

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-llama
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: vllm
          resources:
            limits:
              nvidia.com/gpu: 1
            requests:
              nvidia.com/gpu: 1
              cpu: "2"
              memory: "16Gi"

Production Overlay

k8s/overlays/production/kustomization.yaml

yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
 
namespace: ai-serving
 
resources:
  - ../../base
 
patches:
  - path: patches/deployment-patch.yaml
 
images:
  - name: ai-serving
    newName: 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/ai-serving
    newTag: latest

k8s/overlays/production/patches/deployment-patch.yaml

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-llama
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: vllm
          resources:
            limits:
              nvidia.com/gpu: 1
            requests:
              nvidia.com/gpu: 1
              cpu: "4"
              memory: "24Gi"
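
To see how an overlay patch combines with the base Deployment, the following sketch merges a patch into a base manifest represented as Python dicts. It is a simplified illustration of the idea, not Kustomize's actual strategic merge implementation: dicts are merged key by key, and lists of named items (such as `containers`) are merged by `name`. The `merge_patch` function and the base manifest contents are assumptions, since `k8s/base/deployment.yaml` is not shown in this chapter.

```python
import copy


def _is_named_list(value):
    """True for a list whose items are all dicts carrying a "name" key."""
    return isinstance(value, list) and all(
        isinstance(item, dict) and "name" in item for item in value
    )


def merge_patch(base, patch):
    """Recursively merge `patch` into a copy of `base`."""
    if isinstance(base, dict) and isinstance(patch, dict):
        merged = copy.deepcopy(base)
        for key, value in patch.items():
            if key in base:
                merged[key] = merge_patch(base[key], value)
            else:
                merged[key] = copy.deepcopy(value)
        return merged
    if _is_named_list(base) and _is_named_list(patch):
        merged = copy.deepcopy(base)
        index = {item["name"]: i for i, item in enumerate(merged)}
        for item in patch:
            if item["name"] in index:
                pos = index[item["name"]]
                merged[pos] = merge_patch(merged[pos], item)
            else:
                merged.append(copy.deepcopy(item))
        return merged
    # Scalars and other lists are simply replaced by the patch value.
    return copy.deepcopy(patch)


if __name__ == "__main__":
    # Assumed base manifest, reduced to the fields relevant here.
    base = {"spec": {"replicas": 1, "template": {"spec": {"containers": [
        {"name": "vllm", "image": "ai-serving"}]}}}}
    patch = {"spec": {"replicas": 2, "template": {"spec": {"containers": [
        {"name": "vllm", "resources": {"requests": {"cpu": "4", "memory": "24Gi"}}}]}}}}
    print(merge_patch(base, patch))
```

The patch only needs to state what differs per environment (replica count and resource requests here); fields it does not mention, such as the container image, are kept from the base.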

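The GitOps Pattern

GitOps with Argo CD

GitOps is a methodology that treats a Git repository as the single source of truth and manages infrastructure and applications through declarative configuration. Argo CD is one of the most widely used tools for implementing GitOps on Kubernetes.

Installing Argo CD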

bash

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

argocd-application.yaml

yaml

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ai-serving-production
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/ai-serving-infra.git
    targetRevision: main
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: ai-serving
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        maxDuration: 3m
        factor: 2

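In a GitOps workflow, the CI pipeline builds the image and then automatically opens a PR that updates the image tag in the infrastructure repository. Once that PR is merged, Argo CD detects the change and deploys it automatically.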

text

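GitOps deployment flow:
 
1. A developer opens a PR with a code change
2. CI: tests + image build + push to ECR
3. CI: automatically opens a PR that updates the image tag in the infra repo
4. A reviewer approves and merges the infra PR
5. Argo CD: detects the Git change --> syncs the Kubernetes deployment
6. Argo CD: confirms that health checks pass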
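
Step 3 amounts to rewriting the `newTag` field in the overlay's `kustomization.yaml` and committing the result. Below is a minimal sketch of that rewrite, assuming the simple `images:` layout shown earlier in this chapter. The helper `set_image_tag` is hypothetical; a real pipeline could just as well run `kustomize edit set image` in a checkout of the infra repository, as the CD workflow does.

```python
import re


def set_image_tag(kustomization_text, image_name, new_tag):
    """Return the kustomization text with `newTag` replaced for one image."""
    pattern = re.compile(
        r"(- name: " + re.escape(image_name) + r"[ \t]*\n"
        # Skip the entry's other fields, but stop at the next list item.
        r"(?:(?![ \t]*(?:newTag:|- ))[ \t]+.*\n)*?"
        r"[ \t]+newTag:[ \t]*)(\S+)"
    )
    updated, count = pattern.subn(
        lambda match: match.group(1) + new_tag, kustomization_text
    )
    if count == 0:
        raise ValueError(f"image {image_name!r} with a newTag field not found")
    return updated


if __name__ == "__main__":
    text = (
        "images:\n"
        "  - name: ai-serving\n"
        "    newName: 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/ai-serving\n"
        "    newTag: latest\n"
    )
    # "abc1234" stands in for a commit SHA.
    print(set_image_tag(text, "ai-serving", "abc1234"))
```

Because the tag change lands as an ordinary commit, the Git history of the infra repository doubles as the deployment history.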

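Rollback Strategy

Automatic Rollback

If health checks fail after a deployment, the service should be rolled back to the previous version automatically.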

rollback-on-failure.yml (GitHub Actions job)

yaml

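  rollback:
    runs-on: ubuntu-latest
    needs: deploy-production
    # Run only when the production job itself failed. A bare failure() would
    # also fire when an earlier job such as deploy-staging failed.
    if: ${{ always() && needs.deploy-production.result == 'failure' }}
    permissions:
      id-token: write  # OIDC token for assuming the AWS role
      contents: write  # needed to comment on the commit
    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions
          aws-region: ap-northeast-2
 
      - name: Update kubeconfig
        run: aws eks update-kubeconfig --name ai-serving-cluster
 
      - name: Rollback deployment
        run: |
          # Remove any canary left behind by the failed deployment
          kubectl delete deployment vllm-llama-canary \
            -n ai-serving \
            --ignore-not-found
 
          # Assumes the stable Deployment was already updated. If the failure
          # happened during the canary phase, the stable Deployment is unchanged
          # and an undo would move it to an older revision.
          kubectl rollout undo deployment/vllm-llama -n ai-serving
          kubectl rollout status deployment/vllm-llama \
            -n ai-serving \
            --timeout=600s
 
      - name: Notify rollback
        uses: actions/github-script@v7
        with:
          script: |
            // This job runs on push or manual triggers, where no PR number is
            // available, so the comment is attached to the commit instead.
            await github.rest.repos.createCommitComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              commit_sha: context.sha,
              body: 'Deployment failed and was automatically rolled back.'
            });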

Manual Rollback

Manual rollback commands

bash

# Check the rollout history
kubectl rollout history deployment/vllm-llama -n ai-serving
 
# Roll back to the previous revision
kubectl rollout undo deployment/vllm-llama -n ai-serving
 
# Roll back to a specific revision
kubectl rollout undo deployment/vllm-llama -n ai-serving --to-revision=3
 
# Check the rollback status
kubectl rollout status deployment/vllm-llama -n ai-serving

Tip

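When rolling back an AI service, take the model version into account as well. If the serving code and the model were updated together, rolling back only the code can cause compatibility problems. We recommend recording the model version explicitly in the Deployment's environment variables or in a ConfigMap so that it can be tracked.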
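
One possible way to record the model version is to generate a small ConfigMap alongside each release. The sketch below builds such a manifest as a Python dict and compares two releases to see whether a rollback would also change the model. The ConfigMap name `serving-release-info`, the data keys, and both function names are hypothetical choices for illustration.

```python
import json


def build_release_configmap(image_tag, model_name, model_revision,
                            namespace="ai-serving"):
    """Build a ConfigMap manifest recording which model a release serves."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "serving-release-info", "namespace": namespace},
        "data": {
            "IMAGE_TAG": image_tag,
            "MODEL_NAME": model_name,
            "MODEL_REVISION": model_revision,
        },
    }


def rollback_needs_model_change(current, target):
    """True when rolling back to `target` also means switching the model."""
    keys = ("MODEL_NAME", "MODEL_REVISION")
    return any(current["data"][key] != target["data"][key] for key in keys)


if __name__ == "__main__":
    # Tag and revision values are placeholders.
    previous = build_release_configmap(
        "abc1234", "meta-llama/Llama-3.1-8B-Instruct", "rev-1")
    current = build_release_configmap(
        "def5678", "meta-llama/Llama-3.1-8B-Instruct", "rev-2")
    print(json.dumps(current, indent=2))
    print(rollback_needs_model_change(current, previous))
```

Since `kubectl apply -f` accepts JSON as well as YAML, the printed manifest can be applied as is, and a rollback script can read it back to decide whether the model has to be reverted together with the code.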

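A CI/CD pipeline for an AI service has to manage three axes of change together: code, model, and infrastructure. GitHub Actions automates build and test, model evaluation is integrated into the pipeline, and canary deployments with automatic rollback make production releases safer. Adopting the GitOps pattern for declarative infrastructure management also makes deployments more transparent and auditable.

The next chapter brings together everything covered in this series in a hands-on project that builds a production-grade AI service pipeline from start to finish.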

