February 20, 2026 · Infrastructure

Chapter 7: Collector Architecture and Deployment

Learn the OTel Collector's Receiver/Processor/Exporter pipeline, how to use its core processors, and how to deploy it on Kubernetes as a DaemonSet or a Deployment.


monitoring observability

Series: opentelemetry, part 7 of 11


Learning Objectives

Understand the Collector's Receiver/Processor/Exporter pipeline structure
Use the core processors (batch, memory_limiter, filter, attributes, tail_sampling)
Know the roles of the various receivers and exporters
Implement the Agent and Gateway deployment patterns on Kubernetes
Deploy the Collector with a Helm chart

What Is the Collector?

The OTel Collector is a standalone process that receives, processes, and exports telemetry data. It sits between your applications and your backends and acts as a data pipeline.

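There are several reasons to use the Collector:

Separation of concerns: applications focus only on producing telemetry, while the Collector handles routing, transformation, and filtering
Backend independence: switch backends by changing only the Collector configuration, without redeploying applications
Data processing: batching, retries, filtering, attribute transformation, and sampling happen in one central place
Protocol translation: receive OTLP and convert it to each backend's format, such as Prometheus, Jaeger, or Loki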
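For example, sending the same traces to two backends at once (say, while migrating from one to the other) only requires listing a second exporter in the pipeline, while applications keep sending OTLP to the Collector unchanged. A minimal fragment, assuming an `otlp` receiver and a `batch` processor are declared elsewhere in the same config; `tempo:4317` and `jaeger-collector:4317` are hypothetical addresses, and recent Jaeger versions accept OTLP over gRPC directly:

```yaml
exporters:
  # Hypothetical backend addresses; replace with your own
  otlp/tempo:
    endpoint: "tempo:4317"
    tls:
      insecure: true          # plaintext, for local testing only
  otlp/jaeger:
    endpoint: "jaeger-collector:4317"
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      # Every exporter listed here receives its own copy of the data (fan-out)
      exporters: [otlp/tempo, otlp/jaeger]
```

Removing one exporter from the list later completes the migration, again without touching the applications.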

Pipeline Structure

A Collector pipeline is built from three kinds of components.
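Each component is first declared under its own top-level key (`receivers`, `processors`, `exporters`) and only takes effect once it is referenced in a pipeline under `service.pipelines`. A minimal sketch that receives OTLP traces and prints them through the `debug` exporter:

```yaml
# 1) Declare the components
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"

processors:
  batch: {}

exporters:
  debug:
    verbosity: detailed   # print full span contents to the Collector's log

# 2) Wire them together; only components referenced in a pipeline are run
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]   # processors run in the order listed
      exporters: [debug]
```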

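Receiver: Receiving Data

A receiver is the entry point for telemetry data coming into the Collector. Some receivers listen for data pushed to them (for example `otlp`), while others actively pull it (for example `prometheus` scraping or `hostmetrics`). Receivers exist for many protocols and formats.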

Receiver	Input format	Purpose
`otlp`	OTLP gRPC/HTTP	Receive directly from OTel SDKs (the default)
`prometheus`	Prometheus scrape	Scrape existing Prometheus targets
`jaeger`	Jaeger Thrift/gRPC	Compatibility with existing Jaeger clients
`zipkin`	Zipkin JSON	Compatibility with Zipkin clients
`filelog`	Log files	Tail log files
`hostmetrics`	System metrics	CPU, memory, disk, network
`k8s_events`	Kubernetes events	Collect cluster events

receivers-config.yaml

yaml

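receivers:
  # OTLP from OTel SDKs: gRPC on 4317, HTTP on 4318
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"

  # Pull metrics from existing Prometheus targets (standard scrape_configs syntax)
  prometheus:
    config:
      scrape_configs:
        - job_name: "node-exporters"
          scrape_interval: 15s
          static_configs:
            - targets: ["node-exporter:9100"]

  # System metrics of the machine the Collector runs on
  # (in a container, mount the host filesystem and set root_path)
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      memory: {}
      disk: {}
      network: {}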

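Processor: Processing Data

A processor is the intermediate stage that transforms data after it is received. Chaining several processors lets you build complex processing pipelines; they run in the order listed in the pipeline's `processors` field.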

Exporter: Sending Data

An exporter is the exit point that sends processed data to its final backend.

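Exporter	Destination	Purpose
`otlp`	OTLP-compatible backends (gRPC)	Tempo, Jaeger, Datadog, etc.
`otlphttp`	OTLP-compatible backends (HTTP)	Tempo, native OTLP ingestion in Loki 3.x, etc.
`prometheus`	Prometheus scrape endpoint	Expose metrics for Prometheus to pull
`prometheusremotewrite`	Prometheus Remote Write	Push metrics
`loki`	Grafana Loki	Store logs (deprecated in favor of `otlphttp`)
`debug`	Console output	Debugging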
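The retries and buffering mentioned earlier are configured per exporter. Exporters built on the Collector's shared exporter helper, including `otlp` and `otlphttp`, accept `retry_on_failure` and `sending_queue` settings. A sketch with illustrative values (not tuned recommendations); the backend address is hypothetical:

```yaml
exporters:
  otlp/tempo:
    endpoint: "tempo:4317"       # hypothetical backend address
    tls:
      insecure: true
    # Retry failed sends with exponential backoff
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s     # give up (and drop the data) after 5 minutes
    # Buffer outgoing requests in memory while the backend is slow or down
    sending_queue:
      enabled: true
      num_consumers: 10          # parallel senders draining the queue
      queue_size: 5000           # max queued requests (batches), not individual spans
```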

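Core Processors

batch: Batching

The most basic processor, and one you will want in almost every pipeline. It groups individual telemetry items and sends them together, which reduces the number of network requests and improves throughput.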

batch-processor.yaml

yaml

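processors:
  batch:
    timeout: 1s                # send a batch after this long, even if it is not full
    send_batch_size: 1024      # send as soon as this many items (spans/points/logs) accumulate
    send_batch_max_size: 2048  # hard upper bound; larger batches are split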

Info

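The batch processor belongs in almost every pipeline. Without it, data is exported in whatever small chunks the receivers hand over, producing many small requests that put unnecessary load on the network and the backend. It is usually placed last in the processor chain, right before the exporters.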

memory_limiter: Memory Protection

Limits the Collector's memory usage to prevent OOM (Out of Memory) crashes.

memory-limiter.yaml

yaml

processors:
  memory_limiter:
    check_interval: 1s      # how often memory usage is measured
    limit_mib: 512          # hard limit (MiB)
    spike_limit_mib: 128    # headroom for spikes; soft limit = 512 - 128 = 384 MiB

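When memory usage exceeds the soft limit (`limit_mib` minus `spike_limit_mib`), the processor refuses incoming data and the receiver returns an error to the sender, so clients such as OTel SDKs can retry later; above the hard limit it additionally forces a garbage collection. Place it first in the processor chain so that data is refused before other processors spend memory on it.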
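When the Collector runs in a container with a memory limit, a fixed `limit_mib` has to be kept below that limit by hand. The processor also accepts percentage-based settings, computed against the total memory available to the process (typically the cgroup limit inside a container). A sketch, assuming the 512Mi container limit used in the DaemonSet manifest later in this chapter:

```yaml
processors:
  memory_limiter:
    check_interval: 1s
    # Hard limit = 80% of available memory (about 410 MiB under a 512Mi limit)
    limit_percentage: 80
    # Soft limit = 80% - 20% = 60% (about 307 MiB); data is refused above it
    spike_limit_percentage: 20
```

This keeps the limiter consistent with the container limit if the manifest's `resources.limits.memory` is later changed.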

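filter: Dropping Unneeded Data

Drops telemetry that matches the given conditions. Conditions are written in OTTL (OpenTelemetry Transformation Language), and an item matching any one of the listed conditions is dropped.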

filter-processor.yaml

yaml

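processors:
  filter/traces:
    traces:
      span:
        # Drop health-check spans
        # (http.target is from older semantic conventions; newer SDKs emit url.path)
        - 'attributes["http.target"] == "/healthz"'
        - 'attributes["http.target"] == "/readyz"'
        # Drop spans for static resources
        - 'IsMatch(attributes["http.target"], "/static/.*")'

  filter/metrics:
    metrics:
      metric:
        # Drop one specific metric coming from one specific service
        - 'name == "http.server.active_requests" and resource.attributes["service.name"] == "debug-service"'

  filter/logs:
    logs:
      log_record:
        # Drop TRACE and DEBUG logs (severity_number 1-8);
        # records without a severity (severity_number 0) are kept
        - 'severity_number > 0 and severity_number < 9'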

attributes: Manipulating Attributes

Adds, updates, deletes, hashes, or extracts attributes.

attributes-processor.yaml

yaml

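processors:
  attributes/insert:
    actions:
      # Add environment information (insert: only if the key does not exist yet)
      - key: deployment.cluster
        value: "production-kr"
        action: insert

      # Hash sensitive data
      - key: user.email
        action: hash

      # Delete an unneeded attribute
      - key: http.user_agent
        action: delete

      # Extract the host part of the URL into a new attribute named "url_host"
      # (extract creates one attribute per named capture group, named after the group)
      - key: http.url
        pattern: "^https?://(?P<url_host>[^/]+)"
        action: extract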

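tail_sampling: Smarter Sampling

Decides whether to keep a trace by looking at the trace as a whole. The processor buffers spans for `decision_wait` after the first span of a trace arrives, then evaluates its policies against everything collected for that trace. This is the Collector implementation of the tail sampling covered in Chapter 3.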

tail-sampling.yaml

yaml

processors:
  tail_sampling:
    decision_wait: 30s                  # buffer time after a trace's first span before deciding
    num_traces: 50000                   # max traces held in memory
    expected_new_traces_per_sec: 1000   # used to size internal data structures
    policies:
      # A trace is kept if any policy decides to sample it
      # Policy 1: keep 100% of error traces
      - name: error-traces
        type: status_code
        status_code:
          status_codes: [ERROR]

      # Policy 2: keep slow traces (1 second or longer)
      - name: slow-traces
        type: latency
        latency:
          threshold_ms: 1000

      # Policy 3: keep 100% of traces from critical services
      - name: critical-service
        type: string_attribute
        string_attribute:
          key: service.name
          values: [payment-service, auth-service]

      # Policy 4: sample everything else at 10%
      - name: default
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
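`num_traces` bounds how many traces can be held in memory while their decision is pending. A rough sizing check, assuming traces arrive steadily at the `expected_new_traces_per_sec` rate (r) and each one waits the full `decision_wait` (t_wait):

```latex
N_{\text{in-flight}} \approx r \cdot t_{\text{wait}}
  = 1000\,\tfrac{\text{traces}}{\text{s}} \times 30\,\text{s}
  = 30{,}000 \;\le\; 50{,}000 = \texttt{num\_traces}
```

With these settings, 50,000 leaves roughly 67% headroom over the about 30,000 in-flight traces. If the product exceeded `num_traces`, the processor would have to drop traces from memory before deciding on them.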

Warning

The tail_sampling processor must run in Gateway-mode Collectors. It can only make a correct decision when every span of a given trace reaches the same Collector instance. If you run multiple Gateway instances, route spans to them by Trace ID using the loadbalancing exporter.
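A sketch of that routing layer: the Collectors in front of the gateways (for example, the agents) export through the `loadbalancing` exporter, which hashes the Trace ID to pick one gateway instance, so all spans of a trace meet at the same `tail_sampling` processor. The headless Service name `otel-collector-gateway-headless` is hypothetical; DNS resolution needs a headless Service so the lookup returns individual Pod IPs rather than a single virtual IP.

```yaml
exporters:
  loadbalancing:
    routing_key: traceID       # all spans with the same Trace ID go to the same gateway
    protocol:
      otlp:                    # settings for the per-gateway OTLP exporters
        tls:
          insecure: true
    resolver:
      dns:
        # Hypothetical headless Service (clusterIP: None) selecting the gateway Pods
        hostname: otel-collector-gateway-headless.observability.svc.cluster.local
        port: 4317
```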

Complete Pipeline Configuration

An example Collector configuration that handles all three signals (traces, metrics, and logs).

otel-collector-full.yaml

yaml

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"
 
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024
    spike_limit_mib: 256
 
  filter/healthcheck:
    traces:
      span:
        - 'attributes["http.target"] == "/healthz"'
    logs:
      log_record:
        - 'IsMatch(body, ".*healthcheck.*")'
 
  attributes/common:
    actions:
      - key: deployment.cluster
        value: "production-kr"
        action: insert
 
  batch:
    timeout: 1s
    send_batch_size: 1024
 
exporters:
  otlp/tempo:
    endpoint: "tempo:4317"
    tls:
      insecure: true
 
  prometheusremotewrite:
    # Prometheus must be started with --web.enable-remote-write-receiver
    endpoint: "http://prometheus:9090/api/v1/write"

  # Loki 3.x ingests OTLP over HTTP at /otlp (otlphttp appends /v1/logs)
  otlphttp/loki:
    endpoint: "http://loki:3100/otlp"

  # Not used by any pipeline below; add it to a pipeline's exporters when troubleshooting
  debug:
    verbosity: basic

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, filter/healthcheck, attributes/common, batch]
      exporters: [otlp/tempo]

    metrics:
      receivers: [otlp]
      processors: [memory_limiter, attributes/common, batch]
      exporters: [prometheusremotewrite]

    logs:
      receivers: [otlp]
      processors: [memory_limiter, filter/healthcheck, attributes/common, batch]
      exporters: [otlphttp/loki]
 
  telemetry:
    logs:
      level: info
    metrics:
      address: ":8888"

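Processor order matters:

memory_limiter: runs first to protect memory
filter: drops unneeded data early, reducing the load on later processors
attributes: adds and transforms attributes
batch: runs last so data is sent in efficient batches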

Kubernetes Deployment

Agent Mode: DaemonSet

Deploys one Collector per node, which collects telemetry from all Pods on that node.

collector-daemonset.yaml

yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector-agent
  namespace: observability
spec:
  selector:
    matchLabels:
      app: otel-collector-agent
  template:
    metadata:
      labels:
        app: otel-collector-agent
    spec:
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:0.100.0
          args: ["--config=/etc/otelcol/config.yaml"]
          ports:
            - containerPort: 4317  # OTLP gRPC
              hostPort: 4317       # publish on the node IP so Pods on this node can reach the agent
            - containerPort: 4318  # OTLP HTTP
              hostPort: 4318
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          volumeMounts:
            - name: config
              mountPath: /etc/otelcol
      volumes:
        - name: config
          configMap:
            name: otel-agent-config
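With an agent on every node, an application Pod can reach the Collector on its own node through the node IP, provided the agent's ports are published on the host (for instance with `hostPort`). A sketch of the relevant part of an application container spec; the container name and image are hypothetical, `status.hostIP` comes from the Kubernetes Downward API, and `OTEL_EXPORTER_OTLP_ENDPOINT` is the standard OTel SDK environment variable:

```yaml
# Excerpt from an application Deployment's Pod template (hypothetical app)
containers:
  - name: my-app                      # hypothetical
    image: example.com/my-app:1.0     # hypothetical
    env:
      # IP of the node this Pod is scheduled on (Downward API)
      - name: NODE_IP
        valueFrom:
          fieldRef:
            fieldPath: status.hostIP
      # Send OTLP/gRPC to the agent on the same node; $(NODE_IP) refers to the variable above
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: "http://$(NODE_IP):4317"
```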

Gateway Mode: Deployment

Deploys a centralized Collector tier as a Deployment, exposed to agents and applications through a Service.
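In the combined Agent + Gateway pattern, each agent forwards its data to the gateway's Service. A sketch of the agent-side exporter, assuming the Service `otel-collector-gateway` in the `observability` namespace from the manifest in this section, receivers and processors declared as in the earlier examples, and plaintext traffic inside the cluster. If the gateways run `tail_sampling` on several replicas, use the `loadbalancing` exporter instead, as noted in the Warning on tail sampling.

```yaml
# Agent (DaemonSet) Collector config: forward traces to the gateway tier
exporters:
  otlp/gateway:
    # Cluster DNS name of the gateway Service: <service>.<namespace>.svc.cluster.local
    endpoint: "otel-collector-gateway.observability.svc.cluster.local:4317"
    tls:
      insecure: true   # assumes in-cluster plaintext; enable TLS in production

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/gateway]
```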

collector-deployment.yaml

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector-gateway
  namespace: observability
spec:
  replicas: 3
  selector:
    matchLabels:
      app: otel-collector-gateway
  template:
    metadata:
      labels:
        app: otel-collector-gateway
    spec:
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:0.100.0
          args: ["--config=/etc/otelcol/config.yaml"]
          ports:
            - containerPort: 4317
            - containerPort: 4318
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 2
              memory: 4Gi
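          # Probes require the health_check extension (default port 13133)
          livenessProbe:
            httpGet:
              path: /
              port: 13133
          readinessProbe:
            httpGet:
              path: /
              port: 13133
          # Mount the ConfigMap at the directory referenced by --config
          volumeMounts:
            - name: config
              mountPath: /etc/otelcol
      volumes: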
        - name: config
          configMap:
            name: otel-gateway-config
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector-gateway
  namespace: observability
spec:
  selector:
    app: otel-collector-gateway
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
    - name: otlp-http
      port: 4318
      targetPort: 4318

Helm Chart Deployment

Deploying the Collector with Helm

bash

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

# Agent mode (DaemonSet)
# Recent chart versions require image.repository to be set explicitly
helm install otel-agent open-telemetry/opentelemetry-collector \
    --namespace observability --create-namespace \
    --set mode=daemonset \
    --set image.repository=otel/opentelemetry-collector-contrib \
    --set config.receivers.otlp.protocols.grpc.endpoint="0.0.0.0:4317" \
    --values agent-values.yaml

# Gateway mode (Deployment)
helm install otel-gateway open-telemetry/opentelemetry-collector \
    --namespace observability \
    --set mode=deployment \
    --set image.repository=otel/opentelemetry-collector-contrib \
    --set replicaCount=3 \
    --values gateway-values.yaml
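The `agent-values.yaml` and `gateway-values.yaml` files are not shown in this chapter. As a hedged sketch, a gateway values file might look like the following; `mode`, `replicaCount`, `image.repository`, `resources`, and `config` are values of the opentelemetry-collector chart, which merges `config` into its own default Collector configuration (check the behavior against the chart version you install). The backend address is hypothetical.

```yaml
# gateway-values.yaml (illustrative)
mode: deployment
replicaCount: 3

image:
  repository: otel/opentelemetry-collector-contrib   # contrib includes tail_sampling, loadbalancing, etc.

resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 2
    memory: 4Gi

config:
  exporters:
    otlp/tempo:
      endpoint: "tempo:4317"     # hypothetical backend address
      tls:
        insecure: true
  service:
    pipelines:
      traces:
        exporters: [otlp/tempo]  # lists replace the chart's defaults rather than merging
```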

Tip

On Kubernetes, the OpenTelemetry Operator lets you manage Collectors declaratively through CRDs (Custom Resource Definitions). The Operator can also inject auto-instrumentation into Pods, applying the OTel SDK without changing application images.
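A minimal sketch of both features, assuming a recent Operator release is already installed (by default it also requires cert-manager for its webhooks). `OpenTelemetryCollector` and `Instrumentation` are the Operator's CRDs; the `v1beta1` API shown takes `config` as a YAML object, whereas the older `v1alpha1` API expected a string. Resource names are hypothetical, and the exporter endpoint reuses the gateway Service from this chapter.

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: agent                      # hypothetical
  namespace: observability
spec:
  mode: daemonset                  # other modes: deployment, statefulset, sidecar
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: "0.0.0.0:4317"
    processors:
      batch: {}
    exporters:
      otlp:
        endpoint: "otel-collector-gateway.observability.svc.cluster.local:4317"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]
---
# Settings used when the Operator injects auto-instrumentation into Pods
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: default-instrumentation    # hypothetical
  namespace: observability
spec:
  exporter:
    # Port 4318 assumes the injected SDK exports OTLP over HTTP
    endpoint: "http://otel-collector-gateway.observability.svc.cluster.local:4318"
```

An application Pod then opts in with an annotation on its Pod template, for example `instrumentation.opentelemetry.io/inject-python: "observability/default-instrumentation"` (namespace/name of the Instrumentation resource).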

Monitoring the Collector

It is just as important to monitor the health of the Collector itself.

collector-self-monitoring.yaml

yaml

service:
  telemetry:
    logs:
      level: info
    metrics:
      address: ":8888"  # expose the Collector's own metrics in Prometheus format

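Key metrics to monitor:

Metric	Description	Alert when
`otelcol_receiver_accepted_spans`	Spans accepted by receivers	Drops sharply
`otelcol_receiver_refused_spans`	Spans refused by receivers	Greater than 0
`otelcol_exporter_send_failed_spans`	Spans that failed to export	Keeps increasing
`otelcol_processor_batch_timeout_trigger_send`	Batches sent because the timeout expired	Occurs frequently
`otelcol_process_memory_rss`	Memory usage (RSS)	Approaches the limit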
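These conditions can be turned into Prometheus alerting rules. A sketch, assuming Prometheus already scrapes the Collector's `:8888` endpoint; depending on the Collector version, counters may be exposed with a `_total` suffix (for example `otelcol_receiver_refused_spans_total`), so check the actual `/metrics` output and adjust the names. The group name, thresholds, and durations are illustrative.

```yaml
groups:
  - name: otel-collector             # hypothetical rule group name
    rules:
      # Receivers are refusing spans (e.g. memory_limiter back-pressure)
      - alert: OtelCollectorRefusingSpans
        expr: sum by (instance) (rate(otelcol_receiver_refused_spans[5m])) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Collector {{ $labels.instance }} is refusing spans"
      # Exporters keep failing to deliver spans to the backend
      - alert: OtelCollectorExportFailures
        expr: sum by (instance) (rate(otelcol_exporter_send_failed_spans[5m])) > 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Collector {{ $labels.instance }} is failing to export spans"
```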

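In this chapter, we took a close look at the OTel Collector's internal structure and core components. We designed pipelines built from receivers, processors, and exporters, and learned how to use core processors such as memory_limiter, filter, attributes, and batch. We also walked through a Kubernetes deployment strategy that combines a DaemonSet (Agent) with a Deployment (Gateway), along with deployment via the Helm chart.

The next chapter covers integration with the Grafana, Jaeger, and Prometheus backends. We will build a complete observability stack with Docker Compose and see metric-to-trace linking through Exemplars in practice.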
