March 13, 2026 · AI / ML

Chapter 7: Audit Logging and Compliance

Covers agent action tracking, immutable audit log design, meeting regulatory requirements, explainability, reproducibility, OpenTelemetry integration, and retention policies.

16 min read · 1,128 characters · 11 sections

workflow ai automation

agentic-workflow 7 / 10


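What you will learn in this chapter

Why agent actions and decision rationale need to be tracked
Designing and implementing an immutable audit log
Meeting regulatory requirements in finance, healthcare, and law
How to achieve explainability and reproducibility
Integrating observability with OpenTelemetry
Defining an audit-log retention policy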

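Why audit logging is needed

In traditional software, code behaves deterministically, so the same input always follows the same path. In an agentic workflow, however, the LLM makes probabilistic judgments, and which tools are chosen, and in what order, is decided at run time. Without a robust audit system, this non-determinism makes debugging, regulatory compliance, and quality improvement effectively impossible.

Audit logging answers these key questions:

Why did the agent make this decision?
What information did it rely on?
When and how did a human intervene?
Can the result be reproduced from the same input?
Can the decision process be explained to a regulator?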

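Tracking agent actions

Build an action-tracking system that systematically records every action the agent takes.

What to track

Record the agent's inputs, reasoning, tool calls, decisions, and outputs, along with every human intervention.

Action record schema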

audit_schema.py

python

from pydantic import BaseModel
from datetime import datetime
from typing import Any
 
class AuditEntry(BaseModel):
    """Audit log entry"""
    # Identification
    entry_id: str
    workflow_id: str
    trace_id: str  # distributed trace ID
    span_id: str

    # Timing
    timestamp: datetime
    duration_ms: int | None = None

    # Actor
    actor_type: str  # agent, human, system
    actor_id: str

    # Action
    action_type: str  # llm_call, tool_call, decision, human_review
    action_detail: dict

    # Outcome
    outcome: str  # success, failure, escalated
    outcome_detail: dict | None = None

    # Context
    input_summary: str
    output_summary: str
    reasoning: str | None = None  # rationale for the decision
    confidence: float | None = None
    alternatives_considered: list[dict] | None = None

class LLMCallAudit(BaseModel):
    """Audit record for an LLM call"""
    model: str
    provider: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    temperature: float
    system_prompt_hash: str  # a hash instead of the full text
    user_prompt_summary: str
    response_summary: str
    latency_ms: int
    cost_usd: float

class ToolCallAudit(BaseModel):
    """Audit record for a tool call"""
    tool_name: str
    tool_version: str
    input_params: dict
    output_data: dict
    side_effects: list[str]  # changes made to external systems
    latency_ms: int
    idempotency_key: str
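The schema above defines what an action record contains. The sketch below shows one way to capture such records automatically: a decorator wraps async tool functions and writes a `tool_call` record on both success and failure. To stay self-contained, it uses a simplified stdlib `ActionRecord` instead of the Pydantic `AuditEntry` and an in-memory sink instead of a real audit store. `tracked_tool`, `ActionRecord`, and `InMemorySink` are hypothetical names. Tools are assumed to be called with keyword arguments only, so that `input_params` is a plain dict.

```python
import asyncio
import functools
import time
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Awaitable, Callable, Optional


@dataclass
class ActionRecord:
    """Simplified, stdlib-only stand-in for AuditEntry (hypothetical)."""
    entry_id: str
    workflow_id: str
    actor_type: str
    actor_id: str
    action_type: str
    action_detail: dict
    outcome: str
    timestamp: datetime
    duration_ms: int
    outcome_detail: Optional[dict] = None


@dataclass
class InMemorySink:
    """Hypothetical sink; a real system would append to the audit store instead."""
    records: list = field(default_factory=list)

    async def write(self, record: ActionRecord) -> None:
        self.records.append(record)


def tracked_tool(sink: InMemorySink, workflow_id: str, agent_id: str):
    """Decorator that records every call of an async tool as a tool_call action."""
    def decorator(fn: Callable[..., Awaitable[Any]]):
        @functools.wraps(fn)
        async def wrapper(**params: Any) -> Any:
            started = time.perf_counter()
            outcome, detail = "success", None
            try:
                return await fn(**params)
            except Exception as exc:
                outcome, detail = "failure", {"error": repr(exc)}
                raise
            finally:
                # Runs after both success and failure, so no call goes unrecorded.
                await sink.write(ActionRecord(
                    entry_id=str(uuid.uuid4()),
                    workflow_id=workflow_id,
                    actor_type="agent",
                    actor_id=agent_id,
                    action_type="tool_call",
                    action_detail={"tool_name": fn.__name__, "input_params": params},
                    outcome=outcome,
                    outcome_detail=detail,
                    timestamp=datetime.now(timezone.utc),
                    duration_ms=int((time.perf_counter() - started) * 1000),
                ))
        return wrapper
    return decorator


# Usage: every call of the decorated tool leaves one record in the sink.
sink = InMemorySink()

@tracked_tool(sink, workflow_id="wf-1", agent_id="agent-a")
async def lookup_customer(customer_id: str) -> dict:
    return {"customer_id": customer_id, "tier": "gold"}

asyncio.run(lookup_customer(customer_id="c-42"))
```

Recording in `finally` is the key design choice. A failed tool call is often the most important thing to audit, and it is written even though the exception still propagates to the caller.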

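Immutable audit log

Once written, an audit log entry must never be modified or deleted. Immutability is the core property that makes the log trustworthy and gives it value as legal evidence.

Immutable log storage strategy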

immutable_audit_log.py

python

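import hashlib
from typing import Protocol

from audit_schema import AuditEntry  # the model defined above

# Sentinel "previous hash" for the first entry of every workflow.
# get_latest_hash() must return this value when a workflow has no entries yet;
# otherwise verify_integrity() below can never succeed.
GENESIS_HASH = "genesis"

class AuditStore(Protocol):
    """Storage interface assumed by this example (backed by an append-only table)."""
    async def get_latest_hash(self, workflow_id: str) -> str: ...
    async def insert(self, entry: AuditEntry, entry_hash: str, previous_hash: str) -> None: ...
    async def get_all(self, workflow_id: str) -> list[tuple[AuditEntry, str]]: ...

class ImmutableAuditLog:
    """Immutable audit log based on hash chaining"""

    def __init__(self, store: AuditStore):
        self.store = store

    @staticmethod
    def _hash(previous_hash: str, entry: AuditEntry) -> str:
        # Hash the serialized entry together with the previous entry's hash
        chain_input = f"{previous_hash}:{entry.model_dump_json()}"
        return hashlib.sha256(chain_input.encode()).hexdigest()

    async def append(self, entry: AuditEntry) -> str:
        """Append an entry; the hash chain protects its integrity"""
        # Concurrent appends to the same workflow could read the same previous hash
        # and fork the chain, so serialize appends per workflow (e.g., with a lock).
        previous_hash = await self.store.get_latest_hash(entry.workflow_id)
        entry_hash = self._hash(previous_hash, entry)

        # Store in an append-only table
        await self.store.insert(
            entry=entry,
            entry_hash=entry_hash,
            previous_hash=previous_hash,
        )

        return entry_hash

    async def verify_integrity(self, workflow_id: str) -> bool:
        """Recompute the hash chain to verify log integrity"""
        # get_all() must return (entry, stored_hash) pairs in insertion order
        entries = await self.store.get_all(workflow_id)
        previous_hash = GENESIS_HASH

        for entry, stored_hash in entries:
            if self._hash(previous_hash, entry) != stored_hash:
                return False
            previous_hash = stored_hash

        return True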

Info

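Hash chaining works on the same basic principle as a blockchain. Each entry's hash also covers the previous entry's hash. If any entry in the middle is altered, its recomputed hash no longer matches the stored one, and hiding the change would require recomputing the hash of every entry after it. This is what makes after-the-fact tampering detectable. The guarantee holds only if the latest hash is also kept somewhere an attacker cannot rewrite (for example, a separate system), because someone who controls the whole table could recompute the entire chain.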
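The tamper-detection property described above can be checked with a small, self-contained experiment. The sketch hashes plain dicts with canonical JSON instead of Pydantic's `model_dump_json()`, and its function names are hypothetical. It contrasts two cases. An in-place edit is caught by verification. A full rewrite of the chain is revealed only by an externally anchored head hash.

```python
import hashlib
import json
from typing import Optional

GENESIS = "genesis"


def chain_hash(previous_hash: str, entry: dict) -> str:
    # Canonical JSON (sorted keys, compact separators) so an entry always hashes the same way.
    data = json.dumps(entry, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(f"{previous_hash}:{data}".encode()).hexdigest()


def build_chain(entries: list) -> list:
    """Return [(entry, entry_hash), ...], where each hash covers the previous one."""
    chain, previous = [], GENESIS
    for entry in entries:
        h = chain_hash(previous, entry)
        chain.append((entry, h))
        previous = h
    return chain


def first_broken_index(chain: list) -> Optional[int]:
    """Index of the first entry whose stored hash no longer verifies, or None."""
    previous = GENESIS
    for i, (entry, stored_hash) in enumerate(chain):
        if chain_hash(previous, entry) != stored_hash:
            return i
        previous = stored_hash
    return None


log = build_chain([
    {"action_type": "llm_call", "outcome": "success"},
    {"action_type": "decision", "outcome": "success"},
    {"action_type": "tool_call", "outcome": "success"},
])
anchored_head = log[-1][1]  # latest hash, stored outside the attacker's reach

# 1) Edit one entry in place: verification fails at that entry.
edited = list(log)
edited[1] = ({"action_type": "decision", "outcome": "escalated"}, log[1][1])

# 2) Edit the entry AND recompute the whole chain: the chain verifies again,
#    but its head no longer matches the anchored hash.
rewritten = build_chain([log[0][0], {"action_type": "decision", "outcome": "escalated"}, log[2][0]])
```

In this sketch, `first_broken_index(edited)` points at the altered entry, while `first_broken_index(rewritten)` reports no break at all. Only the comparison with `anchored_head` exposes the rewrite.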

Database-level immutability

Here is how to enforce immutability of the audit table in PostgreSQL.

audit_table.sql

sql

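-- Audit log table
CREATE TABLE audit_log (
    entry_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),  -- built in since PostgreSQL 13
    seq BIGINT GENERATED ALWAYS AS IDENTITY,  -- insertion order for chain verification (NOW() is per transaction, not per row)
    workflow_id UUID NOT NULL,
    trace_id VARCHAR(64) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    actor_type VARCHAR(20) NOT NULL,
    actor_id VARCHAR(100) NOT NULL,
    action_type VARCHAR(50) NOT NULL,
    action_detail JSONB NOT NULL,
    outcome VARCHAR(20) NOT NULL,
    reasoning TEXT,
    entry_hash VARCHAR(64) NOT NULL,
    previous_hash VARCHAR(64) NOT NULL,
    -- A given hash can be followed only once, so concurrent writers cannot fork the chain
    UNIQUE (workflow_id, previous_hash)
);

-- Trigger function that rejects any modification
CREATE OR REPLACE FUNCTION prevent_audit_modification()
RETURNS TRIGGER AS $$
BEGIN
    RAISE EXCEPTION 'Audit log entries cannot be modified or deleted';
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER audit_immutable_update
    BEFORE UPDATE ON audit_log
    FOR EACH ROW EXECUTE FUNCTION prevent_audit_modification();

CREATE TRIGGER audit_immutable_delete
    BEFORE DELETE ON audit_log
    FOR EACH ROW EXECUTE FUNCTION prevent_audit_modification();

-- Row-level triggers do not fire on TRUNCATE, so block it with a statement-level trigger
CREATE TRIGGER audit_immutable_truncate
    BEFORE TRUNCATE ON audit_log
    FOR EACH STATEMENT EXECUTE FUNCTION prevent_audit_modification();

-- The table owner or a superuser can still drop these triggers; as defense in depth,
-- also revoke these privileges from the application role (role name is an example):
-- REVOKE UPDATE, DELETE, TRUNCATE ON audit_log FROM app_writer;

-- Indexes
CREATE INDEX idx_audit_workflow ON audit_log (workflow_id, timestamp);
CREATE INDEX idx_audit_trace ON audit_log (trace_id);
CREATE INDEX idx_audit_actor ON audit_log (actor_id, timestamp);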
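To try the trigger approach without a PostgreSQL server, the sketch below reproduces it with SQLite through Python's built-in `sqlite3` module. This is a swapped-in engine, not the PostgreSQL setup above. SQLite uses `RAISE(ABORT, ...)` inside the trigger body instead of a PL/pgSQL function, it has no `TRUNCATE` statement, and the table here is trimmed to a few columns. `try_statement` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_log (
    entry_id      TEXT PRIMARY KEY,
    action_type   TEXT NOT NULL,
    entry_hash    TEXT NOT NULL,
    previous_hash TEXT NOT NULL
);
-- Abort any UPDATE or DELETE before it touches a row.
CREATE TRIGGER audit_immutable_update BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'Audit log entries cannot be modified or deleted');
END;
CREATE TRIGGER audit_immutable_delete BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'Audit log entries cannot be modified or deleted');
END;
""")
conn.execute("INSERT INTO audit_log VALUES ('e1', 'decision', 'h1', 'genesis')")


def try_statement(sql: str) -> str:
    """Run a statement and report whether the triggers let it through."""
    try:
        conn.execute(sql)
        return "allowed"
    except sqlite3.DatabaseError as exc:
        return f"rejected: {exc}"
```

Inserts still succeed, while updates and deletes fail with the trigger's message and leave the existing rows untouched.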

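Meeting regulatory requirements

The regulatory requirements that apply to agentic automation differ by industry.

Requirements by industry

Industry	Key regulations	Agent-related requirements
Finance	Basel III, Regulations on Supervision of Electronic Financial Transactions	Audit trail for every transaction decision; model risk management
Healthcare	HIPAA, Medical Service Act	Records of patient-data access; documented rationale for diagnostic-support results
Legal	Attorney-at-Law Act, Personal Information Protection Act	Confidentiality, conflict-of-interest checks, disclaimers on legal advice
General	GDPR, Personal Information Protection Act	Right to an explanation of automated decisions; right to contest them

Compliance checklist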

compliance_checker.py

python

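from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Awaitable, Callable

from audit_schema import AuditEntry

@dataclass
class CheckResult:
    passed: bool
    detail: str = ""

@dataclass
class ComplianceRequirement:
    regulation: str
    requirement: str
    # Async function that inspects a workflow's audit entries
    check_function: Callable[[list[AuditEntry]], Awaitable[CheckResult]]
    severity: str  # critical, high, medium

@dataclass
class ComplianceViolation:
    regulation: str
    requirement: str
    severity: str
    detail: str

@dataclass
class ComplianceReport:
    workflow_id: str
    checked_at: datetime
    total_requirements: int
    violations: list[ComplianceViolation]
    passed: bool  # True when there is no critical violation

class ComplianceChecker:
    def __init__(self, requirements: list[ComplianceRequirement], audit_store):
        self.requirements = requirements
        self.audit_store = audit_store  # same store interface that ImmutableAuditLog uses

    async def check_workflow(self, workflow_id: str) -> ComplianceReport:
        """Run every compliance check against a workflow's audit trail"""
        # get_all() returns (entry, entry_hash) pairs; the checks only need the entries
        audit_entries = [entry for entry, _ in await self.audit_store.get_all(workflow_id)]
        violations = []

        for req in self.requirements:
            result = await req.check_function(audit_entries)
            if not result.passed:
                violations.append(ComplianceViolation(
                    regulation=req.regulation,
                    requirement=req.requirement,
                    severity=req.severity,
                    detail=result.detail,
                ))

        return ComplianceReport(
            workflow_id=workflow_id,
            checked_at=datetime.now(timezone.utc),
            total_requirements=len(self.requirements),
            violations=violations,
            passed=not any(v.severity == "critical" for v in violations),
        )

# Example compliance requirements for financial services.
# The check_* functions are organization-specific and defined elsewhere (not shown).
financial_requirements = [
    ComplianceRequirement(
        regulation="Regulations on Supervision of Electronic Financial Transactions",
        requirement="An audit trail exists for every financial transaction decision",
        check_function=check_financial_audit_trail,
        severity="critical",
    ),
    ComplianceRequirement(
        regulation="Regulations on Supervision of Electronic Financial Transactions",
        requirement="Human approval is recorded for every high-risk transaction",
        check_function=check_human_approval_for_high_risk,
        severity="critical",
    ),
    ComplianceRequirement(
        regulation="Personal Information Protection Act",
        requirement="Personal-data access is logged and collection is kept to the minimum",
        check_function=check_pii_access_logging,
        severity="high",
    ),
]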
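The checklist references check functions such as `check_human_approval_for_high_risk` without showing them. Below is a minimal, hypothetical sketch of one. It assumes that:

- each audit entry is a dict with the fields of `AuditEntry` (with real `AuditEntry` objects, switch to attribute access);
- high-risk decisions carry `risk_level: "high"` and a `decision_id` in `action_detail`;
- an approval is a `human_review` entry by a `human` actor with outcome `success` that references the same `decision_id`.

These conventions are illustrative and are not part of the schema above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CheckResult:
    passed: bool
    detail: str = ""


async def check_human_approval_for_high_risk(entries: list[dict[str, Any]]) -> CheckResult:
    """Hypothetical check: every high-risk decision needs a successful human review."""
    # decision_ids that a human has explicitly approved
    approved = {
        e["action_detail"].get("decision_id")
        for e in entries
        if e["actor_type"] == "human"
        and e["action_type"] == "human_review"
        and e["outcome"] == "success"
    }
    # high-risk decisions with no matching approval
    missing = [
        e["action_detail"].get("decision_id")
        for e in entries
        if e["action_type"] == "decision"
        and e["action_detail"].get("risk_level") == "high"
        and e["action_detail"].get("decision_id") not in approved
    ]
    if missing:
        return CheckResult(False, f"high-risk decisions without human approval: {missing}")
    return CheckResult(True)
```

The function is `async` because `ComplianceChecker` awaits every `check_function`. A stricter version could also require the approval to be recorded before the transaction was executed.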

Explainability

An agent's decisions must be explainable, so that people can understand them after the fact.

Recording decision rationale

decision_explanation.py

python

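import json
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

from audit_schema import AuditEntry

# Evidence and Alternative are defined first: dataclass field annotations are
# evaluated when DecisionExplanation is created, so they must already exist.
@dataclass
class Evidence:
    source: str  # referenced document or data
    relevance: float
    excerpt: str  # key excerpt

@dataclass
class Alternative:
    option: str
    reason_not_chosen: str
    estimated_confidence: float

@dataclass
class DecisionExplanation:
    """Explanation of an agent decision"""
    decision: str
    reasoning_steps: list[str]
    evidence: list[Evidence]
    alternatives: list[Alternative]
    confidence: float
    limitations: list[str]

class ExplainableAgent:
    # Assumed, not shown: self.llm returns structured output parsed into
    # DecisionWithExplanation; self.audit_log is an ImmutableAuditLog;
    # self.agent_id identifies the agent; EXPLAINABLE_SYSTEM_PROMPT is defined elsewhere.
    async def decide_with_explanation(
        self, context: dict
    ) -> tuple[str, DecisionExplanation]:
        """Produce a decision together with its explanation"""
        # Have the model return its reasoning in a structured form
        response = await self.llm.invoke(
            messages=[
                {"role": "system", "content": EXPLAINABLE_SYSTEM_PROMPT},
                {"role": "user", "content": json.dumps(context)},
            ],
            response_format=DecisionWithExplanation,
        )

        explanation = DecisionExplanation(
            decision=response.decision,
            reasoning_steps=response.reasoning_steps,
            evidence=[
                Evidence(source=e.source, relevance=e.relevance, excerpt=e.excerpt)
                for e in response.evidence
            ],
            alternatives=[
                Alternative(
                    option=a.option,
                    reason_not_chosen=a.reason,
                    estimated_confidence=a.confidence,
                )
                for a in response.alternatives
            ],
            confidence=response.confidence,
            limitations=response.limitations,
        )

        # Record the explanation in the audit log. AuditEntry has required fields,
        # so all of them are filled; context is assumed to carry the workflow and trace IDs.
        await self.audit_log.append(AuditEntry(
            entry_id=str(uuid.uuid4()),
            workflow_id=context["workflow_id"],
            trace_id=context["trace_id"],
            span_id=context["span_id"],
            timestamp=datetime.now(timezone.utc),
            actor_type="agent",
            actor_id=self.agent_id,
            action_type="decision",
            action_detail={"decision": response.decision},
            outcome="success",
            input_summary=json.dumps(context)[:500],  # truncated; the limit is arbitrary
            output_summary=response.decision,
            # asdict() converts the nested Evidence/Alternative objects into plain dicts
            reasoning=json.dumps(asdict(explanation)),
            confidence=explanation.confidence,
        ))

        return response.decision, explanation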
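`decide_with_explanation` trusts the model's structured output as-is. Because that output becomes part of the audit record, it can help to validate it first. The sketch below is a stdlib-only, hypothetical stand-in for the unshown `DecisionWithExplanation` model. It assumes the model returns JSON whose keys mirror the fields used above (`decision`, `reasoning_steps`, `evidence`, `alternatives` with `reason` and `confidence`, `confidence`, `limitations`). It rejects empty reasoning and any confidence or relevance outside [0, 1].

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Evidence:
    source: str
    relevance: float
    excerpt: str


@dataclass
class Alternative:
    option: str
    reason_not_chosen: str
    estimated_confidence: float


@dataclass
class DecisionExplanation:
    decision: str
    reasoning_steps: list[str]
    evidence: list[Evidence]
    alternatives: list[Alternative]
    confidence: float
    limitations: list[str]


def _unit_interval(name: str, value) -> float:
    """Coerce to float and require 0 <= value <= 1."""
    value = float(value)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be in [0, 1], got {value}")
    return value


def parse_explanation(raw: str) -> DecisionExplanation:
    """Hypothetical validator for the model's JSON; raises ValueError/KeyError on bad input."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not data.get("reasoning_steps"):
        raise ValueError("reasoning_steps must not be empty")
    return DecisionExplanation(
        decision=str(data["decision"]),
        reasoning_steps=[str(step) for step in data["reasoning_steps"]],
        evidence=[
            Evidence(e["source"], _unit_interval("relevance", e["relevance"]), e["excerpt"])
            for e in data.get("evidence", [])
        ],
        alternatives=[
            # "reason" and "confidence" mirror a.reason / a.confidence in the original code
            Alternative(a["option"], a["reason"], _unit_interval("alternative confidence", a["confidence"]))
            for a in data.get("alternatives", [])
        ],
        confidence=_unit_interval("confidence", data["confidence"]),
        limitations=[str(x) for x in data.get("limitations", [])],
    )
```

Serializing the result with `dataclasses.asdict` yields nested plain dicts. By contrast, `json.dumps(explanation.__dict__, default=str)` would store each `Evidence` and `Alternative` only as its `repr` string, which is much harder to query later.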

Warning

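Explainability is not only a regulatory requirement but also a key tool for improving the system. When an agent makes a wrong decision, you need to understand why it judged the way it did before you can fix the prompts or tools.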

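Reproducibility

Ensure reproducibility: the ability to re-run an agent decision made at a given point in time under the same conditions. Hosted LLM APIs generally do not guarantee identical output even with the same model, temperature, and seed, so replay is best-effort. The captured context is what makes the original and the replay comparable.

Information required for reproduction

To reproduce a decision, record all of the following:

LLM model version and parameters (temperature, seed, etc.)
Full system prompt text (or a version hash)
Complete input data
Tool call results (responses from external systems can differ depending on when they are called)
Random seed (where applicable)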
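Capturing tool call results is what allows a replay to avoid touching live systems. The sketch below shows one way to produce and serve such a snapshot: a record/replay pair. It assumes tools are async functions called with keyword arguments, and that a call is identified by the tool name plus a SHA-256 of its canonical JSON arguments. `RecordingTools`, `ReplayingTools`, and `call_key` are hypothetical names, not the `MockToolSet` used in the code below.

```python
import hashlib
import json
from typing import Any, Awaitable, Callable

ToolFn = Callable[..., Awaitable[Any]]


def call_key(tool_name: str, params: dict) -> str:
    """Stable key: tool name + SHA-256 of the canonical JSON arguments."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return f"{tool_name}:{hashlib.sha256(canonical.encode()).hexdigest()[:16]}"


class RecordingTools:
    """Record mode: call the real tools and snapshot every response."""

    def __init__(self, tools: dict[str, ToolFn]):
        self.tools = tools
        self.snapshot: dict[str, Any] = {}

    async def call(self, tool_name: str, **params: Any) -> Any:
        result = await self.tools[tool_name](**params)
        self.snapshot[call_key(tool_name, params)] = result
        return result


class ReplayingTools:
    """Replay mode: serve responses from the snapshot; never touch real systems."""

    def __init__(self, snapshot: dict[str, Any]):
        self.snapshot = snapshot

    async def call(self, tool_name: str, **params: Any) -> Any:
        key = call_key(tool_name, params)
        if key not in self.snapshot:
            # The replayed run asked for a call the original never made.
            raise KeyError(f"no recorded response for {key}; replay diverged")
        return self.snapshot[key]
```

If the same tool can be called more than once with identical arguments and return different results, this keying keeps only the last response. In that case, a per-call sequence number would need to be part of the key.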

reproducibility.py

python

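import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any

@dataclass
class ReproductionContext:
    """Context needed to reproduce a decision"""
    model_id: str
    model_version: str
    temperature: float
    seed: int | None
    system_prompt_version: str
    system_prompt_content: str
    input_data: dict
    tool_responses: dict[str, Any]  # snapshot of tool call results
    timestamp: datetime

@dataclass
class ReplayResult:
    original_timestamp: datetime
    replay_timestamp: datetime
    output: Any

class ReproductionManager:
    # Assumed, not shown: self.llm (an LLM client) and MockToolSet, which serves
    # recorded tool responses instead of calling real systems.
    async def capture(self, workflow_id: str, node: str) -> ReproductionContext:
        """Capture the current execution context"""
        # ... capture the current state ...
        raise NotImplementedError

    async def replay(self, context: ReproductionContext) -> ReplayResult:
        """Reproduce a decision from a captured context"""
        # Substitute the captured results for real tool execution
        mock_tools = MockToolSet(context.tool_responses)

        response = await self.llm.invoke(
            model=context.model_id,
            messages=[
                {"role": "system", "content": context.system_prompt_content},
                {"role": "user", "content": json.dumps(context.input_data)},
            ],
            temperature=context.temperature,
            seed=context.seed,
            tools=mock_tools,
        )

        return ReplayResult(
            original_timestamp=context.timestamp,
            replay_timestamp=datetime.now(timezone.utc),
            output=response,
        )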
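`ReplayResult` carries only the replayed output, so deciding whether a replay actually reproduced the decision is left to the caller. A minimal comparison helper is sketched below. It assumes the original output text is retrieved from the corresponding audit entry (for example, its `output_summary`). The whitespace normalization and the `difflib` similarity score are illustrative choices, not part of the original design, and `compare_replay` is a hypothetical name.

```python
import difflib
from dataclasses import dataclass


@dataclass
class ReplayComparison:
    exact_match: bool   # identical after whitespace normalization
    similarity: float   # difflib.SequenceMatcher ratio, 0.0 to 1.0
    diff: list          # unified diff lines, for the audit report


def compare_replay(original: str, replayed: str) -> ReplayComparison:
    """Compare an original decision output with its replay (hypothetical helper)."""
    # Collapse runs of whitespace so formatting-only differences are not counted.
    a = " ".join(original.split())
    b = " ".join(replayed.split())
    return ReplayComparison(
        exact_match=(a == b),
        similarity=difflib.SequenceMatcher(None, a, b).ratio(),
        diff=list(difflib.unified_diff(
            original.splitlines(), replayed.splitlines(),
            fromfile="original", tofile="replay", lineterm="",
        )),
    )
```

Because hosted models may not be bit-for-bit deterministic, a team might treat a similarity above some agreed threshold as "reproduced" and route lower scores to human review. Any threshold is a policy decision and is not given in this chapter.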

OpenTelemetry integration

OpenTelemetry (OTel) lets you collect distributed traces, metrics, and logs in a standardized way.

otel_integration.py

python

from opentelemetry import trace
from opentelemetry.trace import StatusCode
 
tracer = trace.get_tracer("agentic-workflow")
 
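class InstrumentedWorkflowRunner:
    # execute_node, _count_tokens, and self.llm are assumed to be provided elsewhere.
    async def run_node(self, node: str, state: dict) -> dict:
        with tracer.start_as_current_span(
            f"workflow.node.{node}",
            attributes={
                "workflow.id": state["workflow_id"],
                "workflow.node": node,
                "workflow.phase": state["phase"],
            },
            # The except block below already records the exception and sets the
            # status; turn off the automatic handling so it is not recorded twice.
            record_exception=False,
            set_status_on_exception=False,
        ) as span: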
            try:
                result = await self.execute_node(node, state)
                span.set_status(StatusCode.OK)
                span.set_attribute("workflow.node.output_size", len(str(result)))
                return result
            except Exception as e:
                span.set_status(StatusCode.ERROR, str(e))
                span.record_exception(e)
                raise
 
    async def call_llm(self, messages: list, model: str) -> str:
        with tracer.start_as_current_span(
            "llm.call",
            attributes={
                "llm.model": model,
                "llm.input_tokens": self._count_tokens(messages),
            },
        ) as span:
            response = await self.llm.invoke(messages, model=model)
            span.set_attribute("llm.output_tokens", response.usage.completion_tokens)
            span.set_attribute("llm.total_tokens", response.usage.total_tokens)
            span.set_attribute("llm.latency_ms", response.latency_ms)
            return response.content
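The `trace_id` and `span_id` fields in `AuditEntry` tie an audit record to the OpenTelemetry spans above. OTel's default propagator carries trace context between services in the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`). The sketch below parses that header using only the standard library, so a service that receives it can stamp its audit entries with the same IDs. `TraceRef` and `parse_traceparent` are hypothetical names, and only the four-field version-00 layout is accepted.

```python
import re
from dataclasses import dataclass

# version (2 hex) - trace-id (32 hex) - parent span id (16 hex) - flags (2 hex), lowercase
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class TraceRef:
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(header: str) -> TraceRef:
    """Extract IDs for AuditEntry.trace_id / span_id from a traceparent header."""
    m = _TRACEPARENT.match(header.strip())
    if not m:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, span_id, flags = m.groups()
    # Version ff and all-zero IDs are invalid per the W3C Trace Context spec.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("invalid traceparent (forbidden version or all-zero id)")
    # Bit 0x01 of the flags is the "sampled" flag.
    return TraceRef(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))
```

Inside an instrumented process, the same values are available from `trace.get_current_span().get_span_context()`, whose `trace_id` and `span_id` are integers. `format(trace_id, "032x")` and `format(span_id, "016x")` produce the same hex strings.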

Retention policy

Set the retention period and archiving strategy for audit logs to match regulatory requirements.

retention_policy.py

python

from dataclasses import dataclass

@dataclass
class RetentionPolicy:
    # Read as cumulative ages: each *_days value is the record age (in days) at which
    # it leaves that tier, e.g., hot 90 / warm 365 = hot for days 0-89, warm for 90-364.
    hot_storage_days: int    # fast queries (PostgreSQL)
    warm_storage_days: int   # queryable but slower (S3 Standard)
    cold_storage_days: int   # archive (S3 Glacier)
    total_retention_days: int

# Example retention policies by industry
RETENTION_POLICIES = {
    "financial": RetentionPolicy(
        hot_storage_days=90,
        warm_storage_days=365,
        cold_storage_days=2555,  # 7 years
        total_retention_days=2555,
    ),
    "healthcare": RetentionPolicy(
        hot_storage_days=90,
        warm_storage_days=365,
        cold_storage_days=3650,  # 10 years
        total_retention_days=3650,
    ),
    "general": RetentionPolicy(
        hot_storage_days=30,
        warm_storage_days=180,
        cold_storage_days=365,
        total_retention_days=365,
    ),
}
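A small helper can turn a policy into a concrete decision for a given record. The sketch assumes the `*_days` values are cumulative ages at which a record leaves each tier, which is consistent with `cold_storage_days` equaling `total_retention_days` in every example. It repeats a frozen, validated variant of `RetentionPolicy` so that it runs on its own. `tier_for_age` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionPolicy:
    # Assumption: each value is the cumulative record age (days) at which it leaves the tier.
    hot_storage_days: int
    warm_storage_days: int
    cold_storage_days: int
    total_retention_days: int

    def __post_init__(self) -> None:
        # Boundaries must not go backwards, or some ages would map to the wrong tier.
        if not (0 <= self.hot_storage_days <= self.warm_storage_days
                <= self.cold_storage_days <= self.total_retention_days):
            raise ValueError("tier boundaries must be non-decreasing")


def tier_for_age(policy: RetentionPolicy, age_days: int) -> str:
    """Return the storage tier a record of the given age belongs in."""
    if age_days < policy.hot_storage_days:
        return "hot"
    if age_days < policy.warm_storage_days:
        return "warm"
    # cold_storage_days == total_retention_days in every example policy,
    # so the cold tier is assumed to last until retention ends.
    if age_days < policy.total_retention_days:
        return "cold"
    return "expired"  # past retention: candidate for deletion, unless under a legal hold
```

A scheduled job could call `tier_for_age` per record or per partition to decide what to move. For data that is already in S3, lifecycle rules can perform the warm-to-cold transition and the final expiration natively.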

Tip

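A retention policy should account for cost as well as regulatory requirements. Hot storage (PostgreSQL) is expensive but can be queried immediately. Cold archive storage such as S3 Glacier Flexible Retrieval or Deep Archive is inexpensive but can take hours to restore. Tiering storage by query frequency is the most cost-efficient approach.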

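Summary

This chapter covered audit logging and compliance for agentic workflows.

Track every agent input, reasoning step, tool call, decision, and output, as well as human interventions
Guarantee audit-log immutability and integrity with hash chaining
Automate compliance checks tailored to each industry's regulatory requirements
Achieve explainability by recording decision rationale, and reproducibility by capturing execution context
Standardize distributed tracing and metrics with OpenTelemetry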

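Coming up next

Chapter 8 covers how to integrate agentic workflows with existing enterprise systems. We will look at integration patterns for real enterprise environments, including ERP/CRM/ITSM integration, MCP-based tool integration, event-driven architecture, and adapters for legacy systems.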
