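March 23, 2026 · AI / ML

Chapter 10: Hands-On Project -- Building an LLM Code Analysis Pipeline

A hands-on project that builds the full pipeline, from AST extraction through code smell detection, refactoring proposals, validation, and application. It also includes a legacy project modernization case study and an adoption guide.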

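Learning objectives

Build the full pipeline (AST extraction -> smell detection -> refactoring proposal -> validation -> application)
Learn practical code analysis techniques using Claude Code and Copilot
Analyze a real-world legacy project modernization case
Draft a guide for adopting LLM code analysis tools in your organization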

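Project overview

This chapter implements the techniques covered throughout the series as a single integrated pipeline. Using a hypothetical legacy Python/TypeScript project as the target, it walks through the whole process from analysis to refactoring.

Overall pipeline structure

The orchestrator built in stage 7 runs seven steps in order: AST parsing, code smell detection, technical debt quantification, security analysis, refactoring planning, architecture analysis, and report generation.

Project structure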

code-analyzer/
  src/
    ast_engine/
      parser.py           # AST parser
      metrics.py          # metric calculation
      chunker.py          # cAST chunking
    analysis/
      smell_detector.py   # code smell detection
      debt_calculator.py  # technical debt quantification
      security_scanner.py # security analysis
    refactoring/
      planner.py          # refactoring planning
      generator.py        # code generation
      validator.py        # validation
    reporting/
      pr_commenter.py     # PR comments
      dashboard.py        # dashboard data
    pipeline.py           # orchestrator
    config.py             # configuration
  tests/
  scripts/
  pyproject.toml

Stage 1: Unified AST engine

This stage combines the AST extraction, metric calculation, and chunking from Chapter 2 into a single engine.

src/ast_engine/parser.py

python

"""Unified AST parsing engine."""
import ast
from dataclasses import dataclass, field
from pathlib import Path
 
 
@dataclass
class CodeEntity:
    """A code entity (function, class, or method)."""
    name: str
    entity_type: str  # function, class, method
    filepath: str
    line_start: int
    line_end: int
    source: str
    metrics: dict = field(default_factory=dict)
    children: list["CodeEntity"] = field(default_factory=list)
 
 
@dataclass
class FileAnalysis:
    """Analysis result for a single file."""
    filepath: str
    language: str
    total_lines: int
    entities: list[CodeEntity]
    imports: list[str]
    global_metrics: dict
 
 
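class UnifiedParser:
    """Unified parser supporting Python and TypeScript."""

    def parse_file(self, filepath: str) -> FileAnalysis:
        path = Path(filepath)
        # Read as UTF-8 explicitly so results do not depend on the platform default
        source = path.read_text(encoding="utf-8")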
 
        if path.suffix == ".py":
            return self._parse_python(filepath, source)
        elif path.suffix in (".ts", ".tsx"):
            return self._parse_typescript(filepath, source)
        else:
            raise ValueError(f"Unsupported file extension: {path.suffix}")
 
    def parse_directory(self, dirpath: str) -> list[FileAnalysis]:
        results = []
        root = Path(dirpath)
 
        ignore_dirs = {"node_modules", ".git", "__pycache__", ".venv", "dist"}
 
        for path in root.rglob("*"):
            if any(p in path.parts for p in ignore_dirs):
                continue
            if path.suffix in (".py", ".ts", ".tsx"):
                try:
                    results.append(self.parse_file(str(path)))
                except (SyntaxError, UnicodeDecodeError) as e:
                    results.append(FileAnalysis(
                        filepath=str(path),
                        language="unknown",
                        total_lines=0,
                        entities=[],
                        imports=[],
                        global_metrics={"error": str(e)},
                    ))
 
        return results
 
    def _parse_python(self, filepath: str, source: str) -> FileAnalysis:
        tree = ast.parse(source)
        lines = source.splitlines()
        entities = []
        imports = []
 
        for node in ast.iter_child_nodes(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                entities.append(self._python_function_entity(
                    node, filepath, lines
                ))
            elif isinstance(node, ast.ClassDef):
                class_entity = self._python_class_entity(
                    node, filepath, lines
                )
                entities.append(class_entity)
            elif isinstance(node, ast.Import):
                for alias in node.names:
                    imports.append(alias.name)
            elif isinstance(node, ast.ImportFrom):
                if node.module:
                    imports.append(node.module)
 
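        # Average only over entities that report complexity; class entities
        # carry no "complexity" metric and would otherwise count as 0
        complexities = [
            e.metrics["complexity"]
            for e in entities
            if "complexity" in e.metrics
        ]

        return FileAnalysis(
            filepath=filepath,
            language="python",
            total_lines=len(lines),
            entities=entities,
            imports=imports,
            global_metrics={
                "entity_count": len(entities),
                "import_count": len(imports),
                "avg_complexity": (
                    sum(complexities) / max(len(complexities), 1)
                ),
            },
        )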
 
    def _python_function_entity(
        self,
        node: "ast.FunctionDef | ast.AsyncFunctionDef",
        filepath: str,
        lines: list[str],
        entity_type: str = "function",
    ) -> CodeEntity:
        # lineno is 1-based; slicing needs a 0-based start
        start = node.lineno - 1
        end = node.end_lineno or node.lineno
        source = "\n".join(lines[start:end])
        args = node.args

        return CodeEntity(
            name=node.name,
            entity_type=entity_type,
            filepath=filepath,
            line_start=node.lineno,
            line_end=end,
            source=source,
            metrics={
                "complexity": self._calc_complexity(node),
                "line_count": end - node.lineno + 1,
                # Count positional-only and keyword-only parameters as well
                "param_count": (
                    len(args.posonlyargs)
                    + len(args.args)
                    + len(args.kwonlyargs)
                ),
                "has_docstring": ast.get_docstring(node) is not None,
                "return_count": sum(
                    1 for n in ast.walk(node)
                    if isinstance(n, ast.Return)
                ),
            },
        )

    def _python_class_entity(
        self,
        node: ast.ClassDef,
        filepath: str,
        lines: list[str],
    ) -> CodeEntity:
        start = node.lineno - 1
        end = node.end_lineno or node.lineno
        source = "\n".join(lines[start:end])

        # Methods defined directly in the class body become child entities
        children = []
        for child in node.body:
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                children.append(self._python_function_entity(
                    child, filepath, lines, entity_type="method"
                ))

        return CodeEntity(
            name=node.name,
            entity_type="class",
            filepath=filepath,
            line_start=node.lineno,
            line_end=end,
            source=source,
            metrics={
                "method_count": len(children),
                "line_count": end - node.lineno + 1,
                "has_docstring": ast.get_docstring(node) is not None,
            },
            children=children,
        )
 
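    def _parse_typescript(self, filepath: str, source: str) -> FileAnalysis:
        # Placeholder: ts-morph based parsing (invoked from Python via subprocess).
        # A real implementation would hand off to a Node.js script; until then,
        # TypeScript files are counted but yield no entities.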
        return FileAnalysis(
            filepath=filepath,
            language="typescript",
            total_lines=len(source.splitlines()),
            entities=[],
            imports=[],
            global_metrics={},
        )
 
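    def _calc_complexity(self, node: ast.AST) -> int:
        """Approximate cyclomatic complexity: 1 plus the number of decision points."""
        complexity = 1
        branch_nodes = (
            ast.If, ast.IfExp, ast.While, ast.For, ast.AsyncFor,
            ast.ExceptHandler,
        )
        # ast.walk also descends into nested functions, so their branches count too
        for child in ast.walk(node):
            if isinstance(child, branch_nodes):
                complexity += 1
            elif isinstance(child, ast.BoolOp):
                # "a and b and c" adds two decision points
                complexity += len(child.values) - 1
        return complexity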
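
The project layout lists chunker.py for cAST chunking, but this chapter does not show its code. The following is a minimal sketch of one way to chunk along AST boundaries, under stated assumptions: the function name chunk_by_ast and the max_chars budget are hypothetical, and the greedy packing of top-level statements is a simplification, not the cAST algorithm itself.

```python
import ast


def chunk_by_ast(source: str, max_chars: int = 400) -> list[str]:
    """Greedily pack top-level statements into chunks without splitting any of them."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: list[str] = []
    current: list[str] = []
    current_size = 0

    for node in tree.body:
        # Decorators start above node.lineno, so include them in the segment
        decorators = getattr(node, "decorator_list", [])
        first = min([node.lineno] + [d.lineno for d in decorators])
        segment = "\n".join(lines[first - 1:node.end_lineno])

        # Flush the current chunk if this node would push it over the budget
        if current and current_size + len(segment) > max_chars:
            chunks.append("\n".join(current))
            current, current_size = [], 0
        current.append(segment)
        current_size += len(segment)

    if current:
        chunks.append("\n".join(current))
    return chunks


sample = "import os\n\ndef a():\n    return 1\n\ndef b():\n    return 2\n"
for i, chunk in enumerate(chunk_by_ast(sample, max_chars=25)):
    print(f"--- chunk {i} ---")
    print(chunk)
```

In this sketch a single node larger than max_chars becomes its own oversized chunk, and comments or blank lines between top-level statements are dropped. A fuller chunker would recurse into large nodes and keep the surrounding comments.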

Stages 2-3: Smell detection and debt quantification

This stage integrates the code smell detection and technical debt calculation from Chapter 4.

src/analysis/smell_detector.py

python

"""Integrated code smell detector."""
from dataclasses import dataclass
 
 
@dataclass
class DetectedSmell:
    smell_type: str
    severity: str
    entity_name: str
    filepath: str
    line_start: int
    description: str
    metrics: dict
    remediation_hours: float
 
 
class IntegratedSmellDetector:
    """Hybrid smell detection: AST metrics plus an LLM."""

    # Thresholds for metric-based detection
    THRESHOLDS = {
        "complexity": {"warning": 10, "critical": 20},
        "line_count": {"warning": 40, "critical": 80},
        "param_count": {"warning": 4, "critical": 7},
        "method_count": {"warning": 10, "critical": 20},
    }
 
    def __init__(self, llm_client=None):
        self.llm_client = llm_client
 
    async def detect(
        self, file_analyses: list,
    ) -> list[DetectedSmell]:
        smells = []

        # Pass 1: metric-based detection (fast, no API cost)
        for file_analysis in file_analyses:
            for entity in file_analysis.entities:
                smells.extend(self._check_metrics(entity))
                # Methods are stored as children of class entities; check them too
                for child in entity.children:
                    smells.extend(self._check_metrics(child))

        # Pass 2: LLM-based deep detection (slow, costs money),
        # limited to entities already flagged as critical
        if self.llm_client:
            high_priority = [
                s for s in smells if s.severity == "critical"
            ]
            for smell in high_priority:
                llm_smells = await self._llm_deep_analysis(smell)
                smells.extend(llm_smells)

        return self._deduplicate(smells)
 
    def _check_metrics(self, entity) -> list[DetectedSmell]:
        smells = []
        metrics = entity.metrics

        # Cyclomatic complexity check
        cx = metrics.get("complexity", 0)
        if cx >= self.THRESHOLDS["complexity"]["critical"]:
            smells.append(DetectedSmell(
                smell_type="high_complexity",
                severity="critical",
                entity_name=entity.name,
                filepath=entity.filepath,
                line_start=entity.line_start,
                description=(
                    f"Cyclomatic complexity is very high ({cx}). "
                    f"Break the function up."
                ),
                metrics={"complexity": cx},
                remediation_hours=cx * 0.3,
            ))
        elif cx >= self.THRESHOLDS["complexity"]["warning"]:
            smells.append(DetectedSmell(
                smell_type="moderate_complexity",
                severity="warning",
                entity_name=entity.name,
                filepath=entity.filepath,
                line_start=entity.line_start,
                description=f"Cyclomatic complexity is {cx}. Consider simplifying.",
                metrics={"complexity": cx},
                remediation_hours=cx * 0.2,
            ))

        # Function length check (class entities also carry line_count; skip them)
        line_count = metrics.get("line_count", 0)
        if (
            entity.entity_type != "class"
            and line_count >= self.THRESHOLDS["line_count"]["critical"]
        ):
            smells.append(DetectedSmell(
                smell_type="long_function",
                severity="critical",
                entity_name=entity.name,
                filepath=entity.filepath,
                line_start=entity.line_start,
                description=f"The function is too long ({line_count} lines).",
                metrics={"line_count": line_count},
                remediation_hours=2.0,
            ))

        # Parameter count check. Note: this fires at the "critical" threshold
        # but is reported with "warning" severity, as in the original design.
        param_count = metrics.get("param_count", 0)
        if param_count >= self.THRESHOLDS["param_count"]["critical"]:
            smells.append(DetectedSmell(
                smell_type="long_parameter_list",
                severity="warning",
                entity_name=entity.name,
                filepath=entity.filepath,
                line_start=entity.line_start,
                description=(
                    f"The function takes {param_count} parameters. "
                    f"Consider introducing a parameter object."
                ),
                metrics={"param_count": param_count},
                remediation_hours=1.0,
            ))

        return smells
 
    async def _llm_deep_analysis(self, smell: DetectedSmell) -> list[DetectedSmell]:
        # Additional smell detection through the LLM (omitted for brevity)
        return []

    def _deduplicate(self, smells: list[DetectedSmell]) -> list[DetectedSmell]:
        seen = set()
        unique = []
        for smell in smells:
            # line_start keeps same-named methods in different classes apart
            key = (
                smell.smell_type, smell.entity_name,
                smell.filepath, smell.line_start,
            )
            if key not in seen:
                seen.add(key)
                unique.append(smell)
        return unique
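
The layout also lists debt_calculator.py, while the code in this chapter only sums remediation_hours inside the orchestrator. As a rough sketch of what a per-file breakdown could look like, the example below aggregates estimated hours by file and by severity. The names SmellRecord and summarize_debt are hypothetical; SmellRecord is a stand-in for DetectedSmell carrying only the fields used here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SmellRecord:
    """Stand-in for DetectedSmell with only the fields this sketch needs."""
    filepath: str
    severity: str
    remediation_hours: float


def summarize_debt(smells: list[SmellRecord]) -> dict:
    """Aggregate estimated remediation hours by file and by severity."""
    by_file: dict[str, float] = defaultdict(float)
    by_severity: dict[str, float] = defaultdict(float)
    for smell in smells:
        by_file[smell.filepath] += smell.remediation_hours
        by_severity[smell.severity] += smell.remediation_hours

    # Files with the most estimated debt come first (hotspot candidates)
    hotspots = sorted(by_file.items(), key=lambda item: item[1], reverse=True)
    return {
        "total_hours": round(sum(by_file.values()), 1),
        "by_severity": {k: round(v, 1) for k, v in by_severity.items()},
        "hotspots": hotspots,
    }


smells = [
    SmellRecord("src/order.py", "critical", 6.0),
    SmellRecord("src/order.py", "warning", 1.0),
    SmellRecord("src/user.py", "critical", 2.0),
]
summary = summarize_debt(smells)
print(summary["total_hours"], summary["hotspots"][0])
```

Sorting files by accumulated hours gives a simple hotspot list that can feed the dashboard or the refactoring budget.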

Stages 4-6: Refactoring proposals and validation

This stage integrates the multi-agent refactoring and validation pipeline from Chapter 5.

src/refactoring/planner.py

python

"""Refactoring planning and execution."""
from dataclasses import dataclass
 
 
@dataclass
class RefactorAction:
    target_file: str
    target_entity: str
    action_type: str  # e.g. extract_function, move_method, introduce_parameter_object
    description: str
    priority: int
    estimated_hours: float
 
 
@dataclass
class RefactorPlan:
    actions: list[RefactorAction]
    total_hours: float
    expected_improvements: dict
 
 
class RefactorPlanner:
    """Priority-based refactoring planner."""
 
    def __init__(self, llm_client):
        self.llm_client = llm_client
 
    async def create_plan(
        self,
        smells: list,
        budget_hours: float = 40,
    ) -> RefactorPlan:
        # Priority order: severity weight divided by remediation effort,
        # so severe but cheap fixes come first
        prioritized = sorted(
            smells,
            key=lambda s: (
                {"critical": 4, "warning": 2, "info": 1}.get(
                    s.severity, 1
                ) / max(s.remediation_hours, 0.5)
            ),
            reverse=True,
        )
 
        actions = []
        total_hours = 0
 
        for smell in prioritized:
            if total_hours + smell.remediation_hours > budget_hours:
                continue
 
            action = await self._plan_action(smell)
            if action:
                actions.append(action)
                total_hours += action.estimated_hours
 
        return RefactorPlan(
            actions=actions,
            total_hours=total_hours,
            expected_improvements={
                "complexity_reduction": self._estimate_complexity_reduction(
                    actions
                ),
                "smell_reduction": len(actions),
            },
        )
 
    async def _plan_action(self, smell) -> RefactorAction | None:
        action_map = {
            "high_complexity": "extract_function",
            "long_function": "extract_function",
            "long_parameter_list": "introduce_parameter_object",
            "duplicate_code": "extract_common_function",
            "feature_envy": "move_method",
        }
 
        action_type = action_map.get(smell.smell_type)
        if not action_type:
            return None
 
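        # Ask the LLM for concrete refactoring instructions
        prompt = f"""Generate concrete refactoring instructions for the following code smell.

Smell type: {smell.smell_type}
Target: {smell.entity_name} ({smell.filepath}:{smell.line_start})
Description: {smell.description}
Recommended action: {action_type}

Include the following:
1. Concrete refactoring steps
2. Names and parameters of the functions to extract
3. Caveats"""

        # The pipeline allows llm_client=None; fall back to the detector's
        # description instead of failing on a missing client
        if self.llm_client is None:
            response = smell.description
        else:
            response = await self.llm_client.generate(prompt)

        return RefactorAction(
            target_file=smell.filepath,
            target_entity=smell.entity_name,
            action_type=action_type,
            description=response,
            priority=1,
            estimated_hours=smell.remediation_hours,
        )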
 
    def _estimate_complexity_reduction(
        self, actions: list[RefactorAction]
    ) -> float:
        return sum(
            3.0 if a.action_type == "extract_function" else 1.0
            for a in actions
        )
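
The section title mentions validation, and the layout lists validator.py, but no validator code appears in this chapter. The sketch below shows one cheap structural pre-check that could run before the test suite: the refactored source must parse, and every public top-level function must keep its positional parameters. The names public_signatures and validate_refactor are hypothetical, and this is an illustration under those assumptions rather than the validation pipeline from Chapter 5.

```python
import ast


def public_signatures(source: str) -> dict[str, list[str]]:
    """Map each public top-level function to its positional parameter names."""
    tree = ast.parse(source)
    return {
        node.name: [arg.arg for arg in node.args.args]
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not node.name.startswith("_")
    }


def validate_refactor(original: str, refactored: str) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    try:
        new_sigs = public_signatures(refactored)
    except SyntaxError as exc:
        return [f"refactored code does not parse: {exc.msg}"]

    problems = []
    for name, params in public_signatures(original).items():
        if name not in new_sigs:
            problems.append(f"public function removed: {name}")
        elif new_sigs[name] != params:
            problems.append(f"signature changed: {name}")
    return problems


before = "def total(items, tax):\n    return sum(items) * (1 + tax)\n"
after = (
    "def _subtotal(items):\n    return sum(items)\n\n"
    "def total(items, tax):\n    return _subtotal(items) * (1 + tax)\n"
)
ok_result = validate_refactor(before, after)
bad_result = validate_refactor(before, "def total(items):\n    return sum(items)\n")
print(ok_result)
print(bad_result)
```

A check like this only catches structural breakage. Behavioral equivalence still has to come from running the existing tests against the refactored code.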

Stage 7: Integrated pipeline orchestrator

src/pipeline.py

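python

"""Full pipeline orchestrator."""
import json
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path

# Import paths assume src/ is the import root; adjust to your packaging
from ast_engine.parser import UnifiedParser
from analysis.smell_detector import IntegratedSmellDetector
from refactoring.planner import RefactorPlanner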
 
 
@dataclass
class PipelineResult:
    project_path: str
    analyzed_at: str
    files_analyzed: int
    total_entities: int
    smells_detected: int
    critical_smells: int
    refactor_actions: int
    estimated_hours: float
    report_path: str
 
 
class CodeAnalysisPipeline:
    """Integrated LLM code analysis pipeline."""
 
    def __init__(
        self,
        llm_client=None,
        output_dir: str = "./reports",
    ):
        self.llm_client = llm_client
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
 
    async def run(
        self,
        project_path: str,
        budget_hours: float = 40,
    ) -> PipelineResult:
        print(f"Starting project analysis: {project_path}")

        # Step 1: AST parsing
        print("  [1/7] Parsing ASTs...")
        parser = UnifiedParser()
        file_analyses = parser.parse_directory(project_path)
        print(f"  -> parsed {len(file_analyses)} files")

        # Step 2: code smell detection
        print("  [2/7] Detecting code smells...")
        detector = IntegratedSmellDetector(self.llm_client)
        smells = await detector.detect(file_analyses)
        print(f"  -> found {len(smells)} smells")

        # Step 3: technical debt quantification
        print("  [3/7] Calculating technical debt...")
        total_debt_hours = sum(s.remediation_hours for s in smells)
        critical_count = sum(
            1 for s in smells if s.severity == "critical"
        )
        print(f"  -> about {total_debt_hours:.1f} hours, {critical_count} critical")

        # Step 4: security analysis (scanner call left commented out here)
        print("  [4/7] Running security analysis...")
        # security_scanner.scan(file_analyses)

        # Step 5: refactoring plan
        print("  [5/7] Building the refactoring plan...")
        planner = RefactorPlanner(self.llm_client)
        plan = await planner.create_plan(smells, budget_hours)

        # Step 6: architecture analysis (analyzer call left commented out here)
        print("  [6/7] Running architecture analysis...")
        # architecture_analyzer.analyze(file_analyses)

        # Step 7: report generation
        print("  [7/7] Generating the report...")
        report = self._generate_report(
            file_analyses, smells, plan
        )
        report_path = self._save_report(report)
 
        return PipelineResult(
            project_path=project_path,
            analyzed_at=datetime.now().isoformat(),
            files_analyzed=len(file_analyses),
            total_entities=sum(
                len(f.entities) for f in file_analyses
            ),
            smells_detected=len(smells),
            critical_smells=critical_count,
            refactor_actions=len(plan.actions),
            estimated_hours=plan.total_hours,
            report_path=str(report_path),
        )
 
    def _generate_report(
        self, file_analyses, smells, plan
    ) -> dict:
        return {
            "summary": {
                "files": len(file_analyses),
                "entities": sum(
                    len(f.entities) for f in file_analyses
                ),
                "smells": len(smells),
                "critical": sum(
                    1 for s in smells if s.severity == "critical"
                ),
                "total_debt_hours": round(
                    sum(s.remediation_hours for s in smells), 1
                ),
            },
            "smells": [
                {
                    "type": s.smell_type,
                    "severity": s.severity,
                    "entity": s.entity_name,
                    "file": s.filepath,
                    "line": s.line_start,
                    "description": s.description,
                }
                for s in smells
            ],
            "plan": {
                "actions": [
                    {
                        "file": a.target_file,
                        "entity": a.target_entity,
                        "action": a.action_type,
                        "hours": a.estimated_hours,
                    }
                    for a in plan.actions
                ],
                "total_hours": plan.total_hours,
            },
        }
 
    def _save_report(self, report: dict) -> Path:
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        path = self.output_dir / f"analysis_{timestamp}.json"
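        # ensure_ascii=False keeps non-ASCII text readable, so write as UTF-8 explicitly
        path.write_text(
            json.dumps(report, indent=2, ensure_ascii=False),
            encoding="utf-8",
        )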
        return path
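
The code in this chapter never defines the LLM client. The planner only calls `await llm_client.generate(prompt)` and uses the returned string. Assuming that is the whole interface, a stub such as the one below is enough for dry runs and tests without API calls. The names LLMClient and StubLLMClient are hypothetical.

```python
import asyncio
from typing import Protocol


class LLMClient(Protocol):
    """The one method the planner calls on its client (assumed interface)."""

    async def generate(self, prompt: str) -> str: ...


class StubLLMClient:
    """Returns a canned reply and records prompts; makes no network calls."""

    def __init__(self, reply: str = "1. Extract helper functions."):
        self.reply = reply
        self.prompts: list[str] = []

    async def generate(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.reply


async def demo(client: LLMClient) -> str:
    # Any object with a matching async generate() satisfies the protocol
    return await client.generate("Smell type: high_complexity")


stub = StubLLMClient()
result = asyncio.run(demo(stub))
print(result, len(stub.prompts))
```

With the chapter's classes importable, passing `StubLLMClient()` as `llm_client` to `CodeAnalysisPipeline` and running `run()` under `asyncio.run` would exercise the whole flow offline. A real client would wrap whichever provider SDK the team uses behind the same method.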

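Practical tips for using Claude Code

This section collects practical techniques for code analysis with AI coding assistants.

Analyzing a codebase with Claude Code

Claude Code can draw on the whole repository as context, which makes it particularly strong for architecture-level analysis.

Pattern for requesting a codebase overview:

Analyze the overall architecture of this project.
Include the dependencies between the main modules, the data flow, and any potential architectural problems.

Pattern for requesting a refactoring:

Refactor the process_order function in src/services/order.py.
Split it into functions according to the single responsibility principle, and confirm that the existing tests still pass.

Pattern for requesting a security analysis:

Analyze this project for security vulnerabilities.
Review it against the OWASP Top 10 and propose fix code for each vulnerability found.

Tip

When you ask Claude Code for an analysis, stating a specific scope and criteria yields more accurate results. "Find functions with a cyclomatic complexity of 15 or higher and apply extract-function refactoring" is far more effective than "Improve the code."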
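
One way to follow this tip is to generate the scoped request from the metrics the pipeline already collects. The sketch below is an illustration only: build_refactor_prompt and the dictionary keys are hypothetical, and the sample line number is made up.

```python
def build_refactor_prompt(
    functions: list[dict], min_complexity: int = 15
) -> str:
    """Build a scoped request listing only functions at or above the threshold."""
    targets = [f for f in functions if f["complexity"] >= min_complexity]
    targets.sort(key=lambda f: f["complexity"], reverse=True)
    if not targets:
        return ""

    lines = [
        "Apply extract-function refactoring to the functions below "
        f"(cyclomatic complexity >= {min_complexity}).",
        "Keep public signatures unchanged and confirm the existing tests pass.",
        "",
    ]
    lines += [
        f"- {f['filepath']}:{f['line']} {f['name']} (complexity {f['complexity']})"
        for f in targets
    ]
    return "\n".join(lines)


metrics = [
    {"name": "process_order", "filepath": "src/services/order.py",
     "line": 42, "complexity": 23},
    {"name": "get_user", "filepath": "src/services/user.py",
     "line": 10, "complexity": 4},
]
prompt = build_refactor_prompt(metrics)
print(prompt)
```

Listing exact files, lines, and the threshold gives the assistant the same scope and criteria the tip recommends stating by hand.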

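Legacy project modernization case study

Case: modernizing a five-year-old Python backend

Project size: about 50,000 lines, Python 3.8, Flask

Automated analysis results:

85 of 420 functions (20%) have a cyclomatic complexity of 10 or higher
23 (5.5%) have a cyclomatic complexity of 20 or higher (immediate refactoring needed)
3 circular dependencies detected
12 layer violations detected
7 security vulnerabilities (including 2 SQL injection)
Estimated technical debt: about 320 hours

Modernization plan (12 weeks):

Weeks	Task	Automation rate
1-2	Fix 7 security vulnerabilities	80%
3-4	Resolve 3 circular dependencies	60%
5-8	Refactor the top 23 high-complexity functions	70%
9-10	Fix 12 layer violations	50%
11-12	Build the CI/CD pipeline and quality gates	90%

Measurable improvements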

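Adoption guide

Phase 1: Assess the current state (1 week)

The first step is to assess your organization's code analysis maturity.

Check whether static analysis tools are currently in use
Map the structure of the existing CI/CD pipeline
Review the team's code review process
Identify the most problematic areas of the code (hotspots)

Phase 2: Pilot project (2-4 weeks)

Start LLM-based code analysis at a small scope.

Automate AST metric collection first
Run the pilot in a single repository
Start the quality gate in "warn only" mode
Collect team feedback and adjust the policy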
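
A "warn only" quality gate can be as small as a function that decides the CI exit code. The sketch below assumes a gate on the number of critical smells. The name evaluate_gate and its parameters are hypothetical.

```python
def evaluate_gate(
    critical_smells: int,
    max_critical: int = 0,
    warn_only: bool = True,
) -> tuple[int, str]:
    """Return (exit_code, message) for a CI step."""
    if critical_smells <= max_critical:
        return 0, f"PASS: {critical_smells} critical smells"

    message = (
        f"{critical_smells} critical smells exceed the limit of {max_critical}"
    )
    if warn_only:
        # Pilot mode: report the violation but do not fail the build
        return 0, f"WARN: {message}"
    return 1, f"FAIL: {message}"


pilot = evaluate_gate(3)
enforced = evaluate_gate(3, warn_only=False)
print(pilot)
print(enforced)
```

Switching warn_only to False later, or lowering max_critical step by step, is one way to tighten the gate gradually once the team trusts the results.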

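Phase 3: Broader rollout (4-8 weeks)

Extend the approach to the whole organization based on the pilot results.

Apply the CI/CD integration to every repository
Tighten the quality gates gradually
Build a technical debt dashboard
Introduce regular refactoring sprints

Warning

Changing the culture is harder than introducing the tool. Communicate so that the LLM code analysis tool is seen as a tool that helps developers, not one that monitors them. Using quality metrics in individual performance reviews provokes resistance from the team, so always manage them as team-level metrics.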

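Cost versus benefit

Item	Annual cost	Annual savings
LLM API usage	about 5 million KRW	-
Tool build and maintenance	about 10 million KRW	-
Technical debt reduction	-	about 30 million KRW
Defect prevention	-	about 20 million KRW
Faster onboarding	-	about 5 million KRW
Total	about 15 million KRW	about 55 million KRW

For a team of 10, if 42% of developer time goes to technical debt, reducing that debt by just 30% frees about 10 x 0.42 x 0.30 = 1.26 developer-years per year. That is the basis for the estimate of more than about 100 million KRW in indirect costs saved annually.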
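
The arithmetic behind these figures can be checked with a short sketch. The team size, the 42% share, the 30% reduction, and the table totals come from the text above. The break-even cost per developer-year is derived from them and is not a figure stated in the article.

```python
TEAM_SIZE = 10
DEBT_TIME_SHARE = 0.42          # share of developer time lost to debt (from the text)
DEBT_REDUCTION = 0.30           # fraction of that debt removed (from the text)
TARGET_SAVINGS_KRW = 100_000_000

# Developer-years recovered per year
recovered = TEAM_SIZE * DEBT_TIME_SHARE * DEBT_REDUCTION

# Cost per developer-year at which the savings reach the 100 million KRW target
break_even_cost = TARGET_SAVINGS_KRW / recovered

# Direct figures from the table: 55 million KRW saved on 15 million KRW spent
direct_roi = 55_000_000 / 15_000_000

print(f"{recovered:.2f} developer-years recovered")
print(f"{break_even_cost:,.0f} KRW per developer-year to reach the target")
print(f"{direct_roi:.1f}x direct return")
```

The indirect estimate therefore depends on the fully loaded cost of a developer-year, so substitute your own figure before presenting the number internally.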

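Series recap

Here is a summary of the key topics covered in this series.

Chapter 1: Limits of traditional static analysis, how LLMs understand code, and the 2026 tool ecosystem

Chapter 2: Hybrid AST and LLM analysis, cAST chunking, cyclomatic complexity/coupling/cohesion

Chapter 3: Understanding legacy code and automating documentation, dependency graphs, architecture diagrams

Chapter 4: Classifying and detecting code smells, CodeScene Code Health, quantifying technical debt

Chapter 5: Multi-agent refactoring (RepoAI), the validation pipeline that goes from 37% to 98%

Chapter 6: Automating language/framework migration, verifying semantic preservation

Chapter 7: SAST+LLM security analysis, OWASP Top 10, CI/CD security gates

Chapter 8: Module dependency analysis, circular dependencies, layer violations, microservice boundaries

Chapter 9: CI/CD integration, quality gates, trend dashboards, GitHub Actions

Chapter 10: Building the full pipeline, a real-world case study, the adoption guide

LLM-based code analysis is no longer an experimental technique but an approach proven in practice. By combining the precision of the AST with the semantic understanding of an LLM, and by building a systematic validation pipeline, you can keep improving code quality while increasing development speed. Apply the techniques from this series in a way that fits your organization and take code quality management to its next stage.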

2026년 3월 23일·AI / ML·

10장: 실전 프로젝트 -- LLM 코드 분석 파이프라인 구축

22분1,700자10개 섹션

code-quality ai llm devtools

code-analysis10 / 10

1 2 3 4 5 6 7 8 9 10

이전9장: CI/CD 통합과 지속적 코드 품질 관리

학습 목표

전체 파이프라인(AST 추출 -> 스멜 감지 -> 리팩터링 제안 -> 검증 -> 적용)을 구축합니다
Claude Code와 Copilot을 활용한 실전 코드 분석 기법을 익힙니다
레거시 프로젝트 현대화의 실제 사례를 분석합니다
조직에 LLM 코드 분석 도구를 도입하는 가이드를 수립합니다

code-analyzer/
  src/
    ast_engine/
      parser.py          # AST 파서
      metrics.py          # 메트릭 계산
      chunker.py          # cAST 청킹
    analysis/
      smell_detector.py   # 코드 스멜 탐지
      debt_calculator.py  # 기술 부채 정량화
      security_scanner.py # 보안 분석
    refactoring/
      planner.py          # 리팩터링 계획
      generator.py        # 코드 생성
      validator.py        # 검증
    reporting/
      pr_commenter.py     # PR 코멘트
      dashboard.py        # 대시보드 데이터
    pipeline.py           # 오케스트레이터
    config.py             # 설정
  tests/
  scripts/
  pyproject.toml

1단계: 통합 AST 엔진

2장에서 학습한 AST 추출, 메트릭 계산, 청킹을 하나의 엔진으로 통합합니다.

src/ast_engine/parser.py

python

"""통합 AST 파싱 엔진"""
import ast
from dataclasses import dataclass, field
from pathlib import Path
 
 
@dataclass
class CodeEntity:
    """코드 엔티티 (함수, 클래스, 메서드)"""
    name: str
    entity_type: str  # function, class, method
    filepath: str
    line_start: int
    line_end: int
    source: str
    metrics: dict = field(default_factory=dict)
    children: list["CodeEntity"] = field(default_factory=list)
 
 
@dataclass
class FileAnalysis:
    """파일 분석 결과"""
    filepath: str
    language: str
    total_lines: int
    entities: list[CodeEntity]
    imports: list[str]
    global_metrics: dict
 
 
class UnifiedParser:
    """Python과 TypeScript를 지원하는 통합 파서"""
 
    def parse_file(self, filepath: str) -> FileAnalysis:
        path = Path(filepath)
        source = path.read_text()
 
        if path.suffix == ".py":
            return self._parse_python(filepath, source)
        elif path.suffix in (".ts", ".tsx"):
            return self._parse_typescript(filepath, source)
        else:
            raise ValueError(f"지원하지 않는 확장자: {path.suffix}")
 
    def parse_directory(self, dirpath: str) -> list[FileAnalysis]:
        results = []
        root = Path(dirpath)
 
        ignore_dirs = {"node_modules", ".git", "__pycache__", ".venv", "dist"}
 
        for path in root.rglob("*"):
            if any(p in path.parts for p in ignore_dirs):
                continue
            if path.suffix in (".py", ".ts", ".tsx"):
                try:
                    results.append(self.parse_file(str(path)))
                except (SyntaxError, UnicodeDecodeError) as e:
                    results.append(FileAnalysis(
                        filepath=str(path),
                        language="unknown",
                        total_lines=0,
                        entities=[],
                        imports=[],
                        global_metrics={"error": str(e)},
                    ))
 
        return results
 
    def _parse_python(self, filepath: str, source: str) -> FileAnalysis:
        tree = ast.parse(source)
        lines = source.splitlines()
        entities = []
        imports = []
 
        for node in ast.iter_child_nodes(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                entities.append(self._python_function_entity(
                    node, filepath, lines
                ))
            elif isinstance(node, ast.ClassDef):
                class_entity = self._python_class_entity(
                    node, filepath, lines
                )
                entities.append(class_entity)
            elif isinstance(node, ast.Import):
                for alias in node.names:
                    imports.append(alias.name)
            elif isinstance(node, ast.ImportFrom):
                if node.module:
                    imports.append(node.module)
 
        return FileAnalysis(
            filepath=filepath,
            language="python",
            total_lines=len(lines),
            entities=entities,
            imports=imports,
            global_metrics={
                "entity_count": len(entities),
                "import_count": len(imports),
                "avg_complexity": (
                    sum(
                        e.metrics.get("complexity", 0)
                        for e in entities
                    ) / max(len(entities), 1)
                ),
            },
        )
 
    def _python_function_entity(
        self,
        node: ast.FunctionDef,
        filepath: str,
        lines: list[str],
    ) -> CodeEntity:
        start = node.lineno - 1
        end = node.end_lineno or node.lineno
        source = "\n".join(lines[start:end])
 
        return CodeEntity(
            name=node.name,
            entity_type="function",
            filepath=filepath,
            line_start=node.lineno,
            line_end=end,
            source=source,
            metrics={
                "complexity": self._calc_complexity(node),
                "line_count": end - node.lineno + 1,
                "param_count": len(node.args.args),
                "has_docstring": ast.get_docstring(node) is not None,
                "return_count": sum(
                    1 for n in ast.walk(node)
                    if isinstance(n, ast.Return)
                ),
            },
        )
 
    def _python_class_entity(
        self,
        node: ast.ClassDef,
        filepath: str,
        lines: list[str],
    ) -> CodeEntity:
        start = node.lineno - 1
        end = node.end_lineno or node.lineno
        source = "\n".join(lines[start:end])
 
        children = []
        for child in node.body:
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                children.append(self._python_function_entity(
                    child, filepath, lines
                ))
 
        return CodeEntity(
            name=node.name,
            entity_type="class",
            filepath=filepath,
            line_start=node.lineno,
            line_end=end,
            source=source,
            metrics={
                "method_count": len(children),
                "line_count": end - node.lineno + 1,
                "has_docstring": ast.get_docstring(node) is not None,
            },
            children=children,
        )
 
    def _parse_typescript(self, filepath: str, source: str) -> FileAnalysis:
        # ts-morph 기반 파싱 (Python에서는 subprocess로 호출)
        # 실제 구현에서는 Node.js 스크립트와 연동
        return FileAnalysis(
            filepath=filepath,
            language="typescript",
            total_lines=len(source.splitlines()),
            entities=[],
            imports=[],
            global_metrics={},
        )
 
    def _calc_complexity(self, node: ast.AST) -> int:
        complexity = 1
        for child in ast.walk(node):
            if isinstance(child, (ast.If, ast.While, ast.For)):
                complexity += 1
            elif isinstance(child, ast.BoolOp):
                complexity += len(child.values) - 1
            elif isinstance(child, ast.ExceptHandler):
                complexity += 1
        return complexity

2-3단계: 스멜 탐지와 부채 정량화

4장의 코드 스멜 탐지와 기술 부채 계산을 통합합니다.

src/analysis/smell_detector.py

python

"""통합 코드 스멜 탐지기"""
from dataclasses import dataclass
 
 
@dataclass
class DetectedSmell:
    smell_type: str
    severity: str
    entity_name: str
    filepath: str
    line_start: int
    description: str
    metrics: dict
    remediation_hours: float
 
 
class IntegratedSmellDetector:
    """AST 메트릭 + LLM 하이브리드 스멜 탐지"""
 
    # 메트릭 기반 탐지 임계값
    THRESHOLDS = {
        "complexity": {"warning": 10, "critical": 20},
        "line_count": {"warning": 40, "critical": 80},
        "param_count": {"warning": 4, "critical": 7},
        "method_count": {"warning": 10, "critical": 20},
    }
 
    def __init__(self, llm_client=None):
        self.llm_client = llm_client
 
    async def detect(
        self, file_analyses: list,
    ) -> list[DetectedSmell]:
        smells = []
 
        # 1단계: 메트릭 기반 탐지 (빠름, 비용 없음)
        for file_analysis in file_analyses:
            for entity in file_analysis.entities:
                smells.extend(self._check_metrics(entity))
 
        # 2단계: LLM 기반 심층 탐지 (느림, 비용 있음)
        if self.llm_client:
            high_priority = [
                s for s in smells if s.severity == "critical"
            ]
            for smell in high_priority:
                llm_smells = await self._llm_deep_analysis(smell)
                smells.extend(llm_smells)
 
        return self._deduplicate(smells)
 
    def _check_metrics(self, entity) -> list[DetectedSmell]:
        smells = []
        metrics = entity.metrics
 
        # 순환 복잡도 검사
        cx = metrics.get("complexity", 0)
        if cx >= self.THRESHOLDS["complexity"]["critical"]:
            smells.append(DetectedSmell(
                smell_type="high_complexity",
                severity="critical",
                entity_name=entity.name,
                filepath=entity.filepath,
                line_start=entity.line_start,
                description=(
                    f"순환 복잡도가 {cx}로 매우 높습니다. "
                    f"함수를 분해하세요."
                ),
                metrics={"complexity": cx},
                remediation_hours=cx * 0.3,
            ))
        elif cx >= self.THRESHOLDS["complexity"]["warning"]:
            smells.append(DetectedSmell(
                smell_type="moderate_complexity",
                severity="warning",
                entity_name=entity.name,
                filepath=entity.filepath,
                line_start=entity.line_start,
                description=f"순환 복잡도가 {cx}입니다. 개선을 검토하세요.",
                metrics={"complexity": cx},
                remediation_hours=cx * 0.2,
            ))
 
        # 함수 길이 검사
        line_count = metrics.get("line_count", 0)
        if line_count >= self.THRESHOLDS["line_count"]["critical"]:
            smells.append(DetectedSmell(
                smell_type="long_function",
                severity="critical",
                entity_name=entity.name,
                filepath=entity.filepath,
                line_start=entity.line_start,
                description=f"The function is too long ({line_count} lines).",
                metrics={"line_count": line_count},
                remediation_hours=2.0,
            ))
 
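        # Parameter count check (reported as a warning even though it is
        # compared against the "critical" threshold entry)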
        param_count = metrics.get("param_count", 0)
        if param_count >= self.THRESHOLDS["param_count"]["critical"]:
            smells.append(DetectedSmell(
                smell_type="long_parameter_list",
                severity="warning",
                entity_name=entity.name,
                filepath=entity.filepath,
                line_start=entity.line_start,
                description=(
                    f"The function takes {param_count} parameters. "
                    "Consider introducing a parameter object."
                ),
                metrics={"param_count": param_count},
                remediation_hours=1.0,
            ))
 
        return smells
 
    async def _llm_deep_analysis(self, smell: DetectedSmell) -> list[DetectedSmell]:
        # Placeholder: additional LLM-based smell detection is omitted for brevity
        return []
 
    def _deduplicate(self, smells: list[DetectedSmell]) -> list[DetectedSmell]:
        seen = set()
        unique = []
        for smell in smells:
            key = (smell.smell_type, smell.entity_name, smell.filepath)
            if key not in seen:
                seen.add(key)
                unique.append(smell)
        return unique
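
The `_llm_deep_analysis` stub above returns an empty list. As a rough sketch of what it could do, the example below asks a client for extra smells as JSON and tolerates malformed answers. The `DetectedSmell` fields mirror only those used in this chapter, and `FakeLLMClient`, the prompt wording, the `"info"` severity, and the 1.0 hour estimate are all assumptions for illustration, not part of the original detector.

```python
import asyncio
import json
from dataclasses import dataclass, field


@dataclass
class DetectedSmell:
    # Mirrors only the fields used by the detector in this chapter
    smell_type: str
    severity: str
    entity_name: str
    filepath: str
    line_start: int
    description: str
    metrics: dict = field(default_factory=dict)
    remediation_hours: float = 0.0


class FakeLLMClient:
    """Hypothetical stand-in; a real client would call an LLM API."""

    async def generate(self, prompt: str) -> str:
        return json.dumps([
            {"smell_type": "feature_envy",
             "description": "Mostly reads fields of another class."},
            {"description": "No smell_type, so this entry is skipped."},
        ])


async def llm_deep_analysis(client, smell: DetectedSmell) -> list[DetectedSmell]:
    prompt = (
        "List additional code smells as a JSON array of objects with "
        f"'smell_type' and 'description' for {smell.entity_name} "
        f"in {smell.filepath}."
    )
    raw = await client.generate(prompt)
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # Unparseable answer: keep the metric-based result only
    if not isinstance(items, list):
        return []

    found = []
    for item in items:
        if not isinstance(item, dict) or "smell_type" not in item:
            continue  # Ignore malformed entries instead of failing the run
        found.append(DetectedSmell(
            smell_type=str(item["smell_type"]),
            severity="info",  # Assumed default for LLM-only findings
            entity_name=smell.entity_name,
            filepath=smell.filepath,
            line_start=smell.line_start,
            description=str(item.get("description", "")),
            remediation_hours=1.0,  # Placeholder estimate
        ))
    return found


parent = DetectedSmell("high_complexity", "critical", "process_order",
                       "src/services/order.py", 10, "Complexity is 25.")
extra = asyncio.run(llm_deep_analysis(FakeLLMClient(), parent))
print([s.smell_type for s in extra])
```

Returning an empty list on bad output keeps the cheap metric-based results usable even when the LLM call misbehaves.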

Stages 4-6: Refactoring suggestions and validation

This part integrates the multi-agent refactoring and validation pipeline from Chapter 5.

src/refactoring/planner.py

python

"""Refactoring planning and execution"""
from dataclasses import dataclass
 
 
@dataclass
class RefactorAction:
    target_file: str
    target_entity: str
    action_type: str  # e.g. extract_function, introduce_parameter_object, move_method
    description: str
    priority: int
    estimated_hours: float
 
 
@dataclass
class RefactorPlan:
    actions: list[RefactorAction]
    total_hours: float
    expected_improvements: dict
 
 
class RefactorPlanner:
    """Priority-based refactoring planner"""
 
    def __init__(self, llm_client):
        self.llm_client = llm_client
 
    async def create_plan(
        self,
        smells: list,
        budget_hours: float = 40,
    ) -> RefactorPlan:
        # Sort by priority: severity weight divided by remediation effort,
        # so severe but cheap fixes come first
        prioritized = sorted(
            smells,
            key=lambda s: (
                {"critical": 4, "warning": 2, "info": 1}.get(
                    s.severity, 1
                ) / max(s.remediation_hours, 0.5)
            ),
            reverse=True,
        )
 
        actions = []
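        total_hours = 0.0

        for smell in prioritized:
            # Skip smells that exceed the remaining budget; cheaper ones
            # further down the list may still fit
            if total_hours + smell.remediation_hours > budget_hours:
                continue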
 
            action = await self._plan_action(smell)
            if action:
                actions.append(action)
                total_hours += action.estimated_hours
 
        return RefactorPlan(
            actions=actions,
            total_hours=total_hours,
            expected_improvements={
                "complexity_reduction": self._estimate_complexity_reduction(
                    actions
                ),
                "smell_reduction": len(actions),
            },
        )
 
    async def _plan_action(self, smell) -> RefactorAction | None:
        action_map = {
            "high_complexity": "extract_function",
            "long_function": "extract_function",
            "long_parameter_list": "introduce_parameter_object",
            "duplicate_code": "extract_common_function",
            "feature_envy": "move_method",
        }
 
        action_type = action_map.get(smell.smell_type)
        if not action_type:
            return None
 
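        # Without an LLM client, fall back to the metric-based description
        if self.llm_client is None:
            response = smell.description
        else:
            # Ask the LLM for concrete refactoring instructions
            prompt = f"""Generate concrete refactoring instructions for the following code smell.

Smell type: {smell.smell_type}
Target: {smell.entity_name} ({smell.filepath}:{smell.line_start})
Description: {smell.description}
Recommended action: {action_type}

Include the following:
1. Concrete refactoring steps
2. Names and parameters of the functions to extract
3. Caveats"""

            response = await self.llm_client.generate(prompt)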
 
        return RefactorAction(
            target_file=smell.filepath,
            target_entity=smell.entity_name,
            action_type=action_type,
            description=response,
            priority=1,
            estimated_hours=smell.remediation_hours,
        )
 
    def _estimate_complexity_reduction(
        self, actions: list[RefactorAction]
    ) -> float:
        return sum(
            3.0 if a.action_type == "extract_function" else 1.0
            for a in actions
        )
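
To make the ordering in `create_plan` concrete, here is a small self-contained sketch of the same priority formula (severity weight divided by `max(hours, 0.5)`) and the same skip-if-over-budget loop. The `Smell` class, the sample hours, and the 6-hour budget are made up for illustration.

```python
from dataclasses import dataclass

SEVERITY_WEIGHT = {"critical": 4, "warning": 2, "info": 1}


@dataclass
class Smell:
    name: str
    severity: str
    remediation_hours: float


def priority(smell: Smell) -> float:
    # Same formula as create_plan: weight / max(hours, 0.5)
    weight = SEVERITY_WEIGHT.get(smell.severity, 1)
    return weight / max(smell.remediation_hours, 0.5)


def select_within_budget(smells, budget_hours):
    chosen, used = [], 0.0
    for smell in sorted(smells, key=priority, reverse=True):
        if used + smell.remediation_hours > budget_hours:
            continue  # Too expensive for what is left; try the next one
        chosen.append(smell.name)
        used += smell.remediation_hours
    return chosen, used


smells = [
    Smell("long_function", "critical", 2.0),       # 4 / 2.0 = 2.0
    Smell("high_complexity", "critical", 7.5),     # 4 / 7.5 ≈ 0.53
    Smell("long_parameter_list", "warning", 1.0),  # 2 / 1.0 = 2.0
    Smell("moderate_complexity", "warning", 2.5),  # 2 / 2.5 = 0.8
]
chosen, used = select_within_budget(smells, budget_hours=6.0)
print(chosen, used)
```

Note that the expensive critical smell ranks last here: dividing by effort favors quick wins, which may or may not match a team's priorities.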

Stage 7: Integrated pipeline orchestrator

src/pipeline.py

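python

"""End-to-end pipeline orchestrator"""
import json
import asyncio
from dataclasses import dataclass, asdict
from datetime import datetime
from pathlib import Path

# Assumed import paths based on the project layout shown earlier;
# adjust them to match how the package is installed
from src.ast_engine.parser import UnifiedParser
from src.analysis.smell_detector import IntegratedSmellDetector
from src.refactoring.planner import RefactorPlanner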
 
 
@dataclass
class PipelineResult:
    project_path: str
    analyzed_at: str
    files_analyzed: int
    total_entities: int
    smells_detected: int
    critical_smells: int
    refactor_actions: int
    estimated_hours: float
    report_path: str
 
 
class CodeAnalysisPipeline:
    """Integrated LLM code analysis pipeline"""
 
    def __init__(
        self,
        llm_client=None,
        output_dir: str = "./reports",
    ):
        self.llm_client = llm_client
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
 
    async def run(
        self,
        project_path: str,
        budget_hours: float = 40,
    ) -> PipelineResult:
        print(f"Starting project analysis: {project_path}")

        # Stage 1: AST parsing
        print("  [1/7] Parsing AST...")
        parser = UnifiedParser()
        file_analyses = parser.parse_directory(project_path)
        print(f"  -> analyzed {len(file_analyses)} files")

        # Stage 2: code smell detection
        print("  [2/7] Detecting code smells...")
        detector = IntegratedSmellDetector(self.llm_client)
        smells = await detector.detect(file_analyses)
        print(f"  -> detected {len(smells)} smells")

        # Stage 3: technical debt quantification
        print("  [3/7] Calculating technical debt...")
        total_debt_hours = sum(s.remediation_hours for s in smells)
        critical_count = sum(
            1 for s in smells if s.severity == "critical"
        )
        print(f"  -> estimated debt: {total_debt_hours:.1f} hours")

        # Stage 4: security analysis (the call is left as a stub here)
        print("  [4/7] Running security analysis...")
        # security_scanner.scan(file_analyses)

        # Stage 5: refactoring plan
        print("  [5/7] Building refactoring plan...")
        planner = RefactorPlanner(self.llm_client)
        plan = await planner.create_plan(smells, budget_hours)

        # Stage 6: architecture analysis (the call is left as a stub here)
        print("  [6/7] Running architecture analysis...")
        # architecture_analyzer.analyze(file_analyses)

        # Stage 7: report generation
        print("  [7/7] Generating report...")
        report = self._generate_report(
            file_analyses, smells, plan
        )
        report_path = self._save_report(report)
 
        return PipelineResult(
            project_path=project_path,
            analyzed_at=datetime.now().isoformat(),
            files_analyzed=len(file_analyses),
            total_entities=sum(
                len(f.entities) for f in file_analyses
            ),
            smells_detected=len(smells),
            critical_smells=critical_count,
            refactor_actions=len(plan.actions),
            estimated_hours=plan.total_hours,
            report_path=str(report_path),
        )
 
    def _generate_report(
        self, file_analyses, smells, plan
    ) -> dict:
        return {
            "summary": {
                "files": len(file_analyses),
                "entities": sum(
                    len(f.entities) for f in file_analyses
                ),
                "smells": len(smells),
                "critical": sum(
                    1 for s in smells if s.severity == "critical"
                ),
                "total_debt_hours": round(
                    sum(s.remediation_hours for s in smells), 1
                ),
            },
            "smells": [
                {
                    "type": s.smell_type,
                    "severity": s.severity,
                    "entity": s.entity_name,
                    "file": s.filepath,
                    "line": s.line_start,
                    "description": s.description,
                }
                for s in smells
            ],
            "plan": {
                "actions": [
                    {
                        "file": a.target_file,
                        "entity": a.target_entity,
                        "action": a.action_type,
                        "hours": a.estimated_hours,
                    }
                    for a in plan.actions
                ],
                "total_hours": plan.total_hours,
            },
        }
 
    def _save_report(self, report: dict) -> Path:
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        path = self.output_dir / f"analysis_{timestamp}.json"
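        # Write as UTF-8 explicitly: ensure_ascii=False keeps non-ASCII
        # text as-is, and the platform default encoding may not handle it
        path.write_text(
            json.dumps(report, indent=2, ensure_ascii=False),
            encoding="utf-8",
        )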
        return path
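
Once a report has been saved, it can be consumed by other tools. The sketch below reads a report in the shape produced by `_generate_report` and condenses it. `summarize_report` and the sample data are hypothetical and only illustrate the JSON layout.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_report(path: Path) -> dict:
    report = json.loads(path.read_text(encoding="utf-8"))
    by_severity = Counter(s["severity"] for s in report["smells"])
    by_file = Counter(s["file"] for s in report["smells"])
    return {
        "by_severity": dict(by_severity),
        # File with the most smells, or None for an empty report
        "worst_file": by_file.most_common(1)[0][0] if by_file else None,
        "planned_hours": report["plan"]["total_hours"],
    }


# Hand-written sample in the same shape as the pipeline's JSON report
sample = {
    "summary": {"files": 2, "entities": 5, "smells": 3,
                "critical": 2, "total_debt_hours": 10.5},
    "smells": [
        {"type": "high_complexity", "severity": "critical",
         "entity": "process_order", "file": "src/services/order.py",
         "line": 10, "description": "Complexity is 25."},
        {"type": "long_function", "severity": "critical",
         "entity": "process_order", "file": "src/services/order.py",
         "line": 10, "description": "Function is 120 lines."},
        {"type": "long_parameter_list", "severity": "warning",
         "entity": "create_user", "file": "src/services/user.py",
         "line": 4, "description": "7 parameters."},
    ],
    "plan": {"actions": [], "total_hours": 9.5},
}

with tempfile.TemporaryDirectory() as tmp:
    report_path = Path(tmp) / "analysis_sample.json"
    report_path.write_text(
        json.dumps(sample, ensure_ascii=False), encoding="utf-8"
    )
    summary = summarize_report(report_path)

print(summary)
```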

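Practical tips for using Claude Code

This section collects practical techniques for analyzing code with AI coding assistants.

Analyzing a codebase with Claude Code

Claude Code can draw on the whole repository as context, which makes it particularly useful for architecture-level analysis.

Request pattern for understanding a codebase:

Please analyze the overall architecture of this project.
Include the dependencies between the main modules, the data flow, and any potential architectural problems.

Request pattern for refactoring:

Please refactor the process_order function in src/services/order.py.
Split the function according to the single responsibility principle, and confirm that the existing tests still pass.

Request pattern for security analysis:

Please analyze this project for security vulnerabilities.
Review it against the OWASP Top 10 and propose fixes for any vulnerabilities you find.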
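
These request patterns can also be filled in from the pipeline's own output. The following is a minimal sketch, assuming plan entries shaped like the `plan.actions` items in the JSON report. `build_refactor_request` and the sample entries are hypothetical, and the resulting text is simply pasted into the assistant as a prompt.

```python
def build_refactor_request(file: str, entity: str) -> str:
    """Fill the refactoring request pattern for one planned action."""
    return (
        f"Please refactor the {entity} function in {file}.\n"
        "Split the function according to the single responsibility "
        "principle, and confirm that the existing tests still pass."
    )


# Hypothetical plan entries in the shape of report["plan"]["actions"]
actions = [
    {"file": "src/services/order.py", "entity": "process_order",
     "action": "extract_function", "hours": 7.5},
    {"file": "src/services/user.py", "entity": "create_user",
     "action": "introduce_parameter_object", "hours": 1.0},
]

# Only function-extraction actions fit this particular request pattern
requests = [
    build_refactor_request(a["file"], a["entity"])
    for a in actions
    if a["action"] == "extract_function"
]
print(requests[0])
```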

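Legacy project modernization case study

Case: modernizing a five-year-old Python backend

Project size: about 50,000 lines, Python 3.8, Flask

Automated analysis results:

85 of 420 functions (20%) have a cyclomatic complexity of 10 or higher
23 functions (5.5% of the total) have a cyclomatic complexity of 20 or higher (immediate refactoring needed)
3 circular dependencies detected
12 layer violations detected
7 security vulnerabilities (including 2 SQL injection issues)
Estimated technical debt: about 320 hours

Modernization plan (12 weeks):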

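Weeks	Task	Automation rate
1-2	Fix the 7 security vulnerabilities	80%
3-4	Resolve the 3 circular dependencies	60%
5-8	Refactor the 23 highest-complexity functions	70%
9-10	Fix the 12 layer violations	50%
11-12	Build the CI/CD pipeline and quality gates	90%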
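
The figures in this case study can be checked with a few lines of arithmetic. The sketch below recomputes the two complexity shares and, as one possible summary that the case study itself does not state, a week-weighted average of the automation rates in the plan.

```python
total_functions = 420
cc_10_plus = 85   # cyclomatic complexity of 10 or higher
cc_20_plus = 23   # cyclomatic complexity of 20 or higher

share_10 = cc_10_plus / total_functions * 100
share_20 = cc_20_plus / total_functions * 100

# (duration in weeks, automation rate in percent) for each plan row
plan = [(2, 80), (2, 60), (4, 70), (2, 50), (2, 90)]
total_weeks = sum(weeks for weeks, _ in plan)
weighted_rate = sum(weeks * rate for weeks, rate in plan) / total_weeks

print(f"{share_10:.1f}% of functions have complexity >= 10")
print(f"{share_20:.1f}% of functions have complexity >= 20")
print(f"{total_weeks} weeks, {weighted_rate:.1f}% week-weighted automation")
```

The shares round to the 20% and 5.5% quoted above, and the five rows add up to the stated 12 weeks.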

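Organizational adoption guide

Phase 1: Assess the current state

Check whether static analysis tools are already in use
Map the structure of the existing CI/CD pipeline
Review the team's code review process
Identify the most problematic areas of the code (hotspots)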

Phase 2: Pilot project (2-4 weeks)

Start LLM-based code analysis in a small scope.

Automate AST metric collection first
Run the pilot on a single repository
Start the quality gate in "warn only" mode
Collect team feedback and adjust the policy
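
A "warn only" quality gate can be as simple as a flag that decides whether a violation fails the build. The sketch below is one way to express it. `GateResult`, `evaluate_gate`, and the single critical-smell limit are assumptions for illustration, not a prescribed policy.

```python
from dataclasses import dataclass, field


@dataclass
class GateResult:
    passed: bool
    messages: list[str] = field(default_factory=list)


def evaluate_gate(
    critical_smells: int, max_critical: int, enforce: bool
) -> GateResult:
    """Report a violation, but fail the build only when enforcing."""
    violated = critical_smells > max_critical
    messages = []
    if violated:
        level = "ERROR" if enforce else "WARNING"
        messages.append(
            f"{level}: {critical_smells} critical smells "
            f"(limit {max_critical})"
        )
    return GateResult(passed=not (violated and enforce), messages=messages)


pilot = evaluate_gate(critical_smells=5, max_critical=3, enforce=False)
strict = evaluate_gate(critical_smells=5, max_critical=3, enforce=True)
print(pilot.passed, pilot.messages)
print(strict.passed, strict.messages)
```

During the pilot, a CI job would print the messages and exit successfully; switching `enforce` to `True` later turns the same check into a blocking gate.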

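Phase 3: Rollout (4-8 weeks)

Extend the approach to the whole organization based on the pilot results.

Apply the CI/CD integration to every repository
Tighten the quality gates gradually
Build a technical debt dashboard
Introduce regular refactoring sprints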
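
One common way to tighten a gate gradually is a ratchet: the allowed limit drops to the best value observed so far and never rises again. The sketch below illustrates the idea. The starting limit and the observed counts are invented numbers, and `next_limit` is a hypothetical helper.

```python
def next_limit(current_limit: int, observed: int, floor: int = 0) -> int:
    """Ratchet: the limit can only move down, never back up."""
    return max(floor, min(current_limit, observed))


limit = 23  # Illustrative starting limit for critical smells
history = []
for observed in [23, 19, 21, 12]:
    passed = observed <= limit
    history.append((observed, limit, passed))
    if passed:
        # Lock in the improvement for the next run
        limit = next_limit(limit, observed)

for observed, limit_used, passed in history:
    print(f"observed={observed} limit={limit_used} passed={passed}")
print(f"final limit: {limit}")
```

In this run the third measurement fails because the count rose above the limit locked in by the second, which is exactly the regression a ratchet is meant to catch.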

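Cost-benefit analysis

Item	Cost (annual)	Savings
LLM API costs	approx. KRW 5 million	-
Tool development and maintenance	approx. KRW 10 million	-
Technical debt reduction	-	approx. KRW 30 million
Defect prevention	-	approx. KRW 20 million
Faster onboarding	-	approx. KRW 5 million
Total	approx. KRW 15 million	approx. KRW 55 million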
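
The totals in the table can be turned into a net benefit and a return on investment. The sketch below uses the table's figures in units of KRW 10,000 and assumes the savings are realized over the same one-year period as the costs, which the table does not state explicitly.

```python
# Figures from the table, in units of KRW 10,000
costs = {"llm_api": 500, "tooling": 1000}
savings = {"debt_reduction": 3000, "defect_prevention": 2000,
           "onboarding": 500}

total_cost = sum(costs.values())
total_savings = sum(savings.values())
net_benefit = total_savings - total_cost
roi_percent = net_benefit / total_cost * 100

print(f"total cost:    {total_cost}")
print(f"total savings: {total_savings}")
print(f"net benefit:   {net_benefit}")
print(f"ROI:           {roi_percent:.1f}%")
```

With these estimates the net benefit is about KRW 40 million per year. The result is only as reliable as the savings estimates themselves.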