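March 7, 2026 · AI / ML

Chapter 2: AST and LLM Hybrid Analysis

This chapter covers a hybrid approach that combines AST-based static analysis with the semantic analysis of LLMs. It works through cAST chunking, cyclomatic complexity, and coupling/cohesion metrics with hands-on Python and TypeScript examples.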

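Learning objectives

Understand the structure of an AST (abstract syntax tree) and its role in code analysis
Learn the core static analysis metrics: cyclomatic complexity, coupling, and cohesion
Understand how cAST (AST-based chunking) works and how it is used in a RAG pipeline
Understand the synergy of hybrid analysis that combines LLMs with ASTs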

AST basics: reading the skeleton of code

An AST (abstract syntax tree) represents source code as a tree. It is the core data structure that lets a program traverse and analyze the syntactic structure of code.

What an AST represents

simple-function.ts

typescript

function calculateTotal(items: Item[], taxRate: number): number {
  const subtotal = items.reduce((sum, item) => sum + item.price, 0);
  return subtotal * (1 + taxRate);
}

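When this simple function is parsed, the AST is a tree rooted at a function declaration node, whose children are the two parameters, the return type, and a body block containing a variable statement and a return statement.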
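
To see such a tree concretely, the sketch below uses Python's built-in `ast` module on a hand-translated Python version of `calculateTotal`. The Python function `calculate_total` is an assumption made for illustration, and Python's node names differ from those of the TypeScript compiler, so treat the output as an analogue rather than the TypeScript tree itself.

```python
import ast

# Hand-translated Python analogue of calculateTotal (illustrative assumption).
source = '''
def calculate_total(items, tax_rate):
    subtotal = sum(item.price for item in items)
    return subtotal * (1 + tax_rate)
'''

tree = ast.parse(source)
func = tree.body[0]

# Top-level shape: a FunctionDef whose body holds an Assign and a Return.
body_kinds = [type(stmt).__name__ for stmt in func.body]
print(type(func).__name__, func.name, body_kinds)

# Full tree, indented for readability (the indent argument needs Python 3.9+).
print(ast.dump(func, indent=2))
```

The first line of output summarizes the skeleton; the full dump shows every nested expression node beneath it.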

Extracting an AST in Python

Python ships the ast module in its standard library, which makes working with ASTs straightforward.

ast_extractor.py

python

import ast
import json
from dataclasses import dataclass, asdict
 
 
@dataclass
class FunctionInfo:
    name: str
    args: list[str]
    returns: str | None
    complexity: int
    line_start: int
    line_end: int
    docstring: str | None
 
 
class CodeAnalyzer(ast.NodeVisitor):
    """Analyzer that walks the AST and extracts function information."""
 
    def __init__(self):
        self.functions: list[FunctionInfo] = []
 
    def visit_FunctionDef(self, node: ast.FunctionDef):
        complexity = self._calculate_complexity(node)
        docstring = ast.get_docstring(node)
 
        info = FunctionInfo(
            name=node.name,
            args=[arg.arg for arg in node.args.args],
            returns=ast.unparse(node.returns) if node.returns else None,
            complexity=complexity,
            line_start=node.lineno,
            line_end=node.end_lineno or node.lineno,
            docstring=docstring,
        )
        self.functions.append(info)
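        self.generic_visit(node)

    # Treat `async def` functions the same way as regular functions.
    visit_AsyncFunctionDef = visit_FunctionDef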
 
    def _calculate_complexity(self, node: ast.AST) -> int:
        """Approximate McCabe cyclomatic complexity (branches in nested functions are included)."""
        complexity = 1  # base path
        for child in ast.walk(node):
            if isinstance(child, (ast.If, ast.While, ast.For, ast.AsyncFor, ast.IfExp)):
                complexity += 1
            elif isinstance(child, ast.BoolOp):
                complexity += len(child.values) - 1
            elif isinstance(child, ast.ExceptHandler):
                complexity += 1
        return complexity
 
 
def analyze_file(filepath: str) -> list[dict]:
    with open(filepath, encoding="utf-8") as f:
        source = f.read()
 
    tree = ast.parse(source)
    analyzer = CodeAnalyzer()
    analyzer.visit(tree)
 
    return [asdict(func) for func in analyzer.functions]

Extracting an AST in TypeScript

In TypeScript, the ts-morph library enables rich AST analysis that includes type information.

ast-extractor.ts

typescript

import { Project, SyntaxKind, FunctionDeclaration } from "ts-morph";
 
interface FunctionMetrics {
  name: string;
  parameters: string[];
  returnType: string;
  complexity: number;
  lineCount: number;
  dependencies: string[];
}
 
function extractFunctionMetrics(
  project: Project,
  filePath: string,
): FunctionMetrics[] {
  const sourceFile = project.getSourceFileOrThrow(filePath);
  const functions = sourceFile.getFunctions();
 
  return functions.map((func) => ({
    name: func.getName() ?? "anonymous",
    parameters: func.getParameters().map((p) => p.getName()),
    returnType: func.getReturnType().getText(),
    complexity: calculateComplexity(func),
    lineCount: func.getEndLineNumber() - func.getStartLineNumber() + 1,
    dependencies: extractDependencies(func),
  }));
}
 
function calculateComplexity(func: FunctionDeclaration): number {
  let complexity = 1;
 
  func.forEachDescendant((node) => {
    switch (node.getKind()) {
      case SyntaxKind.IfStatement:
      case SyntaxKind.WhileStatement:
      case SyntaxKind.ForStatement:
      case SyntaxKind.ForInStatement:
      case SyntaxKind.ForOfStatement:
      case SyntaxKind.CatchClause:
      case SyntaxKind.ConditionalExpression:
        complexity++;
        break;
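      case SyntaxKind.BinaryExpression: {
        // Short-circuit operators add a branch; the braces scope `operator` to this case.
        const operator = node
          .asKindOrThrow(SyntaxKind.BinaryExpression)
          .getOperatorToken()
          .getText();
        if (operator === "&&" || operator === "||") {
          complexity++;
        }
        break;
      }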
    }
  });
 
  return complexity;
}
 
function extractDependencies(func: FunctionDeclaration): string[] {
  const deps: Set<string> = new Set();
 
  func.forEachDescendant((node) => {
    if (node.getKind() === SyntaxKind.CallExpression) {
      const expression = node.getChildAtIndex(0);
      if (expression) {
        deps.add(expression.getText());
      }
    }
  });
 
  return Array.from(deps);
}

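Understanding static analysis metrics

This section looks at three core metrics that can be extracted from an AST.

Cyclomatic complexity

Cyclomatic complexity measures the number of linearly independent execution paths through a piece of code. The higher the value, the more complex the code and the harder it is to test.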

Complexity range	Risk level	Recommended action
1-10	Low	Keep as is
11-20	Medium	Consider refactoring
21-50	High	Refactoring required
Over 50	Very high	Break up immediately
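
As a small worked example, the sketch below counts decision points in a sample function and maps the score to the risk bands in the table. The helper names `cyclomatic_complexity` and `risk_level` and the sample function `classify` are assumptions for illustration, and the counting rule is a simplified approximation rather than a full McCabe implementation.

```python
import ast


def cyclomatic_complexity(source: str) -> int:
    """Count decision points in a snippet and add 1 for the base path."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.While, ast.For, ast.ExceptHandler)):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra branches
            complexity += len(node.values) - 1
    return complexity


def risk_level(complexity: int) -> str:
    """Map a complexity score to the risk bands in the table."""
    if complexity <= 10:
        return "low"
    if complexity <= 20:
        return "medium"
    if complexity <= 50:
        return "high"
    return "very high"


sample = '''
def classify(order):
    if order.total > 100 and order.is_member:
        return "discount"
    for item in order.items:
        if item.backordered:
            return "delayed"
    return "standard"
'''

score = cyclomatic_complexity(sample)
print(score, risk_level(score))
```

For this sample the score is 5 (one base path, two `if` statements, one `for` loop, and one `and` operator), which falls in the low-risk band.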

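Coupling

Coupling measures how strongly modules depend on one another. Afferent coupling (Ca) is the number of external modules that reference a given module; efferent coupling (Ce) is the number of external modules that the module itself references. The two combine into instability, I = Ce / (Ca + Ce), which is 0 for a module that others depend on but that depends on nothing, and 1 for a module that depends on others but that nothing depends on.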

coupling_analyzer.py

python

import ast
from collections import defaultdict
from pathlib import Path
 
 
class CouplingAnalyzer:
    """Tool that analyzes coupling between modules."""
 
    def __init__(self, project_root: str):
        self.project_root = Path(project_root)
        self.imports: dict[str, set[str]] = defaultdict(set)
 
    def analyze(self) -> dict[str, dict[str, float]]:
        for py_file in self.project_root.rglob("*.py"):
            module_name = self._to_module_name(py_file)
            self._extract_imports(py_file, module_name)
 
        results = {}
        for module, deps in self.imports.items():
            ca = sum(
                1 for other_deps in self.imports.values()
                if module in other_deps
            )
            ce = len(deps)
            instability = ce / (ca + ce) if (ca + ce) > 0 else 0
 
            results[module] = {
                "afferent_coupling": ca,
                "efferent_coupling": ce,
                "instability": round(instability, 2),
            }
 
        return results
 
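    def _extract_imports(self, filepath: Path, module_name: str):
        source = filepath.read_text(encoding="utf-8")
        tree = ast.parse(source)
        # Register the module even if it imports nothing, so it still gets a result.
        self.imports.setdefault(module_name, set())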
 
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    self.imports[module_name].add(alias.name)
            elif isinstance(node, ast.ImportFrom):
                if node.module:
                    self.imports[module_name].add(node.module)
 
    def _to_module_name(self, filepath: Path) -> str:
        relative = filepath.relative_to(self.project_root)
        # Join path parts with dots so this also works with Windows separators.
        return ".".join(relative.with_suffix("").parts)
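
To make the Ca, Ce, and instability arithmetic concrete, here is a minimal sketch on a hand-written import graph. The three module names and the `coupling` helper are hypothetical; the formula is the same one used in `analyze` above.

```python
# Hypothetical import graph: module -> set of project modules it imports.
imports = {
    "app": {"services", "models"},
    "services": {"models"},
    "models": set(),
}


def coupling(module: str) -> tuple[int, int, float]:
    """Return (Ca, Ce, instability) for one module of the graph."""
    ce = len(imports[module])                              # outgoing dependencies
    ca = sum(module in deps for deps in imports.values())  # incoming dependencies
    instability = ce / (ca + ce) if (ca + ce) else 0.0
    return ca, ce, round(instability, 2)


for name in imports:
    print(name, coupling(name))
```

Here `app` comes out fully unstable (I = 1.0) because nothing depends on it, while `models` is fully stable (I = 0.0) because it depends on nothing and both other modules import it.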

Cohesion

Cohesion measures how closely the elements inside a module are related. High cohesion means the module has a single, clear responsibility.

The LCOM (Lack of Cohesion of Methods) metric measures cohesion by how much the methods of a class share instance variables. A higher LCOM value means lower cohesion and suggests that the class should be split.
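
LCOM has several variants. The sketch below implements one common count-based form (Chidamber and Kemerer's P minus Q definition) rather than a ratio, so its scale may differ from the tool you use. The `lcom` helper and the `Report` class are assumptions for illustration, and only attributes accessed directly as `self.<name>` are considered.

```python
import ast
from itertools import combinations


def lcom(class_source: str) -> int:
    """LCOM as P - Q, floored at 0.

    P = method pairs that share no instance attribute,
    Q = method pairs that share at least one.
    """
    cls = ast.parse(class_source).body[0]
    attrs_per_method = []
    for item in cls.body:
        if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Collect every `self.<name>` the method touches.
            used = {
                n.attr
                for n in ast.walk(item)
                if isinstance(n, ast.Attribute)
                and isinstance(n.value, ast.Name)
                and n.value.id == "self"
            }
            attrs_per_method.append(used)

    p = q = 0
    for a, b in combinations(attrs_per_method, 2):
        if a & b:
            q += 1
        else:
            p += 1
    return max(p - q, 0)


sample = '''
class Report:
    def load(self):
        self.rows = []

    def count(self):
        return len(self.rows)

    def send_email(self):
        self.smtp.send(self.address)
'''

print(lcom(sample))
```

In `Report`, `load` and `count` share `rows`, but `send_email` shares nothing with either, giving P = 2 and Q = 1 and therefore an LCOM of 1. That nonzero score hints that the email logic belongs in a separate class.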

Info

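These three metrics (cyclomatic complexity, coupling, and cohesion) act as a "pre-filter" for LLM-based analysis. Computing them from the AST first and sending only the code regions that exceed a threshold to the LLM reduces cost while preserving analysis quality.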

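cAST: AST-based chunking

cAST (chunked AST) is a technique that splits code into meaningful units based on the AST. It is a key preprocessing step for passing code to an LLM effectively in a RAG (Retrieval-Augmented Generation) pipeline.

Problems with conventional chunking

Text-based chunking (fixed length or line count) ignores the semantic boundaries of code. A function can be cut in the middle, or unrelated code can end up in the same chunk.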
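
The problem is easy to reproduce. The sketch below cuts a two-function file every three lines and then checks whether each chunk is still valid Python on its own. The sample source and the helpers `fixed_line_chunks` and `parses` are assumptions for illustration.

```python
import ast

source = '''def area(width, height):
    if width < 0 or height < 0:
        raise ValueError("negative size")
    return width * height

def perimeter(width, height):
    return 2 * (width + height)
'''


def fixed_line_chunks(text: str, lines_per_chunk: int) -> list[str]:
    """Naive chunking: cut every N lines, ignoring syntax."""
    lines = text.splitlines()
    return [
        "\n".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


def parses(chunk: str) -> bool:
    """Report whether a chunk is syntactically valid Python by itself."""
    try:
        ast.parse(chunk)
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


chunks = fixed_line_chunks(source, 3)
for i, chunk in enumerate(chunks):
    print(f"chunk {i}: parses={parses(chunk)}")
```

With three lines per chunk, the first chunk still parses but has silently lost the function's `return` statement, and the other two chunks are not valid Python on their own.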

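How cAST works

cAST works in two stages.

Stage 1: Recursive partitioning

The code is split recursively along AST nodes, using semantic units such as functions, classes, and methods as boundaries.

Stage 2: Semantic block merging

Chunks that are too small are merged with adjacent, related chunks to keep chunk sizes reasonable.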

cast_chunker.py

python

import ast
from dataclasses import dataclass
 
 
@dataclass
class CodeChunk:
    content: str
    chunk_type: str  # function, class, module_level
    name: str
    start_line: int
    end_line: int
    metadata: dict
 
 
class CASTChunker:
    """AST-based code chunking engine."""
 
    def __init__(self, max_chunk_size: int = 1500, min_chunk_size: int = 100):
        self.max_chunk_size = max_chunk_size
        self.min_chunk_size = min_chunk_size
 
    def chunk_file(self, source: str, filename: str) -> list[CodeChunk]:
        tree = ast.parse(source)
        lines = source.splitlines()
        chunks: list[CodeChunk] = []
 
        # Stage 1: split on top-level nodes
        for node in ast.iter_child_nodes(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                chunks.append(self._extract_function_chunk(node, lines))
            elif isinstance(node, ast.ClassDef):
                chunks.extend(self._extract_class_chunks(node, lines))
            elif isinstance(node, (ast.Import, ast.ImportFrom)):
                continue  # imports go into the module-level chunk

        # Extract module-level code (imports, global variables, etc.)
        module_chunk = self._extract_module_level(tree, lines)
        if module_chunk:
            chunks.insert(0, module_chunk)
 
        # Stage 2: merge small chunks
        return self._merge_small_chunks(chunks)
 
    def _extract_function_chunk(
        self, node: ast.FunctionDef, lines: list[str]
    ) -> CodeChunk:
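        # Start at the first decorator, if any, so decorators stay in the chunk.
        start = min([d.lineno for d in node.decorator_list] + [node.lineno]) - 1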
        end = node.end_lineno or node.lineno
        content = "\n".join(lines[start:end])
 
        return CodeChunk(
            content=content,
            chunk_type="function",
            name=node.name,
            start_line=node.lineno,
            end_line=end,
            metadata={
                "args": [arg.arg for arg in node.args.args],
                "complexity": self._quick_complexity(node),
                "has_docstring": ast.get_docstring(node) is not None,
            },
        )
 
    def _extract_class_chunks(
        self, node: ast.ClassDef, lines: list[str]
    ) -> list[CodeChunk]:
        chunks = []
        start = node.lineno - 1
        end = node.end_lineno or node.lineno
        full_content = "\n".join(lines[start:end])
 
        if len(full_content) <= self.max_chunk_size:
            chunks.append(CodeChunk(
                content=full_content,
                chunk_type="class",
                name=node.name,
                start_line=node.lineno,
                end_line=end,
                metadata={"method_count": sum(
                    1 for n in node.body
                    if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
                )},
            ))
        else:
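            # If the class is too large, split it method by method
            # (class-level attributes and the class docstring are not emitted here)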
            for child in node.body:
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    chunk = self._extract_function_chunk(child, lines)
                    chunk.name = f"{node.name}.{chunk.name}"
                    chunk.chunk_type = "method"
                    chunks.append(chunk)
 
        return chunks
 
    def _extract_module_level(
        self, tree: ast.Module, lines: list[str]
    ) -> CodeChunk | None:
        module_lines = []
        first_line = last_line = None
        for node in ast.iter_child_nodes(tree):
            if isinstance(node, (ast.Import, ast.ImportFrom, ast.Assign)):
                start = node.lineno - 1
                end = node.end_lineno or node.lineno
                module_lines.extend(lines[start:end])
                # Track the real source span instead of assuming it starts at line 1.
                first_line = node.lineno if first_line is None else first_line
                last_line = end

        if module_lines:
            content = "\n".join(module_lines)
            return CodeChunk(
                content=content,
                chunk_type="module_level",
                name="module",
                start_line=first_line,
                end_line=last_line,
                metadata={},
            )
        return None
 
    def _merge_small_chunks(self, chunks: list[CodeChunk]) -> list[CodeChunk]:
        if not chunks:
            return chunks
 
        merged = [chunks[0]]
        for chunk in chunks[1:]:
            prev = merged[-1]
            # +2 accounts for the "\n\n" separator added when merging
            combined_size = len(prev.content) + len(chunk.content) + 2
 
            if (len(chunk.content) < self.min_chunk_size
                    and combined_size <= self.max_chunk_size):
                prev.content += "\n\n" + chunk.content
                prev.end_line = chunk.end_line
            else:
                merged.append(chunk)
 
        return merged
 
    def _quick_complexity(self, node: ast.AST) -> int:
        complexity = 1
        for child in ast.walk(node):
            if isinstance(child, (ast.If, ast.While, ast.For)):
                complexity += 1
        return complexity

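The synergy of LLM + AST hybrid analysis

AST-only analysis and LLM-only analysis each have strengths and weaknesses; combining them lets each cover the other's gaps.

Hybrid pipeline architecture

The pipeline runs in three stages: AST-based static analysis, cAST chunking, and LLM analysis of only the functions whose complexity meets the threshold.

Division of roles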

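Stage	Owner	Reason
Syntax parsing	AST	Deterministic and exact; no LLM cost
Metric calculation	AST	Requires mathematical precision
Code chunking	AST	Preserves semantic boundaries
Semantic analysis	LLM	Requires understanding of context
Refactoring suggestions	LLM	Creative problem solving
Documentation generation	LLM	Natural-language generation
Verification	AST + LLM	Double-checking for safety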

Tip

An AST tells you precisely "what is there," and an LLM suggests "how it could be improved." This division of labor is the core of hybrid analysis. The exact structural information from the AST also helps curb LLM hallucination.
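
One way the AST can act as a check on LLM output is to parse a suggested rewrite and compare its structure with the original. The sketch below is a minimal illustration under stated assumptions: `function_signatures` and `validate_suggestion` are hypothetical helpers, and the check covers only syntax and positional argument names, not behavior.

```python
import ast


def function_signatures(source: str) -> dict[str, list[str]]:
    """Map each function name in the source to its positional argument names."""
    return {
        node.name: [arg.arg for arg in node.args.args]
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def validate_suggestion(original: str, suggested: str) -> list[str]:
    """Return problems found in LLM-suggested code; an empty list means it passed."""
    try:
        new_sigs = function_signatures(suggested)
    except SyntaxError as exc:
        return [f"suggestion does not parse: {exc.msg}"]

    problems = []
    for name, args in function_signatures(original).items():
        if name not in new_sigs:
            problems.append(f"function removed: {name}")
        elif new_sigs[name] != args:
            problems.append(f"signature changed: {name}")
    return problems


original = "def total(items, tax_rate):\n    return sum(items) * (1 + tax_rate)\n"
good = (
    "def total(items, tax_rate):\n"
    "    subtotal = sum(items)\n"
    "    return subtotal * (1 + tax_rate)\n"
)
bad = "def total(items):\n    return sum(items)\n"

print(validate_suggestion(original, good))
print(validate_suggestion(original, bad))
```

A suggestion that fails this check can be rejected or sent back to the LLM before anyone reviews it by hand.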

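Using agent-based frameworks

Agent-based code analysis frameworks such as RefAgent and MANTRA implement this hybrid approach systematically. A planner agent performs static analysis through the AST, and LLM agents then carry out in-depth analysis and refactoring based on the results.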

hybrid_analysis_pipeline.py

python

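import ast
from dataclasses import dataclass

# Assumes the earlier listings are saved as ast_extractor.py and cast_chunker.py.
from ast_extractor import CodeAnalyzer
from cast_chunker import CASTChunker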
 
 
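@dataclass
class AnalysisResult:
    file_path: str
    metrics: dict
    chunks: list[dict]
    llm_insights: list[str]
    refactoring_suggestions: list[str]


def _build_analysis_prompt(func, relevant_chunks) -> str:
    """Hypothetical helper, not part of the original listing: builds the LLM prompt."""
    code = "\n\n".join(c.content for c in relevant_chunks)
    return (
        f"Analyze the function `{func.name}` "
        f"(cyclomatic complexity {func.complexity}) "
        f"and suggest refactorings.\n\n{code}"
    )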
 
 
async def hybrid_analyze(
    file_path: str,
    llm_client,
    complexity_threshold: int = 10,
) -> AnalysisResult:
    """AST + LLM hybrid analysis pipeline."""

    # Stage 1: AST-based static analysis
    with open(file_path, encoding="utf-8") as f:
        source = f.read()
 
    analyzer = CodeAnalyzer()
    tree = ast.parse(source)
    analyzer.visit(tree)
 
    # Stage 2: cAST chunking
    chunker = CASTChunker()
    chunks = chunker.chunk_file(source, file_path)
 
    # Stage 3: send only high-complexity functions to the LLM
    complex_functions = [
        func for func in analyzer.functions
        if func.complexity >= complexity_threshold
    ]
 
    llm_insights = []
    suggestions = []
 
    for func in complex_functions:
        # Collect this function's chunk and any overlapping context
        relevant_chunks = [
            c for c in chunks
            if c.start_line <= func.line_end
            and c.end_line >= func.line_start
        ]
 
        prompt = _build_analysis_prompt(func, relevant_chunks)
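        # llm_client is assumed to expose an async analyze() whose result has
        # .insights and .suggestions lists; adapt this call to your client.
        response = await llm_client.analyze(prompt)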
 
        llm_insights.extend(response.insights)
        suggestions.extend(response.suggestions)
 
    return AnalysisResult(
        file_path=file_path,
        metrics={
            "function_count": len(analyzer.functions),
            "avg_complexity": sum(
                f.complexity for f in analyzer.functions
            ) / max(len(analyzer.functions), 1),
            "max_complexity": max(
                (f.complexity for f in analyzer.functions), default=0
            ),
        },
        chunks=[{"name": c.name, "type": c.chunk_type} for c in chunks],
        llm_insights=llm_insights,
        refactoring_suggestions=suggestions,
    )
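
The pipeline only requires that `llm_client` provide an awaitable `analyze(prompt)` returning an object with `insights` and `suggestions`. For local testing, a stand-in along the following lines may be enough. `StubLLMClient` and `LLMResponse` are hypothetical names, not part of any real SDK, and the canned reply is made up.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class LLMResponse:
    insights: list[str] = field(default_factory=list)
    suggestions: list[str] = field(default_factory=list)


class StubLLMClient:
    """Stand-in that satisfies the interface the pipeline calls; no network access."""

    def __init__(self) -> None:
        self.prompts: list[str] = []

    async def analyze(self, prompt: str) -> LLMResponse:
        # Record the prompt so a test can inspect what the pipeline sent.
        self.prompts.append(prompt)
        return LLMResponse(
            insights=[f"received a prompt of {len(prompt)} characters"],
            suggestions=["split the function into smaller helpers"],
        )


async def main() -> tuple[StubLLMClient, LLMResponse]:
    client = StubLLMClient()
    reply = await client.analyze("Analyze the function `process_order` ...")
    return client, reply


client, response = asyncio.run(main())
print(response.insights, response.suggestions)
```

Passing an instance of such a stub as `llm_client` lets the AST stages and the threshold filter be exercised without incurring any LLM cost.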

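Summary

ASTs and LLMs are analysis tools with distinct strengths. An AST excels at parsing code structure exactly and computing quantitative metrics, while an LLM is strong at understanding what code means and proposing improvements.

cAST-based chunking splits code into semantic units that an LLM can process effectively, and the cyclomatic complexity, coupling, and cohesion metrics act as a filter that sets priorities for LLM analysis. A hybrid pipeline that combines the two enables code analysis that is both cost-efficient and accurate.

Next chapter preview

Chapter 3 uses this hybrid analysis technique to understand and document legacy code automatically. It covers automated codebase exploration, generating function- and module-level descriptions, extracting dependency graphs, and automatically generating architecture diagrams.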
