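File I/O: Working with Logs and Configuration Files

Key Concepts

Reading log files, parsing CSV, and handling JSON configuration files are the basic input/output tasks of security analysis.

Main Text

Reading Large Files Efficiently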

PYTHON

# ⚠️ Use this code only in authorized environments.

import re

# ❌ Memory blow-up: a 1 GB log file is loaded whole and can trigger an OOM
with open('access.log') as f:
    lines = f.readlines()  # loads the entire file into memory

# ✅ Stream one line at a time
def scan_log(path: str, pattern: str):
    """Scan a large log memory-efficiently, yielding (line number, line)."""
    p = re.compile(pattern)
    with open(path, encoding='utf-8', errors='replace') as f:
        for lineno, line in enumerate(f, 1):
            if p.search(line):
                yield lineno, line.rstrip()

# Usage example (\b401\b matches 401 anywhere in the line, not only the status field)
for lineno, line in scan_log('access.log', r'\b401\b'):
    print(f'{lineno}: {line}')
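
Because scan_log is a generator, a caller can stop early without reading the rest of the file. The sketch below is one way to illustrate that. It is self-contained, so it repeats a compact scan_log; the helper first_matches and the sample log lines are hypothetical and exist only for this demonstration.

```python
import itertools
import os
import re
import tempfile


def scan_log(path, pattern):
    """Yield (line number, line) for each matching line, reading one line at a time."""
    p = re.compile(pattern)
    with open(path, encoding='utf-8', errors='replace') as f:
        for lineno, line in enumerate(f, 1):
            if p.search(line):
                yield lineno, line.rstrip()


def first_matches(path, pattern, limit):
    """Hypothetical helper: return at most `limit` matches.

    islice stops pulling from the generator once the limit is reached,
    so the remainder of the file is never read.
    """
    return list(itertools.islice(scan_log(path, pattern), limit))


# Hypothetical sample data written to a temporary file for the demo.
sample = (
    '10.0.0.1 - - [01/Jan/2025:00:00:01 +0000] "GET /admin HTTP/1.1" 401 0\n'
    '10.0.0.2 - - [01/Jan/2025:00:00:02 +0000] "GET / HTTP/1.1" 200 512\n'
    '10.0.0.1 - - [01/Jan/2025:00:00:03 +0000] "GET /admin HTTP/1.1" 401 0\n'
    '10.0.0.3 - - [01/Jan/2025:00:00:04 +0000] "GET /login HTTP/1.1" 401 0\n'
)
with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write(sample)
    demo_path = tmp.name

hits = first_matches(demo_path, r'\b401\b', 2)
os.remove(demo_path)
for lineno, line in hits:
    print(f'{lineno}: {line}')
```

With the sample above, only the first two matching lines are returned even though a third match exists further down the file.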

Parsing CSV Logs

PYTHON

import csv
from collections import Counter

# When intrusion detection system (IDS) logs are in CSV format
def top_attackers(csv_path: str, top: int = 10):
    counter = Counter()
    # newline='' lets the csv module handle line endings itself
    with open(csv_path, encoding='utf-8', newline='') as f:
        reader = csv.DictReader(f)  # uses the first row as field names
        for row in reader:
            ip = row.get('source_ip')
            if row.get('action') == 'BLOCK' and ip:  # skip rows with no IP
                counter[ip] += 1
    return counter.most_common(top)

# print(top_attackers('ids_log.csv'))
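
To see how csv.DictReader turns the header row into dictionary keys, the same counting logic can be run on in-memory data. This is a minimal sketch: the column names source_ip and action follow the function above, while count_blocked and the sample rows are hypothetical. Real IDS exports may use different column names.

```python
import csv
import io
from collections import Counter


def count_blocked(lines, top=10):
    """Count BLOCK rows per source IP from any iterable of CSV lines."""
    counter = Counter()
    reader = csv.DictReader(lines)  # the first row becomes the dict keys
    for row in reader:
        ip = row.get('source_ip')
        if row.get('action') == 'BLOCK' and ip:
            counter[ip] += 1
    return counter.most_common(top)


# Hypothetical IDS export; real column names depend on the product.
sample_csv = io.StringIO(
    'timestamp,source_ip,action\n'
    '2025-01-01T00:00:01,203.0.113.5,BLOCK\n'
    '2025-01-01T00:00:02,198.51.100.7,ALLOW\n'
    '2025-01-01T00:00:03,203.0.113.5,BLOCK\n'
    '2025-01-01T00:00:04,198.51.100.7,BLOCK\n'
)
result = count_blocked(sample_csv)
print(result)
```

Taking an iterable of lines rather than a path keeps the counting logic testable without touching the file system.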

JSON Configuration Files

PYTHON

import json

# Keep security-check settings in an external file
SCAN_CONFIG = {
    'targets': ['127.0.0.1', '192.168.1.0/24'],
    'ports': [22, 80, 443, 3306, 8080],
    'timeout': 2.0,
    'rate_limit': 10,
}

# Save
with open('scan_config.json', 'w', encoding='utf-8') as f:
    json.dump(SCAN_CONFIG, f, indent=2, ensure_ascii=False)

# Load + validate (explicit checks, because assert is stripped under python -O)
with open('scan_config.json', encoding='utf-8') as f:
    cfg = json.load(f)
if not isinstance(cfg.get('targets'), list):
    raise ValueError('targets must be a list')
if not all(isinstance(p, int) for p in cfg.get('ports', [])):
    raise ValueError('ports must be integers')
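
Type checks alone still accept values such as port 70000 or a target that is not an address. The sketch below shows one possible stricter validation using the standard ipaddress module. The function validate_config, its rules (port range 1 to 65535, positive timeout), and the sample dictionaries are assumptions made for illustration, not part of the original configuration format.

```python
import ipaddress


def validate_config(cfg):
    """Return a list of problems found in cfg; an empty list means it passed."""
    problems = []
    targets = cfg.get('targets')
    if not isinstance(targets, list) or not targets:
        problems.append('targets must be a non-empty list')
    else:
        for t in targets:
            try:
                if not isinstance(t, str):
                    raise ValueError('not a string')
                # Accepts single addresses and CIDR ranges alike.
                ipaddress.ip_network(t, strict=False)
            except ValueError:
                problems.append(f'invalid target: {t!r}')
    for p in cfg.get('ports', []):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(p, bool) or not isinstance(p, int) or not 1 <= p <= 65535:
            problems.append(f'invalid port: {p!r}')
    timeout = cfg.get('timeout')
    if (isinstance(timeout, bool)
            or not isinstance(timeout, (int, float))
            or timeout <= 0):
        problems.append('timeout must be a positive number')
    return problems


# Hypothetical configurations for the demo.
good = {'targets': ['127.0.0.1', '192.168.1.0/24'], 'ports': [22, 80, 443], 'timeout': 2.0}
bad = {'targets': ['not-an-ip'], 'ports': [0, 70000, True], 'timeout': -1}
print(validate_config(good))
for problem in validate_config(bad):
    print(problem)
```

Collecting every problem before reporting, rather than stopping at the first one, lets an operator fix a broken configuration file in a single pass.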

Exercise: Extracting Anomalous IPs from Web Server Logs

PYTHON

import datetime
import json
import re
from collections import Counter

# Apache/Nginx common log format
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+)\s+\S+\s+\S+\s+'
    r'\[(?P<time>[^\]]+)\]\s+'
    r'"(?P<request>[^"]+)"\s+'
    r'(?P<status>\d{3})'
)

def find_attackers(log_path: str, threshold: int = 50) -> dict:
    """Detect IPs that receive status 401/403/404 too often."""
    failures = Counter()
    with open(log_path, encoding='utf-8', errors='replace') as f:
        for line in f:
            m = LOG_PATTERN.search(line)
            # Lines that do not match the pattern are skipped
            if m and m.group('status') in ('401', '403', '404'):
                failures[m.group('ip')] += 1
    return {ip: cnt for ip, cnt in failures.items() if cnt >= threshold}

# Save the result as a JSON report
def save_report(attackers: dict, out_path: str):
    report = {
        'generated_at': datetime.datetime.now().isoformat(),
        'total_attackers': len(attackers),
        # Sorted by failure count, highest first
        'attackers': sorted(attackers.items(), key=lambda x: -x[1]),
    }
    with open(out_path, 'w', encoding='utf-8') as f:
        json.dump(report, f, indent=2, ensure_ascii=False)
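
Before running the pattern over a whole file, it can help to check what it captures from a single line. The sketch below repeats LOG_PATTERN so that it runs on its own; the sample log line is hypothetical.

```python
import re

LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+)\s+\S+\s+\S+\s+'
    r'\[(?P<time>[^\]]+)\]\s+'
    r'"(?P<request>[^"]+)"\s+'
    r'(?P<status>\d{3})'
)

# Hypothetical log line in common log format.
line = '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /admin HTTP/1.1" 403 153'
m = LOG_PATTERN.search(line)
fields = m.groupdict() if m else {}
for name, value in fields.items():
    print(f'{name}: {value}')

# A malformed line does not match, so callers must handle None.
no_match = LOG_PATTERN.search('garbage line')
print(no_match)
```

The named groups (ip, time, request, status) are what find_attackers reads through m.group(...), so a quick check like this makes it easier to spot a log format the pattern does not cover.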

Safe File Handling Checklist

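✅ with open() context manager: closes the file automatically
✅ Explicit encoding='utf-8': avoids platform-dependent default encodings
✅ errors='replace': substitutes undecodable bytes with U+FFFD instead of raising UnicodeDecodeError, so one corrupt line does not abort the scan
✅ Path traversal defense: normalize the path with os.path.realpath, then verify that it stays inside the allowed directory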
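
The last checklist item can be sketched as follows. This is a minimal example under stated assumptions: safe_join is a hypothetical helper name, and the base directory is a temporary directory created for the demo. It uses os.path.realpath to resolve symlinks and ".." segments, then os.path.commonpath to confirm that the result is still under the base directory.

```python
import os
import tempfile


def safe_join(base_dir, user_path):
    """Resolve user_path under base_dir, raising ValueError if it escapes."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_path))
    # commonpath compares whole path components, so /var/log2 is not
    # mistaken for being inside /var/log the way a startswith check would.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f'path escapes base directory: {user_path!r}')
    return candidate


# Hypothetical base directory for the demo.
base = tempfile.mkdtemp()
base_real = os.path.realpath(base)

ok = safe_join(base, 'app/access.log')
try:
    safe_join(base, '../../etc/passwd')
    escaped = True
except ValueError:
    escaped = False

os.rmdir(base)
print(ok.startswith(base_real), escaped)
```

An absolute user_path such as /etc/passwd is rejected as well, because os.path.join discards the base when its second argument is absolute and the resolved path then falls outside the base directory.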

AI Prompts

🤖 How to Ask AI Well: Prompts by Model and Strategy

Claude

Free: Sonnet 4.6 / Pro $20/mo: Opus 4.6

Analyze my log-parsing script for
memory usage and path traversal vulnerabilities,
then refactor it into a safe form.

ChatGPT

Free: GPT-5.5 / Plus $20/mo: GPT-5.5 Pro

Compare, with code for each, three practical patterns
for analyzing large logs (1 GB+) quickly:
generator, multiprocessing, and grep + Python.

Gemini

Free: 2.5 Flash / Pro $19.99/mo: 3.1 Pro

Analyze this entire Apache/Nginx log file and
produce a consolidated report covering traffic by time of day,
anomalous IPs, and suspicious User-Agents.

Grok

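Free: Grok 4.1 / SuperGrok $30/mo

In the 2026 SIEM (security information and event management) market,
tell me candidly at what point I should switch from
in-house Python analysis to commercial tools such as Splunk/ELK.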

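⭐ Just Remember This

For File I/O: Working with Logs and Configuration Files, make sure you have these three points down.

1. For large files, use a line-by-line generator instead of readlines() to keep memory usage low.

2. The csv and json standard-library modules are enough to handle IDS logs and configuration files safely.

3. The next chapter uses error handling and logging to build robust security scripts.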
