AI로 자동 요약 & 이메일 본문 작성 시스템, 하루 만에 MVP cheoly's language study blog

SMALL

업무 보고서(PDF/DOCX)를 자동으로 읽고 핵심만 요약한 뒤, 수신자/상황에 맞춘 이메일 초안을 자동 생성하는 파이프라인을 Python과 LLM으로 구현합니다. 파일 파싱 → 청크 분할 → 요약 → 이메일 템플릿 생성 → 발송 전 검토까지 한 번에.

PDF 보고서가 AI 로봇을 거쳐 이메일로 변환되는 과정을 나타낸 일러스트. 중앙의 AI 로봇 아이콘을 중심으로 양쪽에 보고서와 이메일 아이콘이 배치되어 있으며, 아래에는 ‘AI로 자동 요약 & 이메일 본문 작성 시스템, 하루 만에 MVP’라는 문구가 적혀 있는 이미지.

1) 목표

보고서(회의록/리포트)를 투입하면 요약본과 이메일 제목/본문 초안이 자동 생성.
사용자는 검토만 하고 전송.
재현 가능한 CLI 스크립트로 배치 실행(스케줄러 연동).

2) 아키텍처 개요

수집: input/ 폴더의 PDF, DOCX, TXT 로드
파싱: PDF → 텍스트, DOCX → 텍스트
전처리: 문단 기준 분할, 길이 제한(토큰/문자) 맞춰 청킹
요약: LLM으로 청크 요약 → 메타 요약(최종 TL;DR)
이메일 생성: 수신자/맥락/톤을 조건으로 제목+본문 작성
출력: output/yyyymmdd_xxx/summary.md, email_draft.md 저장

input/                  # 원본 보고서 위치
output/
  2025-11-11_0930/      # 실행 시각별 결과 폴더
    summary.md
    email_draft.md
config/
  profile.yaml          # 이메일 톤/수신자/금칙어/서명 등

3) 준비물

Python 3.10+
패키지: pypdf2(또는 pdfminer.six), python-docx, tqdm, pyyaml, (선택) tenacity 재시도
LLM 제공자 SDK (예: OpenAI/Anthropic 등) 또는 로컬 모델(HuggingFace)
→ 코드는 모델-중립 인터페이스로 제공 (플러그 교체식)

pip install PyPDF2 python-docx tqdm pyyaml tenacity
# (사용 LLM에 맞춰 SDK 추가 설치)

4) 핵심 로직

청킹 전략: 문단 기준 800~1200자(한글) 또는 400~800 토큰 단위
요약 2단계: (1) 청크별 압축 요약 → (2) 청크 요약들을 다시 합쳐 메타 요약
이메일 생성: 요약 + 수신자 역할 + 톤(격식/친근/임원 보고) + CTA(다음 액션) 반영

5) 예시 코드 (모델-중립)

아래는 “LLM 클라이언트” 인터페이스만 갈아끼우면 동작하는 구조예요.
실제 호출 부분은 YourLLM 클래스에서 LLM SDK에 맞게 구현하세요.

# file: ai_report_mail_mvp.py
import os, re, glob, datetime, textwrap, yaml
from dataclasses import dataclass
from typing import List
from tqdm import tqdm

# ---------- LLM 인터페이스(여기만 실제 SDK로 교체) ----------
class YourLLM:
    def __init__(self, model_name: str = "your-model"):
        self.model_name = model_name
    def complete(self, prompt: str, max_tokens: int = 1024) -> str:
        # TODO: 여기에 사용 LLM SDK 호출 코드 작성
        # e.g., OpenAI/Anthropic/HF 텍스트 생성
        # return client.generate(prompt, ...)
        raise NotImplementedError("Connect your LLM provider here.")

# ---------- 파일 파서 ----------
def read_txt(path: str) -> str:
    return open(path, "r", encoding="utf-8", errors="ignore").read()

def read_docx(path: str) -> str:
    from docx import Document
    doc = Document(path)
    return "\n".join([p.text for p in doc.paragraphs])

def read_pdf(path: str) -> str:
    from PyPDF2 import PdfReader
    reader = PdfReader(path)
    texts = []
    for page in reader.pages:
        texts.append(page.extract_text() or "")
    return "\n".join(texts)

def load_text(path: str) -> str:
    ext = os.path.splitext(path)[1].lower()
    if ext == ".txt":
        return read_txt(path)
    if ext == ".docx":
        return read_docx(path)
    if ext == ".pdf":
        return read_pdf(path)
    raise ValueError(f"Unsupported file type: {ext}")

# ---------- 전처리/청킹 ----------
def clean_text(s: str) -> str:
    s = s.replace("\r\n", "\n").replace("\r", "\n")
    s = re.sub(r"\n{3,}", "\n\n", s)
    return s.strip()

def split_paragraphs(s: str) -> List[str]:
    paras = [p.strip() for p in s.split("\n\n") if p.strip()]
    return paras

def chunk_by_chars(paras: List[str], max_chars: int = 1200) -> List[str]:
    chunks, buf = [], []
    size = 0
    for p in paras:
        if size + len(p) + 2 > max_chars and buf:
            chunks.append("\n\n".join(buf))
            buf, size = [], 0
        buf.append(p)
        size += len(p) + 2
    if buf:
        chunks.append("\n\n".join(buf))
    return chunks

# ---------- 요약 프롬프트 ----------
SUMMARY_PROMPT = """당신은 간결하고 정확한 전문 비서입니다.
아래 보고서 청크를 5줄 이내 핵심 bullet로 요약하고, 수치/결정/담당자/일정을 보존하세요.
가능하면 '결정사항/리스크/액션아이템' 세 영역으로 나눠주세요.

[청크]
{chunk}
"""

META_SUMMARY_PROMPT = """아래는 보고서의 부분 요약들입니다.
중복을 제거하고 전사에게 공유할 수 있는 최종 요약으로 8줄 이내로 통합하세요.
'핵심 요지', '결정사항', '리스크', '향후 일정/요청사항'을 소제목으로 구분하세요.

[부분 요약들]
{chunk_summaries}
"""

EMAIL_PROMPT = """당신은 한국어 비즈니스 이메일 작성 전문가입니다.
아래 최종 요약을 바탕으로 수신자({recipient_role})에게 보낼 메일 초안을 작성하세요.

조건:
- 제목: 1줄, 50자 이내, 핵심 키워드 포함
- 본문: 6~12줄, 결론 먼저(핵심/결정/요청 → 근거), 불필요한 수식어 금지
- 톤: {tone}
- CTA: 수신자가 취해야 할 다음 행동 2~3개 bullet
- 금칙어: {banned_words}

[최종 요약]
{final_summary}
"""

# ---------- 메인 파이프라인 ----------
@dataclass
class Profile:
    recipient_role: str = "팀장"
    tone: str = "격식 있고 간결하게"
    banned_words: str = "죄송;부탁;최대한;어쨌든"

def summarize_chunks(llm: YourLLM, chunks: List[str]) -> List[str]:
    results = []
    for c in tqdm(chunks, desc="Summarizing chunks"):
        prompt = SUMMARY_PROMPT.format(chunk=c)
        results.append(llm.complete(prompt))
    return results

def meta_summarize(llm: YourLLM, chunk_summaries: List[str]) -> str:
    joined = "\n\n---\n\n".join(chunk_summaries)
    prompt = META_SUMMARY_PROMPT.format(chunk_summaries=joined)
    return llm.complete(prompt, max_tokens=800)

def generate_email(llm: YourLLM, final_summary: str, profile: Profile) -> str:
    prompt = EMAIL_PROMPT.format(
        recipient_role=profile.recipient_role,
        tone=profile.tone,
        banned_words=profile.banned_words,
        final_summary=final_summary,
    )
    return llm.complete(prompt, max_tokens=800)

def ensure_dir(path: str):
    os.makedirs(path, exist_ok=True)

def save_text(path: str, content: str):
    with open(path, "w", encoding="utf-8") as f:
        f.write(content.strip() + "\n")

def main():
    # 1) 설정 로드
    cfg_path = "config/profile.yaml"
    if os.path.exists(cfg_path):
        cfg = yaml.safe_load(open(cfg_path, "r", encoding="utf-8"))
        profile = Profile(**cfg)
    else:
        profile = Profile()

    # 2) 입력 파일 수집
    files = []
    for ext in ("*.pdf", "*.docx", "*.txt"):
        files.extend(glob.glob(os.path.join("input", ext)))
    if not files:
        print("No input files in ./input")
        return

    # 3) LLM 준비 (여기 구현)
    llm = YourLLM(model_name="your-model")

    # 4) 출력 폴더
    stamp = datetime.datetime.now().strftime("%Y-%m-%d_%H%M")
    outdir = os.path.join("output", stamp)
    ensure_dir(outdir)

    # 5) 파일별 처리
    all_chunk_summaries = []
    for fp in files:
        raw = load_text(fp)
        txt = clean_text(raw)
        chunks = chunk_by_chars(split_paragraphs(txt), max_chars=1200)
        chunk_sums = summarize_chunks(llm, chunks)
        # 파일별 요약 저장(옵션)
        save_text(os.path.join(outdir, f"{os.path.basename(fp)}.chunksum.md"),
                  "\n\n---\n\n".join(chunk_sums))
        all_chunk_summaries.extend(chunk_sums)

    # 6) 메타 요약
    final_summary = meta_summarize(llm, all_chunk_summaries)
    save_text(os.path.join(outdir, "summary.md"), final_summary)

    # 7) 이메일 초안 생성
    email_md = generate_email(llm, final_summary, profile)
    save_text(os.path.join(outdir, "email_draft.md"), email_md)

    print(f"Done. See: {outdir}")

if __name__ == "__main__":
    main()

config/profile.yaml 예시

recipient_role: "본부장"
tone: "임원 보고 톤으로 간결하고 단정하게"
banned_words: "죄송;부탁;최대한;아무래도;일단"

6) LLM 연결 힌트

위 YourLLM.complete()에 사용 중인 LLM SDK 호출만 채워 넣으면 됩니다.
토큰 제한이 작은 모델은 max_chars를 낮추세요(예: 800자).

7) 배치/자동화

Windows: 작업 스케줄러에서 python ai_report_mail_mvp.py 매일 08:30 실행
macOS/Linux: crontab -e

30 8 * * 1-5 /usr/bin/python3 /path/ai_report_mail_mvp.py

8) 품질 팁

금칙어/톤/CTA는 반드시 profile.yaml로 관리(조직별 가이드 반영).
민감정보(인명/금액)는 요약 보존 규칙을 프롬프트에 명시.
요약 정확도 검증을 위해 샘플 원문 → 요약 대조 체크리스트 운용.

9) 확장 아이디어

수신자별 템플릿(영업/개발/임원) 스위치
다국어 이메일(ko→en) 동시 생성
메일 API 연동(초안 자동 업로드까지만, 발송은 사람 확인 후)

LIST

저작자표시 (새창열림)

'파이썬' 카테고리의 다른 글

파이썬 머신러닝 기초: 입문자가 꼭 알아야 할 핵심 개념 정리 (0)	2025.11.18
AI로 자동 보고서 요약 & 이메일 작성 시스템 구상기— 내일은 이걸 실제로 만들어본다! (0)	2025.11.11
퇴근 후 30분, 자동화로 하루를 정리하자 — 엑셀·PDF·이메일까지 한 번에! (0)	2025.11.10
🐍 파이썬 자동화, 완전 정복! (4) - 웹 크롤링을 넘어선 브라우저 조작 자동화 (Selenium/Playwright) (0)	2025.11.09
💻 파이썬 자동화, 완전 정복! (3) - 윈도우/리눅스 작업 스케줄러 등록 실전 가이드 (0)	2025.11.08

« 2026/02 »
일	월	화	수	목	금	토
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28

1) 목표

2) 아키텍처 개요

3) 준비물

4) 핵심 로직

5) 예시 코드 (모델-중립)

6) LLM 연결 힌트

7) 배치/자동화

8) 품질 팁

9) 확장 아이디어

'파이썬' 카테고리의 다른 글

티스토리툴바