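Building Rate Limiting Monitoring with throttled-py and FastAPI

This post is planned as part of a series.

Introduction

Applying rate limiting to an API server is not hard. Answering "which users are being limited right now, and by how much?" or "is the rate limit on this endpoint set appropriately?" in production is a separate problem.

throttled-py is preparing built-in observability based on OpenTelemetry. This post wires throttled-py into a FastAPI project and walks through viewing rate limiting metrics on a Prometheus + Grafana dashboard.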

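What this post covers:

throttled-py's hook system and the structure of OTelHook
Setting up an OTel metrics pipeline in a FastAPI project
Building a rate limiting dashboard with Prometheus + Grafana
Example alert rules that are useful in practice

What this post does not cover:

How rate limiting algorithms themselves work
OpenTelemetry's trace and log signals (metrics only)
Deployment on Kubernetes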

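1. Why rate limiting needs observability

Once a rate limiter is in place, questions like these tend to come up:

"429s just spiked. Which keys are they coming from?"
"I set the limit to 100 per minute. How do I know that is the right value?"
"If a particular user keeps getting limited, is that a legitimate user or abusive traffic?"

Logs alone make these questions hard to answer. Patterns only become visible with aggregated metrics and visualization.

2. Architecture overview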

┌──────────────────────────────────────────────────┐
│ FastAPI Application                              │
│                                                  │
│  Request → Throttled(hooks=[OTelHook()])         │
│                    │                             │
│                    ▼                             │
│            OTel Meter API                        │
│            (throttled.requests, duration)        │
│                    │                             │
│            OTel SDK (MeterProvider)              │
│                    │                             │
│            PrometheusMetricReader                │
│                    │                             │
│            :9464/metrics                         │
└────────────────────┬─────────────────────────────┘
                     │ Prometheus scrape
                     ▼
              ┌──────────────┐
              │  Prometheus  │
              └──────┬───────┘
                     │
                     ▼
              ┌──────────────┐
              │   Grafana    │
              │  Dashboard   │
              └──────────────┘

The key idea is layer separation:

throttled-py: decides what to measure (depends only on opentelemetry-api)
Application: decides where to send it (opentelemetry-sdk + exporter configuration)
Infrastructure: stores and visualizes (Prometheus, Grafana)
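
The layering above can be sketched with the standard library alone. Every name in this sketch (`Hook`, `RecordingHook`, `FixedBudgetLimiter`) is hypothetical and is not throttled-py's API; it only illustrates that the limiter calls a hook, the hook decides what to record, and the caller decides where the records end up.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class Hook(Protocol):
    """Hypothetical hook interface: called once per rate limit check."""

    def on_check(self, key: str, cost: int, allowed: bool) -> None: ...


@dataclass
class RecordingHook:
    """Decides WHAT to measure; the injected sink decides WHERE it goes."""

    sink: Callable[[str, int, dict], None]

    def on_check(self, key: str, cost: int, allowed: bool) -> None:
        result = "allowed" if allowed else "denied"
        self.sink("throttled.requests", cost, {"key": key, "result": result})


@dataclass
class FixedBudgetLimiter:
    """Toy limiter (assumption): each key may spend `budget` tokens in total."""

    budget: int
    hooks: list = field(default_factory=list)
    spent: dict = field(default_factory=dict)

    def check(self, key: str, cost: int = 1) -> bool:
        used = self.spent.get(key, 0)
        allowed = used + cost <= self.budget
        if allowed:
            self.spent[key] = used + cost
        # The limiter knows nothing about metrics; it only notifies hooks.
        for hook in self.hooks:
            hook.on_check(key, cost, allowed)
        return allowed


# The application picks the destination; here it is just an in-memory list.
records: list = []
limiter = FixedBudgetLimiter(
    budget=2,
    hooks=[RecordingHook(sink=lambda name, value, attrs: records.append((name, value, attrs)))],
)
decisions = [limiter.check("user-1") for _ in range(3)]
```

Swapping the lambda for a function that writes to another destination changes nothing in the limiter or the hook, which is the property the three layers rely on.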

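3. Project setup

Installing dependencies

pip install fastapi uvicorn
pip install "throttled-py[otel]"    # throttled-py + OTelHook (the PyPI package name is throttled-py)
pip install "throttled-py[redis]"   # Redis store used by the rate limiter
pip install opentelemetry-sdk
pip install opentelemetry-exporter-prometheus
pip install prometheus-client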

Project structure

my-api/
├── app/
│   ├── main.py            # FastAPI app entry point
│   ├── observability.py   # OTel metrics pipeline setup
│   ├── rate_limit.py      # Throttled instance
│   └── api/
│       └── routes.py
├── infra/
│   ├── docker-compose.yml # Prometheus + Grafana
│   ├── prometheus.yml
│   └── grafana/
│       └── dashboard.json
└── requirements.txt

4. Code

4.1. Initializing the OTel metrics pipeline

# app/observability.py
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.metrics import set_meter_provider
from prometheus_client import start_http_server


def setup_metrics():
    """Configure the OTel metrics pipeline.

    Call this exactly once at application startup.
    It exposes the endpoint that Prometheus scrapes on :9464.
    """
    resource = Resource.create({
        "service.name": "my-api",
        "service.version": "1.0.0",
        "deployment.environment": "production",
    })

    reader = PrometheusMetricReader()
    provider = MeterProvider(resource=resource, metric_readers=[reader])
    set_meter_provider(provider)

    # Prometheus scrape endpoint
    start_http_server(port=9464, addr="0.0.0.0")

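4.2. Configuring the rate limiter

# app/rate_limit.py
from throttled import Throttled, rate_limiter, store
from throttled.contrib.otel import OTelHook

# Attaching OTelHook is all it takes to start collecting metrics.
# Which metrics and attributes are recorded is decided inside OTelHook.
throttle = Throttled(
    # `using` selects the algorithm, not the storage backend, so Redis is
    # passed through `store`. The server URL here is an example value.
    store=store.RedisStore(server="redis://127.0.0.1:6379/0"),
    quota=rate_limiter.per_min(100),  # 100 per minute
    hooks=[OTelHook()],
)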

4.3. Using it in a FastAPI route

# app/api/routes.py
import math

from fastapi import APIRouter, HTTPException, Request
from app.rate_limit import throttle

router = APIRouter()

# Declared with plain `def` so FastAPI runs the blocking Redis call in its threadpool.
@router.get("/api/resource")
def get_resource(request: Request):
    # Use the client IP or a user ID as the key
    key = request.client.host if request.client else "unknown"
    result = throttle.limit(key)

    if result.limited:
        raise HTTPException(
            status_code=429,
            detail="Too many requests",
            # retry_after is a number of seconds; round up to a whole second
            headers={"Retry-After": str(math.ceil(result.state.retry_after))},
        )

    return {"data": "ok"}
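
The route above makes two small decisions that are easy to get subtly wrong: which key to limit on, and how to turn a retry delay into a `Retry-After` value. A minimal sketch of both follows; the helper names `rate_limit_key` and `retry_after_header` are hypothetical and exist only for illustration.

```python
import math
from typing import Optional


def rate_limit_key(user_id: Optional[str], client_ip: Optional[str]) -> str:
    """Prefer a stable user ID; fall back to the client IP, then a shared bucket."""
    if user_id:
        return f"user:{user_id}"
    if client_ip:
        return f"ip:{client_ip}"
    return "anonymous"


def retry_after_header(seconds: float) -> str:
    """Round up to whole seconds so a client never retries too early.

    Retry-After expects a non-negative integer number of seconds.
    """
    return str(max(0, math.ceil(seconds)))
```

Prefixing the key with its kind (`user:` or `ip:`) keeps a user ID from colliding with an IP address that happens to have the same text, and makes the `key` attribute easier to read on a dashboard.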

4.4. Application entry point

# app/main.py
from contextlib import asynccontextmanager
from fastapi import FastAPI
from app.observability import setup_metrics
from app.api.routes import router


@asynccontextmanager
async def lifespan(app: FastAPI):
    setup_metrics()
    yield

app = FastAPI(lifespan=lifespan)
app.include_router(router)

5. Infrastructure (Prometheus + Grafana)

docker-compose.yml

version: "3.8"
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    # Needed on Linux so that host.docker.internal resolves to the host
    extra_hosts:
      - "host.docker.internal:host-gateway"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
prometheus.yml

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "my-api"
    static_configs:
      - targets: ["host.docker.internal:9464"]

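6. Metrics collected by OTelHook

OTelHook produces two metrics:

| Metric | Type | Description |
| --- | --- | --- |
| `throttled.requests` | Counter | Tokens (cost) consumed by rate limit checks, split by allowed/denied |
| `throttled.duration` | Histogram | Time taken by a rate limit check |

Each metric carries the following attributes:

| Attribute | Example | Purpose |
| --- | --- | --- |
| `key` | `"192.168.1.1"` | Which key is being limited |
| `algorithm` | `"gcra"` | Which algorithm is in use |
| `store_type` | `"redis"` | Which store is in use |
| `result` | `"allowed"` / `"denied"` | Whether the check was allowed or denied |

Note that throttled.requests records the **number of tokens consumed (cost)**, not a plain call count. Usage stays accurate even when a single API call consumes several tokens; when every call has a cost of 1, the two numbers coincide.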
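
The difference between counting calls and counting tokens is easy to see on a few hand-written checks. The sketch below uses only the standard library; the sample data is made up, and the two counters merely mimic the behaviour described above rather than reproducing OTelHook's implementation.

```python
from collections import Counter

# (key, cost, allowed) tuples standing in for individual rate limit checks.
checks = [
    ("user-1", 1, True),
    ("user-1", 5, True),   # one call that consumes 5 tokens
    ("user-2", 3, False),
]

call_count = Counter()
token_count = Counter()
for key, cost, allowed in checks:
    result = "allowed" if allowed else "denied"
    call_count[(key, result)] += 1       # plain number of checks
    token_count[(key, result)] += cost   # cost-weighted, as described for throttled.requests
```

Here `user-1` made 2 allowed calls but consumed 6 tokens, so a dashboard built on the cost-weighted counter shows quota usage rather than HTTP request volume.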

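7. Building the Grafana dashboard

7.1. Allowed and denied requests per minute

# The counter is cost-weighted, so these are token counts;
# they equal request counts when every call has a cost of 1.

# Allowed requests per minute
sum(rate(throttled_requests_total{result="allowed"}[5m])) * 60

# Denied requests per minute
sum(rate(throttled_requests_total{result="denied"}[5m])) * 60

# Denial rate (%)
sum(rate(throttled_requests_total{result="denied"}[5m]))
/
sum(rate(throttled_requests_total[5m]))
* 100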
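
To make the arithmetic behind these queries concrete, here is a simplified worked example in Python. It assumes two hypothetical counter samples taken exactly 5 minutes apart and ignores what the real `rate()` function also handles, namely counter resets and extrapolation to the window boundaries.

```python
def per_second_rate(first: float, last: float, window_seconds: float) -> float:
    """Average per-second increase of a counter between two samples."""
    return (last - first) / window_seconds


# Hypothetical counter values at the start and end of a 300 s (5 minute) window.
window = 300.0
allowed_rate = per_second_rate(first=1_000, last=1_900, window_seconds=window)
denied_rate = per_second_rate(first=50, last=350, window_seconds=window)

# Multiplying the per-second rate by 60 gives the per-minute figure.
allowed_per_minute = allowed_rate * 60
denied_per_minute = denied_rate * 60

# Denied share of all checks, as a percentage.
denial_percent = denied_rate / (allowed_rate + denied_rate) * 100
```

With these samples the allowed rate is 3 per second (180 per minute), the denied rate is 1 per second (60 per minute), and the denial rate is 25%.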

7.2. Top 10 keys by denials

topk(10,
  sum by (key) (
    rate(throttled_requests_total{result="denied"}[5m])
  )
)

This shows at a glance which users or IPs are being limited the most.
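
The query combines two steps: `sum by (key)` collapses every label except `key`, and `topk` keeps the largest results. A rough Python equivalent is sketched below; the `instance` label and all the numbers are made up for illustration.

```python
import heapq
from collections import defaultdict

# Hypothetical per-series denied rates (per second), labelled by (key, instance).
series = {
    ("10.0.0.1", "api-1"): 0.5,
    ("10.0.0.2", "api-1"): 2.0,
    ("10.0.0.2", "api-2"): 2.5,
    ("10.0.0.3", "api-1"): 1.0,
}

# sum by (key): collapse every other label.
by_key = defaultdict(float)
for (key, _instance), rate in series.items():
    by_key[key] += rate

# topk(2, ...): keep the two largest sums.
top2 = heapq.nlargest(2, by_key.items(), key=lambda item: item[1])
```

Without the `sum by (key)` step, the same client hitting two application instances would show up as two separate, smaller entries and could drop out of the top list.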

7.3. Rate limit check latency (p99)

histogram_quantile(0.99,
  sum by (le) (
    rate(throttled_duration_seconds_bucket[5m])
  )
)

This shows whether Redis connection problems or degraded performance are affecting the rate limit check itself.
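
Because the p99 is estimated from bucket counts rather than from raw observations, it helps to see how the estimate is formed. The following is a simplified sketch of the linear interpolation that `histogram_quantile` performs on cumulative buckets; the bucket boundaries and counts are hypothetical.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch: observations are assumed to be spread evenly
    inside each bucket, and the first bucket is assumed to start at 0.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative buckets in seconds, as exposed via the `le` label.
buckets = [(0.005, 900), (0.01, 980), (0.025, 995), (0.05, 999), (math.inf, 1000)]
p99 = histogram_quantile(0.99, buckets)
p9995 = histogram_quantile(0.9995, buckets)
```

For these buckets the p99 lands inside the 10 ms to 25 ms bucket and is estimated at about 20 ms. The precision of the estimate is bounded by the bucket boundaries, so a threshold such as 50 ms is only meaningful if a bucket boundary sits near it.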

8. Alert rules that are useful in practice

When the denial rate spikes

# Prometheus alerting rule
groups:
  - name: rate_limiting
    rules:
      - alert: HighRateLimitDenialRate
        expr: |
          sum(rate(throttled_requests_total{result="denied"}[5m]))
          /
          sum(rate(throttled_requests_total[5m]))
          > 0.3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Rate limit denial rate exceeded 30%"

When the rate limit check itself slows down

      - alert: SlowRateLimitCheck
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(throttled_duration_seconds_bucket[5m]))
          ) > 0.05
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "Rate limit check p99 latency exceeded 50ms"

This alert is usually an early signal of Redis connection problems.
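
Both rules rely on the `for:` clause, which keeps a short spike from paging anyone: the expression has to stay true across consecutive evaluations for the whole duration. A minimal Python sketch of that behaviour follows; the function name `alert_fires` and the sample series are hypothetical, and details such as evaluation intervals and missing data are left out.

```python
def alert_fires(samples: list, threshold: float, for_seconds: float) -> bool:
    """Return True if the value stayed above threshold for at least for_seconds.

    samples is a time-ordered list of (timestamp_seconds, value) evaluations.
    """
    pending_since = None
    for timestamp, value in samples:
        if value > threshold:
            if pending_since is None:
                pending_since = timestamp  # condition became true: start the timer
            if timestamp - pending_since >= for_seconds:
                return True
        else:
            pending_since = None  # condition cleared: the timer resets
    return False


# Denial ratio evaluated once a minute; the threshold 0.3 and 5 minutes match the first rule.
spike = [(0, 0.1), (60, 0.4), (120, 0.5), (180, 0.2), (240, 0.4)]
sustained = [(t, 0.4) for t in range(0, 361, 60)]

spike_fires = alert_fires(spike, threshold=0.3, for_seconds=300)
sustained_fires = alert_fires(sustained, threshold=0.3, for_seconds=300)
```

The spike crosses the threshold twice but never for five minutes in a row, so it does not fire; the sustained series does.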

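9. If you use a different observability tool

This post used Prometheus + Grafana, but because OTelHook uses only the OpenTelemetry API, changing the MeterProvider configuration is enough to send the metrics to any backend.

Logfire: the simplest option. A single logfire.configure() call registers a MeterProvider automatically.

Datadog: run the app under ddtrace-run with OTLP metrics enabled and it works without code changes.

OTel Collector: export to a Collector with the OTLP exporter and let the Collector route to multiple backends. This is the most flexible option in production.

The only thing that changes is the MeterProvider setup in observability.py; hooks=[OTelHook()] always stays the same.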

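Closing thoughts

Rate limiting is not "set it and forget it"; it needs continuous tuning. Without adequate observability, you cannot tell whether a limit is too generous or too strict.

throttled-py's OTelHook provides the minimum instrumentation for that observability at the library level. The library decides "what to measure", and the server developer only has to decide "where to send it". OpenTelemetry's API/SDK split is what lets this separation of concerns fall out naturally.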