Rate Limiting

Prevent abuse and control costs with throttles, quotas, budgets, and response caching.

Throttle

Limit the number of calls within a sliding time window:

{
  "type": "throttle",
  "config": {
    "max": 100,
    "window": "hour"
  }
}

Supported windows: second, minute, hour, day
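
To make the sliding-window behavior concrete, here is a minimal Python sketch of one way such a throttle could work. It is an illustration under assumptions, not the product's implementation: the class name SlidingWindowThrottle, the allow method, and the explicit now argument (seconds) are all hypothetical.

```python
from collections import deque

# Window names from the list above, mapped to seconds.
WINDOW_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


class SlidingWindowThrottle:
    """Hypothetical helper: at most `max_calls` calls in any trailing window."""

    def __init__(self, max_calls, window):
        self.max_calls = max_calls
        self.span = WINDOW_SECONDS[window]
        self.calls = deque()  # timestamps of accepted calls, oldest first

    def allow(self, now):
        # Drop timestamps that have slid out of the trailing window.
        while self.calls and now - self.calls[0] >= self.span:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False  # limit reached, reject this call
        self.calls.append(now)
        return True


# Small numbers so the behavior is easy to follow: 2 calls per minute.
throttle = SlidingWindowThrottle(max_calls=2, window="minute")
results = [throttle.allow(t) for t in (0, 10, 20, 61, 71)]
# results == [True, True, False, True, True]
```

The call at t=20 is rejected because two calls already fall inside the trailing minute. By t=61 the call from t=0 has left the window, so capacity frees up gradually rather than all at once.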

Per-tool throttling

Apply different limits to different tools:

{
  "type": "throttle",
  "config": {
    "tool": "send_email",
    "max": 10,
    "window": "hour"
  }
}

Quota

Set a hard cap that resets on a schedule:

{
  "type": "quota",
  "config": {
    "max": 1000,
    "period": "day"
  }
}
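
The difference from a throttle is that the counter resets all at once when a new period starts. The Python sketch below illustrates that idea. It assumes periods are aligned to fixed boundaries (multiples of the period length), which this page does not state; the Quota class and its allow method are hypothetical names.

```python
# Only the period used in the example above; other names would be added the same way.
PERIOD_SECONDS = {"day": 86400}


class Quota:
    """Hypothetical helper: a hard cap whose counter resets each period."""

    def __init__(self, max_calls, period):
        self.max_calls = max_calls
        self.span = PERIOD_SECONDS[period]
        self.bucket = None  # index of the period currently being counted
        self.used = 0

    def allow(self, now):
        # Assumption: periods are aligned to multiples of the period length.
        bucket = int(now // self.span)
        if bucket != self.bucket:
            # A new period started, so the counter resets in one step.
            self.bucket = bucket
            self.used = 0
        if self.used >= self.max_calls:
            return False
        self.used += 1
        return True


quota = Quota(max_calls=2, period="day")
outcomes = [quota.allow(t) for t in (86000, 86100, 86200, 86400)]
# outcomes == [True, True, False, True]
```

The third call is rejected because the cap for the first day is used up. The fourth call lands exactly on the next period boundary and is accepted against a fresh counter.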

Per-recipient quota

Limit calls per unique value of a parameter, named by key:

{
  "type": "quota",
  "config": {
    "tool": "send_notification",
    "max": 5,
    "period": "day",
    "key": "recipient_id"
  }
}

Each recipient can receive at most 5 notifications per day.
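
A keyed quota can be pictured as one counter per distinct value of the key parameter. The Python sketch below shows that idea only. KeyedQuota, allow, and reset are hypothetical names, and the period rollover is reduced to a manual reset call.

```python
from collections import defaultdict


class KeyedQuota:
    """Hypothetical helper: a separate counter for each value of one parameter."""

    def __init__(self, max_calls, key):
        self.max_calls = max_calls
        self.key = key  # name of the call parameter to count by
        self.used = defaultdict(int)

    def allow(self, params):
        value = params[self.key]
        if self.used[value] >= self.max_calls:
            return False
        self.used[value] += 1
        return True

    def reset(self):
        # Would run when the period rolls over (for example, once a day).
        self.used.clear()


quota = KeyedQuota(max_calls=5, key="recipient_id")
alice = [quota.allow({"recipient_id": "alice"}) for _ in range(6)]
bob = quota.allow({"recipient_id": "bob"})
# alice == [True, True, True, True, True, False]
# bob is True, because each recipient has an independent counter
```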

Budget

Track and limit spend based on estimated costs:

{
  "type": "budget",
  "config": {
    "max_daily": 50.00,
    "cost_per_call": {
      "query_llm": 0.02,
      "generate_image": 0.10,
      "*": 0.001
    }
  }
}
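
As a rough illustration of how estimated spend could accumulate against max_daily, here is a Python sketch using the prices from the config above. Two things are assumptions: that "*" is the fallback price for tools without their own entry, and that a call is rejected when it would push spend past the cap. Budget, charge, and cost_of are hypothetical names.

```python
from decimal import Decimal

# Prices copied from the config above; Decimal avoids float rounding drift.
COST_PER_CALL = {
    "query_llm": Decimal("0.02"),
    "generate_image": Decimal("0.10"),
    "*": Decimal("0.001"),
}
MAX_DAILY = Decimal("50.00")


def cost_of(tool):
    # Assumption: "*" prices any tool that has no entry of its own.
    return COST_PER_CALL.get(tool, COST_PER_CALL["*"])


class Budget:
    """Hypothetical helper: tracks estimated spend for the current day."""

    def __init__(self, max_daily):
        self.max_daily = max_daily
        self.spent = Decimal("0")

    def charge(self, tool):
        cost = cost_of(tool)
        # Assumption: a call that would exceed the cap is rejected.
        if self.spent + cost > self.max_daily:
            return False
        self.spent += cost
        return True


budget = Budget(MAX_DAILY)
for tool in ("query_llm", "generate_image", "get_weather"):
    budget.charge(tool)
# budget.spent == Decimal("0.121")  (0.02 + 0.10 + 0.001)
```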

Reuse (Caching)

Cache responses to avoid redundant calls:

{
  "type": "reuse",
  "config": {
    "tool": "get_weather",
    "ttl": "5m",
    "key": ["city"]
  }
}

Requests for the same city within 5 minutes return the cached response.
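
The role of key and ttl can be sketched in Python as a small time-to-live cache. This is an illustration, not the actual caching layer: ReuseCache, its call method, and fake_get_weather are hypothetical, and the "5m" string is assumed to mean 300 seconds and is passed in as a number.

```python
class ReuseCache:
    """Hypothetical helper: caches responses per combination of key fields."""

    def __init__(self, ttl_seconds, key_fields):
        self.ttl = ttl_seconds
        self.key_fields = key_fields
        self.entries = {}  # cache key -> (stored_at, response)

    def call(self, params, fetch, now):
        # Only the listed fields form the cache key; other params are ignored.
        cache_key = tuple(params[field] for field in self.key_fields)
        hit = self.entries.get(cache_key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh, skip the real call
        response = fetch(params)
        self.entries[cache_key] = (now, response)
        return response


calls = []  # records every real (uncached) call


def fake_get_weather(params):
    # Stand-in for the real tool, used only for this illustration.
    calls.append(params["city"])
    return f"forecast #{len(calls)} for {params['city']}"


cache = ReuseCache(ttl_seconds=300, key_fields=["city"])
first = cache.call({"city": "Oslo"}, fake_get_weather, now=0)
second = cache.call({"city": "Oslo"}, fake_get_weather, now=299)
third = cache.call({"city": "Oslo"}, fake_get_weather, now=300)
# first == second, and the third call fetches again once the TTL has elapsed
```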

Example: API cost control

{
  "name": "LLM cost control",
  "type": "budget",
  "config": {
    "max_daily": 100.00,
    "alert_threshold": 80.00,
    "cost_per_call": {
      "query_gpt4": 0.03,
      "query_claude": 0.02,
      "generate_embedding": 0.0001
    }
  }
}
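
To get a feel for what this budget allows, the Python sketch below works out how many calls each tool could make if the whole daily budget went to that tool alone. It only does arithmetic on the numbers in the config above. The calls_within helper is hypothetical, and the behavior of alert_threshold is not modeled.

```python
from decimal import Decimal

MAX_DAILY = Decimal("100.00")
COST_PER_CALL = {
    "query_gpt4": Decimal("0.03"),
    "query_claude": Decimal("0.02"),
    "generate_embedding": Decimal("0.0001"),
}


def calls_within(limit, cost):
    # Whole number of calls whose total estimated cost stays within the limit.
    return int(limit // cost)


max_calls = {tool: calls_within(MAX_DAILY, cost) for tool, cost in COST_PER_CALL.items()}
# max_calls == {
#     "query_gpt4": 3333,
#     "query_claude": 5000,
#     "generate_embedding": 1000000,
# }
```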