Rate Limiting
Prevent abuse and control costs with throttling and quotas.
Throttle
Limit the number of calls within a sliding time window:
{
  "type": "throttle",
  "config": {
    "max": 100,
    "window": "hour"
  }
}
Supported windows: second, minute, hour, day
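To make the sliding-window behavior concrete, here is a minimal Python sketch of how such a throttle could work. It is an illustration only, not the product's implementation: the class name `SlidingWindowThrottle`, the `allow` method, and the choice to pass the current time in explicitly are all assumptions made for this example.

```python
from collections import deque

# Window lengths in seconds for the documented window names.
WINDOW_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


class SlidingWindowThrottle:
    """Hypothetical helper: allow at most `max_calls` calls in any trailing window."""

    def __init__(self, max_calls: int, window: str) -> None:
        self.max_calls = max_calls
        self.window_seconds = WINDOW_SECONDS[window]
        self.timestamps = deque()  # times of accepted calls, oldest first

    def allow(self, now: float) -> bool:
        # Drop accepted calls that have aged out of the trailing window.
        cutoff = now - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        # Reject the call if the window is already full.
        if len(self.timestamps) >= self.max_calls:
            return False
        self.timestamps.append(now)
        return True
```

Because the window trails the current call instead of resetting at a fixed boundary, a burst at the end of one hour still counts against calls made early in the next.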
Per-tool throttling
Apply different limits to different tools:
{
  "type": "throttle",
  "config": {
    "tool": "send_email",
    "max": 10,
    "window": "hour"
  }
}
Quota
Hard caps that reset on a schedule:
{
  "type": "quota",
  "config": {
    "max": 1000,
    "period": "day"
  }
}
Per-recipient quota
Limit calls per unique value of a parameter:
{
  "type": "quota",
  "config": {
    "tool": "send_notification",
    "max": 5,
    "period": "day",
    "key": "recipient_id"
  }
}
Each recipient can only receive 5 notifications per day.
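The keyed quota can be pictured as one counter per distinct value of the `key` parameter, cleared when the period rolls over. The Python sketch below illustrates that idea under stated assumptions: the class `KeyedDailyQuota` is hypothetical, and the reset at UTC midnight is a guess, since the reset schedule is not specified here.

```python
from collections import defaultdict
from datetime import datetime, timezone


class KeyedDailyQuota:
    """Hypothetical helper: a hard cap per unique value of one call parameter."""

    def __init__(self, max_calls: int, key: str) -> None:
        self.max_calls = max_calls
        self.key = key
        self.counts = defaultdict(int)  # parameter value -> calls today
        self.current_day = None

    def allow(self, params: dict, now: datetime) -> bool:
        # Assumption: the "day" period resets at UTC midnight.
        day = now.astimezone(timezone.utc).date()
        if day != self.current_day:
            self.current_day = day
            self.counts.clear()
        value = params[self.key]
        # Unlike a throttle, a quota is a hard cap until the next reset.
        if self.counts[value] >= self.max_calls:
            return False
        self.counts[value] += 1
        return True
```

With `max_calls=5` and `key="recipient_id"`, a sixth call for the same recipient on the same day is rejected, while calls for other recipients are unaffected.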
Budget
Track and limit spend based on estimated costs:
{
  "type": "budget",
  "config": {
    "max_daily": 50.00,
    "cost_per_call": {
      "query_llm": 0.02,
      "generate_image": 0.10,
      "*": 0.001
    }
  }
}
Reuse (Caching)
Cache responses to avoid redundant calls:
{
  "type": "reuse",
  "config": {
    "tool": "get_weather",
    "ttl": "5m",
    "key": ["city"]
  }
}
Identical requests within 5 minutes return the cached response.
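As a rough illustration of the caching behavior, the Python sketch below builds a cache key from the parameters listed in `key` and reuses a stored response until the TTL expires. The names `parse_ttl` and `ReuseCache` are hypothetical, and the set of TTL suffixes (`s`, `m`, `h`, `d`) is an assumption: only `"5m"` appears in the example above.

```python
import re

# Assumed TTL suffixes; only "5m" is shown in the documentation.
TTL_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_ttl(ttl: str) -> int:
    """Convert a string such as "5m" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", ttl)
    if match is None:
        raise ValueError(f"unsupported ttl: {ttl!r}")
    return int(match.group(1)) * TTL_UNITS[match.group(2)]


class ReuseCache:
    """Hypothetical helper: reuse responses for calls with the same key fields."""

    def __init__(self, ttl: str, key_fields: list) -> None:
        self.ttl_seconds = parse_ttl(ttl)
        self.key_fields = key_fields
        self.entries = {}  # cache key -> (stored_at, response)

    def call(self, params: dict, fetch, now: float):
        # Only the configured key fields decide whether two calls match.
        cache_key = tuple(params[field] for field in self.key_fields)
        hit = self.entries.get(cache_key)
        if hit is not None and now - hit[0] < self.ttl_seconds:
            return hit[1]  # still fresh: skip the real call
        response = fetch(params)
        self.entries[cache_key] = (now, response)
        return response
```

In this sketch, two `get_weather` calls for the same `city` made less than 300 seconds apart trigger only one real call; a different `city` gets its own entry.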
Example: API cost control
{
  "name": "LLM cost control",
  "type": "budget",
  "config": {
    "max_daily": 100.00,
    "alert_threshold": 80.00,
    "cost_per_call": {
      "query_gpt4": 0.03,
      "query_claude": 0.02,
      "generate_embedding": 0.0001
    }
  }
}
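One way to read a budget rule like this is as a running total of estimated spend that is checked before each call. The Python sketch below illustrates that reading and rests on several assumptions not confirmed here: `alert_threshold` is treated as an absolute spend amount in the same units as `max_daily`, a tool with no listed cost and no `"*"` entry is treated as free, and the daily reset is left out. The class `DailyBudget` and its method names are hypothetical.

```python
from decimal import Decimal


class DailyBudget:
    """Hypothetical helper: track estimated spend against a daily cap."""

    def __init__(self, config: dict) -> None:
        # Decimal avoids float rounding drift when summing many small costs.
        self.max_daily = Decimal(str(config["max_daily"]))
        threshold = config.get("alert_threshold")
        self.alert_threshold = None if threshold is None else Decimal(str(threshold))
        self.costs = {
            tool: Decimal(str(cost))
            for tool, cost in config["cost_per_call"].items()
        }
        self.spent = Decimal("0")

    def cost_of(self, tool: str) -> Decimal:
        # "*" is the fallback price; assumption: unlisted tools without it cost 0.
        if tool in self.costs:
            return self.costs[tool]
        return self.costs.get("*", Decimal("0"))

    def charge(self, tool: str) -> str:
        cost = self.cost_of(tool)
        # Block the call if it would push spend past the daily cap.
        if self.spent + cost > self.max_daily:
            return "blocked"
        self.spent += cost
        # Assumption: the threshold is an absolute amount, not a percentage.
        if self.alert_threshold is not None and self.spent >= self.alert_threshold:
            return "alert"
        return "ok"
```

Under these assumptions, the configuration above would start returning `"alert"` once estimated spend reaches 80.00 and `"blocked"` for any call that would take it past 100.00.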