From onenote-pack
Implements token bucket rate limiter and queue throttling for OneNote Graph API to handle 429 errors and per-user/tenant limits in high-throughput apps.
Install:

```shell
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin onenote-pack
```
Microsoft Graph rate limits OneNote at 600 requests per 60 seconds per user and 10,000 requests per 10 minutes per app/tenant. When you exceed either limit, the API returns 429 Too Many Requests with a Retry-After header specifying how many seconds to wait. Most implementations either ignore this header entirely (retrying immediately, making things worse) or use a fixed backoff that wastes capacity.
This skill implements a token bucket rate limiter, queue-based request throttling, and proper Retry-After header parsing. For multi-user apps, it tracks per-user and per-tenant budgets independently.
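The wrong-unit mistake is worth pinning down before the full implementations below. Per RFC 9110, `Retry-After` carries either integer seconds or an HTTP-date; a minimal sketch of parsing both forms into milliseconds (`parseRetryAfterMs` is an illustrative name, not part of the Graph SDK):

```typescript
// Sketch: convert a Retry-After header value into a millisecond delay.
// Handles both forms RFC 9110 allows: integer seconds and an HTTP-date.
function parseRetryAfterMs(headerValue: string | undefined, fallbackMs = 30_000): number {
  if (!headerValue) return fallbackMs;
  const seconds = Number(headerValue);
  if (Number.isFinite(seconds)) {
    // The value is in SECONDS, not milliseconds
    return Math.max(0, seconds) * 1000;
  }
  // HTTP-date form, e.g. "Wed, 21 Oct 2025 07:28:00 GMT"
  const date = Date.parse(headerValue);
  if (!Number.isNaN(date)) {
    return Math.max(0, date - Date.now());
  }
  return fallbackMs;
}
```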
Key pain points addressed:

- The `Retry-After` header value is in seconds (not milliseconds); many implementations parse this wrong
- Batch requests (`$batch`) count as one request toward the limit, regardless of how many operations are inside

Required permission scope: `Notes.ReadWrite`

Install dependencies:

```shell
pip install msgraph-sdk azure-identity
npm install @microsoft/microsoft-graph-client @azure/identity @azure/msal-node
npm install p-queue  # for production queue management
```

| Limit | Scope | Window | Threshold |
|---|---|---|---|
| Per-user | Single user's delegated token | 60 seconds (rolling) | 600 requests |
| Per-tenant | All users + all apps in the tenant | 10 minutes (rolling) | 10,000 requests |
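A quick sanity check on what the two limits in the table imply together (a back-of-envelope sketch; the variable names are illustrative):

```typescript
// How many users can run at the full per-user rate before the
// tenant-wide limit becomes the bottleneck?
const perUserPerSec = 600 / 60;        // 10 req/s per user
const perTenantPerSec = 10_000 / 600;  // ~16.7 req/s across the whole tenant
const usersAtFullRate = perTenantPerSec / perUserPerSec;

console.log(usersAtFullRate.toFixed(2)); // "1.67"
```

In other words, fewer than two users running at the per-user maximum will saturate the tenant budget, which is why multi-user apps need the per-tenant bucket shown later.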
When either limit is hit:
- `429 Too Many Requests` status
- `Retry-After: <seconds>` header (integer, not milliseconds)

A token bucket preemptively throttles requests to stay below the limit, avoiding 429s entirely:
```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly maxTokens: number;
  private readonly refillRate: number; // tokens per millisecond

  constructor(maxTokens: number, refillWindowMs: number) {
    this.maxTokens = maxTokens;
    this.tokens = maxTokens;
    this.lastRefill = Date.now();
    this.refillRate = maxTokens / refillWindowMs;
  }

  private refill(): void {
    const now = Date.now();
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }

  async acquire(): Promise<void> {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return;
    }
    // Wait until a token is available
    const waitMs = Math.ceil((1 - this.tokens) / this.refillRate);
    await new Promise((resolve) => setTimeout(resolve, waitMs));
    this.tokens -= 1;
  }

  get available(): number {
    this.refill();
    return Math.floor(this.tokens);
  }
}

// Per-user bucket: 600 requests per 60 seconds
const userBucket = new TokenBucket(600, 60_000);

// Use with a safety margin (80% of limit)
const safeUserBucket = new TokenBucket(480, 60_000);
```
Wrap all OneNote API calls through a throttled queue that respects both the token bucket and Retry-After headers:
```typescript
import { Client } from "@microsoft/microsoft-graph-client";

class ThrottledOneNoteClient {
  private bucket: TokenBucket;
  private queue: Array<{
    resolve: (value: any) => void;
    reject: (error: any) => void;
    fn: () => Promise<any>;
  }> = [];
  private processing = false;
  private retryAfterUntil: number = 0; // Timestamp when retry-after expires

  constructor(
    private client: Client,
    maxRequestsPerMinute: number = 480 // 80% safety margin
  ) {
    this.bucket = new TokenBucket(maxRequestsPerMinute, 60_000);
  }

  async request<T>(fn: (client: Client) => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      this.queue.push({ resolve, reject, fn: () => fn(this.client) });
      this.processQueue();
    });
  }

  private async processQueue(): Promise<void> {
    if (this.processing) return;
    this.processing = true;
    while (this.queue.length > 0) {
      // Respect Retry-After if we've been throttled
      const now = Date.now();
      if (this.retryAfterUntil > now) {
        const waitMs = this.retryAfterUntil - now;
        console.warn(`Rate limited — waiting ${Math.ceil(waitMs / 1000)}s`);
        await new Promise((r) => setTimeout(r, waitMs));
      }
      await this.bucket.acquire();
      const item = this.queue.shift()!;
      try {
        const result = await item.fn();
        item.resolve(result);
      } catch (err: any) {
        if (err.statusCode === 429) {
          const retryAfter = parseInt(err.headers?.["retry-after"] ?? "30", 10);
          this.retryAfterUntil = Date.now() + retryAfter * 1000;
          // Re-queue the failed request
          this.queue.unshift(item);
          console.warn(`429 received — Retry-After: ${retryAfter}s`);
        } else {
          item.reject(err);
        }
      }
    }
    this.processing = false;
  }
}

// Usage
const throttled = new ThrottledOneNoteClient(client);
const notebooks = await throttled.request((c) =>
  c.api("/me/onenote/notebooks").get()
);
```
Multi-user apps must track rate limits per user, not globally:
```typescript
class MultiUserRateLimiter {
  private userBuckets: Map<string, TokenBucket> = new Map();
  private tenantBucket: TokenBucket;

  constructor() {
    // Tenant-wide: 10,000 per 10 minutes
    this.tenantBucket = new TokenBucket(8_000, 600_000); // 80% safety margin
  }

  async acquire(userId: string): Promise<void> {
    // Get or create per-user bucket
    if (!this.userBuckets.has(userId)) {
      this.userBuckets.set(userId, new TokenBucket(480, 60_000));
    }
    const userBucket = this.userBuckets.get(userId)!;
    // Must acquire from BOTH buckets
    await userBucket.acquire();
    await this.tenantBucket.acquire();
  }

  getStatus(userId: string): { userRemaining: number; tenantRemaining: number } {
    const userBucket = this.userBuckets.get(userId);
    return {
      userRemaining: userBucket?.available ?? 480,
      tenantRemaining: this.tenantBucket.available,
    };
  }
}
```
For 429 responses without a Retry-After header (rare but possible), use exponential backoff with jitter:
```typescript
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 5
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (err.statusCode !== 429 || attempt === maxRetries) throw err;
      const retryAfter = err.headers?.["retry-after"];
      let delayMs: number;
      if (retryAfter) {
        // Prefer server-specified delay (in seconds)
        delayMs = parseInt(retryAfter, 10) * 1000;
      } else {
        // Exponential backoff: 1s, 2s, 4s, 8s, 16s + jitter
        const base = Math.pow(2, attempt) * 1000;
        const jitter = Math.random() * 1000;
        delayMs = base + jitter;
      }
      console.warn(`Retry ${attempt + 1}/${maxRetries} in ${Math.ceil(delayMs / 1000)}s`);
      await new Promise((r) => setTimeout(r, delayMs));
    }
  }
  throw new Error("Unreachable");
}

// Usage
const pages = await withBackoff(() =>
  client.api("/me/onenote/pages").top(50).get()
);
```
The Graph $batch endpoint lets you send up to 20 operations in a single HTTP request. The entire batch counts as one request toward your rate limit:
```typescript
async function batchGetPages(client: Client, pageIds: string[]): Promise<any[]> {
  const batchSize = 20; // Graph batch limit
  const allResults: any[] = [];
  for (let i = 0; i < pageIds.length; i += batchSize) {
    const chunk = pageIds.slice(i, i + batchSize);
    const batchBody = {
      requests: chunk.map((id, idx) => ({
        id: String(idx + 1),
        method: "GET",
        url: `/me/onenote/pages/${id}?$select=id,title,lastModifiedDateTime`,
      })),
    };
    const batchResponse = await client.api("/$batch").post(batchBody);
    for (const response of batchResponse.responses) {
      if (response.status === 200) {
        allResults.push(response.body);
      } else {
        console.warn(`Batch item ${response.id} failed: ${response.status}`);
      }
    }
  }
  return allResults;
}

// 100 pages = 5 HTTP requests instead of 100
const pages = await batchGetPages(client, hundredPageIds);
```
The same pattern in Python with `asyncio`:

```python
import asyncio
import time

class RateLimiter:
    """Token bucket rate limiter for OneNote Graph API."""

    def __init__(self, max_requests: int = 480, window_seconds: int = 60):
        self.max_tokens = max_requests
        self.tokens = float(max_requests)
        self.refill_rate = max_requests / window_seconds
        self.last_refill = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self):
        async with self._lock:
            now = time.monotonic()
            elapsed = now - self.last_refill
            self.tokens = min(self.max_tokens, self.tokens + elapsed * self.refill_rate)
            self.last_refill = now
            if self.tokens < 1:
                wait = (1 - self.tokens) / self.refill_rate
                await asyncio.sleep(wait)
                self.tokens = 0
            else:
                self.tokens -= 1

# Usage — combines token bucket with Retry-After handling
limiter = RateLimiter(max_requests=480, window_seconds=60)

async def safe_get_pages(client, section_id: str, max_retries: int = 3):
    for attempt in range(max_retries):
        await limiter.acquire()
        try:
            return await client.me.onenote.sections.by_onenote_section_id(
                section_id
            ).pages.get()
        except Exception as e:
            # Handle 429 with Retry-After header
            if hasattr(e, "response") and e.response.status_code == 429 and attempt < max_retries - 1:
                retry_after = int(e.response.headers.get("Retry-After", "30"))
                await asyncio.sleep(retry_after)
            else:
                raise
    raise RuntimeError("Max retries exceeded for OneNote API call")
```
Track your 429 rate over time and adjust thresholds:
```typescript
class RateLimitMonitor {
  private requestCount = 0;
  private throttleCount = 0;
  private windowStart = Date.now();

  record(wasThrottled: boolean): void {
    this.requestCount++;
    if (wasThrottled) this.throttleCount++;
  }

  getMetrics(): { total: number; throttled: number; throttleRate: number; windowMinutes: number } {
    const windowMinutes = (Date.now() - this.windowStart) / 60_000;
    return {
      total: this.requestCount,
      throttled: this.throttleCount,
      throttleRate: this.throttleCount / Math.max(this.requestCount, 1),
      windowMinutes: Math.round(windowMinutes * 10) / 10,
    };
  }

  // Alert if throttle rate exceeds threshold
  shouldReduceRate(): boolean {
    return this.getMetrics().throttleRate > 0.05; // >5% throttled = slow down
  }
}
```
Rate limit handling produces:

- `Retry-After` compliance: exact server-specified delays honored

| Status | Cause | Fix |
|---|---|---|
| 429 (with Retry-After) | Per-user or per-tenant limit exceeded | Wait exactly Retry-After seconds; do not retry sooner |
| 429 (no Retry-After) | Rare edge case, limit exceeded | Exponential backoff with jitter starting at 1 second |
| 503 | Service throttling under load | Treat like 429 — backoff and retry |
| 500 | Internal error during throttled state | Do not count as rate limit; retry with normal backoff |
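The table above can be folded into a small classifier that a retry loop consults; a sketch under the assumption that the caller already knows the status code and whether a `Retry-After` header is present (the function name and shape are illustrative, not from the Graph SDK):

```typescript
interface RetryDecision {
  retry: boolean;
  honorRetryAfter: boolean; // wait the server-specified delay before retrying
}

// Map an HTTP status to the retry behavior described in the table.
function classifyForRetry(status: number, hasRetryAfter: boolean): RetryDecision {
  switch (status) {
    case 429: // rate limited: wait Retry-After if present, else backoff with jitter
    case 503: // service throttling under load: treat like 429
      return { retry: true, honorRetryAfter: hasRetryAfter };
    case 500: // internal error: retry with normal backoff, not a rate-limit signal
      return { retry: true, honorRetryAfter: false };
    default:
      return { retry: false, honorRetryAfter: false };
  }
}
```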
Calculate request budget for polling + CRUD:
```typescript
const BUDGET_PER_MINUTE = 600;
const SAFETY_MARGIN = 0.8; // Use 80% of limit
const safeBudget = BUDGET_PER_MINUTE * SAFETY_MARGIN; // 480

// Allocate budget
const pollingSections = 20;
const pollIntervalSec = 30;
const pollRequestsPerMin = pollingSections * (60 / pollIntervalSec); // 40/min
const remainingForCrud = safeBudget - pollRequestsPerMin; // 440/min for user operations

console.log(`Polling: ${pollRequestsPerMin}/min | CRUD: ${remainingForCrud}/min`);
```
Production health check:
```typescript
const monitor = new RateLimitMonitor();

// After each API call:
monitor.record(/* wasThrottled */ false);

// Periodic check
setInterval(() => {
  const metrics = monitor.getMetrics();
  if (monitor.shouldReduceRate()) {
    console.warn(`High throttle rate: ${(metrics.throttleRate * 100).toFixed(1)}%`);
    // Dynamically increase poll interval or reduce batch concurrency
  }
}, 60_000);
```
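The "dynamically increase poll interval" step above can be sketched as a multiplicative-increase / gradual-decrease rule; the function name, baseline, and ceiling are illustrative assumptions, not part of this skill's API:

```typescript
// Sketch: widen the poll interval when throttling is observed, then decay
// it back toward the baseline once the throttle rate drops below 5%.
function nextPollIntervalSec(
  currentSec: number,
  throttleRate: number,
  baselineSec = 30,  // assumed normal polling cadence
  maxSec = 300       // assumed ceiling so polling never stops entirely
): number {
  if (throttleRate > 0.05) {
    // Over the 5% threshold: back off multiplicatively
    return Math.min(maxSec, currentSec * 2);
  }
  // Healthy: move 10% of the way back toward the baseline
  return Math.max(baselineSec, currentSec - (currentSec - baselineSec) * 0.1);
}
```

Doubling on trouble and decaying slowly on recovery keeps the system from oscillating between throttled and unthrottled states.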
Related skills:

- `onenote-webhooks-events` for polling patterns that consume rate budget
- `onenote-performance-tuning` for batch operations and `$select` to reduce payload size
- `onenote-core-workflow-a` for CRUD operations that benefit from throttled clients