Token Introspection Load Test
Test Overview
| Item | Details |
|---|---|
| Test Date | December 16, 2025 |
| Target Endpoint | POST /introspect |
| Purpose | Measure performance for resource server token validation with revocation checks |
Executive Summary
| Target RPS | Shards | Cache | Actual RPS | Success Rate | P95 | HTTP Failures | Status |
|---|---|---|---|---|---|---|---|
| 300 RPS | 16 | Off | ~298 | 100% | 324ms | 0 | ✅ Excellent |
| 500 RPS | 16 | Off | ~555 | 100% | 1,110ms | 0 | ⚠️ Threshold exceeded |
| 500 RPS | 32 | On | ~527 | 99.9986% | 1,245ms | 1 | ⚠️ Limited cache effect |
| 750 RPS | 32 | Off | ~735 | 100% | 2,605ms | 0 | ⚠️ High load |
Token Validation Accuracy (All RPS Levels)
| Validation Item | Result |
|---|---|
| Active detection overall accuracy | 100% ✅ |
| False Positives (revoked→active) | 0 ✅ |
| False Negatives (valid→inactive) | 0 ✅ |
| Token Exchange claims (act/resource) | 100% ✅ |
| strictValidation (aud/client) | 100% ✅ |
Test Environment
K6 Cloud Configuration
| Component | Details |
|---|---|
| Load Generator | K6 Cloud (amazon:us:portland) |
| Target | https://conformance.authrim.com |
| Protocol | Client credentials (Basic Auth) |
Infrastructure
| Component | Technology |
|---|---|
| Compute | Cloudflare Workers (op-management) |
| Revocation | Durable Objects (Region-Aware JTI Sharding) |
| Database | Cloudflare D1 |
| Cache | Cloudflare KV (optional, TTL 60s) |
Sharding Configuration
| Setting | 300/500 RPS | 750 RPS |
|---|---|---|
| Generation | 1 | 2 |
| Total Shards | 16 | 32 |
| Region | wnam (0-15) | wnam (0-31) |
| JTI Format | g1:wnam:{shard}:{random} | g2:wnam:{shard}:{random} |
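Because the shard index is embedded in the JTI itself, a router can pick the target shard without a lookup. The following is a minimal sketch of that idea, not the project's actual code: `build_jti`, `parse_jti`, and the uniform random shard choice are assumptions made for illustration.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Jti:
    generation: int
    region: str
    shard: int
    random: str


def build_jti(generation: int, region: str, total_shards: int) -> str:
    """Create a JTI of the form g{gen}:{region}:{shard}:{random}."""
    # Assumption: shards are picked uniformly at random at issue time.
    shard = secrets.randbelow(total_shards)
    return f"g{generation}:{region}:{shard}:{secrets.token_urlsafe(16)}"


def parse_jti(jti: str) -> Jti:
    """Recover generation, region and shard so a router can select the shard."""
    gen, region, shard, rand = jti.split(":", 3)
    if not gen.startswith("g"):
        raise ValueError(f"unexpected generation prefix: {gen!r}")
    return Jti(int(gen[1:]), region, int(shard), rand)
```

Keeping the generation in the prefix means tokens issued under the 16-shard layout (`g1`) can still be routed correctly after the shard count changes to 32 (`g2`).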
Test Methodology
Token Introspection Flow
sequenceDiagram
participant RS as Resource Server
participant W as op-management Worker
participant DO as Durable Objects
participant KV as KV Cache
RS->>W: POST /introspect<br/>Authorization: Basic {credentials}<br/>token={access_token}
Note over W: 1. Client authentication (Basic Auth)
Note over W: 2. JWT decode & signature verification
Note over W: 3. Expiry check (exp)
W->>DO: 4. Revocation check (Region-Aware Sharding)
DO-->>W: revoked status
Note over W: 5. Audience/Client validation (strictValidation)
W-->>RS: {"active": true/false, "sub": "...", "scope": "..."}
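From the resource server's side, the first arrow in the flow above is a form-encoded POST with HTTP Basic client authentication, as described in RFC 7662. The sketch below only builds the request and interprets the response. The helper names, the base URL and the credentials are placeholders, and it skips the form-encoding of credentials that RFC 6749 asks for before Base64 encoding.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_introspect_request(base_url, client_id, client_secret, token):
    """Build (but do not send) an RFC 7662 introspection request."""
    credentials = f"{client_id}:{client_secret}".encode()
    body = urllib.parse.urlencode({"token": token}).encode()
    return urllib.request.Request(
        f"{base_url}/introspect",
        data=body,
        method="POST",
        headers={
            "Authorization": "Basic " + base64.b64encode(credentials).decode(),
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def is_active(response_body: bytes) -> bool:
    """Treat anything other than a literal JSON true as inactive."""
    return json.loads(response_body).get("active") is True
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and feeding the response body to `is_active`.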
Token Mix (RFC 7662 + Industry Standard)
| Type | Ratio | Expected Result | Validation |
|---|---|---|---|
| Valid (standard) | 60% | active=true | scope/sub integrity |
| Valid (Token Exchange) | 5% | active=true | act/resource claim (RFC 8693) |
| Expired | 12% | active=false | Immediate detection |
| Revoked | 12% | active=false | DO/KV real-time check |
| Wrong Audience | 6% | active=false | aud validation (strictValidation) |
| Wrong Client | 5% | active=false | client_id validation |
Seed Tokens: 3,000 tokens
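A load generator can reproduce this mix by drawing a token type per request with the table's ratios as weights. The following is a minimal sketch under that assumption. The dictionary keys and function names are illustrative, and the actual test script may select tokens differently.

```python
import random

# Ratios from the token mix table, in percent.
TOKEN_MIX = {
    "valid_standard": 60,
    "valid_token_exchange": 5,
    "expired": 12,
    "revoked": 12,
    "wrong_audience": 6,
    "wrong_client": 5,
}

# Only the two "valid" categories are expected to come back active=true.
EXPECTED_ACTIVE = {kind: kind.startswith("valid_") for kind in TOKEN_MIX}


def pick_token_type(rng: random.Random) -> str:
    """Draw one token type according to the configured ratios."""
    kinds = list(TOKEN_MIX)
    weights = [TOKEN_MIX[kind] for kind in kinds]
    return rng.choices(kinds, weights=weights, k=1)[0]


def expected_active_share() -> int:
    """Percentage of requests that should return active=true."""
    return sum(w for kind, w in TOKEN_MIX.items() if EXPECTED_ACTIVE[kind])
```

With this mix, 65% of responses should be `active=true` and 35% `active=false`, which gives a quick sanity check on aggregate results.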
Success Criteria
- P95 Latency < 500ms
- P99 Latency < 800ms
- Success Rate > 99%
- False Positives = 0
- False Negatives = 0
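The criteria above can be applied mechanically to each run. A small sketch follows, with `RunResult` and `failed_criteria` as hypothetical names, using figures reported in this document as inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    p95_ms: float
    p99_ms: float
    success_rate: float  # percent
    false_positives: int
    false_negatives: int


def failed_criteria(r: RunResult) -> list:
    """Return the names of the success criteria that a run violates."""
    checks = {
        "P95 < 500ms": r.p95_ms < 500,
        "P99 < 800ms": r.p99_ms < 800,
        "Success Rate > 99%": r.success_rate > 99,
        "False Positives = 0": r.false_positives == 0,
        "False Negatives = 0": r.false_negatives == 0,
    }
    return [name for name, ok in checks.items() if not ok]


# Figures taken from the 300 RPS and 500 RPS (16 shards) runs.
run_300 = RunResult(324, 329, 100.0, 0, 0)
run_500 = RunResult(1110, 1253, 100.0, 0, 0)
```

Here `failed_criteria(run_300)` is empty, while `failed_criteria(run_500)` lists both latency criteria.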
Results - Performance Metrics
300 RPS (16 Shards)
Test Period: 2025-12-16 00:14:00 - 00:18:30 UTC
| Metric | Value |
|---|---|
| Total Requests | 43,874 |
| HTTP Failures | 0 |
| Peak RPS | 298 req/s |
| Success Rate | 100% |
| Active Correct | 100% |
Response Time (ms)
| Percentile | Value |
|---|---|
| Mean | 237ms |
| P50 | 229ms |
| P95 | 324ms |
| P99 | 329ms |
| Max | 478ms |
Cloudflare Analytics
| Metric | Value |
|---|---|
| Worker P99 Duration | 46.7ms |
| DO Requests | 77,434 |
| DO Errors | 0 |
| DO Wall Time P99 | 417ms |
| D1 Read Queries | 706,689 |
✅ Excellent: All metrics within thresholds
500 RPS (16 Shards)
Test Period: 2025-12-16 01:38:30 - 01:43:00 UTC
| Metric | Value |
|---|---|
| Total Requests | 72,302 |
| HTTP Failures | 0 |
| Peak RPS | 555 req/s |
| Success Rate | 100% |
Response Time (ms)
| Percentile | Value |
|---|---|
| P50 | 216ms |
| P95 | 1,110ms |
| P99 | 1,253ms |
| Max | 1,036ms |
Cloudflare Analytics
| Metric | Value |
|---|---|
| Worker P99 Duration | 193ms |
| DO Requests | 127,969 |
| DO Errors | 0 |
| DO Wall Time P99 | 325ms |
⚠️ Threshold Exceeded: P95 (1,110ms) and P99 (1,253ms) exceed the 500ms and 800ms targets, but zero errors
500 RPS (32 Shards, Cache Enabled)
Test Period: 2025-12-16 08:08:00 - 08:12:30 UTC
Cache: Enabled (TTL 60s)
Token Count: 500 (matching RPS for high cache hit potential)
| Metric | Value |
|---|---|
| Total Requests | 71,941 |
| HTTP Failures | 1 |
| Peak RPS | 527 req/s |
| Success Rate | 99.9986% |
Response Time (ms)
| Percentile | Value |
|---|---|
| P50 | 518ms |
| P95 | 1,245ms |
| P99 | 1,350ms |
| Max | 10,943ms |
Cache Effect Analysis
| Metric | Cache Off (16 shards) | Cache On (32 shards) | Delta |
|---|---|---|---|
| P95 | 1,110ms | 1,245ms | +12% |
| Worker P99 | 193ms | 221ms | +15% |
| DO P99 | 325ms | 688ms | +112% |
| D1 Reads | 706,689 | 1,083,282 | +53% |
Why cache was not effective:
- Token count ≈ RPS (low reuse of same tokens)
- Revocation check still required on cache hit (security)
- More shards added overhead that exceeded cache savings
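The second point can be made concrete with a toy model: if the revocation store is consulted on every request, a cache hit saves the validation work but not the Durable Object round trip. The class below is a simplified illustration, not the Worker's implementation. Its structure, its names and the use of a plain dictionary in place of KV are all assumptions.

```python
class IntrospectSketch:
    """Toy model of the cached path; names and structure are hypothetical."""

    def __init__(self, revoked_jtis=()):
        self.revoked = set(revoked_jtis)
        self.cache = {}              # jti -> cached active=true response
        self.cache_hits = 0
        self.revocation_checks = 0   # stands in for Durable Object requests

    def _is_revoked(self, jti):
        self.revocation_checks += 1
        return jti in self.revoked

    def introspect(self, jti, use_cache=True):
        cached = self.cache.get(jti) if use_cache else None
        if cached is not None:
            self.cache_hits += 1
        # The revocation store is consulted on every request, hit or miss.
        if self._is_revoked(jti):
            self.cache.pop(jti, None)
            return {"active": False}
        response = cached if cached is not None else {"active": True, "jti": jti}
        if use_cache:
            self.cache[jti] = response  # only active=true responses are stored
        return response


def run(requests, distinct_tokens, use_cache):
    """Replay a round-robin workload and report (cache hits, revocation checks)."""
    server = IntrospectSketch()
    for i in range(requests):
        server.introspect(f"g1:wnam:0:tok{i % distinct_tokens}", use_cache)
    return server.cache_hits, server.revocation_checks
```

In this model, 100 requests over 10 distinct tokens produce 90 cache hits with the cache on and none with it off, yet both runs perform 100 revocation checks.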
750 RPS (32 Shards)
Test Period: 2025-12-16 02:15:00 - 02:19:30 UTC
| Metric | Value |
|---|---|
| Total Requests | 87,771 |
| HTTP Failures | 0 |
| Peak RPS | 735 req/s |
| Success Rate | 100% |
Response Time (ms)
| Percentile | Value |
|---|---|
| P50 | 227ms |
| P95 | 2,605ms |
| P99 | 2,687ms |
| Max | 517ms |
Cloudflare Analytics
| Metric | Value |
|---|---|
| Worker P99 Duration | 503ms |
| DO Requests | 155,258 |
| DO Errors | 0 |
| DO Wall Time P99 | 1,771ms |
⚠️ High Load: P95 (2,605ms) and P99 (2,687ms) far exceed the latency targets, but 32 shards achieved zero errors at 750 RPS
Token Validation Accuracy
All RPS levels achieved 100% accuracy:
| Token Type | Expected | Accuracy | Status |
|---|---|---|---|
| Valid (standard) | active=true | 100% | ✅ |
| Valid (Token Exchange) | active=true | 100% | ✅ |
| Expired | active=false | 100% | ✅ |
| Revoked | active=false | 100% | ✅ |
| Wrong Audience | active=false | 100% | ✅ |
| Wrong Client | active=false | 100% | ✅ |
| Security Metric | Result |
|---|---|
| False Positives | 0 |
| False Negatives | 0 |
| Claim Integrity (scope/aud/sub/iss) | 100% |
| act claim (RFC 8693) | 100% |
| resource claim (RFC 8693) | 100% |
Capacity Recommendations
| Load Level | RPS | Monthly Requests | Shards | Recommendation |
|---|---|---|---|---|
| Low | ~300 | ~780M | 16 | ✅ Recommended |
| Medium | ~500 | ~1.3B | 16 | △ Acceptable |
| High | ~750 | ~1.9B | 32 | ⚠️ Requires monitoring |
| Limit | 1000+ | 2.6B+ | 32+ | ❌ Requires scale-out |
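The monthly figures in the table follow from sustaining the given rate around the clock. A short sketch of the arithmetic, assuming a 30-day month:

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30  # assumption: the table's figures match a 30-day month


def monthly_requests(rps: int) -> int:
    """Requests per month if the given rate is sustained continuously."""
    return rps * SECONDS_PER_DAY * DAYS_PER_MONTH


for rps in (300, 500, 750, 1000):
    print(f"{rps} RPS -> {monthly_requests(rps) / 1e9:.2f}B requests/month")
```

These are upper bounds for a constant load. Real traffic with daily peaks would reach the same peak RPS at a lower monthly total.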
Industry Comparison
| Service Scale | Monthly Active Users | Estimated RPS |
|---|---|---|
| Small/Medium | ~1M | ~50 RPS |
| Medium | ~5M | ~200 RPS |
| Large | ~10M | ~400 RPS |
| Very Large | ~50M | ~1,500 RPS |
Note: Introspection results are typically cached by resource servers, so the load that actually reaches the introspection endpoint is roughly 10-30% of the estimates above.
Sharding Effect
750 RPS Comparison
| Shards | P95 | HTTP Failures | Improvement |
|---|---|---|---|
| 16 | 2,687ms | 2 | - |
| 32 | 2,605ms | 0 | ✅ Errors eliminated |
Key Findings
1. 300 RPS is the Stable Operating Point
All metrics within thresholds at 300 RPS.
2. Token Validation is 100% Accurate
Even under high load, token type detection remains perfect with zero false positives/negatives.
3. Sharding Eliminates Errors
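32 shards achieved zero HTTP failures at 750 RPS (vs 2 failures with 16 shards). P95 latency was essentially unchanged (2,605ms vs 2,687ms), so the gain is in reliability rather than speed.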
4. Cache Effectiveness Depends on Usage Pattern
- Test conditions: Limited effect (token count ≈ RPS)
- Production: Expected improvement with token reuse
- Security: Revocation check always performed
5. Bottleneck is DO Request Volume
At 500+ RPS, Worker-DO communication becomes the limiting factor: DO wall time P99 climbs from 325ms at 500 RPS to 1,771ms at 750 RPS.
Bottleneck Analysis
| Layer | 300 RPS | 500 RPS | 750 RPS |
|---|---|---|---|
| Worker P99 | 47ms | 193ms | 503ms |
| DO P99 | 417ms | 325ms | 1,771ms |
| K6 P95 | 324ms | 1,110ms | 2,605ms |
| Verdict | Headroom | At threshold | High load |
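One way to see why DO request volume tracks load is to divide the DO request counts by the total requests of each cache-off run. The sketch below uses only figures reported in this document. It assumes the Cloudflare Analytics window lines up with the test period, which may not hold exactly.

```python
# (total introspection requests, DO requests) per cache-off run, from the results above.
RUNS = {
    300: (43_874, 77_434),
    500: (72_302, 127_969),
    750: (87_771, 155_258),
}


def do_requests_per_call(total_requests: int, do_requests: int) -> float:
    """Average number of Durable Object requests per introspection call."""
    return do_requests / total_requests


for rps, (total, do) in RUNS.items():
    print(f"{rps} RPS: {do_requests_per_call(total, do):.2f} DO requests per call")
```

The ratio stays at roughly 1.76 to 1.77 at all three load levels, so DO traffic grows linearly with RPS.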
Infrastructure Architecture
flowchart TB
subgraph Test["Test Environment"]
k6["k6 Cloud (Portland)"]
end
subgraph CF["Cloudflare Edge"]
subgraph Worker["op-management Worker"]
IE["Introspect Endpoint"]
JV["JWT Validation<br/>(signature/expiry)"]
RC["Response Cache<br/>(KV TTL 60s, active=true only)"]
end
subgraph Revocation["Revocation Check (Region-Aware Sharding)"]
SR["Shard Router"]
TRS["TokenRevocationStore DO<br/>16/32 shards (wnam: 0-15/0-31)"]
end
subgraph DB["Database"]
D1["D1 Database (conformance)"]
end
end
k6 -->|HTTPS| IE
IE --> JV
JV --> RC
RC -->|"Cache miss or revocation check"| SR
SR -->|"JTI: g{gen}:{region}:{shard}:{random}"| TRS
TRS --> D1
Note: The response cache stores only active=true responses. The revocation check is always performed for security.
Improvement Recommendations
Short Term (Operations)
- Monitoring: Alert at 300+ RPS, critical at 500+ RPS
- Sharding: Use 32 shards when 500+ RPS expected
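These operational thresholds are simple enough to encode directly in a monitoring rule. A minimal sketch follows; the function names and the "ok"/"warning"/"critical" labels are illustrative, and the cut-offs are the ones recommended above.

```python
def alert_level(rps: float, warn_at: float = 300, critical_at: float = 500) -> str:
    """Map observed RPS to an alert level using the recommended cut-offs."""
    if rps >= critical_at:
        return "critical"
    if rps >= warn_at:
        return "warning"
    return "ok"


def recommended_shards(expected_rps: float) -> int:
    """Use 32 shards once 500+ RPS is expected, otherwise 16."""
    return 32 if expected_rps >= 500 else 16
```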
Medium Term (Architecture)
- Server-side cache optimization: Already implemented (KV TTL 60s)
  - Production benefit expected with token reuse patterns
  - Revocation check always performed for security
- Client-side caching: Resource servers cache introspect results (TTL 30-60s)
  - Can reduce server load by 70-90%
  - RFC 7662 compliant (no caching beyond exp)
- Dynamic shard adjustment: Auto-scale based on load
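For the client-side caching item, the key constraint is that a cached entry must never outlive the token's `exp`. The sketch below shows a resource-server cache that caps each entry at the earlier of the TTL and `exp`. The `IntrospectionCache` class is hypothetical and keeps everything in process memory.

```python
import time


class IntrospectionCache:
    """In-memory cache whose entries expire at min(now + ttl, token exp)."""

    def __init__(self, ttl_seconds: float = 30, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # token -> (expires_at, response)

    def put(self, token: str, response: dict) -> None:
        now = self._clock()
        expires_at = now + self._ttl
        exp = response.get("exp")
        if response.get("active") and isinstance(exp, (int, float)):
            expires_at = min(expires_at, exp)  # never cache beyond exp
        if expires_at > now:
            self._entries[token] = (expires_at, response)

    def get(self, token: str):
        entry = self._entries.get(token)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            del self._entries[token]
            return None
        return response
```

The trade-off is that a token revoked at the authorization server can still be accepted by the resource server until its cache entry expires, which is one reason to keep the TTL short.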
Long Term
- Geographic distribution: Multi-region DO placement
- D1 Read Replicas: Read optimization for global deployment
- Event-driven invalidation: Immediate cache invalidation on revoke
Conclusion
The token introspection endpoint achieves:
- Up to 300 RPS: Stable operation with all metrics within thresholds
- Up to 500 RPS: P95 latency rises to 1,110ms, above the 500ms target, but with zero errors on 16 shards
- 32 shards enable 750 RPS: Zero errors at high load, although P95 reaches 2,605ms
Token validation accuracy was 100% at all load levels - security is never compromised for performance.
The primary bottleneck is Worker-DO communication volume. Caching effectiveness depends on real-world usage patterns (same tokens accessed multiple times).