
Token Introspection Load Test

Test Overview

| Item | Details |
| --- | --- |
| Test Date | December 16, 2025 |
| Target Endpoint | POST /introspect |
| Purpose | Measure performance of resource server token validation with revocation checks |

Executive Summary

| Target RPS | Shards | Cache | Actual RPS | Success Rate | P95 | HTTP Failures | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 300 RPS | 16 | Off | ~298 | 100% | 324ms | 0 | ✅ Excellent |
| 500 RPS | 16 | Off | ~555 | 100% | 1,110ms | 0 | ⚠️ Threshold exceeded |
| 500 RPS | 32 | On | ~527 | 100% | 1,245ms | 1 | ⚠️ Limited cache effect |
| 750 RPS | 32 | Off | ~735 | 100% | 2,605ms | 0 | ⚠️ High load |

Token Validation Accuracy (All RPS Levels)

| Validation Item | Result |
| --- | --- |
| Active detection overall accuracy | 100% |
| False Positives (revoked → active) | 0 |
| False Negatives (valid → inactive) | 0 |
| Token Exchange claims (act/resource) | 100% |
| strictValidation (aud/client) | 100% |

Test Environment

K6 Cloud Configuration

| Component | Details |
| --- | --- |
| Load Generator | K6 Cloud (amazon:us:portland) |
| Target | https://conformance.authrim.com |
| Protocol | Client credentials (Basic Auth) |

Infrastructure

| Component | Technology |
| --- | --- |
| Compute | Cloudflare Workers (op-management) |
| Revocation | Durable Objects (Region-Aware JTI Sharding) |
| Database | Cloudflare D1 |
| Cache | Cloudflare KV (optional, TTL 60s) |

Sharding Configuration

| Setting | 300/500 RPS | 750 RPS |
| --- | --- | --- |
| Generation | 1 | 2 |
| Total Shards | 16 | 32 |
| Region | wnam (0-15) | wnam (0-31) |
| JTI Format | g1:wnam:{shard}:{random} | g2:wnam:{shard}:{random} |
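
Because the JTI embeds the generation, region, and shard, a revocation lookup can be routed directly to the right Durable Object without any directory lookup. The TypeScript sketch below illustrates one way this could work on Cloudflare Workers; the binding name `TOKEN_REVOCATION_STORE`, the helper names, and the random shard assignment are assumptions for illustration, not the actual implementation.

```typescript
// Illustrative sketch only: names, binding, and shard-assignment strategy are assumptions.
interface Env {
  TOKEN_REVOCATION_STORE: DurableObjectNamespace; // assumed binding name
}

const GENERATION = 2;       // g2 → 32 shards (g1 → 16)
const REGION = "wnam";
const TOTAL_SHARDS = 32;

// Build a JTI of the form g{gen}:{region}:{shard}:{random}.
function makeJti(): string {
  const shard = Math.floor(Math.random() * TOTAL_SHARDS); // random spread; real code may differ
  return `g${GENERATION}:${REGION}:${shard}:${crypto.randomUUID()}`;
}

// Route an incoming JTI to its revocation shard by parsing the prefix,
// so no lookup table is needed to find the right Durable Object.
function shardStubFor(env: Env, jti: string): DurableObjectStub {
  const [gen, region, shard] = jti.split(":");
  const id = env.TOKEN_REVOCATION_STORE.idFromName(`${gen}:${region}:${shard}`);
  return env.TOKEN_REVOCATION_STORE.get(id);
}
```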

Test Methodology

Token Introspection Flow

```mermaid
sequenceDiagram
    participant RS as Resource Server
    participant W as op-management Worker
    participant DO as Durable Objects
    participant KV as KV Cache

    RS->>W: POST /introspect<br/>Authorization: Basic {credentials}<br/>token={access_token}
    Note over W: 1. Client authentication (Basic Auth)
    Note over W: 2. JWT decode & signature verification
    Note over W: 3. Expiry check (exp)
    W->>DO: 4. Revocation check (Region-Aware Sharding)
    DO-->>W: revoked status
    Note over W: 5. Audience/Client validation (strictValidation)
    W-->>RS: {"active": true/false, "sub": "...", "scope": "..."}
```
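
The sequence above corresponds roughly to a handler like the following sketch. It only illustrates the ordering of the five steps; client authentication, JWT verification, and audience matching are declared as assumed stand-ins rather than real implementations, and `shardStubFor` is the routing helper sketched in the sharding section.

```typescript
// Simplified sketch of the introspection order in the diagram above; every helper
// below is an assumed stand-in (declared signature only), not the real implementation.
interface IntrospectEnv { TOKEN_REVOCATION_STORE: DurableObjectNamespace } // assumed binding

declare function authenticateClient(authHeader: string | null, env: IntrospectEnv): Promise<boolean>;
declare function verifyJwt(token: string, env: IntrospectEnv):
  Promise<{ exp: number; jti: string; sub: string; scope: string } | null>;
declare function audienceAndClientMatch(claims: { sub: string }, clientId: string | null): boolean;
declare function shardStubFor(env: IntrospectEnv, jti: string): DurableObjectStub;

async function handleIntrospect(request: Request, env: IntrospectEnv): Promise<Response> {
  // 1. Client authentication (Basic Auth)
  if (!(await authenticateClient(request.headers.get("Authorization"), env))) {
    return new Response("invalid_client", { status: 401 });
  }

  const form = await request.formData();
  const token = String(form.get("token") ?? "");

  // 2-3. JWT decode, signature verification, and expiry check
  const claims = await verifyJwt(token, env);
  if (!claims || claims.exp * 1000 < Date.now()) {
    return Response.json({ active: false });
  }

  // 4. Revocation check against the region-aware shard (always performed)
  const res = await shardStubFor(env, claims.jti).fetch("https://do/check?jti=" + claims.jti);
  const { revoked } = await res.json() as { revoked: boolean };
  if (revoked) return Response.json({ active: false });

  // 5. Audience/client validation (strictValidation)
  if (!audienceAndClientMatch(claims, form.get("client_id") as string | null)) {
    return Response.json({ active: false });
  }

  return Response.json({ active: true, sub: claims.sub, scope: claims.scope });
}
```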

Token Mix (RFC 7662 + Industry Standard)

| Type | Ratio | Expected Result | Validation |
| --- | --- | --- | --- |
| Valid (standard) | 60% | active=true | scope/sub integrity |
| Valid (Token Exchange) | 5% | active=true | act/resource claims (RFC 8693) |
| Expired | 12% | active=false | Immediate detection |
| Revoked | 12% | active=false | DO/KV real-time check |
| Wrong Audience | 6% | active=false | aud validation (strictValidation) |
| Wrong Client | 5% | active=false | client_id validation |

Seed Tokens: 3,000 tokens
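
One way a test script could draw tokens from the seed pool according to this mix is a simple weighted pick, sketched below; the weights come from the table above, while the data structure and helper name are assumptions about the script.

```typescript
// Sketch of weighted token-type selection matching the mix above (weights from the table;
// the structure and helper are assumptions about the test script, not the real one).
const TOKEN_MIX: Array<{ type: string; weight: number; expectedActive: boolean }> = [
  { type: "valid",          weight: 60, expectedActive: true },
  { type: "token_exchange", weight: 5,  expectedActive: true },
  { type: "expired",        weight: 12, expectedActive: false },
  { type: "revoked",        weight: 12, expectedActive: false },
  { type: "wrong_audience", weight: 6,  expectedActive: false },
  { type: "wrong_client",   weight: 5,  expectedActive: false },
];

// Pick a token type with probability proportional to its weight (weights sum to 100).
function pickTokenType(): { type: string; expectedActive: boolean } {
  let r = Math.random() * 100;
  for (const entry of TOKEN_MIX) {
    if ((r -= entry.weight) < 0) return entry;
  }
  return TOKEN_MIX[0];
}
```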

Success Criteria

  • P95 Latency < 500ms
  • P99 Latency < 800ms
  • Success Rate > 99%
  • False Positives = 0
  • False Negatives = 0
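
Expressed as k6 thresholds, these criteria map roughly onto the options block below; the custom `false_positives`/`false_negatives` counters are assumptions about how the script tracks validation accuracy.

```typescript
// k6 options sketch encoding the success criteria above; counter names are assumptions.
import { Counter } from "k6/metrics";

export const falsePositives = new Counter("false_positives"); // revoked/invalid reported active
export const falseNegatives = new Counter("false_negatives"); // valid reported inactive

export const options = {
  thresholds: {
    http_req_duration: ["p(95)<500", "p(99)<800"], // P95 < 500ms, P99 < 800ms
    http_req_failed: ["rate<0.01"],                // success rate > 99%
    false_positives: ["count==0"],
    false_negatives: ["count==0"],
  },
};
```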

Results - Performance Metrics

300 RPS (16 Shards)

Test Period: 2025-12-16 00:14:00 - 00:18:30 UTC

| Metric | Value |
| --- | --- |
| Total Requests | 43,874 |
| HTTP Failures | 0 |
| Peak RPS | 298 req/s |
| Success Rate | 100% |
| Active Correct | 100% |

Response Time (ms)

| Statistic | Value |
| --- | --- |
| Mean | 237ms |
| P50 | 229ms |
| P95 | 324ms |
| P99 | 329ms |
| Max | 478ms |

Cloudflare Analytics

| Metric | Value |
| --- | --- |
| Worker P99 Duration | 46.7ms |
| DO Requests | 77,434 |
| DO Errors | 0 |
| DO Wall Time P99 | 417ms |
| D1 Read Queries | 706,689 |

Excellent: All metrics within thresholds

500 RPS (16 Shards)

Test Period: 2025-12-16 01:38:30 - 01:43:00 UTC

| Metric | Value |
| --- | --- |
| Total Requests | 72,302 |
| HTTP Failures | 0 |
| Peak RPS | 555 req/s |
| Success Rate | 100% |

Response Time (ms)

| Statistic | Value |
| --- | --- |
| P50 | 216ms |
| P95 | 1,110ms |
| P99 | 1,253ms |
| Max | 1,036ms |

Cloudflare Analytics

| Metric | Value |
| --- | --- |
| Worker P99 Duration | 193ms |
| DO Requests | 127,969 |
| DO Errors | 0 |
| DO Wall Time P99 | 325ms |

⚠️ Threshold Exceeded: P95 > 500ms target, but zero errors

500 RPS (32 Shards, Cache Enabled)

Test Period: 2025-12-16 08:08:00 - 08:12:30 UTC
Cache: Enabled (TTL 60s)
Token Count: 500 (matching RPS for high cache-hit potential)

| Metric | Value |
| --- | --- |
| Total Requests | 71,941 |
| HTTP Failures | 1 |
| Peak RPS | 527 req/s |
| Success Rate | 99.9986% |

Response Time (ms)

| Statistic | Value |
| --- | --- |
| P50 | 518ms |
| P95 | 1,245ms |
| P99 | 1,350ms |
| Max | 10,943ms |

Cache Effect Analysis

| Metric | Cache Off (16 shards) | Cache On (32 shards) | Delta |
| --- | --- | --- | --- |
| P95 | 1,110ms | 1,245ms | +12% |
| Worker P99 | 193ms | 221ms | +15% |
| DO P99 | 325ms | 688ms | +112% |
| D1 Reads | 706,689 | 1,083,282 | +53% |

Why the cache was not effective:

  1. Token count ≈ RPS (low reuse of the same tokens)
  2. Revocation check still required on cache hit for security (see the sketch after this list)
  3. More shards added overhead that exceeded the cache savings
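
A minimal sketch of point 2, assuming a KV binding named `INTROSPECT_CACHE` and the helper names shown: even on a cache hit only the JWT/claim validation is skipped, and the Durable Object shard is still consulted on every request, so the DO round-trip is never saved.

```typescript
// Sketch of why a KV cache hit still costs a Durable Object round-trip.
// Binding and helper names here are assumptions, not the actual implementation.
interface CacheEnv { INTROSPECT_CACHE: KVNamespace } // assumed KV binding name
interface IntrospectionResult { active: boolean; jti?: string; sub?: string; scope?: string }

declare function sha256Hex(input: string): Promise<string>;                           // assumed helper
declare function fullIntrospection(env: CacheEnv, token: string): Promise<IntrospectionResult>;
declare function isRevoked(env: CacheEnv, jti: string | undefined): Promise<boolean>;

async function introspectWithCache(env: CacheEnv, token: string): Promise<IntrospectionResult> {
  const cacheKey = `introspect:${await sha256Hex(token)}`;

  const cached = await env.INTROSPECT_CACHE.get<IntrospectionResult>(cacheKey, "json");
  if (cached?.active) {
    // Cache hit: signature and claim checks are skipped, but revocation state can
    // change at any moment, so the shard is still consulted on every request.
    return (await isRevoked(env, cached.jti)) ? { active: false } : cached;
  }

  const result = await fullIntrospection(env, token);
  if (result.active) {
    // Only active=true responses are cached, for at most 60 seconds (KV TTL).
    await env.INTROSPECT_CACHE.put(cacheKey, JSON.stringify(result), { expirationTtl: 60 });
  }
  return result;
}
```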

750 RPS (32 Shards)

Test Period: 2025-12-16 02:15:00 - 02:19:30 UTC

| Metric | Value |
| --- | --- |
| Total Requests | 87,771 |
| HTTP Failures | 0 |
| Peak RPS | 735 req/s |
| Success Rate | 100% |

Response Time (ms)

| Statistic | Value |
| --- | --- |
| P50 | 227ms |
| P95 | 2,605ms |
| P99 | 2,687ms |
| Max | 517ms |

Cloudflare Analytics

| Metric | Value |
| --- | --- |
| Worker P99 Duration | 503ms |
| DO Requests | 155,258 |
| DO Errors | 0 |
| DO Wall Time P99 | 1,771ms |

⚠️ High Load: latency was elevated, but 32 shards achieved zero errors at 750 RPS

Token Validation Accuracy

All RPS levels achieved 100% accuracy:

| Token Type | Expected | Accuracy | Status |
| --- | --- | --- | --- |
| Valid (standard) | active=true | 100% | |
| Valid (Token Exchange) | active=true | 100% | |
| Expired | active=false | 100% | |
| Revoked | active=false | 100% | |
| Wrong Audience | active=false | 100% | |
| Wrong Client | active=false | 100% | |

| Security Metric | Result |
| --- | --- |
| False Positives | 0 |
| False Negatives | 0 |
| Claim Integrity (scope/aud/sub/iss) | 100% |
| act claim (RFC 8693) | 100% |
| resource claim (RFC 8693) | 100% |

Capacity Recommendations

| Load Level | RPS | Monthly Requests | Shards | Recommendation |
| --- | --- | --- | --- | --- |
| Low | ~300 | ~780M | 16 | ✅ Recommended |
| Medium | ~500 | ~1.3B | 16 | △ Acceptable |
| High | ~750 | ~1.9B | 32 | ⚠️ Requires monitoring |
| Limit | 1000+ | 2.6B+ | 32+ | ❌ Requires scale-out |
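
For reference, the monthly figures follow from sustained average load: 300 req/s × 86,400 s/day × 30 days ≈ 778M requests/month (≈780M), and 500, 750, and 1,000 RPS scale to roughly 1.3B, 1.9B, and 2.6B respectively.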

Industry Comparison

| Service Scale | Monthly Active Users | Estimated RPS |
| --- | --- | --- |
| Small/Medium | ~1M | ~50 RPS |
| Medium | ~5M | ~200 RPS |
| Large | ~10M | ~400 RPS |
| Very Large | ~50M | ~1,500 RPS |

Note: Introspection responses are typically cached by resource servers, so actual server load is 10-30% of the figures above.

Sharding Effect

750 RPS Comparison

| Shards | P95 | HTTP Failures | Improvement |
| --- | --- | --- | --- |
| 16 | 2,687ms | 2 | - |
| 32 | 2,605ms | 0 | ✅ Errors eliminated |

Key Findings

1. 300 RPS is the Stable Operating Point

All metrics within thresholds at 300 RPS.

2. Token Validation is 100% Accurate

Even under high load, token type detection remains perfect with zero false positives/negatives.

3. Sharding Eliminates Errors

32 shards achieved zero HTTP failures at 750 RPS (vs 2 failures with 16 shards).

4. Cache Effectiveness Depends on Usage Pattern

  • Test conditions: Limited effect (token count ≈ RPS)
  • Production: Expected improvement with token reuse
  • Security: Revocation check always performed

5. Bottleneck is DO Request Volume

At 500+ RPS, Worker-DO communication becomes the limiting factor.

Bottleneck Analysis

| Layer | 300 RPS | 500 RPS | 750 RPS |
| --- | --- | --- | --- |
| Worker P99 | 47ms | 193ms | 503ms |
| DO P99 | 417ms | 325ms | 1,771ms |
| K6 P95 | 324ms | 1,110ms | 2,605ms |
| Verdict | Headroom | At threshold | High load |

Infrastructure Architecture

```mermaid
flowchart TB
    subgraph Test["Test Environment"]
        k6["k6 Cloud (Portland)"]
    end

    subgraph CF["Cloudflare Edge"]
        subgraph Worker["op-management Worker"]
            IE["Introspect Endpoint"]
            JV["JWT Validation<br/>(signature/expiry)"]
            RC["Response Cache<br/>(KV TTL 60s, active=true only)"]
        end
        subgraph Revocation["Revocation Check (Region-Aware Sharding)"]
            SR["Shard Router"]
            TRS["TokenRevocationStore DO<br/>16/32 shards (wnam: 0-15/0-31)"]
        end
        subgraph DB["Database"]
            D1["D1 Database (conformance)"]
        end
    end

    k6 -->|HTTPS| IE
    IE --> JV
    JV --> RC
    RC -->|"Cache miss or revocation check"| SR
    SR -->|"JTI: g{gen}:{region}:{shard}:{random}"| TRS
    TRS --> D1
```

Note: Response cache only caches active=true responses. Revocation check is always performed for security.

Improvement Recommendations

Short Term (Operations)

  1. Monitoring: Alert at 300+ RPS, critical at 500+ RPS
  2. Sharding: Use 32 shards when 500+ RPS expected

Medium Term (Architecture)

  1. Server-side cache optimization: Already implemented (KV TTL 60s)
    • Production benefit expected with token reuse patterns
    • Revocation check always performed for security
  2. Client-side caching: Resource servers cache introspect results (TTL 30-60s) — see the sketch after this list
    • Can reduce server load by 70-90%
    • RFC 7662 compliant (no caching beyond exp)
  3. Dynamic shard adjustment: Auto-scale based on load
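
A minimal sketch of such a client-side cache on the resource server, assuming a generic fetch-based client; capping the cache lifetime at the token's exp keeps it within RFC 7662's guidance. All names here are illustrative.

```typescript
// Illustrative resource-server-side cache for introspection results (names are assumptions).
type Introspection = { active: boolean; exp?: number; scope?: string; sub?: string };

const cache = new Map<string, { result: Introspection; expiresAt: number }>();
const MAX_TTL_MS = 60_000; // within the 30-60s window recommended above

export async function introspect(token: string, url: string, basicAuth: string): Promise<Introspection> {
  const now = Date.now();
  const hit = cache.get(token);
  if (hit && hit.expiresAt > now) return hit.result; // serve from cache, skip the network call

  const res = await fetch(url, {
    method: "POST",
    headers: {
      Authorization: `Basic ${basicAuth}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: new URLSearchParams({ token }),
  });
  const result = (await res.json()) as Introspection;

  if (result.active) {
    // Never cache past the token's own expiry (RFC 7662 guidance: no caching beyond exp).
    const expMs = result.exp ? result.exp * 1000 : now + MAX_TTL_MS;
    cache.set(token, { result, expiresAt: Math.min(now + MAX_TTL_MS, expMs) });
  }
  return result;
}
```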

Long Term

  1. Geographic distribution: Multi-region DO placement
  2. D1 Read Replicas: Read optimization for global deployment
  3. Event-driven invalidation: Immediate cache invalidation on revoke

Conclusion

The Token Introspection endpoint achieves:

  • Up to 300 RPS: stable operation with all metrics within thresholds
  • Up to 500 RPS: latency increases, but zero errors
  • 750 RPS with 32 shards: zero errors even at high load

Token validation accuracy was 100% at all load levels; security is never compromised for performance.

The primary bottleneck is Worker-to-DO request volume. Caching effectiveness depends on real-world usage patterns (the same tokens being introspected repeatedly).