Token Exchange Load Test

Test Overview

Item	Details
Test Date	December 14, 2025
Target Endpoint	`POST /token` (grant_type=urn:ietf:params:oauth:grant-type:token-exchange)
Purpose	Measure performance for microservice authentication and SSO audience switching

Test Environment

K6 Cloud Configuration

Component	Details
Load Generator	K6 Cloud (amazon:us:portland)
Target	https://conformance.authrim.com
Client Authentication	Client Secret (Basic Auth)

Infrastructure

Component	Technology
Compute	Cloudflare Workers (op-token)
Revocation Check	Durable Objects (TokenRevocationStore)
Key Management	Durable Objects (KeyManager)
Database	Cloudflare D1

Sharding Configuration

Durable Object	Shards	Purpose
AuthorizationCodeStore	8	Auth code management
SessionStore	8	Session management
ChallengeStore	8	Challenge management
TokenRevocationStore	8	Token revocation check (main DO)
RefreshTokenRotator	16	Refresh token management
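
The table lists shard counts but not the routing rule. One common approach, shown here purely as an assumption, is to hash a stable key (for TokenRevocationStore, the token's jti) and take it modulo the shard count. The function names, the choice of FNV-1a, and the shard naming scheme below are hypothetical and are not taken from Authrim's implementation.

```javascript
// Hypothetical shard routing: hash a stable key and map it onto N shards.
// FNV-1a (32-bit, over UTF-16 code units) is used only because it is short
// and deterministic; any stable hash would do.
function fnv1a32(key) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

function shardIndex(key, shardCount) {
  return fnv1a32(key) % shardCount;
}

// Assumed naming scheme for the Durable Object instance (not from the source)
function revocationShardName(jti, shardCount = 8) {
  return `revocation-shard-${shardIndex(jti, shardCount)}`;
}
```

The same jti always maps to the same shard, so a revocation written for a token and a later check for that token reach the same Durable Object.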

Test Methodology

Token Mix (RFC 8693 + Industry Standard)

Token Type	Ratio	Expected Result	Validation
Valid (standard)	56%	New token issued	Scope/sub integrity
Valid (with actor)	14%	New token issued	Delegation flow (RFC 8693)
Expired	10%	400 error	Immediate detection
Invalid signature	10%	400 error	Signature verification
Revoked	10%	400 error	Real-time revocation check
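
A load script can reproduce this mix by walking the cumulative distribution. The sketch below is one way to do it; the type identifiers and the `pickTokenType` helper are illustrative names, not part of the original test script.

```javascript
// Token mix from the table above, as integer percentages summing to 100.
const TOKEN_MIX = [
  { type: 'valid', percent: 56 },
  { type: 'valid_with_actor', percent: 14 },
  { type: 'expired', percent: 10 },
  { type: 'invalid_signature', percent: 10 },
  { type: 'revoked', percent: 10 },
];

// roll is a number in [0, 100); return the first entry whose cumulative
// percentage exceeds it.
function pickTokenType(roll) {
  let cumulative = 0;
  for (const entry of TOKEN_MIX) {
    cumulative += entry.percent;
    if (roll < cumulative) return entry.type;
  }
  throw new RangeError(`roll out of range: ${roll}`);
}
```

In a test iteration this would be called as `pickTokenType(Math.random() * 100)`.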

Token Exchange Variations

Variation	Types	Examples
Target Audience	20	api.example.com/gateway, /users, /payments, …
Scope Patterns	4	openid, openid profile, openid profile email, full
Resource URI	10	resource.example.com/api/v1, data.example.com/graphql, …
Service Clients	5	service-gateway, service-bff, service-worker, …

Load Pattern

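// k6 options; the stage targets shown are those of the 3,000 RPS run
export const options = {
  scenarios: {
    // Short constant-rate warmup before the measured scenario
    warmup: {
      executor: 'constant-arrival-rate',
      rate: 50,
      duration: '30s',
      // Assumed value: arrival-rate executors require preAllocatedVUs,
      // but the original configuration does not show one for the warmup.
      preAllocatedVUs: 50,
      exec: 'warmupScenario',
    },
    // Measured scenario: starts once the 30s warmup has finished
    token_exchange_benchmark: {
      executor: 'ramping-arrival-rate',
      startRate: 0,
      timeUnit: '1s',
      preAllocatedVUs: 3600,
      maxVUs: 4500,
      stages: [
        { target: 1500, duration: '15s' },  // ramp 0 -> 1,500 RPS
        { target: 3000, duration: '180s' }, // ramp 1,500 -> 3,000 RPS
        { target: 0, duration: '15s' },     // ramp down
      ],
      startTime: '30s',
    },
  },
};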
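
Because a ramping-arrival-rate executor interpolates linearly between stage targets, the number of iterations the benchmark scenario plans to start can be estimated as the area under the rate curve. The sketch below assumes a 1s time unit and no dropped iterations; `plannedIterations` and the `seconds` field are illustrative names.

```javascript
// Each stage contributes a trapezoid: (start rate + end rate) / 2 * seconds.
function plannedIterations(startRate, stages) {
  let rate = startRate;
  let total = 0;
  for (const { target, seconds } of stages) {
    total += ((rate + target) / 2) * seconds;
    rate = target; // the next stage starts where this one ended
  }
  return total;
}

// Stages of the token_exchange_benchmark scenario shown above
const benchmarkStages = [
  { target: 1500, seconds: 15 },
  { target: 3000, seconds: 180 },
  { target: 0, seconds: 15 },
];

const planned = plannedIterations(0, benchmarkStages);
const averageRate = planned / 210; // 15s + 180s + 15s
```

For these stages the plan is 438,750 iterations over 210 seconds, an average of roughly 2,089 per second. The 3,000 RPS target is reached only at the end of the second stage, which is why average RPS sits well below the target.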

Test Configuration

RFC 8693 Request Format

POST /token
Content-Type: application/x-www-form-urlencoded

grant_type=urn:ietf:params:oauth:grant-type:token-exchange
&subject_token={access_token}
&subject_token_type=urn:ietf:params:oauth:token-type:access_token
&audience={target_audience}
&scope={requested_scope}
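
As a minimal sketch, the request above could be assembled as follows in a Node.js runtime (`URLSearchParams` and `Buffer` are Node globals; a k6 script would use k6's own encoding helpers instead). The `buildTokenExchangeRequest` name and its argument shape are assumptions for illustration, and the Basic header follows from the Client Secret (Basic Auth) setting listed under Test Environment.

```javascript
// Builds the form body and Basic auth header for a token exchange request.
// All values passed in are placeholders supplied by the caller.
function buildTokenExchangeRequest({ clientId, clientSecret, subjectToken, audience, scope }) {
  const body = new URLSearchParams({
    grant_type: 'urn:ietf:params:oauth:grant-type:token-exchange',
    subject_token: subjectToken,
    subject_token_type: 'urn:ietf:params:oauth:token-type:access_token',
    audience,
    scope,
  });
  // HTTP Basic: base64("client_id:client_secret")
  const credentials = Buffer.from(`${clientId}:${clientSecret}`).toString('base64');
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      Authorization: `Basic ${credentials}`,
    },
    body: body.toString(), // percent-encodes the URN colons and spaces
  };
}
```

Using `URLSearchParams` rather than string concatenation keeps the URN values and space-separated scopes correctly encoded.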

Success Criteria

Valid tokens → 200 with new token
Expired/Invalid/Revoked → 400 with error
100% token validation accuracy
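
The accuracy criterion can be expressed as a check of each response status against the status expected for the token category that was sent. This is a sketch only; the category identifiers and the `validationAccuracy` helper are hypothetical names.

```javascript
// Expected HTTP status per token category, taken from the success criteria.
const EXPECTED_STATUS = {
  valid: 200,
  valid_with_actor: 200,
  expired: 400,
  invalid_signature: 400,
  revoked: 400,
};

// results: array of { tokenType, status }. Returns the fraction of
// responses whose status matched the expectation, in [0, 1].
function validationAccuracy(results) {
  if (results.length === 0) return 0;
  const correct = results.filter(
    (r) => EXPECTED_STATUS[r.tokenType] === r.status
  ).length;
  return correct / results.length;
}
```

A run meets the criterion only when this value is exactly 1, meaning no valid token was rejected and no expired, tampered, or revoked token was accepted.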

Results - Performance Metrics

Summary

RPS	Total Requests	K6 P95	CF Worker P99	CF DO P99	Status
2,000	292,343	500ms	307ms	1,020ms	⚠️
2,500	365,624	225ms	313ms	271ms	✅
3,000	390,444	2,144ms	316ms	2,222ms	❌

Criteria: ✅ K6 P95 < 300ms AND DO P99 < 500ms
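
Applied to the summary rows, the pass criterion works out as follows. The thresholds that separate ⚠️ from ❌ are not stated in this report, so the sketch models only the ✅ condition; the variable and function names are illustrative.

```javascript
// Latencies in milliseconds, copied from the summary table.
const summaryRuns = [
  { rps: 2000, k6P95: 500, doP99: 1020 },
  { rps: 2500, k6P95: 225, doP99: 271 },
  { rps: 3000, k6P95: 2144, doP99: 2222 },
];

// Pass requires BOTH conditions: K6 P95 < 300ms AND DO P99 < 500ms.
function meetsCriteria(run) {
  return run.k6P95 < 300 && run.doP99 < 500;
}

const passingRps = summaryRuns.filter(meetsCriteria).map((run) => run.rps);
```

Only the 2,500 RPS run satisfies both conditions; the 2,000 and 3,000 RPS runs each miss both.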

K6 Client Latency (ms)

RPS	Median	P95	P99	Min	Max
2,000	112	500	589	40	3,448
2,500	76	225	297	39	5,075
3,000	1,657	2,144	2,269	53	4,236

RPS Achievement

Target RPS	Avg RPS	Peak RPS	Achievement (Peak / Target)
2,000	1,373	2,137	107%
2,500	1,717	2,494	100%
3,000	1,833	2,714	90%

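Note: The 3,000 RPS test triggered k6's “Insufficient VUs” warning, meaning the load generator ran out of VUs and could not start every scheduled iteration.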
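
The warning is consistent with a rough Little's law estimate: the number of requests in flight is approximately the arrival rate multiplied by the time each request takes. The sketch below uses the median latency from the K6 client table as a stand-in for the mean, which is an approximation, and `estimateConcurrentVUs` is an illustrative helper name.

```javascript
// Little's law: concurrent requests = arrival rate (req/s) * time in system (s).
function estimateConcurrentVUs(ratePerSecond, latencyMs) {
  return ratePerSecond * (latencyMs / 1000);
}

const MAX_VUS = 4500; // maxVUs from the load pattern

// 3,000 RPS target with the 1,657 ms median latency measured at that level
const neededAt3000 = estimateConcurrentVUs(3000, 1657);
const exceedsVuBudget = neededAt3000 > MAX_VUS;
```

Sustaining 3,000 RPS at that latency would need roughly 4,971 concurrent VUs, more than the 4,500 available, so k6 could not issue every scheduled request.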

Results - Infrastructure Metrics

Worker Duration (ms)

RPS	Total Req	P50	P75	P90	P99	P999
2,000	293,747	23.83	26.04	60.67	306.87	309.91
2,500	367,484	23.09	24.52	26.43	312.86	315.63
3,000	398,732	204.33	236.41	265.56	315.89	337.54

Worker CPU Time (ms)

RPS	P50	P75	P90	P99	P999
2,000	2.27	3.06	3.64	8.47	16.59
2,500	2.23	2.52	3.19	8.44	15.89
3,000	2.13	2.37	3.03	4.97	7.23

Key Finding: CPU time P50 stays at roughly 2.1-2.3ms at every load level, so CPU is NOT the bottleneck

Durable Objects Wall Time (ms)

TokenRevocationStore + KeyManager DO:

RPS	Total DO Req	DO Errors	P50	P75	P90	P99	P999
2,000	620,073	0	17.76	29.88	102.05	1,019.69	1,312.22
2,500	764,279	0	15.08	28.63	46.66	271.45	378.61
3,000	759,010	8	759.34	1,512.10	1,821.78	2,222.33	2,450.69

Key Finding:

2,500 RPS is the sweet spot (P99 271ms)
3,000 RPS saturates DO (P50 jumps to 759ms)
DO errors begin at 3,000 RPS

D1 Database Metrics

RPS	Read Queries	Write Queries	Rows Read	Rows Written
2,000	1,010	6	1,016	14
2,500	810	6	816	14
3,000	1,010	6	1,016	14

Note: Token Exchange reads only client info from D1, and the high cache hit rate keeps query counts low

DO Call Parallelization Effect

Implementation

// Before: Sequential (2 RTT)
const publicKey = await getVerificationKeyFromJWKS(env, kid);
await verifyToken(...);
const revoked = await isTokenRevoked(env, jti);

// After: Parallel (1 RTT)
const [publicKey, revoked] = await Promise.all([
  getVerificationKeyFromJWKS(env, kid),
  isTokenRevoked(env, jti)
]);
await verifyToken(...);

Impact at 3,000 RPS

Metric	Before	After	Change
DO P50	49ms	28ms	-43%
DO P99	2,141ms	2,222ms	~same
Worker P99	307ms	316ms	~same

P50 improved significantly. P99 is unchanged because queuing delay dominates at high load.

Capacity Recommendations

Usage	Recommended RPS	Rationale
Normal Operation	≤1,500	Comfortable stable operation
Peak Handling	≤2,500	K6 P95 225ms, DO P99 271ms - optimal point
Absolute Limit	≤2,700	Highest peak RPS observed (2,714 during the 3,000 RPS run)

Key Findings

1. CPU Processing is Fast and Stable

CPU Time P50 at 2.1-2.3ms across all RPS - CPU is NOT the bottleneck.

2. DO is the Bottleneck

At 3,000 RPS, DO P50 jumps from 15ms (at 2,500 RPS) to 759ms, a roughly 50x degradation.

3. 2,500 RPS is the Sweet Spot

Best performance achieved at 2,500 RPS:

K6 P95: 225ms
DO P99: 271ms

4. 3,000 RPS Saturates the System

DO queuing delay dominates
Performance degrades rapidly
DO errors begin (8 errors)

5. 100% Token Validation Accuracy

All token types correctly validated:

Valid → New token issued
Expired → 400 error
Invalid signature → 400 error
Revoked → 400 error (real-time check)

2,000 RPS vs 2,500 RPS Anomaly

The 2,000 RPS run showed a worse K6 P95 (500ms) than the 2,500 RPS run (225ms). Contributing factors:

Factor	Explanation
Timing	2,000 RPS: 14:49 JST, 2,500 RPS: 16:43 JST (~2 hours later)
DO Warmup	DOs were cold at 2,000 RPS test
VU Usage	2,000 RPS: ~878 VU, 2,500 RPS: ~437 VU

Higher VU usage at 2,000 RPS indicates slower server responses: with an arrival-rate executor, more VUs are held waiting on in-flight requests.

Conclusion: 2,500 RPS remains the stable upper limit. The 2,000 RPS anomaly is attributed to cold DO state rather than to load.

Architecture Diagram

flowchart TB
    subgraph Test["Test Environment"]
        k6["k6 Cloud (Portland, OR)"]
    end

    subgraph CF["Cloudflare Edge"]
        subgraph Worker["op-token Worker"]
            TE["Token Exchange Handler (RFC 8693)"]
            CA["Client Authentication"]
            STV["Subject Token Validation"]
            SI["Scope Intersection"]
            ATG["Access Token Generation"]
        end

        subgraph DO["Durable Objects (shared)"]
            KM["KeyManager (1)<br/>JWK management, signing key"]
            TRS["TokenRevocationStore (8 shards)<br/>Token revocation check"]
        end

        subgraph DB["Database"]
            D1["D1: Clients, Users"]
        end
    end

    k6 -->|HTTPS| TE
    TE --> CA
    CA --> STV
    STV --> SI
    SI --> ATG
    TE -->|"RPC Call (parallel)"| KM
    TE -->|"RPC Call (parallel)"| TRS
    TRS --> D1

Bottleneck Analysis

Layer	2,000 RPS	2,500 RPS	3,000 RPS
K6 Client P95	500ms ⚠️	225ms ✅	2,144ms ❌
Worker CPU P50	2.27ms ✅	2.23ms ✅	2.13ms ✅
Worker Duration P50	23.83ms ✅	23.09ms ✅	204.33ms ⚠️
DO Wall Time P50	17.76ms ✅	15.08ms ✅	759.34ms ❌
DO Wall Time P99	1,020ms ⚠️	271ms ✅	2,222ms ❌
Verdict	Variability	Optimal	Degraded

Conclusion

Authrim’s Token Exchange (RFC 8693) endpoint achieves:

Up to 2,500 RPS: Stable operation (K6 P95 225ms, CF DO P99 271ms)
3,000+ RPS: Visible degradation (K6 P95 > 2,100ms, CF DO P50 > 750ms)

The primary bottleneck is Durable Object queuing delay, not CPU or cryptographic operations. Scaling further requires DO optimization or architectural changes such as adding a cache layer.

Token validation accuracy was 100% at all RPS levels and is maintained even at throughput limits.

Key Finding (English Summary)

Authrim’s Token Exchange endpoint sustains 2,500 RPS under realistic service-to-service authorization workloads, with strict token validation and revocation checks enabled.

The observed upper limit is defined by Durable Object queuing, not CPU or cryptographic operations.

This benchmark includes:

Full JWT RS256 signature verification on every request
Real-time revocation checks against Durable Object storage
Mixed token types (70% valid, 10% expired, 10% invalid signature, 10% revoked)
Delegation flow testing (14% with actor_token)
Audience variation (20 different target audiences)
Scope downgrading (4 scope patterns)
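
Scope downgrading means the issued token carries no more scope than the subject token already had. A minimal sketch of that rule, assuming space-separated scope strings as in OAuth 2.0, is shown below; `intersectScopes` is a hypothetical helper and not Authrim's actual Scope Intersection step.

```javascript
// Hypothetical scope intersection: grant only scopes that were requested
// AND are already present on the subject token.
function intersectScopes(requestedScope, subjectTokenScope) {
  const allowed = new Set(subjectTokenScope.split(' ').filter(Boolean));
  return requestedScope
    .split(' ')
    .filter((scope) => scope && allowed.has(scope))
    .join(' ');
}
```

Requesting `openid profile` against a subject token holding `openid profile email` yields `openid profile`, while a requested scope the subject token lacks is dropped instead of being granted.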