Reference: TPS Calculation for Payment Systems — APCosta Lab

What Is TPS

TPS (Transactions Per Second) is the primary metric for sizing payment gateways, acquirers, and financial switches.

Unlike RPS (requests per second) — which counts any HTTP request — TPS counts only complete financial transactions (authorization, capture, reversal, balance inquiry).

Essential Formulas

Average TPS

Average_TPS = Total daily transactions / (Operating hours × 3600)

Example: 500,000 tx/day over 16 hours of operation

Average_TPS = 500,000 / (16 × 3600) = 8.68 TPS
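
The same calculation as a minimal Python sketch. The function name `average_tps` and its parameter names are illustrative choices, not part of any library.

```python
def average_tps(daily_transactions: int, operating_hours: float) -> float:
    """Average TPS over the operating window (hours converted to seconds)."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return daily_transactions / (operating_hours * 3600)


# The example above: 500,000 tx/day over 16 hours of operation.
print(round(average_tps(500_000, 16), 2))  # 8.68
```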

Peak TPS

Peaks typically occur at specific times (lunch hour, end of business day, special dates).

Peak_TPS = Average_TPS × Peak factor

Typical peak factors by context:

| Context | Typical peak factor |
|---------|---------------------|
| B2C e-commerce | 3x – 5x |
| Generic gateway | 5x – 8x |
| Black Friday / Flash sales | 10x – 15x |
| Physical retail (lunch hour) | 4x – 6x |

Recommended capacity

Never size exactly for the peak — always add a safety margin:

Target_capacity = Peak_TPS × 1.3   (30% margin)
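
The peak and capacity formulas can be chained as in the sketch below. It assumes the 8.68 TPS average from the earlier example and the generic gateway range (5x to 8x) from the peak factor table. The function names are illustrative.

```python
def peak_tps(average_tps: float, peak_factor: float) -> float:
    """Peak_TPS = Average_TPS x Peak factor."""
    return average_tps * peak_factor


def target_capacity(peak: float, margin: float = 0.30) -> float:
    """Target_capacity = Peak_TPS x (1 + margin); 0.30 is the 30% margin."""
    return peak * (1 + margin)


average = 8.68  # from the 500,000 tx/day example
for factor in (5, 8):  # generic gateway range from the table
    peak = peak_tps(average, factor)
    print(f"factor {factor}x: peak {peak:.1f} TPS, "
          f"target {target_capacity(peak):.1f} TPS")
```

Under these assumptions the target capacity comes out at roughly 56 to 90 TPS, depending on which end of the peak factor range applies.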

Reference Benchmarks

By transaction type (expected p95 latency)

| Operation | Acceptable p95 latency | Critical latency (alert) |
|-----------|------------------------|--------------------------|
| Online authorization | < 500 ms | > 1,000 ms |
| Balance inquiry | < 200 ms | > 500 ms |
| Reversal | < 800 ms | > 2,000 ms |
| Capture | < 400 ms | > 800 ms |
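
A rough sketch of how measured latencies could be checked against these thresholds. The nearest-rank percentile method, the dictionary keys, and the "warning" label for values between the two thresholds are assumptions made for illustration; the table only defines the acceptable and critical bounds.

```python
# Thresholds in milliseconds from the table: (acceptable, critical).
THRESHOLDS_MS = {
    "authorization": (500, 1000),
    "balance_inquiry": (200, 500),
    "reversal": (800, 2000),
    "capture": (400, 800),
}


def p95(samples_ms):
    """95th percentile by the nearest-rank method (integer arithmetic)."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def classify(operation, samples_ms):
    """Return 'ok', 'warning' (assumed label), or 'critical'."""
    acceptable, critical = THRESHOLDS_MS[operation]
    value = p95(samples_ms)
    if value < acceptable:
        return "ok"
    if value > critical:
        return "critical"
    return "warning"


# 10% of captures at 900 ms push the p95 above the 800 ms alert line.
print(classify("capture", [300] * 90 + [900] * 10))  # critical
```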

TPS by setup (modern hardware, Java 21 + Spring Boot)

| Setup | Sustainable TPS |
|-------|-----------------|
| 1 instance, 8 vCPU, local PostgreSQL | 200 – 500 TPS |
| 3 instances + PgBouncer + Redis cache | 1,000 – 3,000 TPS |
| Kubernetes cluster 10 pods + RDS Multi-AZ | 5,000 – 15,000 TPS |

Database Sizing

Required connections

DB_connections = Instances × Threads_per_instance × (1 + buffer)

With HikariCP (Spring Boot default):

recommended pool-size = (vCPU × 2) + effective_disk_spindles

Practical rule:

For PostgreSQL on SSD: keep to at most 100 connections on a small RDS instance.
Use PgBouncer in transaction mode for aggressive connection reuse.
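
The two connection formulas can be combined as in the sketch below. Several values are assumptions for illustration only: one effective spindle for SSD storage, a 20% buffer, and the pool size standing in for Threads_per_instance. The function names are hypothetical.

```python
import math

MAX_CONNECTIONS = 100  # practical limit quoted above


def hikari_pool_size(vcpus: int, effective_spindles: int = 1) -> int:
    """Pool size = (vCPU x 2) + effective_disk_spindles."""
    return vcpus * 2 + effective_spindles


def total_db_connections(instances: int, threads_per_instance: int,
                         buffer: float = 0.20) -> int:
    """DB_connections = Instances x Threads_per_instance x (1 + buffer)."""
    return math.ceil(instances * threads_per_instance * (1 + buffer))


pool = hikari_pool_size(vcpus=8)             # 17 connections per instance
total = total_db_connections(3, pool)        # 3 instances with a 20% buffer
print(pool, total, total <= MAX_CONNECTIONS)  # 17 62 True
```

If the total exceeds the limit, that is the point at which PgBouncer in transaction mode becomes relevant, since it lets many client connections share fewer server connections.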

Storage estimate (1 year)

Storage_GB = Average_TPS × 3600 × 8760 × Record_size_KB / 1,048,576

Example: 10 TPS × 2 KB record = ~601 GB/year (raw data only, excluding indexes; assumes the average rate holds 24 hours a day)
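
A small sketch that evaluates the storage formula exactly as written. The constant and function names are illustrative. For 10 TPS and a 2 KB record it yields about 601.5 GB per year.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_YEAR = 8760
KB_PER_GB = 1_048_576  # 1024 * 1024


def storage_gb_per_year(average_tps: float, record_size_kb: float) -> float:
    """Raw data volume for one year at a constant average TPS."""
    records_per_year = average_tps * SECONDS_PER_HOUR * HOURS_PER_YEAR
    return records_per_year * record_size_kb / KB_PER_GB


print(round(storage_gb_per_year(10, 2), 1))  # 601.5
```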

Common Bottlenecks

1. Database connection pool

Symptom: latency rises even when CPU is low.
Fix: increase the pool size or add PgBouncer.

2. Table lock during reconciliation

Symptom: TPS drops sharply at scheduled times.
Fix: process reconciliation on a shadow table, not on the transactional one.

3. HSM timeout

Symptom: error rate rises at peak without high CPU usage.
Fix: dedicated HSM pool, retry with backoff, circuit breaker.
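
A minimal sketch of retry with exponential backoff behind a circuit breaker, to make the fix concrete. Everything here is hypothetical: the class and function names, the thresholds, and the delays are illustrative defaults, and `fn` stands in for whatever call reaches the HSM. A production system would more likely use an established resilience library.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=5.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("HSM circuit is open")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None
        return result


def call_with_backoff(breaker, fn, attempts=3, base_delay_s=0.05,
                      sleep=time.sleep):
    """Retry fn through the breaker, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise  # never retry while the circuit is open
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)


# Usage: a stand-in HSM call that times out twice, then succeeds.
outcomes = iter([TimeoutError, TimeoutError, "signed"])

def fake_hsm_call():
    outcome = next(outcomes)
    if outcome is TimeoutError:
        raise TimeoutError("HSM timeout")
    return outcome

delays = []
print(call_with_backoff(CircuitBreaker(), fake_hsm_call, sleep=delays.append))
print(delays)  # [0.05, 0.1]
```

The breaker fails fast once the HSM is clearly unhealthy, so retries do not pile extra load onto it during a peak.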

4. ISO8583 switch bottleneck

Symptom: queues grow during peak hours.
Fix: multiple TCP channels (sessions) per acquirer.

Sizing Checklist

[ ] What is the expected average TPS at launch?
[ ] What is the peak scenario (date, time, event)?
[ ] What is the maximum acceptable latency per transaction type?
[ ] Is the database pool correctly sized?
[ ] Is there a circuit breaker for the acquirer?
[ ] Has the system been tested at 130% of estimated peak TPS?
[ ] Is there a horizontal scaling plan (Kubernetes HPA or similar)?

Load Testing Tools

# k6: modern, scripted in JavaScript, built-in HTTP, WebSocket, and gRPC support
k6 run --vus 100 --duration 60s script.js

# Locust: Python, great for complex flows
locust -f locustfile.py --host http://localhost:8080

# JMeter: enterprise standard, good reporting
jmeter -n -t plan.jmx -l results.jtl