Design Intent
What your code is doing
Strategy Pattern
The RateLimiter interface gives every limiter the same contract, so the client can call one method no matter which algorithm is being used.
Factory-Friendly Structure
Your comment is right: a factory method can later choose the limiter type, while the client stays unchanged and talks only to the interface.
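A minimal sketch of what that factory could look like, assuming a hypothetical RateLimiterFactory and LimiterType enum that are not in the original code:

enum LimiterType { FIXED_WINDOW, SLIDING_WINDOW, TOKEN_BUCKET }

class RateLimiterFactory {
    // Hypothetical factory method: picks a concrete limiter, returns the interface type.
    static RateLimiter create(LimiterType type, int threshold) {
        switch (type) {
            case FIXED_WINDOW:   return new FixedWindowRateLimiter(threshold);
            case SLIDING_WINDOW: return new SlidingWindow(threshold);
            case TOKEN_BUCKET:   return new TokenBucket(threshold, 2); // assumed refill rate
            default: throw new IllegalArgumentException("Unknown type: " + type);
        }
    }
}

// The client keeps talking to the interface:
// RateLimiter limiter = RateLimiterFactory.create(LimiterType.TOKEN_BUCKET, 5);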
Concurrent Simulation
The client uses a worker pool, a blocking queue, and a notifier thread to simulate lots of requests and print results cleanly.
Interface Logic
Why the interface and `@Override` matter
Why use the interface
RateLimiter defines one behavior: isAllowRequest(). That lets the client swap algorithms without changing the calling code.
Why override the method
@Override tells the compiler that each class implements the interface contract correctly. It prevents signature mistakes and makes the intent of the code obvious.
Why this improves design
If you add another limiter tomorrow, you only implement the same method in a new class and plug it into the system.
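For instance, a pass-through limiter for tests or trusted internal traffic needs nothing more than the one method; the class name and use case below are assumptions for illustration, not part of the original code.

class AlwaysAllowRateLimiter implements RateLimiter {
    @Override
    public boolean isAllowRequest() {
        return true;   // never rejects; plugs into the same client code via the interface
    }
}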
Thread Safety
How the implementation stays safe under many threads
Fixed Window uses atomics
AtomicInteger updates the request counter safely and AtomicLong protects the current window start. compareAndSet ensures only one thread performs the reset.
Sliding Window uses a lock around the full decision
Removing expired timestamps, checking the queue size, and adding a new timestamp must happen as one atomic action. The ReentrantLock protects that whole block.
Token Bucket uses a lock for refill plus consume
Refill calculation and token consumption must happen together. Otherwise two threads might see the same token and both think they can use it.
The client uses safe concurrency primitives
AtomicInteger safely generates request IDs, LinkedBlockingQueue safely transfers events, and the executor handles multi-threaded task scheduling.
Core Contract
The shared interface and the multi-threaded client
Interface
One method, many strategies
The whole design hangs on one shared contract. Every limiter overrides the same method, which is why the client can test all implementations uniformly.
interface RateLimiter {
    boolean isAllowRequest();
}
Client
How the executor, queue, and printer work together
The client creates all strategies, fires requests from a 10-thread pool, and sends results to a blocking queue so a printer thread can log them safely.
// Three strategies behind one interface type.
RateLimiter fixed = new FixedWindowRateLimiter(5);
RateLimiter sliding = new SlidingWindow(5);
RateLimiter token = new TokenBucket(5, 2);

// Worker pool, shared request-id counter, and the queue that feeds the printer.
ExecutorService exec = Executors.newFixedThreadPool(10);
AtomicInteger requestId = new AtomicInteger(0);
BlockingQueue<RequestEvent> queue = new LinkedBlockingQueue<>();

// Dedicated printer thread: drains the queue and logs each decision.
Thread printer = new Thread(() -> {
    try {
        while (!Thread.currentThread().isInterrupted()) {
            RequestEvent ev = queue.poll(500, TimeUnit.MILLISECONDS);
            if (ev == null) continue;   // nothing to print yet, keep waiting
            printEvent(ev);
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();   // restore interrupt status and exit
    }
}, "notifier-printer");
Algorithms
Accurate request flow, explanation, and full snippet for each limiter
Fixed Window
Count requests inside a fixed one-second window
This limiter only cares about the current one-second bucket. If the request count is still within the threshold, the request is allowed (green) and passes. If the counter has already crossed the threshold inside the same window, the request is rejected (red) because the fixed window is full.
class FixedWindowRateLimiter implements RateLimiter {
    private final int threshold;
    private static final long TIMELIMIT_MS = 1000L;
    private final AtomicInteger counter = new AtomicInteger(0);
    private final AtomicLong windowStartTime;

    FixedWindowRateLimiter(int threshold) {
        this.threshold = threshold;
        this.windowStartTime = new AtomicLong(System.currentTimeMillis());
    }

    @Override
    public boolean isAllowRequest() {
        long currTime = System.currentTimeMillis();
        long startTime = this.windowStartTime.get();
        // Window expired: exactly one thread wins the CAS and resets the counter.
        if (currTime - startTime >= TIMELIMIT_MS) {
            if (this.windowStartTime.compareAndSet(startTime, currTime)) {
                counter.set(0);
            }
        }
        // Allow the request only while the counter stays within the threshold.
        return counter.incrementAndGet() <= threshold;
    }
}
The slots in the visualization show how full the current window is. Once the window is full, new requests are rejected until the clock rolls into the next window and the counter resets.
Sliding Window
Keep only the recent timestamps
This limiter cleans out old requests first, then checks how many recent requests are still active. If the recent-request log is already full, the request is rejected (red) because the sliding-window capacity is exceeded.
class SlidingWindow implements RateLimiter {
    private final int threshold;
    private static final long REQUEST_LIMIT_MS = 1000L;
    private final Queue<Long> logTable = new ConcurrentLinkedQueue<>();
    private final ReentrantLock lock = new ReentrantLock();

    SlidingWindow(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public boolean isAllowRequest() {
        long currentWindowTime = System.currentTimeMillis();
        lock.lock();
        try {
            // Evict timestamps that have fallen out of the sliding window.
            while (!logTable.isEmpty() &&
                   currentWindowTime - logTable.peek() >= REQUEST_LIMIT_MS) {
                logTable.poll();
            }
            // Admit the request only if the recent-request log has room.
            if (logTable.size() < threshold) {
                logTable.offer(currentWindowTime);
                return true;
            } else {
                return false;
            }
        } finally {
            lock.unlock();
        }
    }
}
Each chip in the visualization is a recent request. When too many are still inside the active time range, the next request is rejected.
Token Bucket
Consume tokens and refill over time
This limiter allows bursts up to the bucket capacity. A request is allowed (green) if a token is available. It is rejected (red) when the bucket is empty, simply because no tokens are available right now.
class TokenBucket implements RateLimiter {
    private final int capacity;
    private int tokens;
    private final int refillRate;     // tokens added per second
    private long lastRefillTime;
    private final ReentrantLock lock = new ReentrantLock();

    TokenBucket(int limit, int refillRate) {
        this.capacity = limit;
        this.tokens = limit;          // bucket starts full
        this.refillRate = refillRate;
        this.lastRefillTime = System.currentTimeMillis();
    }

    @Override
    public boolean isAllowRequest() {
        lock.lock();
        try {
            long currTime = System.currentTimeMillis();
            long timePassed = currTime - lastRefillTime;
            // Refill in proportion to elapsed time, capped at capacity.
            int refill = (int) ((timePassed * refillRate) / 1000);
            if (refill > 0) {
                tokens = Math.min(capacity, tokens + refill);
                lastRefillTime = currTime;
            }
            // Consume a token if one is available.
            if (tokens > 0) {
                tokens--;
                return true;
            } else {
                return false;
            }
        } finally {
            lock.unlock();
        }
    }
}
The liquid level in the visualization shows the available tokens. If the bucket reaches zero, requests are rejected until the refill adds tokens again.
Client Flow
How your client runs the full multi-threaded simulation
Create three limiter strategies
The client creates one fixed-window limiter, one sliding-window limiter, and one token-bucket limiter through the shared interface type.
Use a 10-thread executor
Each worker task runs its request through all three limiters, so every algorithm sees the same traffic and the code shows how each one behaves under pressure.
Queue printable events
Every decision becomes a RequestEvent and is pushed into a LinkedBlockingQueue, which safely transfers the event to the printer thread.
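A plain value class is enough to carry one decision from a worker to the printer; the fields below are assumed from what the printer logs, not copied from the original source.

class RequestEvent {
    final int id;              // which simulated request this was
    final String limiter;      // which algorithm made the decision
    final boolean allowed;     // the decision itself
    final String threadName;   // which worker produced it
    final long timestamp;      // when the decision was made

    RequestEvent(int id, String limiter, boolean allowed, String threadName, long timestamp) {
        this.id = id;
        this.limiter = limiter;
        this.allowed = allowed;
        this.threadName = threadName;
        this.timestamp = timestamp;
    }
}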
Print results asynchronously
The notifier thread prints allowed requests as one-line logs and rejected requests as detailed boxes with limiter, reason, thread, and timestamp.
Run The Code
Read it here, run it in the embedded Java compiler
Java Source
This static page stays GitHub Pages friendly. For the embedded compiler, the package line is removed so the main class can run directly.
Embedded Compiler
Interview Ready
Notes For Interview
How to explain the design
This Rate Limiter LLD exposes one `RateLimiter` interface and plugs in different algorithms behind it: Fixed Window, Sliding Window, and Token Bucket. The client depends on the interface, not concrete classes, so the limiter algorithm can be changed based on API, user tier, or traffic pattern.
Flow Diagram
UML Diagram: IS-A and HAS-A
Core classes
- RateLimiter: common contract for checking if a request is allowed.
- FixedWindowRateLimiter: simple counter reset after each fixed time window.
- SlidingWindow: tracks recent request timestamps for smoother limiting.
- TokenBucket: refills tokens over time and allows bursts up to capacity.
- RateLimiterClient: simulates concurrent calls through the shared interface.
Why these patterns are used
Strategy Pattern is used because rate limiting has multiple valid algorithms, and each one has different tradeoffs. Fixed Window is simple and memory-efficient, Sliding Window is smoother and more accurate, and Token Bucket supports controlled bursts. The client should depend on the `RateLimiter` interface and not hard-code a specific algorithm.
This design follows Open/Closed Principle. If tomorrow we need Leaky Bucket, Sliding Window Counter, or a distributed Redis-based limiter, we can add a new implementation of `RateLimiter` without changing the client code that calls `isAllowRequest()`.
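As an illustration of that extensibility, here is a rough Leaky Bucket sketch using the same locking approach as the other limiters; it is an assumed implementation, not part of the original code.

class LeakyBucket implements RateLimiter {
    private final int capacity;           // how many requests the bucket holds
    private final int leakRatePerSecond;  // how fast queued requests drain
    private double water = 0;             // current bucket level
    private long lastLeakTime = System.currentTimeMillis();
    private final ReentrantLock lock = new ReentrantLock();

    LeakyBucket(int capacity, int leakRatePerSecond) {
        this.capacity = capacity;
        this.leakRatePerSecond = leakRatePerSecond;
    }

    @Override
    public boolean isAllowRequest() {
        lock.lock();
        try {
            long now = System.currentTimeMillis();
            // Drain the bucket in proportion to the elapsed time.
            water = Math.max(0, water - ((now - lastLeakTime) / 1000.0) * leakRatePerSecond);
            lastLeakTime = now;
            if (water < capacity) {
                water++;        // this request fits, let it through
                return true;
            }
            return false;       // bucket is full, reject
        } finally {
            lock.unlock();
        }
    }
}

The client code that calls isAllowRequest() does not change; only the new class is added.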
Algorithm explanation
Fixed Window keeps a counter for a fixed time range. If the current request belongs to the same window, the counter increments. If the window has expired, the counter resets. It is fast and simple, but it can allow traffic spikes at window boundaries: a full burst at the very end of one window followed by another full burst at the start of the next can briefly double the intended rate.
Sliding Window stores timestamps of recent requests. Before allowing a new request, it removes timestamps older than the allowed duration and checks the remaining count. This gives a more accurate limit across time, but it uses more memory because it tracks request history.
Token Bucket maintains a bucket of tokens. Tokens refill over time up to a maximum capacity. Each request consumes one token. If no token is available, the request is rejected. For example, with a refill rate of 2 tokens per second, 500 ms of elapsed time adds (500 * 2) / 1000 = 1 token. This is often preferred in production because it allows short bursts while preserving the average request rate.
Strong interview answer
"I would expose a `RateLimiter` interface and implement algorithms independently. Fixed window is simple but can allow boundary bursts. Sliding window is more accurate but stores more state. Token bucket is a strong production default because it supports controlled bursts while maintaining an average rate."
Production improvements
- Store counters in Redis for distributed rate limiting.
- Use atomic Redis commands or Lua scripts to avoid race conditions (see the sketch after this list).
- Apply limits per user, IP, API key, tenant, or endpoint.
- Return retry-after metadata and structured rejection reasons.
- Add config-driven limits for free, premium, and internal clients.
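A rough sketch of the first two points, assuming the Jedis client and one-second windows; the key naming and TTL here are illustrative choices, not from the original code.

import redis.clients.jedis.Jedis;

// Hypothetical distributed fixed-window limiter: the counter lives in Redis,
// so every application instance shares the same count.
class RedisFixedWindowLimiter implements RateLimiter {
    private final Jedis jedis;
    private final String keyPrefix;   // e.g. per user, IP, or API key
    private final int threshold;

    RedisFixedWindowLimiter(Jedis jedis, String keyPrefix, int threshold) {
        this.jedis = jedis;
        this.keyPrefix = keyPrefix;
        this.threshold = threshold;
    }

    @Override
    public boolean isAllowRequest() {
        long window = System.currentTimeMillis() / 1000;   // one-second window id
        String key = keyPrefix + ":" + window;
        long count = jedis.incr(key);     // atomic increment shared across instances
        if (count == 1) {
            jedis.expire(key, 2);         // let old window keys expire on their own
        }
        return count <= threshold;
    }
}

For stricter atomicity (increment plus expire in one step), the same logic can be moved into a small Lua script executed by Redis.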
Weakness to mention
The demo is in-memory and process-local. In a real multi-server system, each instance would have its own counters unless state is moved to a shared atomic store like Redis.