Design Intent
What your code is doing
Strategy Pattern
The RateLimiter interface gives every limiter the same contract, so the client can call one method no matter which algorithm is being used.
Factory-Friendly Structure
Your comment is right: a factory method can later choose the limiter type, while the client stays unchanged and talks only to the interface.
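A minimal sketch of what that factory could look like, assuming a hypothetical RateLimiterFactory and LimiterType enum that are not in the original code:

enum LimiterType { FIXED_WINDOW, SLIDING_WINDOW, TOKEN_BUCKET }

class RateLimiterFactory {
    // Hypothetical factory method: picks a concrete limiter, returns the interface type.
    static RateLimiter create(LimiterType type, int threshold) {
        switch (type) {
            case FIXED_WINDOW:   return new FixedWindowRateLimiter(threshold);
            case SLIDING_WINDOW: return new SlidingWindow(threshold);
            case TOKEN_BUCKET:   return new TokenBucket(threshold, 2); // assumed refill rate
            default: throw new IllegalArgumentException("Unknown type: " + type);
        }
    }
}

// The client keeps talking to the interface:
// RateLimiter limiter = RateLimiterFactory.create(LimiterType.TOKEN_BUCKET, 5);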
Concurrent Simulation
The client uses a worker pool, a blocking queue, and a notifier thread to simulate lots of requests and print results cleanly.
Interface Logic
Why the interface and `@Override` matter
Why use the interface
RateLimiter defines one behavior: isAllowRequest(). That lets the client swap algorithms without changing the calling code.
Why override the method
@Override tells the compiler that each class implements the interface contract correctly. It prevents signature mistakes and makes the intent of the code obvious.
Why this improves design
If you add another limiter tomorrow, you only implement the same method in a new class and plug it into the system.
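For instance, a pass-through limiter for tests or trusted internal traffic needs nothing more than the one method; the class name and use case below are assumptions for illustration, not part of the original code.

class AlwaysAllowRateLimiter implements RateLimiter {
    @Override
    public boolean isAllowRequest() {
        return true;   // never rejects; plugs into the same client code via the interface
    }
}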
Thread Safety
How the implementation stays safe under many threads
Fixed Window uses atomics
AtomicInteger updates the request counter safely and AtomicLong protects the current window start. compareAndSet ensures only one thread performs the reset.
Sliding Window uses a lock around the full decision
Removing expired timestamps, checking the queue size, and adding a new timestamp must happen as one atomic action. The ReentrantLock protects that whole block.
Token Bucket uses a lock for refill plus consume
Refill calculation and token consumption must happen together. Otherwise two threads might see the same token and both think they can use it.
The client uses safe concurrency primitives
AtomicInteger safely generates request IDs, LinkedBlockingQueue safely transfers events, and the executor handles multi-threaded task scheduling.
Core Contract
The shared interface and the multi-threaded client
Interface
One method, many strategies
The whole design hangs on one shared contract. Every limiter overrides the same method, which is why the client can test all implementations uniformly.
interface RateLimiter {
    boolean isAllowRequest();
}
Client
How the executor, queue, and printer work together
The client creates all strategies, fires requests from a 10-thread pool, and sends results to a blocking queue so a printer thread can log them safely.
// Three strategies behind one interface type.
RateLimiter fixed = new FixedWindowRateLimiter(5);
RateLimiter sliding = new SlidingWindow(5);
RateLimiter token = new TokenBucket(5, 2);

// Worker pool, shared request-id counter, and the queue that feeds the printer.
ExecutorService exec = Executors.newFixedThreadPool(10);
AtomicInteger requestId = new AtomicInteger(0);
BlockingQueue<RequestEvent> queue = new LinkedBlockingQueue<>();

// Dedicated printer thread: drains the queue and logs each decision.
Thread printer = new Thread(() -> {
    try {
        while (!Thread.currentThread().isInterrupted()) {
            RequestEvent ev = queue.poll(500, TimeUnit.MILLISECONDS);
            if (ev == null) continue;   // nothing to print yet, keep waiting
            printEvent(ev);
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();   // restore interrupt status and exit
    }
}, "notifier-printer");
Algorithms
Accurate request flow, explanation, and full snippet for each limiter
Fixed Window
Count requests inside a fixed one-second window
This limiter only cares about the current one-second bucket. If the request count is still within the threshold, the request is allowed (green) and passes. If the counter has already crossed the threshold inside the same window, the request is rejected (red) because the fixed window is full.
class FixedWindowRateLimiter implements RateLimiter {
    private final int threshold;
    private static final long TIMELIMIT_MS = 1000L;
    private final AtomicInteger counter = new AtomicInteger(0);
    private final AtomicLong windowStartTime;

    FixedWindowRateLimiter(int threshold) {
        this.threshold = threshold;
        this.windowStartTime = new AtomicLong(System.currentTimeMillis());
    }

    @Override
    public boolean isAllowRequest() {
        long currTime = System.currentTimeMillis();
        long startTime = this.windowStartTime.get();
        // Window expired: exactly one thread wins the CAS and resets the counter.
        if (currTime - startTime >= TIMELIMIT_MS) {
            if (this.windowStartTime.compareAndSet(startTime, currTime)) {
                counter.set(0);
            }
        }
        // Allow the request only while the counter stays within the threshold.
        return counter.incrementAndGet() <= threshold;
    }
}
The slots in the visualization show how full the current window is. Once the window is full, new requests are rejected until the clock rolls into the next window and the counter resets.
Sliding Window
Keep only the recent timestamps
This limiter cleans out old requests first, then checks how many recent requests are still active. If the recent-request log is already full, the request is rejected (red) because the sliding-window capacity is exceeded.
class SlidingWindow implements RateLimiter {
    private final int threshold;
    private static final long REQUEST_LIMIT_MS = 1000L;
    private final Queue<Long> logTable = new ConcurrentLinkedQueue<>();
    private final ReentrantLock lock = new ReentrantLock();

    SlidingWindow(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public boolean isAllowRequest() {
        long currentWindowTime = System.currentTimeMillis();
        lock.lock();
        try {
            // Evict timestamps that have fallen out of the sliding window.
            while (!logTable.isEmpty() &&
                   currentWindowTime - logTable.peek() >= REQUEST_LIMIT_MS) {
                logTable.poll();
            }
            // Admit the request only if the recent-request log has room.
            if (logTable.size() < threshold) {
                logTable.offer(currentWindowTime);
                return true;
            } else {
                return false;
            }
        } finally {
            lock.unlock();
        }
    }
}
Each chip in the visualization is a recent request. When too many are still inside the active time range, the next request is rejected.
Token Bucket
Consume tokens and refill over time
This limiter allows bursts up to the bucket capacity. A request is allowed (green) if a token is available. It is rejected (red) when the bucket is empty, simply because no tokens are available right now.
class TokenBucket implements RateLimiter {
    private final int capacity;
    private int tokens;
    private final int refillRate;     // tokens added per second
    private long lastRefillTime;
    private final ReentrantLock lock = new ReentrantLock();

    TokenBucket(int limit, int refillRate) {
        this.capacity = limit;
        this.tokens = limit;          // bucket starts full
        this.refillRate = refillRate;
        this.lastRefillTime = System.currentTimeMillis();
    }

    @Override
    public boolean isAllowRequest() {
        lock.lock();
        try {
            long currTime = System.currentTimeMillis();
            long timePassed = currTime - lastRefillTime;
            // Refill in proportion to elapsed time, capped at capacity.
            int refill = (int) ((timePassed * refillRate) / 1000);
            if (refill > 0) {
                tokens = Math.min(capacity, tokens + refill);
                lastRefillTime = currTime;
            }
            // Consume a token if one is available.
            if (tokens > 0) {
                tokens--;
                return true;
            } else {
                return false;
            }
        } finally {
            lock.unlock();
        }
    }
}
The liquid level in the visualization shows the available tokens. If the bucket reaches zero, requests are rejected until the refill adds tokens again.
Client Flow
How your client runs the full multi-threaded simulation
Create three limiter strategies
The client creates one fixed-window limiter, one sliding-window limiter, and one token-bucket limiter through the shared interface type.
Use a 10-thread executor
Each worker task runs its request through all three limiters, so every algorithm sees the same traffic and the code shows how each one behaves under pressure.
Queue printable events
Every decision becomes a RequestEvent and is pushed into a LinkedBlockingQueue, which safely transfers the event to the printer thread.
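A plain value class is enough to carry one decision from a worker to the printer; the fields below are assumed from what the printer logs, not copied from the original source.

class RequestEvent {
    final int id;              // which simulated request this was
    final String limiter;      // which algorithm made the decision
    final boolean allowed;     // the decision itself
    final String threadName;   // which worker produced it
    final long timestamp;      // when the decision was made

    RequestEvent(int id, String limiter, boolean allowed, String threadName, long timestamp) {
        this.id = id;
        this.limiter = limiter;
        this.allowed = allowed;
        this.threadName = threadName;
        this.timestamp = timestamp;
    }
}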
Print results asynchronously
The notifier thread prints allowed requests as one-line logs and rejected requests as detailed boxes with limiter, reason, thread, and timestamp.
Run The Code
Read it here, run it in the embedded Java compiler
Java Source
This static page stays GitHub Pages friendly. For the embedded compiler, the package line is removed so the main class can run directly.
Embedded Compiler
Interview Ready
Notes For Interview
How to explain the design
This Rate Limiter LLD exposes one `RateLimiter` interface and plugs in different algorithms behind it: Fixed Window, Sliding Window, and Token Bucket. The client depends on the interface, not concrete classes, so the limiter algorithm can be changed based on API, user tier, or traffic pattern.
Flow Diagram
UML Diagram: IS-A and HAS-A
Core classes
- RateLimiter: common contract for checking if a request is allowed.
- FixedWindowRateLimiter: simple counter reset after each fixed time window.
- SlidingWindow: tracks recent request timestamps for smoother limiting.
- TokenBucket: refills tokens over time and allows bursts up to capacity.
- RateLimiterClient: simulates concurrent calls through the shared interface.
Why these patterns are used
Strategy Pattern is used because rate limiting has multiple valid algorithms, and each one has different tradeoffs. Fixed Window is simple and memory-efficient, Sliding Window is smoother and more accurate, and Token Bucket supports controlled bursts. The client should depend on the `RateLimiter` interface and not hard-code a specific algorithm.
This design follows Open/Closed Principle. If tomorrow we need Leaky Bucket, Sliding Window Counter, or a distributed Redis-based limiter, we can add a new implementation of `RateLimiter` without changing the client code that calls `isAllowRequest()`.
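As an illustration of that extensibility, here is a rough Leaky Bucket sketch using the same locking approach as the other limiters; it is an assumed implementation, not part of the original code.

class LeakyBucket implements RateLimiter {
    private final int capacity;           // how many requests the bucket holds
    private final int leakRatePerSecond;  // how fast queued requests drain
    private double water = 0;             // current bucket level
    private long lastLeakTime = System.currentTimeMillis();
    private final ReentrantLock lock = new ReentrantLock();

    LeakyBucket(int capacity, int leakRatePerSecond) {
        this.capacity = capacity;
        this.leakRatePerSecond = leakRatePerSecond;
    }

    @Override
    public boolean isAllowRequest() {
        lock.lock();
        try {
            long now = System.currentTimeMillis();
            // Drain the bucket in proportion to the elapsed time.
            water = Math.max(0, water - ((now - lastLeakTime) / 1000.0) * leakRatePerSecond);
            lastLeakTime = now;
            if (water < capacity) {
                water++;        // this request fits, let it through
                return true;
            }
            return false;       // bucket is full, reject
        } finally {
            lock.unlock();
        }
    }
}

The client code that calls isAllowRequest() does not change; only the new class is added.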
Algorithm explanation
Fixed Window keeps a counter for a fixed time range. If the current request belongs to the same window, the counter increments. If the window has expired, the counter resets. It is fast and simple, but it can allow traffic spikes at window boundaries: a full burst at the very end of one window followed by another full burst at the start of the next can briefly double the intended rate.
Sliding Window stores timestamps of recent requests. Before allowing a new request, it removes timestamps older than the allowed duration and checks the remaining count. This gives a more accurate limit across time, but it uses more memory because it tracks request history.
Token Bucket maintains a bucket of tokens. Tokens refill over time up to a maximum capacity. Each request consumes one token. If no token is available, the request is rejected. For example, with a refill rate of 2 tokens per second, 500 ms of elapsed time adds (500 * 2) / 1000 = 1 token. This is often preferred in production because it allows short bursts while preserving the average request rate.
Strong interview answer
"I would expose a `RateLimiter` interface and implement algorithms independently. Fixed window is simple but can allow boundary bursts. Sliding window is more accurate but stores more state. Token bucket is a strong production default because it supports controlled bursts while maintaining an average rate."
Production improvements
- Store counters in Redis for distributed rate limiting.
- Use atomic Redis commands or Lua scripts to avoid race conditions (see the sketch after this list).
- Apply limits per user, IP, API key, tenant, or endpoint.
- Return retry-after metadata and structured rejection reasons.
- Add config-driven limits for free, premium, and internal clients.
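A rough sketch of the first two points, assuming the Jedis client and one-second windows; the key naming and TTL here are illustrative choices, not from the original code.

import redis.clients.jedis.Jedis;

// Hypothetical distributed fixed-window limiter: the counter lives in Redis,
// so every application instance shares the same count.
class RedisFixedWindowLimiter implements RateLimiter {
    private final Jedis jedis;
    private final String keyPrefix;   // e.g. per user, IP, or API key
    private final int threshold;

    RedisFixedWindowLimiter(Jedis jedis, String keyPrefix, int threshold) {
        this.jedis = jedis;
        this.keyPrefix = keyPrefix;
        this.threshold = threshold;
    }

    @Override
    public boolean isAllowRequest() {
        long window = System.currentTimeMillis() / 1000;   // one-second window id
        String key = keyPrefix + ":" + window;
        long count = jedis.incr(key);     // atomic increment shared across instances
        if (count == 1) {
            jedis.expire(key, 2);         // let old window keys expire on their own
        }
        return count <= threshold;
    }
}

For stricter atomicity (increment plus expire in one step), the same logic can be moved into a small Lua script executed by Redis.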
Weakness to mention
The demo is in-memory and process-local. In a real multi-server system, each instance would have its own counters unless state is moved to a shared atomic store like Redis.