Feb 28

Design a Chat System

Mindli Team

AI-Generated Content


Designing a scalable, real-time chat system is a quintessential system design challenge, testing your ability to reason about concurrent connections, stateful communication, and data consistency. Whether for an app like WhatsApp, Slack, or a custom platform, such a system must deliver messages instantly, guarantee their arrival, and inform users who's online—all while serving millions of users simultaneously.

Core Concepts: From Connection to Delivery

The architecture of a chat system revolves around managing persistent, bidirectional connections and ensuring messages flow reliably from sender to recipient(s). The journey of a single message touches several critical subsystems.

1. Connection Management with WebSockets

The foundation of real-time chat is a persistent, full-duplex communication channel. While HTTP is request-response, WebSocket connections allow the server to push data to the client instantly without being polled. A client establishes a WebSocket handshake (initiated via HTTP), which upgrades the connection to a persistent TCP socket.

In a scaled system, clients don't connect to a monolithic server. Instead, they connect to a pool of Connection Management Servers (often called WebSocket servers or gateway servers). These servers maintain the open socket for each user, manage authentication, and are responsible for keeping the connection alive with heartbeats (ping/pong frames). Their primary job is to be the bridge: they receive outgoing messages from their connected clients and listen for incoming messages destined for those clients. A major challenge here is session management; if a user reconnects (e.g., after a network drop), they should ideally land on the same server or have their connection state seamlessly transferred to maintain message ordering guarantees.

2. Message Routing and Delivery Guarantees

When User A sends a message to User B, it first travels from A's client to A's connection server. That server does not know where User B is connected. This is where a Message Routing service (or a message queue/bus) becomes essential. The connection server publishes the message to a central routing system, tagging it with the recipient's ID.

The routing system's job is to find B's connection location. This is typically done by consulting a Presence Service or a session store that maps user IDs to their current connection server ID. Once located, the router forwards the message to B's connection server, which then pushes it down B's open WebSocket.
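The routing step can be condensed into a single function. The data shapes below are assumptions for illustration: a session store is modeled as a plain dict mapping user IDs to server IDs, and each connection server's outbound path as a list standing in for its delivery queue.

```python
def route_message(message: dict, session_store: dict, servers: dict) -> str:
    """Deliver `message` to the recipient's connection server, if one exists.

    session_store maps user_id -> connection server id (the Presence Service lookup);
    servers maps server id -> a list standing in for that server's outbound queue.
    Returns 'delivered' or 'offline' so the caller can divert to durable storage.
    """
    server_id = session_store.get(message["to"])
    if server_id is None:
        return "offline"  # caller persists the message for later sync
    servers[server_id].append(message)  # pushed down the recipient's open WebSocket
    return "delivered"
```

The key design point survives the simplification: the sender's server never talks to the recipient's server directly; it only consults the session mapping and hands off.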

Delivery guarantees are critical. A common pattern is using acknowledgments (ACKs). The sender's client expects an ACK from the server upon successful receipt; the recipient's client sends its own ACK back to the server once it has received and rendered the message. When the sender receives the server's ACK, the message is marked as "sent" (one checkmark); when the recipient's ACK returns, it is marked as "delivered" (two checkmarks), with a separate read receipt marking it "read." Messages are stored durably at each hop (e.g., in a queue) until the next ACK is received, allowing for retries in case of failure.
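One way to keep this ACK progression correct under retries is to treat it as a forward-only state machine. The sketch below is a hypothetical model, not any particular protocol's implementation; the state names mirror the checkmark stages.

```python
# Delivery states advance only forward, driven by ACKs. Duplicate or late
# ACKs from retries are harmless because they can never move the state back.
STATES = ("pending", "sent", "delivered", "read")

class OutboundMessage:
    def __init__(self, msg_id: str):
        self.msg_id = msg_id
        self.state = "pending"

    def ack(self, acked_state: str) -> None:
        """Apply an ACK; ignore it unless it advances the state."""
        if STATES.index(acked_state) > STATES.index(self.state):
            self.state = acked_state
```

This monotonicity is what makes at-least-once delivery safe to build on: the server can retry freely, and redundant ACKs are no-ops.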

3. Persistent Storage and Offline Messaging

Not all users are online all the time. Offline message storage ensures no message is lost. When the routing system attempts to deliver a message but finds the recipient is offline (the presence service has no active session), it diverts the message to a persistent database. This is often a write-optimized, scalable database like Cassandra or S3-backed timeline storage, keyed by recipient ID.

When the user comes back online, their client synchronizes with the server. The connection server queries the persistent storage for undelivered messages for that user ID and "replays" them in order. For group chats, this storage also maintains the conversation history, allowing new members to scroll back. The choice of storage is crucial: it must handle a high write volume (every message sent) and efficient range reads by user/conversation.
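The store-and-replay flow can be sketched with a minimal per-recipient inbox. This is a toy stand-in for a Cassandra-style timeline (names invented for illustration); the essential behaviors are ordered replay and clearing the inbox once messages are handed off.

```python
from collections import defaultdict

class OfflineInbox:
    """Durable per-recipient message log, replayed in order on reconnect."""

    def __init__(self):
        self._inbox = defaultdict(list)  # recipient_id -> [(seq, message), ...]

    def store(self, recipient: str, seq: int, message: str) -> None:
        self._inbox[recipient].append((seq, message))

    def replay(self, recipient: str) -> list[str]:
        """Return undelivered messages in sequence order and clear the inbox."""
        pending = sorted(self._inbox.pop(recipient, []))
        return [msg for _, msg in pending]
```

In a real system `replay` would be paged, and entries would be removed only after the client ACKs them, so a crash mid-sync does not lose messages.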

4. Group Chat and the Fan-Out Problem

A direct chat is a one-to-one flow. A group chat is a one-to-many problem, known as fan-out. When a message is sent to a group of N users, the system must deliver it to N recipients. A naive approach would be for the sender's connection server to write N copies of the message to storage and route to N recipients. This is wasteful and slow.

An optimized design uses a hybrid approach. The message is written once to persistent storage with a group ID. The Message Router then performs a logical fan-out: it retrieves the list of member IDs from a groups service and, for each online member (checked via the Presence Service), routes the message to their connection server. For offline members, a pointer to the single stored message is added to their offline inbox queue. This approach, called lazy fan-out, saves storage and processing for online delivery while still guaranteeing delivery to offline users.
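The lazy fan-out described above reduces to: one durable write, then a per-member branch on presence. The following sketch uses the same toy representations as before (dicts for stores, lists for queues); the function signature is illustrative, not canonical.

```python
def fan_out(group_members, sender, presence, message_store, inboxes, servers, msg):
    """Write the message once; route to online members, queue pointers for offline ones."""
    msg_id = len(message_store)
    message_store[msg_id] = msg  # single durable copy, keyed by message id
    for member in group_members:
        if member == sender:
            continue
        server = presence.get(member)
        if server is not None:
            servers[server].append(msg_id)  # live push via the member's gateway
        else:
            inboxes.setdefault(member, []).append(msg_id)  # lightweight pointer only
    return msg_id
```

Note that only the integer `msg_id` travels per recipient; the message body is stored exactly once regardless of group size.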

5. Presence Tracking and Push Notifications

Presence tracking (showing "Online," "Typing...", or "Last seen...") requires a stateful service. When a client establishes a WebSocket connection, it updates the Presence Service (a fast, in-memory store like Redis). This store holds the user's status and their current connection server address. Heartbeats keep this entry alive. Upon graceful disconnect, the status changes to "offline" and a "last seen" timestamp is saved. This service must be low-latency, as it's queried on every message routing operation and every time a user opens a contact list.
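The heartbeat-driven status logic can be modeled as follows. This is a single-process stand-in for a Redis-backed store (class name and return shape are assumptions); the point it illustrates is that "offline" is inferred from a stale heartbeat, and the stale timestamp doubles as "last seen."

```python
import time

class PresenceService:
    """In-memory presence map: user -> (connection server, last heartbeat)."""

    def __init__(self, ttl: float = 30.0):
        self.ttl = ttl      # should exceed the client heartbeat interval
        self._entries = {}  # user_id -> (server_id, last_heartbeat)

    def heartbeat(self, user_id: str, server_id: str) -> None:
        self._entries[user_id] = (server_id, time.monotonic())

    def status(self, user_id: str):
        """Return ('online', server_id) or ('offline', last_seen_timestamp)."""
        entry = self._entries.get(user_id)
        if entry is None:
            return ("offline", None)
        server_id, last_beat = entry
        if time.monotonic() - last_beat > self.ttl:
            return ("offline", last_beat)  # stale entry doubles as "last seen"
        return ("online", server_id)
```

A Redis implementation would get the expiry behavior for free via key TTLs, avoiding the explicit staleness check.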

What if the user's chat app is closed? Push Notification Integration is the fallback. Mobile operating systems provide push services (APNs on iOS, FCM on Android). When a message needs to be delivered to an offline user whose app is not running, the system sends a payload to the relevant push service, which then wakes up the device with an alert. This is a separate, asynchronous pipeline that often involves a dedicated notification server.
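The socket-first, push-second decision can be sketched as a single dispatch function. As before, the queues and store shapes are illustrative; a real `push_queue` would feed a notification server that speaks the APNs/FCM protocols.

```python
def deliver_or_notify(user_id, payload, presence, servers, push_queue):
    """Try the live socket first; fall back to the async push-notification pipeline."""
    server = presence.get(user_id)
    if server is not None:
        servers[server].append(payload)  # app is connected: push over the WebSocket
        return "socket"
    # No live connection: hand off to the push pipeline (APNs/FCM) to wake the device.
    push_queue.append({"user": user_id, "alert": payload["body"]})
    return "push"
```

Keeping the push path asynchronous matters: the message-routing hot path should never block on a third-party push service.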

Common Pitfalls

  1. Ignoring Message Ordering Guarantees: In a distributed system with multiple connection servers and retries, messages can arrive out of order. A common pitfall is not implementing sequence IDs. Correction: Attach a monotonically increasing sequence number (or a timestamp with sufficient granularity) to each message within a conversation on the server-side. Clients can use this to reorder messages upon receipt.
  2. Overloading the Database with Online Fan-Out: Writing a separate copy of a group message for each recipient (online or not) will cripple your database under load. Correction: Implement the lazy fan-out strategy described above. Store the message body once per group, and only store lightweight pointers or metadata in each recipient's sync timeline.
  3. Treating the WebSocket Connection as Forever Reliable: Networks fail, phones sleep, servers crash. Designing as if the connection is always stable leads to lost messages and ghost users. Correction: Implement robust heartbeats (ping/pong) to detect dead connections. Use client-side and server-side ACK/retry mechanisms. Ensure the Presence Service has a timeout shorter than the heartbeat interval to quickly mark users as offline.
  4. Forgetting the Scale of Connection State: Storing millions of active WebSocket connections in memory on a single server is infeasible. A related pitfall is not planning for how to broadcast a system-wide message or update. Correction: Use a cluster of connection servers behind a load balancer that supports WebSocket persistence (e.g., using session affinity). For system broadcasts, use a dedicated pub/sub system where all connection servers subscribe and fan-out to their local connected clients.
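The sequence-number correction from pitfall 1 is small enough to show directly. This is a client-side sketch under the assumption that the server stamps each message with a per-conversation `seq`; sorting fixes reordering, and the seen-set drops duplicates that retries can introduce.

```python
def reorder(received: list[dict]) -> list[str]:
    """Restore conversation order from server-assigned sequence numbers,
    dropping duplicates produced by at-least-once retry delivery."""
    seen = set()
    ordered = []
    for m in sorted(received, key=lambda m: m["seq"]):
        if m["seq"] in seen:
            continue  # retry duplicate: already rendered
        seen.add(m["seq"])
        ordered.append(m["body"])
    return ordered
```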

Summary

  • Real-time communication is powered by WebSockets, managed by scalable connection servers that maintain persistent, stateful links to millions of clients.
  • Reliability is achieved through a multi-stage acknowledgment system and durable persistent storage, which together ensure no message is lost, whether the recipient is online or offline.
  • Group chats require efficient fan-out, optimally handled by writing the message once and lazily fanning out notifications or pointers to online and offline users separately.
  • System state is tracked by a dedicated, low-latency Presence Service, which is essential for routing and showing active status, with push notifications serving as a fallback channel for inactive apps.
  • A robust chat system is a federation of specialized services—connection management, message routing, presence, storage, and push notifications—orchestrated to create the illusion of seamless, instant communication.
