Data Structures: Graphs

Graphs are one of the most practical data structures in computer science because they model relationships. When you work with road networks, internet routing, dependency graphs in build systems, recommendations in social networks, or scheduling problems, you are usually working with a graph, whether you name it that way or not.

At a high level, a graph is a set of vertices (also called nodes) and edges (connections between nodes). Edges may be directed (from A to B) or undirected (A connected to B), and they may carry weights such as distance, cost, latency, or capacity.

What graphs represent in real systems

Graphs show up whenever “things” are connected:

Routing and navigation: intersections are vertices, roads are edges, weights are travel time or distance.
Computer networks: devices are vertices, links are edges, weights may be latency or bandwidth cost.
Social networks: users are vertices, relationships are edges; directed edges represent follows, undirected edges represent friendships.
Project planning and dependencies: tasks are vertices; an edge $A \to B$ means “A must happen before B.”
Optimization problems: many can be reduced to shortest paths, spanning trees, or flow problems on graphs.

Understanding representation choices and a handful of core algorithms goes a long way in solving these problems efficiently.

Core terminology and types of graphs

A few definitions matter because they guide algorithm choice:

Degree: number of edges touching a vertex. In directed graphs, you distinguish in-degree and out-degree.
Path: a sequence of vertices connected by edges.
Cycle: a path that starts and ends at the same vertex.
Connected graph (undirected): every vertex is reachable from every other.
Strongly connected (directed): every vertex is reachable from every other following edge directions.
DAG (Directed Acyclic Graph): a directed graph with no cycles, common for dependencies.

In many applications you will also care whether the graph is sparse (few edges relative to vertices) or dense (many edges). This has a direct impact on performance.

Graph representations

How you store a graph determines memory usage and how fast you can query neighbors.

Adjacency list

An adjacency list stores, for each vertex, a list of its outgoing neighbors (and weights if any). For example, in a weighted graph you might store pairs like (neighbor, weight).

Space: $O (V + E)$
Best for: sparse graphs, most real-world networks
Pros: efficient iteration over neighbors
Cons: checking whether a specific edge exists can be slower unless you add a hash set or similar structure

Adjacency matrix

An adjacency matrix is a $V \times V$ table where entry $(i, j)$ indicates whether an edge exists (and possibly its weight).

Space: $O (V^{2})$
Best for: dense graphs, small graphs, constant-time edge existence checks
Pros: edge lookup is $O (1)$
Cons: expensive memory usage for large graphs; iterating neighbors costs $O (V)$ per vertex

Edge list

An edge list stores all edges as a list of (u, v) or (u, v, w) tuples.

Space: $O (E)$
Best for: algorithms that primarily sort or scan edges (notably minimum spanning tree algorithms)
Pros: simple, compact
Cons: slower neighborhood queries

Breadth-First Search (BFS)

BFS explores a graph level by level starting from a source vertex, using a queue. In an unweighted graph, BFS finds the shortest path in terms of number of edges.

Typical uses: shortest paths in unweighted graphs, finding connected components, testing bipartiteness, minimum number of steps in state-space problems
Time complexity: $O (V + E)$ with adjacency lists

Example: In a social network, if you want “friends within 2 hops,” BFS naturally expands one layer of connections at a time.

A common practical detail is tracking:

a visited set to prevent reprocessing
a parent map to reconstruct paths
a distance map for the level number

Depth-First Search (DFS)

DFS explores as far as possible down one branch before backtracking. It can be implemented recursively or iteratively with a stack.

Typical uses: cycle detection, topological sorting (for DAGs), finding strongly connected components (with additional methods), exploring all reachable states
Time complexity: $O (V + E)$ with adjacency lists

DFS is often the tool you reach for when the question feels like “explore everything and detect structure.” For example, in a dependency graph, DFS can detect a cycle that would make a build impossible.

Shortest path algorithms

“Shortest path” depends on edge weights and constraints. Choosing the right algorithm is crucial for correctness.

Dijkstra’s algorithm

Dijkstra’s algorithm computes shortest paths from a source to all vertices when all edge weights are non-negative. It repeatedly picks the next vertex with the smallest tentative distance, typically using a priority queue.

When to use: routing with non-negative costs, travel times, distances
Time complexity: commonly $O ((V + E) lo g V)$ with a binary heap priority queue
Key constraint: does not work with negative edge weights

In practice, Dijkstra is used in navigation and network routing because negative “distances” do not make physical sense in those domains.

Bellman-Ford (when negative weights exist)

When negative edge weights are possible, Bellman-Ford can compute shortest paths and detect negative cycles (which imply there is no well-defined shortest path).

When to use: graphs that can contain negative weights
Time complexity: $O (V E)$

Even if your final system never expects negative weights, Bellman-Ford is useful for validation and for domains like financial arbitrage graphs where negative cycles have meaning.

Shortest paths in unweighted graphs

If every edge has the same cost, BFS is effectively the shortest path algorithm. This is a common optimization: if weights are uniform, do not pay for Dijkstra.

Minimum Spanning Tree (MST) algorithms

In an undirected weighted graph, a spanning tree connects all vertices with no cycles. A minimum spanning tree is the spanning tree with the smallest total edge weight.

MSTs matter in network design and clustering. If you need to connect a set of sites with minimum total cable length, or build a backbone network, an MST is a natural solution.

Kruskal’s algorithm

Kruskal’s algorithm sorts edges by weight and adds them in increasing order, skipping any edge that would form a cycle. Cycle checks are typically handled with a Disjoint Set Union (Union-Find) structure.

Best for: sparse graphs, when edges are easy to sort and scan
Time complexity: dominated by sorting, typically $O (E lo g E)$

Prim’s algorithm

Prim’s algorithm grows the MST from an arbitrary start vertex by repeatedly adding the cheapest edge that connects the current tree to a new vertex, often using a priority queue.

Best for: dense graphs or when using adjacency structures efficiently
Time complexity: commonly $O (E lo g V)$ with a heap; can be improved with specialized structures

Practical considerations and common pitfalls

Directed vs undirected assumptions

Many mistakes come from applying an undirected mental model to directed data. Reachability, connectedness, and cycles behave differently when edge direction matters.

Weights and correctness

If there is any chance of negative edge weights, Dijkstra can silently return incorrect results. A safe approach is to validate weights or select Bellman-Ford when negative weights are possible.

Graph size and representation choice

For large, sparse graphs, an adjacency list is usually the only realistic choice. An adjacency matrix can be prohibitively expensive because $V^{2}$ grows quickly. For example, $100, 000$ vertices would imply $1 0^{10}$ matrix entries, far beyond typical memory limits.

Path reconstruction

“Find the shortest distance” is often not enough in real systems; you need the actual route. Store predecessor pointers during BFS or Dijkstra so you can reconstruct the path by walking backward from the target.

Putting it together: choosing the right tool

A simple decision guide helps:

Need reachability or structure exploration? Use BFS or DFS.
Need shortest path with equal edge costs? Use BFS.
Need shortest path with non-negative weights? Use Dijkstra.
Need shortest path with possible negative weights or cycle detection? Use Bellman-Ford.
Need a low-cost way to connect all nodes in an undirected weighted graph? Use Kruskal or Prim for an MST.

Graphs are a deep topic, but these representations and algorithms cover a large fraction of real-world work. Once you are comfortable with them, you can tackle more advanced areas like topological sorting for DAG scheduling, strongly connected components for dependency analysis, and flow algorithms for allocation and logistics.

Data Structures: Graphs

Data Structures: Graphs

What graphs represent in real systems

Core terminology and types of graphs

Graph representations

Adjacency list

Adjacency matrix

Edge list

Breadth-First Search (BFS)

Depth-First Search (DFS)

Shortest path algorithms

Dijkstra’s algorithm

Bellman-Ford (when negative weights exist)

Shortest paths in unweighted graphs

Minimum Spanning Tree (MST) algorithms

Kruskal’s algorithm

Prim’s algorithm

Practical considerations and common pitfalls

Directed vs undirected assumptions

Weights and correctness

Graph size and representation choice

Path reconstruction

Putting it together: choosing the right tool

Write better notes with AI