Operating Systems
An operating system (OS) is the layer of software that turns hardware into a usable, reliable computing environment. It decides which program runs next, how memory is shared safely, how files are stored and retrieved, how devices like disks and network cards are accessed, and how users and applications are kept isolated from each other. When the OS does its job well, applications can be written as if they “own” the machine, even though many programs, users, and devices are competing for the same resources.
Modern operating systems are built around a few core responsibilities: process and thread management, scheduling, virtual memory, synchronization, file systems, I/O management, and security fundamentals. These pieces are tightly connected; changing one often affects the others.
Process and Thread Management
A process is a running program together with its execution context: its address space, open files, environment, and bookkeeping data the OS maintains. A thread is a unit of execution within a process. Multiple threads in the same process share memory and resources, which makes communication fast but also increases the risk of interference if synchronization is poor.
Why processes and threads exist
The OS uses processes and threads to provide:
- Isolation: One process crashing should not corrupt another process’s memory.
- Concurrency: Many tasks appear to run at once, improving responsiveness and throughput.
- Resource control: The OS can account for CPU time, memory usage, and I/O per process.
In practice, a web browser might run separate processes for tabs (for isolation) and use multiple threads inside each process (for rendering, networking, and UI responsiveness).
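The sketch below, assuming a Unix-like system with POSIX threads (compile with `-lpthread`), makes the sharing difference concrete: a forked child modifies only its own copy of a counter, while a thread created in the same process updates the one the main thread sees.

```c
/* Minimal POSIX sketch: a forked child gets a *copy* of the counter,
 * while a thread in the same process shares it.
 * Assumes a Unix-like system; compile with: cc demo.c -lpthread */
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;

static void *thread_body(void *arg) {
    (void)arg;
    counter += 1;              /* same address space: visible to main thread */
    return NULL;
}

int main(void) {
    pid_t pid = fork();        /* new process: separate copy of memory */
    if (pid == 0) {
        counter += 100;        /* only changes the child's copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);

    pthread_t t;
    pthread_create(&t, NULL, thread_body, NULL);  /* new thread: shared memory */
    pthread_join(t, NULL);

    printf("counter = %d\n", counter);  /* prints 1: the thread's update, not the child's */
    return 0;
}
```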
Context switching
When the OS switches the CPU from one thread to another, it performs a context switch: saving registers, program counter, and other state for the current thread, then restoring the next thread’s state. Context switching enables multitasking, but it is not free. Excessive switching can reduce performance, which is why scheduling and workload design matter.
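You can observe this from user space. The sketch below, assuming a Linux or BSD system where `getrusage()` reports context-switch counts, contrasts voluntary switches (the thread blocks and yields) with involuntary ones (the scheduler preempts it).

```c
/* Sketch: observing context switches from user space via getrusage().
 * ru_nvcsw counts voluntary switches (e.g., blocking), ru_nivcsw counts
 * involuntary ones (preemption). These fields are Linux/BSD-specific. */
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

int main(void) {
    /* Do some work that blocks, then some that burns CPU. */
    usleep(100 * 1000);                     /* sleeping: likely a voluntary switch */
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 100000000UL; i++)
        x += i;                             /* CPU-bound: may be preempted */

    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    printf("voluntary switches:   %ld\n", ru.ru_nvcsw);
    printf("involuntary switches: %ld\n", ru.ru_nivcsw);
    return 0;
}
```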
CPU Scheduling
Because CPU time is limited, the OS must decide which runnable thread gets to execute. This is the job of the scheduler. Scheduling affects latency, throughput, and fairness, and the “best” policy depends on the system’s purpose.
Common scheduling goals
- Responsiveness: Interactive systems prioritize quick reaction to input.
- Throughput: Batch systems aim to maximize completed work per unit time.
- Fairness: Prevent starvation, where a thread waits indefinitely.
- Predictability: Real-time systems need bounded response times.
Typical scheduling approaches
- Time-sharing (preemptive scheduling): Threads get a time slice. The OS can preempt a running thread to let another run. This supports interactive workloads.
- Priority-based scheduling: Higher-priority threads run first. Practical implementations must prevent low-priority tasks from starving, often via priority aging or quotas.
- Real-time scheduling: Policies are designed around deadlines and worst-case timing. These systems may trade average performance for guarantees.
A key insight is that scheduling is coupled to I/O. Threads that block on disk or network often yield the CPU quickly, while CPU-bound threads consume longer slices. Many schedulers incorporate heuristics to balance these behaviors.
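The toy simulation below is not a real kernel scheduler, but it illustrates the time-slice idea: each runnable task is handed a fixed quantum in turn until its work is done. The task names and the 3-unit quantum are illustrative assumptions.

```c
/* Toy round-robin simulation: each task needs some amount of CPU time;
 * the "scheduler" hands out fixed time slices in rotation until all finish. */
#include <stdio.h>

struct task { const char *name; int remaining; };

int main(void) {
    struct task tasks[] = { {"A", 7}, {"B", 3}, {"C", 5} };
    const int n = 3, quantum = 3;
    int done = 0, clock = 0;

    while (done < n) {
        for (int i = 0; i < n; i++) {
            if (tasks[i].remaining == 0) continue;
            int slice = tasks[i].remaining < quantum ? tasks[i].remaining : quantum;
            tasks[i].remaining -= slice;
            clock += slice;
            printf("t=%2d  ran %s for %d unit(s), %d left\n",
                   clock, tasks[i].name, slice, tasks[i].remaining);
            if (tasks[i].remaining == 0) done++;
        }
    }
    return 0;
}
```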
Virtual Memory and Memory Management
Physical memory (RAM) is finite, but applications behave as if they have large, private memory spaces. The OS provides this illusion through virtual memory.
Address spaces and protection
Each process typically gets its own virtual address space. The hardware memory management unit (MMU), guided by OS-managed page tables, translates virtual addresses to physical addresses. This enables:
- Protection: A process cannot read or write another process’s memory by default.
- Relocation: Programs can run without knowing where they reside in physical RAM.
- Sharing: Memory regions can be shared intentionally, such as shared libraries or shared-memory IPC.
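A minimal sketch of the sharing point, assuming a Unix-like system with POSIX `mmap()`: a mapping created with MAP_SHARED | MAP_ANONYMOUS before `fork()` is visible to both parent and child, while an ordinary variable is not.

```c
/* Sketch of intentional sharing: by default a forked child has its own
 * copy of memory, but a shared anonymous mapping set up before fork()
 * is visible to both processes. POSIX mmap, Unix-like systems. */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    int private_copy = 0;

    if (fork() == 0) {          /* child */
        *shared = 42;           /* shared mapping: visible to the parent */
        private_copy = 42;      /* ordinary variable: only the child's copy changes */
        _exit(0);
    }
    wait(NULL);
    printf("shared = %d, private = %d\n", *shared, private_copy);  /* 42, 0 */
    munmap(shared, sizeof(int));
    return 0;
}
```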
Paging and performance
Virtual memory is commonly implemented with paging, where memory is divided into fixed-size pages. If a needed page is not in RAM, a page fault occurs and the OS loads it from secondary storage. This enables large address spaces but has major performance implications:
- Locality matters: Programs with good spatial and temporal locality perform well.
- Thrashing: If working sets exceed RAM, the system spends excessive time paging rather than executing useful work.
A practical consequence is that memory management is not just about capacity, but about minimizing expensive page faults and managing caches effectively.
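A small experiment shows the effect; the 4096-by-4096 array below is an arbitrary illustrative size. The two loops do identical work, but the row-order traversal walks memory sequentially while the column-order traversal strides across a full row each step, defeating caching.

```c
/* Sketch of why locality matters: same total work, different access order.
 * Row-order traversal is cache- and page-friendly; column-order is strided.
 * Actual timings vary by machine. */
#include <stdio.h>
#include <time.h>

#define N 4096

static double elapsed(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void) {
    static int m[N][N];                    /* ~64 MB, zero-initialized */
    struct timespec t0, t1, t2;
    long sum = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)            /* row order: good spatial locality */
        for (int j = 0; j < N; j++)
            sum += m[i][j];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    for (int j = 0; j < N; j++)            /* column order: strided, cache-hostile */
        for (int i = 0; i < N; i++)
            sum += m[i][j];
    clock_gettime(CLOCK_MONOTONIC, &t2);

    printf("row-order:    %.3f s\n", elapsed(t0, t1));
    printf("column-order: %.3f s\n", elapsed(t1, t2));
    return (int)(sum & 1);                 /* use sum so the loops aren't optimized away */
}
```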
Synchronization and Deadlocks
Concurrency introduces correctness problems. When two threads access shared data without coordination, results can be inconsistent or corrupted. Operating systems provide synchronization primitives to manage this safely.
Core synchronization tools
- Mutexes/locks: Ensure mutual exclusion for critical sections.
- Semaphores: Coordinate access to a limited resource pool or signal events.
- Condition variables: Allow threads to sleep until a condition becomes true.
- Atomic operations: Hardware-supported primitives used to build higher-level locks.
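A minimal sketch of a mutex and a condition variable working together, assuming POSIX threads (compile with `-lpthread`): a worker sleeps on the condition variable, under the mutex, until the main thread signals that shared state is ready.

```c
/* Minimal mutex + condition variable sketch: the worker sleeps until
 * main sets `ready`, rather than spinning on the flag. POSIX threads. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;

static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!ready)                        /* re-check: wakeups can be spurious */
        pthread_cond_wait(&cond, &lock);  /* atomically releases the lock and sleeps */
    printf("worker: got the signal\n");
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);

    sleep(1);                             /* pretend to prepare some work */
    pthread_mutex_lock(&lock);            /* critical section protecting `ready` */
    ready = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);

    pthread_join(t, NULL);
    return 0;
}
```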
The OS must also manage synchronization internally, protecting kernel data structures such as process tables, memory maps, and file system metadata.
Deadlocks and how they arise
A deadlock occurs when a set of threads each waits for resources held by others, and none can proceed. The four classic conditions that must all hold for a deadlock to arise are:
- Mutual exclusion
- Hold and wait
- No preemption
- Circular wait
Operating systems address deadlocks through a combination of:
- Prevention: Designing rules that break at least one condition (for example, ordering lock acquisition).
- Avoidance: Making allocation decisions that keep the system in a safe state.
- Detection and recovery: Detecting cycles and terminating or rolling back work, common in some database-like subsystems.
In real systems, the most effective approach is often disciplined locking and resource ordering, because full avoidance can be too expensive.
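A sketch of that discipline, assuming POSIX threads: both threads below acquire the two locks in the same global order, so a circular wait can never form. Reversing the order in one of them is exactly the pattern that can deadlock.

```c
/* Deadlock prevention by lock ordering: every code path takes lock_a
 * before lock_b, so the circular-wait condition cannot arise. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void *worker_one(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock_a);   /* always a first ... */
    pthread_mutex_lock(&lock_b);   /* ... then b */
    printf("thread 1 holds both locks\n");
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

static void *worker_two(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock_a);   /* same order, never b-then-a */
    pthread_mutex_lock(&lock_b);
    printf("thread 2 holds both locks\n");
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker_one, NULL);
    pthread_create(&t2, NULL, worker_two, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```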
File Systems
A file system defines how data is named, stored, retrieved, and protected. It bridges user-level concepts like “folders” and “files” with low-level realities like blocks on a disk or objects in flash storage.
Key concepts
- Namespaces and directories: Organize files into hierarchical paths.
- Metadata: Ownership, permissions, timestamps, size, and allocation information.
- Caching: The OS caches file data and metadata in memory to reduce I/O latency.
- Consistency: The system must remain correct across crashes and power loss.
Many modern file systems use techniques such as journaling or copy-on-write to improve reliability. Even without detailing specific implementations, the core goal is the same: avoid losing or corrupting data when failures occur.
Permissions and access control
File systems are a central enforcement point for security. Permissions determine who can read, write, or execute a file, and they interact with broader OS security models such as user accounts, groups, and discretionary access control.
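A small sketch of reading that metadata and the permission bits, assuming a POSIX system; the path /etc/hosts is only an illustrative target, and any readable file works.

```c
/* Sketch: querying file metadata and owner permission bits with stat(). */
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

int main(void) {
    const char *path = "/etc/hosts";       /* illustrative path */
    struct stat st;
    if (stat(path, &st) != 0) { perror("stat"); return 1; }

    printf("size:  %lld bytes\n", (long long)st.st_size);
    printf("owner: uid %ld, gid %ld\n", (long)st.st_uid, (long)st.st_gid);
    printf("mode:  %o (owner %c%c%c)\n",
           st.st_mode & 0777,
           (st.st_mode & S_IRUSR) ? 'r' : '-',
           (st.st_mode & S_IWUSR) ? 'w' : '-',
           (st.st_mode & S_IXUSR) ? 'x' : '-');
    printf("mtime: %s", ctime(&st.st_mtime));
    return 0;
}
```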
I/O Management
Devices are heterogeneous: disks, SSDs, keyboards, GPUs, network interfaces, and sensors all behave differently. The OS standardizes access through device drivers and I/O abstractions.
Drivers and abstractions
Drivers translate generic OS requests into device-specific commands. The OS typically offers higher-level abstractions such as:
- Streams for sequential access
- Block devices for storage
- Sockets for networking
This design lets applications use consistent APIs while the OS handles device complexity.
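A minimal illustration, assuming a Unix-like system: the same `read()` and `write()` calls used for regular files also move data through a pipe (and, with the same signatures, through sockets), while the kernel and drivers handle the differences behind the file descriptor.

```c
/* Sketch of a consistent I/O API: read()/write() on a pipe look exactly
 * like read()/write() on a file or socket. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) != 0) { perror("pipe"); return 1; }

    const char *msg = "hello through a pipe\n";
    write(fds[1], msg, strlen(msg));                 /* same call as writing a file */

    char buf[64];
    ssize_t n = read(fds[0], buf, sizeof(buf) - 1);  /* same call as reading a file */
    if (n > 0) {
        buf[n] = '\0';
        fputs(buf, stdout);
    }
    close(fds[0]);
    close(fds[1]);
    return 0;
}
```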
Interrupts, buffering, and DMA
I/O is often asynchronous. Devices signal completion via interrupts, letting the CPU do other work while waiting. Performance depends on careful buffering and minimizing copying:
- Buffer caches reduce disk reads.
- Direct Memory Access (DMA) allows devices to transfer data to and from RAM without constant CPU involvement.
I/O management also intersects with scheduling. A system may prioritize threads waiting on I/O to improve perceived responsiveness, while ensuring background tasks still make progress.
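One user-visible consequence of asynchronous I/O is that a thread can block cheaply instead of spinning. The sketch below, assuming a POSIX system with `poll()`, waits up to two seconds for input on standard input while yielding the CPU the whole time.

```c
/* Sketch of waiting for I/O without burning CPU: poll() blocks until the
 * descriptor is readable or the timeout expires. */
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    struct pollfd pfd = { .fd = STDIN_FILENO, .events = POLLIN };

    int ready = poll(&pfd, 1, 2000);    /* block for up to 2000 ms */
    if (ready > 0 && (pfd.revents & POLLIN)) {
        char buf[128];
        ssize_t n = read(STDIN_FILENO, buf, sizeof(buf));
        printf("read %zd byte(s)\n", n);
    } else if (ready == 0) {
        printf("timed out with no input\n");
    } else {
        perror("poll");
    }
    return 0;
}
```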
Security Fundamentals
Operating systems enforce boundaries. Security is not a single feature but an outcome of design choices across process isolation, memory protection, authentication, and auditing.
The principle of least privilege
Applications and users should have only the permissions they need. OS features that support this include:
- User and group identities
- File permissions and access control checks
- Process privileges and protected system calls
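A small sketch of the identity and permission-check machinery these features build on, assuming a POSIX system; the path /etc/passwd is only an illustrative target.

```c
/* Sketch: inspecting user/group identities and asking the OS whether the
 * real user may write a file. access() performs the check with real IDs. */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("real uid: %ld, effective uid: %ld\n",
           (long)getuid(), (long)geteuid());
    printf("real gid: %ld, effective gid: %ld\n",
           (long)getgid(), (long)getegid());

    const char *path = "/etc/passwd";   /* illustrative path */
    if (access(path, W_OK) == 0)
        printf("this user may write %s\n", path);
    else
        printf("write to %s denied for this user\n", path);
    return 0;
}
```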
Isolation and attack surface
Strong isolation comes from virtual memory protection, careful handling of kernel interfaces, and reducing the amount of code that runs with elevated privilege. Since the kernel is highly trusted, vulnerabilities in drivers, file systems, or system-call handling can be critical.
How the Pieces Fit Together
Operating systems are best understood as a set of trade-offs. More aggressive caching can speed up file access but increase memory pressure. Tighter isolation improves security but can increase overhead. Scheduling policies that reduce latency for interactive tasks may reduce throughput for batch jobs.
What remains constant is the OS mission: provide safe, efficient sharing of hardware. Understanding processes and threads, scheduling, virtual memory, synchronization, deadlocks, file systems, I/O management, and security fundamentals gives you the conceptual map needed to reason about performance problems, reliability issues, and design decisions in real systems.