A-Level Computer Science: Data Representation
Understanding how computers store and interpret all forms of data—from numbers and text to images and sound—is a foundational pillar of computer science. Without a precise system for representing information in binary form, the complex software and digital media we rely on would be impossible. This knowledge is not only critical for exam success but also essential for grasping how hardware interacts with software, debugging programs, and making informed decisions about data storage and transmission.
Number Systems and Conversion
At its core, a computer's processor and memory are vast collections of microscopic switches that can only be in one of two states: on or off. This is why the binary number system, or base-2, is the native language of all digital hardware. In binary, each digit is called a bit (binary digit), and it can only be a 0 or a 1. A group of 8 bits is universally known as a byte, the standard unit for measuring data.
We, however, are accustomed to the denary (or decimal) system, which is base-10, using digits 0-9. To bridge the human and machine worlds, you must master conversion between these bases. Converting from denary to binary involves repeated division by 2 and recording the remainders. For example, converting the denary number 45:
45 / 2 = 22 remainder 1 (LSB - Least Significant Bit)
22 / 2 = 11 remainder 0
11 / 2 = 5 remainder 1
5 / 2 = 2 remainder 1
2 / 2 = 1 remainder 0
1 / 2 = 0 remainder 1 (MSB - Most Significant Bit)
Reading the remainders from bottom to top gives 101101.
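The repeated-division method above can be sketched in Python (the function name denary_to_binary is illustrative, not a library routine):

```python
def denary_to_binary(n: int) -> str:
    """Convert a non-negative denary integer to binary by
    repeated division by 2, collecting the remainders."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        remainders.append(str(n % 2))  # each remainder is the next bit, LSB first
        n //= 2
    # Reading the remainders from last to first gives MSB -> LSB
    return "".join(reversed(remainders))

print(denary_to_binary(45))  # 101101
```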
Working directly with long strings of binary is cumbersome for programmers. This is where the hexadecimal (base-16) system becomes invaluable. It uses digits 0-9 and letters A-F (representing 10-15). One hexadecimal digit neatly represents exactly four binary bits (a nibble), making it a compact and human-readable shorthand for binary. Converting between binary and hex is straightforward: group binary bits into sets of four from the right, then convert each group to its hex equivalent. The denary number 175, which is 10101111 in binary, groups as 1010 (A) and 1111 (F), resulting in the hexadecimal value AF.
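The grouping-into-nibbles process can be demonstrated with a short sketch (binary_to_hex is a hypothetical helper, written out to show the steps rather than using Python's built-in hex()):

```python
def binary_to_hex(bits: str) -> str:
    """Group bits into nibbles from the right and map each to a hex digit."""
    # Pad on the left so the length is a multiple of 4
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join("0123456789ABCDEF"[int(nib, 2)] for nib in nibbles)

print(binary_to_hex("10101111"))  # AF
print(binary_to_hex("101101"))    # 2D  (45 in denary)
```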
Binary Arithmetic and Representation of Negative Numbers
Computers must perform arithmetic on binary numbers. Binary addition follows simple rules: 0+0=0, 0+1=1, 1+0=1, and 1+1=0 with a carry of 1 to the next column. For example:
0011 (3 in denary)
+ 0110 (6 in denary)
----
1001 (9 in denary)
Representing negative numbers is more complex. The most common method is two's complement. To represent a negative number:
- Write the positive binary equivalent for the given number of bits (e.g., 8 bits).
- Invert all the bits (change 0s to 1s and 1s to 0s). This is the ones' complement.
- Add 1 to the result.
To represent -6 in 8-bit two's complement:
- +6 is 00000110.
- Invert: 11111001.
- Add 1: 11111010.
The major advantage of two's complement is that subtraction can be performed by adding the two's complement of the subtrahend. To calculate 9 - 6 (9 + (-6)), you add 00001001 (9) to 11111010 (-6). The result is (1)00000011. The leading 1 is a carry out of the 8-bit range and is discarded, leaving 00000011 (3), which is correct.
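The invert-and-add-one process, and subtraction by addition, can be verified with a minimal sketch (an 8-bit width is assumed, matching the worked example; the bitmask plays the role of discarding the carry-out):

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0xFF: keeps only the lowest 8 bits

def twos_complement(n: int) -> int:
    """Invert all bits, add 1, and discard any carry beyond 8 bits."""
    return ((~n) + 1) & MASK

neg_six = twos_complement(6)
print(format(neg_six, "08b"))         # 11111010

# 9 - 6 performed as 9 + (-6); the & MASK discards the carry-out bit
result = (9 + neg_six) & MASK
print(format(result, "08b"), result)  # 00000011 3
```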
A critical concept here is overflow. This occurs when the result of a calculation exceeds the range that can be represented with the given number of bits. In two's complement, overflow is detected when adding two positive numbers yields a negative result, or adding two negatives yields a positive result.
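That sign-based detection rule can be expressed directly in code. This is a sketch assuming 8-bit two's complement; the function name add_with_overflow_check is illustrative:

```python
BITS = 8
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)  # the MSB, which holds the sign

def add_with_overflow_check(a: int, b: int):
    """Add two 8-bit values; flag overflow when two operands with the
    same sign bit produce a result with the opposite sign bit."""
    result = (a + b) & MASK
    overflow = (a & SIGN) == (b & SIGN) and (result & SIGN) != (a & SIGN)
    return result, overflow

# 100 + 100 = 200, which exceeds the +127 limit of signed 8-bit:
# the result's MSB is 1, so it reads as negative -- overflow
print(add_with_overflow_check(100, 100))  # (200, True)
print(add_with_overflow_check(3, 6))      # (9, False)
```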
Representing Real Numbers: Fixed and Floating Point
Integers are insufficient for scientific calculations or any application requiring fractional values. Two primary methods are used: fixed point and floating point.
Fixed point representation arbitrarily places a virtual binary point within a fixed number of bits. For example, in an 8-bit number, we might designate the first 5 bits for the integer part and the last 3 bits for the fractional part. The bit pattern 00101101 would then be interpreted as 00101.101 = 4 + 1 + 0.5 + 0.125 = 5.625 in denary. While simple, it suffers from a severely limited range of representable numbers.
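Interpreting a fixed point pattern amounts to reading the bits as an integer, then dividing by 2 raised to the number of fractional bits. A minimal sketch (fixed_point_value is an illustrative name):

```python
def fixed_point_value(bits: str, frac_bits: int) -> float:
    """Interpret an unsigned bit string as fixed point,
    with frac_bits bits after the binary point."""
    return int(bits, 2) / (2 ** frac_bits)

# 00101101 with 3 fractional bits is 00101.101
print(fixed_point_value("00101101", 3))  # 5.625
```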
Floating point representation solves this by using a system similar to scientific notation, storing a number as a significand (the significant digits) and an exponent. The general form is significand x base^exponent. In a binary system, the base is 2. The common standard is IEEE 754, which for a 32-bit (single precision) number allocates 1 bit for the sign, 8 bits for the exponent (in excess-127 notation), and 23 bits for the fractional part of the significand (with an implied leading 1). This allows representation of a vast range of numbers, from very small to very large, but with variable precision. A key trade-off is that some numbers (like 0.1 in denary) cannot be represented exactly, leading to tiny rounding errors in calculations.
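The three IEEE 754 fields can be inspected by packing a Python float into 32 bits with the standard struct module. This sketch (float_fields is an illustrative name) also demonstrates the rounding-error trade-off:

```python
import struct

def float_fields(x: float):
    """Unpack a 32-bit IEEE 754 float into its sign,
    true exponent (stored value minus 127), and 23-bit fraction."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31
    exponent = ((raw >> 23) & 0xFF) - 127  # undo excess-127
    fraction = raw & 0x7FFFFF              # implied leading 1 not stored
    return sign, exponent, fraction

print(float_fields(5.625))  # (0, 2, 3407872): 5.625 = 1.01101 (binary) x 2^2
print(0.1 + 0.2 == 0.3)     # False: 0.1 has no exact binary representation
```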
Representing Text, Images, and Sound
For text, each character must be assigned a unique binary code. ASCII (American Standard Code for Information Interchange) is a 7-bit code, allowing 128 unique characters, sufficient for English letters, digits, and control codes. An extended 8-bit version allows for 256 characters, accommodating some additional symbols. However, ASCII cannot represent the vast array of global scripts.
Unicode is the universal character encoding standard designed to overcome this limitation. It defines a unique code point (a number) for every character across all major writing systems. UTF-8 is a variable-length encoding scheme for Unicode that is backwards compatible with ASCII; it uses 1 byte for ASCII characters and up to 4 bytes for others, making it highly efficient for web and storage.
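The variable-length nature of UTF-8, and its backwards compatibility with ASCII, can be seen directly by encoding characters and counting the bytes:

```python
# ASCII characters occupy exactly one byte in UTF-8...
print(len("A".encode("utf-8")))   # 1
# ...while other characters take two to four bytes
print(len("é".encode("utf-8")))   # 2
print(len("€".encode("utf-8")))   # 3
print(len("😀".encode("utf-8")))  # 4

# The Unicode code point for 'A' matches its ASCII code
print(ord("A"))  # 65
```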
Bitmap images are represented as a grid of picture elements, or pixels. Each pixel's colour is stored as a binary number. The number of bits used per pixel defines the colour depth. A 1-bit depth gives 2 colours (monochrome), 8-bit gives 256 colours, and 24-bit (True Colour) gives over 16 million colours (8 bits for each Red, Green, and Blue channel). The resolution of an image is the total number of pixels, expressed as width x height (e.g., 1920 x 1080). The file size of an uncompressed bitmap can be calculated as width x height x colour depth (in bits). For a 1920 x 1080 image with 24-bit colour: 1920 x 1080 x 24 = 49,766,400 bits, which is 6,220,800 bytes, or roughly 6 MB.
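The file-size formula can be checked with a short sketch (bitmap_size_bytes is an illustrative name; MB here means 2^20 bytes, matching the rough figure above):

```python
def bitmap_size_bytes(width: int, height: int, colour_depth_bits: int) -> int:
    """Uncompressed bitmap size: width x height x colour depth,
    divided by 8 to convert bits to bytes."""
    return width * height * colour_depth_bits // 8

size = bitmap_size_bytes(1920, 1080, 24)
print(size, "bytes =", round(size / 2**20, 2), "MB")  # 6220800 bytes = 5.93 MB
```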
Sound is an analogue wave. To store it digitally, it must be sampled. The process involves taking regular measurements of the sound wave's amplitude. The sampling rate is the number of samples taken per second, measured in Hertz (Hz). A higher sampling rate (e.g., 44.1 kHz for CD audio) captures more detail and allows for higher reproducible frequencies. The sample resolution (or bit depth) is the number of bits used to store the amplitude of each sample. A higher resolution (e.g., 16-bit) provides a more accurate amplitude measurement, improving dynamic range and reducing quantisation error (the difference between the actual analogue amplitude and its digital representation). The file size for uncompressed audio is sampling rate x sample resolution x number of channels x duration (in seconds). For one minute of stereo CD-quality audio: 44,100 x 16 x 2 x 60 = 84,672,000 bits, which is 10,584,000 bytes, or about 10.1 MB.
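The audio formula follows the same pattern as the bitmap one (audio_size_bytes is an illustrative name; MB again means 2^20 bytes):

```python
def audio_size_bytes(rate_hz: int, bit_depth: int, channels: int, seconds: int) -> int:
    """Uncompressed audio size: sampling rate x bit depth x channels
    x duration, divided by 8 to convert bits to bytes."""
    return rate_hz * bit_depth * channels * seconds // 8

# One minute of stereo CD-quality audio (44.1 kHz, 16-bit)
size = audio_size_bytes(44_100, 16, 2, 60)
print(size, "bytes =", round(size / 2**20, 2), "MB")  # 10584000 bytes = 10.09 MB
```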
Common Pitfalls
- Two's Complement Sign Confusion: A common error is misidentifying whether a binary number is positive or negative. Remember: in two's complement, if the most significant bit (MSB) is 1, the number is negative. You must then convert it back to denary using the two's complement process to find its value. Simply converting it as a standard positive binary number will give an incorrect positive value.
- Ignoring Overflow: In exam questions, always check for overflow after a binary addition or subtraction, especially when using a specified number of bits. Stating a result without acknowledging that overflow has occurred (making the result invalid) will lose marks.
- Miscalculating Bitmap File Size: Students often multiply by the colour depth in bytes rather than bits, or omit the final division by 8 to convert the answer from bits to bytes. Always write down the formula: (Width x Height x Colour Depth in bits) / 8 = file size in bytes.
- Confusing Encoding Schemes: Do not state that "Unicode is an encoding." Unicode defines code points. UTF-8, UTF-16, and UTF-32 are different encoding schemes that map those code points to specific byte sequences. Understanding this distinction is crucial for higher-mark questions.
Summary
- All data inside a computer is ultimately represented in binary (base-2). Hexadecimal (base-16) provides a compact, human-readable format for binary values.
- Two's complement is the standard method for representing signed integers, allowing subtraction to be performed through addition. Overflow must be checked when performing arithmetic within a fixed number of bits.
- Floating point representation (e.g., IEEE 754) allows computers to handle real numbers with a wide range, using a significand and exponent, at the cost of potential rounding errors.
- Text is encoded using standards like ASCII (limited) and Unicode (universal), with UTF-8 being a dominant, efficient encoding scheme.
- Bitmap images are defined by their resolution (pixel dimensions) and colour depth (bits per pixel), which directly determine visual quality and file size.
- Sound is digitized through sampling, where the sampling rate (frequency of samples) and bit depth (accuracy of each sample) define the quality and size of an audio file.