Feb 25

Number Systems and Binary Arithmetic

Mindli Team

AI-Generated Content

All modern digital computers operate using just two symbols: 0 and 1. Understanding how these symbols form numbers, and how to calculate with them, is foundational to computer engineering and digital logic design.

Positional Number Systems and Base Conversions

A positional number system is one where the value of a digit depends on its position within the number. The base, or radix, of a system defines how many unique digits are available. The decimal system we use daily is base-10 (radix 10). Digital systems use binary (base-2), where each digit is called a bit. Working directly with long strings of bits is cumbersome, so engineers use octal (base-8) and hexadecimal (base-16) as compact shorthand notations. One hexadecimal digit perfectly represents a group of four bits (a nibble), and one octal digit represents three bits.

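Converting between bases is a core skill. To convert a binary number to decimal, you sum the powers of two at the positions that hold a 1: an $n$-bit number $b_{n-1} \dots b_1 b_0$ has the value $\sum_{i=0}^{n-1} b_i \cdot 2^i$. Converting from decimal to another base involves repeated division. To convert to binary, you repeatedly divide by 2, record each remainder, and continue with the quotient until it reaches zero.

Reading the remainders from last to first (bottom to top) gives the binary digits from most significant to least significant. Conversion to and from hexadecimal and octal is often done via binary. Group the binary digits from the right, padding the leftmost group with zeros if it is short: each group of four bits becomes one hex digit, and each group of three bits becomes one octal digit.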
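The conversions described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the value 45 (`101101` in binary) is an arbitrary example chosen here, not one taken from the text.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum the powers of two at every position that holds a 1."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total


def decimal_to_base(value: int, base: int) -> str:
    """Repeated division: collect remainders, then read them last to first."""
    digits = "0123456789ABCDEF"
    if value == 0:
        return "0"
    remainders = []
    while value > 0:
        value, remainder = divmod(value, base)
        remainders.append(digits[remainder])
    return "".join(reversed(remainders))


def binary_to_hex_by_grouping(bits: str) -> str:
    """Pad to a multiple of four bits, then map each nibble to one hex digit."""
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    nibbles = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join("0123456789ABCDEF"[binary_to_decimal(n)] for n in nibbles)


# 45 is an illustrative value (assumption), not an example from the article.
print(binary_to_decimal("101101"))          # 45
print(decimal_to_base(45, 2))               # 101101
print(binary_to_hex_by_grouping("101101"))  # 2D
print(decimal_to_base(45, 8))               # 55
```

Grouping works because 16 and 8 are powers of two, so each hex or octal digit corresponds to a fixed-size slice of the bit string.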

Binary Arithmetic Operations

Arithmetic in binary follows the same logical rules as decimal arithmetic, but with a simpler digit set.

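Addition rules are: $0 + 0 = 0$, $0 + 1 = 1$, $1 + 0 = 1$, and $1 + 1 = 10_2$, that is, 0 with a carry of 1 to the next more significant column. Adding $0110_2$ (6) and $0111_2$ (7) proceeds column by column from the right:

  1. LSB: $0 + 1 = 1$, no carry.
  2. Next: $1 + 1 = 10_2$, write 0, carry 1.
  3. Next: $1$ (carry) $+ 1 + 1 = 11_2$, write 1, carry 1.
  4. MSB: $1$ (carry) $+ 0 + 0 = 1$.

The result is $1101_2$, which is 13.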
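The column-by-column procedure can be written as a short loop. The sketch below assumes equal-width bit strings, and `add_binary` is a hypothetical helper name; it mirrors the pencil-and-paper method rather than how an adder circuit is actually built.

```python
def add_binary(a: str, b: str) -> str:
    """Add two equal-width bit strings column by column, right to left."""
    if len(a) != len(b):
        raise ValueError("operands must have the same width")
    carry = 0
    result = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column_sum = int(bit_a) + int(bit_b) + carry
        result.append(str(column_sum % 2))  # digit written in this column
        carry = column_sum // 2             # carry into the next column
    if carry:
        result.append("1")                  # final carry widens the result
    return "".join(reversed(result))


print(add_binary("0110", "0111"))  # 1101
```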

Subtraction can be performed directly, but it's more efficient in digital circuits to use addition via the two's complement method, covered below. Direct subtraction uses the rules: $0 - 0 = 0$, $1 - 0 = 1$, $1 - 1 = 0$, and $0 - 1 = 1$ with a borrow from the next column.
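As a rough sketch of direct subtraction with borrows, the loop below subtracts one equal-width bit string from another. The helper name `subtract_binary` and the operands 13 and 6 are assumptions made for illustration, and the function only handles the case where the first operand is at least as large as the second.

```python
def subtract_binary(a: str, b: str) -> str:
    """Compute a - b for equal-width bit strings with a >= b, using borrows."""
    if len(a) != len(b):
        raise ValueError("operands must have the same width")
    borrow = 0
    result = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        diff = int(bit_a) - int(bit_b) - borrow
        if diff < 0:
            diff += 2      # borrow a 2 from the next column
            borrow = 1
        else:
            borrow = 0
        result.append(str(diff))
    if borrow:
        raise ValueError("a must be greater than or equal to b")
    return "".join(reversed(result))


print(subtract_binary("1101", "0110"))  # 0111 (13 - 6 = 7)
```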

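Multiplication in binary is straightforward because the multiplier digits are only 0 or 1. It simplifies to shifting and adding. Multiplying $1101_2$ (13) by $1011_2$ (11) involves creating partial products for each bit of the multiplier and summing them with appropriate left shifts.

        1101    (13)
      × 1011    (11)
      ------
        1101    1101 × 1, shifted 0
       1101     1101 × 1, shifted 1
      0000      1101 × 0, shifted 2
     1101       1101 × 1, shifted 3
    --------
    10001111    (143)

The result, $10001111_2$, equals 143 in decimal.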
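The shift-and-add idea translates almost directly into code. The following is a minimal sketch for non-negative integers; `shift_and_add_multiply` is a hypothetical name, and real hardware multipliers use more elaborate schemes than this loop.

```python
def shift_and_add_multiply(multiplicand: int, multiplier: int) -> int:
    """Multiply non-negative integers using only shifts and additions."""
    product = 0
    shift = 0
    while multiplier > 0:
        if multiplier & 1:                    # current multiplier bit is 1
            product += multiplicand << shift  # add the shifted partial product
        multiplier >>= 1                      # move to the next multiplier bit
        shift += 1
    return product


result = shift_and_add_multiply(0b1101, 0b1011)
print(format(result, "b"), result)  # 10001111 143
```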

Representing Signed Integers: Two's Complement

To represent negative numbers, digital systems almost universally use the two's complement representation. For an n-bit number, the range of representable values is from $-2^{n-1}$ to $2^{n-1} - 1$. For example, an 8-bit two's complement number can range from -128 to +127.

To find the two's complement of a number (i.e., its negative equivalent), you invert all the bits (find the ones' complement) and then add 1 to the least significant bit. To represent $-5$ in 4-bit two's complement:

  1. Start with $+5$ in binary: $0101_2$.
  2. Invert the bits: $1010_2$ (this is the ones' complement).
  3. Add 1: $1010_2 + 1 = 1011_2$.

Thus, $-5$ is $1011_2$ in 4-bit two's complement.
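The invert-and-add-one steps can be sketched with bit masks. The function names below are hypothetical, and the input 5 with a width of 4 bits is used purely as an illustrative case.

```python
def twos_complement(value: int, width: int) -> str:
    """Return the width-bit pattern of -value via invert-and-add-one."""
    mask = (1 << width) - 1
    inverted = ~value & mask         # ones' complement within the width
    negated = (inverted + 1) & mask  # add 1 and discard any carry-out
    return format(negated, f"0{width}b")


def decode_twos_complement(bits: str) -> int:
    """Interpret a bit string as a signed two's complement integer."""
    raw = int(bits, 2)
    return raw - (1 << len(bits)) if bits[0] == "1" else raw


print(twos_complement(5, 4))           # 1011
print(decode_twos_complement("1011"))  # -5
```

Decoding subtracts $2^n$ whenever the sign bit is set, which is why the most negative value has no positive counterpart at the same width.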

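The beauty of this system is that subtraction is performed as $A - B = A + (-B)$. Using our 4-bit example, to subtract 5 from a 4-bit value $A$ (that is, to compute $A - 5$):

  1. Find two's complement of $0101_2$ (5): $1011_2$ (-5).
  2. Add: $A + 1011_2$.
  3. Ignore any extra carry-out bit beyond the 4-bit width. The remaining 4-bit result is $A - 5$, the correct answer, as long as the true difference fits in the 4-bit signed range.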
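A small sketch of subtraction by adding the two's complement follows. The minuend 7 is an assumption chosen for illustration, and `subtract_via_twos_complement` is a hypothetical helper name; the function returns the raw 4-bit pattern and does not check for signed overflow.

```python
def subtract_via_twos_complement(a: int, b: int, width: int = 4) -> int:
    """Compute a - b as a + (-b), keeping only the low `width` bits."""
    mask = (1 << width) - 1
    neg_b = (~b + 1) & mask  # two's complement of b
    total = a + neg_b        # ordinary addition; may produce a carry-out
    return total & mask      # drop the carry-out beyond the word width


# 7 - 5 is an illustrative case (assumption): 0111 + 1011 = 1 0010 -> 0010
raw = subtract_via_twos_complement(0b0111, 0b0101)
print(format(raw, "04b"))  # 0010
```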

Detecting Overflow in Signed Arithmetic

Overflow occurs when the result of an arithmetic operation exceeds the range that can be represented by the fixed number of bits. In two's complement arithmetic, overflow is detected by examining the carries into and out of the most significant bit (the sign bit).

The rule is: Overflow occurs if the carry into the sign bit is different from the carry out of the sign bit. Consider adding two 4-bit positive numbers, $0111_2$ (7) and $0011_2$ (3):

      0111    (7)
    + 0011    (3)
    ------
      1010    (-6 as a signed value)

The carry into the sign bit (the leftmost column) is 1, and the carry out is 0. They differ, indicating overflow. The result, $1010_2$, is interpreted as -6 in two's complement, which is clearly wrong for adding two positive numbers. The hardware would flag this error.
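The carry-in versus carry-out rule can be checked in software by computing both carries explicitly. This is a sketch under the assumption that operands are given as raw bit patterns of the stated width; `add_signed` is a hypothetical name.

```python
def add_signed(a: int, b: int, width: int = 4):
    """Add two width-bit patterns; return (result bits, overflow flag)."""
    mask = (1 << width) - 1
    low_mask = mask >> 1  # all bits below the sign bit
    a &= mask
    b &= mask
    carry_in = ((a & low_mask) + (b & low_mask)) >> (width - 1)  # into sign bit
    carry_out = (a + b) >> width                                 # out of sign bit
    result = (a + b) & mask
    return format(result, f"0{width}b"), carry_in != carry_out


print(add_signed(0b0111, 0b0011))  # ('1010', True)   7 + 3 overflows
print(add_signed(0b0010, 0b0011))  # ('0101', False)  2 + 3 fits
print(add_signed(0b1111, 0b1111))  # ('1110', False)  -1 + -1 = -2
```

The last case shows a carry-out of 1 without overflow, which is why the carry-out alone is not a reliable signed overflow indicator.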

Fixed-Point Number Representation

Computers often need to represent fractional numbers without using complex floating-point units. Fixed-point representation does this by implicitly defining a binary point at a fixed location within a binary word. It treats numbers as integers but assigns a scaling factor. For example, in an 8-bit format where 4 bits are for the integer part and 4 bits for the fractional part (Q4.4 format), a binary word whose integer value is $N$ is interpreted as $N / 2^4$, so bit $b_i$ carries the weight $2^{i-4}$.

Arithmetic with fixed-point numbers uses the same integer operations, but you must track the binary point. After multiplication, the product has twice the number of fractional bits, so you often need to shift and truncate or round the result to fit the desired format. This representation is efficient for applications like digital signal processing where the range and precision are predictable.
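The scaling step after multiplication can be illustrated with a small Q4.4 sketch. The helper names and the sample values 2.5 and 1.25 are assumptions for illustration; the code treats values as unsigned, truncates rather than rounds the product, and does not check that the result fits in 8 bits.

```python
FRACTIONAL_BITS = 4  # Q4.4: 4 integer bits, 4 fractional bits


def to_q44(x: float) -> int:
    """Scale a real value by 2**4 and round to the nearest integer."""
    return round(x * (1 << FRACTIONAL_BITS))


def from_q44(raw: int) -> float:
    """Undo the scaling to recover the represented value."""
    return raw / (1 << FRACTIONAL_BITS)


def multiply_q44(a: int, b: int) -> int:
    """Multiply two Q4.4 values and rescale the product back to Q4.4."""
    wide = a * b                    # product has 8 fractional bits
    return wide >> FRACTIONAL_BITS  # shift back to 4 fractional bits (truncates)


a = to_q44(2.5)   # 40, i.e. 0010.1000
b = to_q44(1.25)  # 20, i.e. 0001.0100
product = multiply_q44(a, b)
print(format(product, "08b"), from_q44(product))  # 00110010 3.125
print(from_q44(a + b))                            # 3.75 (addition needs no shift)
```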

Common Pitfalls

  1. Confusing Ones' and Two's Complement: A frequent error is forgetting the final "+1" when calculating two's complement. Remember, ones' complement (bitwise inversion) is only the intermediate step. The correct negative representation requires adding one to that result.
  2. Overflow vs. Carry-Out Confusion: In unsigned arithmetic, a carry-out from the most significant bit indicates the result is too large. In signed (two's complement) arithmetic, you must use the rule comparing the carry-in and carry-out of the sign bit. Mistaking a simple carry-out for a signed overflow will lead to incorrect error detection.
  3. Sign Extension Errors: When moving a two's complement number to a wider bit field (e.g., from 8 to 16 bits), you must sign-extend by replicating the sign bit (the MSB). Incorrectly extending with zeros will change a negative number into a positive one.
  4. Misplacing the Binary Point in Fixed-Point Arithmetic: After adding fixed-point numbers, the binary point stays aligned. After multiplying, the number of fractional bits doubles. Forgetting to scale (shift) the result back to the original format leads to magnitude errors by a power of two.
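The sign-extension pitfall is easy to demonstrate. In the sketch below, `sign_extend` is a hypothetical helper operating on raw bit patterns, and the 8-bit pattern for -5 is an illustrative input.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Widen a two's complement bit pattern by replicating its sign bit."""
    sign_bit = 1 << (from_bits - 1)
    if value & sign_bit:
        # Set every new upper bit to 1 so the value stays negative.
        extension = ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
        return value | extension
    return value  # positive values are extended with zeros


minus_five_8bit = 0b11111011
print(format(sign_extend(minus_five_8bit, 8, 16), "016b"))  # 1111111111111011
print(format(minus_five_8bit, "016b"))                      # 0000000011111011 (zero-extended: reads as +251)
```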

Summary

  • Digital systems rely on binary (base-2) numbers, with hexadecimal and octal providing compact human-readable notations for binary strings.
  • Binary addition, subtraction, and multiplication follow the same column-wise rules as decimal arithmetic with a smaller digit set, and subtraction is often implemented via two's complement addition.
  • The two's complement system is the standard for representing signed integers, allowing subtraction to be performed with the same hardware as addition.
  • Overflow in signed arithmetic is a critical error condition detected by comparing the carry into and out of the sign bit, distinct from the carry-out used in unsigned arithmetic.
  • Fixed-point representation allows efficient handling of fractional numbers by assuming a fixed binary point position within a binary word, requiring careful scaling during operations.
