Python Variables and Data Types

Python's flexibility with data is one of its greatest strengths, especially in data science where you might work with everything from sensor readings and financial figures to text documents and categorical labels. Mastering how Python stores, labels, and manipulates different kinds of data is the essential first step toward writing effective and error-free code. This foundation allows you to clean datasets, perform calculations, and build models with confidence.

Variables: Labels for Data

In Python, a variable is not a storage container but a name, or label, attached to a piece of data (an object) in your computer's memory. Think of it like putting a sticky note on a box; the variable name is the note, and the object inside the box is the data. You perform variable assignment using a single equals sign (=), which binds the name on the left to the object on the right.

year = 2024
company_name = "DataCorp"

In this example, the name year is now a reference to the integer object 2024, and company_name references the string object "DataCorp". Python uses dynamic typing, meaning you don't declare a variable's type explicitly; the type is inferred from the object it currently references. This allows great flexibility, as the same variable name can be rebound to a different type of object later, though this is generally poor practice for code clarity.
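A minimal sketch of rebinding, using a hypothetical name `reading`: the name stays the same, but `type()` reports whatever type of object it currently points to.

```python
# Hypothetical example: one name rebound to objects of different types
reading = 42
print(type(reading))  # <class 'int'>

reading = "42 degrees"  # same name, now bound to a str object
print(type(reading))  # <class 'str'>
```

This works, but as noted above, reusing one name for different types usually makes code harder to follow.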

Choosing good names is critical. Python naming conventions follow the snake_case style for variables and functions: use lowercase letters and separate words with underscores (e.g., customer_id, average_score). Names cannot start with a number, cannot contain spaces or most symbols, and should be descriptive.
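The built-in `str.isidentifier()` method checks the basic naming rules, and the standard `keyword` module flags reserved words such as `class`, which are also not allowed as variable names. The candidate names below are purely illustrative.

```python
import keyword

# Illustrative candidate names; only the first two are usable as variables
candidates = ["customer_id", "average_score", "2nd_place", "total sales", "class"]

# isidentifier() checks syntax (no leading digit, no spaces);
# iskeyword() rejects reserved words like "class"
valid_names = [
    name for name in candidates
    if name.isidentifier() and not keyword.iskeyword(name)
]
print(valid_names)  # ['customer_id', 'average_score']
```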

Fundamental Data Types

Python has several built-in core data types for representing different kinds of information. Understanding their properties is key to avoiding bugs.

Integers (int): These are whole numbers, positive or negative, with no decimal point. In Python 3 they have arbitrary precision, so they grow as large as available memory allows instead of overflowing at a fixed size.

population = 8_000_000_000  # Underscores for readability
temperature = -5
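A short sketch of arbitrary precision: squaring the population figure above gives a value larger than 2**63 - 1, the maximum of a signed 64-bit integer in many other languages, yet Python computes it exactly.

```python
population = 8_000_000_000

# The result exceeds the signed 64-bit range, but Python ints just grow
squared = population ** 2
print(squared)                 # 64000000000000000000
print(squared > 2**63 - 1)     # True
```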

Floating-Point Numbers (float): These represent real numbers and are written with a decimal point or in scientific notation. Because most decimal fractions (such as 0.1) have no exact binary representation, a float stores a close approximation, which can introduce small rounding errors in calculations.

pi_approx = 3.14159
avogadro_constant = 6.022e23  # Scientific notation for 6.022 x 10^23
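To make the approximation visible, the sketch below prints 0.1 with more digits than Python shows by default, and shows that very large integers can lose precision when converted to float (CPython floats are IEEE 754 doubles with a 53-bit significand on practically all platforms).

```python
# Formatting with 20 decimal places exposes the value actually stored for 0.1
stored_digits = f"{0.1:.20f}"
print(stored_digits)                      # 0.10000000000000000555

# 2**53 + 1 cannot be represented exactly as a double, so it rounds to 2**53
print(float(2**53 + 1) == float(2**53))   # True
```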

Strings (str): Strings are sequences of characters used for text. You can create them with single quotes ('), double quotes ("), or triple quotes for multi-line strings. They are immutable, meaning you cannot change a string in-place; operations create new strings.

greeting = "Hello, World!"
multiline_sql = """
SELECT * FROM users
WHERE status = 'active';
"""
# Common data science operation: splitting a string into a list
columns = "date,revenue,expenses".split(',')
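A minimal sketch of string immutability: methods like `upper()` return a new string, and trying to change a character in place raises a `TypeError`, so you build a modified copy instead.

```python
greeting = "Hello, World!"

# upper() returns a NEW string; greeting itself is unchanged
shouted = greeting.upper()
print(shouted)    # HELLO, WORLD!
print(greeting)   # Hello, World!

# Strings do not support item assignment
try:
    greeting[0] = "J"
except TypeError as err:
    print("TypeError:", err)

# Instead, build a new string from slices
changed = "J" + greeting[1:]
print(changed)    # Jello, World!
```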

Booleans (bool): This type has only two possible objects: True and False. They are the result of comparison operations (==, >, <, in) and are fundamental for control flow (if/else) and filtering data.

actual_cost = 950.0
projected_cost = 1000.0

is_complete = True
within_budget = actual_cost <= projected_cost  # True
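Filtering data is mostly a matter of keeping the items for which a comparison is `True`. The sketch below uses hypothetical revenue figures; it also relies on the fact that `bool` is a subclass of `int` (`True` counts as 1, `False` as 0), so `sum()` counts matches.

```python
# Hypothetical daily revenue figures
daily_revenue = [1200.0, 850.5, 0.0, 1430.25]

# Each comparison yields a bool; only items where it is True are kept
above_target = [r for r in daily_revenue if r > 1000]
print(above_target)   # [1200.0, 1430.25]

# True counts as 1 and False as 0, so sum() counts how many days qualify
days_above = sum(r > 1000 for r in daily_revenue)
print(days_above)     # 2
```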

The NoneType (None): None is a special constant that represents the absence of a value. It is its own type (NoneType) and is often used to initialize variables or to indicate that a function doesn't explicitly return anything.

initial_value = None  # To be calculated later
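A small sketch of both uses of `None`, with a hypothetical function `log_reading`: a function with no `return` statement returns `None`, and because `None` is a single shared object, the idiomatic check is `is None` rather than `== None`.

```python
def log_reading(value):
    # Hypothetical helper: prints but has no return statement
    print("reading:", value)

result = log_reading(21.5)
print(result)          # None

# Fill in a placeholder once a real value is available
initial_value = None
if initial_value is None:
    initial_value = 0.0
print(initial_value)   # 0.0
```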

Inspecting and Managing Types

Because Python uses dynamic typing, you often need to check what type of object a variable references. The type() function returns the type object of the given instance.

print(type(42))        # <class 'int'>
print(type(3.14))      # <class 'float'>
print(type("Hello"))   # <class 'str'>
value = 100
print(type(value))     # <class 'int'>

For more robust checks, especially when considering inheritance, use isinstance(). It checks if an object is an instance of a class or a tuple of classes.

num = 5.0
print(isinstance(num, float))  # True
print(isinstance(num, (int, float)))  # True, because it's a float
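Inheritance is where `type()` and `isinstance()` disagree. In Python, `bool` is a subclass of `int`, which this sketch uses to show the difference:

```python
flag = True

print(type(flag) == int)       # False: the exact type is bool
print(isinstance(flag, int))   # True: bool inherits from int
print(issubclass(bool, int))   # True
```

One practical consequence: a check such as `isinstance(value, (int, float))` also accepts `True` and `False`, so exclude `bool` explicitly if booleans should not count as numbers in your data.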

In data science, you frequently need to convert between types, a process called type casting. You use the type constructor functions: int(), float(), str(), bool().

# Reading numeric data from a text file gives strings
str_price = "29.99"
float_price = float(str_price)  # Convert to float for calculation

# Convert a float to an int (truncates toward zero)
orders = 17.7
print(int(orders))  # 17

# Useful truthiness conversions in filtering logic
print(bool(1))     # True
print(bool(0))     # False
print(bool("Hi"))  # True
print(bool(""))    # False
print(bool(None))  # False
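Real data often contains values that cannot be converted, and `float()` and `int()` raise `ValueError` for them. The sketch below uses a hypothetical helper, `to_float`, that returns `None` for unparseable text; it assumes every input is a string. Note also that `int()` does not accept decimal strings like `"29.99"` directly.

```python
def to_float(text):
    """Hypothetical helper: convert a string to float, or return None on failure."""
    try:
        return float(text.strip())
    except ValueError:
        return None

raw_prices = ["29.99", " 15 ", "N/A", ""]
prices = [to_float(p) for p in raw_prices]
print(prices)              # [29.99, 15.0, None, None]

# int("29.99") raises ValueError; convert through float() first
print(int(float("17.7")))  # 17
```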

Objects, References, and Memory

Understanding how variables reference objects in memory prevents subtle errors. When you assign a = 100, Python creates the int object 100 in memory and makes the name a refer to it. If you then assign b = a, you are not copying the object 100; you are creating a new name b that references the same object. For immutable types like integers, strings, and tuples, this distinction is often harmless because their value cannot change.
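A minimal sketch of why shared references are harmless for integers: an operation like `+=` cannot change the int object, so it binds the name to a new object and leaves the other name untouched.

```python
a = 100
b = a            # b refers to the same int object as a
print(a is b)    # True

b += 1           # ints are immutable: b is rebound to a new object, 101
print(a, b)      # 100 101
print(a is b)    # False
```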

However, for mutable types like lists and dictionaries (common in data science), this behavior is critical.

list_a = [1, 2, 3]
list_b = list_a      # list_b is now another reference to the SAME list
list_b.append(4)
print(list_a)        # [1, 2, 3, 4] - list_a is also modified!

# To create an independent copy, you must explicitly copy it
list_c = [1, 2, 3]
list_d = list_c.copy()  # or list_d = list_c[:] for simple lists
list_d.append(4)
print(list_c)           # [1, 2, 3] - list_c is unchanged
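`.copy()` and slicing produce shallow copies: the outer list is new, but any inner lists are still shared. For nested structures, the standard library's `copy.deepcopy()` copies every level. A short sketch with a nested list:

```python
import copy

# A nested list: each inner list is its own mutable object
matrix = [[1, 2], [3, 4]]

shallow = matrix.copy()        # new outer list, SAME inner lists
deep = copy.deepcopy(matrix)   # new outer list AND new inner lists

shallow[0].append(99)
print(matrix)   # [[1, 2, 99], [3, 4]] - changed through the shallow copy
print(deep)     # [[1, 2], [3, 4]] - fully independent
```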

You can check if two variables refer to the exact same object in memory using the is operator and the id() function.

x = [10, 20]
y = x
z = [10, 20]

print(x is y)   # True, same object
print(x is z)   # False, different objects with same value
print(id(x) == id(y))  # True

Common Pitfalls

Misunderstanding Mutable vs. Immutable Assignment: Assuming that assigning a list to a new variable creates a copy is a major source of bugs. As shown above, new_list = old_list creates a new reference, not a new list. For mutable objects, use .copy() for a flat list, or copy.deepcopy() from the copy module for nested structures.

Ignoring Float Precision: Comparing floats for exact equality (==) can fail due to tiny rounding errors inherent in their binary representation. Instead, check if the absolute difference is within a small tolerance (epsilon).

# Incorrect for floats
if (0.1 + 0.2) == 0.3:  # This evaluates to False!
    print("Equal")

# Correct approach
epsilon = 1e-10
if abs((0.1 + 0.2) - 0.3) < epsilon:
    print("Essentially equal")
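The standard library's `math.isclose()` packages this tolerance check. By default it uses a relative tolerance (`rel_tol=1e-09`), which scales with the size of the numbers; when comparing against zero you also need an absolute tolerance via `abs_tol`. A short sketch:

```python
import math

total = 0.1 + 0.2
print(total)                       # 0.30000000000000004
print(math.isclose(total, 0.3))    # True

# A relative tolerance alone is too strict near zero; add abs_tol
print(math.isclose(1e-12, 0.0))                # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))  # True
```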

Incorrect Type Assumptions in Operations: Python does not implicitly convert between strings and numbers. Trying to concatenate a string and an integer with + raises a TypeError, so you must perform the conversion explicitly.

age = 30
# message = "I am " + age + " years old."  # TypeError!
message = "I am " + str(age) + " years old."  # Correct
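A sketch of the error and a common alternative: f-strings (formatted string literals) convert embedded values to text for you. Numeric types, by contrast, do mix in arithmetic: adding an int and a float produces a float.

```python
age = 30

# Mixing str and int with + raises TypeError
try:
    message = "I am " + age + " years old."
except TypeError as err:
    print("TypeError:", err)

# An f-string converts age to text automatically
message = f"I am {age} years old."
print(message)     # I am 30 years old.

# int and float do combine: the result is a float
print(age + 0.5)   # 30.5
```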

Poor Variable Names: Using vague names like x, data, or temp makes code unreadable, especially in data science scripts you may revisit months later. A name like daily_revenue_list is far more informative than lst.

Summary

Variables are names that reference objects in memory, assigned with =. Python uses dynamic typing, so a variable's type is determined by the object it currently references.
The core data types include integers (int), floating-point numbers (float), text strings (str), boolean truth values (bool), and None to signify "no value".
Use type() to inspect an object's type and isinstance() for robust type checking. Convert between types explicitly using constructors like int(), str(), and float().
Variables hold references, not the data itself. For mutable objects like lists, assigning one variable to another creates a reference to the same object, not a copy. Use .copy() to create independent duplicates, or copy.deepcopy() for nested structures.
Avoid common errors by remembering float precision issues, performing explicit type conversion for operations, and choosing clear, descriptive snake_case names for your variables.
