Feb 26

Python Classes and Objects

Mindli Team

AI-Generated Content

Mastering object-oriented programming (OOP) is a cornerstone of writing clean, maintainable, and scalable Python code, especially in data science. It transforms your code from a collection of scripts into a structured model of your problem domain, allowing you to encapsulate complex logic and data into reusable, intuitive components. This systematic approach is essential for building robust data pipelines, custom machine learning models, and simulation frameworks.

What Are Classes and Objects?

At its heart, OOP is about bundling related data and the functions that operate on that data into a single unit. In Python, a class is the blueprint or template for creating such units. It defines the structure—what attributes (data) and methods (functions) an object will have. An object is a specific, concrete instance created from that class. If a class is the architectural plan for a house, an object is the actual, built house you can live in. This relationship is fundamental: you define a class once, but you can create many unique objects from it, each with its own distinct data. For example, in data science, you might have a Dataset class that defines how data is loaded and cleaned; each object instantiated from it could represent a different CSV file or database table.
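As a rough illustration of "one blueprint, many objects", here is a minimal sketch of the Dataset idea. The class, its attributes (source, rows), and the sample values are all hypothetical. Nothing is read from disk; the source string is only a label. The syntax is explained in the sections that follow.

```python
# Hypothetical Dataset class, for illustration only.
class Dataset:
    def __init__(self, source, rows):
        self.source = source  # label for where the data came from (nothing is loaded)
        self.rows = rows      # the records held by this particular object

    def row_count(self):
        return len(self.rows)


# One blueprint, two independent objects with different data.
sales = Dataset("sales.csv", [{"amount": 120}, {"amount": 80}])
sensors = Dataset("sensors_table", [{"temp": 21.5}])

print(type(sales) is type(sensors))  # both are Dataset objects
print(sales.row_count(), sensors.row_count())
```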

Defining Your First Class: Structure and __init__

You define a new class using the class keyword, followed by the class name (by convention, starting with a capital letter). The most important method you'll write inside a class is the __init__ method (pronounced "dunder init"). This special method is called automatically when a new object is created from the class. Its primary job is to initialize the object's attributes—the data that makes each instance unique.

The first parameter of every instance method, including __init__, refers to the instance itself and is named self by convention. The self parameter is how an instance accesses its own attributes and methods from within the class definition. When you call a method on an instance later, Python automatically passes that instance as the self argument, so you never provide it explicitly.

class DataColumn:
    def __init__(self, name, values):
        self.name = name    # Instance attribute
        self.values = values # Instance attribute
        self.mean = None    # Instance attribute; placeholder until a mean is computed

In this DataColumn example, __init__ takes three parameters: self, name, and values. Assignments such as self.name = name create instance attributes, storing the arguments passed in (name, values) on the specific object being created. The third attribute, self.mean, starts as None because no mean has been calculated yet.

The Instance Creation Process

Creating an instance, also called instantiation, is straightforward: you call the class name as if it were a function. When you write my_column = DataColumn("Temperatures", [72, 68, 75]), Python executes a specific sequence:

  1. It creates a new, empty object of type DataColumn.
  2. It automatically calls the __init__ method for this new object, passing the arguments you provided ("Temperatures", [72, 68, 75]) to the parameters after self.
  3. Inside __init__, the attributes (self.name, etc.) are bound to the newly created object.
  4. Finally, the now-initialized object is returned and assigned to the variable my_column.
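The steps above can be spelled out by hand. This is a rough sketch for illustration only. The DataColumn class is repeated so the snippet runs on its own, and calling __new__ and __init__ directly is not something you would normally do in real code.

```python
class DataColumn:
    def __init__(self, name, values):
        self.name = name
        self.values = values
        self.mean = None


# The usual one-line form.
my_column = DataColumn("Temperatures", [72, 68, 75])

# Roughly the same steps, done manually.
manual = DataColumn.__new__(DataColumn)          # step 1: create a bare instance
manual.__init__("Temperatures", [72, 68, 75])    # steps 2-3: run __init__ on it
# step 4: the name `manual` now refers to the initialized object

print(vars(my_column) == vars(manual))  # True: both hold the same attributes
```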

This my_column variable now holds a reference to a fully formed DataColumn object with its own attributes. Note that self.values refers to the same list that was passed in; __init__ does not copy it. You could create a second, entirely separate object with other_column = DataColumn("Pressures", [1013, 1015, 1012]). Both my_column and other_column are instances of the same class but contain different data.
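A short sketch of how two instances stay independent, with the class repeated so the snippet is self-contained. The last few lines use a made-up readings list to show one caveat: __init__ as written stores the list it receives rather than copying it.

```python
class DataColumn:
    def __init__(self, name, values):
        self.name = name
        self.values = values
        self.mean = None


my_column = DataColumn("Temperatures", [72, 68, 75])
other_column = DataColumn("Pressures", [1013, 1015, 1012])

# Rebinding an attribute on one instance leaves the other untouched.
my_column.name = "Temps (F)"
print(other_column.name)  # Pressures

# Caveat: the instance keeps a reference to the list it was given.
readings = [1, 2, 3]
column = DataColumn("Readings", readings)
readings.append(4)
print(column.values)  # [1, 2, 3, 4]
```

If that sharing is unwanted, one option is to store list(values) in __init__ so each object gets its own list.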

Adding Behavior with Methods

Attributes hold data; methods define behavior. Methods are simply functions defined inside a class that operate on the object's data. They must have self as their first parameter so they can access the instance's attributes and other methods.

class DataColumn:
    def __init__(self, name, values):
        self.name = name
        self.values = values
        self.mean = None

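    def calculate_mean(self):  # Instance method
        """Calculates and stores the mean of the values (None if empty)."""
        if self.values:
            self.mean = sum(self.values) / len(self.values)
        else:
            self.mean = None  # Avoid keeping a stale mean for an empty column
        return self.mean

    def describe(self):
        """Returns a formatted summary string."""
        mean = self.calculate_mean()  # Calls another instance method
        if mean is None:  # No values, so there is nothing to format
            return f"Column '{self.name}': no values"
        return f"Column '{self.name}': Mean = {mean:.2f}"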

To call a method, you use dot notation on an instance: my_column.calculate_mean(). Notice you do not pass a value for self; Python does this for you, so my_column.calculate_mean() is translated to DataColumn.calculate_mean(my_column) behind the scenes. This allows the method to act on my_column's specific data (self.values).
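The equivalence between the two call forms can be checked directly. This is a small sketch with a trimmed copy of DataColumn; the variable names via_instance, via_class, and bound are only for illustration.

```python
class DataColumn:
    def __init__(self, name, values):
        self.name = name
        self.values = values
        self.mean = None

    def calculate_mean(self):
        if self.values:
            self.mean = sum(self.values) / len(self.values)
        return self.mean


my_column = DataColumn("Temperatures", [72, 68, 75])

via_instance = my_column.calculate_mean()          # Python supplies self
via_class = DataColumn.calculate_mean(my_column)   # self passed explicitly

print(via_instance == via_class)  # True

# A method looked up on an instance remembers which instance it belongs to.
bound = my_column.calculate_mean
print(bound.__self__ is my_column)  # True
```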

Organizing Data and Behavior: A Data Science Scenario

The true power of classes lies in their ability to cohesively organize related data and behavior. Consider a more integrated data science example: a simple linear regression model.

class SimpleLinearRegressor:
    def __init__(self):
        self.slope = None
        self.intercept = None
        self.training_data = None

    def fit(self, X, y):
        """Calculates slope and intercept from training data."""
        n = len(X)
        mean_x = sum(X) / n
        mean_y = sum(y) / n
        numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(X, y))
        denominator = sum((xi - mean_x) ** 2 for xi in X)
        self.slope = numerator / denominator if denominator != 0 else 0
        self.intercept = mean_y - self.slope * mean_x
        self.training_data = (X, y)
        return self

    def predict(self, x_value):
        """Predicts y for a given x using the fitted model."""
        if self.slope is None:
            raise ValueError("Model must be fitted before prediction.")
        return self.slope * x_value + self.intercept

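    def r_squared(self):
        """Calculates the coefficient of determination."""
        if self.training_data is None:
            raise ValueError("Model must be fitted before evaluation.")
        X, y = self.training_data
        y_pred = [self.predict(x) for x in X]
        ss_res = sum((yi - ypi) ** 2 for yi, ypi in zip(y, y_pred))
        y_mean = sum(y) / len(y)
        ss_tot = sum((yi - y_mean) ** 2 for yi in y)
        if ss_tot == 0:  # All y values identical: R^2 is undefined
            return float("nan")
        return 1 - (ss_res / ss_tot)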

This class encapsulates the entire lifecycle of a model: initialization (__init__), training (fit), prediction (predict), and evaluation (r_squared). All related data (parameters slope, intercept, training_data) and the functions that use them live together. You can create multiple independent models (model_a, model_b), fit them to different datasets, and query their predictions without any risk of data collision—a clear, safe, and reusable structure.
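To make the independence of model_a and model_b concrete, here is a usage sketch. The class is a condensed copy of the one above (fit and predict only) so the snippet runs on its own, and the training data is made up to lie exactly on the lines y = 2x + 1 and y = -x + 10.

```python
class SimpleLinearRegressor:
    """Condensed copy of the class above (fit and predict only)."""

    def __init__(self):
        self.slope = None
        self.intercept = None

    def fit(self, X, y):
        n = len(X)
        mean_x = sum(X) / n
        mean_y = sum(y) / n
        numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(X, y))
        denominator = sum((xi - mean_x) ** 2 for xi in X)
        self.slope = numerator / denominator if denominator != 0 else 0
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, x_value):
        if self.slope is None:
            raise ValueError("Model must be fitted before prediction.")
        return self.slope * x_value + self.intercept


# Because fit returns self, construction and training can be chained.
model_a = SimpleLinearRegressor().fit([1, 2, 3, 4], [3, 5, 7, 9])
model_b = SimpleLinearRegressor().fit([0, 1, 2, 3], [10, 9, 8, 7])

print(model_a.slope, model_a.intercept)  # parameters of the first model
print(model_b.predict(5))                # the second model is unaffected by the first

# An unfitted model refuses to predict.
try:
    SimpleLinearRegressor().predict(1)
except ValueError as exc:
    print("ValueError:", exc)
```

Each instance keeps its own slope and intercept, so fitting one model never changes the other.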

Common Pitfalls

  1. Forgetting the self Parameter: The most common error is defining a method without self as the first parameter. If you write def predict(x_value): and then call model.predict(5), Python still passes the instance as the first argument, so the call fails with a TypeError saying that predict() takes 1 positional argument but 2 were given. Always include self.
  2. Confusing Class and Instance Attributes: Variables defined directly under the class statement are class attributes, shared by all instances. Variables assigned with self. in __init__ or another method are instance attributes, unique to each object. Mutating a mutable class attribute (like a list) in place through one instance changes it for all the others, which is rarely intended.
  3. Using Mutable Default Arguments in __init__: A dangerous pattern is def __init__(self, data=[]):. The default empty list is created once when the function is defined, not each time it's called. All instances using the default will share the same list object. The safe fix is def __init__(self, data=None): and then self.data = data if data is not None else [].
  4. Creating Instance Attributes Outside __init__: While Python allows you to add attributes dynamically (e.g., obj.new_attr = 5), doing so tends to produce disorganized, hard-to-debug code. All instance attributes an object is expected to have should be initialized in __init__, even if set to None. This provides a clear "data contract" for your class.
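The first three pitfalls are easy to reproduce. The sketch below uses small throwaway classes (MissingSelf, SharedTags, SharedDefault, SafeDefault), all hypothetical names chosen for this illustration.

```python
class MissingSelf:
    def predict(x_value):  # pitfall 1: no self parameter
        return x_value * 2


try:
    MissingSelf().predict(3)  # the instance is passed too, so there is one argument too many
except TypeError as exc:
    print("TypeError:", exc)


class SharedTags:
    tags = []  # pitfall 2: class attribute, one list shared by every instance

    def add(self, tag):
        self.tags.append(tag)  # mutates the shared list in place


class SharedDefault:
    def __init__(self, data=[]):  # pitfall 3: default list is created only once
        self.data = data


class SafeDefault:
    def __init__(self, data=None):
        self.data = data if data is not None else []  # fresh list per instance


a, b = SharedTags(), SharedTags()
a.add("x")
print(b.tags)  # ['x']  (b sees a's change)

c, d = SharedDefault(), SharedDefault()
c.data.append(1)
print(d.data)  # [1]  (d shares c's default list)

e, f = SafeDefault(), SafeDefault()
e.data.append(1)
print(f.data)  # []  (each instance has its own list)
```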

Summary

  • A class is a blueprint defined with the class keyword, while an object is a specific instance created from that class. The __init__ method initializes each new object's attributes.
  • The self parameter is a reference to the current instance, allowing methods to access and modify the object's own data. It is automatically passed by Python when a method is called on an instance.
  • Instance attributes (self.attribute) store data unique to each object, and methods (functions with self as the first parameter) define the object's behaviors that operate on that data.
  • The process of instantiation involves calling the class name, which triggers __init__ to configure the new object. You access attributes and methods using dot notation on an instance.
  • The core value of classes is encapsulation—organizing related data and behavior into a single, coherent unit. This creates more logical, reusable, and manageable code structures, which is critical for complex data science workflows.
