Python Unit Testing with Pytest

Mar 5
Mindli Team

AI-Generated Content
Robust code is the foundation of reliable data science. Whether you're building a complex machine learning pipeline or a simple data cleaning script, unit testing—the practice of testing individual units of code in isolation—is what separates a fragile prototype from a production-ready system. Pytest has become the de facto testing framework in Python due to its simplicity, powerful features, and extensive plugin ecosystem.

Core Concepts and Test Structure

At its heart, pytest is elegantly simple. You write ordinary Python functions that use the assert statement to verify expectations. Pytest automatically discovers and runs these functions. A test function is any function whose name starts with test_, and a test file is any file whose name matches test_*.py or *_test.py. This convention-based test discovery means you don't need to manually register tests.

Consider a basic data validation function and its test:

# data_utils.py
def is_valid_percentage(value):
    """Returns True if value is a number between 0 and 100 inclusive."""
    return isinstance(value, (int, float)) and 0 <= value <= 100

# test_data_utils.py
from data_utils import is_valid_percentage

def test_is_valid_percentage_true_for_midrange():
    assert is_valid_percentage(50)

def test_is_valid_percentage_false_for_negative():
    assert not is_valid_percentage(-5)

When you run pytest in your terminal, it will search for these files, execute the test functions, and report whether the assert statements passed or failed. If an assertion fails, pytest provides a detailed, readable diff of the expected versus actual result, which is invaluable for debugging.

To organize many tests, you can structure them into classes (named Test*) or simply keep them as functions in modules. A logical structure for a data science project might separate tests for data loading, feature engineering, and model evaluation into different files within a tests/ directory, mirroring your source code's structure.
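As a sketch, such a layout (the module names here are illustrative) might look like:

```
my_project/
    data_loading.py
    features.py
    evaluation.py
tests/
    test_data_loading.py
    test_features.py
    test_evaluation.py
```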

Leveraging Fixtures for Setup and Teardown

In data science, tests often require common setup: loading a sample dataset, establishing a database connection, or instantiating a model. Repeating this code in every test is wasteful and error-prone. Pytest fixtures solve this. A fixture is a function you define and decorate with @pytest.fixture. Test functions can then request this fixture by name as an argument, and pytest will inject the returned value.

Fixtures manage the setup/teardown lifecycle. The code before the yield statement is setup; the code after is teardown, which runs after the test completes, regardless of pass/fail. This is perfect for ensuring clean-up, like closing file handles or deleting temporary data.

import pytest
import pandas as pd
import tempfile
import os

@pytest.fixture
def sample_dataframe():
    # Setup: Create a DataFrame
    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
    yield df  # Provide the value to the test
    # Teardown: Any cleanup would go here. In this case, it's not needed.
    print("Test completed.")

@pytest.fixture
def temp_csv_file():
    # Creates a temporary file, provides the path, and deletes it after.
    with tempfile.NamedTemporaryFile(mode='w', suffix='.csv', delete=False) as f:
        f.write('col1,col2\n1,2\n3,4')
        temp_path = f.name
    yield temp_path
    os.unlink(temp_path)  # Teardown: Delete the file

def test_dataframe_shape(sample_dataframe):
    assert sample_dataframe.shape == (3, 2)

def test_read_temp_csv(temp_csv_file):
    df = pd.read_csv(temp_csv_file)
    assert 'col1' in df.columns

By using fixtures, your tests become concise, focused on the logic being tested, and reliable because teardown is guaranteed.
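Fixtures can also be given a scope so that expensive setup runs once and is shared across tests. A minimal sketch, assuming a dataset cheap enough to build inline:

```python
import pytest
import pandas as pd

@pytest.fixture(scope="module")
def large_dataset():
    # Created once per test module, not once per test function.
    return pd.DataFrame({"value": range(10_000)})

def test_row_count(large_dataset):
    assert len(large_dataset) == 10_000

def test_no_nulls(large_dataset):
    assert large_dataset["value"].notna().all()
```

With the default scope="function", the fixture would be rebuilt for every test; scope="module" trades isolation for speed, which is usually acceptable for read-only data.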

Parametrization and Efficient Test Design

You frequently need to test a function with many different input and expected output combinations. Writing a separate test for each case is tedious. Parametrizing tests with the @pytest.mark.parametrize decorator lets you define these combinations in one place. Pytest will then expand them into multiple, independent test runs. This is exceptionally useful for testing data functions with edge cases, different data types, or error conditions.

import pytest
from data_utils import is_valid_percentage

@pytest.mark.parametrize("input_value, expected_output", [
    (0, True),      # Lower bound
    (100, True),    # Upper bound
    (50.5, True),   # Float within range
    (-0.1, False),  # Just below lower bound
    (100.1, False), # Just above upper bound
    ("50", False),  # Wrong type (string)
    (None, False),  # Wrong type (None)
])
def test_is_valid_percentage_parametrized(input_value, expected_output):
    assert is_valid_percentage(input_value) == expected_output

When run, this generates seven separate test items. If one fails, the others still run, and the report clearly shows which parameter set caused the failure. For data science, this is ideal for validating data cleaning functions against a comprehensive suite of dirty or edge-case inputs.
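For instance, a small cleaning helper (the function drop_bad_rows is hypothetical) can be exercised against clean, dirty, and empty inputs in a single decorator:

```python
import numpy as np
import pandas as pd
import pytest

def drop_bad_rows(df):
    """Hypothetical cleaner: drops rows containing any NaN values."""
    return df.dropna().reset_index(drop=True)

@pytest.mark.parametrize("raw, expected_rows", [
    (pd.DataFrame({"x": [1.0, 2.0]}), 2),        # clean data untouched
    (pd.DataFrame({"x": [1.0, np.nan]}), 1),     # NaN row removed
    (pd.DataFrame({"x": [np.nan, np.nan]}), 0),  # all rows dirty
    (pd.DataFrame({"x": []}), 0),                # empty input
])
def test_drop_bad_rows(raw, expected_rows):
    assert len(drop_bad_rows(raw)) == expected_rows
```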

Isolating Tests with Mocking

Data science code often interacts with external dependencies: APIs, databases, cloud storage, or large pretrained models. Testing these directly can be slow, unreliable, or expensive. Mocking (or patching) replaces these real dependencies with simulated objects during a test. The unittest.mock module (or the pytest-mock plugin) is used for this purpose. A mock object can be programmed to return specific values, raise exceptions, and track how it was called.

Imagine a function that fetches data from an external API:

# data_fetcher.py
import requests

def fetch_user_data(user_id):
    response = requests.get(f'https://api.example.com/users/{user_id}')
    response.raise_for_status()
    return response.json()

Testing this without hitting the real network is crucial. You can mock the requests.get call.

# test_data_fetcher.py
import pytest
from unittest.mock import Mock
from data_fetcher import fetch_user_data

def test_fetch_user_data_success(mocker):  # 'mocker' fixture from pytest-mock
    # 1. Create a mock response object
    mock_response = Mock()
    mock_response.json.return_value = {'id': 123, 'name': 'Alice'}
    mock_response.raise_for_status = Mock()  # A no-op method

    # 2. Patch 'requests.get' to return our mock response
    mock_get = mocker.patch('data_fetcher.requests.get')
    mock_get.return_value = mock_response

    # 3. Call the function under test
    result = fetch_user_data(123)

    # 4. Assert the function behaves correctly
    assert result == {'id': 123, 'name': 'Alice'}
    mock_get.assert_called_once_with('https://api.example.com/users/123')
    mock_response.raise_for_status.assert_called_once()

Mocking allows you to test the logic of your function—how it processes the returned data and handles errors—without the complexity of the external world. You can also mock expensive model predictions to speed up tests of surrounding pipeline code.
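For example, pipeline code that post-processes model output can be tested with a stand-in for the model itself; the function and threshold below are illustrative, not from the source:

```python
import numpy as np
from unittest.mock import Mock

def label_high_risk(model, features, threshold=0.8):
    """Hypothetical pipeline step: flags rows whose predicted score exceeds a threshold."""
    scores = model.predict(features)
    return [score > threshold for score in scores]

def test_label_high_risk_flags_correct_rows():
    # A Mock stands in for a slow, expensive trained model.
    fake_model = Mock()
    fake_model.predict.return_value = np.array([0.95, 0.40, 0.85])

    flags = label_high_risk(fake_model, features=None)

    assert flags == [True, False, True]
    fake_model.predict.assert_called_once_with(None)
```

The test runs in milliseconds and verifies only the thresholding logic, not the model.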

Workflow Integration: Markers, Coverage, and TDD

As your test suite grows, you need tools to manage it. Pytest markers (@pytest.mark.*) are decorators that let you tag tests for selective execution. Common built-in markers include @pytest.mark.skip (skip a test) and @pytest.mark.xfail (expect a test to fail). You can define custom markers like @pytest.mark.slow for tests that require heavy computation or @pytest.mark.integration for tests that require external services. You can then run a subset: pytest -m "not slow".
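A minimal sketch of a custom marker in use (the test itself is a placeholder; registering markers in pytest.ini avoids "unknown marker" warnings):

```python
import pytest

# In pytest.ini (or the equivalent pyproject.toml section) you would register:
# [pytest]
# markers =
#     slow: tests that take a long time
#     integration: tests that need external services

@pytest.mark.slow
def test_heavy_model_training():
    # Stand-in for an expensive test; deselect it with: pytest -m "not slow"
    assert sum(range(1000)) == 499500
```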

To measure the effectiveness of your tests, use coverage reporting. The pytest-cov plugin integrates with pytest to measure which lines of your source code are executed during the test run. Run your tests with pytest --cov=my_project to see a coverage report. Aiming for high coverage (e.g., >90%) encourages you to write tests for edge cases and error handlers that are often overlooked in data scripts.

All these tools support a Test-Driven Development (TDD) workflow: 1) Write a failing test for a new feature, 2) Write the minimal code to make the test pass, 3) Refactor the code while keeping tests green. For data science, this might mean writing a test that defines the expected output shape of a new feature engineering function before you implement it, ensuring the contract of your data pipeline is always clear and verified.
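As a sketch of that workflow, the test below defines the contract first (the function name add_ratio_feature and its column names are hypothetical), and the minimal implementation follows to make it pass:

```python
import pandas as pd

# Step 1 (TDD): write the contract test before the function exists.
def test_add_ratio_feature_adds_one_column():
    df = pd.DataFrame({"num": [1, 2], "den": [2, 4]})
    result = add_ratio_feature(df)
    assert result.shape == (2, 3)          # original columns plus one new one
    assert "ratio" in result.columns

# Step 2 (TDD): the minimal implementation that makes the test pass.
def add_ratio_feature(df):
    out = df.copy()
    out["ratio"] = out["num"] / out["den"]
    return out
```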

Common Pitfalls

  1. Testing Implementation, Not Behavior: A common mistake is writing tests that break when you refactor internal code, even if the external behavior remains correct. Your tests should assert what the function does (e.g., "it returns a sorted list"), not how it does it (e.g., "it calls quicksort()"). This makes your codebase more flexible.
  2. Insufficient Isolation with Mocks: Over-mocking can make tests brittle. If you mock too many internal components, your test no longer validates how they work together. Mock at the "seam"—the boundary where your code meets an external, unstable, or slow dependency (like an API, database, or filesystem). Avoid mocking your own internal helper functions.
  3. Ignoring Non-Happy Paths: Data science is messy. Don't just test with clean, perfect data. Use parametrization to test how your functions handle NaN values, empty DataFrames, incorrect data types, and missing files. Tests for error handling (using pytest.raises(ExpectedException)) are just as important as tests for success.
  4. Slow, Unmaintainable Test Suites: If tests are slow (e.g., loading large datasets repeatedly) or difficult to read, they will not be run. Use fixtures strategically to share expensive resources. Keep tests small and focused on one scenario. Clear test function names, like test_clean_data_removes_duplicates, serve as documentation.

Summary

  • Pytest simplifies test writing with a convention-over-configuration approach: name your test files and functions with test_, and use plain assert statements for verification.
  • Fixtures are the primary tool for managing test setup/teardown, providing a clean, reusable way to prepare the state (like sample data) required for your tests.
  • Parametrization via @pytest.mark.parametrize allows you to run the same test logic with multiple input/output combinations, making it efficient to test edge cases and data variations.
  • Use mocking (with unittest.mock) to replace external dependencies like APIs and databases, isolating the unit of code you intend to test and making your tests fast and reliable.
  • Integrate testing into your daily workflow using markers to organize tests, coverage reporting to gauge completeness, and embrace a Test-Driven Development mindset to ensure reliability from the first line of code you write in a data project.
