Pandas Indexing with loc and iloc

Efficient data manipulation starts with precise selection. In Pandas, your primary tools for selecting and assigning data are the loc and iloc indexers. Mastering their distinct behaviors is not just a syntax detail; it is fundamental to writing clean, fast, and error-free data analysis code, enabling you to slice through datasets with intention and clarity.

Understanding the Core Distinction: Labels vs. Positions

At the heart of Pandas indexing lies a crucial conceptual split: label-based versus integer position-based selection. This distinction is embodied in the two main accessors.

The .loc[] indexer performs label-based indexing: you select data by index labels and column labels. Labels can be integers, strings, or datetime objects. The key rule is that a label slice with .loc includes both endpoints, so the stop label is part of the result.

Conversely, the .iloc[] indexer is used for integer position-based indexing. You select data based on the integer position (i.e., 0, 1, 2, ...) in the DataFrame or Series. This follows Python and NumPy slicing conventions, where the end of a slice is exclusive. It is purely integer-based and ignores the actual index labels.

Consider a simple DataFrame:

import pandas as pd

# Two numeric columns; the row index uses string labels rather than 0..3
data = {'A': [10, 20, 30, 40], 'B': [50, 60, 70, 80]}
df = pd.DataFrame(data, index=['x', 'y', 'z', 'w'])

To select the row with label 'y' using .loc, you would write df.loc['y']. To select the second row (position 1) using .iloc, you write df.iloc[1]. Both return the same data, but the logic behind the selection is fundamentally different.
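The difference becomes visible once the row order changes. The following is a minimal sketch using the DataFrame above; the names by_label, by_position, and shuffled are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [10, 20, 30, 40], 'B': [50, 60, 70, 80]},
    index=['x', 'y', 'z', 'w'],
)

by_label = df.loc['y']       # row whose index label is 'y'
by_position = df.iloc[1]     # row at integer position 1 (the second row)
print(by_label.equals(by_position))  # True: same row in the original order

# Reorder the rows: labels travel with their rows, positions do not.
shuffled = df.loc[['w', 'z', 'y', 'x']]
print(shuffled.loc['y', 'A'])    # 20: label 'y' still names the same row
print(shuffled.iloc[1]['A'])     # 30: position 1 now holds row 'z'
```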

Basic Selection and Slicing with Single Axes

Both indexers allow you to select rows, columns, or specific cells. The general syntax is df.loc[row_selection, column_selection] and df.iloc[row_selection, column_selection]. You can omit the column selection to get all columns for the chosen rows.

For single row selection, you provide a single label (df.loc['z']) or a single integer position (df.iloc[2]). For single column selection, you must use a column label with .loc (df.loc[:, 'A']) or a column integer position with .iloc (df.iloc[:, 0]). The colon : by itself on an axis means "select all."
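As a short sketch of these single-axis selections on the same example DataFrame (the variable names are illustrative, not part of any API):

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [10, 20, 30, 40], 'B': [50, 60, 70, 80]},
    index=['x', 'y', 'z', 'w'],
)

row_z = df.loc['z']          # one row as a Series, indexed by column names
col_a = df.loc[:, 'A']       # one column as a Series, indexed by row labels
col_first = df.iloc[:, 0]    # the same column, chosen by position
cell = df.loc['z', 'A']      # row and column together give a scalar

print(row_z['B'])                 # 70
print(col_a.equals(col_first))    # True
print(cell)                       # 30
```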

Slicing demonstrates the inclusive/exclusive difference most clearly. With .loc, df.loc['y':'z'] selects rows with labels 'y' and 'z'. With .iloc, df.iloc[1:3] selects rows at integer positions 1 and 2 (it excludes position 3). This mirroring of Python's list slicing makes .iloc intuitive for programmers.
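The two slices in this paragraph can be compared side by side. This is a small sketch on the example DataFrame, with variable names picked for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [10, 20, 30, 40], 'B': [50, 60, 70, 80]},
    index=['x', 'y', 'z', 'w'],
)

label_slice = df.loc['y':'z']     # stop label 'z' is included
position_slice = df.iloc[1:3]     # stop position 3 is excluded

print(list(label_slice.index))      # ['y', 'z']
print(list(position_slice.index))   # ['y', 'z']

# The same numbers do not mean the same thing in both indexers:
# iloc[1:2] stops before position 2, so only row 'y' comes back.
print(list(df.iloc[1:2].index))     # ['y']
```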

Boolean and Multi-Axis Selection

One of the most powerful features of .loc is boolean indexing. You can pass a boolean Series, array, or list to select rows (or columns) where the value is True. For example, df.loc[df['A'] > 25] selects all rows where the value in column 'A' exceeds 25. A boolean Series is aligned to the DataFrame by index label, and the result keeps the original index labels of the selected rows. .iloc also accepts a boolean list or NumPy array, but not a boolean Series, so .loc is the standard and more expressive tool for conditional selection.
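A brief sketch of boolean selection on the example DataFrame follows. The mask variable and the second, combined condition are illustrative additions; the .iloc line passes a plain NumPy array, which is the boolean form .iloc is known to accept.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [10, 20, 30, 40], 'B': [50, 60, 70, 80]},
    index=['x', 'y', 'z', 'w'],
)

mask = df['A'] > 25              # boolean Series sharing df's index
selected = df.loc[mask]
print(list(selected.index))      # ['z', 'w']: original labels are kept

# Combine conditions with & or |, wrapping each one in parentheses.
both = df.loc[(df['A'] > 15) & (df['B'] < 80)]
print(list(both.index))          # ['y', 'z']

# Position-based equivalent: hand .iloc the underlying boolean array.
by_position = df.iloc[mask.to_numpy()]
print(by_position.equals(selected))  # True
```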

Multi-axis selection allows you to pinpoint a specific subset. You specify both the row and column criteria within the same square brackets. With .loc, you use labels: df.loc[['x', 'w'], 'B'] selects the 'B' column values for rows 'x' and 'w'. With .iloc, you use integer positions: df.iloc[[0, 3], 1] selects the second column (position 1) for the first and fourth rows (positions 0 and 3). You can mix slices and lists: df.loc['x':'z', ['A']] selects column 'A' for rows from 'x' through 'z'.
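The selections described here can be sketched as follows. The variable names are illustrative; the last two lines show how a list of column labels versus a single label changes the type of the result.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [10, 20, 30, 40], 'B': [50, 60, 70, 80]},
    index=['x', 'y', 'z', 'w'],
)

b_by_label = df.loc[['x', 'w'], 'B']      # rows 'x' and 'w', column 'B'
b_by_position = df.iloc[[0, 3], 1]        # positions 0 and 3, column position 1
print(b_by_label.tolist())                # [50, 80]
print(b_by_label.equals(b_by_position))   # True

# A list of columns keeps the result two-dimensional; a bare label does not.
as_frame = df.loc['x':'z', ['A']]     # DataFrame with shape (3, 1)
as_series = df.loc['x':'z', 'A']      # Series of length 3
print(as_frame.shape)                 # (3, 1)
```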

Setting Values Using loc and iloc

These indexers are not just for viewing data; they are the recommended way to modify your DataFrame. Assignment works by selecting the target cells and using the assignment operator (=). This method is efficient and avoids the potential pitfalls of "chained indexing."

For instance, to set all values in column 'A' where column 'B' is greater than 65 to 99, you would write:

df.loc[df['B'] > 65, 'A'] = 99

This uses .loc for label-based boolean indexing on the rows and label-based selection on the column. Similarly, you can use .iloc for position-based assignment: df.iloc[0:2, 1] = -1 sets the first two rows of the second column to -1.
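Run end to end on a fresh copy of the example DataFrame, the two assignments behave as sketched below:

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [10, 20, 30, 40], 'B': [50, 60, 70, 80]},
    index=['x', 'y', 'z', 'w'],
)

# Label-based: rows where B > 65 (labels 'z' and 'w'), column 'A'
df.loc[df['B'] > 65, 'A'] = 99
print(df['A'].tolist())    # [10, 20, 99, 99]

# Position-based: first two rows, second column
df.iloc[0:2, 1] = -1
print(df['B'].tolist())    # [-1, -1, 70, 80]
```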

Scalar Access with at[] and iat[]

For accessing or setting a single scalar value (one cell), Pandas offers specialized accessors: .at[] and .iat[]. They are the scalar counterparts of .loc and .iloc, respectively. Because they skip the logic needed to handle slices, lists, and masks, they are faster, but they can only address one cell at a time.

Use .at for label-based scalar access: df.at['y', 'A'] returns the value 20. Use .iat for integer position-based scalar access: df.iat[1, 0] also returns 20. Their syntax is simpler (df.at[row_label, col_label]) and their execution speed is significantly higher for repeated operations on individual cells, making them ideal for focused loops or updates. However, for any selection involving more than one cell, you should stick with .loc and .iloc.
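A minimal sketch of scalar reads and writes on the example DataFrame; the two assigned values (25 and 0) are arbitrary illustrations.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [10, 20, 30, 40], 'B': [50, 60, 70, 80]},
    index=['x', 'y', 'z', 'w'],
)

value_by_label = df.at['y', 'A']     # row label 'y', column label 'A'
value_by_position = df.iat[1, 0]     # row position 1, column position 0
print(value_by_label, value_by_position)   # 20 20

# Both accessors also support assignment to a single cell.
df.at['y', 'A'] = 25
df.iat[3, 1] = 0
print(df.loc['y', 'A'], df.loc['w', 'B'])  # 25 0
```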

Common Pitfalls

Confusing Inclusive and Exclusive Slices: The most frequent error is forgetting that .loc slicing is label-inclusive while .iloc slicing is position-exclusive. With the example DataFrame, df.loc['x':'z'] includes row 'z', whereas df.iloc[0:3] includes the rows at positions 0, 1, and 2, but not position 3. Always verify which indexer you are using.
Confusing Integer Labels with Integer Positions: If your DataFrame has an integer index (e.g., 10, 20, 30), df.iloc[0] correctly gets the first row. However, a beginner might incorrectly try df.loc[0], which raises a KeyError unless 0 is actually a label in the index. Remember: .iloc looks at position; .loc looks at the index label, even if that label is a number.
Chained Indexing for Assignment: Avoid syntax like df['A'][df['B'] > 65] = 99. This is called chained indexing (two successive bracket operations). It may appear to work, but it can trigger a SettingWithCopyWarning (a warning, not an exception) or silently fail to modify the original DataFrame, depending on whether the intermediate result is a view or a copy. Under Copy-on-Write, the default behavior from pandas 3.0, chained assignment never updates the original. The correct, idiomatic approach is to use .loc for the combined selection and assignment in one step: df.loc[df['B'] > 65, 'A'] = 99.
Overusing .at and .iat for Non-Scalar Operations: Remember that .at and .iat are strictly for single-cell access. Attempting to use them for a slice, like df.at['x':'z', 'A'], will raise an error. For any multi-cell selection, default to .loc or .iloc.
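The integer-index pitfall is easiest to see on a small frame whose labels are numbers. The sketch below uses a hypothetical DataFrame named df_int with labels 10, 20, and 30, matching the example in the list above.

```python
import pandas as pd

df_int = pd.DataFrame({'A': [1, 2, 3]}, index=[10, 20, 30])

first_by_position = df_int.iloc[0]    # position 0
first_by_label = df_int.loc[10]       # label 10, which is the same row
print(first_by_position.equals(first_by_label))  # True

# 0 is a valid position but not a label in this index.
try:
    df_int.loc[0]
    missing_label_raised = False
except KeyError:
    missing_label_raised = True
print(missing_label_raised)           # True

# Integer labels are still labels, so .loc slices include the stop label.
label_slice = df_int.loc[10:20]       # rows labeled 10 and 20
position_slice = df_int.iloc[0:1]     # only the row at position 0
print(list(label_slice.index), list(position_slice.index))  # [10, 20] [10]
```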

Summary

.loc[] is label-based and inclusive in slices, while .iloc[] is integer position-based and exclusive in slices, following Python's standard slicing rules.
Both indexers support single item selection, slicing, boolean indexing, and multi-axis selection (rows and columns simultaneously), and they are the primary tools for assigning new values to a DataFrame.
Boolean indexing is most naturally performed with .loc, allowing you to filter rows based on column conditions.
For accessing or setting a single, specific cell, use the faster specialized accessors .at[] (label-based) and .iat[] (position-based).
Always prefer a single, combined selection with .loc or .iloc for assignment to avoid the pitfalls of chained indexing and ensure your code is efficient and reliable.
