pandas: Replace NaN (missing values) with fillna()

You can replace NaN in pandas.DataFrame and pandas.Series with any value using the fillna() method.

Contents

Replace NaN with the same value
Replace NaN with different values for each column
Replace NaN with mean, median, mode, etc., for each column
Replace NaN with previous/following valid values: method, limit
Update the original object: inplace
For pandas.Series

This article focuses on NaN (Not a Number), but note that pandas also treats None as a missing value.

Missing values in pandas (nan, None, pd.NA)

To fill missing values with linear or spline interpolation, consider using the interpolate() method.

pandas: Interpolate NaN (missing values) with interpolate()

See the following articles on extracting, removing, and counting missing values.

pandas: Find rows/columns with NaN (missing values)
pandas: Remove NaN (missing values) with dropna()
pandas: Detect and count NaN (missing values) with isnull(), isna()

The sample code in this article uses pandas version 2.0.3. As an example, read a CSV file with missing values.

sample_pandas_normal_nan.csv

import pandas as pd

print(pd.__version__)
# 2.0.3

df = pd.read_csv('data/src/sample_pandas_normal_nan.csv')
print(df)
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1      NaN   NaN   NaN    NaN    NaN
# 2  Charlie   NaN    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen   NaN    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN

source: pandas_nan_fillna.py
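
If the CSV file is not at hand, roughly the same table can be built in memory. The following is a minimal sketch that assumes the values shown in the printed output above; the column contents are copied from that output, not from the file itself.

```python
import numpy as np
import pandas as pd

# Rebuild the sample table by hand; values are copied from the printed output above.
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24, np.nan, np.nan, 68, np.nan, 30],
    'state': ['NY', np.nan, 'CA', 'TX', 'CA', np.nan],
    'point': [np.nan, np.nan, np.nan, 70, 88, np.nan],
    'other': [np.nan] * 6,
})

# Count the missing values per column to confirm the layout.
print(df.isna().sum())
# name     1
# age      3
# state    2
# point    4
# other    6
# dtype: int64
```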

Replace `NaN` with the same value

If you pass a scalar as the first argument, value, of fillna(), every NaN is replaced with that value.

print(df.fillna(0))
#       name   age state  point  other
# 0    Alice  24.0    NY    0.0    0.0
# 1        0   0.0     0    0.0    0.0
# 2  Charlie   0.0    CA    0.0    0.0
# 3     Dave  68.0    TX   70.0    0.0
# 4    Ellen   0.0    CA   88.0    0.0
# 5    Frank  30.0     0    0.0    0.0

source: pandas_nan_fillna.py

Note that the data type (dtype) of a column of numbers including NaN is float, so even if you replace NaN with an integer number, the data type remains float. If you want to convert it to int, use astype().

pandas: How to use astype() to cast dtype of DataFrame
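
As a rough illustration of the dtype behavior described above, the sketch below fills a float column with 0 and then casts it with astype(). The Series and its values are hypothetical, chosen only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with gaps; the NaN values force a float dtype.
s = pd.Series([24, np.nan, 68, np.nan, 30], name='age')
print(s.dtype)
# float64

# Filling with an integer does not change the dtype.
filled = s.fillna(0)
print(filled.dtype)
# float64

# Once no NaN remains, the column can be cast to an integer type.
as_int = filled.astype(int)
print(as_int.tolist())
# [24, 0, 68, 0, 30]
```

The cast only works after every NaN has been filled, so the order (fillna() first, astype() second) matters.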

Replace `NaN` with different values for each column

By specifying a dictionary (dict) for the first argument value in fillna(), you can assign different values to each column.

You can specify a dictionary in the form {column_name: value}.

NaN values in columns that are not listed are left unchanged, and keys that do not match any column name are ignored.

print(df.fillna({'name': 'XXX', 'age': 20, 'ZZZ': 100}))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1      XXX  20.0   NaN    NaN    NaN
# 2  Charlie  20.0    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  20.0    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN

source: pandas_nan_fillna.py

You can also pass a Series. Its index labels play the same role as the keys of the dictionary.

s_for_fill = pd.Series(['XXX', 20, 100], index=['name', 'age', 'ZZZ'])
print(s_for_fill)
# name    XXX
# age      20
# ZZZ     100
# dtype: object

print(df.fillna(s_for_fill))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1      XXX  20.0   NaN    NaN    NaN
# 2  Charlie  20.0    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  20.0    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN

source: pandas_nan_fillna.py

Replace `NaN` with mean, median, mode, etc., for each column

The mean() method can be used to calculate the mean of each column, returning a Series. NaN is excluded, but the result for a column where all elements are NaN is NaN. The numeric_only argument can be set to True to include only numeric columns.

pandas.DataFrame.mean — pandas 2.0.3 documentation

print(df.mean(numeric_only=True))
# age      40.666667
# point    79.000000
# other          NaN
# dtype: float64

source: pandas_nan_fillna.py

If you specify this Series for the first argument value in fillna(), it replaces NaN in the relevant column with the mean.

print(df.fillna(df.mean(numeric_only=True)))
#       name        age state  point  other
# 0    Alice  24.000000    NY   79.0    NaN
# 1      NaN  40.666667   NaN   79.0    NaN
# 2  Charlie  40.666667    CA   79.0    NaN
# 3     Dave  68.000000    TX   70.0    NaN
# 4    Ellen  40.666667    CA   88.0    NaN
# 5    Frank  30.000000   NaN   79.0    NaN

source: pandas_nan_fillna.py
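
Because fillna() accepts a dictionary, a statistic can be used for one column and a constant for another. The sketch below assumes a hypothetical two-column frame, df_num, holding the same age and point values as the sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric-only frame with the same age/point values as the sample data.
df_num = pd.DataFrame({
    'age': [24, np.nan, np.nan, 68, np.nan, 30],
    'point': [np.nan, np.nan, np.nan, 70, 88, np.nan],
})

# Mix a per-column statistic with a fixed constant.
fill_values = {
    'age': df_num['age'].mean(),  # mean of the non-missing ages
    'point': 0,                   # constant for missing points
}
filled = df_num.fillna(fill_values)
print(filled)
```

Each column is filled independently, so the mean of age has no influence on how point is filled.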

Similarly, to replace NaN values with the median, use the median() method. If the number of elements is even, the average of the two middle values is returned.

pandas.DataFrame.median — pandas 2.0.3 documentation

print(df.fillna(df.median(numeric_only=True)))
#       name   age state  point  other
# 0    Alice  24.0    NY   79.0    NaN
# 1      NaN  30.0   NaN   79.0    NaN
# 2  Charlie  30.0    CA   79.0    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  30.0    CA   88.0    NaN
# 5    Frank  30.0   NaN   79.0    NaN

source: pandas_nan_fillna.py
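
The same pattern can be extended to the mode. One possible approach, sketched here under the assumption that taking the first mode of each column is acceptable, is to select the first row of the DataFrame returned by mode(). The frame df_cat and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which each column has a single most frequent value.
df_cat = pd.DataFrame({
    'state': ['NY', np.nan, 'CA', 'TX', 'CA', np.nan],
    'age': [24, np.nan, 30, 68, np.nan, 30],
})

# mode() returns a DataFrame because a column can have several tied modes;
# the first row holds the first mode of each column.
modes = df_cat.mode()
first_mode = modes.iloc[0]

filled = df_cat.fillna(first_mode)
print(filled)
```

Unlike mean() and median(), mode() also works on string columns, so this approach can fill non-numeric columns as well.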

Replace `NaN` with previous/following valid values: `method`, `limit`

The method argument of fillna() can be used to replace NaN with previous/following valid values.

If method is set to 'ffill' or 'pad', each NaN is replaced with the previous valid value (forward fill). If it is set to 'bfill' or 'backfill', each NaN is replaced with the following valid value (backward fill).

print(df.fillna(method='ffill'))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1    Alice  24.0    NY    NaN    NaN
# 2  Charlie  24.0    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  68.0    CA   88.0    NaN
# 5    Frank  30.0    CA   88.0    NaN

print(df.fillna(method='bfill'))
#       name   age state  point  other
# 0    Alice  24.0    NY   70.0    NaN
# 1  Charlie  68.0    CA   70.0    NaN
# 2  Charlie  68.0    CA   70.0    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  30.0    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN

source: pandas_nan_fillna.py

If the method argument is specified, as in the example above, all consecutive NaN will be replaced by default. The limit argument can be used to specify the maximum number of consecutive replacements.

print(df.fillna(method='ffill', limit=1))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1    Alice  24.0    NY    NaN    NaN
# 2  Charlie   NaN    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  68.0    CA   88.0    NaN
# 5    Frank  30.0    CA   88.0    NaN

print(df.fillna(method='bfill', limit=1))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1  Charlie   NaN    CA    NaN    NaN
# 2  Charlie  68.0    CA   70.0    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  30.0    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN

source: pandas_nan_fillna.py

Although it is not a common use case, you can set the axis argument to 1 or 'columns' to fill each NaN with the value to its left (forward fill) or to its right (backward fill).

print(df.fillna(method='ffill', axis=1))
#       name      age state point other
# 0    Alice     24.0    NY    NY    NY
# 1      NaN      NaN   NaN   NaN   NaN
# 2  Charlie  Charlie    CA    CA    CA
# 3     Dave     68.0    TX  70.0  70.0
# 4    Ellen    Ellen    CA  88.0  88.0
# 5    Frank     30.0  30.0  30.0  30.0

print(df.fillna(method='bfill', axis=1))
#       name   age state point other
# 0    Alice  24.0    NY   NaN   NaN
# 1      NaN   NaN   NaN   NaN   NaN
# 2  Charlie    CA    CA   NaN   NaN
# 3     Dave  68.0    TX  70.0   NaN
# 4    Ellen    CA    CA  88.0   NaN
# 5    Frank  30.0   NaN   NaN   NaN

source: pandas_nan_fillna.py

Methods that correspond to the method argument are also provided individually.

ffill() is equivalent to fillna(method='ffill'), and bfill() is equivalent to fillna(method='bfill'). You can also specify limit.

print(df.ffill())
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1    Alice  24.0    NY    NaN    NaN
# 2  Charlie  24.0    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  68.0    CA   88.0    NaN
# 5    Frank  30.0    CA   88.0    NaN

print(df.bfill(limit=1))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1  Charlie   NaN    CA    NaN    NaN
# 2  Charlie  68.0    CA   70.0    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  30.0    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN

source: pandas_nan_fillna.py

pad() and backfill() are also provided, but have been deprecated since version 2.0.0.

Update the original object: `inplace`

By default, as shown above, a new object is returned without changing the original. However, if inplace=True, the original object will be updated in place.

df.fillna(0, inplace=True)
print(df)
#       name   age state  point  other
# 0    Alice  24.0    NY    0.0    0.0
# 1        0   0.0     0    0.0    0.0
# 2  Charlie   0.0    CA    0.0    0.0
# 3     Dave  68.0    TX   70.0    0.0
# 4    Ellen   0.0    CA   88.0    0.0
# 5    Frank  30.0     0    0.0    0.0

source: pandas_nan_fillna.py
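
The difference between the default behavior and inplace=True can be checked by counting the missing values before and after each call. This is a minimal sketch with a hypothetical one-column frame; the variable names are chosen only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a single missing value.
original = pd.DataFrame({'point': [np.nan, 70.0, 88.0]})

# Default: a new object is returned and the original keeps its NaN.
filled = original.fillna(0)
missing_after_default = int(original['point'].isna().sum())
print(missing_after_default)
# 1

# inplace=True: the original object itself is modified.
original.fillna(0, inplace=True)
missing_after_inplace = int(original['point'].isna().sum())
print(missing_after_inplace)
# 0
```

With the pandas 2.0.3 used in this article, the in-place call returns None, so its result should not be assigned back to a variable. Reassigning the default result, as in `df = df.fillna(0)`, is an alternative that avoids inplace altogether.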

For `pandas.Series`

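fillna() works on a Series in the same way as in the DataFrame examples above. When a dictionary is passed, its keys are matched against the index labels of the Series rather than column names.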

s = pd.read_csv('data/src/sample_pandas_normal_nan.csv')['age']
print(s)
# 0    24.0
# 1     NaN
# 2     NaN
# 3    68.0
# 4     NaN
# 5    30.0
# Name: age, dtype: float64

print(s.fillna(100))
# 0     24.0
# 1    100.0
# 2    100.0
# 3     68.0
# 4    100.0
# 5     30.0
# Name: age, dtype: float64

print(s.fillna({1: 100, 4: -100}))
# 0     24.0
# 1    100.0
# 2      NaN
# 3     68.0
# 4   -100.0
# 5     30.0
# Name: age, dtype: float64

print(s.fillna(method='bfill', limit=1))
# 0    24.0
# 1     NaN
# 2    68.0
# 3    68.0
# 4    30.0
# 5    30.0
# Name: age, dtype: float64

source: pandas_nan_fillna.py

Methods that correspond to the method argument are also provided individually for Series.

print(s.bfill(limit=1))
# 0    24.0
# 1     NaN
# 2    68.0
# 3    68.0
# 4    30.0
# 5    30.0
# Name: age, dtype: float64

source: pandas_nan_fillna.py

pad() and backfill() are also provided, but have been deprecated since version 2.0.0.
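
For a Series, statistics such as mean() return a scalar, so the result can be passed straight to fillna(). The sketch below assumes a hypothetical Series holding the same values as the age column used above.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with the same values as the 'age' column above.
s_age = pd.Series([24, np.nan, np.nan, 68, np.nan, 30], name='age')

# mean() skips NaN and returns a scalar, which fillna() accepts directly.
filled = s_age.fillna(s_age.mean())
print(filled.round(2).tolist())
# [24.0, 40.67, 40.67, 68.0, 40.67, 30.0]
```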

pandas: Replace NaN (missing values) with fillna() | note.nkmk.me (2024)
