You can replace NaN in pandas.DataFrame and pandas.Series with any value using the fillna() method.
- pandas.DataFrame.fillna — pandas 2.0.3 documentation
- pandas.Series.fillna — pandas 2.0.3 documentation
Contents
- Replace NaN with the same value
- Replace NaN with different values for each column
- Replace NaN with mean, median, mode, etc., for each column
- Replace NaN with previous/following valid values: method, limit
- Update the original object: inplace
- For pandas.Series
While this article primarily deals with NaN (Not a Number), it's important to note that in pandas, None is also treated as a missing value.
- Missing values in pandas (nan, None, pd.NA)
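As a quick illustration of the point about None, the following minimal sketch (the Series and the fill string 'missing' are made up for this example, not taken from the sample CSV) suggests that fillna() handles None and NaN the same way.

```python
import numpy as np
import pandas as pd

# A small Series of strings holding both kinds of missing markers
# (hypothetical data, used only for this illustration).
s = pd.Series(['a', None, np.nan, 'd'])

# isna() flags both None and NaN as missing.
print(s.isna().tolist())
# [False, True, True, False]

# fillna() therefore replaces both of them.
filled = s.fillna('missing')
print(filled.tolist())
# ['a', 'missing', 'missing', 'd']
```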
To fill missing values with linear or spline interpolation, consider using the interpolate() method.
- pandas: Interpolate NaN (missing values) with interpolate()
See the following articles on extracting, removing, and counting missing values.
- pandas: Find rows/columns with NaN (missing values)
- pandas: Remove NaN (missing values) with dropna()
- pandas: Detect and count NaN (missing values) with isnull(), isna()
The sample code in this article uses pandas version 2.0.3. As an example, read a CSV file with missing values.
import pandas as pd

print(pd.__version__)
# 2.0.3

df = pd.read_csv('data/src/sample_pandas_normal_nan.csv')
print(df)
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1      NaN   NaN   NaN    NaN    NaN
# 2  Charlie   NaN    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen   NaN    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN
source: pandas_nan_fillna.py
Replace NaN with the same value
By specifying a scalar value for the first argument value in fillna(), all NaN values are replaced with that value.
print(df.fillna(0))
#       name   age state  point  other
# 0    Alice  24.0    NY    0.0    0.0
# 1        0   0.0     0    0.0    0.0
# 2  Charlie   0.0    CA    0.0    0.0
# 3     Dave  68.0    TX   70.0    0.0
# 4    Ellen   0.0    CA   88.0    0.0
# 5    Frank  30.0     0    0.0    0.0
source: pandas_nan_fillna.py
Note that the data type (dtype) of a numeric column that includes NaN is float, so even if you replace NaN with an integer, the data type remains float. If you want to convert it to int, use astype().
- pandas: How to use astype() to cast dtype of DataFrame
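The dtype behavior described above can be checked with a small sketch. The Series here is a made-up stand-in for the age column, and the conversion with astype(int) assumes that every NaN has already been filled.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a numeric column containing NaN.
s = pd.Series([24, np.nan, 68], name='age')
print(s.dtype)
# float64

# Filling with an integer does not change the dtype.
filled = s.fillna(0)
print(filled.dtype)
# float64

# Cast explicitly once no NaN remains.
as_int = filled.astype(int)
print(as_int.tolist())
# [24, 0, 68]
```

Calling astype(int) while NaN is still present raises an error, so the cast belongs after fillna().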
Replace NaN with different values for each column
By specifying a dictionary (dict) for the first argument value in fillna(), you can assign different values to each column.
You can specify a dictionary in the form {column_name: value}.
NaN values in unspecified columns are not replaced and remain as they are. Any key that does not match a column name is simply ignored.
print(df.fillna({'name': 'XXX', 'age': 20, 'ZZZ': 100}))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1      XXX  20.0   NaN    NaN    NaN
# 2  Charlie  20.0    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  20.0    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN
source: pandas_nan_fillna.py
You can also specify a Series. The labels of the Series correspond to the keys of the dict.
s_for_fill = pd.Series(['XXX', 20, 100], index=['name', 'age', 'ZZZ'])
print(s_for_fill)
# name    XXX
# age      20
# ZZZ     100
# dtype: object

print(df.fillna(s_for_fill))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1      XXX  20.0   NaN    NaN    NaN
# 2  Charlie  20.0    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  20.0    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN
source: pandas_nan_fillna.py
Replace NaN with mean, median, mode, etc., for each column
The mean() method calculates the mean of each column and returns a Series. NaN is excluded from the calculation, but the result for a column where all elements are NaN is NaN. Setting the numeric_only argument to True restricts the calculation to numeric columns.
print(df.mean(numeric_only=True))
# age      40.666667
# point    79.000000
# other          NaN
# dtype: float64
source: pandas_nan_fillna.py
If you specify this Series for the first argument value in fillna(), NaN in each corresponding column is replaced with that column's mean.
print(df.fillna(df.mean(numeric_only=True)))
#       name        age state  point  other
# 0    Alice  24.000000    NY   79.0    NaN
# 1      NaN  40.666667   NaN   79.0    NaN
# 2  Charlie  40.666667    CA   79.0    NaN
# 3     Dave  68.000000    TX   70.0    NaN
# 4    Ellen  40.666667    CA   88.0    NaN
# 5    Frank  30.000000   NaN   79.0    NaN
source: pandas_nan_fillna.py
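Because the mean of an all-NaN column is itself NaN, such a column stays unfilled, as the other column does above. The sketch below reproduces this on a small made-up DataFrame (df_demo is hypothetical, not the sample CSV) and shows one possible fallback: chaining a second fillna() with a scalar. Whether 0 is a sensible fallback depends on the data, so treat it as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the 'point' and 'other' columns.
df_demo = pd.DataFrame({
    'point': [np.nan, 70.0, 88.0],
    'other': [np.nan, np.nan, np.nan],
})

# Column means; the all-NaN column yields NaN.
means = df_demo.mean(numeric_only=True)
filled = df_demo.fillna(means)
print(filled['point'].tolist())
# [79.0, 70.0, 88.0]
print(filled['other'].isna().all())
# True

# Optional second pass with a scalar for whatever is still missing.
fallback = filled.fillna(0)
print(fallback['other'].tolist())
# [0.0, 0.0, 0.0]
```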
Similarly, to replace NaN values with the median, use the median() method. If the number of elements is even, the average of the two middle values is returned.
print(df.fillna(df.median(numeric_only=True)))
#       name   age state  point  other
# 0    Alice  24.0    NY   79.0    NaN
# 1      NaN  30.0   NaN   79.0    NaN
# 2  Charlie  30.0    CA   79.0    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  30.0    CA   88.0    NaN
# 5    Frank  30.0   NaN   79.0    NaN
source: pandas_nan_fillna.py
The mode can be obtained with the mode() method. Since mode() returns a DataFrame, iloc[0] is used in this example to retrieve the first row as a Series. Note that mode() can also handle strings.
print(df.fillna(df.mode().iloc[0]))
#       name   age state  point  other
# 0    Alice  24.0    NY   70.0    NaN
# 1    Alice  24.0    CA   70.0    NaN
# 2  Charlie  24.0    CA   70.0    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  24.0    CA   88.0    NaN
# 5    Frank  30.0    CA   70.0    NaN
source: pandas_nan_fillna.py
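To see why iloc[0] is needed, it may help to look at what mode() returns when a column has several equally frequent values. The following sketch uses a made-up DataFrame (df_demo is hypothetical): the result has one row per tied value, sorted in ascending order, so the first row holds the smallest mode of each column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'state' has a single mode, 'age' has a three-way tie.
df_demo = pd.DataFrame({
    'state': ['NY', 'CA', 'CA', np.nan],
    'age': [24.0, 30.0, np.nan, 68.0],
})

modes = df_demo.mode()
print(modes.shape)
# (3, 2)

# The first row is a Series keyed by column name.
first = modes.iloc[0]
print(first['state'], first['age'])
# CA 24.0

filled = df_demo.fillna(first)
print(filled['state'].tolist())
# ['NY', 'CA', 'CA', 'CA']
print(filled['age'].tolist())
# [24.0, 30.0, 24.0, 68.0]
```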
Replace NaN with previous/following valid values: method, limit
The method argument of fillna() can be used to replace NaN with previous/following valid values.
If method is set to 'ffill' or 'pad', NaN values are replaced with the previous valid value (forward fill); if it is set to 'bfill' or 'backfill', they are replaced with the following valid value (backward fill).
print(df.fillna(method='ffill'))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1    Alice  24.0    NY    NaN    NaN
# 2  Charlie  24.0    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  68.0    CA   88.0    NaN
# 5    Frank  30.0    CA   88.0    NaN

print(df.fillna(method='bfill'))
#       name   age state  point  other
# 0    Alice  24.0    NY   70.0    NaN
# 1  Charlie  68.0    CA   70.0    NaN
# 2  Charlie  68.0    CA   70.0    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  30.0    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN
source: pandas_nan_fillna.py
If the method argument is specified, as in the example above, all consecutive NaN values are replaced by default. The limit argument can be used to specify the maximum number of consecutive replacements.
print(df.fillna(method='ffill', limit=1))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1    Alice  24.0    NY    NaN    NaN
# 2  Charlie   NaN    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  68.0    CA   88.0    NaN
# 5    Frank  30.0    CA   88.0    NaN

print(df.fillna(method='bfill', limit=1))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1  Charlie   NaN    CA    NaN    NaN
# 2  Charlie  68.0    CA   70.0    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  30.0    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN
source: pandas_nan_fillna.py
Although it might not be a common use case, you can set the axis argument to 1 or 'columns' to replace NaN with the values to the left or right.
print(df.fillna(method='ffill', axis=1))
#       name      age state point other
# 0    Alice     24.0    NY    NY    NY
# 1      NaN      NaN   NaN   NaN   NaN
# 2  Charlie  Charlie    CA    CA    CA
# 3     Dave     68.0    TX  70.0  70.0
# 4    Ellen    Ellen    CA  88.0  88.0
# 5    Frank     30.0  30.0  30.0  30.0

print(df.fillna(method='bfill', axis=1))
#       name   age state point other
# 0    Alice  24.0    NY   NaN   NaN
# 1      NaN   NaN   NaN   NaN   NaN
# 2  Charlie    CA    CA   NaN   NaN
# 3     Dave  68.0    TX  70.0   NaN
# 4    Ellen    CA    CA  88.0   NaN
# 5    Frank  30.0   NaN   NaN   NaN
source: pandas_nan_fillna.py
Methods that correspond to the method argument are also provided individually.
- pandas.DataFrame.ffill — pandas 2.0.3 documentation
- pandas.DataFrame.bfill — pandas 2.0.3 documentation
ffill() is equivalent to fillna(method='ffill'), and bfill() is equivalent to fillna(method='bfill'). You can also specify limit.
print(df.ffill())
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1    Alice  24.0    NY    NaN    NaN
# 2  Charlie  24.0    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  68.0    CA   88.0    NaN
# 5    Frank  30.0    CA   88.0    NaN

print(df.bfill(limit=1))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1  Charlie   NaN    CA    NaN    NaN
# 2  Charlie  68.0    CA   70.0    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  30.0    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN
source: pandas_nan_fillna.py
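The effect of limit on ffill() and bfill() is perhaps easiest to follow on a single column. The sketch below rebuilds the age values by hand instead of reading the CSV (an assumption made so the example is self-contained). It uses only the dedicated methods, which may also be the safer choice on newer versions: the method argument of fillna() is deprecated as of pandas 2.1.

```python
import numpy as np
import pandas as pd

# The 'age' values from the sample data, typed in by hand.
s = pd.Series([24.0, np.nan, np.nan, 68.0, np.nan, 30.0], name='age')

# Forward fill: every NaN takes the previous valid value.
print(s.ffill().tolist())
# [24.0, 24.0, 24.0, 68.0, 68.0, 30.0]

# limit=1: only the first NaN of each consecutive run is filled.
print(s.ffill(limit=1).tolist())
# [24.0, 24.0, nan, 68.0, 68.0, 30.0]

# Backward fill with limit=1: only the NaN nearest the next valid value.
print(s.bfill(limit=1).tolist())
# [24.0, nan, 68.0, 68.0, 30.0, 30.0]
```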
pad() and backfill() are also provided, but have been deprecated since version 2.0.0.
- pandas.DataFrame.pad — pandas 2.0.3 documentation
- pandas.DataFrame.backfill — pandas 2.0.3 documentation
Update the original object: inplace
By default, as shown above, a new object is returned without changing the original. However, if inplace=True is specified, the original object is updated in place.
df.fillna(0, inplace=True)
print(df)
#       name   age state  point  other
# 0    Alice  24.0    NY    0.0    0.0
# 1        0   0.0     0    0.0    0.0
# 2  Charlie   0.0    CA    0.0    0.0
# 3     Dave  68.0    TX   70.0    0.0
# 4    Ellen   0.0    CA   88.0    0.0
# 5    Frank  30.0     0    0.0    0.0
source: pandas_nan_fillna.py
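The difference between the default behavior and inplace=True can be made explicit by counting the remaining NaN values. This is a minimal sketch on a made-up two-column DataFrame (df_demo is hypothetical).

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing values.
df_demo = pd.DataFrame({'age': [24.0, np.nan], 'point': [np.nan, 70.0]})

# Default: a new object is returned and the original keeps its NaN.
filled = df_demo.fillna(0)
print(df_demo.isna().sum().sum())
# 2
print(filled.isna().sum().sum())
# 0

# inplace=True: df_demo itself is modified.
df_demo.fillna(0, inplace=True)
print(df_demo.isna().sum().sum())
# 0
```

Reassigning the result, as in df_demo = df_demo.fillna(0), has the same visible effect and is often easier to chain with other methods.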
For pandas.Series
As in the DataFrame examples above, fillna() can also be applied to a Series. When a dict is specified, its keys are matched against the index labels of the Series.
s = pd.read_csv('data/src/sample_pandas_normal_nan.csv')['age']
print(s)
# 0    24.0
# 1     NaN
# 2     NaN
# 3    68.0
# 4     NaN
# 5    30.0
# Name: age, dtype: float64

print(s.fillna(100))
# 0     24.0
# 1    100.0
# 2    100.0
# 3     68.0
# 4    100.0
# 5     30.0
# Name: age, dtype: float64

print(s.fillna({1: 100, 4: -100}))
# 0     24.0
# 1    100.0
# 2      NaN
# 3     68.0
# 4   -100.0
# 5     30.0
# Name: age, dtype: float64

print(s.fillna(method='bfill', limit=1))
# 0    24.0
# 1     NaN
# 2    68.0
# 3    68.0
# 4    30.0
# 5    30.0
# Name: age, dtype: float64
source: pandas_nan_fillna.py
Methods that correspond to the method argument are also provided individually for Series.
print(s.bfill(limit=1))
# 0    24.0
# 1     NaN
# 2    68.0
# 3    68.0
# 4    30.0
# 5    30.0
# Name: age, dtype: float64
source: pandas_nan_fillna.py
pad() and backfill() are also provided, but have been deprecated since version 2.0.0.