pandas#Missing Data

2018/11/29 pandas, Python, Study

When you read data in with pandas, any missing points are automatically filled in with a 'NaN' value.

The example below builds a new DataFrame from a dictionary that contains NaN values.
What is np.nan? NaNs can be used as a poor-man's mask (if you don't care what the original value was).

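import numpy as np
import pandas as pd

# Dictionary with some missing values, written as np.nan
data = {'A': [1, 2, np.nan],
        'B': [5, np.nan, np.nan],
        'C': [1, 2, 3]}

df = pd.DataFrame(data)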

This is the actual table output in Jupyter.

	A	B	C
0	1.0	5.0	1
1	2.0	NaN	2
2	NaN	NaN	3
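
Before dropping or filling anything, it can help to see where the NaN values are. The snippet below is a small sketch of that idea, not part of the original walkthrough: it rebuilds the same DataFrame and counts the NaN values per column with isna(). The variable names are my own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# NaN is never equal to itself, so == cannot be used to find it
nan_equals_itself = (np.nan == np.nan)

# isna() returns a boolean mask; summing it counts the NaN values per column
missing_per_column = df.isna().sum()
print(nan_equals_itself)
print(missing_per_column)
```

Column A has one NaN, column B has two, and column C has none, which matches the table above.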

df.dropna() drops the rows or columns that contain NaN values. By default it drops rows.

df.dropna()

This is the actual table output in Jupyter.

	A	B	C
0	1.0	5.0	1
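
One thing that is easy to miss: dropna() returns a new DataFrame and leaves the original alone unless you assign the result. A minimal sketch, using the same data as above (the name cleaned is just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# dropna() returns a new DataFrame containing only the complete rows
cleaned = df.dropna()

# The original df still has all three rows
print(len(df), len(cleaned))
```

If you want to keep the result, assign it to a variable as shown here.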

If you pass axis=1, the columns that contain NaN values are dropped instead.

df.dropna(axis=1)

This is the actual table output in Jupyter.

	C
0	1
1	2
2	3
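
Dropping every column with a single NaN can be harsh. As a rough comparison, the sketch below puts the default behaviour next to how='all', which only drops a column when every value in it is NaN. The variable names are my own, and with this data no column is entirely NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Default how='any': drop a column if it has at least one NaN
dropped_any = df.dropna(axis=1)

# how='all': drop a column only if every value in it is NaN
dropped_all = df.dropna(axis=1, how='all')

print(list(dropped_any.columns))
print(list(dropped_all.columns))
```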

Next, let's try thresh=2. A row is kept as long as it has at least two non-NaN values.

df.dropna(thresh=2)

This is the actual table output in Jupyter.

	A	B	C
0	1.0	5.0	1
1	2.0	NaN	2
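
To see what thresh is doing, you can count the non-NaN values in each row yourself. The sketch below is my own illustration of that: it builds the same result by hand with notna() and compares it to dropna(thresh=2).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Number of non-NaN values in each row
non_nan_counts = df.notna().sum(axis=1)

# Keep the rows that have at least two non-NaN values
kept_by_mask = df[non_nan_counts >= 2]
kept_by_thresh = df.dropna(thresh=2)

print(non_nan_counts.tolist())
print(kept_by_mask.equals(kept_by_thresh))
```

Row 2 has only one non-NaN value (the 3 in column C), so it is the only row that gets dropped.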

df.fillna() lets you replace the NaN values with another value.

First, let's fill the NaN values with 9999.

df.fillna(value=9999)

This is the actual table output in Jupyter.

	A	B	C
0	1.0	5.0	1
1	2.0	9999.0	2
2	9999.0	9999.0	3
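
A single fill value is applied to every column. If you want a different value per column, fillna() also accepts a dictionary keyed by column name. A small sketch, with fill values chosen only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Fill NaN in column A with 0 and NaN in column B with 9999
filled_per_column = df.fillna(value={'A': 0, 'B': 9999})
print(filled_per_column)
```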

Next, let's fill them with the string 'FILL IT!'.

df.fillna(value='FILL IT!')

This is the actual table output in Jupyter.

	A	B	C
0	1	5	1
1	2	FILL IT!	2
2	FILL IT!	FILL IT!	3
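
Filling a numeric column with a string has a side effect: the column can no longer be stored as floats. The sketch below checks the dtypes before and after. As far as I know the filled columns become object dtype, but treat that as an assumption that may depend on your pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

filled_with_text = df.fillna(value='FILL IT!')

# A and B now mix numbers and strings; C had no NaN and is unchanged
print(df.dtypes)
print(filled_with_text.dtypes)
```

A column like this is awkward to do arithmetic on, so a string fill is best kept for display purposes.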

Finally, let's try mean().

df.fillna(value=df['A'].mean())

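This is the actual table output in Jupyter. A[2] was filled with the mean of A[0] and A[1]. Note that the NaN values in column B were also filled with that same value, 1.5, the mean of column A.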

	A	B	C
0	1.0	5.0	1
1	2.0	1.5	2
2	1.5	1.5	3
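
If you would rather fill each column with its own mean, one option is to pass df.mean() to fillna(). This is a sketch of that alternative, not what the example above does: df.mean() returns one mean per column, and fillna() matches them up by column name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# df.mean() skips NaN by default and returns one mean per column
column_means = df.mean()

# Each NaN is replaced by the mean of its own column
filled_with_means = df.fillna(value=column_means)
print(filled_with_means)
```

Here the NaN values in column B become 5.0 (the only non-NaN value in B) instead of 1.5.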

See you next time!