Using the groupby method to group rows of data together and call aggregate functions.

So, what is groupby? It lets you group rows together based on the values in a column and apply an aggregate function to each group.

So, what is an aggregate function? It is just a term for any function that takes in many values and returns a single value, such as a sum, a mean, or a standard deviation.
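To make that definition concrete, here is a minimal sketch. The two numbers are the Goo sales figures used later in this post, and the variable names are only for illustration.

```python
import pandas as pd

# Many values go in...
sales = pd.Series([200, 120])

# ...and each aggregate function returns a single value.
total = sales.sum()
average = sales.mean()

print(total)    # 320
print(average)  # 160.0
```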

The code below first creates a DataFrame.

import pandas as pd

data = {
    'Company': ['Goo', 'Goo', 'MS', 'MS', 'FB', 'FB'],
    'Person': ['Sam', 'Chris', 'John', 'Mary', 'Amy', 'Jack'],
    'Sales': [200, 120, 340, 124, 243, 350],
}
df = pd.DataFrame(data)

This is the table as actually displayed in Jupyter.

	Company	Person	Sales
0	Goo	Sam	200
1	Goo	Chris	120
2	MS	John	340
3	MS	Mary	124
4	FB	Amy	243
5	FB	Jack	350

Let's try groupby(). We group the rows using the "Company" column as the key.

df2 = df.groupby('Company')
df2

This is the actual Jupyter output. A groupby object was returned.

<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x11cc8a940>
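The object itself does not print its contents. One way to peek inside it, sketched below, is to iterate over the groups. The names `grouped` and `group_sizes` are only for illustration and do not come from the original code.

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['Goo', 'Goo', 'MS', 'MS', 'FB', 'FB'],
    'Person': ['Sam', 'Chris', 'John', 'Mary', 'Amy', 'Jack'],
    'Sales': [200, 120, 340, 124, 243, 350],
})
grouped = df.groupby('Company')

# Iterating yields (group key, sub-DataFrame) pairs, sorted by key.
group_sizes = {}
for company, rows in grouped:
    group_sizes[company] = len(rows)
    print(company)
    print(rows)
```

Each company appears once as a key, together with the two rows that belong to it.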

Now let's use this object.

First, compute the mean of each group.

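# numeric_only=True skips the text column Person (recent pandas versions raise an error without it)
df2.mean(numeric_only=True)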

This is the table as actually displayed in Jupyter.

	Sales
Company
FB	296.5
Goo	160.0
MS	232.0
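The Person column holds strings, so it cannot be averaged. One alternative, sketched here as an assumption about how you might prefer to write it, is to select the numeric column explicitly before aggregating. The result is then a Series rather than a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['Goo', 'Goo', 'MS', 'MS', 'FB', 'FB'],
    'Person': ['Sam', 'Chris', 'John', 'Mary', 'Amy', 'Jack'],
    'Sales': [200, 120, 340, 124, 243, 350],
})

# Pick the Sales column from the grouped object, then aggregate.
sales_mean = df.groupby('Company')['Sales'].mean()
print(sales_mean)
```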

Next, compute the standard deviation.

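# numeric_only=True skips the text column Person (available in recent pandas versions)
df2.std(numeric_only=True)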

This is the table as actually displayed in Jupyter.

	Sales
Company
FB	75.660426
Goo	56.568542
MS	152.735065
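If you want the mean and the standard deviation side by side, one option is `agg` with a list of function names. This is a minimal sketch and not part of the original walkthrough. The name `summary` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['Goo', 'Goo', 'MS', 'MS', 'FB', 'FB'],
    'Person': ['Sam', 'Chris', 'John', 'Mary', 'Amy', 'Jack'],
    'Sales': [200, 120, 340, 124, 243, 350],
})

# One row per company, one column per aggregate function.
summary = df.groupby('Company')['Sales'].agg(['mean', 'std'])
print(summary)
```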

If you only want a Series back for a single company, use loc[] or iloc[].

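# numeric_only=True keeps only Sales; without it, recent pandas versions also concatenate the Person strings
df2.sum(numeric_only=True).loc['FB']   # select the row by label
df2.sum(numeric_only=True).iloc[0]     # select the row by position (FB sorts first)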

This is the actual Jupyter output.

Sales    593
Name: FB, dtype: int64
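Both lookups return the same row because groupby sorts the group keys, so FB ends up at position 0. A small sketch to check this (the variable names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['Goo', 'Goo', 'MS', 'MS', 'FB', 'FB'],
    'Person': ['Sam', 'Chris', 'John', 'Mary', 'Amy', 'Jack'],
    'Sales': [200, 120, 340, 124, 243, 350],
})
totals = df.groupby('Company').sum(numeric_only=True)

by_label = totals.loc['FB']      # look up the row named 'FB'
by_position = totals.iloc[0]     # look up the first row

print(list(totals.index))             # ['FB', 'Goo', 'MS']
print(by_label.equals(by_position))   # True
```

loc[] is usually the safer choice, since the position of a company changes when the set of group keys changes.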

You can also get a whole set of statistics at once.

df2.describe()

This is the table as actually displayed in Jupyter.

	Sales
	count	mean	std	min	25%	50%	75%	max
Company
FB	2.0	296.5	75.660426	243.0	269.75	296.5	323.25	350.0
Goo	2.0	160.0	56.568542	120.0	140.00	160.0	180.00	200.0
MS	2.0	232.0	152.735065	124.0	178.00	232.0	286.00	340.0

This is the same information, but transposed so that the companies become the columns.

df2.describe().T

This is the table as actually displayed in Jupyter.

	Company	FB	Goo	MS
Sales	count	2.000000	2.000000	2.000000
	mean	296.500000	160.000000	232.000000
	std	75.660426	56.568542	152.735065
	min	243.000000	120.000000	124.000000
	25%	269.750000	140.000000	178.000000
	50%	296.500000	160.000000	232.000000
	75%	323.250000	180.000000	286.000000
	max	350.000000	200.000000	340.000000

You can go one step further and select a single company, which returns a Series.

df2.describe().T['FB']

This is the actual Jupyter output.

Sales  count      2.000000
       mean     296.500000
       std       75.660426
       min      243.000000
       25%      269.750000
       50%      296.500000
       75%      323.250000
       max      350.000000
Name: FB, dtype: float64
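As a possible alternative that avoids the transpose, you can select the Sales column first and then pick the FB row with loc[]. This is only a sketch of one equivalent route. The resulting Series is indexed by the statistic names alone, without the outer Sales level.

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['Goo', 'Goo', 'MS', 'MS', 'FB', 'FB'],
    'Person': ['Sam', 'Chris', 'John', 'Mary', 'Amy', 'Jack'],
    'Sales': [200, 120, 340, 124, 243, 350],
})

# describe() on a single grouped column gives one row per company.
fb_stats = df.groupby('Company')['Sales'].describe().loc['FB']
print(fb_stats)
```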

See you next time!