pandas#Series

First, what is pandas?

An open source library built on top of NumPy
Allows for fast analysis, data cleaning, and data preparation
Excels in performance and productivity
Has built-in visualizations
Can work with data from a wide variety of sources

Installation

conda install pandas

pip install pandas
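To confirm that either install command worked, a quick check like the following can help. It is only a sketch; the version string it reports depends on your environment.

```python
import pandas as pd

# If the import succeeds, pandas is installed; __version__ reports which release.
installed_version = pd.__version__
print(installed_version)
```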

What is a Series?

The Series is the main data type you will be working with in pandas.
A Series is very similar to a NumPy array and is actually built on top of the NumPy array object, but unlike an array it can be indexed by label.
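As a rough illustration of that difference, the sketch below wraps a NumPy array in a Series and reads the same element by position and by label. The labels 'x', 'y', 'z' are made up for this example.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])
# Hypothetical labels, chosen only for this illustration.
s = pd.Series(arr, index=['x', 'y', 'z'])

by_position_arr = arr[1]        # a NumPy array is indexed by position only
by_label = s['y']               # a Series can also be indexed by label
by_position_series = s.iloc[1]  # positional access is still available via .iloc
values_back = s.to_numpy()      # the underlying values as a NumPy array

print(by_position_arr, by_label, by_position_series)
```

All three lookups return the same element; only the way of addressing it differs.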

The code below creates a list of strings, a list of ints, a NumPy array, and a dictionary, and then builds a Series from each of them with pd.Series().

import numpy as np
import pandas as pd

labels=['a','b','c']
my_data=[10,20,30]
arr=np.array(my_data)
d={'a':10,'b':20,'c':30}

Let's pass in the list of ints.

pd.Series(data=my_data)

0    10
1    20
2    30
dtype: int64

Let's pass in the NumPy array.

pd.Series(data=arr)

0    10
1    20
2    30
dtype: int64

Here the data argument is the list of ints and the index argument is the list of strings.

pd.Series(data=my_data,index=labels)

a    10
b    20
c    30
dtype: int64

Next, let's pass in the dictionary. Its keys become the index.

pd.Series(d)

a    10
b    20
c    30
dtype: int64
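When a dictionary is combined with an explicit index, pandas looks the labels up in the dictionary's keys. The sketch below shows this; the label 'z' is a made-up key that is not in the dictionary, so its value comes out as NaN.

```python
import pandas as pd

d = {'a': 10, 'b': 20, 'c': 30}

# Select and reorder entries by label; 'z' is hypothetical and missing from d.
picked = pd.Series(d, index=['c', 'a', 'z'])
print(picked)
```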

Of course, a Series is not limited to numbers; strings are fine too.

pd.Series(data=labels)

0    a
1    b
2    c
dtype: object

You can even put functions in a Series (though you probably won't do this often).

pd.Series(data=[sum,print,len])

0      <built-in function sum>
1    <built-in function print>
2      <built-in function len>
dtype: object

pandas makes use of these index labels or numbers to allow very fast lookups of information, working much like a hash table or dictionary.
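The dictionary comparison can be sketched with a few lookups. This is a minimal example using a small Series built just for the illustration; the missing label 'z' and the fallback value -1 are arbitrary choices.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

value_b = s['b']           # look up a value by its label, like dict[key]
has_a = 'a' in s           # `in` checks the index labels, like dict keys
has_z = 'z' in s
fallback = s.get('z', -1)  # like dict.get, returns the default for a missing label
subset = s[['a', 'c']]     # a list of labels returns a smaller Series

print(value_b, has_a, has_z, fallback)
print(subset)
```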

Now let's create two Series.

ser1=pd.Series([1,2,3,4],['a','b','c','d'])

a    1
b    2
c    3
d    4
dtype: int64

ser2=pd.Series([1,2,5,4],['a','b','e','d'])

a    1
b    2
e    5
d    4
dtype: int64

Let's look up the key 'a'. The result is 1 because that is the value stored at index a in ser1.

ser1['a']

1

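Now let's add the two Series together. The addition is aligned on the index. Some keys come out as NaN because each Series is missing a label that the other one has; for example, ser1 has no index 'e' and ser2 has no index 'c'. Also note that the result is automatically converted to float64, because NaN is a floating-point value and cannot be stored in an int64 Series.

ser1+ser2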

a    2.0
b    4.0
c    NaN
d    8.0
e    NaN
dtype: float64
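If you would rather treat a missing label as 0 instead of getting NaN, one option is the Series.add method with its fill_value parameter. The sketch below rebuilds the two Series so that it runs on its own; choosing 0 as the fill value is an assumption about what you want the missing entries to mean.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
ser2 = pd.Series([1, 2, 5, 4], index=['a', 'b', 'e', 'd'])

# Plain addition: labels present in only one Series become NaN.
plain = ser1 + ser2

# add() with fill_value=0 treats a label missing from one side as 0.
filled = ser1.add(ser2, fill_value=0)

print(plain)
print(filled)
```

With fill_value=0, 'c' keeps the value from ser1 and 'e' keeps the value from ser2, while the shared labels are summed as before.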

See you!