This article shows how to read CSV, Excel, and HTML data with pandas.
First, if you want to read HTML or SQL, install the libraries below.
pip install sqlalchemy
pip install lxml
pip install html5lib
pip install BeautifulSoup4
CSV
The CSV used in this article was downloaded from the link below.
https://support.spatialkey.com/spatialkey-sample-csv-data/
Use read_csv() to load a CSV file.
import pandas as pd
df=pd.read_csv('FL_insurance_sample.csv')
Use to_csv to save a DataFrame as a CSV file.
df.to_csv('outputfiles.csv',index=False)
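Here is a minimal sketch of the same read and write round trip that runs without downloading the sample file. The column names and values are made up for illustration, and io.StringIO stands in for a file on disk. It also shows what index=False is for: without it, the row index is written as an extra column.

```python
import io
import pandas as pd

# Made-up rows standing in for the sample CSV (illustration only)
csv_text = (
    "policyID,county,tiv_2012\n"
    "1,CLAY COUNTY,792148.9\n"
    "2,SUWANNEE COUNTY,1438163.57\n"
)

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# index=False keeps the row index (0, 1, ...) out of the output
buffer = io.StringIO()
df.to_csv(buffer, index=False)
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))

# Without index=False the index is saved too and comes back
# as an extra first column named "Unnamed: 0"
with_index = io.StringIO()
df.to_csv(with_index)
reread = pd.read_csv(io.StringIO(with_index.getvalue()))
```

With a real file, the io.StringIO objects would simply be replaced by file paths.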
Excel
pandas can only read cell values. In other words, macros, formatting, and images cannot be imported.
Use read_excel to load an Excel file.
df2=pd.read_excel('your file name',sheet_name='your sheet name')
Use to_excel to save a DataFrame as an Excel file.
df2.to_excel('your file name',sheet_name='your sheet name')
HTML
pandas does its best to pick up the table tags inside an HTML page.
df3=pd.read_html('https://developer.mozilla.org/en-US/docs/Web/HTML/Element/table')
The result looks like this.
[                      0                                                  1
 0    Content categories                                       Flow content
 1     Permitted content   In this order: an optional <caption> element,...
 2          Tag omission  None, both the starting and ending tag are man...
 3     Permitted parents              Any element that accepts flow content
 4  Permitted ARIA roles                                                Any
 5         DOM interface                                   HTMLTableElement,
      0                    1    2                   3
 0  NaN    black = "#000000"  NaN   green = "#008000"
 1  NaN   silver = "#C0C0C0"  NaN    lime = "#00FF00"
 2  NaN     gray = "#808080"  NaN   olive = "#808000"
 3  NaN    white = "#FFFFFF"  NaN  yellow = "#FFFF00"
 4  NaN   maroon = "#800000"  NaN    navy = "#000080"
 5  NaN      red = "#FF0000"  NaN    blue = "#0000FF"
 6  NaN   purple = "#800080"  NaN    teal = "#008080"
 7  NaN  fuchsia = "#FF00FF"  NaN    aqua = "#00FFFF"
 ...]
A list is returned, one DataFrame per table, so you can pick a table by index.
df3[4]
The result looks like this.
| | Unnamed: 0_level_0 | Desktop | Mobile | Unnamed: 3_level_0 | Unnamed: 4_level_0 | Unnamed: 5_level_0 | Unnamed: 6_level_0 | Unnamed: 7_level_0 | Unnamed: 8_level_0 | Unnamed: 9_level_0 | Unnamed: 10_level_0 | Unnamed: 11_level_0 | Unnamed: 12_level_0 | Unnamed: 13_level_0 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Unnamed: 0_level_1 | Chrome | Edge | Firefox | Internet Explorer | Opera | Safari | Android webview | Chrome for Android | Edge Mobile | Firefox for Android | Opera for Android | Safari on iOS | Samsung Internet |
0 | Basic support | Chrome Full support 1 | Edge Full support Yes | Firefox Full support 1 | IE Full support Yes | Opera Full support Yes | Safari Full support Yes | WebView Android Full support 1 | Chrome Android Full support 18 | Edge Mobile Full support Yes | Firefox Android Full support 4 | Opera Android Full support Yes | Safari iOS Full support Yes | Samsung Internet Android Full support Yes |
1 | align Deprecated | Chrome Full support 1 | Edge Full support Yes | Firefox Full support 1 | IE Full support Yes | Opera Full support Yes | Safari Full support Yes | WebView Android Full support 1 | Chrome Android Full support 18 | Edge Mobile Full support Yes | Firefox Android Full support 4 | Opera Android Full support Yes | Safari iOS Full support Yes | Samsung Internet Android Full support Yes |
2 | bgcolor Deprecated | Chrome Full support 1 | Edge Full support Yes | Firefox Full support 1 | IE Full support Yes | Opera Full support Yes | Safari Full support Yes | WebView Android Full support 1 | Chrome Android Full support 18 | Edge Mobile Full support Yes | Firefox Android Full support 4 | Opera Android Full support Yes | Safari iOS Full support Yes | Samsung Internet Android Full support Yes |
3 | border Deprecated | Chrome Full support 1 | Edge Full support Yes | Firefox Full support 1 | IE Full support Yes | Opera Full support Yes | Safari Full support Yes | WebView Android Full support 1 | Chrome Android Full support 18 | Edge Mobile Full support Yes | Firefox Android Full support 4 | Opera Android Full support Yes | Safari iOS Full support Yes | Samsung Internet Android Full support Yes |
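Because read_html returns a list, picking the right table by position can be fragile when a page changes. Here is a small sketch that parses a made-up HTML string (not the MDN page) and uses the match parameter to keep only tables whose text matches a pattern. It assumes lxml, or BeautifulSoup4 with html5lib, is installed as described at the top of the article.

```python
import io
import pandas as pd

# Made-up HTML with two tables (illustration only)
html = """
<table>
  <thead><tr><th>name</th><th>hex</th></tr></thead>
  <tbody>
    <tr><td>black</td><td>#000000</td></tr>
    <tr><td>silver</td><td>#C0C0C0</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>browser</th><th>support</th></tr></thead>
  <tbody>
    <tr><td>Chrome</td><td>Yes</td></tr>
  </tbody>
</table>
"""

# One DataFrame per <table>, in document order
tables = pd.read_html(io.StringIO(html))
colors = tables[0]

# match keeps only tables containing text that matches the pattern
browsers = pd.read_html(io.StringIO(html), match="browser")[0]
```

Wrapping the string in io.StringIO avoids the deprecation warning that newer pandas versions raise when a literal HTML string is passed directly.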
That's all for now. See you!