Python web crawler note
6.2 XML Documents

This example uses the xml.etree.ElementTree module from the Python standard library:

import xml.etree.ElementTree as ET


if __name__ == '__main__':
    # Parse the XML file and get its root element
    tree = ET.parse('example.xml')
    root = tree.getroot()
    print(root.attrib)
    # Element attributes are exposed as a dict of strings
    total = root.attrib['totalResults']
    movies = []
    # findall() matches only the direct children of root named 'result'
    for tag in root.findall('result'):
        print(tag.attrib)
        movies.append(tag.attrib['title'])
    print('-----')
    print('There are', total, 'results in the xml file.')
    print('Top 10 record:')
    print('\n'.join(movies))

Output:

{'totalResults': '81', 'response': 'True'}
{'title': 'Iron Man', 'year': '2008', 'imdbID': 'tt0371746', 'type': 'movie', 'poster': 'https://images-na.ssl-images-amazon.com/images/M/MV5BMTczNTI2ODUwOF5BMl5BanBnXkFtZTcwMTU0NTIzMw@@._V1_SX300.jpg'}
{'title': 'Iron Man 3', 'year': '2013', 'imdbID': 'tt1300854', 'type': 'movie', 'poster': 'https://images-na.ssl-images-amazon.com/images/M/MV5BMTkzMjEzMjY1M15BMl5BanBnXkFtZTcwNTMxOTYyOQ@@._V1_SX300.jpg'}
{'title': 'Iron Man 2', 'year': '2010', 'imdbID': 'tt1228705', 'type': 'movie', 'poster': 'https://images-na.ssl-images-amazon.com/images/M/MV5BMTM0MDgwNjMyMl5BMl5BanBnXkFtZTcwNTg3NzAzMw@@._V1_SX300.jpg'}
{'title': 'The Man in the Iron Mask', 'year': '1998', 'imdbID': 'tt0120744', 'type': 'movie', 'poster': 'https://images-na.ssl-images-amazon.com/images/M/MV5BZjM2YzcxMmQtOTc2Mi00YjdhLWFlZjUtNmFmMDQzYzU2YTk5L2ltYWdlXkEyXkFqcGdeQXVyNTAyODkwOQ@@._V1_SX300.jpg'}
{'title': 'The Man with the Iron Fists', 'year': '2012', 'imdbID': 'tt1258972', 'type': 'movie', 'poster': 'https://images-na.ssl-images-amazon.com/images/M/MV5BMTg5ODI3ODkzOV5BMl5BanBnXkFtZTcwMTQxNjUwOA@@._V1_SX300.jpg'}
{'title': 'Tetsuo, the Iron Man', 'year': '1989', 'imdbID': 'tt0096251', 'type': 'movie', 'poster': 'https://images-na.ssl-images-amazon.com/images/M/MV5BMTg2MjAzOTU3MF5BMl5BanBnXkFtZTcwOTMxODkyMQ@@._V1_SX300.jpg'}
{'title': 'The Invincible Iron Man', 'year': '2007', 'imdbID': 'tt0903135', 'type': 'movie', 'poster': 'https://images-na.ssl-images-amazon.com/images/M/MV5BOGRmZDg1YjMtMDA5YS00OTFjLTgyMjQtNDgzNTIyNzAwZDg0L2ltYWdlL2ltYWdlXkEyXkFqcGdeQXVyNTAyODkwOQ@@._V1_SX300.jpg'}
{'title': 'Iron Man: Rise of Technovore', 'year': '2013', 'imdbID': 'tt2654124', 'type': 'movie', 'poster': 'https://images-na.ssl-images-amazon.com/images/M/MV5BNDkzMTM1ODk4N15BMl5BanBnXkFtZTcwNzU0NDYxOQ@@._V1_SX300.jpg'}
{'title': 'The Man with the Iron Fists 2', 'year': '2015', 'imdbID': 'tt3625152', 'type': 'movie', 'poster': 'https://images-na.ssl-images-amazon.com/images/M/MV5BODkyMTMwMjA0Nl5BMl5BanBnXkFtZTgwMzQ3MDc4NDE@._V1_SX300.jpg'}
{'title': 'Man of Iron', 'year': '1981', 'imdbID': 'tt0082222', 'type': 'movie', 'poster': 'https://images-na.ssl-images-amazon.com/images/M/MV5BMjIyMzAzNzIwOF5BMl5BanBnXkFtZTgwMjU0MjkwMTE@._V1_SX300.jpg'}
-----
There are 81 results in the xml file.
Top 10 record:
Iron Man
Iron Man 3
Iron Man 2
The Man in the Iron Mask
The Man with the Iron Fists
Tetsuo, the Iron Man
The Invincible Iron Man
Iron Man: Rise of Technovore
The Man with the Iron Fists 2
Man of Iron

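The output above reports 81 total results, yet only ten result elements are printed, so example.xml appears to hold a single page of a larger result set. Since the file itself is not shown, the following is a minimal sketch that parses a small inline XML string instead. The element and attribute names come from the output above; the root tag name, the two-record sample, and the summarize helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input. Attribute names are taken from the output above;
# the root tag name ("root") and the shortened record list are assumptions.
SAMPLE_XML = """<root totalResults="81" response="True">
  <result title="Iron Man" year="2008" imdbID="tt0371746" type="movie"/>
  <result title="Iron Man 3" year="2013" imdbID="tt1300854" type="movie"/>
</root>"""


def summarize(xml_text):
    """Return (declared_total, titles) for an XML string of search results."""
    # fromstring() parses a string and returns the root element directly
    root = ET.fromstring(xml_text)
    # get() falls back to the given default when the attribute is missing,
    # whereas root.attrib['totalResults'] would raise KeyError
    declared_total = int(root.get('totalResults', '0'))
    # Collect the title of every direct child element named 'result'
    titles = [result.get('title') for result in root.findall('result')]
    return declared_total, titles


if __name__ == '__main__':
    total, titles = summarize(SAMPLE_XML)
    print('Declared total:', total)
    print('Records in this document:', len(titles))
    for title in titles:
        print(title)
```

Comparing the declared total with the number of records actually present is one way to notice that a document contains only part of the results.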
Last updated 5 years ago
