Python web crawler note

9.1 Bank of Taiwan Foreclosed Property Lookup

Without further ado, here is the code.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By


TW_BANK_HOUSE_URL = 'http://www.bot.com.tw/house/default.aspx'
DRIVER_PATH = '../driver/'
CHROME_DRIVER = 'chromedriver'
PHANTOMJS_DRIVER = 'phantomjs'


def get_selenium_driver(execute_core):
    if execute_core == CHROME_DRIVER:
        # The Chrome driver launches a real Chrome browser on your machine.
        return webdriver.Chrome(DRIVER_PATH + CHROME_DRIVER)
    elif execute_core == PHANTOMJS_DRIVER:
        # PhantomJS is headless: no browser window opens and the crawler runs in the background.
        # Note: webdriver.PhantomJS was removed in Selenium 4, so this branch needs Selenium 3.
        return webdriver.PhantomJS(DRIVER_PATH + PHANTOMJS_DRIVER)
    else:
        return None


def init_selenium_driver(driver, url):
    driver.maximize_window()
    driver.set_page_load_timeout(60)
    driver.get(url)
    return driver


def launch_driver(driver, from_date, to_date):
    try:
        # Locate the date fields and type in the date values.
        element = driver.find_element(By.ID, 'fromdate_TextBox')
        element.send_keys(from_date)
        element = driver.find_element(By.ID, 'todate_TextBox')
        element.send_keys(to_date)

        # Click the option list.
        driver.find_element(By.ID, 'purpose_DDL').click()

        # Choose the specified option ('住宅' means residential) and stop searching.
        for option in driver.find_elements(By.TAG_NAME, 'option'):
            if option.text == '住宅':
                option.click()
                break

        # Submit the form. click() returns None, so there is nothing to keep.
        driver.find_element(By.ID, 'Submit_Button').click()

        # Wait up to 5 seconds for the result table to appear.
        WebDriverWait(driver, 5).until(
            expected_conditions.presence_of_element_located((By.ID, 'House_GridView'))
        )

        # page_source returns the HTML currently rendered in the browser.
        dom = BeautifulSoup(driver.page_source, 'html5lib')
        table = dom.find(id='House_GridView')
        for row in table.find_all('tr'):
            print([s for s in row.stripped_strings])
    finally:
        # Close the browser and finish the webdriver process.
        driver.quit()


def main():
    # Dates appear to use the ROC (Minguo) calendar as YYYMMDD: '1020101' is 2013-01-01.
    from_date = '1020101'
    to_date = '1060101'
    driver = get_selenium_driver(PHANTOMJS_DRIVER)
    if driver:
        driver = init_selenium_driver(driver, TW_BANK_HOUSE_URL)
        launch_driver(driver, from_date, to_date)
    else:
        print('Driver not found.')


if __name__ == '__main__':
    main()

Output:

['管理編號', '拍賣日期', '門\u3000牌', '用途', '樓層別', '建坪', '地坪', '拍賣總價', '執行法院', '案\u3000號', '拍賣次數', '備\u3000註']
['J027050002', '105.11.28', '新北市新店區三民路81號2樓', '住宅', '2', '29.00', '13,500,000', '台北地院', '105年度司執字第31434號', '3', '可點交']
['J250020002', '105.10.20', '雲林縣北港鎮文仁路158巷9號4樓', '住宅', '4', '36.80', '11.04', '1,731,000', '雲林地院', '105年度司執字第2234號', '特', '特別推薦', '可點交']
['J042040004', '105.10.18', '新北市三重區重新路四段214巷2號', '住宅', '1', '29.42', '2.35', '13,000,000', '臺灣新北地方法院', '104年度司執字第141791號', '減價拍賣', '特別推薦', '可點交']
['J120050002', '105.06.29', '高雄市林園區鳳林路三段685巷16弄10之6號', '住宅', '2', '28.78', '16.92', '2,333,000', '法務部行政執行署高雄分署', '104年度房稅執字第00003747號', '公告應買', '特別推薦', '可點交']

Process finished with exit code 0
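
The date inputs ('1020101') and the dates in the output ('105.11.28') look like ROC (Minguo) calendar dates, where the ROC year is the Gregorian year minus 1911. That reading is an assumption drawn from the values above, not something the site documents here. Under that assumption, a minimal sketch of helpers for building the form values and cleaning the scraped fields might look like this. The function names `to_roc_compact`, `parse_roc_dotted`, and `parse_price` are hypothetical and not part of the original crawler.

```python
from datetime import date

# Assumption: ROC year = Gregorian year - 1911 (ROC year 1 is 1912).
ROC_YEAR_OFFSET = 1911


def to_roc_compact(d):
    """Format a date as a 7-digit ROC string (YYYMMDD) for the search form."""
    return '{:03d}{:02d}{:02d}'.format(d.year - ROC_YEAR_OFFSET, d.month, d.day)


def parse_roc_dotted(text):
    """Parse a dotted ROC date such as '105.11.28' into a datetime.date."""
    year, month, day = (int(part) for part in text.split('.'))
    return date(year + ROC_YEAR_OFFSET, month, day)


def parse_price(text):
    """Convert a comma-grouped amount such as '13,500,000' to an int."""
    return int(text.replace(',', ''))


if __name__ == '__main__':
    print(to_roc_compact(date(2013, 1, 1)))   # value usable as from_date
    print(parse_roc_dotted('105.11.28'))      # auction date from the first row
    print(parse_price('13,500,000'))          # total price from the first row
```

With helpers like these, `main()` could pass `to_roc_compact(date(2013, 1, 1))` instead of a hand-typed string, and the printed rows could be converted into comparable dates and numbers before storing them.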

Last updated 5 years ago
