# 2.3 BeautifulSoup Example - 2

Here is another example:

```python
import requests
from bs4 import BeautifulSoup

# Structure of the example html page:
#  body
#   - div
#     - h2
#     - p
#     - table.table
#       - thead
#         - tr
#           - th
#           - th
#           - th
#           - th
#       - tbody
#         - tr
#           - td
#           - td
#           - td
#           - td
#             - a
#               - img
#         - tr
#         - ...


def main():
    url = 'http://blog.castman.net/web-crawler-tutorial/ch2/table/table.html'
    resp = requests.get(url)
    soup = BeautifulSoup(resp.text, 'html.parser')

    count_course_number(soup)
    calculate_course_average_price1(soup)
    calculate_course_average_price2(soup)
    retrieve_all_tr_contents(soup)


def count_course_number(soup):
    # Each tr inside tbody is one course
    rows = soup.find('table', 'table').tbody.find_all('tr')
    print('Total course count: ' + str(len(rows)) + '\n')


def calculate_course_average_price1(soup):
    # Calculate the average course price.
    # Pick the price cell (the third td in each row) by index:
    prices = []
    rows = soup.find('table', 'table').tbody.find_all('tr')
    for row in rows:
        price = row.find_all('td')[2].text
        print(price)
        prices.append(int(price))
    print('Average course price: ' + str(sum(prices) / len(prices)) + '\n')


def calculate_course_average_price2(soup):
    # Retrieve the price via siblings: the price td comes right before the td that holds the link
    prices = []
    links = soup.find_all('a')
    for link in links:
        price = link.parent.previous_sibling.text
        prices.append(int(price))
    print('Average course price: ' + str(sum(prices) / len(prices)) + '\n')


def retrieve_all_tr_contents(soup):
    # Retrieve every tr row in the table body:
    rows = soup.find('table', 'table').tbody.find_all('tr')
    for row in rows:
        # Besides all_tds = row.find_all('td'), the td cells can also be collected from row.children
        # (this relies on the row markup having no whitespace text nodes between the cells):
        all_tds = [td for td in row.children]
        # Tag.get() returns None when the attribute is missing
        href = all_tds[3].a.get('href')
        print(all_tds[0].text, all_tds[1].text, all_tds[2].text, href, all_tds[3].a.img['src'])


if __name__ == '__main__':
    main()
```

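Compared with the previous example, find() and find\_all() are not necessarily the most convenient tools for this kind of page. When walking the page structure, parent, children, and next/previous siblings can work just as well.

The output is as follows: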
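The navigation attributes can be tried out without fetching the page. The following is a minimal sketch, assuming a hypothetical inline HTML snippet (`HTML`) that mimics one row of the course table and a hypothetical helper `price_for_link`; neither comes from the original source. Unlike the real page, this snippet has whitespace between the cells, so it also shows why `find_previous_sibling('td')` and an `isinstance` filter can be safer than the bare `previous_sibling` and `children` attributes.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup that mimics one row of the course table.
# Note the newlines and indentation between the td cells.
HTML = """
<table class="table">
  <tbody>
    <tr>
      <td>Course A</td>
      <td>Beginner</td>
      <td>1490</td>
      <td><a href="http://example.com"><img src="img/logo.png"></a></td>
    </tr>
  </tbody>
</table>
"""


def price_for_link(link):
    # link.parent is the td that holds the a tag.
    # find_previous_sibling('td') skips whitespace text nodes and
    # returns the previous td element, which is the price cell here.
    price_cell = link.parent.find_previous_sibling('td')
    return int(price_cell.text)


soup = BeautifulSoup(HTML, 'html.parser')
link = soup.find('a')

# The bare attribute returns the whitespace text node, not the price td.
raw_sibling = link.parent.previous_sibling

price = price_for_link(link)

# Walk up to the tr, then keep only the element children (the td cells).
row = link.parent.parent
cells = [child for child in row.children if isinstance(child, Tag)]

print(price)
print([cell.get_text() for cell in cells])
```

When the markup has no whitespace between cells, as the example page appears to, `previous_sibling` and `children` give the same cells directly.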

```
Total course count: 6

1490
1890
1890
1890
1890
1890
Average course price: 1823.3333333333333

Average course price: 1823.3333333333333

初心者 - Python入門 初學者 1490 http://www.pycone.com img/python-logo.png
Python 網頁爬蟲入門實戰 有程式基礎的初學者 1890 http://www.pycone.com img/python-logo.png
Python 機器學習入門實戰 (預計) 有程式基礎的初學者 1890 http://www.pycone.com img/python-logo.png
Python 資料科學入門實戰 (預計) 有程式基礎的初學者 1890 http://www.pycone.com img/python-logo.png
Python 資料視覺化入門實戰 (預計) 有程式基礎的初學者 1890 http://www.pycone.com img/python-logo.png
Python 網站架設入門實戰 (預計) 有程式基礎的初學者 1890 None img/python-logo.png
```

Source code: [click here](https://github.com/yotsuba1022/web-crawler-practice/blob/master/ch2/beautifulSoupDemo02.py)

Reference: <https://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigating-the-tree>

