# 1.3 A Very Primitive Crawler

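So, what does a crawler typically look like?

Below is a very primitive crawler. It first fetches the HTML of a given page; if the request succeeds (status code 200), it starts extracting the information it needs from the response: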

```python
import requests
from bs4 import BeautifulSoup


def main():
    url = 'http://blog.castman.net/web-crawler-tutorial/ch1/connect.html'
    try:
        # Download the page; network failures raise RequestException.
        resp = requests.get(url)
    except requests.RequestException:
        resp = None

    # Compare against None explicitly: a Response is falsy for 4xx/5xx codes.
    if resp is not None and resp.status_code == 200:
        print(resp.status_code)
        print(resp.text)
        # Parse the HTML with Python's built-in parser.
        soup = BeautifulSoup(resp.text, 'html.parser')
        print(soup)

        # find() returns None when the tag is absent, so no try/except is needed.
        h1 = soup.find('h1')
        if h1 is not None:
            print(h1)
            print(h1.text)

        h2 = soup.find('h2')
        if h2 is not None:
            print(h2.text)
        else:
            print('h2 tag not found!')


if __name__ == '__main__':
    main()
```

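Open the target page, launch your browser's developer tools, and inspect the page's HTML structure and content. Compare that with the extraction code and it becomes clear what this crawler is trying to do. How to use BeautifulSoup is covered later.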
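To experiment with the extraction step without touching the network, here is a minimal sketch that runs the same `find` logic against an inline HTML string. `SAMPLE_HTML` and the helper `first_tag_text` are hypothetical names made up for this illustration, and the markup is only a stand-in; the real target page may look different.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <h1>Hello, crawler</h1>
    <p>Some paragraph text.</p>
  </body>
</html>
"""


def first_tag_text(html, tag_name):
    """Return the text of the first <tag_name> element, or None if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.find(tag_name)  # find() returns None when nothing matches
    return tag.text if tag is not None else None


h1_text = first_tag_text(SAMPLE_HTML, 'h1')
h2_text = first_tag_text(SAMPLE_HTML, 'h2')
print(h1_text)
print(h2_text if h2_text is not None else 'h2 tag not found!')
```

Because the sample contains an `h1` but no `h2`, the second lookup takes the "not found" branch, which mirrors the `h2` check in the crawler.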

Source code: [connect.py](https://github.com/yotsuba1022/web-crawler-practice/blob/master/ch1/connect.py)

