# 6.1 Non-UTF-8 Encoded Documents

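The example here crawls two web pages that are not UTF-8 encoded. Once you have the response, you can specify the original document's encoding before reading `resp.text`, like this: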

```python
# Tell requests that the fetched page content is Big5-encoded
resp.encoding = 'big5'
```
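To see why the declared encoding matters, here is a minimal sketch that skips the network entirely and decodes the same Big5 bytes twice. The sample string and variable names are made up for illustration. The `latin-1` decode imitates the ISO-8859-1 fallback that `requests` typically applies to `text/*` responses whose headers carry no charset.

```python
# Hypothetical sample: the bytes a Big5 page would send for this text
raw = '中文網頁'.encode('big5')

# Decoding with the wrong codec does not fail, it silently yields mojibake
wrong = raw.decode('latin-1')

# Decoding with the page's real encoding recovers the original text
right = raw.decode('big5')

print(repr(wrong))
print(right)
```

Setting `resp.encoding` before reading `resp.text` plays the role of the second decode here.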

Finally, once the data has been processed and is ready to be saved to a file, convert it to UTF-8:

```python
with open('xxx.txt', 'w', encoding='UTF-8') as file:
    file.write(content)  # content: the text already decoded from the response
```

This way, the saved file ends up UTF-8 encoded.
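As a quick check of that claim, the sketch below writes a string with `encoding='UTF-8'` and reads the file back in binary mode to compare the bytes. The sample text, the temporary directory, and the file name `example.txt` are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical sample text standing in for content parsed from a page
content = '網頁內容'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')

    # Write the str; Python encodes it as UTF-8 on the way out
    with open(path, 'w', encoding='UTF-8') as file:
        file.write(content)

    # Read the raw bytes back to inspect what actually landed on disk
    with open(path, 'rb') as file:
        raw = file.read()

is_utf8 = raw == content.encode('utf-8')
print(is_utf8)
```

The source page's encoding no longer matters at this point, because `content` is already a Python `str`; only the `encoding` argument of `open` decides the bytes written.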

```python
import requests
from bs4 import BeautifulSoup


def baidu_encoding():
    resp = requests.get('https://zhidao.baidu.com/question/48795122.html')
    # This page is served in GBK, so decode resp.text accordingly
    resp.encoding = 'gbk'
    soup = BeautifulSoup(resp.text, 'html.parser')
    title = soup.find('span', 'ask-title').text.strip()
    content = soup.find('span', 'con').text.strip().replace('\n', '')
    print('title', title)
    print('content', content)
    try:
        with open(title + '.txt', 'w', encoding='UTF-8') as file:
            file.write(content)
    except Exception as e:
        print(e)


def gold_66_encoding():
    resp = requests.get('http://www.books.com.tw/activity/gold66_day/')
    # This page is served in Big5, so decode resp.text accordingly
    resp.encoding = 'big5'
    soup = BeautifulSoup(resp.text, 'html.parser')
    books = []
    for div in soup.find_all('div', 'sec_day'):
        # Look up the <h2> tags once, then join title, list price and sale price
        headings = div.find_all('h2')
        books.append(div.h1.a.text + headings[1].text + headings[2].text)
    print('\n'.join(books))
    try:
        with open('66.txt', 'w', encoding='UTF-8') as file:
            file.write('\n'.join(books))
    except Exception as e:
        print(e)


if __name__ == '__main__':
    baidu_encoding()
    gold_66_encoding()
```

Output:

```
title 网页出现乱码，浏览器字体问题。
content 打网页全乱码点击查看-编码没字体选择两选项左右文档右左文档点击Internet选项点击：字体没反应请问办
再怎麼身體僵硬的人都可以唰ㄚ∼的劈腿，享受身心柔軟的神奇伸展法定價：300元66折優惠價：198元
最會說故事的日本史：不必死記年代、人名，翻到哪讀到哪，課本沒講的、日劇沒演的，一看全明白。定價：340元66折優惠價：224元
徒步中國：從北京走到新疆 一個德國人4646公里的文化長路探索定價：380元66折優惠價：251元
用得到的化學：建構世界的美妙分子定價：700元66折優惠價：462元
勇敢小火車：卡爾的特別任務(加贈劇場版故事CD)定價：360元66折優惠價：238元
麵包的藝術:老麵麵種、食材應用、揉麵技法與長時間低溫發酵：紐約最強麵包大師 超過20年烘焙修練精華，頂級口感與風味技藝總整理定價：1200元66折優惠價：647元
年年18%，一生理財這樣做就對了定價：350元66折優惠價：231元
```
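Both functions above hard-code the encoding because it was known in advance. When it is not, one rough approach is to try a short list of candidate encodings in order and keep the first that decodes without error. The helper `decode_with_candidates` and its default candidate list are hypothetical, not part of `requests` or the original source. Treat it as a heuristic only: bytes from one legacy encoding can decode "successfully" under another and still come out as garbage, so the order of candidates matters. `requests` also offers `resp.apparent_encoding`, a guess based on the body bytes.

```python
def decode_with_candidates(raw, candidates=('utf-8', 'big5', 'gbk')):
    """Return (text, encoding) for the first candidate that decodes raw strictly."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one
            continue
    raise ValueError('none of the candidate encodings could decode the data')


# Hypothetical sample: Big5 bytes that are not valid UTF-8
big5_bytes = '中文'.encode('big5')
text, used = decode_with_candidates(big5_bytes)
print(text, used)
```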

Source code: [click here](https://github.com/yotsuba1022/web-crawler-practice/blob/master/ch6/non_utf.py)

