Scraping RTX 4090 Graphics Card Prices from BigGo with Python

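I. Learning Objectives

Knowledge: Understand the basic principles of web scraping and the difference between static and dynamic web pages.
Skills: Use requests and BeautifulSoup to scrape static content and Selenium to scrape dynamic content, then organize the data into a CSV file.
Attitude: Build students' interest in price comparison and market trend analysis, and show how data extraction is useful in practice.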

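II. Preparation

Target Audience

Audience: High school students, university students, or beginners interested in Python, web scraping, and e-commerce data.
Prerequisites: Basic Python (file operations, using packages) and basic HTML structure.

Hardware and Software Requirements

Hardware: A computer with an internet connection.
Software:
- Python 3.x (Anaconda recommended).
- Python packages: requests, beautifulsoup4, pandas, selenium, webdriver-manager.
- A text editor or notebook environment (such as VS Code or Jupyter Notebook).
- Google Chrome browser (for Selenium).
Other: A stable network connection.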

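Teacher Preparation

Install all packages in advance and test the code.
Prepare a short overview of the BigGo page structure (HTML tags, how dynamic loading works).
Provide a sample CSV file for reference.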

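III. Lesson Flow

Time allocation: 120 minutes total

Introduction (15 minutes)
Method 1: Scraping with requests (40 minutes)
Method 2: Scraping with Selenium (50 minutes)
Summary and discussion (15 minutes)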

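1. Introduction (15 minutes)

Activity: Scenario-based lead-in

Teacher's question:
"If you wanted to buy an RTX 4090 graphics card, how would you quickly compare prices across stores?"
Overview of BigGo and web scraping:
- What is BigGo? (A price comparison platform in Taiwan.)
- What scrapers are for: automatically collecting web data (such as prices and sellers).
- Static vs. dynamic pages: requests fetches only the HTML the server returns, while Selenium drives a real browser, so it also sees content rendered by JavaScript.
Goal statement:
"Today we will learn two ways to scrape RTX 4090 prices from BigGo and save them as a table, so you can keep track of the market!"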

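2. Method 1: Scraping with `requests` and `BeautifulSoup` (40 minutes)

Lecture (15 minutes)

Environment setup:

Install the packages:

pip install requests beautifulsoup4 pandas

Code walkthrough:
- requests.get(): sends an HTTP GET request and returns the response.
- BeautifulSoup: parses the HTML and extracts the content of tags.
- pandas: organizes the data into a table and writes it to CSV.
Demonstration: Run the program and show part of the output.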
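
To make the find()/find_all() pattern concrete before touching the live site, here is a minimal offline sketch. The HTML snippet, its class names, and the helper `text_or_none` are made up for illustration and are not BigGo's real markup. It shows why a missing tag has to be handled: find() returns None when nothing matches, so calling `.text` on the result would fail.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not BigGo's real HTML.
SAMPLE_HTML = """
<div class="Product">
  <h2>RTX 4090 Card A</h2>
  <span class="price">$59,990</span>
</div>
<div class="Product">
  <h2>RTX 4090 Card B</h2>
</div>
"""


def text_or_none(parent, name, class_name=None):
    """Return the stripped text of the first matching tag, or None if absent."""
    kwargs = {"class_": class_name} if class_name else {}
    tag = parent.find(name, **kwargs)
    return tag.get_text(strip=True) if tag is not None else None


def parse_products(html):
    """Parse every div.Product in the HTML into a dict of title and price."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for product in soup.find_all("div", class_="Product"):
        rows.append({
            "title": text_or_none(product, "h2"),
            "price": text_or_none(product, "span", "price"),
        })
    return rows


rows = parse_products(SAMPLE_HTML)
for row in rows:
    print(row)
```

The second product has no price tag, so its price comes back as None instead of raising an error. Whether to skip such rows or keep them with a blank value is a choice for the class to discuss.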

Hands-on activity (25 minutes)

Student tasks:
1. Run the following code to scrape RTX 4090 prices:

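from urllib.parse import urljoin

import pandas as pd
import requests
from bs4 import BeautifulSoup

URL = "https://biggo.com.tw/s/?q=RTX+4090"
# Send a browser-like User-Agent; some sites reject the default one used by requests.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
}

# Set a timeout so the script does not hang if the server never responds.
response = requests.get(URL, headers=HEADERS, timeout=10)

if response.status_code == 200:
    soup = BeautifulSoup(response.text, "html.parser")
    # Class names as given in this lesson; adjust them if the live page's HTML differs.
    products = soup.find_all("div", class_="Product")

    product_list = []
    for product in products:
        try:
            title = product.find("h2").text.strip()
            price = product.find("span", class_="price").text.strip()
            seller = product.find("span", class_="seller").text.strip()
            link = product.find("a", class_="title")["href"]
            # urljoin handles both relative and absolute hrefs.
            product_list.append([title, price, seller, urljoin(URL, link)])
        except (AttributeError, TypeError, KeyError):
            # find() returned None or the link has no href: skip this product.
            continue

    df = pd.DataFrame(product_list, columns=["Product", "Price", "Seller", "Link"])
    print(df)
    # utf-8-sig lets Excel open the file with Chinese product names intact.
    df.to_csv("biggo_rtx4090_prices.csv", index=False, encoding="utf-8-sig")
    print(f"✅ Saved {len(df)} RTX 4090 price records.")
else:
    print(f"❌ Could not fetch the BigGo page (HTTP {response.status_code}); check the URL or headers.")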

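2. Open biggo_rtx4090_prices.csv and confirm it contains the expected columns (name, price, and so on).

Result check: If the data is incomplete or empty, discuss possible causes (for example, the products are loaded dynamically by JavaScript).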
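
One way to test the dynamic-loading hypothesis is to count how many product containers appear in the raw HTML that requests received. The sketch below uses only the standard library; the two HTML strings and the helper `count_class` are hypothetical stand-ins, not real BigGo responses.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many tags in the HTML carry the given class."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count


# Hypothetical responses for illustration; not real BigGo pages.
STATIC_HTML = (
    '<div class="Product"><h2>Card A</h2></div>'
    '<div class="Product"><h2>Card B</h2></div>'
)
DYNAMIC_HTML = '<div id="app"></div><script src="/bundle.js"></script>'

print(count_class(STATIC_HTML, "Product"))   # products are present in the raw HTML
print(count_class(DYNAMIC_HTML, "Product"))  # nothing to parse until JavaScript runs
```

In class, the same function could be given `response.text` from Method 1. A count of zero for a page that clearly shows products in the browser suggests the list is rendered by JavaScript.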

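3. Method 2: Scraping dynamic content with `Selenium` (50 minutes)

Lecture (20 minutes)

Environment setup:
- Install the packages:
```
pip install selenium pandas webdriver-manager
```
- WebDriver overview: introduce ChromeDriver and webdriver-manager.
Code walkthrough:
- webdriver.Chrome(): launches a Chrome browser controlled by the script.
- --headless: runs the browser without a visible window.
- find_elements(): finds all matching elements in the rendered page.
Demonstration: Run the program and compare the results with Method 1.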

Hands-on activity (30 minutes)

Student tasks:
1. Run the following code to scrape the dynamically loaded data:

import time

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = Options()
chrome_options.add_argument("--headless")  # run without a visible window
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--no-sandbox")

# webdriver-manager downloads a ChromeDriver that matches the installed Chrome.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
URL = "https://biggo.com.tw/s/?q=RTX+4090"
product_list = []

try:
    driver.get(URL)
    time.sleep(3)  # give the page's JavaScript a few seconds to render the products

    # Class names as given in this lesson; adjust them if the live page differs.
    products = driver.find_elements(By.CLASS_NAME, "Product")

    for product in products:
        try:
            title = product.find_element(By.TAG_NAME, "h2").text.strip()
            price = product.find_element(By.CLASS_NAME, "price").text.strip()
            seller = product.find_element(By.CLASS_NAME, "seller").text.strip()
            link = product.find_element(By.CLASS_NAME, "title").get_attribute("href")
            product_list.append([title, price, seller, link])
        except Exception as e:
            print(f"❌ Failed to extract a product: {e}")
            continue
finally:
    driver.quit()  # always close the browser, even if an error occurred

df = pd.DataFrame(product_list, columns=["Product", "Price", "Seller", "Link"])
df.to_csv("biggo_rtx4090_prices_selenium.csv", index=False, encoding="utf-8-sig")
print(f"✅ Saved {len(df)} RTX 4090 price records.")

2. Open biggo_rtx4090_prices_selenium.csv and check that the data is complete.
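
The fixed `time.sleep(3)` in the script above waits too long on a fast connection and not long enough on a slow one. Selenium provides `WebDriverWait` for explicit waits. The plain-Python sketch below only illustrates the underlying polling idea, so it can run without a browser. The helper `wait_until` and the fake page state are hypothetical.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out.

    Returns the truthy value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for a page whose products appear on the third check (illustration only).
attempts = {"count": 0}


def fake_find_products():
    attempts["count"] += 1
    return ["Card A", "Card B"] if attempts["count"] >= 3 else []


products = wait_until(fake_find_products, timeout=5.0, interval=0.1)
print(products, "after", attempts["count"], "checks")
```

With a real driver, the condition could be `lambda: driver.find_elements(By.CLASS_NAME, "Product")`, since find_elements() returns an empty list until matching elements exist.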

Result check: Compare the CSV files from the two methods and discuss the advantages of Selenium.
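
The comparison can also be done in code by listing the products that appear in one CSV but not the other. This is a minimal sketch using the standard csv module. The in-memory CSV text, the column name `name`, and the helper names are assumptions for illustration, so the column should be changed to match the header in the real files.

```python
import csv
import io

# Made-up CSV contents standing in for the two output files (illustration only).
REQUESTS_CSV = 'name,price\nCard A,"$59,990"\n'
SELENIUM_CSV = 'name,price\nCard A,"$59,990"\nCard B,"$62,500"\n'


def read_column(csv_file, column):
    """Return the set of values found in one column of an open CSV file."""
    return {row[column] for row in csv.DictReader(csv_file)}


def only_in_second(first_file, second_file, column):
    """Values of `column` that appear in the second CSV but not in the first."""
    first = read_column(first_file, column)
    second = read_column(second_file, column)
    return sorted(second - first)


missing = only_in_second(io.StringIO(REQUESTS_CSV), io.StringIO(SELENIUM_CSV), "name")
print(missing)
```

For the real files, each `io.StringIO(...)` would be replaced by `open(path, newline="", encoding="utf-8-sig")`, which matches the encoding used when the CSVs were written.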

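4. Summary and discussion (15 minutes)

Teacher's summary:
- Method 1 (requests): suited to static pages; fast, but it cannot see content that is loaded dynamically.
- Method 2 (Selenium): suited to dynamic pages; more capable, but slower.
Student discussion:
- "Which seller has the lowest RTX 4090 price?"
- "What can scraped data be used for? (for example, price tracking or market analysis)"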
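
Answering the lowest-price question requires numbers, but the scraped prices are text. The sketch below shows one possible cleanup step. It assumes each price string holds a single whole-number amount (no ranges or decimals), and the rows and the helper `parse_price` are made up for illustration.

```python
import re


def parse_price(text):
    """Strip everything except digits from a price string and return an int.

    Returns None if the text contains no digits (for example, "Out of stock").
    """
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


# Made-up rows for illustration; real rows would come from the CSV files.
rows = [
    {"seller": "Store A", "price": "$59,990"},
    {"seller": "Store B", "price": "NT$ 57,500"},
    {"seller": "Store C", "price": "Out of stock"},
]

# Keep only rows with a usable price, then take the smallest.
priced = [(parse_price(r["price"]), r["seller"]) for r in rows]
priced = [(price, seller) for price, seller in priced if price is not None]
lowest_price, lowest_seller = min(priced)
print(lowest_seller, lowest_price)
```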

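IV. Assessment

Hands-on results (70%):
- Successfully generate biggo_rtx4090_prices.csv (Method 1).
- Successfully generate biggo_rtx4090_prices_selenium.csv (Method 2).
Participation in discussion (30%):
- Share observations or questions during the summary.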

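V. Extensions

Advanced challenges:
- Modify the program to scrape multiple pages (handle pagination).
- Add a timestamp, scrape at regular intervals, and track price changes.
- Plot a price trend chart with matplotlib.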
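
As a starting point for the timestamp challenge, the sketch below appends each scrape to a history file with the scrape time in the first column. It writes to an in-memory file so it can run anywhere. The helper `append_snapshot`, the column order, and the file name mentioned in the comment are assumptions for illustration.

```python
import csv
import io
from datetime import datetime, timezone


def append_snapshot(csv_file, rows, scraped_at=None):
    """Append (title, price) rows to an open CSV file, prefixed with the scrape time."""
    scraped_at = scraped_at or datetime.now(timezone.utc)
    stamp = scraped_at.isoformat(timespec="seconds")
    writer = csv.writer(csv_file)
    for title, price in rows:
        writer.writerow([stamp, title, price])
    return stamp


# In-memory file for illustration; a real script might instead use
# open("price_history.csv", "a", newline="", encoding="utf-8").
history = io.StringIO()
fixed_time = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
append_snapshot(history, [("Card A", 59990), ("Card B", 62500)], scraped_at=fixed_time)
print(history.getvalue())
```

Because every run appends rows instead of overwriting the file, the history can later be grouped by product and plotted over time.
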
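Practical projects:
- Scrape other products (such as the RTX 4080) and compare prices.
- Build an automated script that updates prices on a schedule and sends a notification with the lowest price.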
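
The decision at the heart of such a script is whether the latest scrape found a new low. A minimal sketch of that check follows. The helper `new_low` and the prices are hypothetical, and scheduling the script or sending the notification is outside its scope.

```python
def new_low(previous_lowest, current_prices):
    """Return the new lowest price if it beats the previous one, else None.

    previous_lowest may be None on the first run, in which case any price counts.
    """
    if not current_prices:
        return None
    current_lowest = min(current_prices)
    if previous_lowest is None or current_lowest < previous_lowest:
        return current_lowest
    return None


# Made-up prices for illustration.
print(new_low(59990, [61000, 58500, 60200]))  # a cheaper offer appeared
print(new_low(58500, [61000, 58500, 60200]))  # no improvement, so nothing to report
```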

VI. Teaching Resources

Reference links:
Attachments:
- Code files (.py format).
- Sample HTML structure walkthrough (PDF).
