AI Web Scraping: Using Kimi to Batch-Download Podcast Audio from an RSS Feed

部落人有文化 2024-05-17 10:33:18

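Many high-quality podcasts are worth saving to your computer so you can listen at your own pace. With Kimi, downloading them in bulk is straightforward.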

On the podcast's page, click "Subscribe via RSS".

This opens the address of the podcast's RSS feed.

The feed contains each episode's title, summary, and audio download URL.
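To make the feed structure concrete, here is a minimal sketch that reads those three fields with Python's standard-library XML parser. The sample feed text, the example.com URLs, and the `list_episodes` helper are made up for illustration; they are not taken from the real feed, and this sketch assumes the summary lives in a `<description>` element.

```python
import xml.etree.ElementTree as ET

# A hand-written sample in the shape of a podcast RSS feed (not real feed data).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1: Getting Started</title>
      <description>A short summary of the first episode.</description>
      <enclosure url="https://example.com/audio/ep1.mp3" length="12345" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 2: Going Further</title>
      <description>A short summary of the second episode.</description>
      <enclosure url="https://example.com/audio/ep2.mp3" length="67890" type="audio/mpeg"/>
    </item>
  </channel>
</rss>
"""


def list_episodes(rss_text):
    """Return (title, summary, audio_url) for every <item> that has an <enclosure> URL."""
    root = ET.fromstring(rss_text)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None or not enclosure.get("url"):
            continue  # no audio attached to this item
        title = (item.findtext("title") or "").strip()
        summary = (item.findtext("description") or "").strip()
        episodes.append((title, summary, enclosure.get("url")))
    return episodes


episodes = list_episodes(SAMPLE_RSS)
for title, summary, audio_url in episodes:
    print(title, "->", audio_url)
```

Each `<item>` is one episode, and the audio address sits in the `url` attribute of `<enclosure>` rather than in the element's text, which is why the prompt later asks for the attribute specifically.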

Enter the following prompt in Kimi:

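You are a Python programming expert. Write a Python script that batch-downloads podcast audio, thinking step by step:

Open the web page https://feeds.megaphone.fm/create-like-the-greats

Locate all <item> tags;

Within each <item> tag, locate the <title> tag and extract its text content as the file name of the podcast audio;

Within each <item> tag, locate the <enclosure> tag and extract its url attribute as the URL of the podcast audio;

Download the podcast audio and save it to this folder on the E: drive: Create Like the Greats

Notes: print information to the screen at every step;

Download with multiple threads to speed things up;

Show a download progress bar;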

Kimi generated the following Python code:

import os
import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor


# Download one episode and save it under save_path
def download_audio(podcast_title, podcast_url, save_path):
    local_filename = os.path.join(save_path, podcast_title)
    try:
        with requests.get(podcast_url, stream=True, timeout=30) as r:
            r.raise_for_status()  # raises HTTPError for 4xx/5xx responses
            with open(local_filename, 'wb') as f:
                for chunk in r.iter_content(chunk_size=1024):
                    if chunk:  # skip keep-alive chunks
                        f.write(chunk)
        print(f"{podcast_title} downloaded and saved to {local_filename}")
    except requests.exceptions.HTTPError as e:
        print(f"{podcast_title} failed to download, status code: {e.response.status_code}")
    except requests.exceptions.RequestException as e:
        # Connection errors and timeouts carry no HTTP status code
        print(f"{podcast_title} failed to download: {e}")


# Create the save folder if it does not exist yet
save_path = "E:\\Create Like the Greats"
os.makedirs(save_path, exist_ok=True)

# Fetch and parse the podcast feed
feed_url = "https://feeds.megaphone.fm/create-like-the-greats"
response = requests.get(feed_url, timeout=30)

if response.status_code == 200:
    print("Feed fetched successfully.")
    # html.parser needs no extra install; BeautifulSoup's 'xml' parser
    # (which requires lxml) is the stricter choice for RSS
    soup = BeautifulSoup(response.content, 'html.parser')
    items = soup.find_all('item')
    print(f"Found {len(items)} podcast episodes.")

    # Download all episodes with a thread pool
    with ThreadPoolExecutor(max_workers=5) as executor:  # adjust the thread count as needed
        futures = []
        for item in items:
            title_tag = item.find('title')
            enclosure_tag = item.find('enclosure', attrs={'url': True})
            if title_tag is not None and enclosure_tag is not None:
                title = title_tag.get_text().strip()
                audio_url = enclosure_tag['url']
                # Clean the title so it is a legal file name
                filename = "".join([c for c in title if c.isalpha() or c.isdigit() or c == ' ']).rstrip() + ".mp3"
                future = executor.submit(download_audio, filename, audio_url, save_path)
                futures.append(future)
                print(f"Scheduled download: {filename}")
            else:
                print("Missing title or link, skipping this item.")

        # Wait for all download tasks to finish
        for future in futures:
            future.result()
else:
    print("Could not fetch the feed, status code:", response.status_code)

print("All tasks finished.")
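The file-name cleanup in the script above keeps only letters, digits, and spaces, so two titles that differ only in punctuation end up with the same file name and the later download overwrites the earlier one. The sketch below reproduces that cleanup as a standalone function and adds `unique_filename`, a hypothetical helper (not part of the Kimi output) that appends a counter when a cleaned name repeats. The sample titles are invented.

```python
def clean_filename(title, extension=".mp3"):
    """Keep only letters, digits and spaces, mirroring the cleanup in the script."""
    kept = "".join(c for c in title if c.isalpha() or c.isdigit() or c == " ")
    return kept.rstrip() + extension


def unique_filename(title, used, extension=".mp3"):
    """Hypothetical helper: add a numeric suffix when a cleaned name was already used."""
    base = clean_filename(title, extension="")
    candidate = base + extension
    counter = 2
    while candidate in used:
        candidate = f"{base} ({counter}){extension}"
        counter += 1
    used.add(candidate)
    return candidate


used_names = set()
first = unique_filename("Q&A: Part 1?", used_names)
second = unique_filename("QA Part 1", used_names)
print(first)   # QA Part 1.mp3
print(second)  # QA Part 1 (2).mp3
```

Building all file names this way in the main thread, before submitting the downloads, keeps the `used` set out of the worker threads, so no locking is needed.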

Run the generated script in VS Code, and the podcast audio finishes downloading shortly.
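The generated script prints a line for each finished file, but it does not draw the progress bar the prompt asked for. One way to add it, sketched here with only the standard library, is to redraw a text bar each time a task in the pool completes. `render_bar`, `fake_download`, and `run_with_progress` are illustrative names, and `fake_download` stands in for the real download function so the example runs without network access. The bar counts completed files, not bytes.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def render_bar(done, total, width=20):
    """Build a text progress bar: '#' for finished work, '-' for the rest."""
    filled = width * done // total if total else width
    return "[" + "#" * filled + "-" * (width - filled) + f"] {done}/{total}"


def fake_download(name):
    """Stand-in for a real download; sleeps briefly instead of using the network."""
    time.sleep(0.01)
    return name


def run_with_progress(names, worker, max_workers=5):
    """Run worker(name) in a thread pool and print the bar as each task finishes."""
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(worker, name) for name in names]
        # as_completed yields futures in completion order, not submission order
        for future in as_completed(futures):
            finished.append(future.result())
            print(render_bar(len(finished), len(names)))
    return finished


results = run_with_progress(["ep1.mp3", "ep2.mp3", "ep3.mp3", "ep4.mp3"], fake_download)
```

In the real script, the worker would be the download function and each name would carry the episode's URL. A third-party library such as tqdm is a common alternative when a byte-level bar per file is wanted.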
