AI for Financial Investing: Batch-Downloading Public REITs Prospectuses from the Shenzhen Stock Exchange

部落人有文化 2024-06-16 19:18:51

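Open the public REITs prospectus page on the Shenzhen Stock Exchange (SZSE) site, press F12, and inspect the Network tab to find the real data endpoint: https://reits.szse.cn/api/disc/announcement/annList?random=0.3555675437003616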

{
  "announceCount": 39,
  "data": [
    {
      "id": "80bc99a7-8a04-4803-b42a-d9cca1e6c5d5",
      "annId": 1220300147,
      "title": "華夏華潤商業REIT:華夏華潤商業資産封閉式基礎設施證券投資基金招募說明書更新",
      "content": null,
      "publishTime": "2024-06-08 00:00:00",
      "attachPath": "/disc/disk03/finalpage/2024-06-08/a77d6a34-c4eb-4dcf-9b16-7c2ce856ebdd.PDF",
      "attachFormat": "PDF",
      "attachSize": 6265,
      "secCode": [
        "180601"
      ],
      "secName": [
        "華夏華潤商業REIT"
      ],
      "bondType": null,
      "bigIndustryCode": null,
      "bigCategoryId": null,
      "smallCategoryId": null,
      "channelCode": null,
      "_index": "ows_disclosure-20180825"
    },

The endpoint returns JSON. The PDF path is in the "attachPath" field: "/disc/disk03/finalpage/2024-06-08/a77d6a34-c4eb-4dcf-9b16-7c2ce856ebdd.PDF"

Open the download page and check the URL in the browser: https://disc.static.szse.cn/disc/disk03/finalpage/2024-06-08/a77d6a34-c4eb-4dcf-9b16-7c2ce856ebdd.PDF

So the prefix to prepend to "attachPath" is "https://disc.static.szse.cn".
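As a minimal sketch of this step, the snippet below parses a trimmed copy of the response shown earlier and prepends the static host to each "attachPath". The names PDF_HOST, build_pdf_url, and extract_pdf_urls are illustrative choices, not part of the site's API, and the sample keeps only the fields the sketch needs.

```python
import json
from typing import List

# Host observed in the browser when opening a PDF (taken from the article).
PDF_HOST = "https://disc.static.szse.cn"

# Trimmed copy of the sample response; only the fields used here are kept.
SAMPLE = """
{
  "announceCount": 39,
  "data": [
    {
      "annId": 1220300147,
      "attachPath": "/disc/disk03/finalpage/2024-06-08/a77d6a34-c4eb-4dcf-9b16-7c2ce856ebdd.PDF",
      "attachFormat": "PDF"
    }
  ]
}
"""


def build_pdf_url(attach_path: str) -> str:
    """Join the static host and the relative attachPath into a full URL."""
    return PDF_HOST + "/" + attach_path.lstrip("/")


def extract_pdf_urls(response_text: str) -> List[str]:
    """Return a full PDF URL for every item with a non-empty attachPath."""
    payload = json.loads(response_text)
    items = payload.get("data") or []  # tolerate a missing or null "data" key
    return [build_pdf_url(i["attachPath"]) for i in items if i.get("attachPath")]


if __name__ == "__main__":
    for url in extract_pdf_urls(SAMPLE):
        print(url)
```

Run as a script, this prints the same download URL that was found in the browser.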

Enter the following prompt in DeepSeek:

You are a Python programming expert. Write a Python script that follows these steps:

Request URL: https://reits.szse.cn/api/disc/announcement/annList?random=0.3555675437003616

Request Method: POST

Status Code: 200 OK

Remote Address: 58.251.50.138:443

Referrer Policy: strict-origin-when-cross-origin

Request Payload:

{"seDate":["",""],"channelCode":["reits-xxpl"],"bigCategoryId":["directions"],"pageSize":50,"pageNum":1}

Request Headers:

Accept: application/json, text/javascript, */*; q=0.01

Accept-Encoding: gzip, deflate, br, zstd

Accept-Language: zh-CN,zh;q=0.9,en;q=0.8

Connection: keep-alive

Content-Length: 104

Content-Type: application/json

Host: reits.szse.cn

Origin: https://reits.szse.cn

Referer: https://reits.szse.cn/disclosure/index.html

Sec-Ch-Ua: "Google Chrome";v="125", "Chromium";v="125", "Not.A/Brand";v="24"

Sec-Ch-Ua-Mobile: ?0

Sec-Ch-Ua-Platform: "Windows"

Sec-Fetch-Dest: empty

Sec-Fetch-Mode: cors

Sec-Fetch-Site: same-origin

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36

X-Request-Type: ajax

X-Requested-With: XMLHttpRequest

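Get the response returned by the endpoint; it is nested JSON data;

Under the "data" key, locate the value of the "title" key; this is the title of the PDF file;

Under the "data" key, locate the value of the "attachPath" key; this is the relative URL of the PDF file. Prepend "https://disc.static.szse.cn" to form the full PDF download URL;

Download the PDF file and save it to the folder: F:\AI自媒體內容\AI炒股\REITs

Notes: print progress information at every step;

PDF titles may contain special characters that Windows does not allow in file names; clean them before naming the PDF file;

After each PDF download, pause for a random 3 to 6 seconds;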

Source code:

import os
import random
import re
import time

import requests

# Define the request URL and request headers
url = "https://reits.szse.cn/api/disc/announcement/annList?random=0.3555675437003616"

headers = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    # br and zstd are left out: requests can only decode them when the
    # optional brotli / zstandard packages are installed.
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    "Connection": "keep-alive",
    "Content-Type": "application/json",
    # The Host header is a bare host name, without the scheme.
    "Host": "reits.szse.cn",
    "Origin": "https://reits.szse.cn",
    "Referer": "https://reits.szse.cn/disclosure/index.html",
    "Sec-Ch-Ua": '"Google Chrome";v="125", "Chromium";v="125", "Not.A/Brand";v="24"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"Windows"',
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "X-Request-Type": "ajax",
    "X-Requested-With": "XMLHttpRequest",
}

# Define the request payload
payload = {
    "seDate": ["", ""],
    "channelCode": ["reits-xxpl"],
    "bigCategoryId": ["directions"],
    "pageSize": 50,
    "pageNum": 1,
}

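# Send the POST request (the timeout keeps the script from hanging forever)
response = requests.post(url, headers=headers, json=payload, timeout=30)

# Check the response status code
if response.status_code == 200:
    print("Request succeeded, status code: 200 OK")
else:
    print(f"Request failed, status code: {response.status_code}")
    raise SystemExit(1)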

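# Parse the JSON response
data = response.json()

# Folder where the PDFs are saved; create it if it does not exist yet
save_dir = r"F:\AI自媒體內容\AI炒股\REITs"
os.makedirs(save_dir, exist_ok=True)

# Check whether the response contains any data
if "data" in data and isinstance(data["data"], list):
    for item in data["data"]:
        # Get the PDF title (fall back to a placeholder if it is missing or null)
        pdf_title = item.get("title") or "unknown_title"
        print(f"PDF title: {pdf_title}")

        # Get the PDF URL
        pdf_url = item.get("attachPath", "")
        if pdf_url:
            pdf_url = "https://disc.static.szse.cn" + pdf_url
            print(f"PDF URL: {pdf_url}")

            # Replace characters that are illegal in Windows file names
            pdf_title = re.sub(r'[<>:"/\\|?*]', '_', pdf_title)

            # Build the save path
            save_path = os.path.join(save_dir, f"{pdf_title}.pdf")

            # Download the PDF file
            pdf_response = requests.get(pdf_url, timeout=60)
            if pdf_response.status_code == 200:
                with open(save_path, 'wb') as f:
                    f.write(pdf_response.content)
                print(f"PDF saved to: {save_path}")
            else:
                print(f"Failed to download PDF, status code: {pdf_response.status_code}")

            # Pause for a random 3 to 6 seconds
            time.sleep(random.uniform(3, 6))
else:
    print("No data found")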
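The script above requests only pageNum 1. That covers the sample response here, where announceCount is 39 and pageSize is 50, but it would miss items if the list grew past one page. The sketch below shows one way to walk all pages. It assumes that announceCount is the total number of matching announcements and that pageNum is 1-based; both are consistent with the sample response but have not been verified against the site. The names iter_announcements and fetch_page_http are hypothetical, and whether the endpoint accepts this reduced header set is untested.

```python
import math
from typing import Callable, Dict, Iterator

API_URL = "https://reits.szse.cn/api/disc/announcement/annList?random=0.3555675437003616"


def iter_announcements(
    fetch_page: Callable[[int, int], Dict], page_size: int = 50
) -> Iterator[Dict]:
    """Yield every announcement item across all pages.

    fetch_page(page_num, page_size) must return one decoded JSON response.
    Assumption: "announceCount" is the total count across all pages.
    """
    first = fetch_page(1, page_size)
    yield from first.get("data") or []
    total = int(first.get("announceCount") or 0)
    pages = math.ceil(total / page_size)
    for page_num in range(2, pages + 1):
        yield from fetch_page(page_num, page_size).get("data") or []


def fetch_page_http(page_num: int, page_size: int) -> Dict:
    """Fetch one page from the endpoint using the payload from the article."""
    import requests  # third-party; imported here so the sketch loads without it

    payload = {
        "seDate": ["", ""],
        "channelCode": ["reits-xxpl"],
        "bigCategoryId": ["directions"],
        "pageSize": page_size,
        "pageNum": page_num,
    }
    headers = {
        "Referer": "https://reits.szse.cn/disclosure/index.html",
        "X-Requested-With": "XMLHttpRequest",
    }
    resp = requests.post(API_URL, json=payload, headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    for item in iter_announcements(fetch_page_http):
        print(item.get("title"), item.get("attachPath"))
```

Passing the fetch function in as a parameter keeps the paging logic separate from the network call, so it can be checked against a fake response before touching the real site. When downloading, the random pause between files still applies.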
