I have posted a lot of articles about AI on this blog, and I thought of posting them as tweets to my Twitter account, which I currently use to share AI-related information and promote my AI Course.
With the help of ChatGPT, I created the Python code below, and it worked well to collect the links in the expected format. It also handled the 280-character limit of tweets. I thought of setting up a cron job to post them automatically through the Twitter API, but the free tier has a lot of restrictions, so I decided to do the posting manually.
import requests
from bs4 import BeautifulSoup
import pandas as pd

BASE_URL = "https://www.blog.qualitypointtech.com"
ARCHIVE_URL = f"{BASE_URL}/2025/"

def fetch_posts(url):
    posts = []
    while url:
        print(f"Fetching {url} ...")
        r = requests.get(url, timeout=15)
        r.raise_for_status()
        soup = BeautifulSoup(r.text, "html.parser")

        # Extract titles & links
        for a in soup.select("h3.post-title a"):
            title = a.get_text(strip=True)
            link = a["href"]
            if not link.startswith("http"):
                link = BASE_URL + link

            # Ensure title+url ≤ 280 chars
            combined = f"{title} {link}"
            if len(combined) > 280:
                allowed_len = 280 - len(link) - 1
                title = (title[:allowed_len - 3] + "...") if allowed_len < len(title) else title
                combined = f"{title} {link}"

            posts.append(combined)

        # Find "Older Posts" link to move to next page
        older = soup.select_one("a.blog-pager-older-link")
        url = older["href"] if older else None

    return posts

if __name__ == "__main__":
    all_posts = fetch_posts(ARCHIVE_URL)
    print(f"Collected {len(all_posts)} posts.")

    # Save to CSV (single column "tweet")
    df = pd.DataFrame(all_posts, columns=["tweet"])
    df.to_csv("qualitypointtech_2025_posts.csv", index=False)
    print("Saved to qualitypointtech_2025_posts.csv")
I expected it to collect the links of only the 2025 posts, but it collected every post going back to 2008, apparently because the "Older Posts" pager link keeps paging back through the entire blog archive instead of stopping at the end of 2025.
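If I want only the 2025 posts, one simple fix (which I have not tested) would be to filter the collected tweets by their link, since Blogger permalinks normally include the year and month (year/month/post-title.html):

# Untested sketch: keep only tweets whose link falls under the 2025 archive.
# Assumes Blogger's usual /YYYY/MM/post-title.html permalink format.
posts_2025 = [p for p in all_posts if f"{BASE_URL}/2025/" in p]
print(f"{len(posts_2025)} of {len(all_posts)} posts are from 2025.")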