Wednesday, September 17, 2025

Simple Python code to scrape all links from a blog to post as tweets


I have posted many articles about AI on this blog. I wanted to post them as tweets from my Twitter account, which I currently use to share AI-related information and promote my AI Course.

With the help of ChatGPT, I created the Python code below. It worked well, producing the links in the expected format and keeping each tweet within the 280-character limit. I had planned to set up a cron job to post them automatically through the Twitter API, but the free tier seems to have a lot of restrictions, so I decided to do the posting manually.

import requests
from bs4 import BeautifulSoup
import pandas as pd

BASE_URL = "https://www.blog.qualitypointtech.com"
ARCHIVE_URL = f"{BASE_URL}/2025/"

def fetch_posts(url):
    posts = []
    while url:
        print(f"Fetching {url} ...")
        r = requests.get(url, timeout=15)
        r.raise_for_status()
        soup = BeautifulSoup(r.text, "html.parser")

        # Extract titles & links
        for a in soup.select("h3.post-title a"):
            title = a.get_text(strip=True)
            link = a["href"]
            if not link.startswith("http"):
                link = BASE_URL + link

            # Ensure title+url ≤ 280 chars
            combined = f"{title} {link}"
            if len(combined) > 280:
                allowed_len = 280 - len(link) - 1
                title = (title[:allowed_len - 3] + "...") if allowed_len < len(title) else title
                combined = f"{title} {link}"
            posts.append(combined)

        # Find "Older Posts" link to move to next page
        older = soup.select_one("a.blog-pager-older-link")
        url = older["href"] if older else None

    return posts


if __name__ == "__main__":
    all_posts = fetch_posts(ARCHIVE_URL)
    print(f"Collected {len(all_posts)} posts.")

    # Save to CSV (single column "tweet")
    df = pd.DataFrame(all_posts, columns=["tweet"])
    df.to_csv("qualitypointtech_2025_posts.csv", index=False)
    print("Saved to qualitypointtech_2025_posts.csv")


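I expected it to collect links to the 2025 posts only, but it collected all the posts going back to 2008. The likely cause is that the code keeps following the "Older Posts" link on each page, and that link does not stop at the start of 2025.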
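
One possible way to keep only the 2025 posts is to filter the collected lines by the year that appears in each post's URL. The sketch below is not part of the original code. It assumes the post links follow the usual Blogger permalink pattern of /YYYY/MM/post-name.html, and the helper names post_year and filter_by_year are made up for illustration.

```python
import re
from urllib.parse import urlparse

# Assumption: post permalinks look like /YYYY/MM/post-name.html
_POST_PATH = re.compile(r"^/(\d{4})/\d{2}/")


def post_year(link):
    """Return the year found in a post URL's path, or None if there is none."""
    match = _POST_PATH.match(urlparse(link).path)
    return int(match.group(1)) if match else None


def filter_by_year(tweets, year):
    """Keep only the "title link" lines whose link belongs to the given year."""
    kept = []
    for tweet in tweets:
        # The link is the last space-separated token of each line
        link = tweet.rsplit(" ", 1)[-1]
        if post_year(link) == year:
            kept.append(tweet)
    return kept
```

With these helpers, the main block could call filter_by_year(fetch_posts(ARCHIVE_URL), 2025) before saving the CSV. This still downloads every page back to 2008. To avoid that, the loop in fetch_posts could stop following the "Older Posts" link as soon as a page contains no links from the wanted year.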

