A Python Web Scraper and Database Updater

July 29, 2023

Today, I’m keeping my blog post simple. This morning, I wrote a Python script that does the following:

  1. Scrapes data from my blog
  2. Updates my Firebase Realtime Database

It’s similar to code I’ve written in the past but not identical to anything. So today, I’ll break down every function within the script and explain:

  1. What the function does
  2. Why the function exists

Let’s do it…

initialize_app

This function gets permission to use my Firebase Realtime Database. It’s crucial because, without it, I wouldn’t be able to pull data from my database or update it.

Python code to access Firebase Realtime Database
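
For readers following along, here is a minimal sketch of what this step could look like with the firebase-admin package. The helper name connect_to_database, the key file path, and the database URL are all placeholders for illustration, not values from the script itself.

```python
def connect_to_database(key_path, database_url):
    """Authenticate with a service-account key and return a reference to the database root."""
    # Imported inside the function so this file still loads
    # on machines where firebase-admin is not installed.
    import firebase_admin
    from firebase_admin import credentials, db

    # key_path points at a service-account JSON key downloaded from the Firebase console.
    cred = credentials.Certificate(key_path)
    firebase_admin.initialize_app(cred, {"databaseURL": database_url})
    return db.reference("/")


# Example call with placeholder values:
# root = connect_to_database("serviceAccountKey.json", "https://<your-project>.firebaseio.com")
```

Returning the root reference keeps the later steps simple, since they only need something they can read from and write to.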

get_last_item

This function checks every item index in my database and returns the item I added most recently. I need this for the highest_id_num and recent_date functions.

Python code to get last item in database
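
A rough sketch of the idea, assuming the items are stored under integer-like indexes and that the data has already been read from the database (for example with a reference's get() method). The shape of the data is an assumption here. The Realtime Database can hand back either a dict or a list for index-like keys, so the sketch handles both.

```python
def get_last_item(items):
    """Return the entry stored under the highest index, or None if there is nothing stored."""
    if not items:
        return None
    if isinstance(items, dict):
        # Keys arrive as strings, so compare them as integers ("10" must beat "2").
        last_key = max(items, key=int)
        return items[last_key]
    # A list may contain None where an index is missing, so skip those gaps.
    present = [item for item in items if item is not None]
    return present[-1] if present else None
```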

highest_id_num

This uses the dictionary returned from get_last_item() and returns the value from the ID key. I need this so that I know what index to start at when I add data to the database.

Python code to find highest id num from database
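
As a sketch, the lookup can be a one-liner. The key name "id" is an assumption, since the exact spelling used in the database isn't shown here.

```python
def highest_id_num(last_item):
    """Return the ID of the most recent item as an integer."""
    # "id" is an assumed key name; int() covers the case where it was stored as a string.
    return int(last_item["id"])


# The next free index is then one past the highest ID.
last_item = {"id": "41", "date": "July 28, 2023"}  # made-up sample item
next_index = highest_id_num(last_item) + 1
```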

recent_date

This also uses the dictionary returned from get_last_item() and returns the value from the date key. This date value is useful because when I’m scraping data from my blog, the recent_date() function tells me where to start looking.

Python code to find recent date from database
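
A sketch of the same pattern for the date, assuming the key is named "date" and that the value is stored as a plain string:

```python
def recent_date(last_item):
    """Return the date of the most recent item, exactly as stored."""
    # "date" is an assumed key name; the value is returned without any parsing.
    return last_item["date"]
```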

is_date_greater

This function takes two dates as arguments and returns True if the first date is later than the second, and False otherwise.

Python code to compare dates
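
One way to sketch this comparison is with datetime.strptime. The date format below ("July 29, 2023" style) is an assumption based on how dates appear on the blog, and the format can be swapped through the date_format parameter.

```python
from datetime import datetime

# Assumed format: full month name, day, four-digit year (e.g. "July 29, 2023").
DATE_FORMAT = "%B %d, %Y"


def is_date_greater(first, second, date_format=DATE_FORMAT):
    """Return True if the first date string is later than the second."""
    first_date = datetime.strptime(first, date_format)
    second_date = datetime.strptime(second, date_format)
    return first_date > second_date
```

Parsing before comparing matters because the raw strings would otherwise be compared alphabetically, which would put "January 5, 2023" ahead of "December 31, 2022" only by accident. Note that %B reads month names in the current locale, so this sketch expects an English locale.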

update_url

This function takes the URL for my blog posts page and shortens it. I need this because when I scrape data for each blog post, one div block gives the path and page for the post’s link. Using that info with this function allows me to build the link that leads to a post.

Python code to edit a URL
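
A sketch of how the shortening and link building could work with the standard library, assuming "shortening" means trimming the posts-page URL down to its scheme and domain. The helper build_post_link and the example.com URLs are made up for illustration.

```python
from urllib.parse import urlsplit


def update_url(posts_page_url):
    """Trim a URL down to its scheme and domain, e.g. https://example.com."""
    parts = urlsplit(posts_page_url)
    return f"{parts.scheme}://{parts.netloc}"


def build_post_link(posts_page_url, post_path):
    """Join the trimmed URL with the path scraped from a post's div block."""
    # lstrip avoids a double slash when the scraped path already starts with "/".
    return f"{update_url(posts_page_url)}/{post_path.lstrip('/')}"
```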

scraper

This function scrapes the data from my blog posts page and returns a dictionary with all items that have dates greater than the most recent date. Why do I need this function?

It’s obvious…

So that I have data to add to my database.

Python code to scrape my blog
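
Here is a rough, standard-library sketch of the parse-then-filter idea. The HTML structure is entirely hypothetical: it assumes each post is a div with class "post" holding a link and a span with class "date", and that the page contains nothing but those blocks. The real page markup and the libraries the script uses are not shown here, and the sketch takes the page's HTML as a string rather than downloading it.

```python
from datetime import datetime
from html.parser import HTMLParser


class PostParser(HTMLParser):
    """Collect title, path, and date from each (hypothetical) post block."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._field = None  # which text field the parser is currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "post":
            self.posts.append({"title": "", "path": "", "date": ""})
        elif self.posts and tag == "a":
            self.posts[-1]["path"] = attrs.get("href", "")
            self._field = "title"
        elif self.posts and tag == "span" and attrs.get("class") == "date":
            self._field = "date"

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None

    def handle_data(self, data):
        if self._field:
            self.posts[-1][self._field] += data.strip()


def scraper(html, most_recent_date, date_format="%B %d, %Y"):
    """Return a dict of posts dated after most_recent_date, keyed 0, 1, 2, ..."""
    parser = PostParser()
    parser.feed(html)
    cutoff = datetime.strptime(most_recent_date, date_format)
    new_posts = [
        post for post in parser.posts
        if datetime.strptime(post["date"], date_format) > cutoff
    ]
    return dict(enumerate(new_posts))
```

Passing the HTML in as a string keeps the parsing easy to try out on a saved copy of the page before pointing it at the live site.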

update_firebase

This is perhaps the most important function in this script. It uses a combination of all previous functions to update my Firebase Realtime Database and insert the values I want to add.

Python code to update my database
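
To show how the pieces could fit together, here is a minimal sketch of the write step. It assumes each new post is stored under the next free index and given an "id" field, which is a guess at the database layout. The InMemoryRef class is a made-up stand-in for trying the logic without credentials. It mimics the child() and set() methods of a firebase-admin database reference, so a real reference can be passed in its place.

```python
class InMemoryRef:
    """Stand-in for a database reference, backed by a plain dict."""

    def __init__(self, store, key=None):
        self.store = store
        self.key = key

    def child(self, key):
        return InMemoryRef(self.store, key)

    def set(self, value):
        self.store[self.key] = value


def update_firebase(ref, new_posts, highest_id):
    """Write each new post under the next free index and return how many were written."""
    next_id = highest_id + 1
    for post in new_posts.values():
        # Copy the post and attach its ID (assumed layout) before writing it.
        record = dict(post, id=next_id)
        ref.child(str(next_id)).set(record)
        next_id += 1
    return next_id - highest_id - 1


# Trying it out without touching a real database:
store = {}
written = update_firebase(InMemoryRef(store), {0: {"title": "New post"}}, 41)
```

Here new_posts plays the role of the dictionary returned by the scraper, and highest_id is the value from highest_id_num.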