Here's how I crafted a real estate agent for myself with Python and CALLR SMS API
I live in Paris, France, and yes, it is beautiful. The bistrots, the bread, the cheese and the wine, the apéros... All of these are real. But Parisians, either native or adopted, have a love-hate relationship with their city. More often than not, the level of difficulty involved in finding an apartment plays a tremendous role in this.
More applicants than available places, picky landlords, and dealing with real-estate agencies are some of the issues apartment hunters face. But you could always go the rental-by-owner route: no middleman, and room for improvisation if your application is a little less than ideal but you made a nice first impression nonetheless. Some online platforms connect owners and applicants, and PAP.fr is one of the best.
When using one of these sites, the critical success factor is time. The faster you spot a listing and get in touch with the owner, the higher your chances are of getting it. But more often than not, before you can even go to your email alert and follow the link you’ve been sent to the listing, other applicants are already on it, and before you know it, the owner’s voicemail is full.
Here’s the trick to nail it: leave a tab open on the search result page on PAP.fr, and refresh it every 15 minutes or so. But you won’t, because, just like me, you’re at work and you’re busy all day, right?
Well… I ended up building my own bot to assist me in my flat search and, lo and behold, here I am now in my new Montmartre apartment. So in this post, I’ll teach you how to use Python, Google Spreadsheet and CALLR to find an apartment in Paris.
You can check out the GitHub repository to clone or follow along with the final code.
I assume you have minimal code literacy, enough at least to open the code and run it after changing a line or two to fit your own needs. The code isn’t hard for a Pythonista, but it is not trivial either. You’ll need Python 3.5 to get started. I strongly suggest you set up a virtual environment before you write a single line of code. Installing both is outside the scope of this tutorial, but the provided links should get you going.
We’ll build a bot that automates fetching the data you’d read from a search result page of apartment listings. Python’s terseness and readability lend themselves perfectly to this. The bot will use a Google spreadsheet as a simple data table to store the listings, and the CALLR API to send us the newest ones in concise and timely text messages. SMS is still the safest option when you need to be alerted and take action right away, and this use case is a perfect example. I know I always check my texts, but emails? Not so much.
Our bot will be triggered every 15 minutes by a task scheduler (AT or Scheduled Tasks on Windows, a cron job on Linux and macOS) from our machine. That way we can still work during the day while it does most of the search for us, running in the background.
Part 1 – Setting up your search bot
Install the dependencies as needed, for instance with pip: pip install requests beautifulsoup4 pygsheets
We install requests to facilitate fetching the HTML pages via HTTP, beautifulsoup4 to extract data from the markup, and pygsheets to ease our control of a remote Google spreadsheet via code.
Create and access the spreadsheet
The Google API requires creating a service account to use it the way we want.
- Head to the Google API console.
- Create a new project.
- Click Enable API, then use the search function to find the Google Drive and Google Sheets APIs, and enable them both.
- Create credentials for a new Web Server accessing Application Data.
- Name your service and give it a role — Editor or Owner is fine.
- Set Key Type to JSON and click Continue to download the file.
Rename it credentials.json, open it, and find the client_email entry. Copy the value. Now go create a new spreadsheet, and once on it, click the “Share” button. Paste the email you’ve copied, so that your project is now able to read and write in this file.
Copy the URL of your spreadsheet from the browser address bar and keep it handy, we’ll use it in a minute. Now we’re ready to start coding.
Want to add SMS capabilities to your bot? Check out our API
Part 2 – Scraping HTML content
Go to PAP.fr and make your search, like you’d normally do. Copy the URL of the result page.
Now create a bot.py file in the same folder as your credentials file and fill it with the following code:
Let’s break this down.
Setting up the scene
You start by importing the necessary modules.
It will import Beautiful Soup and rename it Bs in our code, so we don’t have to write its full name because we’re lazy.
Replace the SEARCH_PAGE and SPREADSHEET_URL values with the ones you’ve copied earlier. I personally like to store personal data like the spreadsheet URL in an environment variable.
The URL_DOMAIN is used later, when we rebuild full pages from chunks of data we’ll scrape from the HTML. Leave it as it is now, it will make more sense soon.
The next batch of constants are actually CSS selectors. You can find them in the markup of some of the pages you (and subsequently, your bot) will visit when browsing through the listings. We use them here to target areas of interest in the pages during the bot’s crawl: blocks of relevant data, UI elements used to go to the next page or open the details of a listing, etc.
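As a rough sketch of what this configuration block can look like: the selector strings and the environment variable names below are placeholders invented for illustration, and the URL_DOMAIN value is an assumption. The real selectors come from the site’s markup. The small helper shows why URL_DOMAIN matters: it turns a relative link scraped from a page into a full URL.

```python
import os
from urllib.parse import urljoin

# Placeholder search URL; replace it with the one copied from your browser.
SEARCH_PAGE = os.environ.get("SEARCH_PAGE", "https://www.pap.fr/your-search-results")
# Read from the environment so the spreadsheet URL stays out of shared code.
# The variable name is an assumption.
SPREADSHEET_URL = os.environ.get("SPREADSHEET_URL", "")
# Assumed value: the site root, used to rebuild full URLs from relative links.
URL_DOMAIN = "https://www.pap.fr"

# Hypothetical CSS selectors; inspect the real pages to find the right ones.
PAGINATION_SELECTOR = "ul.pagination a"
LISTING_DETAIL_BTN_SELECTOR = "a.details-button"


def absolute_url(href):
    """Rebuild a full URL from a (possibly relative) href found in the page."""
    return urljoin(URL_DOMAIN, href)
```

Calling absolute_url("/annonce/r123") yields a URL on the site’s domain, while an href that is already absolute is returned unchanged.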
Then we wrap the interesting part in a try / except block.
It connects to the Google API using the credentials file. It accesses the spreadsheet via its URL, for upcoming use. It then fetches the HTML from our search result page, and stores it in a variable, dom. I named it like that because —surprise, surprise— it’s now an object with several properties related to extracting DOM-related data. In layman’s terms, this variable allows us to dissect the content of the HTML page it was made from, get rid of the code and pour out raw data.
Add the following snippet below the dom variable.
We’re leveraging list concatenation and list comprehension to create a variable, links, that will gather the URLs of all results pages. We could have broken down the above code like so:
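A minimal sketch of this idea, assuming the pagination hrefs have already been read from dom. The paths and the URL_DOMAIN value are made up for illustration. It builds the list once with concatenation plus a comprehension, and once with a plain loop, to show that both forms are equivalent.

```python
from urllib.parse import urljoin

URL_DOMAIN = "https://www.pap.fr"               # assumed value
SEARCH_PAGE = URL_DOMAIN + "/search-results"    # placeholder for your search URL

# Hypothetical relative hrefs, as they might appear in the pagination links.
pagination_hrefs = ["/search-results-2", "/search-results-3"]

# Compact form: the first page, concatenated with a list comprehension
# that rebuilds a full URL for every other results page.
links = [SEARCH_PAGE] + [urljoin(URL_DOMAIN, href) for href in pagination_hrefs]

# Unfolded form: same result, one explicit step at a time.
links_unfolded = [SEARCH_PAGE]
for href in pagination_hrefs:
    links_unfolded.append(urljoin(URL_DOMAIN, href))
```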
Scraping the results pages
Now that we have gathered all links for the results of our search, we want to do two things:
- follow them and discover the listings they display,
- then enter each listing to read and report back their details.
We will create two dedicated functions for this: process_listings_page and process_listing. In your file, add the following snippet after the imports and variable declarations, and before the try / except block you’ve just written. We will then examine the code in detail.
The process_listings_page function takes a string as its argument: the URL of an HTML page. If you remember what we did with the previous snippet, you should know that the URLs we will be passing are those of our search results pages.
So for each of our results pages, we try to do the following…
- We send the bot to visit this page and capture the HTML content for us to consume:
- We find all “details” buttons on the page, each linking to a listing’s full content, and gather their URLs in a new list, details_urls.
This is what this seemingly opaque snippet does:
Let’s unfold this list comprehension to fully understand the process, and rewrite it like so:
- The result of another list comprehension is returned by this function:
Elegantly concise. Also, you may have noticed that we’re making use of the process_listing function, that we’ve not yet implemented. We’ll do so in just a moment — but again, it’s useful to take the time to unfold this snippet just for our own understanding of what it does. We could rewrite it as such:
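Here is a hedged sketch of both steps, working on a sample HTML string instead of a live page so it can run offline. The selector, the sample markup, the URL_DOMAIN value and the helper names are assumptions made for illustration, not the site’s real markup. Each list comprehension is shown next to its unfolded loop.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup as Bs

URL_DOMAIN = "https://www.pap.fr"        # assumed value
DETAILS_SELECTOR = "a.details-button"    # hypothetical selector

# Made-up markup standing in for a search results page.
SAMPLE_HTML = """
<div class="result"><a class="details-button" href="/annonce/flat-1">Details</a></div>
<div class="result"><a class="details-button" href="/annonce/flat-2">Details</a></div>
<a class="other" href="/help">Help</a>
"""


def details_urls_from(html):
    """Compact form: one list comprehension over the matching buttons."""
    dom = Bs(html, "html.parser")
    return [urljoin(URL_DOMAIN, btn.get("href")) for btn in dom.select(DETAILS_SELECTOR)]


def details_urls_unfolded(html):
    """Unfolded form: the same work, written as an explicit loop."""
    dom = Bs(html, "html.parser")
    details_urls = []
    for btn in dom.select(DETAILS_SELECTOR):
        details_urls.append(urljoin(URL_DOMAIN, btn.get("href")))
    return details_urls


def process_listings_html(html, process_listing):
    """Unfolded version of: [process_listing(url) for url in details_urls]."""
    results = []
    for url in details_urls_from(html):
        results.append(process_listing(url))
    return results
```

Passing process_listing as an argument is only a convenience for this sketch. It lets you try the function with a stand-in such as lambda url: {"url": url} before the real one exists.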
Scraping the actual listings
Now let’s write the process_listing function, and three companion “utility” functions, clean_markup, clean_spaces, and clean_special_chars.
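I’d like to bring your attention to why we want to clean special characters such as ² or the € sign. Characters outside the GSM 7-bit alphabet, such as ², force the whole text message into a wider encoding, which cuts the single-message limit from 160 to 70 characters and can turn one text into a multi-part one. The € sign is less harmful, since it typically counts as two characters, but replacing it still keeps messages short.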
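A minimal sketch of the three cleaning helpers, assuming they operate on plain strings. The real functions may work on Beautiful Soup elements instead, and the replacement table is only an example.

```python
import re

# Example replacements; extend the table with whatever your listings contain.
SPECIAL_CHARS = {"²": "2", "€": "EUR", "\xa0": " "}


def clean_markup(text):
    """Replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def clean_special_chars(text):
    """Swap characters that inflate SMS size for plain ASCII equivalents."""
    for char, replacement in SPECIAL_CHARS.items():
        text = text.replace(char, replacement)
    return text


def clean_spaces(text):
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_spaces(clean_special_chars(clean_markup("<p>Studio 25 m²  <b>850 €</b></p>")))
```

Running the three helpers in that order turns the sample fragment into the string "Studio 25 m2 850 EUR".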
Finally, some of the DOM-related snippets look idiosyncratic and convoluted. That’s a downside of scraping, and apart from the Python code itself which depends on your level of code literacy, the best way to approach it is to look at the code, and look at the HTML source of one of your result pages at the same time to figure out what type of manipulation the bot is trying to do.
Storing the results in a spreadsheet
Let’s review the full code within our original try / except block. We’ll take the opportunity to add the snippet that’s responsible for saving our results into a spreadsheet. Review the code and its comments carefully.
Run the bot once, and after a few seconds, you should see some nice results starting to fill rows inside your spreadsheet. But what if you run it again? Oh noes!… we’re piling up redundant data. Let’s add a quick and dirty patch to ensure we only store unique results each time.
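One way to sketch such a patch, assuming each listing is a dict with a "url" key and that the URLs already stored in the spreadsheet have been read into a list beforehand (how that column is read is left out here). The function name is hypothetical.

```python
def new_listings_only(scraped, known_urls):
    """Return the scraped listings whose URL is not stored yet.

    scraped    -- list of dicts, each with at least a "url" key (assumed shape)
    known_urls -- URLs already present in the spreadsheet
    """
    seen = set(known_urls)
    fresh = []
    for listing in scraped:
        if listing["url"] not in seen:
            fresh.append(listing)
            # Also guards against the same listing appearing twice in one run.
            seen.add(listing["url"])
    return fresh


known = ["https://www.pap.fr/annonce/flat-1"]
scraped = [
    {"url": "https://www.pap.fr/annonce/flat-1", "price": "900 EUR"},
    {"url": "https://www.pap.fr/annonce/flat-2", "price": "950 EUR"},
    {"url": "https://www.pap.fr/annonce/flat-2", "price": "950 EUR"},
]
fresh = new_listings_only(scraped, known)
```

Only the rows in fresh then need to be appended to the spreadsheet.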
Now run your script again, and again a few minutes later. If new listings have appeared, they will stack up before your eyes in the spreadsheet!
Launching the bot automatically with a task scheduler
On Windows, you can now use your task scheduler to run the script every 15 minutes or so. On a Unix machine (including macOS), you’ll need a cron job for this.
If you used a virtual environment to develop the bot, as advised earlier, the Python binary bundled in that virtualenv is the one you must use to launch the bot. You cannot just run python bot.py out of the blue: you would need to open a Terminal window, activate the virtual environment, and run the script from there. More conveniently, you can find the location of the Python binary used in your virtualenv, and use its fully qualified path when running the command.
To do so, enable your virtualenv and enter the command which python in your console. The returned value is the fully qualified path to your binary.
Mine says /Users/davy.braun/.virtualenvs/househunterbot/bin/python. We also want the full path to the bot script, so that the machine can invoke it right away. Mine is /Users/davy.braun/Code/projects/house-hunter-bot/bot.py. Keep both values handy, since we’re using them right away!
Run the following command: env EDITOR=nano crontab -e. This will open the scheduled job list with the built-in nano editor.
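Now enter the following line, editing it with your own values for the binary and the file location:
*/15 * * * * /Users/davy.braun/.virtualenvs/househunterbot/bin/python /Users/davy.braun/Code/projects/house-hunter-bot/bot.py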
This adds a cron job to the jobs list, basically saying “every 15 minutes, run the bot script located at that exact location, using no other than the Python binary located there”. Press Ctrl+X and confirm to save and exit.
Now be smart — when you’re done with your research, just be civil and remove this cron job! Don’t leave a zombie bot at large… It’s your responsibility to turn it off by the end of the day (or not use an automatically scheduled task at all!).
Part 3 – Texting back your results
Our bot does the redundant part of the search for us. This is all fine and dandy, but so far there’s no added value over subscribing to an email alert on the site. So let’s add a feature that sends us a text message each time the bot finds a new result.
We want to receive, for each new listing, a text message with a summary of the apartment specifications, the price, the location, the nearest metro stations, and the URL to this listing to look it up right away. Now these are looooong URLs. That’s bad. So we’ll shorten them on the fly thanks to the Google Shortener API.
Setting up the CALLR API and Google Shortener API
Sign up for a CALLR account. You can now use your credentials to fiddle with the API, which we’ll do in a minute.
Now, go to your Google Developer console. Enable the Shortener API…
…then create a Google API key.
Because I’m sharing my code, I’ll be storing my CALLR credentials and Google Shortener API key as environment variables, before declaring them in my code like so:
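A minimal sketch of reading these secrets from the environment. The variable names are assumptions, so use whichever names you exported. Failing early with a clear message is a design choice of this sketch: it avoids discovering a missing credential only when the first SMS fails to send.

```python
import os

# Hypothetical variable names; match them to what you actually exported.
REQUIRED_VARS = ("CALLR_LOGIN", "CALLR_PASSWORD", "GOOGLE_SHORTENER_API_KEY")


def load_credentials(env=os.environ):
    """Return the required secrets as a dict, or raise if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The env parameter defaults to the real environment, and accepts a plain dict when you want to try the function without touching your shell.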
If you’re using a virtual environment, you must declare your environment variables in a specific place: the file at $VIRTUAL_ENV/bin/postactivate, which you can open with a text editor. Deactivate, then reactivate the virtualenv, and you’ll have access to the environment variables, scoped within your virtual environment.
Import both libraries in your code…
And we’re good to go. Let’s dive back into the main try / except block that holds the meat of our program.
Send text messages
Here’s the code, with the added part to send SMS alerts.
You’ve noticed we have created objects to deal with the CALLR and shortener APIs. We’re also invoking a send_data_via_sms function when a new listing stacks up, so let’s implement this function.
We’re accessing the listing data passed as an argument, and formatting some of its fields into the message we expect to receive. The URL is passed to the shortener.short function which, as you’ve guessed, shortens it.
Then we leverage the CALLR API to send the text. That’s just it — a one-liner. Notice the third parameter we’re passing to the api.call function. It’s an E.164-formatted phone number — and that, obviously, should be your own.
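A hedged sketch of such a function. The listing field names, the message layout and the extra parameters are assumptions made for illustration. The api argument is expected to behave like the CALLR client described above, and the "sms.send" call shape (sender, E.164 number, body, options) is assumed from that description. A small recording stand-in is used so the sketch runs without sending a real SMS.

```python
def format_listing_sms(listing, short_url):
    """Build a compact, single-line summary of a listing (assumed fields)."""
    return "{specs}, {price}, {location}. Metro: {stations}. {url}".format(
        specs=listing["specs"],
        price=listing["price"],
        location=listing["location"],
        stations=", ".join(listing["stations"]),
        url=short_url,
    )


def send_data_via_sms(api, shorten, listing, phone_number):
    """Shorten the listing URL, format the text, and send it via the API."""
    message = format_listing_sms(listing, shorten(listing["url"]))
    # Assumed call shape: method name, sender, E.164 number, body, options.
    return api.call("sms.send", "SMS", phone_number, message, None)


class RecordingApi:
    """Dry-run stand-in for the real client: records calls instead of sending."""

    def __init__(self):
        self.calls = []

    def call(self, method, *params):
        self.calls.append((method,) + params)
        return "dry-run"


sample_listing = {
    "specs": "2 rooms, 35 m2",
    "price": "950 EUR",
    "location": "Paris 18e",
    "stations": ["Abbesses", "Pigalle"],
    "url": "https://www.pap.fr/annonce/a-very-long-listing-url",
}
fake_api = RecordingApi()
# Placeholder shortener and phone number, for the dry run only.
result = send_data_via_sms(fake_api, lambda url: "https://sho.rt/abc", sample_listing, "+33600000000")
```

Swapping RecordingApi for the real client, and the lambda for shortener.short, is all that changes between the dry run and the live bot.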
Now let’s run the bot once more to verify everything is working properly. To ensure we can capture a few new listings to trigger the SMS alerts, remove two or three lines from your Google spreadsheet. After running python bot.py, the bot will then fetch them as if they were new, and your phone should start vibrating!
Building this bot, we have covered a few interesting topics — list manipulations in Python, scraping data on a website, using a Google spreadsheet as an ad-hoc database… But I find the easiest and most interesting part was triggering the SMS alerts. Take a minute and think of all the IoT, or bot-related hacks you can build now that you have added Good Ol’ Telephone to your utility belt!
Every communication channel has its pros and cons. Hacking our own HouseHunterBot has been an interesting project to get our feet wet, especially on a use-case where quick reaction was paramount — hence the SMS alerts. If you want to talk about bots, life automation hacks, telecommunication or apéros in Paris, I’m @davypeterbraun on Twitter.