Scraping with Django Backend

I like to use Django as a backend in my scraping scripts. I use it to model the data being scraped with the Django ORM. Then, once the data has been collected, I view the results using the automatic admin interface, which is easy to set up in just a few... Continue reading

Scraping with Python Selenium and PhantomJS

UPDATE 09/27/2018 - The site changed after this article was originally written. I’ve updated the code that waits for the jobs to load, along with the description in this article. Continue reading

Scraping by Example - Handling JSON data

Today’s post will cover scraping sites where the pages are dynamically generated from JSON data. Compared to static pages, scraping pages rendered from JSON is often easier: simply load the JSON string and iterate through each object, extracting the relevant key/value pairs as you go. Continue reading
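
As a rough illustration of the load-and-iterate idea, here is a minimal sketch using only Python's standard `json` module. The payload, the `jobs` key, the field names, and the `extract_jobs` helper are all made up for this example; a real site will have its own structure, which you would find by inspecting the response in your browser's network tab.

```python
import json

# Hypothetical JSON payload, standing in for what a site's data endpoint
# might return. The "jobs" key and the field names are assumptions.
raw = '''
{"jobs": [
  {"id": 1, "title": "Data Engineer", "location": "Austin, TX", "internal_flag": true},
  {"id": 2, "title": "Web Developer", "location": "Remote", "internal_flag": false}
]}
'''


def extract_jobs(json_string, keys=("title", "location")):
    """Load a JSON string and keep only the wanted keys from each object."""
    data = json.loads(json_string)
    # dict.get returns None for a missing key instead of raising KeyError,
    # which is convenient when some objects omit a field.
    return [{key: obj.get(key) for key in keys} for obj in data["jobs"]]


jobs = extract_jobs(raw)
for job in jobs:
    print(job["title"], "-", job["location"])
```

Using `obj.get(key)` rather than `obj[key]` is a judgment call: it keeps the loop running when a record is incomplete, at the cost of silently producing `None` values that you may want to filter out later.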

Scraping by Example - Iterating through Select Items With Mechanize

A common scraping task is to get all of the results returned for every option in a select menu on a given form. Continue reading

Form Handling With Mechanize And BeautifulSoup

Python Mechanize is a module that provides an API for programmatically browsing web pages and manipulating HTML forms. BeautifulSoup is a library for parsing and extracting data from HTML. Together they form a powerful combination of tools for web scraping. Continue reading