Scraping with Django Backend

I like to use Django as a backend in my scraping scripts. I use it to model the data being scraped with the Django ORM. Then, once the data has been collected, I view the results using the automatic admin interface, which is easy to set up in just a few lines of code.

In this post, I'll show how to set up a project template for scraping with Django.

First let's set up the environment. Create a directory for the template and start up a virtual environment in that directory.

$ mkdir django_scraping_template && cd django_scraping_template
$ virtualenv venv
$ source venv/bin/activate

Then install the dependencies (Django obviously being one of them) that you want to use for your scraping work:

$ pip install mechanize
$ pip install beautifulsoup4
$ pip install Django
$ pip freeze > requirements.txt

Next create the Django project for the settings and an app for modeling the data:

$ django-admin startproject scraper
$ cd scraper
$ python manage.py startapp custom_scraper

In the top-level directory, create a file for the scraper script. We'll edit it in a second.

Your directory layout at this point should look as follows:

└── scraper
    ├── scraper
    │   ├── __init__.py
    │   ├── settings.py
    │   ├── urls.py
    │   └── wsgi.py
    └── custom_scraper
        ├── migrations
        │   └── __init__.py

Edit the scraper/settings.py file and add the string 'custom_scraper' to the INSTALLED_APPS setting.
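
As a rough sketch, the edited setting might look something like the following. The django.contrib entries are the defaults that startproject typically generates, so treat them as an assumption: they can differ between Django versions (older versions use a tuple rather than a list). The only entry this post adds is 'custom_scraper'.

```python
# scraper/settings.py (excerpt)
# The django.contrib entries are assumed defaults; keep whatever your
# generated settings file already lists.
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'custom_scraper',  # the app created with startapp above
]
```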


Edit custom_scraper/models.py and create your model definitions. Then run makemigrations and migrate to sync the models into the database.

$ python manage.py makemigrations
$ python manage.py migrate
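
For illustration, here is a minimal sketch of what the models file could contain before you run the commands above. The Job model and its title, url, and scraped_at fields are hypothetical, made up for this example; model whatever your target site actually provides. This is a project file fragment, so it only imports cleanly inside a configured Django project.

```python
# custom_scraper/models.py
# Hypothetical example model; the name and fields are assumptions.
from django.db import models


class Job(models.Model):
    title = models.CharField(max_length=255)
    # unique=True lets the scraper recognize records it has already saved
    url = models.URLField(unique=True)
    # filled in automatically when the row is first created
    scraped_at = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.title
```

To browse the collected rows in the admin interface, a model like this would also be registered in custom_scraper/admin.py with admin.site.register(Job).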

Now, in the top scraper directory, edit the file you created earlier:

#!/usr/bin/env python
import os
import sys
import django

# Make the project and its settings module importable from this script.
sys.path.append(os.path.realpath(os.path.join(os.path.dirname(__file__), 'scraper/')))
sys.path.append(os.path.realpath(os.path.join(os.path.dirname(__file__), 'scraper/scraper/')))

os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'

# Django 1.7+ needs the app registry loaded before any models are imported.
django.setup()

from django.core.exceptions import ObjectDoesNotExist
from custom_scraper.models import *

class CustomScraper(object):
    def __init__(self):
        self.url = ""

    def scrape(self):
        print('scraping...')

if __name__ == '__main__':
    scraper = CustomScraper()
    scraper.scrape()

The model(s) you define in custom_scraper/models.py can now be used in this standalone script.
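
As one possible way to use the models from the script, the sketch below shows a small helper that scrape() could call for each parsed record. It assumes a hypothetical model with a unique url field and a title field (neither is defined in this post), and it relies on Django's update_or_create so that re-running the scraper refreshes existing rows instead of duplicating them.

```python
def save_record(model, record):
    """Insert or update one scraped record, keyed on its URL.

    `model` is a Django model class (a hypothetical one with `url` and
    `title` fields); `record` is a dict produced by your parsing code.
    Returns True if a new row was created, False if one was updated.
    """
    obj, created = model.objects.update_or_create(
        url=record['url'],                    # lookup key
        defaults={'title': record['title']},  # fields to set or refresh
    )
    return created
```

Inside scrape(), you would call this once per parsed item, passing your own model class and a dict built from the page.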

Now just make the script executable and you're all set.

$ chmod +x
$ ./
