Revisiting Taleo with Puppeteer

I’ve demonstrated how to scrape Taleo sites in a couple of my previous posts [1] [2]. In those articles I used CasperJS and Python/Selenium to scrape the Taleo job site at https://l3com.taleo.net. In this post, I’ll show how to scrape that same site again, this time using Puppeteer. Continue reading

Scraping with Puppeteer

In this post I’ll guide you through web scraping with Puppeteer, a Node library used to control Chrome (or Chromium) via the DevTools Protocol. I’ll also cover how to use Node’s built-in debugger so that you can step through the code to see how it works. Continue reading

Book Review: Web Scraping with Python

I just finished reading Web Scraping with Python by Richard Lawson (Packt Publishing). Continue reading

Iterating through Dynamic Select Options with Selenium

In this post I’ll use Selenium to show how to iterate through dropdown menus in a form that uses SELECT elements whose option values are dynamically generated. I’ll provide a technique that can be used to determine when the option values have loaded. I’ll wrap up the post by refactoring... Continue reading

Using Selenium to Scrape ASP.NET Pages with AJAX Pagination

In my last post I went over the nitty-gritty details of how to scrape an ASP.NET AJAX page using Python mechanize. Since mechanize can’t process JavaScript, we had to understand the underlying data formats used when sending form submissions, how to parse the server’s response, and how pagination is handled. In this... Continue reading