Scraping with Puppeteer

In this post I’ll guide you through web scraping with Puppeteer, a Node library used to control Chrome (or Chromium) via the DevTools Protocol. I’ll also cover how to use Node’s built-in debugger so that you can step through the code to see how it works. Continue reading

Book Review: Web Scraping with Python

I just finished reading Web Scraping with Python by Richard Lawson; Packt Publishing. Continue reading

Iterating through Dynamic Select Options with Selenium

In this post I’ll use Selenium to show how to iterate through dropdown menus in a form that uses SELECT elements whose option values are dynamically generated. I’ll provide a technique that can be used to determine when the option values have loaded. I’ll wrap up the post by refactoring... Continue reading

Using Selenium to Scrape ASP.NET Pages with AJAX Pagination

In my last post I went over the nitty-gritty details of how to scrape an ASP.NET AJAX page using Python mechanize. Since mechanize can’t process Javascript, we had to understand the underlying data formats used when sending form submissions, parsing the server’s response, and how pagination is handled. In this... Continue reading

Scraping ASP.NET Pages with AJAX Pagination

In a previous post I showed how to scrape a page that uses AJAX to return results dynamically. In that example, the results were easy to parse (XML) and the pagination scheme was straightforward (page number in the AJAX query JSON). In this post, I’ll show a more complicated example... Continue reading