0 Pages Crawled

Last week I began studying as a full-time student in Metis’ data science bootcamp. Following a review in exploratory data analysis, I worked throughout the week with my new classmates on our first project, which analyzed MTA turnstile data. I had learned web scraping with scrapy before the course, so when we determining our individual roles, I excitedly told my two group members (Gabe and Jake) that scraping our data from the web would be “no problem at all.” Soon, this task would remind me of a very valuable coding lesson.

I once had a hitting coach who before and after every lesson would tell his players, “The ball is an excellent teacher.” While a baseball’s trajectory is 100% dependent on how it made contact with the bat, the success of code can also be observed instantaneously by how the computer responds according to the instructions. When those instructions are conflicting or insufficient, we get error messages. Addressing some errors is hard, while addressing others is easy. Like most people who write python code, I often resort to StackOverflow when an error message proves frustrating to see if anybody has had a similar issue. Too often though, I find that I had resorted to StackOverflow too quickly in realizing that the cause of the error was right in front of me all along.

That is exactly what I did when the spider I built to scrape the MTA website consistently crawled 0 pages. After repeated troubleshooting and a little bit of moaning, I became tempted to blame the computer and proclaim the error to be a complete and utter mistake. I tried editing my settings so that my spider would allow certain error messages. When that didn’t work, I thought I needed a proxy to scrape the data without getting blocked. (I wasn’t getting blocked at after all). The longer I stared at my code, the more right it seemed, at which point I begrudgingly allowed myself to go to sleep.

The following day I told my group members all about the pesty error message, and while explaining the issue to them, I found a mistake in my code that I hadn’t seen myself after hours of pointless staring. For each error, there’s at least one solution (and often many more than one). This particular solution was simple, and right in front of me all along. It was a great reminder to be patient with my progress, and to come back to an issue later if I can’t find a solution.