# Beer and Quarantine

Published:

## The premise

About a week ago, I got an email from my uncle Dennis:

> Hi,
>
> I seem to recall you doing some web scraping to do something useful. Do you have any advice as to how to do this? On this site I can see the beers in stock at my local store: https://belmont.craftbeercellar.com
>
> Here, I can find ratings of beers: https://www.beeradvocate.com
>
> I’d like to see the average Beer Advocate rating for beers available at Craft Beer Cellar. Even better if I can sort by style. Any suggestions on how to do that?
>
> Dennis

This sounded like a fun project to me, so I decided to take it on and build him something.

## The process

The first helpful thing I found was this reddit post that linked to this repo. Unfortunately, that code no longer works: BeerAdvocate has changed its page layout, so the patterns it was matching no longer fit the pages. Still, it was a super useful starting point, since I’d never actually done any web scraping of this kind before (I didn’t tell my uncle that).

It didn’t take long to get a working version of the BeerAdvocate query, and then I had to build a scraper for the Beer Cellar site. That wasn’t hard to work out with some trial and error, modeling it on the BeerAdvocate one. Soon, I was ready to loop through all the results!
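As a rough illustration of the Beer Cellar scraping step, here is a minimal sketch that uses only Python's standard-library `html.parser`. The markup it expects (beer names inside elements with a `product-name` class) is a hypothetical stand-in, not the site's real structure, which you would need to inspect in a browser first. The class name, function name, and sample HTML are likewise made up for illustration.

```python
from html.parser import HTMLParser


class BeerNameParser(HTMLParser):
    """Collect the text of every element that carries a target CSS class.

    The default class "product-name" is a hypothetical placeholder.
    Assumes each name element contains plain text with no nested tags.
    """

    def __init__(self, target_class="product-name"):
        super().__init__()
        self.target_class = target_class
        self.names = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._capturing = True
            self.names.append("")

    def handle_endtag(self, tag):
        self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.names[-1] += data


def extract_beer_names(html, target_class="product-name"):
    """Return the stripped, non-empty names found in an HTML string."""
    parser = BeerNameParser(target_class)
    parser.feed(html)
    parser.close()
    return [name.strip() for name in parser.names if name.strip()]


# Hypothetical listing markup, for illustration only.
SAMPLE = """
<ul>
  <li><a class="product-name" href="/p/1">Example IPA</a> <span class="price">$4.99</span></li>
  <li><a class="product-name" href="/p/2"> Sample Stout </a></li>
</ul>
"""

print(extract_beer_names(SAMPLE))  # ['Example IPA', 'Sample Stout']
```

When a site changes its layout, as BeerAdvocate did, only the target class (or whatever pattern you match on) needs updating.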

## The pickle

(not the Python serialized kind)

After getting banned from the Beer Cellar website almost instantly, I realized I needed to add some pauses between requests to avoid triggering the sites’ protections against DDoS attacks and the like. This made the code take a bit longer to run, but solved the problem handily.
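Here is a minimal sketch of the pause-between-requests fix, assuming plain `urllib` requests. The 3-second delay, the random jitter, the User-Agent string, and the `fetch`/`fetch_all` helpers are illustrative assumptions, not the values or names used in the actual script. For scale, ~40 minutes for ~800 beers works out to about 3 seconds per beer, including the time spent on the request itself.

```python
import random
import time
import urllib.request


def fetch(url, timeout=30):
    """Download a page and return its body as text (stdlib only)."""
    # The User-Agent value is a hypothetical placeholder.
    req = urllib.request.Request(url, headers={"User-Agent": "beer-list-script"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_all(urls, delay=3.0, jitter=1.0, fetcher=fetch, sleep=time.sleep):
    """Fetch each URL in order, sleeping between consecutive requests.

    delay and jitter are assumed values; each pause lasts delay plus a
    random extra of up to jitter seconds, so requests are not perfectly
    regular.
    """
    pages = {}
    for i, url in enumerate(urls):
        if i:  # no need to wait before the very first request
            sleep(delay + random.uniform(0, jitter))
        pages[url] = fetcher(url)
    return pages
```

Passing `fetcher` and `sleep` in as parameters keeps the loop testable without touching the network. In real use you would simply call `fetch_all(urls)`.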

## The product

In the end, I got something together that I’m quite happy with. Some features include:

- In addition to collating the score, style, and ABV from BeerAdvocate, the script can group styles into customizable broader categories (e.g., English IPA and American IPA both become just “IPA”) for easier sorting.
- If a BeerAdvocate search returns multiple results, the script uses fuzzy string matching (via the Levenshtein distance, as implemented in the fuzzywuzzy and python-Levenshtein packages) to pick the best match. When this happens, a remark goes in the “note” column of the output, and you can follow the included BeerAdvocate link to confirm it’s the right brew.
- Because a full run takes ~40 minutes when ~800 beers are in stock, the script can optionally check a previous output file and query only the beers that have no score listed there, or only those with no entry there at all. With a relatively recent output file, this cuts the runtime to ~10–15 minutes.
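The three features above might be sketched roughly as follows. This is not the author's code. The style keywords, the `name`/`score` CSV column names, and all function names are hypothetical. For the matching step, the sketch swaps in the standard library's `difflib.SequenceMatcher` in place of fuzzywuzzy so that it has no dependencies. Its similarity ratio is scaled here to 0 to 100 to resemble fuzzywuzzy's scores, though the numbers are not guaranteed to be identical.

```python
import csv
from difflib import SequenceMatcher

# Hypothetical cluster -> keywords mapping; checked in insertion order.
STYLE_CLUSTERS = {
    "IPA": ["IPA"],
    "Stout": ["Stout"],
    "Sour": ["Sour", "Gose", "Lambic"],
}


def cluster_style(style):
    """Map a BeerAdvocate style name to a broader cluster by keyword."""
    lowered = style.lower()
    for cluster, keywords in STYLE_CLUSTERS.items():
        if any(k.lower() in lowered for k in keywords):
            return cluster
    return "Other"


def best_match(query, candidates):
    """Return (best candidate, similarity 0-100) for a list of search results.

    Uses difflib as a stand-in for fuzzywuzzy's ratio.
    """
    def score(candidate):
        return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()

    best = max(candidates, key=score)
    return best, round(100 * score(best))


def load_previous(path):
    """Read a previous output CSV into {name: row}; empty if the file is missing."""
    try:
        with open(path, newline="", encoding="utf-8") as f:
            return {row["name"]: row for row in csv.DictReader(f)}
    except FileNotFoundError:
        return {}


def names_to_query(current_names, previous, only_missing_entries=False):
    """Choose which in-stock beers still need a BeerAdvocate lookup.

    By default, re-query beers with no entry or an empty score. With
    only_missing_entries=True, re-query only beers with no entry at all.
    """
    if only_missing_entries:
        return [n for n in current_names if n not in previous]
    return [n for n in current_names if not previous.get(n, {}).get("score")]
```

Because `cluster_style` returns the first cluster whose keyword appears, more specific clusters should be listed first if their keywords could overlap.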

Here’s a selection of the output file content: