Masters Student at the University of Colorado

Craft Brew Story

A Craft Brew Story

In the previous year I just turned 21 and found out that I love the crisp taste of a craft beer straight from the brewery itself for this reason I wanted to analyze where the best place would be to in the United States of America in which I could get the best craft beer. Using the Craft Beers Dataset that I found on Kaggle that analyzes a list of 2,410 US craft beers and 510 breweries. The original dataset was split up into two comma separated value lists of Beers and Breweries, my intention was to go through these original datasets and combine them and remove the irrelevant columns.

The first step in the cleaning of the datasets was to go through and load both datasets into a python notebook that you have the pandas library open. By using the “pd.read_csv” function that is built into the pandas library you can pull the comma separated value (CSV) into the python notebook. The first dataset in which I pulled into the notebook was the breweries dataset in which had 558 rows and 4 columns (pictured below). 

The second dataset which consisted of all the beers in which these breweries make, this dataset had 2410 rows and 7 columns (pictured below).

Both of the datasets that are shown above are vital for me to be able to find out where the most breweries are and which of these breweries is producing the most amount of beers. The Brewery dataset depicts what state and city the breweries are located in as well as the name of the brewery. The beers dataset shows the alcohol content, beer name and the style of what the beer is. By combining the datasets I was able to find some rich analysis showing me where I should go to for craft beer.

The problem I faced here was how would I go through and analyze two datasets which were not together and giving me any valuable data. At this point I decided that I should join the two datasets together using the merge function that is built in to the pandas library. During this merge I saw that the two datasets both had a column that stated brewery_id, this created an opportunity for me to join the two together on the brewery_id column. At this point I laid out the foundation of the merge to have the left side of the join be the brewery dataset and the right side to be the beer dataset merging both on the brewery_id on the breweries dataset. The reason for joining the two datasets on the breweries_id category was so that I wouldn’t lose any of the information on the location of the breweries.

0_fk1jk4jqDDfMCVY9.png

The next road bump in which I found while reading through our new dataset was there was an excess of unnecessary data, now that I had the two datasets joined on the brewery_id I had no need for the numbers that were associated with the breweries since I had the name of the individual breweries. Along with the brewery_id i had no need for the international bitterness units scale, id, and ounces, at this point I dropped these columns so I had an easier to read dataset without clutter. During this time I took the time to change the Alcohol by Volume column which was in decimal places making it hard to read to the nearest whole number by multiplying the numbers listed by 100.

With the dataset all cleaned up I was now able to start my analysis on which state in the United States would be the best place to grab a nice cold craft beer. The first analysis which I did was running a groupby analysis on the style of beer, this allowed me to group the beers by the type to see what the alcohol percentage of each individual style were.

From the analysis we can see that English Barleywine has the highest alcohol by volume percentage of a 10.766667 % followed shortly by the Quadrupel (Quad) with a percentage of 10.4 %. From this analysis I wanted to be able to see which state had the beer with the highest alcohol by volume, I found that in Colorado we have a beer that has a 12.8% alcohol by volume content.

Since the main reason for this data analysis was to find out where to go in the United States for the most craft breweries, I wanted to analyze which state had the most breweries. By combining the count of how many breweries and the state that was listed in the dataset I was able to create a graph that clearly shows the average number of Breweries per state and how many breweries there were in each. After plugging this data into python and the Matplot function I wanted to get a better looking graph so I went through the dataset and put my data into tableau and created a better looking graph.

0_NJEoKT0b8Bmbs9uX.png
0_9RovhhIdgwhoURc8.png

The next analysis that I wanted to do was to be able to find out what style of beer was the most brewed, and what people are most craving. I did this by grouping the style and beer name together and then analyzing it by the count of how many of each style there was. I found some very insightful data that the most commonly brewed craft beer was the American IPA. The American IPA had over 400 different beers out on the market whereas the second closest which was the American Pale Ale only had around 250 different craft beers on the market. This shows that the most commonly made beer was the American IPA by a sizable margin, showing that the most commonly drank craft beer is the American IPA.

The next analysis which I performed was a count on how many different beers each brewery makes, this analysis would go along with how many different beers were brewed in each state. I did this analysis by conducting a value_count analysis on the brewery name column. From this data analysis I was able to see that Brewery Vivant in Michigan accounted for around 38% of all the craft beers produced in Michigan. We can also see that Oskar Blues Brewery in Lyons, Colorado accounts for around 17% of all craft beers in Colorado.

0_Jo09PIIPHr8nd57H.png

Finally, I wanted to be able to visualize the amount of breweries there were in each individual state in a way that was easy to understand for someone with no data analysis background. I did this by grouping together the state values and a count of how many beers were in each state. By placing this data into Tableau I was able to go through and create a heatmap which showed the amount of breweries in each state.

After the analysis that I performed from the datasets that where provided I can see that I really do not need to travel that far to find the best craft beer since Colorado has the most breweries. I do believe that traveling to California or Minnesota could be a fun trip because they both still have over a hundred craft breweries allowing for a wide variety of beers and styles.