Hackathon (NOUN, informal/’hakəˌTHän/): A competition in which teams use their programming, engineering and business skills to solve real-world problems within an insanely short window of time (usually around 24 hours). Think of it as a ‘hacking’ marathon. Sometimes there’s a specific challenge, and sometimes the challenge is simply to make the coolest thing possible (like our annual RV Hackathon).
Last weekend, two RV Data Scientists/Engineers participated in Hackathon CLT, the country’s largest Big Data Hackathon. This year’s challenge was presented by Second Harvest Food Bank, and the competition featured three categories: Freestyle (which required an ‘outside of the box’ solution and business proposal, but no code), Code (which required both a business plan AND functioning code, typically a slick mobile app or website), and Hack (which was geared towards using the data to solve the problem analytically and predictively, to provide insight and tell a story).
Despite going a full 24 hrs without sleep and working within an impossible timeframe, Kevin Pedde and Jacob Foard took home the grand prize in the ‘Hack’ category. Here’s their breakdown of the event (including how they won):
Winning Hackathon CLT
Hackathon CLT is one of the few hackathons around that focuses on using Big Data to help solve a problem. As two of the first data scientists to join the team at RV, we were naturally really excited to be part of it. Plus, the event benefits a great cause: helping to eliminate hunger in the Carolinas.
This year’s challenge was to find the best way to help Second Harvest expand their largest distribution center in Charlotte, optimize their efforts and raise donations. Every team was given access to billions of rows of real-world data, including monetary donations to Second Harvest, food donations, distribution of the food post-donation, and supplementary purchase data from Harris Teeter. The data was stored on a 5-node Hadoop cluster provided by Data Chambers. A Hive connection was available, and we accessed the data through RStudio Server and Python.
Our strategy was to create the most impactful solution that required the least amount of data and code. To us, that meant finding a way to directly impact the business’s bottom line in a significant way, which is why we focused on increasing monetary donations.
We used data from the American Community Survey to build look-alike models of top-donating ZIP codes, and we used those to recommend where Second Harvest should focus their marketing for the highest impact. We also built a time series model to forecast monthly donations Second Harvest would receive. Both models offer a better understanding of where their donation marketing will have the highest impact, as well as how they can better plan operational and marketing budgets for up to 12 months.
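To make the two ideas above concrete, here is a minimal Python sketch. It is not the code from the event: the post does not say which algorithms were used, so the centroid-distance look-alike ranking and the seasonal-naive forecast below are stand-in techniques. The function names, the placeholder ZIP labels, and all numbers are invented for illustration.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each feature column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def rank_lookalikes(features, top_donor_zips):
    """Rank the remaining ZIP codes by distance to the top-donor centroid.

    `features` maps a ZIP label to a list of demographic features (assumed
    here to come from something like the American Community Survey).
    The closest, most "look-alike" ZIP codes come first.
    """
    zips = list(features)
    scaled = dict(zip(zips, standardize([features[z] for z in zips])))
    centroid = [mean(col) for col in zip(*(scaled[z] for z in top_donor_zips))]
    candidates = [z for z in zips if z not in top_donor_zips]
    return sorted(candidates, key=lambda z: math.dist(scaled[z], centroid))


def seasonal_naive(history, horizon=12, season=12):
    """Baseline forecast: repeat the most recent full season of monthly values."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    start = len(history) - season
    return [history[start + (h % season)] for h in range(horizon)]


if __name__ == "__main__":
    # Invented data: [median income in $1000s, share of adults with a degree]
    features = {
        "zip_a": [95.0, 0.60],
        "zip_b": [90.0, 0.55],
        "zip_c": [88.0, 0.58],
        "zip_d": [40.0, 0.20],
        "zip_e": [55.0, 0.30],
    }
    print(rank_lookalikes(features, {"zip_a", "zip_b"}))
    print(seasonal_naive(list(range(1, 25)), horizon=3))
```

In this sketch, the ZIP codes at the front of the ranked list are the ones whose demographics most resemble current top donors, so they would be the first candidates for marketing. The seasonal-naive forecast is only a baseline that any real time series model should be expected to beat.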
There was a TON of competition this year and a lot of other really cool products were developed overnight. For example, the first place winner of the Freestyle category developed a website that identifies the biggest areas of need within the Carolinas, and thus the best locations to build new Second Harvest distribution centers. The first place team in the Code category created a web application that’s a game: If you guess a sponsored logo correctly, you win a coupon for that product and the sponsor makes a donation to Second Harvest.
One of the biggest challenges our team faced during the competition was actually just staying awake! We had no trouble accessing the data, brainstorming solutions and building the models, and we even finished before the full 24 hours were up! Once our coding was done, it became increasingly difficult not to fall asleep, and presenting our project to a room full of people who were also running on no sleep was a challenge in itself.
Another challenge for me (Jacob) was working within a Hadoop environment. I’ve dipped my toes in it at Red Ventures, but hadn’t experienced anything quite at this level. The fun of Hackathons (for me at least) is in trying to cram projects that would typically take weeks to develop into 24 hours. You get a ton of experience within a really short timeframe through your own work, plus you’re watching how other people tackle similar problems and learning from them, too.
Any tips for future ‘hackers’ who might participate in similar events?
Jacob: First, if you’re thinking about participating in Hackathon CLT, make sure you’re familiar with working in a terminal shell! Everything is done remotely from the cluster, so there’s very little interface and a whole lot of terminal.
Second, even if you’re still in the early stages of learning to code, I highly recommend you experience a Hackathon. You learn so much so quickly from people around you, it’s really unlike anything you’ll ever experience in an office or in a classroom.
Kevin: I agree. Even if you aren’t 100% confident in your coding skills, I highly encourage everyone to compete. Not only can you win cash, prizes and respect (or maybe even a giant novelty check), but you’ll also learn a lot.
It’s also a great way to network with local devs and gauge how your skills and knowledge measure up to others in your industry. I love being able to see where I have room to improve and also pick up tips and tricks from other data scientists.
Another thing that makes Hackathons fun for me is the chance to play on someone else’s playground. At work we’re really only exposed to the tools and data Red Ventures has. Hackathons give you an opportunity to use and learn new systems and new software at a really fast pace.
Lastly: You will be overwhelmed and you will stay awake longer than you ever have in your life. Be prepared to make many, many sleep-deprived decisions that you have to be ok with at presentation time.