Hacker Wisdom

Today I’m going to share a little project I’ve been working on called Hacker Wisdom. It’s a web application built using Java, Apache Maven, Spring Boot, Thymeleaf, and the Hacker News API.

I visit the Hacker News website fairly regularly. It has a lot of great information and interesting discussions. One section I particularly enjoy is called “Ask HN”. People ask for advice on all sorts of topics, and the comments that follow are usually full of nuggets of wisdom and lively debate.

I often find myself reading through these comments and saving off interesting-looking links to other websites. There is a lot of collective wisdom and experience to be found there.

I felt like it would fit the spirit of Hacker News to automate that process and create a spin-off website that collects and displays some of the day’s hacker wisdom.

You can see the result by following this link, which also has a link to the source code on GitHub.

The rest of this post will be dedicated to describing how I built it.

Setting up the Project

I used IntelliJ as my IDE, and it makes setting up a Maven/Spring Boot project really easy.

Spring Boot and its associated add-ons made it really easy to add Controllers to handle web requests, Services to handle the logic, and Thymeleaf templates to handle the front end.

Each controller method maps to a Thymeleaf template. The about page is static, but the home page is populated with dynamic data from the “Ask HN” section of Hacker News.
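
As a rough sketch of what that looks like in code (the class, service, and template names here are illustrative guesses, not necessarily what’s in the repo):

```java
import java.util.List;

import org.springframework.stereotype.Controller;
import org.springframework.stereotype.Service;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.GetMapping;

// Stub for the service that gathers the links (fleshed out later)
@Service
class WisdomService {
    public List<String> getLinks() {
        return List.of();
    }
}

@Controller
public class PageController {

    private final WisdomService wisdomService;

    public PageController(WisdomService wisdomService) {
        this.wisdomService = wisdomService;
    }

    // Renders templates/home.html, populated with the day's links
    @GetMapping("/")
    public String home(Model model) {
        model.addAttribute("links", wisdomService.getLinks());
        return "home";
    }

    // Renders the static templates/about.html page
    @GetMapping("/about")
    public String about() {
        return "about";
    }
}
```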

Retrieving the Links

I used two API endpoints to retrieve the links:

https://hacker-news.firebaseio.com/v0/askstories.json?print=pretty
https://hacker-news.firebaseio.com/v0/item/{id}.json?print=pretty

The first endpoint returns a list of ids, as integers, of the top “Ask HN” posts. The second retrieves any item, whether a post or a comment, by id. With the help of a recursive function, these two endpoints let me drill down into the comments of each post.
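
Here’s a sketch of that traversal. It relies on two fields of the item JSON, “kids” (child comment ids) and “text” (the comment body); the class and record names are mine, and error handling is omitted:

```java
import java.util.ArrayList;
import java.util.List;

import org.springframework.web.client.RestTemplate;

public class AskHnCrawler {

    private static final String ASK_URL =
            "https://hacker-news.firebaseio.com/v0/askstories.json";
    private static final String ITEM_URL =
            "https://hacker-news.firebaseio.com/v0/item/{id}.json";

    private final RestTemplate rest = new RestTemplate();

    // Just the item fields this sketch cares about; real items carry more
    record HnItem(long id, String text, List<Long> kids) {}

    // Collects the HTML text of every comment beneath the top "Ask HN" posts
    public List<String> collectCommentText() {
        List<String> texts = new ArrayList<>();
        long[] ids = rest.getForObject(ASK_URL, long[].class);
        if (ids != null) {
            for (long id : ids) {
                collect(id, texts);
            }
        }
        return texts;
    }

    private void collect(long id, List<String> texts) {
        HnItem item = rest.getForObject(ITEM_URL, HnItem.class, id);
        if (item == null) {
            return;
        }
        if (item.text() != null) {
            texts.add(item.text());
        }
        if (item.kids() != null) {
            for (long kid : item.kids()) {
                collect(kid, texts); // recurse into the replies
            }
        }
    }
}
```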

Once I had the comments, I had to look for the links contained in their text. I know that using a regex to parse HTML is generally a bad idea, but I was able to make it work, and I didn’t feel compelled to pull in jsoup as a dependency once I had it working consistently.
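
For illustration only (this isn’t the exact pattern I used), a regex-based extractor might look like this. One wrinkle worth noting: the API returns comment text as HTML with some characters escaped as entities, so the text needs unescaping before matching:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.springframework.web.util.HtmlUtils;

public final class LinkExtractor {

    // Grabs the href value of each anchor tag in the comment HTML
    private static final Pattern HREF =
            Pattern.compile("href=\"(https?://[^\"]+)\"");

    public static Set<String> extractLinks(String commentHtml) {
        // Decode entities (e.g. &#x2F;) so URLs appear in plain form
        String decoded = HtmlUtils.htmlUnescape(commentHtml);
        Set<String> links = new LinkedHashSet<>(); // drops duplicates, keeps order
        Matcher matcher = HREF.matcher(decoded);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }
}
```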

After retrieving the links, I removed duplicates and sorted them for a better user experience. I created a custom, case-insensitive link sorter that ignores the http(s) prefix.
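
The comparator boils down to something like this (the helper name is illustrative):

```java
import java.util.Comparator;

public final class LinkSorter {

    // Orders links case-insensitively, ignoring the http:// or https:// prefix
    public static final Comparator<String> IGNORING_SCHEME =
            Comparator.comparing(LinkSorter::stripScheme, String.CASE_INSENSITIVE_ORDER);

    private static String stripScheme(String link) {
        return link.replaceFirst("(?i)^https?://", "");
    }
}
```

With this ordering, “https://Apple.example” sorts ahead of “http://banana.example”, which it wouldn’t under a plain lexicographic sort.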

Improving Responsiveness

Unfortunately, while my solution up to this point was accurate, it was not very fast. It takes a lot of API calls to retrieve all the necessary data, and those round trips add up. The total time also varies with how popular a question is. Without any optimizations, requests were taking an unacceptably long time.

My solution to this problem was a caching strategy built on Spring’s caching support. The application stores the data in a cache named “answeredQuestions” after retrieving it for the first time.
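
In code, that amounts to a @Cacheable annotation on the expensive method (the class and method names are again illustrative, though the cache name is the real one, and @EnableCaching has to be switched on in the application class):

```java
import java.util.List;

import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class WisdomService {

    // The first call does the full API traversal; after that, the result
    // comes straight from the "answeredQuestions" cache
    @Cacheable("answeredQuestions")
    public List<String> getLinks() {
        return fetchAndParseFromHackerNews(); // the slow path described above
    }

    private List<String> fetchAndParseFromHackerNews() {
        // ...the recursive retrieval, extraction, and sorting from earlier...
        return List.of();
    }
}
```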

Meanwhile, in the background, a scheduled job re-runs the retrieval and parsing and updates the cache every 10 minutes. I used Spring to implement the scheduling as well.
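
Here’s a sketch of that job, assuming @EnableScheduling is turned on; @CachePut is one way to wire the refresh into the same cache, though I won’t claim it’s exactly how the repo does it:

```java
import java.util.List;

import org.springframework.cache.annotation.CachePut;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

@Service
public class WisdomRefreshJob {

    // Every 10 minutes, redo the expensive work and overwrite the cached value
    @Scheduled(fixedRate = 10 * 60 * 1000)
    @CachePut("answeredQuestions")
    public List<String> refresh() {
        return fetchAndParseFromHackerNews();
    }

    private List<String> fetchAndParseFromHackerNews() {
        // ...same retrieval pipeline as in WisdomService...
        return List.of();
    }
}
```

Since both methods take no arguments, they share the same cache key, so the scheduled job overwrites exactly the entry that getLinks() reads.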

After the first request succeeds, subsequent requests bypass all of that work and simply grab the stored result from the cache.

Subscribe to my blog or follow me on Twitter if you’re interested in seeing more projects like these.