SocialLink/Hindsight:
Origin and Architecture

Why did you write these? Because someone had to go make WinFS real.

Seriously though, Hindsight and SocialLink were designed to solve a few problems that I noticed kept showing up. Having put off trying to solve them for years, and finding myself with some free time, I decided to put a relatively significant dent in things. For example:

The solution to all of these is generally "make an account on that website, so you can get notified when that user posts". However, this tends to encourage spam - the users who post the most end up drowning out infrequent updates from other content creators.

Subscribing is not a guarantee that you'll receive notifications, either - when there are too many recent uploads, certain platforms respond by delaying notifications about new content.

One solution, scraping, solves a few of these problems. You can guarantee that nothing gets deleted by downloading a copy of it when first viewing it. Lots of scrapers have been written for lots of platforms. However, most lack any kind of good frontend - they simply give you a folder of content to investigate later. Scrapers are tasked solely with downloading content, not with notifying you that there is new content or presenting it in any kind of coherent manner. Most scrapers are used in conjunction with manually browsing sites: find content manually, then feed accounts into the scraper for it to fetch later. This is a marginal improvement overall, but it introduces new issues - unless you're constantly checking up on the scrapers, errors will go undetected for long periods of time. They simply aren't a replacement for a good frontend.

SocialLink was first written as a frontend to some web scrapers I had previously written. After discovering the occasional data corruption bug that had gone unnoticed because I wasn't actually using the output of those scrapers, I wanted a way to ensure that the scrapers were working correctly, and to do so in a way that gave the underlying sites a better user experience.

Around the same time, I had a separate problem I wanted to solve - a way to view what you did on a given day, by scanning through metadata in things like chat logs and photos. Since this was ultimately 'content, displayed in chronological order, possibly grouped by person', these features had some overlap with parts of SocialLink. However, this posed a bit of a problem - one application simultaneously wants all your chat logs, address books, banking info, and more. From a security perspective, this is a terrible idea. To mitigate this, SocialLink and Hindsight use a weird architecture. They are both Ruby on Rails projects on top of PostgreSQL, and they both store a bunch of information and display it in their web UI. These two projects could ultimately have been one project (they use similar dependencies), but I figured that the sheer amount of information stored in the database, plus the large number of credentials these projects require, would make it unlikely that anyone would find this on GitHub and decide they want to use it. If I stumbled across a project that wanted access to:

I would immediately run the other way. While I generally trust the code that I write, I would be horrified if anyone else did.

To solve this, I decided to divide the code into three components. The first component is a runtime that scrapes the web (among other fun tasks), and writes to SocialLink and Hindsight's databases when it fetches content relevant to each of them. The second and third components, SocialLink and Hindsight, are effectively database viewers. They have no knowledge of where content came from - they only display the content placed inside their database tables. They have no access to any credentials.
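
As a rough illustration of this split, here is a minimal sketch in plain Ruby. It is not code from either project: `ContentStore`, `ReadOnlyView`, `ScraperRuntime`, and `Post` are hypothetical names, and an in-memory array stands in for the database tables. In a real deployment the same boundary would presumably be enforced by database permissions rather than by Ruby objects.

```ruby
# Hypothetical sketch: none of these classes exist in SocialLink or Hindsight.
Post = Struct.new(:author, :body, :posted_at)

# Stands in for one application's database tables.
class ContentStore
  def initialize
    @rows = []
  end

  # Write access: only the scraper runtime is handed the store itself.
  def insert(row)
    @rows << row.freeze
    self
  end

  # Read access: viewers receive this wrapper instead of the store.
  def read_only
    ReadOnlyView.new(@rows)
  end
end

# What a "database viewer" application gets: reads only, no credentials.
class ReadOnlyView
  def initialize(rows)
    @rows = rows
  end

  # Returns a new, chronologically sorted array of frozen rows,
  # so callers cannot alter the store through it.
  def all
    @rows.sort_by(&:posted_at)
  end
end

# The runtime is the only component that holds credentials.
class ScraperRuntime
  def initialize(credentials, store)
    @credentials = credentials # never passed on to a viewer
    @store = store
  end

  # In this sketch the "scraped" content is simply passed in by the caller.
  def record(author, body, posted_at)
    @store.insert(Post.new(author, body, posted_at))
  end
end

store = ContentStore.new
runtime = ScraperRuntime.new({ "example.com" => "placeholder-token" }, store)
runtime.record("alice", "second post", Time.utc(2020, 1, 2))
runtime.record("alice", "first post", Time.utc(2020, 1, 1))

viewer = store.read_only
puts viewer.all.map(&:body).inspect
puts viewer.respond_to?(:insert)
```

The viewer object has no method for writing and never sees the credentials hash, which is the property the three-component design is after.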

Hindsight stores data of a personal nature - chat history, photos, calendar events, etc. SocialLink stores social networking data scraped from the internet. Both applications are designed to work independently, and each can function without the other. However, there's some overlap - if you chat with friends and also follow them on social media, you might want to see the private conversations you had with them alongside what they were publicly sharing. Likewise, if you go back to a specific date, you might want to see the social networking posts everyone was talking about then, for context. This is why each application has the ability to reach into the other's database tables, if explicitly given access.
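
The opt-in overlap could look something like the following sketch. Again, this is an assumption-laden illustration rather than the projects' actual code: `Entry`, `Table`, and `DayView` are hypothetical names, and the second table is only consulted when it has been explicitly passed in.

```ruby
require "date"

# Hypothetical sketch: these names do not come from SocialLink or Hindsight.
Entry = Struct.new(:source, :person, :text, :at)

# Read-only stand-in for one application's database tables.
class Table
  def initialize(entries)
    @entries = entries.dup.freeze
  end

  # All entries whose timestamp falls on the given date.
  def on(date)
    @entries.select { |e| e.at.to_date == date }
  end
end

class DayView
  # own:   this application's table
  # other: the other application's table, nil unless access was granted
  def initialize(own, other = nil)
    @own = own
    @other = other
  end

  # Content for one day, in chronological order, grouped by person.
  def timeline(date)
    entries = @own.on(date)
    entries += @other.on(date) if @other
    entries.sort_by(&:at).group_by(&:person)
  end
end

chats = Table.new([
  Entry.new(:hindsight, "alice", "lunch tomorrow?", Time.utc(2020, 1, 1, 9, 0)),
  Entry.new(:hindsight, "bob", "see you then", Time.utc(2020, 1, 2, 8, 0))
])
posts = Table.new([
  Entry.new(:sociallink, "alice", "new photo album", Time.utc(2020, 1, 1, 12, 0))
])

day = Date.new(2020, 1, 1)
isolated = DayView.new(chats).timeline(day)       # no access granted
linked = DayView.new(chats, posts).timeline(day)  # access explicitly granted

puts isolated["alice"].map(&:text).inspect
puts linked["alice"].map(&:text).inspect
```

Without the second argument the view works entirely on its own data; with it, private conversations and public posts from the same day appear side by side.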

While auditing any codebase is a difficult task, breaking everything up into separate projects (two of which effectively have only read-only data access) means you can more easily audit the components you actually want to use.