SocialLink/Hindsight:
Origin and Architecture

Why did you write these? Because someone had to go make WinFS real.

Seriously though, Hindsight and SocialLink were designed to solve a few problems that I noticed kept showing up. Having put off trying to solve them for years, and finding myself with some free time, I decided to put a relatively significant dent in things. For example:

I'd bookmark profiles on certain social networks, but I'd forget to come back to them for years - finding an artist worth bookmarking is great, but bookmarks don't bring you back to view their new posts. Problems with bookmarks are numerous and well documented.
Most social networks do not support things like RSS, and the days of RSS are mostly over in favour of walled gardens.
I'd find myself bookmarking the same profile multiple times, since URL formats had changed, each time thinking that I'd stumbled across a new artist. Modern browsers can't determine that different links all ultimately point to the same content.
Link rot is a big problem - most bookmarks go nowhere within a few years, or redirect to a front page that doesn't retain any connection to the original. Trying to figure out how important a bookmark was to you after the content has already been deleted is quite difficult.

The solution to all of these is generally "make an account on that website, so you can get notified when that user posts". However, this tends to encourage spam - the users that post the most end up drowning out infrequent updates from other content creators.

Subscribing is not a guarantee that you'll receive notifications - when there are too many recent uploads, a common response on certain platforms is to delay notifying users of new content.

Creators ask you to hit the bell icon on YouTube because YouTube no longer guarantees that subscribing equals notifying.
Facebook also doesn't guarantee that a post is displayed to every follower. Since Facebook's home page isn't guaranteed to be chronological and there's no mechanism to see every post since you last visited, there's no way to ensure you saw everything without manually navigating to every person's profile page.
Creators often cross-post to different platforms, but will occasionally upload content exclusively to one site. This causes users who subscribe on different platforms to see most things repeatedly. Unless you like/upvote the content on every platform in response, you're effectively penalizing creators.
Platforms do not necessarily support the things people choose to use them for. Many artists now upload art to Twitter, but Twitter has no real mechanism to handle this. You cannot easily browse user uploads unless you manually scroll photo by photo, or filter "images they drew" from "photos of food".
Platforms come and go, and people don't necessarily re-upload everything elsewhere. Certain platforms have decided to evict large portions of their userbase with sudden changes (Tumblr, Flickr, etc.), or shut down entirely (Vine, Periscope, GeoCities, Digg, etc.). Platforms can introduce features or change core behaviour with no notice, which can quickly ruin people's experiences on subsequent visits. If you bookmarked a page and you now have to be registered to view it, it sucks to be you.
Not all content is worth exploring at the same pace. Chapters of a book or pages of a comic might be better read all at once, but there's no way to easily halt notifications from that specific account for a longer period.
Many platforms have terrible mechanisms for going back through archives. Facebook has an excellent tool to filter by year, but the best you can do on Instagram is manually scroll back one row at a time.

One solution, scraping, solves a few of these problems. You can guarantee that nothing gets deleted by downloading a copy of it when first viewing it. Lots of scrapers have been written for lots of platforms - however, most scrapers lack any kind of good frontend - they simply give you a folder of content for you to investigate later. Scrapers are tasked solely with downloading content, not with notifying you that there is new content, or presenting it in any kind of coherent manner. Most scrapers are used in conjunction with manually browsing sites - find content manually, then feed accounts into the scraper for it to fetch later. This is a marginal improvement overall, but introduces new issues - unless you're constantly checking up on the scrapers, errors will go undetected for long periods of time. They simply aren't a replacement for a good frontend.

SocialLink was first written as a frontend to some web scrapers I had previously written. Because I wasn't actually using the output of those scrapers, the occasional data corruption bug went unnoticed until long after the fact. I wanted a way to ensure that the scrapers were working correctly, and to do so in a way that gave the underlying sites a better user experience.

Around the same time, I had a separate problem I wanted to solve - a way to view what you did on a given day, by scanning through metadata in things like chat logs and photos. Since this was ultimately 'content, displayed in chronological order, possibly grouped by person', it had some overlap with parts of SocialLink. However, this posed a bit of a problem - a single application would simultaneously want all your chat logs, address books, banking info, and more. From a security perspective, this is a terrible idea.

To mitigate this, SocialLink and Hindsight use a weird architecture. They are both Ruby on Rails projects on top of PostgreSQL, and they both store a bunch of information and display it in their web UI. These two projects could have ultimately been one project (they use similar dependencies), but I figured that the sheer amount of information stored in the database, plus the large number of credentials these projects require, would make it unlikely that anyone who found this on GitHub would decide they want to use it. If I stumbled across a project that wanted access to:

Every place I've ever been on every date
Address book and calendar accounts
Email and chat credentials
Social network credentials
Raw disk access to all photos / Adobe Lightroom catalogue
Netflix account info
Bank statements

I would immediately run the other way. While I generally trust the code that I write, I would be horrified if anyone else did.

To solve this, I decided to divide the code into three components. The first component is a runtime that scrapes the web (among other fun tasks), and writes to SocialLink and Hindsight's databases when it fetches content relevant to each of them. The second and third components, SocialLink and Hindsight, are effectively database viewers. They have no knowledge of where content came from - they only display the content placed inside their database tables. They have no access to any credentials.
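
The split described above can be sketched in plain Ruby. This is only an illustration under assumptions, not the projects' actual code: `ContentTable`, `ScraperRuntime`, and `Viewer` are hypothetical names, and an in-memory array stands in for a PostgreSQL table. The point it tries to show is that only the runtime holds credentials and a writable handle, while a viewer is handed something that can only read.

```ruby
# Hypothetical in-memory stand-in for one database table.
class ContentTable
  # Handle that exposes reads only; it has no insert method at all.
  class ReadHandle
    def initialize(rows)
      @rows = rows
    end

    def all
      @rows.map(&:dup) # hand out copies so callers cannot mutate stored rows
    end
  end

  # Handle given to the scraper runtime; it can read and append.
  class WriteHandle < ReadHandle
    def insert(row)
      @rows << row.dup.freeze
    end
  end

  def initialize
    @rows = []
  end

  def read_handle
    ReadHandle.new(@rows)
  end

  def write_handle
    WriteHandle.new(@rows)
  end
end

# Component 1: the only place that knows credentials and can write.
class ScraperRuntime
  def initialize(credentials:, table:)
    @credentials = credentials # never passed on to a viewer
    @table = table.write_handle
  end

  # Stand-in for a real fetch; takes already-fetched posts for illustration.
  def store(posts)
    posts.each { |post| @table.insert(post) }
  end
end

# Components 2 and 3: viewers that only display what is in their table.
class Viewer
  def initialize(table:)
    @table = table.read_handle
  end

  def titles
    @table.all.map { |row| row[:title] }
  end
end

table = ContentTable.new
runtime = ScraperRuntime.new(credentials: { token: "example-token" }, table: table)
runtime.store([{ title: "New artwork", author: "some_artist" }])

viewer = Viewer.new(table: table)
puts viewer.titles.inspect
```

In a real deployment the same boundary could presumably be enforced by the database itself, for example by connecting each viewer with a PostgreSQL role that has only been granted SELECT; whether the projects do exactly that is not stated here.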

Hindsight stores data of a personal nature - chat history, photos, calendar events, etc. SocialLink stores social networking data scraped from the internet. Both applications are designed to work independently, and can function without the other. However, there's some overlap - if you chat with friends and also follow them on social media, you might want to see the private conversations you had with them alongside what they were publicly sharing. Likewise, if you go back to a specific date, you might want to see the social networking posts everyone was talking about then, for context. This is why each application has the ability to reach into the other's database tables, if explicitly given access.
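
The optional cross-application access might look something like the following sketch. It is an illustration built on assumptions, not how either project is implemented: `Entry`, `Timeline`, and the `linked:` parameter are hypothetical names, and plain arrays stand in for each application's tables. A timeline always shows its own entries, and only merges in the other application's entries when they are explicitly handed to it.

```ruby
require "date"

# Hypothetical record: one piece of content with a timestamp and a person.
Entry = Struct.new(:at, :person, :source, :text)

class Timeline
  # own:    entries from this application's tables
  # linked: entries from the other application's tables, or nil when
  #         access has not been explicitly granted
  def initialize(own:, linked: nil)
    @own = own
    @linked = linked
  end

  # Everything visible for one calendar day, oldest first.
  def on(date)
    visible.select { |e| e.at.to_date == date }.sort_by(&:at)
  end

  # Everything visible for one person, oldest first, grouped by day.
  def for_person(person)
    visible.select { |e| e.person == person }
           .sort_by(&:at)
           .group_by { |e| e.at.to_date }
  end

  private

  def visible
    @linked ? @own + @linked : @own
  end
end

chats = [
  Entry.new(Time.new(2020, 5, 1, 9, 0, 0), "alice", :hindsight, "Did you see the new page?"),
  Entry.new(Time.new(2020, 5, 2, 18, 30, 0), "alice", :hindsight, "Lunch tomorrow?")
]
posts = [
  Entry.new(Time.new(2020, 5, 1, 8, 45, 0), "alice", :sociallink, "Posted comic page 12")
]

standalone = Timeline.new(own: chats)               # no access granted
linked = Timeline.new(own: chats, linked: posts)    # access explicitly granted

day = Date.new(2020, 5, 1)
puts standalone.on(day).map(&:text).inspect
puts linked.on(day).map(&:text).inspect
```

Making the linked source an explicit, optional argument keeps the default behaviour standalone: an application that was never given access simply has nothing extra to merge.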

While auditing any codebase is a difficult task, breaking everything up into separate projects (with two effectively only having read-only data access) means you can more easily audit the components you actually want to use.
