I should probably put a massive disclaimer on this post: this is just a crackpot idea I came up with, I haven’t actually tested it (nor do I plan to), and there are probably a ton of real-world issues with this approach, both technical and legal. So, yeah, don’t try this at home.
A simplistic refresher on compression
(Editor’s note: as my very smart commenters have pointed out, actually *compressing* data using Twitter doesn’t make any sense. This post is about *encoding* data using Twitter. This part is just a refresher on the basic concepts; feel free to skip ahead if you already know this stuff.)
It’s been a couple of years since my computer science days, but I do remember the basics of a simple compression algorithm. First, remember that all digital files are made up of a series of 1’s and 0’s. Let’s take this pattern as an example:
0010 0100 0001 0000 0100 0001 001
The key to this compression algorithm is to find long repeating patterns, and use a lookup table to replace them with shorter ones. Let’s take another look at our example, with spaces inserted to highlight the patterns:
001 001 000001 000001 000001 001
Now, let’s make a quick lookup table:
Pattern | Replacement |
001 | A |
000001 | B |
We can now represent those bits as follows:
AABBBA
And there we go! Our 27 characters were replaced with 6, and since there are only two distinct symbols, each one needs just a single bit to store. Even adding in the extra space for the lookup table, it’s still smaller than what we started with.
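For the curious, here is a minimal Python sketch of the substitution above. The names `TABLE`, `encode`, and `decode` are made up for illustration, and trying the longest pattern first is an assumption that happens to work for this example. A real compressor would build its table from the data rather than have it handed over.

```python
# Toy lookup-table substitution using the table from the example.
# TABLE, encode, and decode are hypothetical names for this sketch.
TABLE = {"001": "A", "000001": "B"}


def encode(bits, table):
    """Replace each known bit pattern with its one-letter symbol."""
    # Assumption: try longer patterns first at each position.
    patterns = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(bits):
        for pattern in patterns:
            if bits.startswith(pattern, i):
                out.append(table[pattern])
                i += len(pattern)
                break
        else:
            # No table entry fits here, so this toy scheme gives up.
            raise ValueError(f"no pattern matches at position {i}")
    return "".join(out)


def decode(symbols, table):
    """Expand each symbol back into its original bit pattern."""
    reverse = {symbol: pattern for pattern, symbol in table.items()}
    return "".join(reverse[s] for s in symbols)


# The example bit string, with the display spaces stripped out.
bits = "0010 0100 0001 0000 0100 0001 001".replace(" ", "")
encoded = encode(bits, TABLE)
print(encoded)                         # AABBBA
print(decode(encoded, TABLE) == bits)  # True
```

Note that the decoder needs the same table to get the original bits back, which is why the table's size has to be counted against the savings.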