memex

Memex is an archiving tool with a user interface similar to git's. I built it to archive my administrative files, old projects, and full copies of code repositories.

Memex archives are composed of encrypted segments that can safely be synced among a group of machines and shared with mainstream storage providers such as rsync.net or Google Drive. This way, personal files are shared, archived, and backed up without being revealed. Memex archives are not tied to a specific storage provider and are easy to relocate, should the need arise (e.g., lower costs or unreliable infrastructure).

The data layer of the memex tool is provided by sdar, a secure deduplicated archive management tool. Sdar is a generic content-addressed database that stores the data encrypted. Values in an sdar archive can be as large as multiple petabytes. Random access in archives and values is possible and efficient.
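
To make the idea of a content-addressed store concrete, here is a toy sketch in shell. It only illustrates why identical data is stored once; it is not sdar's format, and it does no encryption or chunking. The helper name cas_put and the choice of SHA-256 (via GNU sha256sum) are assumptions made for the example.

```shell
# cas_put STORE FILE
# Hypothetical helper, for illustration only: store FILE under the hex
# SHA-256 of its content and print that address. Not how sdar works.
cas_put() {
    store=$1
    file=$2
    mkdir -p "$store"
    addr=$(sha256sum "$file" | cut -d' ' -f1)
    # Identical content hashes to the same address, so it is copied only once.
    [ -e "$store/$addr" ] || cp "$file" "$store/$addr"
    printf '%s\n' "$addr"
}
```

Calling cas_put twice on two files with the same content prints the same address both times and leaves a single entry in the store.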

Getting started

Using memex is not yet a walk in the park. I stopped honing it as soon as it was sharp enough for my use. Nonetheless, if you want to give it a shot, the instructions below should help you get started. Note also that most memex subcommands accept a -h flag that concisely explains their usage.

Start by getting the source code and compiling it; a C11 compiler and a bare Go installation should suffice to build everything. Then create an archive key and initialize a repository.

$ git clone --recurse-submodules git://c9x.me/artools.git
$ cd artools
$ make
$ sudo make install

$ sdar keygen
  choose passphrase:
  hmac: b5d5e50f68947a7878e7e8a873a4ccd8eb6640df8dbe40f781a2e11a88ca44a2
  pub: b9121faf343a457a4395b76ea9f1f949c00747abb10db18f7ea1aeef6f175801
  sec: c8177daa6dae223d052d5c719f3b8f212ae5640515991780c48caaf6987680fc

$ mkdir myarchive && cd myarchive/
$ memex init .repo
  initialized new archive in .repo

$ echo "archive=.repo" > .memex
$ echo "skip=/.repo" >> .memex

The sdar key is the key to all data archived by the memex tool. It is displayed in the clear by the keygen and keydump sdar subcommands. I recommend that you back it up on paper if you plan to use memex for archival.

Once this initial setup is done, you can add data and commit it to the archive.

$ echo "hello world!" > hello.txt
$ dd if=/dev/urandom of=random.bin bs=1024 count=1024
$ memex diff
  + hello.txt
  + random.bin

$ memex commit -m 'a couple files'
  writing: 1.00MiB
  committed as seg/0d1b2ffd17b0cb3aec391bdaef81d454

$ memex log
  Addr: 06879cc0bcba9f617c5053aa5aecaf72bb92be76166b46e31508942609b47df81
  Date: 2022-04-14 12:46:04 +0200 CEST

      a couple files

The newly created segment file .repo/seg/0d1b2... is safe to share and back up on untrusted storage. Each new commit creates a new segment but, if need be, sdar can be used to fuse segments. Keeping segments split has one advantage: only newly committed data needs to be transferred when re-syncing a memex archive with remote storage.
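
The incremental transfer described above can be sketched as a small shell function. It is not a memex or sdar command; sync_segments is a hypothetical helper, the destination is assumed to be a directory (for instance a mounted remote), and the sketch assumes that a segment file is never modified once written, so comparing names is enough.

```shell
# sync_segments SRC DST
# Hypothetical helper: copy the segment files from SRC (e.g. .repo/seg)
# that are not yet present in DST.
sync_segments() {
    src=$1
    dst=$2
    mkdir -p "$dst"
    for seg in "$src"/*; do
        [ -f "$seg" ] || continue          # skip non-files and the empty-glob case
        name=$(basename "$seg")
        # Assumption: segments are immutable, so an existing name means
        # the segment was already transferred.
        [ -e "$dst/$name" ] || cp "$seg" "$dst/$name"
    done
}
```

With a remote reachable over ssh, rsync's --ignore-existing option gives a similar effect, e.g. rsync -av --ignore-existing .repo/seg/ user@host:backup/seg/ (the host and path here are placeholders).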

In addition to segments, it is also recommended to back up the address of the head commit (here, 06879cc...81). However, the loss of this information is not critical as it can be recovered (inefficiently) from segments and the archive key.
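
One possible way to record the head address is to extract it from the log output. The sketch below assumes that memex log prints the most recent commit first, with an "Addr:" line formatted as in the transcript above; head_addr is a hypothetical helper, not part of memex.

```shell
# head_addr
# Hypothetical helper: read `memex log` output on stdin and print the
# address on the first "Addr:" line.
# Assumption: the most recent commit is listed first.
head_addr() {
    awk '$1 == "Addr:" { print $2; exit }'
}

# Example (the destination file name is a placeholder; it is kept outside
# the working directory so it does not show up in memex diff):
#   memex log | head_addr > ../myarchive.head
```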

To wrap the example up, we look at some ways to recover files from a commit.

$ echo "hello memex!" > hello.txt
$ memex diff
  ! hello.txt

$ memex get hello.txt
  hello world!

$ memex commit -m "world -> memex"
  writing: 1.00MiB
  committed as seg/7232d6a589ff91ebc97185341f6d0f7f

$ ls -l .repo/seg/7232d6a589ff91ebc97185341f6d0f7f
  -rw-r--r-- 1 user users 608 Apr 14 16:10 .repo/seg/7232d6a589ff91ebc97185341f6d0f7f

$ memex sync main^
$ cat hello.txt
  hello world!

$ memex sync main