Setting up a Matrix homeserver and its media file storage on Tardigrade

This article shows how to set up your own #Matrix instance (called a homeserver) with an Ansible playbook. It also describes how to connect the instance to #Tardigrade, a decentralized, client-side encrypted object storage service that is compatible with Amazon S3.

1. Introduction

What is Matrix?

Matrix is an open, lightweight protocol and network for secure, decentralized real-time communication, maintained by the non-profit Matrix.org Foundation. Matrix provides fully decentralized conversations with no single point of control or failure, and supports end-to-end encryption (#E2EE) by default.

https://element.io/blog/element-home/ says:

As you know, Element is completely different to most messaging apps.

Because Element is decentralised, the app itself (what you see) is separate from the Matrix hosting service behind it (what you don’t see; the movement of messages and where they are stored).

That’s important because it lets users decide where their messages and data are kept. In owning that choice, you also get to own your data and messages rather than having them sucked up into the likes of Facebook Messenger, Signal, Telegram or WhatsApp.

Matrix resembles email. A particular mail server may be down for maintenance or because of a failure, but that does not stop the mail system as a whole from working. In the same way, Matrix keeps working no matter what happens to any single server on the network: as long as your homeserver and your friend's homeserver are up, you can continue to communicate in real time.

To install a Matrix homeserver yourself, this configuration script (called a playbook) for Ansible, a popular configuration management system, is arguably one of the best ways: https://github.com/spantaleev/matrix-docker-ansible-deploy

This playbook sets up and updates a Matrix homeserver for you, along with various services such as the Element web client and bridges (one of the core concepts of Matrix, connecting a Matrix homeserver to other services like Signal, Facebook, Discord, LINE messenger, etc.).

Edit: recently (on 26/3/2021) this project was mentioned in a post on the official Matrix blog: https://matrix.org/blog/2021/03/26/this-week-in-matrix-2021-03-26#matrix-docker-ansible-deploy

What is Tardigrade?

Tardigrade, developed and supported by Storj Labs, is a decentralized and client-side encrypted cloud storage service, which is compatible with Amazon S3 storage.

According to the official website, Tardigrade encrypts files locally, splits them into pieces, and then distributes the pieces across a global network of Storage Nodes. On Tardigrade, practically nobody except you (i.e. those who have access to the encryption key) can read your data, and there is neither a single point of failure nor a single target for a malicious actor. In a recent article I argued that these characteristics are advantages over other Amazon S3 compatible storage services.

I have been using Tardigrade for another project since the spring of last year, and I have not experienced any unexpected downtime so far. This stability, in addition to speed and privacy, is one of the main reasons I decided to use Tardigrade for my Matrix server's media file storage too.

2. Setting up applications for Tardigrade

The Ansible playbook mentioned above provides an option to install an application called Goofys, which mounts a bucket of Amazon S3 compatible storage as a local filesystem (in a POSIX-like way). Since Tardigrade is S3 compatible, you can connect the Matrix homeserver to it via Goofys and use it as media file storage.

Note: If you do not manage your Matrix homeserver, you cannot decide where to store the media files. In this case, you'll need to ask the administrator of your homeserver to adopt Tardigrade as the storage.

Before configuring the playbook, first let's set up the applications required to use Tardigrade.

The main objective of this chapter is to obtain the S3-compatible access key and secret key, along with the Gateway's endpoint, so that you can specify them in the Ansible playbook. Once the access key, secret key, and custom endpoint are set in the playbook, running it will automatically take care of the filesystem for you.

Meanwhile, as Tardigrade encrypts files locally before sending them to the storage nodes, the procedure to obtain this information is more complex than with Amazon S3 and other S3 compatible services. From a UX perspective, this is one of the areas that need improvement.

Based on the documentation, the procedure can be summarized as follows:

  1. Create Access Grant on Dashboard from a browser
  2. Import Access Grant to Uplink CLI (uplink)
  3. Create a bucket with uplink
  4. Obtain a restricted access code to the bucket
  5. Run gateway on your server to get S3-compatible access key, secret key, and custom endpoint

On Tardigrade, permissions (read, write, list, and delete) for a bucket are managed with an Access Grant instead of a policy.

The basic tools for working with Tardigrade are described in the documentation at https://documentation.tardigrade.io/storage/considerations#basic-tools-for-tardigrades-decentralized-cloud-object-storage.

Below we set up Uplink CLI and Gateway and obtain two Access Grants: one for accessing Tardigrade with Uplink CLI, and another that permits Goofys to access a specific bucket created with Uplink CLI. The latter is imported into Gateway, which then provides the endpoint that Goofys connects to.

First, let's download and install Uplink CLI and Gateway. You can download those applications by following the official documentation.

Gateway should be installed on your server unless you have a specific reason to do otherwise. Uplink CLI can be installed either on the server or on the computer you use to manage the server.

Note that another Gateway (called Gateway MT), hosted by Tardigrade, is being tested: https://documentation.tardigrade.io/getting-started/beta-gateway-mt. With Gateway MT, Tardigrade provides the endpoint for you, so you do not have to host Gateway yourself.

Create an account at Tardigrade

Now that you have downloaded and installed the required software, let's create an account at Tardigrade if you do not have one yet. Please follow the official documentation to sign up:

You will be required to register a credit card (via Stripe) or send STORJ tokens to your account to use the service.

You may be a little worried about registering your credit card. However, the running total of fees is shown on the dashboard, the pricing policy is crystal clear, and storage and egress usage is capped (the limits can be raised by asking the support team), so you should not get a bad surprise on your monthly invoice, unlike in this case at Wasabi object storage.

After creating an account at Tardigrade, let's obtain an Access Grant on the Dashboard. Please check the official documentation for instructions.

[Screenshot: Access Grant generation UI]

Please copy the Access Grant and the encryption passphrase and save them somewhere safe (e.g. a password manager like KeePassXC). Be careful not to lose the passphrase: it cannot be reclaimed, and without it you will not be able to recover your files.

Since there is a known bug related to the handling of the browser's cache, make sure to clear the cache before you generate an Access Grant. In my case I avoided the bug by generating it in a private window of Firefox.

Next, you need to import the Access Grant to Uplink CLI. The official instruction is available here:

Setting up a bucket — create it and obtain restricted access

After importing the Access Grant to Uplink CLI, you should be able to create a bucket with the following command (where example is the name of the bucket):

$ uplink mb sj://example

You may reuse the Access Grant for running Gateway, but since it applies to the whole project, where other buckets can be created, I am going to create another Access Grant that limits the scope of the permissions, i.e., narrows them down to the one bucket needed for Goofys.

Please run the command below to create an Access Grant limited to the bucket:

$ uplink share sj://example --readonly=false

Please note that --readonly=false is required to make it possible for Goofys to write data to the bucket via Gateway.

After running the command, the Access Grant for the bucket should be displayed in the table (next to Access). Please copy it and save it somewhere safe.

Setting up Gateway — import the Access Grant and run

Now that you have the Access Grant for the bucket, let's proceed with setting up Gateway. Please run the command below to import the Access Grant into Gateway.

$ gateway setup --access "14aV..." --non-interactive

where 14aV... should be replaced with the Access Grant for the bucket which we created just now.

After the Access Grant is imported, please start gateway to obtain the custom endpoint, the access key, and the secret key.

$ gateway run

Please take note of this information, as it is required to set up Goofys with the Ansible playbook. Do not share or publish the secret key.

Note: In order to let Gateway run in the background, you need to create a Linux service with systemd. Please have a look at the articles below for instructions.
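As a rough illustration of such a service, the shell function below prints a minimal systemd unit for Gateway. This is only a sketch under stated assumptions: the function name, the unit description, the user name, and the binary path are all placeholders of mine, and I am assuming that Gateway should run as the same user who ran gateway setup, so that it finds the configuration created there.

```shell
# Print a minimal systemd unit for Gateway to stdout.
# $1: user that ran `gateway setup` (assumption: config is stored per user)
# $2: absolute path to the gateway binary (placeholder, adjust to your install)
render_gateway_unit() {
  local run_as="$1" gateway_bin="$2"
  cat <<EOF
[Unit]
Description=Tardigrade S3 Gateway
After=network-online.target
Wants=network-online.target

[Service]
User=${run_as}
ExecStart=${gateway_bin} run
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, you could save the output of render_gateway_unit youruser /usr/local/bin/gateway as /etc/systemd/system/tardigrade-gateway.service (the file name is also just an example), then run systemctl daemon-reload followed by systemctl enable --now tardigrade-gateway.service as root.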

Setting up AWS CLI for testing

You may also want to install aws-cli to check whether Gateway was properly configured. Per this page, you should install aws-cli with pip to avoid a bug. These commands install it on Debian / Ubuntu:

# apt install python3-pip
# pip3 install awscli

To interact with Tardigrade with aws-cli, you need to configure it with this command:

$ aws configure

Please input the access key and secret key generated above.
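If you prefer not to type the keys at the interactive prompt, the same two values can be stored with aws configure set. The function below is a small sketch; its name is a placeholder of mine, and it writes to the default aws-cli profile.

```shell
# Store the Gateway credentials in the default aws-cli profile.
# $1: access key printed by `gateway run`
# $2: secret key printed by `gateway run`
configure_aws_keys() {
  local access_key="$1" secret_key="$2"
  if [ -z "$access_key" ] || [ -z "$secret_key" ]; then
    echo "usage: configure_aws_keys ACCESS_KEY SECRET_KEY" >&2
    return 2
  fi
  aws configure set aws_access_key_id "$access_key" &&
    aws configure set aws_secret_access_key "$secret_key"
}
```

Keep in mind that arguments typed on the command line can end up in your shell history, so you may want to pass the keys from variables rather than typing them literally.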

After configuration, let's conduct a simple test.

First, please create an empty file with $ touch example.txt. After creating the file, please run this command to upload the file to the bucket sj://example created above.

$ aws s3 --endpoint=http://localhost:7777/ cp example.txt s3://example

Next, please run this to see if the file was successfully uploaded to the bucket.

$ aws s3 --endpoint=http://localhost:7777/ ls s3://example

Finally, let's remove the test file from the bucket with this command:

$ aws s3 --endpoint=http://localhost:7777/ rm s3://example/example.txt

Please run the second command (the ls one) again to confirm that the file was deleted successfully. If it does not return example.txt, Gateway is set up properly and you are good to go!
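The upload, list, and delete steps can also be bundled into one function so that the check is easy to repeat, for example after restarting Gateway. This is a sketch rather than an official tool: the function name and the test file name are placeholders of mine, and it spells the endpoint option in its full form, --endpoint-url.

```shell
# Round-trip test against Gateway: upload a file, list it, delete it, list again.
# $1: bucket name (e.g. example)   $2: endpoint (e.g. http://localhost:7777/)
# Returns 0 if every step behaved as expected, 1 otherwise.
gateway_roundtrip() {
  local bucket="$1" endpoint="$2"
  local name="roundtrip-test-$$.txt" tmpdir status=0
  tmpdir="$(mktemp -d)" || return 1
  : > "$tmpdir/$name"   # create an empty test file

  # Upload, then confirm the object shows up in the listing.
  if ! aws s3 --endpoint-url "$endpoint" cp "$tmpdir/$name" "s3://$bucket/$name" >/dev/null; then
    echo "upload failed" >&2; status=1
  elif ! aws s3 --endpoint-url "$endpoint" ls "s3://$bucket/" | grep -q -F "$name"; then
    echo "uploaded file not listed" >&2; status=1
  # Delete, then confirm the object is gone.
  elif ! aws s3 --endpoint-url "$endpoint" rm "s3://$bucket/$name" >/dev/null; then
    echo "delete failed" >&2; status=1
  elif aws s3 --endpoint-url "$endpoint" ls "s3://$bucket/" | grep -q -F "$name"; then
    echo "file still listed after delete" >&2; status=1
  fi

  rm -rf "$tmpdir"
  return "$status"
}
```

Running gateway_roundtrip example http://localhost:7777/ && echo OK should print OK when Gateway accepts writes, listings, and deletes for the bucket.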

Note: For other APIs available for Tardigrade, please see the documentation below:

4. Install Matrix with the playbook

Now that Gateway has been confirmed to be set up properly and you have the access key and the secret key, along with the custom endpoint, let's start configuring the playbook to install the Matrix homeserver.

Install Ansible on your local machine

In order to use the playbook, first you need to set up Ansible on your local machine.

Ansible communicates over a normal SSH channel, so you do not have to install a special application on the target machine (called a managed node) to run the playbook. Also, once you have created and set up the playbook for yourself, you can reuse it on other machines to reproduce the environment automatically, avoiding tedious and time-consuming repetitive work. The article below is a nice guide to installing and configuring Ansible:

Enable Goofys (Amazon S3-compatible filesystem) on the playbook

Disclaimer: The instructions below are based on this version of the playbook: https://github.com/spantaleev/matrix-docker-ansible-deploy/tree/6baa91dd9fea14a1cd9ba204d98835fb6d43465a. The project is quite active, so please make sure to check its latest status before proceeding.

The Ansible playbook provides a lot of options to configure, so I recommend checking the documentation index and the configuration guide. There is a lot of documentation to read, but as long as you follow it step by step, it should be possible to set up your Matrix homeserver without difficulty.

However, I would like to limit the scope of this article to connecting a Matrix homeserver to Tardigrade with Goofys.

In order to enable Goofys, please have a look at this documentation.

Since the permissions have already been set with the Access Grant, you can safely ignore the explanation about the security policy on the page.

To enable Tardigrade support on the playbook, open inventory/host_vars/matrix.<your-domain>/vars.yml and add the following lines to the file, along with other settings.

matrix_s3_media_store_enabled: true
matrix_s3_media_store_bucket_name: "example"
matrix_s3_media_store_aws_access_key: "6Tu..."
matrix_s3_media_store_aws_secret_key: "zfM..."
matrix_s3_media_store_custom_endpoint_enabled: true
matrix_s3_media_store_custom_endpoint: "http://127.0.0.1:7777"

where example should be replaced with the name of the bucket that stores the media files, and 6Tu... and zfM... should be replaced with the access key and the secret key generated by Gateway, respectively.
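Before running the playbook, it can be worth confirming that none of the six settings was left out or is still commented out. The function below is a minimal sketch of such a check; its name is a placeholder of mine, and it only looks for the keys at the start of a line without validating their values.

```shell
# Check that vars.yml defines all six S3 media store settings.
# $1: path to vars.yml
# Prints each missing key to stderr; returns 1 if any is missing.
check_s3_vars() {
  local vars_file="$1" key missing=0
  for key in \
    matrix_s3_media_store_enabled \
    matrix_s3_media_store_bucket_name \
    matrix_s3_media_store_aws_access_key \
    matrix_s3_media_store_aws_secret_key \
    matrix_s3_media_store_custom_endpoint_enabled \
    matrix_s3_media_store_custom_endpoint
  do
    # Match "key:" at the start of a line, so commented-out entries do not count.
    if ! grep -q "^${key}:" "$vars_file"; then
      echo "missing: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

You would call it with the path to your own vars.yml, for example check_s3_vars inventory/host_vars/matrix.<your-domain>/vars.yml with the domain filled in.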

Because Gateway listens on localhost by default, don't forget to add --network=host as an argument to the docker run command in roles/matrix-synapse/templates/goofys/systemd/matrix-goofys.service.j2. That should make it possible to reach the host's localhost from the Docker container.

Note: this edit should not be required if you use Gateway MT hosted outside of the localhost.

--- a/roles/matrix-synapse/templates/goofys/systemd/matrix-goofys.service.j2
+++ b/roles/matrix-synapse/templates/goofys/systemd/matrix-goofys.service.j2
@@ -12,6 +12,7 @@ ExecStartPre=-{{ matrix_host_command_docker }} kill %n
 ExecStartPre=-{{ matrix_host_command_docker }} rm %n
 
 ExecStart={{ matrix_host_command_docker }} run --rm --name %n \
+                       --network=host \
                        --log-driver=none \
                        --user={{ matrix_user_uid }}:{{ matrix_user_gid }} \
                        --mount type=bind,src=/etc/passwd,dst=/etc/passwd,ro \

Please note that if you edit matrix-goofys.service directly, the change will be overwritten when you re-run the playbook.
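The template itself is also a local modification to the playbook, so it may be worth checking that the --network=host line is still in place before each run, in case the file was reset or replaced during an update. A minimal sketch (the function name is a placeholder of mine):

```shell
# Succeed only if the given template still contains the --network=host argument.
# $1: path to matrix-goofys.service.j2
has_host_network() {
  local template="$1"
  # "--" stops grep from treating the pattern itself as an option.
  grep -q -F -- '--network=host' "$template"
}
```

From the root of the playbook, has_host_network roles/matrix-synapse/templates/goofys/systemd/matrix-goofys.service.j2 || echo "patch missing" would warn you before Goofys is deployed without access to the Gateway on localhost.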

Migrate from local filesystem to Tardigrade

It is also possible to migrate from the local filesystem to Tardigrade by following the instructions. Depending on usage, the initial rsync may take from hours to days to complete. For reference, it took a couple of days for my 1-month-old Matrix instance.

Creating a file inside folders on a bucket often takes a lot of time, so you may notice lag in your Matrix client (such as Element) after the migration. Once files are synced between the Matrix network and the bucket on Tardigrade, the speed should improve somewhat thanks to caching by Goofys.

5. Closing remark

The playbook I picked for this article is one of the more sophisticated and convenient ways to install and maintain your Matrix homeserver. It lets you not only choose which functions to enable, but also keep your Matrix instance up to date.

It seems to me that Tardigrade is less well known than other S3 compatible storage services like Backblaze B2, but when it comes to privacy and decentralization Tardigrade is one of the most advanced services. Data is encrypted on the client side by default, and there is no single point of failure thanks to the global distribution of the Storage Nodes, although they are currently concentrated in Western Europe and North America.

By combining Matrix and Tardigrade it is possible to host a real-time communication system which is not only private but also stable and extensible. If you are planning to host your Matrix homeserver for the long term without worrying about data volume, privacy, or durability, I think connecting it to Tardigrade is a reasonable choice.

Copyright © 2021 Suguru Hirahara. This work is available under GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation. See https://blog.progressiv.dev/yq31akw3jj for copying conditions.