Hako: Stupidly simple DIY web archiving tool

I can't code to save my life, but that doesn't stop me from trying. My latest creation is a case in point. Since stuff tends to disappear unceremoniously from the Web, I usually save local copies of interesting articles. Up until recently, I used the SingleFile Firefox add-on for that, but the process involved too many manual steps for my liking. After several failed attempts to make ArchiveBox work, I decided to roll my own tool based on monolith, a simple command-line utility that saves complete web pages as single HTML files. It took me a few hours to cobble together a crude but usable tool that I named Hako (it means "box" in Japanese, and it sounds a bit like "hacky", which I find somewhat appropriate).

Here's how Hako works. To archive the currently open web page, select its title and click the Hako bookmarklet. This sends the URL of the page and the selected title to the Hako PHP page, which passes the received values to monolith. The latter then saves the page, using the title as its file name. The same PHP page also shows a list of all archived pages, so Hako doubles as a no-frills read-it-later tool. That's all there is to it, really.
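At its core, this flow boils down to one monolith call per page. The following is a minimal shell sketch of the idea, not Hako's actual PHP code: the function names, the rule for turning a title into a safe file name, and the target directory are all assumptions. It relies only on monolith's -o option, which writes the saved page to the given file.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL sketch of what Hako does with a URL and a title.
# Not Hako's actual code: names and the sanitizing rule are assumptions.

# Turn a page title into a file name: every character other than letters,
# digits, dots, underscores, spaces, and hyphens is replaced with "_".
title_to_filename() {
  printf '%s.html\n' "$(printf '%s' "$1" | tr -c 'A-Za-z0-9._ -' '_')"
}

# Save a page as a single HTML file (monolith must be in the PATH).
# Usage: archive_page URL TITLE DIR
archive_page() {
  local url="$1" title="$2" dir="$3"
  mkdir -p "$dir"
  monolith "$url" -o "$dir/$(title_to_filename "$title")"
}

# List archived pages, newest first (roughly what the Hako page shows).
# Usage: list_archived DIR
list_archived() {
  ls -t "$1"/*.html 2>/dev/null
}
```

For example, title_to_filename 'Hello, World: A/B' yields Hello_ World_ A_B.html, so a slash in a title can't accidentally point monolith at another directory.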

To deploy Hako on your machine, you need to install monolith first. You can do this using the following commands (note that they install the x86_64 version of monolith and require the curl, jq, and wget tools):

curl -s https://api.github.com/repos/Y2Z/monolith/releases/latest | jq -r ".assets[] | \
select(.name | contains(\"x86_64\")) | .browser_download_url" | wget -i -
sudo mv monolith-gnu-linux-x86_64 /usr/local/bin/monolith
sudo chown root:root /usr/local/bin/monolith
sudo chmod 755 /usr/local/bin/monolith
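If your machine isn't x86_64 (a Raspberry Pi, for instance), the jq filter above has to pick a different release asset. The sketch below maps the output of uname -m to an asset name. It assumes that the other Linux binaries follow the same monolith-gnu-linux-<arch> naming as the x86_64 one; the function name is hypothetical, so check the release page before relying on the mapping.

```shell
#!/usr/bin/env bash
# Sketch: choose the monolith release asset for this machine.
# ASSUMPTION: assets are named "monolith-gnu-linux-<arch>" like the x86_64
# binary; verify the names on the GitHub release page.
monolith_asset_for_arch() {
  case "$1" in
    x86_64)        echo "monolith-gnu-linux-x86_64" ;;
    aarch64|arm64) echo "monolith-gnu-linux-aarch64" ;;
    armv7l)        echo "monolith-gnu-linux-armhf" ;;
    *)             echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Usage (downloads the binary matching the current machine):
#   asset=$(monolith_asset_for_arch "$(uname -m)") &&
#   curl -s https://api.github.com/repos/Y2Z/monolith/releases/latest |
#     jq -r --arg name "$asset" '.assets[] | select(.name == $name) | .browser_download_url' |
#     wget -i -
```

Matching the exact name with == instead of contains() also guarantees that wget receives a single URL.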

Install PHP as well as the php-xml and php-mbstring packages on your system. To do this on Debian and Ubuntu-based systems, run the sudo apt install php php-xml php-mbstring command. Then clone the project's Git repository using the git clone https://github.com/dmpop/hako.git command. Switch to the resulting hako directory, open the index.php file for editing, and replace the default value of the $KEY variable with the desired password. Save the changes and start the PHP server using the php -S command followed by the address and port to listen on (for example, php -S 0.0.0.0:8000).
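If you'd rather script the password step, here is a sketch that generates a random password and writes it into index.php with sed. It assumes that $KEY is defined on a single line starting with $KEY =, which may not match the actual file, and it uses the GNU form of sed -i found on Debian and Ubuntu. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL helper: set the $KEY value in hako/index.php.
# ASSUMPTION: the file contains a line of the form  $KEY = "...";
set_hako_key() {
  local file="$1" key="$2"
  # Accept letters and digits only, so the value is safe both inside a
  # PHP double-quoted string and inside the sed replacement below.
  case "$key" in
    ""|*[!A-Za-z0-9]*) echo "use letters and digits only" >&2; return 1 ;;
  esac
  sed -i "s/^\\\$KEY *=.*/\$KEY = \"$key\";/" "$file"
}

# Print a random 24-character alphanumeric password.
random_key() {
  LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c 24
}

# Usage: set_hako_key hako/index.php "$(random_key)"
```

Keep a copy of the generated password, since the bookmarklet needs the same value.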

Next, add the following bookmarklet to the Bookmarks toolbar of your browser (replace the IP address in it with the actual IP address of the machine running Hako, and secret with the string that matches the value of the $KEY variable in the hako/index.php file):


Now navigate to the page you want to archive, select the title, and click on the Hako bookmarklet. If the page has been archived successfully, you should see it in the list of saved pages.
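Since the bookmarklet merely sends an HTTP request, you can also trigger an archival action from the command line, which is handy for testing the setup. The sketch below is hypothetical: the query parameter names (url, title, key), the host, and the port are assumptions, so check the bookmarklet and index.php for the names Hako actually expects.

```shell
#!/usr/bin/env bash
# Percent-encode a string byte by byte, keeping only RFC 3986 unreserved characters.
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# Build a request URL for a Hako instance.
# HYPOTHETICAL parameter names: url, title, key.
# Usage: hako_request_url HOST:PORT PAGE_URL TITLE KEY
hako_request_url() {
  local host="$1" page_url="$2" title="$3" key="$4"
  printf 'http://%s/?url=%s&title=%s&key=%s\n' "$host" \
    "$(urlencode "$page_url")" "$(urlencode "$title")" "$(urlencode "$key")"
}

# Usage (hypothetical host and port):
#   curl -s "$(hako_request_url 192.168.1.10:8000 https://example.com/ 'Example Domain' secret)"
```

Encoding the page URL matters here: without it, any ? or & in the archived page's address would be mistaken for Hako's own query parameters.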

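If everything works properly, you might want to create a system service to start Hako automatically. Run the sudo nano /etc/systemd/system/hako.service command and add the following definition (replace /path/to/hako with the actual path to the hako directory, and 0.0.0.0:8000 with the address and port Hako should listen on):

[Unit]
Description=Hako
After=network.target

[Service]
ExecStart=/usr/bin/php -S 0.0.0.0:8000 -t /path/to/hako
ExecStop=/usr/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target
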
Enable and start the service:

sudo systemctl enable hako.service
sudo systemctl start hako.service

Keep in mind that Hako is a very simple tool with its fair share of shortcomings. It doesn't provide any feedback, so the only indication that a page has been archived successfully is the resulting HTML file. Anyone with the Hako bookmarklet (or basic knowledge of crafting HTTP requests) and the IP address of your Hako instance can archive pages on your server. The web UI just lists the saved files, and that's all. And since the saved pages are not password-protected, they are all publicly accessible. I run Hako on a local server that is not exposed to the outside world, and I manage saved pages using standard Linux tools. I recommend you do the same.
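As an illustration of the standard-tools approach, the sketch below searches the archived pages for a keyword and lists pages that haven't been modified for a given number of days. The directory argument is an assumption (point it at wherever your Hako instance stores its HTML files), and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL helpers for managing pages saved by Hako.

# Print saved pages whose content contains KEYWORD (case-insensitive).
# Usage: search_pages DIR KEYWORD
search_pages() {
  grep -ril --include='*.html' -- "$2" "$1"
}

# Print saved pages older than DAYS days, as find -mtime +DAYS counts them.
# Review the list before deleting anything.
# Usage: old_pages DIR DAYS
old_pages() {
  find "$1" -maxdepth 1 -type f -name '*.html' -mtime +"$2"
}
```

Once the list looks right, appending -delete to the same find command removes those files. This is safer than piping the list to xargs rm, because page titles used as file names often contain spaces.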


© Dmitri Popov