Piotr Migdał

If it is worth keeping, save it in Markdown

17 Feb 2025 | by Piotr Migdał

One of Stanisław Lem's stories, The Memoirs Found in a Bathtub, begins with a strange phenomenon that turns all written materials into dust. While this is science fiction, something similar happens in our digital world.

Digital memento mori

If you publish something online, sooner or later, it will vanish.1

In the best-case scenario, a link changes during website restructuring. More commonly, the content is lost. The only hope is that someone saved it from oblivion in the Internet Archive Wayback Machine.

Walled gardens requiring login are even worse - when they go down, everything within them vanishes forever. If you haven't saved it yourself, it's gone. Moreover, any service (free or paid) may restrict access to content at any time - either completely or practically, by making it impossible to find what you're looking for. The same content you posted on Twitter a few years ago, now is on X, and in a few years might be available after login, paid subscription, or - not at all .

Even self-hosting isn't foolproof - your content can vanish when you forget to pay for hosting or after a server crash. And even if your data survives, accessing it can be tricky: WordPress blogs store posts in databases that server updates can break. I learned this lesson when my PHP photo gallery went down - thankfully, I had kept all photos as simple JPGs organized by date.

The only reliable solution is to store content in formats that can be opened without specialized software - formats that will remain accessible for decades to come.

Why things are worth saving

There are many motivations for preserving content, ranging from a digital "non omnis moriar" through practical arguments, to archiving as a goal in itself2.

For me, the key reasons are:

  • I want to keep and own things I wrote - they are parts of me, my history, my lived experience
  • I want to have everything in one place and easily searchable
  • I want to use it with AI tools (looking for similar notes, summarizing, using as context)
  • I want to be able to reuse or share things however I want (email, blog post, ebook, anything)

Plaintext

As a data scientist, I turn things into vectors.
As an unabashed archivist, I turn things into Markdown.

The most durable solution would be carving things in stone - it would last for millennia. But that's hardly practical, and it wouldn't make things easily searchable or shareable.

The second best option is plaintext files with UTF-8 encoding and Markdown formatting3. As long as computers exist, we'll be able to read plaintext files with ease.

Markdown files are essentially plaintext with some extra syntax for common elements like sections, bullet points, and links. The format deliberately avoids precise control over display details like font selection4. Following the rule of least power, I consider this limitation a feature. For contrast, consider PDF - a format so powerful that it can run Doom.

For personal notes, I use Obsidian, a note-taking app I love and use daily. While it's a powerful tool with great plugins, what keeps me loyal is its simplicity - it stores everything in plain files. The lack of a proprietary format moat is precisely what makes it so compelling.

For blogging, most static site generators embrace Markdown. This very blog post is written in Markdown5. Using the same markup for note-taking and publishing makes sharing smooth.

How I do it

I dream of automatically converting everything I write or encounter into Markdown. The reality is messier - there's a constant tension between my autistic urge to archive everything and my ADHD that makes maintaining such systems challenging.

So I take a pragmatic approach - when I find content worth keeping, I copy it to a markdown file, adding frontmatter with its publication date, source, and relevant tags:

I particularly save things I post that might be useful later. Conference talk abstracts, sauna event descriptions, technical explanations - in the future, they're much easier to find and reuse.

When I catch myself searching for old content (like a Facebook post I want to share or reread), I save it immediately. If I discover a blog post has vanished, I retrieve it from the Wayback Machine and preserve it. When forwarding an email with a detailed explanation - you guessed it, I save it.

Content worth searching for once is content worth preserving forever.

Worried about saving too much? Well, disk storage is cheap - and for text files, it's practically free.

Tools that help

Sometimes manual copying suffices. For trickier formatting, AI tools are invaluable - being trained on Markdown, they excel at processing and extracting content. You can use them to convert online text or parse PDFs (like slides), as shown in Ingesting Millions of PDFs and why Gemini 2.0 Changes Everything.

For some sources, I've created semi-automated solutions. For instance, I wrote a Python script to convert my Kindle highlights and notes into Markdown.

Many tools exist to help with format conversion. The most versatile is pandoc, which can convert between dozens of formats - from Word documents to LaTeX, and everything in between.

The community has also created specialized tools for specific platforms. You can find tools for converting Medium posts to Markdown (either from export or directly by URL), archiving Reddit threads, and many other use cases.

Since we're dealing with lightweight text files, there are many for backing it up. Git is particularly well-suited for version-controlling and syncing this content.

Additionally, in each service I own, I periodically download my data. Even if it's a mesh of JSON, XML, HTML, CSV and other formats, I have it. Even if at a given moment I have no time to process it into Markdown, at least the data is there.

Next steps

I would love to have a comprehensive tool for exporting everything - especially from social media. Both the posts that resonated with many people and those that hold personal significance deserve preservation.

While Facebook offers limited data export capabilities, they're incomplete. Most notably, there's no way to preserve entire discussion threads - often the most valuable part of a post.

And you - what content do you find yourself searching for? What have you archived, and what do you wish you had saved?

Footnotes

  1. But for practical reasons, and hoarding for its own sake, I fathered over 14k links in Pinboard. Yes, downloaded data in JSON.
  2. I don't claim Markdown is the only solution. There are valid reasons to use other formats. My focus is on plaintext in UTF-8. If you prefer other markup languages (like reStructuredText, AsciiDoc, Org-Mode) or just plain text without formatting - the principles still apply. In some cases original format works - e.g. if it is JSON or code.
  3. Consider HTML (Hypertext Markup Language) as a counterexample. It was meant to enrich text with semantics, but now serves primarily as a tool for building UIs. While this evolution brought many benefits, HTML is no longer suitable for pure content storage.
  4. This blog uses Nuxt 3 Content (source: github.com/stared/stared.github.io). It follows my previous versions in Jekyll and Gridsome. Thanks to Markdown, migration between platforms has been seamless - see New blog - moving from Medium to Gridsome. For the latest migration from Gridsome to Nuxt 3 Content, Cursor IDE was a great help. Astro is another static site generator gaining significant traction.