Migrating from Ghost to Hugo

So I recently migrated this site from Ghost to Hugo after reading a nice article about Hugo in Linux Voice #20 (funnily enough, the same issue also features an article about Ghost). I originally made the switch from Jekyll to Ghost back in 2014 or so, mainly because I could not find a good theme to use. Ghost also seemed to have a lot of cool features, and it’s fun to try new things.

I think it’s safe to say that I am hardly a prolific blogger. I mainly write about things which I cannot find on the web but think should exist, because I will likely need them myself sometime in the future. So it’s hardly a surprise that I am not in the target audience for Ghost.

Things about Ghost which annoy me

  • It’s written in Node.js: people who think JS is a good server language also tend to think that it’s a good idea to depend on just about any package and download it in every single build, which becomes really funny sometimes.
  • Poor selection of themes — this is subjective of course, but it seems to me that the free options don’t have much in terms of diversity. Heck, they even call it a marketplace which rubs me the wrong way.
  • Themes end up being quite reliant on JS if you want necessary features like syntax highlighting on code snippets — I often browse with JS disabled and should be able to view my own site.
  • Markdown parser treats newlines as significant, meaning you can’t hard-wrap paragraphs in your editor without getting line breaks in the output.

That last point irritates me deeply but it’s not as bad as the next point.

  • You can effectively lock an account by entering the wrong password 3 times.

This requires some explanation. Ghost really targets teams of bloggers, so it naturally has an account system much like WordPress. As I was surveying the security status of the other services I run, I wondered how Ghost handles someone trying to brute-force an account, and decided to simply try it out. Type the wrong password once too often, and this happens:

Ghost: typing the wrong password too many times locks your account

It doesn’t lock it for a single IP address (I tried from several); it locks the entire account. Effectively, someone can set up a script to keep trying an account indefinitely, simply with the intention of blocking the owner from logging in.

The log doesn’t even show login attempts, so there is no way to implement sensible blocking strategies using something like fail2ban.
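To make the difference concrete, here is a minimal sketch of the kind of throttling I would have expected: failures are counted per (account, IP) pair and written to a log. This is not Ghost's code. The class name, the threshold, the window and the log format are all assumptions made up for illustration.

```python
import logging
import time
from collections import defaultdict

MAX_ATTEMPTS = 3        # hypothetical threshold
WINDOW_SECONDS = 3600   # hypothetical window in which failures count

log = logging.getLogger("auth")


class LoginThrottle:
    """Count failed logins per (account, client IP) pair."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._failures = defaultdict(list)  # (account, ip) -> timestamps

    def _recent(self, key):
        # Forget failures that are older than the window
        cutoff = self._clock() - WINDOW_SECONDS
        self._failures[key] = [t for t in self._failures[key] if t > cutoff]
        return self._failures[key]

    def record_failure(self, account, ip):
        self._recent((account, ip)).append(self._clock())
        # One log line per failed attempt gives a log watcher something to match
        log.warning("Failed login for %s from %s", account, ip)

    def is_blocked(self, account, ip):
        return len(self._recent((account, ip))) >= MAX_ATTEMPTS


throttle = LoginThrottle()
for _ in range(MAX_ATTEMPTS):
    throttle.record_failure("admin", "203.0.113.7")

print(throttle.is_blocked("admin", "203.0.113.7"))   # the attacker's address
print(throttle.is_blocked("admin", "198.51.100.2"))  # the owner's address
```

With this keying, the attacker's address ends up blocked while the owner can still log in from elsewhere.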

The whole thing left a bad taste in my mouth, so it was very suitable timing to read an article on Hugo.

Things about Hugo which excite me

  • Markdown parser treats newlines correctly
  • It’s a static site generator and not a service — this meant 100MB (10%) of RAM became available on my server and there is no account to hack (or block).
  • Supports everything Ghost does (that I am aware of).
  • The simplicity of Hugo makes it quite painless to do useful things yourself, whereas feature requests for the same things in Ghost get ignored.
  • Can do server side syntax highlighting using Pygments.
  • Some really nice themes are available, and they are all free.

Migrating all data from Ghost

Migrating from Ghost also turned out to be really painless. There were several scripts around for exactly this, but they all turned out to be written in odd languages and did not actually migrate all the metadata in Ghost. So I wrote my own in Python with these killer features:

  • Migrates tags.
  • Migrates dates.
  • Migrates drafts as drafts.
  • Creates aliases in your posts which makes sure that old permalinks will still work!
  • Migrates cover pictures as banner images, just select a theme which supports them.
  • Rewrites all relative links so they all still work (this includes images).
  • Code blocks with language definitions like ```language-java are changed to ```java.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
'''
A simple program which migrates an exported Ghost blog to Hugo.
It assumes your blog is using the hugo-icarus theme, but should
work for any theme. The script will migrate your posts, including
tags and banner images. Furthermore, it will make sure that
all your old post urls will keep working by adding aliases to them.

The only thing you need to do yourself is copying the `images/`
directory in your ghost directory to `static/images/` in your hugo
directory. That way, all images will work. The script will rewrite
all urls linking to `/content/images` to just `/images`.
'''

import argparse
import json
from datetime import date
from os import path
from collections import defaultdict
import re

_post = '''
+++
date = "{date}"
draft = {draft}
title = """{title}"""
slug = "{slug}"
tags = {tags}
banner = "{banner}"
aliases = {aliases}
+++

{markdown}
'''


def migrate(filepath, hugodir):
    '''
    Parse the Ghost json file and write post files
    '''
    # Read the export explicitly as UTF-8, independent of the system locale
    with open(filepath, "r", encoding="utf-8") as fp:
        ghost = json.load(fp)

    data = ghost['db'][0]['data']

    tags = {}
    for tag in data["tags"]:
        tags[tag["id"]] = tag["name"]

    posttags = defaultdict(list)

    for posttag in data["posts_tags"]:
        posttags[posttag["post_id"]].append(tags[posttag["tag_id"]])

    for post in data['posts']:
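        draft = "true" if post["status"] == "draft" else "false"
        # created_at is in milliseconds, fromtimestamp() wants seconds
        ts = int(post["created_at"]) / 1000

        # Posts without a cover picture get an empty banner
        banner = post.get("image") or ""
        # Strip everything up to and including /content/ so that the
        # banner points at /images/... in the Hugo static directory
        banner = re.sub(r"^.*/content[s]?/", "/", banner)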

        target = path.join(hugodir, "content/post",
                           "{}.md".format(post["slug"]))

        aliases = ["/{}/".format(post["slug"])]

        print("Migrating '{}' to {}".format(post["title"],
                                          target))

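        # json.dumps yields double-quoted, properly escaped arrays,
        # which are also valid TOML arrays of strings
        hugopost = _post.format(markdown=post["markdown"],
                                title=post["title"],
                                draft=draft,
                                slug=post["slug"],
                                date=date.fromtimestamp(ts).isoformat(),
                                tags=json.dumps(posttags[post["id"]],
                                                ensure_ascii=False),
                                banner=banner,
                                aliases=json.dumps(aliases))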

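        # Ghost marks code blocks as ```language-java, Hugo wants ```java
        hugopost = hugopost.replace("```language-", "```")
        # Images are served from /images/ after copying them to
        # static/images/, so drop the /content prefix from image urls.
        # (The former "^.*/content[s]?/" regex never matched here: without
        # re.MULTILINE it only looks at the first, empty, line.)
        hugopost = hugopost.replace("/content/images/", "/images/")

        # The content/post directory must already exist in the Hugo site
        with open(target, 'w', encoding="utf-8") as fp:
            print(hugopost, file=fp)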


def main():
    parser = argparse.ArgumentParser(
        description="Migrate an exported Ghost blog to Hugo")
    req = parser.add_argument_group(title="required arguments")
    req.add_argument("-f", "--file", help="JSON file exported from Ghost",
                     required=True)
    req.add_argument("-d", "--dir", help="Directory (root) of Hugo site",
                     required=True)

    args = parser.parse_args()

    migrate(args.file, args.dir)


if __name__ == "__main__":
    main()
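
To try the script, save it under any name you like (say `ghost2hugo.py`, a name made up here) and pass the export with `-f` and the Hugo root with `-d`. The sketch below shows the parts of the export the script actually reads, and the two text rewrites from the feature list, in isolation. The sample data is invented and heavily trimmed, and the helper names `tags_by_post` and `rewrite_markdown` exist only in this example. A real export has many more fields.

```python
import json
from collections import defaultdict

# Invented, trimmed-down export with only the keys the tag join needs
export = {
    "db": [{
        "data": {
            "posts": [{"id": 1, "slug": "hello-world", "status": "published"}],
            "tags": [{"id": 10, "name": "hugo"}, {"id": 11, "name": "ghost"}],
            "posts_tags": [{"post_id": 1, "tag_id": 10},
                           {"post_id": 1, "tag_id": 11}],
        }
    }]
}

FENCE = "`" * 3  # three backticks, built this way to keep this block readable


def tags_by_post(data):
    """Join posts_tags against tags, giving {post_id: [tag names]}."""
    names = {tag["id"]: tag["name"] for tag in data["tags"]}
    result = defaultdict(list)
    for link in data["posts_tags"]:
        result[link["post_id"]].append(names[link["tag_id"]])
    return result


def rewrite_markdown(markdown):
    """Apply the code fence and image url rewrites to a post body."""
    # A fence followed by language-java becomes a fence followed by java
    markdown = markdown.replace(FENCE + "language-", FENCE)
    # /content/images/x.png becomes /images/x.png
    return markdown.replace("/content/images/", "/images/")


data = export["db"][0]["data"]
print(json.dumps(tags_by_post(data)[1]))
# prints ["hugo", "ghost"]
print(rewrite_markdown("![pic](/content/images/2015/cat.png)"))
# prints ![pic](/images/2015/cat.png)
```

If your export is shaped differently (newer Ghost versions may well have changed it), the tag join is the first place to look.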

Next post, I might write about what changes I made to the theme, and some nifty Nginx tricks you can use to stay compatible with old links.

