Migrating from Medium

Posted in Engineering

Photo by Jess Bailey on Unsplash

After many years of writing on Medium, I decided to move my blog to my own website. I want to create some interactive content in the future, and that isn’t possible on Medium. Medium has also started putting a lot of its content behind a paywall, which I don’t really like. While I understand that they need to make money, it just doesn’t sit well with me.

So I exported all of my posts from Medium using the download tool they provided. It’s a zip file containing all of my posts in HTML format.

I am using Nuxt as the framework for this website, so I needed to convert all of the HTML files into Markdown. I started by manually copying and pasting the content of each HTML file into an online service that converts HTML to Markdown. It was a tedious process, and I didn’t want to repeat it for 40+ posts.

I then decided to write a script to automate the process.

Automating HTML to Markdown

I use Python with Jupyter Notebook on a daily basis for my scripting needs. While I don’t really like Python, Jupyter Notebook is a great tool for prototyping and experimenting with scripts.

The exported HTML was not in a good format. It was inconsistent and provided little structure to parse, so I decided to convert each file as-is, with as little processing as possible.

I used the markdownify library to convert the HTML into Markdown.

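import os

import markdownify

# collect the exported HTML files from the in folder
files = [name for name in os.listdir('in') if name.endswith('.html')]
os.makedirs('out', exist_ok=True)

for file in files:
    # read the exported HTML
    with open(f'in/{file}', 'r', encoding='utf-8') as f:
        text = f.read()
    # convert to Markdown with ATX-style (#) headings
    md = markdownify.markdownify(text, heading_style='ATX')
    # filenames look like YYYY-MM-DD_title.html; drop the extension first
    name = os.path.splitext(file)[0]
    date = name.split('_')[0]
    file_without_date = '-'.join(name.split('_')[1:])
    # write placeholder frontmatter followed by the converted body
    with open(f'out/{file_without_date}.md', 'w', encoding='utf-8') as f:
        f.write(
            f'---\n'
            f'title:\n'
            f'description:\n'
            f'category: Uncategorized\n'
            f'thumbnail:\n'
            f'date: {date}\n'
            f'---\n'
        )
        f.write(md)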

I moved all of my exported HTML files into the in folder of my Jupyter Notebook project. The script reads every file in the in folder, converts it to Markdown, and saves the result in the out folder.

Each exported HTML file has a filename in the format YYYY-MM-DD_title.html. I removed the date from the filename and used it as the date in the frontmatter, along with placeholders for the title, description, and thumbnail.
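To make the filename handling concrete, here is a minimal sketch of the split as a standalone function. The function name split_export_name and the sample filename are made up for illustration, and the date check is an extra step that the script above does not perform.

```python
from datetime import date
from pathlib import Path


def split_export_name(filename: str) -> tuple[str, str]:
    """Split 'YYYY-MM-DD_title.html' into (date, slug). Hypothetical helper."""
    stem = Path(filename).stem                 # drop the .html extension
    date_part, _, rest = stem.partition('_')   # split at the first underscore
    date.fromisoformat(date_part)              # raises ValueError if the prefix is not a date
    slug = rest.replace('_', '-')              # same effect as '-'.join(...) in the script
    return date_part, slug


# Sample filename invented for this example
print(split_export_name('2019-03-14_my-first-post.html'))
# ('2019-03-14', 'my-first-post')
```

Validating the prefix means a file that does not start with a date fails loudly instead of ending up with a wrong date in its frontmatter.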

The Manual Stuff

I copied the generated Markdown files into the content/blog folder and started to manually edit the frontmatter.

I could have tried parsing the HTML to get the title, description, and thumbnail, but it wasn’t worth the effort. I would still have needed to edit the Markdown files by hand anyway: adding the <!--more--> tag, fixing the image links, and cleaning up some formatting.
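For anyone who does want to automate part of this, pulling the title could look roughly like the sketch below. It uses only Python’s standard library and assumes each exported file contains a <title> element, which I have not checked across every export. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'title' and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == 'title' and self._in_title:
            self._in_title = False
            self._done = True   # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return ''.join(self._parts).strip()


def extract_title(html: str) -> str:
    """Return the <title> text, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title


# Sample HTML invented for this example
sample = '<html><head><title>Migrating from Medium</title></head><body></body></html>'
print(extract_title(sample))  # Migrating from Medium
```

The description and thumbnail would need similar handling, and with inconsistent HTML each one is another special case, which is why I stuck with manual editing.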

The <!--more--> tag is used to indicate the part of the post that will be shown as an excerpt on the blog page.
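As a rough illustration of the idea only, splitting a post at the marker could be sketched like this in Python. This is not how the website’s framework implements excerpts; the function name and the sample post are made up.

```python
MORE_MARKER = '<!--more-->'


def split_excerpt(markdown: str) -> tuple[str, str]:
    """Return (excerpt, rest). The excerpt is empty when the marker is missing."""
    excerpt, marker, rest = markdown.partition(MORE_MARKER)
    if not marker:
        # No marker found: treat the whole post as body text
        return '', markdown
    return excerpt.strip(), rest.strip()


# Sample post invented for this example
post = 'Intro paragraph.\n\n<!--more-->\n\nThe rest of the post.'
excerpt, rest = split_excerpt(post)
print(excerpt)  # Intro paragraph.
print(rest)     # The rest of the post.
```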

Conclusion

I’m glad that I finally finished migrating all of my posts from Medium. It was a straightforward process, but it still required some manual work.

Now that I have finished the migration, it’s time to write some new posts.