Migrating from Medium
Photo by Jess Bailey on Unsplash
After many years of writing on Medium, I decided to move my blog to my own website. I want to create some interactive content in the future, and that isn’t possible on Medium. Medium has also started putting a lot of its content behind a paywall, which I don’t really like. While I understand that they need to make money, it just doesn’t sit well with me.
So I exported all of my posts from Medium using the download tool they provided. It’s a zip file containing all of my posts in HTML format.
I am using Nuxt as the framework for this website, so I needed to convert all of the HTML files into Markdown. I started by manually copying and pasting the content from each HTML file into an online service that converts HTML to Markdown. It was a tedious process, and I didn’t want to repeat it for 40+ posts.
I then decided to write a script to automate the process.
Automating HTML to Markdown
I use Python with Jupyter Notebook on a daily basis for my scripting needs. While I don’t really like Python, Jupyter Notebook is a great tool for prototyping and experimenting with scripts.
The exported HTML was not in a good format. It wasn’t consistent and provided little structure for parsing, so I decided to simply convert it as raw as possible.
I used the markdownify library to convert the HTML into Markdown.
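    import os
    import markdownify

    # every exported HTML file lives in the in/ folder
    files = os.listdir('in')
    os.makedirs('out', exist_ok=True)

    for file in files:
        # read file
        with open(f'in/{file}', 'r', encoding='utf-8') as f:
            text = f.read()
        # convert to markdown
        md = markdownify.markdownify(text, heading_style='ATX')
        # filename looks like YYYY-MM-DD_title.html; drop the extension first
        name = os.path.splitext(file)[0]
        date = name.split('_')[0]
        file_without_date = '-'.join(name.split('_')[1:])
        # write the frontmatter placeholders, then the converted body
        with open(f'out/{file_without_date}.md', 'w', encoding='utf-8') as f:
            f.write(
                '---\n'
                'title:\n'
                'description:\n'
                'category: Uncategorized\n'
                'thumbnail:\n'
                f'date: {date}\n'
                '---\n'
            )
            f.write(md)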
I moved all of my exported HTML files into the `in` folder of my Jupyter Notebook project. The script reads every file in the `in` folder and converts it into Markdown. The Markdown files are saved in the `out` folder.
The exported HTML files have filenames in the format `YYYY-MM-DD_title.html`. I removed the date from the filename and used it as the date in the frontmatter, along with placeholders for the title, description, and thumbnail.
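To make the filename handling concrete, here is a minimal sketch of that step on its own. The helper names `split_export_filename` and `build_frontmatter` and the sample filename are made up for illustration, and stripping the `.html` extension from the output name is an assumption about the desired result.

```python
import os


def split_export_filename(filename):
    """Split a name like 'YYYY-MM-DD_title.html' into (date, slug)."""
    # Drop the extension so it does not leak into the output name (assumption).
    name, _ext = os.path.splitext(filename)
    # Everything before the first underscore is the date.
    date, _sep, rest = name.partition('_')
    # Any remaining underscores in the title become hyphens.
    slug = rest.replace('_', '-')
    return date, slug


def build_frontmatter(date):
    """Return the placeholder frontmatter block for one post."""
    return (
        '---\n'
        'title:\n'
        'description:\n'
        'category: Uncategorized\n'
        'thumbnail:\n'
        f'date: {date}\n'
        '---\n'
    )


# Hypothetical filename, only for illustration.
date, slug = split_export_filename('2020-01-15_my_first-post.html')
print(date, slug)
print(build_frontmatter(date))
```

Keeping the parsing in a small function makes it easy to try a handful of filenames in a notebook cell before running the whole batch.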
The Manual Stuff
I copied the generated Markdown files into the `content/blog` folder and started to manually edit the frontmatter.
I could have tried parsing the HTML to get the title, description, and thumbnail, but it wasn’t worth the effort: I would still need to edit the Markdown files anyway, for example to add the `<!--more-->` tag, fix the image links, and clean up some formatting.
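For anyone who does want to automate the title, a rough sketch using only Python's standard library could look like the one below. It assumes the exported HTML carries the post title in a `<title>` element, which I did not verify across all of my posts. `TitleParser` and `extract_title` are hypothetical names, not part of the script above.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ''

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts.
        if tag == 'title' and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == 'title' and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the <title> text, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


# Made-up HTML, only for illustration.
sample = '<html><head><title>Hello World</title></head><body><h1>Hi</h1></body></html>'
print(extract_title(sample))
```

The description and thumbnail would need similar handling, and that is where the inconsistent export format makes this more work than editing by hand.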
The `<!--more-->` tag indicates which part of the post is shown as an excerpt on the blog page.
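As an illustration of the idea only, here is how a marker like this can split a post into an excerpt and the rest. This is not how the site actually does it internally; `split_excerpt` is a hypothetical helper, and it assumes everything before the marker is the excerpt.

```python
MORE_TAG = '<!--more-->'


def split_excerpt(markdown_body):
    """Split a post body at the first <!--more--> marker.

    Returns (excerpt, rest). If the marker is missing, the excerpt
    is None and the whole body is returned as the rest.
    """
    excerpt, found, rest = markdown_body.partition(MORE_TAG)
    if not found:
        return None, markdown_body.strip()
    return excerpt.strip(), rest.strip()


# Made-up post body, only for illustration.
post = 'A short intro paragraph.\n\n<!--more-->\n\nThe rest of the post.'
excerpt, rest = split_excerpt(post)
print(excerpt)
print(rest)
```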
Conclusion
I’m glad that I finally finished migrating all of my posts from Medium. It was a straightforward process, but it still required some manual work.
Now that I have finished the migration, it’s time to write some new posts.