Migrating 13,000 Comments from Drupal to Hugo

Jan 20, 2026

After 16 years on the LAMP stack, I finished migrating this website from Drupal to Hugo a few weeks ago.

What's old is new, as this blog was originally built with Thingamablog, a Java-based Static Site Generator (SSG) I ran on my Mac to generate HTML and FTP it up to my first webserver (over 20 years ago!).

The main reason I moved from an SSG to Drupal was to add comments. I wanted my blog to have the same level of interactivity I had pre-Thingamablog, when I was (briefly) on Xanga.com.

For many years, Drupal comments were fine.

But over time...

2009: I finished manually migrating my old Thingamablog site into Drupal.
2011: Spam became more prevalent, so for use on both JeffGeerling.com and Flocknote, I built the Honeypot spam prevention module, which grew to become one of the top 50 installed projects across all Drupal sites.
2020: I live-streamed the entire Drupal 7 to 8 migration, a process which spanned 16 streams and ultimately motivated a career shift to my YouTube channel.
2022: After dealing with three major DDoS attacks, I started thinking about using an SSG to make it easier to stave off such attacks. Drupal has many caching mechanisms (which I've written about frequently), and can scale quite well—but it's easier to not have any backend attack surface.

And that brings us to 2026: the blog is running on Hugo, and I just finished migrating 13,189 comments across 1,119 Drupal posts.

JeffGeerling.com - Remark42 comments example

Since the process isn't documented elsewhere, and since I hadn't heard of the comment system I'm now using (Remark42) until I got serious about a static site migration, I figured I'd write it up here.

LLMs as coding assistants

After the last major migration from an SSG into Drupal, I noted:

As a sad side-effect, all the blog comments are gone. Forever. Wiped out. But have no fear, we can start new discussions on many new posts! I archived all the comments from the old 'Thingamablog' version of the blog, but can't repost them here (at least, not with my time constraints... it would just take a nice import script, but I don't have the time for that now).

That would still be the case today, were it not for my desire to test out local LLMs to assist with the migration. I'd label myself an 'AI skeptic', but I admit it's impressive how well LLMs achieve certain tasks, especially if you treat them like junior devs on a small team: break down work into reasonably sized tasks and review the work in stages (checking code into a VCS), as you would if you were a technical architect.

I've had experience working with a number of teams, and I'd say the two models I was using on my Mac (GPT-OSS 20B and Qwen3 Coder 30B, via Ollama) are on the lower-to-midrange end of dev teams I've worked with. "Frontier" models might be better than that, but they still don't solve all the issues prevalent in computer science!

Nota bene: not one word of this blog post (nor any post on this blog, either in the past or in the future) was written by, or with the assistance of, an LLM—and yes, I use em-dashes, which are easy to type on a Mac (⇧ + ⌥ + -). Sosumi!

As a technical architect, I encountered:

Missed requirements: Sometimes this was my own fault, but often it was a sign of a feature that was missing something important. The code would implement a feature, but lack one or two of the important bits required to get it across the line for stakeholder approval. Sometimes it was something nobody considered, but was obvious in hindsight.
Working, but suboptimal implementations: After building at least a few hundred Drupal sites, I learned design patterns that lead to either unmaintainable disasters or efficient, maintainable sites. The more junior the developer, the more often I'd spend time with them trying to guide approaches down the less rocky path.
Premature optimizations: The paradoxical flip-side is spending too much time perfecting a feature. Sometimes code will only be run one time in a migration, and it'll be irrelevant beyond that. So don't spend hours optimizing its Big O to shave 3 minutes off a 7-hour process!
Burnout: Seeing patterns that lead to burnout both for myself and other devs, I tried to help project managers lighten a load or go easier on devs in the thick of it. Sometimes it was just a matter of taking a task off that developer's back, other times pulling a feature and reworking the requirements.

The LLMs I enlisted for help seemed to hit all four of these things, at various times (yes, even 'burnout', as their context windows would grow too large for my meager Mac mini, and I'd reset and start from a fresh angle).

The big difference? I could supply a small set of requirements¹, and within 1-2 minutes, I would have code that runs. Maybe not code that works, but it would be in close proximity to the code that meets all my requirements.

If I were assigning the same tasks to a small dev team, I wouldn't expect the first code back for review for at least a day. Maybe two. And probably a full sprint (e.g. 2 weeks) before we'd have a solution ready for QA testing.

With some initial success in getting the code I needed (coding was only about half this project), I was a little troubled:

I was able to finish this entire comment migration in a few evenings.

Being able to do that felt great, sure. But the fact 'senior' developers can be similarly productive, without the useful work of mentoring junior devs through this process, worries me.

AI/LLMs (even the best 'frontier' models) are not, and I believe never will be, good at the other 80% of work involved in a content migration.

The best projects—the ones that don't go over budget and timeline—require technical and project management from people who ran the gauntlet as beginners.

We need people who've brought down the entire site with a bad query or migration step. We need people who've had to withstand the ire of an angry sysadmin on a weekend night after their Friday deployment wiped out a database...

You don't get that for free.

With AI/LLMs, and without the mentorship aspect, you end up with two types of developers:

Expert beginners: Junior devs who feel like they can achieve anything with AI coding tools. (But they don't see the enormous footguns lurking in their code.)
Lone Wolf Developers: Devs who did go through the wringer earlier in the pre-AI era, and have the tools to play LLMs like an orchestra, building decent software fast (and alone). And who now have no excuse to work on teams with junior devs and be the curmudgeons² they were meant to be.

There's less of a path from #1 to #2 now. And that's even assuming you should strive to become a #2. I'd argue we need 'middle class' developers: devs who want to earn a living, clock in and clock out, and build software that helps the world run.

These developers also benefit from the mentorship (and sometimes consternation) they'd traditionally get early in their careers.

Sycophant LLMs are not a substitute for senior devs.

And they're also about the exact opposite of what you'd want for QA³.

ANYWAY, I went off on a bit of a tangent there. Sorry for waxing philosophical about the state of AI coding today.

Why Remark42

My requirements for a commenting system were:

Able to handle thousands of blog posts, and tens of thousands of comments, with threading and some form of moderation.
Must be self-hosted, relying on zero 3rd party APIs or websites (no Disqus, no giscus).
Must allow anonymous (or at most, email-based) comments—no 3rd party signin required.
Some form of spam mitigation.
Can import all my old Drupal comments.

Remark42 was one of two static-site comment systems I evaluated that met those requirements. The other one was Meh, by GitHub user splitbrain. Remark42 won out based on its history: it's been maintained for nearly a decade, versus one year for Meh.

Remark42 was:

Easy to get running quickly with Docker
Fast (responses under 1 ms locally)
API-driven, so I know I can get data in and out easily

Remark42 Setup

I wrote up all the details of my comment migration on GitHub, but I'll give the quick rundown here:

In Hugo, I created this comments.html partial with the remark_config embedded for the frontend.
I built comments.jeffgeerling.com on a DigitalOcean VPS, used Ansible to configure security settings and install Docker, and also to manage Remark42's Docker Compose environment.
For spam prevention and DDoS protection, I put the server behind Cloudflare. I also have Fail2Ban running, and DigitalOcean firewall rules locking down the VPS even further.
For email debugging, I configured Mailpit in my Remark42 Docker Compose configuration as a 'dev' profile option. When I run Remark42 locally, I use the command docker compose --env-file .env.dev --profile dev up, which also loads in a set of environment variables (including a local SMTP configuration) stored in .env.dev.

I use Amazon Simple Email Service (SES) for email notifications on the public server. It's cheaper than other options like Mailgun, and I was already familiar with it. One quirk with SES is that it takes at least 12-24 hours to get fully approved, and the setup process is slightly more onerous than other email providers⁴.

I stuck with email for notifications since it's ubiquitous, and I imagine it'll be around far beyond other notification services' useful lives.

Implementation Quirks

As with all software, deploying Remark42 wasn't a perfect process. I ran into a number of quirks. None were showstoppers, but I do hope to see a few of these resolved:

Spam prevention

Remark42 doesn't have an 'approve before publication' option, which is how I moderated comments on my Drupal site. Requiring explicit approval discourages bad actors who spam out dozens of comments in a short time.
There's no integrated spam prevention mechanism besides a basic 'honeypot-style' field. On my Drupal site, I used CleanTalk, but Akismet is another popular option. I'm following this issue about backend spam filtering.
There isn't a global admin UI, with an overview of all comments.
- There's an issue for premoderation, and an open PR, but without a global UI, it would still be annoying to manage things on days with many comments.

Display issues

Remark42 comes with stylesheets for light and dark mode, but it doesn't set them automatically. So I'm using a JS workaround for automatic light/dark mode.
I might disable user avatars, but I couldn't find an efficient way to do that in Remark42, outside of hiding them in a template or with CSS. It seems like the Gravatar integration would still run and cache avatars regardless. So I opened "Allow disabling avatar functionality?"

Getting comments out of Drupal

Remark42 comes with importers for Disqus, WordPress, and Commento. Because Drupal's built-in commenting system is conceptually similar to WordPress's, I built a Python script to export Drupal comments in the same XML format as a WordPress export.
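This is not the actual export script used for this migration, just a minimal sketch of the idea. It assumes the WordPress WXR layout (one item per post with a link, plus wp:comment children carrying comment_id, comment_parent, and so on), the WXR 1.2 namespace URI, WordPress-style "YYYY-MM-DD HH:MM:SS" dates, and a hypothetical in-memory posts structure. I haven't verified exactly which WXR elements Remark42's importer requires, so compare the output against a real WordPress export before relying on it.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as used by WordPress WXR 1.2 exports (assumption: verify
# against a real WordPress export file).
NS = {
    "wp": "http://wordpress.org/export/1.2/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def wp(tag):
    """Return a fully-qualified tag name in the wp: namespace."""
    return f"{{{NS['wp']}}}{tag}"


def build_wxr(posts):
    """Build a minimal WXR document as a string.

    posts is a hypothetical structure: a list of dicts with "title",
    "link", and "comments" (each comment a dict with id, parent, author,
    email, date, and body).
    """
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["link"]
        for c in post["comments"]:
            comment = ET.SubElement(item, wp("comment"))
            ET.SubElement(comment, wp("comment_id")).text = str(c["id"])
            ET.SubElement(comment, wp("comment_author")).text = c["author"]
            ET.SubElement(comment, wp("comment_author_email")).text = c["email"]
            ET.SubElement(comment, wp("comment_date")).text = c["date"]
            ET.SubElement(comment, wp("comment_content")).text = c["body"]
            ET.SubElement(comment, wp("comment_approved")).text = "1"
            # WordPress uses 0 for top-level comments; otherwise the parent's id.
            ET.SubElement(comment, wp("comment_parent")).text = str(c["parent"])
    return ET.tostring(rss, encoding="unicode")
```

Threading survives the round trip only if comment_parent points at a comment_id that exists in the same item, so keeping the original Drupal comment IDs as WXR IDs is the simplest way to preserve reply chains.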

I briefly considered migrating straight from Drupal into the Bolt (bbolt) key-value database Remark42 uses. But because of the lack of familiar tooling around it (like Sequel Ace for MariaDB or Base for SQLite), I decided to stick with the Drupal -> WordPress -> Remark42 option.

Using GPT-OSS 20B and Qwen3 30B A3B, I got a good start on the export script, but I did spend time tweaking the SQL and fiddling with the XML structure, since the AI models missed the finer details.
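For illustration, a query that pulls comments with author details and parent information, ordered by node, might look like the sketch below. It assumes the standard Drupal 8+ comment tables (comment_field_data for metadata, comment__comment_body for the text) and only published comments on nodes; a real site would likely also filter on language and deleted fields. Drupal usually runs on MySQL/MariaDB, but the SQL here is generic, so it can be tried against a tiny in-memory SQLite stand-in with made-up rows. The helper names fetch_comments and make_sample_db are hypothetical.

```python
import sqlite3

# Column names follow the standard Drupal 8+ comment tables (assumption:
# check against your own schema before running this on a real database).
COMMENTS_QUERY = """
SELECT c.cid,
       COALESCE(c.pid, 0) AS pid,      -- top-level comments have no parent
       c.entity_id AS nid,             -- the node the comment is attached to
       c.name, c.mail, c.created,
       b.comment_body_value AS body
FROM comment_field_data AS c
JOIN comment__comment_body AS b ON b.entity_id = c.cid
WHERE c.entity_type = 'node' AND c.status = 1   -- published comments only
ORDER BY c.entity_id, c.cid
"""


def fetch_comments(conn):
    """Run the query and return each row as a plain dict."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(COMMENTS_QUERY)]


def make_sample_db():
    """Build a tiny in-memory stand-in for the Drupal tables (sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE comment_field_data (
            cid INTEGER PRIMARY KEY, pid INTEGER, entity_id INTEGER,
            entity_type TEXT, name TEXT, mail TEXT, created INTEGER,
            status INTEGER);
        CREATE TABLE comment__comment_body (
            entity_id INTEGER, comment_body_value TEXT);
        INSERT INTO comment_field_data VALUES
            (1, NULL, 20, 'node', 'alice', 'a@example.com', 1600000000, 1),
            (2, 1,    20, 'node', 'bob',   'b@example.com', 1600000100, 1),
            (3, NULL, 10, 'node', 'carol', 'c@example.com', 1600000200, 1),
            (4, NULL, 10, 'node', 'spam',  's@example.com', 1600000300, 0);
        INSERT INTO comment__comment_body VALUES
            (1, 'First!'), (2, 'Reply to alice'), (3, 'Hello'), (4, 'Buy now');
    """)
    return conn
```

COALESCE(c.pid, 0) maps top-level comments to a parent of 0, which is the same convention WordPress exports use for comment_parent, so the rows can be passed to an XML writer without further translation.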

I built a local environment for testing on the Hugo site, and built a little configuration toggle in my hugo.toml file so I could enable or disable comments site-wide very quickly:

[params]
  ...
  commentsGlobalEnable = true # set to 'true' to enable Remark42 comments.

I then use the conditional {{ if .Site.Params.commentsGlobalEnable }} in my comments.html partial template, to either display the Remark42 embed, or a 'Comments disabled' message.

I spent a couple of hours testing and re-testing the entire migration, spot-checking a number of posts with different features (many comments, no comments, deeply-threaded comments, etc.).
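Choosing which posts to spot-check can be scripted. Here's a small sketch, assuming the comments are available as hypothetical (nid, cid, pid) tuples with pid 0 for top-level comments. It finds the post with the most comments, the posts with none, and the post containing the deepest reply chain.

```python
from collections import Counter


def spot_check_candidates(comments, all_nids):
    """Pick posts worth checking by hand after an import.

    comments is a list of (nid, cid, pid) tuples, with pid == 0 for
    top-level comments; all_nids lists every post id. Both shapes are
    hypothetical, chosen for this example.
    """
    counts = Counter(nid for nid, _, _ in comments)
    parent_of = {cid: pid for _, cid, pid in comments}
    post_of = {cid: nid for nid, cid, _ in comments}

    def depth(cid):
        # Walk up the parent chain; a top-level comment has depth 1.
        # A parent missing from the data (e.g. unpublished) ends the walk.
        d = 1
        while parent_of.get(cid, 0) != 0:
            cid = parent_of[cid]
            d += 1
        return d

    deepest = max(parent_of, key=depth) if parent_of else None
    return {
        "most_comments": counts.most_common(1)[0][0] if counts else None,
        "no_comments": sorted(set(all_nids) - set(counts)),
        "deepest_thread": post_of[deepest] if deepest is not None else None,
    }
```

Checking those three kinds of posts (plus a few random ones) covers the cases most likely to break in an import: large volumes, empty pages, and nested replies.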

To get all features working locally, I also had to set up local domains for my website inside /etc/hosts:

127.0.0.1 dev.jeffgeerling.com
127.0.0.1 dev-comments.jeffgeerling.com

Otherwise you'll bump into issues testing the importer through localhost. I even had to force Docker to use the right IP address for the Hugo site running on my Mac host (outside the Docker environment), by adding extra_hosts in the docker-compose.yml file:

services:
  remark:
    image: ghcr.io/umputun/remark42:v1.15.0
    container_name: "comments_jeffgeerling"
    hostname: ${HOSTNAME}
    extra_hosts:
      - "${DEV_HOST_MAPPING:-dummy:127.0.0.1}"
    ...

Then in my .env.dev:

HOSTNAME=dev-comments.jeffgeerling.com
DEV_HOST_MAPPING=dev.jeffgeerling.com:192.168.65.254  # host.docker.internal IP
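The ${DEV_HOST_MAPPING:-dummy:127.0.0.1} default matters when .env.dev isn't loaded (as in production). Compose substitutes the value after :- whenever the variable is unset or empty, so extra_hosts presumably still gets a syntactically valid host:IP entry instead of an empty string. Here's a tiny Python mimic of just that one interpolation form, for illustration only; Compose's real interpolation supports more syntax (like $$ escapes and ${VAR:?error}) than this handles.

```python
import re

# Matches ${NAME:-default}; only this one form of Compose interpolation
# is handled here.
_DEFAULT_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):-([^}]*)\}")


def interpolate(value, env):
    """Mimic Compose's ${VAR:-default}: use default if VAR is unset or empty."""
    return _DEFAULT_PATTERN.sub(lambda m: env.get(m.group(1)) or m.group(2), value)
```

With an empty environment, interpolate("${DEV_HOST_MAPPING:-dummy:127.0.0.1}", {}) falls back to the dummy mapping; with the .env.dev value set, the real host.docker.internal mapping is used.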

The final migration

For the final migration process, I created a separate issue on GitHub to track progress: Final comment migration steps (Drupal to Remark42).

JeffGeerling.com - Remark42 migration steps

I use this format (just an issue with checkboxes, or a text file with markdown-based checkboxes) when performing any potentially-destructive tasks, so I can put in exact steps, including the commands to run, and follow them in the correct order.

Having done all the steps multiple times locally helped a lot. But there are certain tasks that can only be done in prod, at least when you're like me and don't have a true prod-like staging environment, with separate servers and infrastructure at every level.

The most annoying task was getting SSL working, because I was using strict SSL through Cloudflare.

Once I got a local self-signed cert figured out, I immediately got a ton of invalid traffic on the new server. This problem ("new VPS gets flooded with traffic immediately") is a bit annoying, because VPS providers like DigitalOcean recycle IPv4 addresses quickly—and bring along the baggage of the old IP at the same time...

So I locked down the DigitalOcean Firewall on the comments VPS the same way I did my main site VPS. But then I noticed the Remark42 container was running at 100% CPU constantly.

Long story short, I realized that by trying to disable comment editing, I had caused an infinite loop in the container startup process, and it ate up all my server's CPU.

Therefore, I opened one final issue, "If I set EDIT_TIME=0, container uses 100% CPU forever on init," and set the edit time back to 5 minutes.

The server was finally running well, and the final snag was needing to add my self-signed comment server cert to the server's certificate store, because the Go library Remark42 uses when importing comments through Remark42's API requires a trusted certificate (even when running on localhost!).

So:

/srv # cp var/cert.pem /usr/local/share/ca-certificates/
/srv # update-ca-certificates

And finally, the import worked:

/srv # import --url=https://comments.jeffgeerling.com:8443 -p wordpress -f /srv/var/exported-comments.xml -s jeffgeerling_com
remark42 v1.15.0-307e69e-20251224T02:45:51
2026/01/15 23:54:29.727 [INFO]  import /srv/var/exported-comments.xml (wordpress), site jeffgeerling_com
2026/01/15 23:54:29.852 [INFO]  completed, status=202, {"status":"import request accepted"}

It took a while, because Remark42 also verifies each comment post URL prior to importing the comments (the comment server can't run standalone for an import).
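Since the import checks each post URL anyway, it may be worth confirming up front that every post in the export resolves. A sketch, assuming the WXR layout used for the export (one item per post, with a link element and wp:comment children in the WXR 1.2 namespace); post_urls and unreachable are hypothetical helpers, not part of Remark42.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen

WP_NS = "http://wordpress.org/export/1.2/"  # assumed WXR 1.2 namespace URI


def post_urls(wxr_text):
    """Return unique link URLs (in file order) for items that have comments."""
    root = ET.fromstring(wxr_text)
    seen, urls = set(), []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        has_comments = item.find(f"{{{WP_NS}}}comment") is not None
        if link and has_comments and link not in seen:
            seen.add(link)
            urls.append(link)
    return urls


def unreachable(urls, timeout=10):
    """Return URLs that fail a HEAD request (HTTP errors, DNS, timeouts)."""
    failed = []
    for url in urls:
        try:
            with urlopen(Request(url, method="HEAD"), timeout=timeout):
                pass  # urlopen raises HTTPError for 4xx/5xx responses
        except OSError:  # covers URLError, HTTPError, and socket timeouts
            failed.append(url)
    return failed
```

HEAD requests keep this light on a static site, but a CDN or WAF (like the Cloudflare rules mentioned above) might block or challenge scripted requests, so a failure here doesn't necessarily mean the page is broken.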

After twiddling with some of my DDoS prevention rules in Cloudflare, I was able to get all Remark42 functionality running—along with all 13,000+ Drupal comments—on this website!

The Grass is Always Greener...

Will this site go back to a CMS at some point? Maybe. But probably not.

I spoke to a former colleague in the middle of the migration—someone who's been running a personal blog on Drupal four years longer than I have!

His perspective (given in the midst of the comment migration process) was useful in tempering my excitement over having gone static.

Instead of having a fully dynamic website, with native comments, a deep caching system, built-in search (with modules to improve all these things), I now have a static website, which needs a separate server for comments, and I'll soon implement a less flexible site search solution!

However, part of my goal in moving to a static site is being able to test various hosting options, some of them very exotic—and limited in processing power. Therefore a static site only requiring an HTTP server and a few MB of RAM is a bonus.

So far, I've had a good experience running Remark42 (for about a week) and Hugo (for almost three weeks). I haven't encountered DDoS-level traffic, so I have yet to see how it'll hold up in that condition.

Whatever happens, I'll continue developing my website—as I do all my projects—in the open over on GitHub.

¹ Like "Here is my python export script. Add a database query that pulls all comments from a Drupal 10 database, along with comment information including email address and username, and sort that data by the node the comment is attached to, including hierarchical 'parent' information."
² I don't mean this in a negative way (at least, most of the time). Much of my career (and personal) development resulted from conversations I've had with people who vehemently disagreed with my take on a topic, feature, bug report, etc. Most people I initially thought were standoffish or ill-tempered were amazing to work with and helped me see something in an entirely different way. (This still happens regularly.)
³ Sadly, the battle for proving QA's worth is already lost in many companies. QA folks have often been the lynchpin that saves a project, in my experience, uncovering major faults well before they have a seismic impact on said project.
⁴ But it's a lot easier than maintaining my own SMTP server! Email deliverability is challenging enough when using cloud email providers...