WordPress and the legendary trojan emojis

Here is one of the most unbelievable releases in the history of WordPress.

WordPress is everywhere. In 2020, it’s still the most popular CMS, which is why hackers still love it too.

The fascinating story I want to share with you is pretty old (2014), but hopefully it’s worth it.

A story by Andrew Nacin

Andrew Nacin has been a WordPress core developer for years. He joined the White House’s U.S. Digital Service in 2015.

If you want to hear the story (which is his story) instead of reading my post, you can watch the YouTube video.

This guy is an inspiration; check out his work.

One line of code can change everything

A single line enabling MySQL strict mode would have changed a lot of things.

Unfortunately, it was missing. Enabling strict mode, whether it’s in MySQL, JavaScript, or any other language, is always a good idea.

When strict mode is disabled, the engine starts guessing to make things work, for example by silently truncating invalid data instead of raising an error, which can lead to critical security issues like harmful injections.
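To make that "guessing" concrete, here is a small Python simulation (not real MySQL) of how a 3-byte utf8 column might treat a value containing a 4-byte character, with and without strict mode. The names store_in_utf8_column and IncorrectStringValue are hypothetical, and the SQL statement in the comment is only one common way to enable strict mode for a session; I am assuming it is the kind of line the story refers to.

```python
# Simulation only: mimics how a MySQL `utf8` (3-byte) column treats a
# 4-byte character, depending on whether strict SQL mode is enabled.
# On a real server, strict mode could be enabled for a session with e.g.:
#   SET SESSION sql_mode = 'STRICT_ALL_TABLES';


class IncorrectStringValue(Exception):
    """Hypothetical stand-in for MySQL's 'Incorrect string value' error."""


def store_in_utf8_column(value: str, strict: bool) -> str:
    """Return what the simulated column would end up storing."""
    for index, char in enumerate(value):
        if len(char.encode("utf-8")) > 3:
            if strict:
                # Strict mode: refuse the whole value instead of guessing.
                raise IncorrectStringValue(f"cannot store U+{ord(char):04X}")
            # Non-strict mode: silently keep only what came before.
            return value[:index]
    return value


stored = store_in_utf8_column("Hello \U0001F525 world", strict=False)
print(repr(stored))  # 'Hello '

try:
    store_in_utf8_column("Hello \U0001F525 world", strict=True)
except IncorrectStringValue as error:
    print("rejected:", error)
```

The point of the sketch is the difference in failure mode: the non-strict path returns a shortened value without telling anyone, while the strict path fails loudly.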

Many CMSs, such as Joomla or Drupal, have enabled strict mode since the beginning, but not WordPress.

Multibyte comment injection

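With MySQL’s default UTF-8 charset (utf8), a character can take at most 3 bytes. When a string contains a 4-byte character, MySQL in non-strict mode truncates the string at that character and drops everything after it, unless the column uses another charset, utf8mb4.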

For example, emojis are 4-byte sequences.
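A quick way to check this is to look at the UTF-8 byte length of a few characters. The sketch below uses Python; the helper name fits_in_three_bytes is mine, for illustration only.

```python
# UTF-8 byte length of a few sample characters:
# "a", "é" (U+00E9), "€" (U+20AC) and the fire emoji (U+1F525).
samples = ["a", "\u00e9", "\u20ac", "\U0001F525"]

for char in samples:
    encoded = char.encode("utf-8")
    print(f"U+{ord(char):04X} -> {len(encoded)} byte(s)")


def fits_in_three_bytes(text: str) -> bool:
    """True if every character encodes to at most 3 UTF-8 bytes."""
    return all(len(char.encode("utf-8")) <= 3 for char in text)
```

Any text for which fits_in_three_bytes returns False contains at least one character that a 3-byte utf8 column cannot store.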

Hackers could exploit this vulnerability by using 4-byte characters to force MySQL truncation, and then inject malicious code into the database.

But how exactly?

As proof of concept, Andrew Nacin used WordPress comments. He inserted a comment with a 4-byte sequence like the following in the database:

<q cite= 'Hello 🔥'>blablabla</q>

Because MySQL truncated everything from the 4-byte character onward, the database ended up containing an unclosed HTML tag. Then he just needed to insert another comment with malicious code inside.
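Here is a rough Python simulation of those two steps. It is not WordPress or MySQL code: truncate_at_four_byte_char only mimics the truncation described above, and the second comment is a hypothetical, harmless payload I made up to show how the dangling quote gets closed.

```python
# Simulation only: how truncation at a 4-byte character can leave an
# unclosed attribute that a later comment then "completes".


def truncate_at_four_byte_char(text: str) -> str:
    """Mimic non-strict MySQL `utf8` storage: cut at the first 4-byte char."""
    for index, char in enumerate(text):
        if len(char.encode("utf-8")) > 3:
            return text[:index]
    return text


# First comment: valid HTML when submitted, so it passes sanitization.
first_comment = "<q cite= 'Hello \U0001F525'>blablabla</q>"
# Hypothetical second comment: looks like plain text on its own.
second_comment = "' onmouseover='alert(1)' nice post"

stored_first = truncate_at_four_byte_char(first_comment)
stored_second = truncate_at_four_byte_char(second_comment)

# A page that renders both stored comments near each other.
page = f"<li>{stored_first}</li>\n<li>{stored_second}</li>"
print(page)
```

In the printed page, the quote opened by the first stored comment is only closed by the first character of the second one, so the text that follows is read by the browser as attributes of the q tag.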

In WordPress, there is a discussion setting that says:

Comment author must have a previously approved comment

So, at the time, it was possible under specific circumstances for hackers and spammers to get one comment approved and then use this approval to inject their payload.

Backward compatibility

The core team realized this vulnerability could affect “any two fields that would be rendered anywhere near each other”, not just comments.

Again, this cannot happen with MySQL strict mode enabled.

However, WordPress is backward-compatible software! It’s part of its core philosophy.

It was impossible to enable MySQL strict mode ten years after the first release without breaking everything, including the core itself, the entire ecosystem with plugins and themes, etc.

IMHO, the most incredible part of this story is that the core team realized the hole was even larger than they thought.

It was not only a 4-byte issue; pretty much everything could turn bad. The team had to write a global SQL parser in PHP.

However, they had to parse a lot of code, including edge cases. As a result, in WordPress 4.2, approximately 1,000 lines of code were added to wp-db.php.

They struggled for months with a lot of changes.

During all this time (almost two years), the code was in the trunk branch, but it was not ready. They could not reveal the big security issue before the patch was released, so they worked incognito.

To achieve that, they said they were working on “removing invalid characters” and “support for emojis”!
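The "removing invalid characters" label also hints at what a fix has to do: make sure a value fits the target column's charset before it reaches MySQL, so the database never has anything to truncate. Below is a minimal Python sketch of that idea. It is not the WordPress implementation, which is written in PHP and is far more involved; the function names and the two-charset handling are assumptions for illustration.

```python
def strip_four_byte_chars(text: str) -> str:
    """Drop characters that a 3-byte `utf8` column cannot store."""
    return "".join(char for char in text if len(char.encode("utf-8")) <= 3)


def prepare_for_column(text: str, charset: str) -> str:
    """Sanitize text before building the query, based on the column charset."""
    if charset == "utf8mb4":
        return text  # 4-byte characters are storable as-is
    if charset == "utf8":
        return strip_four_byte_chars(text)
    raise ValueError(f"unsupported charset in this sketch: {charset}")


comment = "<q cite= 'Hello \U0001F525'>blablabla</q>"
print(prepare_for_column(comment, "utf8"))
```

With the emoji removed up front, the stored comment keeps its closing quote and closing tag, so there is no dangling attribute for a later comment to complete.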

Wrap up

More than five years later, this story is still fascinating. Those patches made the web more secure for everyone.