Solving the Emoji/character encoding problem in Drupal 7

Update: As of Drupal 7.50, Emoji/UTF-8 mb4 is now supported for MySQL (and other databases) in core! See the documentation page here for more information on how to configure it: Multi-byte UTF-8 support in Drupal 7. This blog post exists for historical purposes onlyβ€”please see the Drupal.org documentation for the most up-to-date instructions, and see my newer blog post here: Getting Emoji and multibyte characters on your Drupal 7 site with 7.50 πŸ‘πŸΌ!

On many Drupal 7 sites, I have encountered issues with Emoji (mostly) and other special characters (rarely) when importing content from social media feeds, during content migrations, and in other situations, so I finally decided to add a quick blog post about it.

Have you ever noticed an error in your logs complaining about incorrect string values, with an emoji or other special character, like the following:

PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF0\x9F\x98\x89" ...' for column 'body_value' at row 1: INSERT INTO {field_data_body} (entity_type, entity_id, revision_id, bundle, delta, language, body_value, body_summary, body_format) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8); Array ( [:db_insert_placeholder_0] => node [:db_insert_placeholder_1] => 538551 [:db_insert_placeholder_2] => 538550 [:db_insert_placeholder_3] => story [:db_insert_placeholder_4] => 0 [:db_insert_placeholder_5] => und [:db_insert_placeholder_6] => <p>πŸ˜‰</p> [:db_insert_placeholder_7] => [:db_insert_placeholder_8] => filtered_html ) in field_sql_storage_field_storage_write() (line 514 of /drupal/modules/field/modules/field_sql_storage/field_sql_storage.module).

To fix this, you need to switch the affected MySQL table's encoding to utf8mb4, and also switch any table columns ('fields', in Drupal parlance) which will store Emojis or other exotic UTF-8 characters. This will allow these special characters to be stored in the database, and stop the PDOExceptions.

Using Sequel Pro on a Mac, this process is relatively quick and painless:

  1. Open the affected tables (in the above case, field_data_body, and the corresponding revision table, field_revision_body), and click on the 'Table info' tab.
  2. In the 'Encoding' menu, switch from "UTF-8 Unicode (utf8)" to "UTF-8 Unicode (utf8mb4)". This will take a little time for larger data sets.
  3. Switch over to the 'Structure' tab, and for each field which will be storing data (in our case, the body_value and body_summary fields), choose "UTF-8 Unicode (utf8mb4)" under the 'Encoding' column. This will take a little time for larger data sets.

After converting the affected tables, you will also need to patch Drupal 7 to make sure the MySQL connection uses the correct encoding. Apply the latest patch from the issue Drupal 7 MySQL does not support full UTF-8, and add the following keys to your default database connection settings:

<?php
$databases
= array(
 
'default' => array(
   
'default' => array(
     
'database' => 'database',
     
'username' => 'username',
     
'password' => 'password',
     
'host' => '127.0.0.1',
     
'driver' => 'mysql',
     
// Add default charset and collation for mb4 support.
     
'charset' => 'utf8mb4',
     
'collation' => 'utf8mb4_general_ci',
    ),
  ),
);
?>

That issue is actually a child issue of MySQL driver does not support full UTF-8, which has already been fixed in Drupal 8 (which now requires MySQL 5.5.3 or later as a result). It may take a little time for the problem to get an 'official' fix in Drupal 7, since it's a complicated problem that requires a delicate touchβ€”we don't want a bunch of people's sites to go belly up because some contributed modules are using large VARCHAR columns, or because their hosting provider is running an old version of MySQL!

There's also a handy table_converter module for Drupal 7, which helps you automate the process of converting tables to the new format. It still requires the core patch mentioned above, but it can help smooth out the process of actually converting the tables to the new format.

Finally, if you want to write an update function to convert the tables yourself during a deployment (instead of manually converting tables using something like table_converter), you can use a function like the following in a custom module's mymodule.install file:

<?php
/**
 * Convert the `body` field to utf8mb4.
 */
function mymodule_update_N(&$sandbox) {
 
db_query("ALTER TABLE {field_data_body} MODIFY body_value longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci");
 
db_query("ALTER TABLE {field_data_body} MODIFY body_summary longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci");
 
db_query("ALTER TABLE {field_revision_body} MODIFY body_value longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci");
 
db_query("ALTER TABLE {field_revision_body} MODIFY body_summary longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci");
 
field_cache_clear(TRUE);
}
?>

Once you've fixed the issue, you won't be quite as annoyed next time you see one of these guys: πŸ˜€

Comments

Thanks for compiling that. This bug/feature is causing problems for the Twitter module (and others), and it's a bit complex to handle from that POV.

Yeah, you can't expect every user of the Twitter module (or many of the other contrib modules that frequently hit this issue) to patch core... so I'm hoping it makes its way into core at some point in the future :-/

I can imagine many sites are getting uncaught exceptions and feeds just get blocked from importing for a while due to this bug. You only notice if you're looking for new items, and they're missing.

If you want to run the table update commands in an update hook in Drupal, you can also try this:

function hook_update_N(&$sandbox) {
  db_query("ALTER TABLE {field_data_body} MODIFY body_value longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci");
  db_query("ALTER TABLE {field_data_body} MODIFY body_summary longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci");
  db_query("ALTER TABLE {field_revision_body} MODIFY body_value longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci");
  db_query("ALTER TABLE {field_revision_body} MODIFY body_summary longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci");
  field_cache_clear(TRUE);
}

(Rather than use Sequel Pro to do the changes by hand).

Excellent information, thanks very much for posting, especially that last comment about running this via hook_update_N().

Thanks for this post! Its working for me if i want to add an emoji to the body field. Unfortunaly I can't add an emoji to the comment field. Do others have the same experience?

Didn't work for me, unfortunately.
Still getting the PDO Exception

Thanks Jeff! Just stumbled upon this problem myself and it worked great! Just one thing that wasn't explicit in your instructions but I remembered to do after reading your php snippet...CLEAR ALL THE CACHES! You da man!

Seems that after more than 2 years the problem still exists. This is helpful, tried and works. But is there any disadvantage of using utf8mb4_general_ci over otf8_general_ci?

Thank you so much! I spent quite a bit of time trying to figure this out and there were several posts and bug reports that came *close* to solving this, but the alter table queries you provided worked great.