Icelandic letters in WordPress permalinks
Update 17.04.2012: The fix that this blog post describes was released in WordPress 3.1, so this post is now obsolete. Today I looked at the patch I had submitted and found out that it was finally committed in changeset 15930 on Oct 23, 2010, and released in WordPress 3.1 on February 23, 2011. So, three and a half years after I submitted the patch, it got released :)
Update 13.08.2007: I made a patch for this and submitted it to WordPress. The changes should come in WP 2.3. The patch is on ticket #4739.
I use WordPress for this site as well as my personal blog. It works very well with Icelandic letters for the most part, but there was one problem. When you create a new post and give it a title, the post slug is generated from the title. For example, the title of this post is "Icelandic letters in WordPress permalinks" and it's automatically changed to "icelandic-letters-in-wordpress-permalinks" in the URL. When a title contains special characters, they are either removed or changed to an ASCII equivalent. For instance, Á becomes A, Í becomes I, ö becomes o and so on. This worked well for all Icelandic letters except three: þ, æ and ð. When I made a post with the title "Þátturinn", the post slug would become "þatturinn", and when I tried to enter that address in my address bar it changed to "%c3%beatturinn" and I got a "page not found" error from WordPress.
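To show where "%c3%be" comes from, here is a minimal sketch in plain PHP, outside WordPress. It assumes the slug is stored as UTF-8, where þ is the two bytes 0xC3 0xBE. The variable names are just for illustration, and the browser does its own encoding, so this only mimics the result.

```php
<?php
// UTF-8 encodes þ (U+00FE) as the two bytes 0xC3 0xBE.
$slug = "\xC3\xBE" . 'atturinn';

// rawurlencode() percent-encodes every non-ASCII byte (with uppercase hex);
// strtolower() is only here to match the lowercase form quoted above.
$encoded = strtolower(rawurlencode($slug));

echo $encoded, "\n"; // %c3%beatturinn
```

So the address bar was not inventing anything. It was showing the raw bytes of a letter that never got replaced with ASCII.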
Now, you can manually enter the post slug when you write a post, but I don't wanna have to do that every time I post, so I dug around in the WordPress code and found the replacement function. It's called remove_accents and lives in the file wp-includes/formatting.php. There, right before the line "// Decompositions for Latin Extended-A", I added the following code:
chr(195).chr(144) => 'D',
chr(195).chr(176) => 'd',
chr(195).chr(158) => 'TH',
chr(195).chr(190) => 'th',
chr(195).chr(134) => 'AE',
chr(195).chr(166) => 'ae',
Now the characters are replaced automatically like this: Ð => D, ð => d, Þ => TH, þ => th, Æ => AE, æ => ae. I don't know if everyone has this problem, or if it just has to do with the character set settings on my web host, but this works great for me. No more manually fixing post slugs! I'm gonna create a bug report with WordPress and hopefully this will be accepted into the next version.
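If you want to try the mapping without touching WordPress, here is a small standalone sketch. The function name replace_icelandic_letters is made up for this example. In WordPress the six entries simply sit inside the replacement table of remove_accents, which, as far as I can tell, applies its table with strtr in the same way.

```php
<?php
// Hypothetical helper, for illustration only; not a WordPress function.
function replace_icelandic_letters($text)
{
    // Keys are the UTF-8 byte pairs of each letter, built with chr().
    $chars = array(
        chr(195).chr(144) => 'D',  // Ð (U+00D0)
        chr(195).chr(176) => 'd',  // ð (U+00F0)
        chr(195).chr(158) => 'TH', // Þ (U+00DE)
        chr(195).chr(190) => 'th', // þ (U+00FE)
        chr(195).chr(134) => 'AE', // Æ (U+00C6)
        chr(195).chr(166) => 'ae', // æ (U+00E6)
    );

    // strtr() with an array replaces every key it finds with its value.
    return strtr($text, $chars);
}

// "þatturinn" written with explicit bytes so the file encoding doesn't matter.
echo replace_icelandic_letters("\xC3\xBE" . 'atturinn'), "\n"; // thatturinn
```

The letters are written as byte escapes so the example doesn't depend on how the file is saved. With the file saved as UTF-8 you could type þ, æ and ð directly.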
A few words in Icelandic to help people find this post: Íslenskir stafir í wordpress linkum post slug permalink permalinks