Korpela, Unicode Explained (O’Reilly, 2006)

Korpela’s Unicode Explained was originally intended for three audiences, I think. The first was the casual user who might need to make some basic use of Unicode in everyday life (entering a little bit of Unicode in Windows, for example). The second was the advanced user who might need to draw on some Unicode wizardry in a few specialized cases: programming, HTML or other markup, or the internet. The third was the reader wanting an introduction to the principles behind Unicode without making a brute-force attack on the Unicode Standard itself. The passage of time means that the book may now be less useful for both casual and advanced users (who really do need current information). Nevertheless, Korpela’s work remains helpful, and his discussion of the theoretical side of Unicode is excellent, both clear and nuanced.

Major changes have happened in the Unicode world since this book was written. Characters and scripts have been added, of course, but the real difference is that Unicode support is far more pervasive than it was in 2006. Unicode is much more common on web sites now; Emacs 24 finally has long-overdue support for certain Unicode features such as bidi text (though vim is still a holdout here); and most programming languages have come to treat Unicode as fundamental to the way strings work (Python 3 is one example).

Many of these changes occurred after the 2006 publication date of Korpela’s book, which means that at points the book reads like a period piece: the changes were in the foreseeable future, but not there yet. It also means that some parts of the book are badly out of date. The section on Perl is the clearest example: however sluggish it may have seemed in 2006, Perl has now adopted Unicode to such an extent that some of the fundamental ways Perl works have changed. Long-beloved character class shortcuts such as \d and \w speak Unicode now, which means that when you only want ASCII it’s often less trouble to just spell out the full character class. (For more on Unicode in Perl, check out the relevant sections in the llama and, if you’re brave of heart, in the camel.)
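To make the point about character class shortcuts concrete, here is a minimal sketch. It is in Python 3 rather than Perl, on the assumption that the analogous behavior of Python's `re` module is a fair illustration; the helper names `matches_shorthand` and `matches_explicit` are made up for this example and come from neither the book nor any library.

```python
import re

# Three ARABIC-INDIC DIGITs (U+0661..U+0663): decimal digits, but not ASCII.
ARABIC_INDIC = "\u0661\u0662\u0663"


def matches_shorthand(text: str) -> bool:
    """True if text consists only of characters matched by the \\d shortcut.

    For str patterns in Python 3, \\d matches any Unicode decimal digit.
    """
    return re.fullmatch(r"\d+", text) is not None


def matches_explicit(text: str) -> bool:
    """True if text consists only of the ASCII digits 0 through 9."""
    return re.fullmatch(r"[0-9]+", text) is not None


if __name__ == "__main__":
    for sample in ("123", ARABIC_INDIC):
        print(ascii(sample), matches_shorthand(sample), matches_explicit(sample))
```

The shortcut accepts both strings, while the spelled-out class accepts only the ASCII one. Python also offers the `re.ASCII` flag to restrict `\d` to ASCII, but the explicit class states the intent without relying on a flag.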

Much of the utility of a book like this lies in its expert discussion of such advanced topics; having to check the book’s information against more recent sources defeats the purpose. On the other hand, because much of the core structure of Unicode was already in place in 2006, many of the basic ways of working with Unicode are the same. Though I’m no expert, my sense is that working with Unicode in HTML is unchanged, even for complicated stuff (bidi), while working with Unicode in MS Word can still be a pain.

But the real reason to read this book isn’t so much the practical advice (much of which you’d be better off looking up on Stack Exchange anyway) as the lucid explication of the structure and design of the Unicode framework. Not all of the explanations are equally clear (I found the first chapter a bit muddled, oddly), but Korpela remains a useful guide to the Unicode terrain.
