Monday, 19 April 2004

これは日本語のテキストです。読めますか? (This is Japanese text. Can you read it?)

Let's see how "Unicode and weblogs" does with Japanese.

これは日本語のテキストです。読めますか? (This is Japanese text. Can you read it?)
Posted by david at 4:05 PM in Internationalization

Shaken. Not stirred.

Just in case you were wondering what cufflinks I am wearing today.

Posted by david at 3:17 PM in My Life With The Thrill Kill Kult

Iñtërnâtiônàlizætiøn

Iñtërnâtiônàlizætiøn Iñtërnâtiônàlizætiøn Iñtërnâtiônàlizætiøn Iñtërnâtiônàlizætiøn Iñtërnâtiônàlizætiøn Iñtërnâtiônàlizætiøn Iñtërnâtiônàlizætiøn
Posted by david at 1:36 PM in Internationalization

Iñtërnâtiônàlizætiøn category

Iñtërnâtiônàlizætiøn

All internationalization tests pass.
  • Entry title and text (includes filename on disk): Check
  • Category name and description (includes directory on disk): Check
  • Comments (includes comment e-mail): Check
  • Trackbacks (includes trackback e-mail): Check
  • Feeds: Valid
  • Editing via web-based administration interface: Check
  • Editing via blog client (ecto): Check
Hell, even the e-mail address obfuscator plugin works like a champ!

Let's see how a link back to Sam Ruby's "Unicode and weblogs" goes.

For reference, here's my original frustration with URI encoding in Tomcat 5.
Posted by david at 1:02 PM in Internationalization

URI encoding in Tomcat 5

Good morning campers! Everyone all bright eyed and bushy tailed?

If you're using Tomcat 5 (say, Tomcat 5.0.19) and you're having issues with international characters in your URIs, you might want to look at your server.xml file and check the settings on the <Connector .../> elements defined there. Why? According to the documentation for the URIEncoding attribute in the Tomcat 5 connector documentation, "This specifies the character encoding used to decode the URI bytes, after %xx decoding the URL. If not specified, ISO-8859-1 will be used." A simple fix, shown below, was all this server needed to be happy, as demonstrated hm'yah.

<Connector port="8009"
   enableLookups="false" redirectPort="8443" debug="0"
   protocol="AJP/1.3" URIEncoding="UTF-8"/>
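To see why that attribute matters, here's a small Java sketch. It uses java.net.URLDecoder (the Charset overload, Java 10 or later) as a stand-in for the connector's own decoding. That's an assumption on my part: Tomcat doesn't necessarily go through this class, and URLDecoder also turns '+' into a space, which plain path decoding shouldn't. The point is only that the same %xx bytes come out differently depending on the charset applied after percent-decoding. The class and method names are made up for illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UriDecodeDemo {

    // Percent-decode the input, then interpret the resulting bytes with the given charset.
    // Note: URLDecoder does form decoding, so a literal '+' would also become a space.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // The test string from these posts, UTF-8 encoded and %-escaped as a browser sends it.
        String path = "I%C3%B1t%C3%ABrn%C3%A2ti%C3%B4n%C3%A0liz%C3%A6ti%C3%B8n";

        // ISO-8859-1 (the documented default): every byte becomes its own Latin-1 character.
        System.out.println(decode(path, StandardCharsets.ISO_8859_1));

        // UTF-8 (URIEncoding="UTF-8"): multi-byte sequences are reassembled into one character each.
        System.out.println(decode(path, StandardCharsets.UTF_8));
    }
}
```

With ISO-8859-1 each two-byte sequence turns into two Latin-1 characters (ñ comes out as Ã±), which is exactly the garbage you see in the URI when URIEncoding is left unset.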

The default for this option, IMHO, should be UTF-8, not ISO-8859-1. Did I dream that there was a relevant W3C specification saying UTF-8 should be the default encoding for URIs? Maybe. I'm looking now, but if you know of one in particular, point me at it and I'll update this entry appropriately.

Update: "Character Encoding in URI references". So URIs are still limited to the US-ASCII subset, but characters outside it are converted to bytes using UTF-8 before being escaped:

  1. Each disallowed character is converted to UTF-8, resulting in one or more bytes.
  2. The resulting bytes are escaped using the URI escaping mechanism (that is, each byte is converted to %HH, where HH is the byte value expressed using hexadecimal notation).
  3. The original character is replaced by the resulting character sequence.
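The three steps above fit in a few lines of Java. Which characters count as "disallowed" here is my own simplification, not the spec's exact list: anything outside printable US-ASCII, plus space and <>"{}|\^`. Existing % escapes and # are passed through untouched. The class name, method names and the /weblog/ path are hypothetical, just for illustration.

```java
import java.nio.charset.StandardCharsets;

public class UriEscaper {

    // Simplified (assumed) disallowed set: controls, space, DEL and all non-ASCII,
    // plus a few ASCII characters RFC 2396 excludes. '%' and '#' pass through.
    static boolean isDisallowed(int codePoint) {
        return codePoint <= 0x20 || codePoint >= 0x7F || "<>\"{}|\\^`".indexOf(codePoint) >= 0;
    }

    static String escape(String uri) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < uri.length()) {
            // Work in code points so characters outside the BMP stay in one piece.
            int cp = uri.codePointAt(i);
            int len = Character.charCount(cp);
            if (isDisallowed(cp)) {
                // Step 1: convert the character to one or more UTF-8 bytes.
                byte[] bytes = uri.substring(i, i + len).getBytes(StandardCharsets.UTF_8);
                // Steps 2 and 3: replace the character with %HH for each byte.
                for (byte b : bytes) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            } else {
                out.appendCodePoint(cp);
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("/weblog/I\u00F1t\u00EBrn\u00E2ti\u00F4n\u00E0liz\u00E6ti\u00F8n category"));
    }
}
```

Working on code points rather than chars matters for anything outside the Basic Multilingual Plane: a surrogate pair has to become a single four-byte UTF-8 sequence, not two broken halves.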
And wouldn't you know it, the reference I was originally looking for was in the javadocs for java.net.URLEncoder#encode(String s, String enc). The specific reference is "Non-ASCII characters in URI attribute values".
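Here's roughly what that method does in practice. This sketch uses the URLEncoder.encode(String, Charset) overload added in Java 10, which behaves like encode(s, "UTF-8") minus the checked UnsupportedEncodingException. Keep in mind URLEncoder produces HTML form encoding, not generic URI escaping: spaces become + rather than %20, and / is escaped too. The class and helper names are just for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FormEncodeDemo {

    // application/x-www-form-urlencoded: letters, digits and ".-*_" pass through,
    // a space becomes '+', and everything else is UTF-8 encoded as %HH.
    static String encodeParam(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Fine for a query parameter value...
        System.out.println("?q=" + encodeParam("I\u00F1t\u00EBr n\u00E2t"));
        // ...but not for a whole path, since '/' is escaped too (prints a%2Fb).
        System.out.println(encodeParam("a/b"));
    }
}
```

So for query parameter values, URLEncoder with UTF-8 is the right tool; for building paths, escape segment by segment instead.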
Posted by david at 10:47 AM in java ... just java

Testing Iñtërnâtiônàlizætiøn in filename

Iñtërnâtiônàlizætiøn Iñtërnâtiônàlizætiøn Iñtërnâtiônàlizætiøn
Posted by david at 9:51 AM in Evil Experiments