
Comment PHP + MySQL for I18N (Score 1) 117

As a number of people have mentioned, internationalization (I18N) and localization (L10N) can be an incredibly complex process.

Since you are working with an existing system, you don't have the option of designing in I18N support from the very beginning.

Get a good book.

I recommend "XML Internationalization and Localization" by Yves Savourel, and "Beyond Borders: Web Globalization Strategies" by John Yunker. Both authors have been in the I18N business a long time. They know what they are talking about.

Choose your tools wisely.

Use MySQL 4.1 (or newer) --

Since MySQL 4.1, you can choose which character set to use per DB, per table, or per column. The simplest solution is to make the entire DB use the UTF-8 character set (this may not suit every system, for optimization or other reasons).

Learn about Unicode/UTF-8. (Others have provided links)

Store your localized data in UTF-8. Using a single character set makes life much easier.
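To make that concrete, here is a rough sketch of the statements involved, held in a PHP array so you can feed them to whatever DB layer you already use. The database name "articles_db" and table name "articles" are made up for illustration. Back up first: converting a table whose bytes are not really in the character set MySQL thinks they are can mangle the data. On much newer MySQL releases (5.5.3 and later) you would normally pick utf8mb4 instead of utf8.

```php
<?php
// Hypothetical names, for illustration only.
$database = 'articles_db';
$table    = 'articles';

$statements = [
    // New database: every table created in it defaults to UTF-8.
    "CREATE DATABASE `$database` CHARACTER SET utf8 COLLATE utf8_general_ci",
    // Existing table: convert its text columns to UTF-8.
    "ALTER TABLE `$table` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci",
    // Per connection: tell the server the client sends and expects UTF-8.
    "SET NAMES 'utf8'",
];

// Print the statements; in a real script you would run them through your DB client.
foreach ($statements as $sql) {
    echo $sql, ";\n";
}
```

The SET NAMES step is the one people forget: the tables can be UTF-8 and the output still garbled if the connection is talking another character set.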

Use a fairly recent version of PHP --

PHP 4.1.1 (or newer) comes bundled with GNU Gettext.

GNU gettext

http://www.gnu.org/software/gettext/ You probably don't need to download it, since it should be included with your version of PHP. Just enable the extension in php.ini, or compile it in from source (the configure option is --with-gettext).

GNU Gettext has been around for a number of years. It's fairly efficient, well maintained and has a large user base. It works by mapping a message ID (usually the original source string) plus a language-locale to a translated string of text. It replaces the ID with the appropriate text in your template to create a finished document. Text for each language-locale is stored in a separate file called a PO file, which is compiled (with msgfmt) into the binary MO file that gettext reads at run time.
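For a feel of what this looks like from PHP, here is a minimal sketch. The helper names initTranslations() and t(), the domain name "messages" and the ./locale directory are my own assumptions; the gettext calls themselves (setlocale, bindtextdomain, bind_textdomain_codeset, textdomain, gettext) are the standard ones from the PHP gettext extension. It assumes a compiled catalog at ./locale/es_ES/LC_MESSAGES/messages.mo and that the es_ES locale is installed on the server.

```php
<?php
// Hypothetical helper: point gettext at the catalog for one locale.
// Returns false when the gettext extension is not enabled.
function initTranslations(string $locale, string $domain = 'messages', string $dir = './locale'): bool
{
    if (!function_exists('gettext')) {
        return false;
    }
    putenv("LC_ALL=$locale");
    // Try a few common spellings of the locale name; servers differ.
    setlocale(LC_ALL, $locale, "$locale.UTF-8", "$locale.utf8");
    bindtextdomain($domain, $dir);               // where the MO files live
    bind_textdomain_codeset($domain, 'UTF-8');   // return translations as UTF-8
    textdomain($domain);                         // default domain for gettext()
    return true;
}

// Hypothetical wrapper: translate a message ID, or fall back to the ID itself
// when the extension is missing.
function t(string $msgid): string
{
    return function_exists('gettext') ? gettext($msgid) : $msgid;
}

initTranslations('es_ES');
echo t('Article not found'), "\n";
```

If no catalog is found for the message, gettext hands back the message ID unchanged, which is why using the English source string as the ID makes a sensible default.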

You will also want a PO file editor.

Here are a couple of articles on GNU Gettext

http://www.phpdig.net/ref/rn26.html
http://www.onlamp.com/pub/a/php/2002/06/13/php.html
http://www.uberdose.com/php/php-and-gettext-for-i18n/

If you are going to be using professional translators, you may want to consider XLIFF as a document exchange format. There are XLIFF to PO converters available.

You may be considering XML (XHTML, XSLT and XLIFF) for Internationalization. The PHP solution, using Sablotron, is not yet fully baked. It shows promise for the future, but I would avoid it for a production system. Also, XLIFF is not recommended as a storage format. You'll probably run into performance issues if you try to use it as a direct data store.

Use templates, if at all possible.

You may not be able to use the same template for all language-locales, but one template should work for most cases. A bidirectional (BiDi) language, for example Arabic or Hebrew, would likely need a separate template.
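As a small illustration of the BiDi case, a sketch like the following picks the text direction from the locale and emits the matching lang and dir attributes. The function names and the list of right-to-left language codes are my own and deliberately partial; a real template may need much more than a dir attribute.

```php
<?php
// Partial list of right-to-left language codes; an assumption, not exhaustive.
const RTL_LANGUAGES = ['ar', 'he', 'fa', 'ur'];

// Hypothetical helper: 'rtl' for right-to-left languages, 'ltr' otherwise.
function textDirection(string $locale): string
{
    $language = strtolower(substr($locale, 0, 2));
    return in_array($language, RTL_LANGUAGES, true) ? 'rtl' : 'ltr';
}

// Hypothetical helper: build the opening <html> tag for a locale.
function htmlOpenTag(string $locale): string
{
    $lang = str_replace('_', '-', $locale); // HTML lang values use hyphens
    return sprintf('<html lang="%s" dir="%s">', htmlspecialchars($lang), textDirection($locale));
}

echo htmlOpenTag('ar_EG'), "\n"; // <html lang="ar-EG" dir="rtl">
echo htmlOpenTag('es_ES'), "\n"; // <html lang="es-ES" dir="ltr">
```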

Localize your CSS stylesheets.

You may have locale specific layout and formatting information in your stylesheets.

From a design point of view, consider using a combination of a Front Controller pattern to switch languages and a Page Controller pattern to apply the templates.
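The language-switching half of the Front Controller could look roughly like this. It is only a sketch: pickLocale() is a name I made up, and the rules (an explicit user choice wins, then the browser's Accept-Language header, then a default) are one reasonable policy among several.

```php
<?php
// Hypothetical helper: choose a locale for the current request.
// $requested is an explicit choice (query string, cookie) or null.
function pickLocale(array $supported, ?string $requested, string $acceptLanguage, string $default): string
{
    if ($requested !== null && in_array($requested, $supported, true)) {
        return $requested;
    }

    // Parse "es-MX,es;q=0.9,en;q=0.5" into [quality, position, tag] triples.
    $prefs = [];
    foreach (explode(',', $acceptLanguage) as $i => $part) {
        $pieces = explode(';', trim($part));
        $tag = strtolower(trim($pieces[0]));
        if ($tag === '' || $tag === '*') {
            continue;
        }
        $q = 1.0;
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (strncmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if ($q > 0) {
            $prefs[] = [$q, $i, $tag];
        }
    }

    // Highest quality first; ties keep the order used in the header.
    usort($prefs, function ($a, $b) {
        return ($b[0] <=> $a[0]) ?: ($a[1] <=> $b[1]);
    });

    foreach ($prefs as $pref) {
        $tag = str_replace('-', '_', $pref[2]);   // es-mx -> es_mx
        $primary = explode('_', $tag)[0];         // es
        $fallback = null;
        foreach ($supported as $locale) {
            $lc = strtolower($locale);
            if ($lc === $tag) {
                return $locale;                   // exact language-locale match
            }
            if ($fallback === null && explode('_', $lc)[0] === $primary) {
                $fallback = $locale;              // same language, other locale
            }
        }
        if ($fallback !== null) {
            return $fallback;
        }
    }
    return $default;
}

$supported = ['en_US', 'es_ES', 'es_MX', 'ar_EG'];
echo pickLocale($supported, null, 'es-MX,es;q=0.9,en;q=0.5', 'en_US'), "\n"; // es_MX
```

The Front Controller would call this once per request, initialize the translations for the result, and then hand off to the Page Controller that applies the template.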

Where are you storing the article data? Is it in the MySQL DB, or is it in static files that are referenced by the DB? Focus most of your efforts on the part that is most critical: MySQL if most of the data is in the DB, or PHP if most of the data is static. But remember, you are going to have to internationalize both parts of your system.

Don't forget, text in many other languages takes up more space than English to say the same thing, sometimes 30-50% more. This can significantly impact layout in heading sections, column widths, and forms.

I18N and L10N become really fun when you start talking about more complicated cases.

Don't forget to localize dates and number group separators.
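A bare-bones sketch of the idea, using a hand-written table of separators and date patterns. The table and the function names are my own and cover only two locales for illustration; on PHP versions that have the intl extension, NumberFormatter and IntlDateFormatter are the more complete route.

```php
<?php
// Hypothetical per-locale formatting table; extend it for the locales you serve.
$formats = [
    'en_US' => ['decimal' => '.', 'thousands' => ',', 'date' => 'm/d/Y'],
    'de_DE' => ['decimal' => ',', 'thousands' => '.', 'date' => 'd.m.Y'],
];

// Format a number with two decimals using the locale's separators.
function formatNumber(float $value, string $locale, array $formats): string
{
    $f = $formats[$locale];
    return number_format($value, 2, $f['decimal'], $f['thousands']);
}

// Format a Unix timestamp using the locale's date pattern.
// gmdate() keeps the example independent of the server time zone.
function formatDate(int $timestamp, string $locale, array $formats): string
{
    return gmdate($formats[$locale]['date'], $timestamp);
}

echo formatNumber(1234567.891, 'en_US', $formats), "\n"; // 1,234,567.89
echo formatNumber(1234567.891, 'de_DE', $formats), "\n"; // 1.234.567,89
echo formatDate(2592000, 'en_US', $formats), "\n";       // 01/31/1970
echo formatDate(2592000, 'de_DE', $formats), "\n";       // 31.01.1970
```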

Are you using HTML FORMs for input?

If so, you need to identify the incoming character set, which can be tricky. Most modern browsers will return UTF-8 if you specify it in the accept-charset attribute of the form itself. However, if a user has an older browser, or cuts and pastes into the form, you could get anything.
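One defensive approach, sketched below, is to check whether the submitted bytes are valid UTF-8 and convert them only when they are not. It needs the mbstring extension. The function name normalizeToUtf8() is mine, and the ISO-8859-1 fallback is just a guess at what a legacy browser might send; pick whatever your audience is most likely to use.

```php
<?php
// Hypothetical helper: return $input as UTF-8.
// Valid UTF-8 passes through untouched; anything else is assumed to be in
// $fallbackCharset (an assumption, since the browser does not always say).
function normalizeToUtf8(string $input, string $fallbackCharset = 'ISO-8859-1'): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }
    return mb_convert_encoding($input, 'UTF-8', $fallbackCharset);
}

// The form itself would ask for UTF-8, e.g. <form accept-charset="UTF-8" ...>.
$latin1 = "caf\xE9";                    // "café" encoded as ISO-8859-1
$clean  = normalizeToUtf8($latin1);
echo bin2hex($clean), "\n";             // 636166c3a9
```

This cannot tell one single-byte character set from another, so treat it as a safety net and not as real charset detection.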

Are you providing a search feature for your articles?

How will you handle diacritical characters (accent marks) in searches?
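One common answer is to store or compute an accent-folded search key next to the real text, and fold the user's query the same way. The sketch below does this with a small hand-made mapping table; both the table (far from complete) and the name searchKey() are my own, and it needs the mbstring extension. Depending on the collation you choose, MySQL may already compare accented and unaccented letters as equal, so check that before building your own.

```php
<?php
// Partial accent-folding table (lowercase only); an illustration, not complete.
const ACCENT_MAP = [
    'á' => 'a', 'à' => 'a', 'â' => 'a', 'ä' => 'a', 'ã' => 'a',
    'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
    'í' => 'i', 'ì' => 'i', 'î' => 'i', 'ï' => 'i',
    'ó' => 'o', 'ò' => 'o', 'ô' => 'o', 'ö' => 'o', 'õ' => 'o',
    'ú' => 'u', 'ù' => 'u', 'û' => 'u', 'ü' => 'u',
    'ñ' => 'n', 'ç' => 'c',
];

// Hypothetical helper: lowercase the UTF-8 text, then strip the accents
// listed in ACCENT_MAP so "José" and "jose" produce the same key.
function searchKey(string $text): string
{
    $lower = mb_strtolower($text, 'UTF-8');
    return strtr($lower, ACCENT_MAP);
}

echo searchKey('José Muñoz'), "\n"; // jose munoz
```

Whether folding is right depends on the language: in some languages an accented letter is a different letter, not a decorated one, so users may not expect "ñ" to match "n".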

Sort order may also change depending on language-locale, for example for items in pulldown menus or for DB result sets.
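To see the effect, here is a sketch that sorts the same list for two locales. It relies on the Collator class from PHP's intl extension, which is only available on newer PHP versions with that extension loaded; without it the sketch falls back to plain byte-order sorting. The function name sortForLocale() is my own.

```php
<?php
// Hypothetical helper: sort strings by the rules of the given locale.
// Uses intl's Collator when available, otherwise plain byte order.
function sortForLocale(array $items, string $locale): array
{
    if (class_exists('Collator')) {
        $collator = new Collator($locale);
        $collator->sort($items);
    } else {
        sort($items, SORT_STRING);
    }
    return $items;
}

$menu = ['Zebra', 'Äpple', 'Apple'];
echo implode(', ', sortForLocale($menu, 'de_DE')), "\n";
echo implode(', ', sortForLocale($menu, 'sv_SE')), "\n";
```

With the intl extension loaded, the German ordering keeps "Äpple" next to "Apple", while the Swedish ordering puts it after "Zebra", because Swedish treats "ä" as a separate letter near the end of the alphabet.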

Don't forget to internationalize your error messages. Error messages should appear in the local language. This may include errors where you can't reach the database.

When it comes to translations, know your users. For example, if your target audience for Spanish is in Spain, don't use Latin American Spanish, and vice versa. In fact, where there are significant differences between locales of the same language, and you need to serve both, consider having a separate translation for each locale.

There are so many other issues regarding internationalization and localization that you could write a book. In fact, people have written them (see recommendations above).

If you would like to discuss this further, I can be reached at jsudall*agirdev#com (replace the * with an @ sign and the # with a dot to get the correct address)
