Sunday, March 14, 2010

Mysql - Use utf8 for supporting multibyte languags.


Some languages such as Chinese will use many bytes to store single character.

Latin1 is the default charset in mysql.

Latin1 won't support the multibyte characters. i-e if we store the chinese characters in mysql server with the default charset, the characters won't appear correctly.

So, if we are going to store multibyte characters in mysql database we need to specify the charset as utf8.

If we need to migrate the existing latin1 database into utf8 database we can follow any one of the below approaches.
- We can use sql such as "ALTER DATABASE DEFAULT CHARSET=utf8" to migrate the database charset. But in this case, the existing table will continue to use latin1 only. So we need to again alter each table and the required columns.

-Or, we can just export the latin1 database into a file and then import it after replacing "DEFAULT CHARSET=latin1" into "DEFAULT CHARSET=utf8".

Migrate into utf8 charset if only it is required. Because, it will occupy 3 bytes for storing single character. And, case insensitive search query is difficult with utf8. And, we need to make some required changes in sql queries also. For example, "select * from table_name1 where column_name1='value1'" won't work correctly in UTF8 charset.
Instead of this sql we should use "select * from table_name1 where column_name1 like 'value1'"

i-e "=" should be replaced with "like".

This article will give more details about migrating mysql latin1 charset into utf8.

More Articles...


You can bookmark this blog for further reading, or you can subscribe to our blog feed.

No comments:

Search This Blog