[Bf-committers] Proposal for handling string encoding in blender.

Elia Sarti vekoon at gmail.com
Fri Aug 13 16:35:19 CEST 2010

Let's clarify some things.

First of all, UTF-8 was specifically designed to replace ASCII 
painlessly. It never produces a zero (or null) byte except for the NUL 
character itself, meaning you can always use a C string to hold UTF-8 text.
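As a quick illustration of the point above, here is a minimal sketch in Python (the variable names are only for illustration) comparing the bytes produced by UTF-8 and UTF-16 for the same text. It shows why UTF-8 fits in a null-terminated C string while UTF-16 does not.

```python
# Sample text containing one non-ASCII character.
text = "//tèst/file.png"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# UTF-8 only emits a 0x00 byte if the text itself contains U+0000.
utf8_has_nul = 0 in utf8_bytes

# UTF-16 stores every ASCII character with a zero high byte, so a C
# string function would stop after the first character.
utf16_has_nul = 0 in utf16_bytes

# Pure ASCII text is byte-for-byte identical once encoded as UTF-8.
ascii_identical = "file.png".encode("utf-8") == "file.png".encode("ascii")

print(utf8_has_nul, utf16_has_nul, ascii_identical)
```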

Also, UTF-8 can encode ANY character representable in the Unicode 
standard, which covers practically every character in use in the world.
The "é" character is simply not ASCII but Latin-1. The reason you 
don't notice the difference when using it in C is that it only takes 1 
byte to store this value in Latin-1 (in UTF-8 the same character takes 
2 bytes). ASCII is actually a 7-bit standard; you get the extra bit for 
free because a char in C is at least 8 bits wide.
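To make the "é" case concrete, a small sketch (again Python, names are illustrative) showing the same character in the three encodings discussed:

```python
e_acute = "é"  # U+00E9

# Latin-1 stores it in a single byte whose value is above 127.
latin1_form = e_acute.encode("latin-1")   # b'\xe9'

# UTF-8 needs two bytes for the same character.
utf8_form = e_acute.encode("utf-8")       # b'\xc3\xa9'

# ASCII is 7-bit, so it cannot represent the character at all.
try:
    e_acute.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

print(latin1_form, utf8_form, fits_in_ascii)
```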

That being said, I don't think it's a good idea to have the encoding stored 
in the .blend; this would be useless without full Unicode support.

The easiest thing to do is simply to assume we always use UTF-8 
internally, since it's ASCII compatible. Any other encoding would require 
too much work.

What I was suggesting is that we try to detect whether certain strings 
(like file paths) are UTF-8 and, if they aren't, check whether they are 
UTF-16/32 and convert them before storing. The same applies in reverse, 
going from a file path in the .blend to one for external usage.
So if we have stored a relative path like //tèst/file.png (in UTF-8) and 
the OS wants UTF-16, we can convert it before passing the path to the 
OS. Because most OSes use some version of UTF, by handling all of 
them we can be reasonably confident we won't break too many file tree setups.
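A rough sketch of that round trip, written in Python for brevity rather than as actual Blender code. The function names to_internal_utf8 and to_external are hypothetical, and the detection here is an assumption of mine: it only recognizes UTF-16/32 input that starts with a byte order mark, so BOM-less UTF-16 would need a real detector.

```python
import codecs

# UTF-32 is listed first because its little-endian BOM begins with the
# UTF-16 little-endian BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def to_internal_utf8(raw: bytes) -> bytes:
    """Return raw as UTF-8 bytes, converting from UTF-16/32 if needed."""
    try:
        raw.decode("utf-8")  # strict: raises on any invalid sequence
        return raw
    except UnicodeDecodeError:
        pass
    for bom, codec in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(codec).encode("utf-8")
    raise ValueError("not UTF-8 and no UTF-16/32 byte order mark found")


def to_external(stored: bytes, os_encoding: str) -> bytes:
    """Convert a stored UTF-8 path to the encoding the OS expects."""
    return stored.decode("utf-8").encode(os_encoding)


path = "//tèst/file.png"
stored = to_internal_utf8(codecs.BOM_UTF16_LE + path.encode("utf-16-le"))
for_os = to_external(stored, "utf-16-le")
```

Trying UTF-8 first is safe in this sketch because none of the UTF-16/32 byte order marks form valid UTF-8, so BOM-prefixed input always falls through to the second step. Anything else (Latin-1, for example) is rejected rather than guessed at.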

Of course there are other apps already doing this, like all web 
browsers. Firefox, for instance, even provides a library for 
encoding detection (an independent C++ library usable from C too):


Note that I've never actually tried it, but I guess it has to work 
considering it's used in Firefox. The article is dated, but I think it's 
still valid. The source is here, I believe:

