[Bf-committers] Proposal for handling string encoding in blender.

Fri Aug 13 15:57:34 CEST 2010

Hi,

The draw code (blenfont) already supports utf8. In fact, it always works
with utf8, because an ascii string is a valid utf8 string; that is why I
used utf8 in the first place. But drawing utf8 is one thing, and
editing/modifying/changing it is another.
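
To make the draw-versus-edit distinction concrete, here is a minimal Python sketch. It is not blenfont or Blender code; the sample strings and both helper names are hypothetical. It shows that ascii bytes decode unchanged as utf8, and that a byte-oriented edit (here, truncation to a fixed size) can split a multi-byte character unless it backs up to a character boundary.

```python
# Illustrative only: strings and helper names are assumptions, not Blender code.

ascii_name = b"Cube.001"
# Every ASCII byte string is also valid UTF-8 and decodes to the same text.
assert ascii_name.decode("utf-8") == ascii_name.decode("ascii")

utf8_name = "W\u00fcr".encode("utf-8")  # U+00FC takes two bytes in UTF-8
assert len(utf8_name) == 4              # 4 bytes for 3 characters


def truncate_bytes(data: bytes, max_bytes: int) -> bytes:
    """Naive byte-oriented truncation, as a fixed-size buffer copy would do."""
    return data[:max_bytes]


def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Truncate without splitting a multi-byte sequence.

    Continuation bytes have the bit pattern 10xxxxxx, so back up
    while the cut position lands on one of them.
    """
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form a complete, valid UTF-8 string."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Drawing only needs to decode the bytes it is given, while every editing operation (truncate, insert, delete, cursor movement) has to respect character boundaries in the way `truncate_utf8` does.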

So, if the proposal is just to limit the input from the user, so that we
don't allow invalid characters, +1 from me.
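
As a rough idea of what limiting the input could look like, here is a small Python sketch. The function names and the "drop invalid bytes" policy are assumptions made for illustration, not something the proposal specifies; rejecting the whole input would be an equally plausible policy.

```python
# Hypothetical helpers; neither name exists in Blender.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes are a complete, valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def sanitize_input(data: bytes) -> bytes:
    """Drop any bytes that do not form valid UTF-8 (assumed policy)."""
    return data.decode("utf-8", errors="ignore").encode("utf-8")


# Valid UTF-8 passes through unchanged.
assert sanitize_input(b"caf\xc3\xa9") == b"caf\xc3\xa9"
# A stray Latin-1 byte is not valid UTF-8 and is removed.
assert sanitize_input(b"caf\xe9") == b"caf"
```

A check like this only has to run where text enters the program, so the rest of the code can keep treating strings as plain byte buffers.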

But if the proposal is to add full support for utf8 (internally, in DNA/RNA
and all the blender code), I will say -1 right now. I don't think this is
the right moment for it.

- Diego

On Fri, Aug 13, 2010 at 10:47 AM, Roger Wickes <rogerwickes at yahoo.com> wrote:
> I volunteered to look into this for scripts, and found that UTF8 encoding is a
> safe way to go. There are many string encoding/decoding standards/codecs (UTF,
> UCS, etc.) and variants within families. See
> http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings.
> The same situation for video codecs has happened for strings. geez. Anyhoo...
>
> Requirements: In general, I think we want to encode and decode arbitrary binary
> strings into text strings that can be entered by any keyboard, saved and decoded
> losslessly in the blend file, displayed on the user's computer, safely sent by
> email, used as parts of URLs, included as part of an HTTP POST request, used as
> valid filenames, etc.
>
> I think that UTF8 would suit our purposes now and for the next decade or two.
> UTF-8 can encode any Unicode character.
>
> The downside is that because encoded strings may contain many zero bytes, the
> strings cannot be manipulated by normal C string handling for even simple
> operations such as copy. This means that a pass through the ENTIRE code base is
> needed to seek out all str functions and replace them with a call to
> encode/decode.
>
> In 2007, Python adopted UTF8 and recoded their base to use it:
> http://www.python.org/dev/peps/pep-3120/
> For a Py3 discussion, see http://www.python.org/dev/peps/pep-0383/. For
> displaying encoded strings, Py3k uses a pretty involved process:
> http://www.python.org/dev/peps/pep-3138/
> For the Python code base itself, as of 2007, they also have issues and more
> questions than answers; see http://www.python.org/dev/peps/pep-3131/. The
> bottom line is: normal English characters.
>
>
> UTF16 is the ultimate alternative for international error messages, etc.,
> and is what is used in Mac OS X and Windows.
> It can encode any glyph. There are some space-saving advantages for UTF16, but
> only if the text is mostly glyphs.
> Characters U+0800 through U+FFFF use three bytes in UTF-8, but only two in
> UTF-16. As a result, text in (for example) Chinese, Japanese or Hindi could
> take more space in UTF-8 if there are more of these characters than there are
> ASCII characters. This rarely happens in real documents; for example, both the
> Japanese and the Korean UTF-8 articles on Wikipedia take more space if saved as
> UTF-16 than the original UTF-8 version.
>
> --Roger