[Bf-committers] Proposal for handling string encoding in blender.

Campbell Barton ideasman42 at gmail.com
Fri Aug 13 14:32:52 CEST 2010


@Remo, I googled for UTF-8 incompatible characters and found this one :).
Also, when I try to enter it into the Python console I get this error:

>>> "numéro"
Traceback (most recent call last):
  File "/b/release/scripts/op/console_python.py", line 134, in execute
    line = line_object.body
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 4-6: invalid data
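
A minimal sketch of why this can fail even though "é" itself is valid
UTF-8 (c3 a9). It assumes the bytes reaching the console were Latin-1
rather than UTF-8, which is a guess and not something verified here;
try_utf8 is a hypothetical helper name, not part of Blender or Python.

```python
# Assumption: the console received Latin-1 (ISO-8859-1) bytes, not UTF-8.
text = '"num\u00e9ro"'  # the quoted line as typed, with the é escaped

utf8_bytes = text.encode("utf-8")      # é becomes the two bytes c3 a9
latin1_bytes = text.encode("latin-1")  # é becomes the single byte e9


def try_utf8(raw):
    """Return the decoded string, or None if raw is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# The UTF-8 bytes round-trip; the Latin-1 bytes do not, because 0xe9
# announces a multi-byte sequence that the following bytes do not complete.
print(try_utf8(utf8_bytes))
print(try_utf8(latin1_bytes))
```

Under that assumption the offending byte sits at index 4 (after the
opening quote), which would match the position reported in the traceback.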

@Elia, I asked about this in #python: couldn't it just try to decode with
any working codec? But I was told Blender is broken because the "blend"
file should store its own encoding.
I really don't want to go there. In simple cases it can work, but it gets
way more complicated with library linking: you might link libraries
which use a different encoding, then we have to store an encoding per
library, then making local has to convert between encodings, or
directly linking an object into a blend which has a different encoding
..... I think this is more trouble than it's worth.

So back to detecting the encoding: yes, I guess it's possible, but it would
look like this in Python:
try:
    return decode(...)
except UnicodeDecodeError:
    try:
        return decode(...)
    ........
This is really ugly in that it sets the exception in every case where
the first decode fails; perhaps there is some better way to do this.
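
One way to flatten the nested try/except is a loop over candidate
codecs. This is only a sketch: decode_with_fallback and its default
candidate list are made up for illustration, and it still pays for an
exception on every failed attempt.

```python
def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Try each encoding in order; return (text, encoding) for the first hit.

    The function name and default candidates are illustrative assumptions.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes, try the next one
    raise ValueError("none of the candidate encodings could decode the data")


print(decode_with_fallback("num\u00e9ro".encode("utf-8")))
print(decode_with_fallback("num\u00e9ro".encode("latin-1")))
```

Note that Latin-1 accepts any byte sequence, so as a last candidate it
never fails. That means a "successful" decode is not proof the guess was
right, which is part of why detection is harder than it looks.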

Or we could try to be smart, but I'm not sure it's that easy:
http://stackoverflow.com/questions/1775622/detect-utf-16-file-content

Elia, could you look into any other apps that do this?

Also, how limiting is it to settle on UTF-8? Is Blender suddenly horrible
to use in some languages? Is Blender at all usable in these
languages now?

My impression is that we don't have the developer interest/resources
to properly add support, and since blend files are so portable it
becomes more tricky. So we could at least make the user experience better
by being consistent: not failing, and not allowing files with mixed
encodings.
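
As a rough illustration of "not allowing files with mixed encodings",
a check like the following could flag names that are not valid UTF-8.
find_non_utf8 and the sample names are hypothetical; this does not
touch any real Blender API.

```python
def find_non_utf8(names):
    """Return the byte strings in names that are not valid UTF-8."""
    bad = []
    for raw in names:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(raw)
    return bad


# Hypothetical object names as they might be stored as raw bytes.
names = [
    b"Cube",
    "num\u00e9ro".encode("utf-8"),
    "num\u00e9ro".encode("latin-1"),
]
print(find_non_utf8(names))
```

Only the Latin-1 encoded name is reported; plain ASCII names always pass
because ASCII is a subset of UTF-8.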

On Fri, Aug 13, 2010 at 8:47 PM, Elia Sarti <vekoon at gmail.com> wrote:
> The point is that different systems use different encodings. UTF-8 is
> just one way to encode multibyte characters, UTF-16 is another for
> instance (and there are hundreds of others).
>
> This means if you save "numéro" in your .blend on an OS using utf-8 and
> someone opens it in one using utf-16 then the string is incompatible.
>
> I say +1 to this with an addendum.
> To some extent encoding can be detected and thus converted, would it be
> hard to do so for strings in the .blend? Of course only for a limited
> collection, I'd say utf-8 <-> utf-16 would probably suffice as I believe
> many linux distros use utf-8 while windows and mac use utf-16, so this
> would cover the majority of cases.
>
>
> Remo Pini wrote, on 08/13/2010 10:56 AM:
>> Maybe I'm dense, but why would the letter "é" not be UTF-8 compliant? According to my sources that would be é = c3 a9 (LATIN SMALL LETTER E WITH ACUTE) which is perfectly fine...
>>
>> So mesh.name = "numéro" should NOT raise an error IMHO if the system is truly UTF-8.
>>
>> Cheers
>>
>> Remo
>>
>>
>>> -----Original Message-----
>>> From: bf-committers-bounces at blender.org [mailto:bf-committers-
>>> bounces at blender.org] On Behalf Of Campbell Barton
>>> Sent: Friday, 13 August 2010 11:40
>>> To: bf-blender developers
>>> Subject: [Bf-committers] Proposal for handling string encoding in blender.
>>>
>>> At the moment we have a problem with decoding strings in blender
>>> which is caused by blend files not having any encoding information.
>>> We have a number of reports about this in the tracker, e.g.
>>> https://projects.blender.org/tracker/index.php?func=detail&aid=23285&group_id=9&atid=498
>>> This also gave us trouble for models given to us for the durian
>>> sprint, I ended up having to manually rename objects so scripts would
>>> work.
>>>
>>> In practice this means the following can raise an error:
>>>   fn = bpy.data.filename  # the file path may not be utf8 compatible
>>>   # the person who made this file may have cyrillic characters
>>>   # which blender lets them enter.
>>>   print(bpy.context.object.name)
>>>
>>> If you're not into scripting, this means simple things like importing a
>>> file from your home directory can be impossible if your name isn't utf8
>>> compliant, so I don't think this is a problem we can ignore.
>>>
>>> The stupid/simple solution is not to use strings, just use byte arrays
>>> all over - then you never have any encoding problems.
>>> Normally I like stupid solutions but it means every string needs to
>>> have a 'b' prefix, e.g. b"Some String", and I think this is too
>>> annoying & ugly.
>>>
>>> We could just enforce one encoding for all blend files, except, as
>>> hinted at earlier, this won't work for people whose filepaths are not
>>> utf8 compatible.
>>>
>>> ---
>>>
>>> So here's my proposed solution:
>>> (in brief.  strings: utf8, except for filepaths: fs-native)
>>>
>>> * Enforce UTF8 for all of blender's internal strings, this can be
>>> handled at the UI & python level so that you are not allowed to set
>>> utf8-incompatible strings.
>>>   - This means that if you enter a non-utf8 compatible character in an
>>> object name it will reject the name.
>>>   - If you try to do: mesh.name = "numéro" # an error will be raised.
>>>
>>> * filenames can't have this limitation imposed because blender needs
>>> to be able to reference paths on the user's system which we have no
>>> control over. However, we have a FILENAME type in RNA, so we can exempt
>>> these strings from the utf8 check; instead these need to follow the
>>> filesystem's encoding.
>>>   - Python can handle this with - Py_FileSystemDefaultEncoding
>>>   - This means the string encoding for a file path and an object name
>>> for instance may differ.
>>>
>>> The flaw in this solution is that someone may create a blend file with
>>> an image in //numéro/foo.png, then they give this to someone else who
>>> can open the file, but gets a python error when they try to export it
>>> as an OBJ.
>>>
>>> I think this is an acceptable limitation, we can just tell users that
>>> if they want to share their projects to use ascii filenames. People
>>> already need to use relative paths if they share projects; in that case
>>> the name of their home directory won't matter.
>>> It's a lot better than the current state which stops people from
>>> exporting a file to their own home directory (under certain
>>> conditions).
>>>
>>> If this is ok I can go ahead with this before the next release. It's
>>> not really all that much work, but since this limits mesh/object/bone
>>> names and the string input field, it's not just the python api that's
>>> affected.
>>>
>>> --
>>> - Campbell
>>> _______________________________________________
>>> Bf-committers mailing list
>>> Bf-committers at blender.org
>>> http://lists.blender.org/mailman/listinfo/bf-committers
>>>



-- 
- Campbell

