to u"" or not to u"", that is the question

Posted on
Sun Mar 12, 2017 9:17 am
MartyS offline
Posts: 86
Joined: May 06, 2008
Location: Charlotte, North Carolina

to u"" or not to u"", that is the question

I've looked at several 3rd-party plugins, and at the Indigo 6 iTunes plugin (even though I'm running Indigo 5), for examples of when to use u"something" versus "something". I see mixed usage, so I just have to ask… "to be or not to be, that is the question."

Some of the examples I've seen:

Code: Select all
k_name = u"value"
k_name = "value"

prop[u"name"] = value
prop["name"] = value

self.debugLog(u"Log message")
self.debugLog("Log message")

list = [u"one", u"two", u"three"]
list = ["one", "two", "three"]

To me, things that should not change, such as property-key references for checkboxes and hidden things from Devices.xml, don't need the u"" treatment. Values entered into text fields through the UI do, unless the values are limited to specific ones, as in a validation list (which I don't think needs the treatment either).

And if an entered value can contain unicode characters, how do I ensure it's stored into and retrieved/logged from the pluginProps or other areas as such?

How important is it that log messages and other literals use the u"" format?

To actually test some of this, I created a device with unicode characters in its name, and boy did I see errors in the Indigo log from my plugin! Anywhere the entire Device object is logged (such as in the following code) it fails, even if I try to unicode() it:
Code: Select all
print (u"device: %s" % (unicode(dev)))

So I experimented and get:
Code: Select all
>>> dev=indigo.devices[1190300210]
>>> print dev
Traceback (most recent call last):
  File "<console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 9: ordinal not in range(128)
>>> print unicode(dev)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 9: ordinal not in range(128)
>>> print dev.name
Traceback (most recent call last):
  File "<console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 9: ordinal not in range(128)
>>> print unicode(dev.name)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 9: ordinal not in range(128)
whereas on this smaller piece of the Device object I can make it work:
Code: Select all
>>> print dev.name.encode("utf-8")

So how do I wrap my debugging statements with references to the entire Device object to make them work? Or are device names supposed to be ASCII only (seems unlikely) so this never shows up in real-world cases?

I hope that the behavior isn't just a version 5 artifact that I'm seeing. I'd hate to waste too many of your brain cycles if that's the case.

Any advice would be appreciated.

/Marty

Posted on
Sun Mar 12, 2017 10:48 am
jay (support) offline
Site Admin
Posts: 18220
Joined: Mar 19, 2008
Location: Austin, Texas

Re: to u"" or not to u"", that is the question

Strings in Python 2 are a mess. For string literals embedded in your code, you can use either form unless they contain explicit unicode characters. Everything Indigo touches is treated as unicode. The print problem you're seeing is because it automatically calls the str() method, which is not unicode friendly. Try print unicode(dev) instead.
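
To make those two points concrete, here is a minimal sketch (plain Python, not Indigo code). It spells out with explicit calls what Python 2 does implicitly, so the same lines also run on Python 3. The variable names are made up for illustration, and the sample text simply reuses the u'\xfc' character from the traceback.

```python
# -*- coding: utf-8 -*-
# Sketch only: explicit calls standing in for Python 2's implicit conversions.

plain_key = "name"       # byte string in Python 2
unicode_key = u"name"    # unicode string

# ASCII-only literals compare equal and hash alike, so either form
# finds the same dictionary entry.
props = {plain_key: "value"}
same_entry = props[unicode_key]

sample_name = u"B\xfcro"   # contains one non-ASCII character

# In Python 2, str() on a unicode object encodes with the default codec,
# which is normally ASCII. Doing that encode explicitly shows the failure.
try:
    sample_name.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Choosing the codec yourself avoids the implicit ASCII step.
utf8_bytes = sample_name.encode("utf-8")
```

So the u"" prefix is harmless on ASCII-only literals and only becomes essential once a literal holds non-ASCII text or is combined with values that do.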

Jay (Indigo Support)

Posted on
Sun Mar 12, 2017 12:13 pm
MartyS offline
Posts: 86
Joined: May 06, 2008
Location: Charlotte, North Carolina

Re: to u"" or not to u"", that is the question

jay (support) wrote:
Strings in Python 2 are a mess. For string literals embedded in your code, you can use either form unless they contain explicit unicode characters. Everything Indigo touches is treated as unicode. The print problem you're seeing is because it automatically calls the str() method, which is not unicode friendly. Try print unicode(dev) instead.

I understand the need for unicode, how it's represented, etc., as I've worked with I18N for more years than I care to count. It's the Python manipulation/conversions that have me confused. That's why I looked at other plugins for examples.

I've tried print unicode(dev), as I showed in my first set of example outputs, and get the exact same error:
Code: Select all
>>> print unicode(dev)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 9: ordinal not in range(128)
and the same happens with self.debugLog() and other logging methods. :cry:

Unfortunately, the Device object doesn't have an encode method, and encoding is the only approach that has worked for me for getting at the device name (by itself).

Are device names allowed to be in unicode? Maybe I'm just hitting an edge case?

So when a text field from either the plugin or device config is presented to a plugin method it will be represented as u"value" and I can just shove it into self.whatever or a variable as-is to allow a later retrieval as a unicode object?
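
For reference, the "store it as-is" idea can be sketched in plain Python. This uses an ordinary dict as a hypothetical stand-in for a props dictionary; it does not show how Indigo itself stores values, and the key name roomName is invented for the example.

```python
# -*- coding: utf-8 -*-
text_type = type(u"")   # unicode in Python 2, str in Python 3

# Hypothetical stand-in for a props dictionary; not the real pluginProps object.
fake_props = {}

entered_value = u"B\xfcro"                 # as if typed into a text field
fake_props[u"roomName"] = entered_value    # keep the unicode object unchanged

retrieved = fake_props["roomName"]         # ASCII key matches with or without u""
still_unicode = isinstance(retrieved, text_type)

# Encode only where bytes are actually required, e.g. when writing to a file.
as_bytes = retrieved.encode("utf-8")
round_trip_ok = as_bytes.decode("utf-8") == entered_value
```

The point of the sketch is that nothing needs converting on the way in or out; an explicit encode is only needed at the boundary where bytes are written.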

/Marty

Posted on
Sun Mar 12, 2017 12:22 pm
matt (support) offline
Site Admin
Posts: 21417
Joined: Jan 27, 2003
Location: Texas

Re: to u"" or not to u"", that is the question

Device names can be unicode. I'm not sure why the print unicode() case is failing. I just tested it myself here without an error:

Code: Select all
>>> print unicode(dev.name)
ñánó

What specific device name are you trying to use?


Posted on
Sun Mar 12, 2017 12:31 pm
MartyS offline
Posts: 86
Joined: May 06, 2008
Location: Charlotte, North Carolina

Re: to u"" or not to u"", that is the question

matt (support) wrote:
Device names can be unicode. I'm not sure why the print unicode() case is failing. I just tested it myself here without an error:

Code: Select all
>>> print unicode(dev.name)
ñánó

What specific device name are you trying to use?

The device's name I am using is "Heizung Büro Edgeschoss" without the quotes. It's a string I randomly picked up from a forum post.

Is this an issue that only happens in Python 2.5/2.6 perhaps? I'm at a loss.

/Marty

Posted on
Sun Mar 12, 2017 12:40 pm
matt (support) offline
Site Admin
Posts: 21417
Joined: Jan 27, 2003
Location: Texas

Re: to u"" or not to u"", that is the question

Ah, I didn't catch that you aren't running Indigo 7. I just tried it under Indigo 7 with that name and did not get an error:

Code: Select all
>>> print unicode(dev.name)
Heizung Büro Edgeschoss

I don't recall if the fix is because Indigo 7 uses Python 2.7, or if we changed something. We have definitely made changes/fixes to Indigo's plugin architecture and how it handles unicode in some cases, but I don't recall whether that would explain the specific error you are seeing. My hunch is that it was a change we made, but I don't know which version it was made in.


Posted on
Sun Mar 12, 2017 12:43 pm
matt (support) offline
Site Admin
Posts: 21417
Joined: Jan 27, 2003
Location: Texas

Re: to u"" or not to u"", that is the question

Found it. That was fixed in Indigo 6.0 beta 2.


Posted on
Sun Mar 12, 2017 12:46 pm
MartyS offline
Posts: 86
Joined: May 06, 2008
Location: Charlotte, North Carolina

Re: to u"" or not to u"", that is the question

Thanks, Matt.

I don't have Indigo 7 to try, but I just tried with Indigo 6.1.11 and have the same results as with Indigo 5. Strange, and frustrating.

/Marty

Posted on
Sun Mar 12, 2017 12:49 pm
matt (support) offline
Site Admin
Posts: 21417
Joined: Jan 27, 2003
Location: Texas

Re: to u"" or not to u"", that is the question

Same exact error even when logging to the Event Log window? If so, from Indigo 6 can you copy/paste the Event Log window results of:

indigo.server.log(u"test1: dev name is" + dev.name)
indigo.server.log(u"test2: dev name is" + unicode(dev.name))


Posted on
Sun Mar 12, 2017 1:15 pm
MartyS offline
Posts: 86
Joined: May 06, 2008
Location: Charlotte, North Carolina

Re: to u"" or not to u"", that is the question

matt (support) wrote:
Same exact error even when logging to the Event Log window? If so, from Indigo 6 can you copy/paste the Event Log window results of:

indigo.server.log(u"test1: dev name is" + dev.name)
indigo.server.log(u"test2: dev name is" + unicode(dev.name))

Okay, that's a new one on me. I thought that print in the interactive console would behave the same as indigo.server.log, but it doesn't. Here's the output to the log using Indigo 6:
Code: Select all
  Interactive Shell               test1: dev name is Heizung Büro Edgeschoss
  Interactive Shell               test1: dev name is Heizung Büro Edgeschoss

And I can output unicode(dev) as well.

Looks like my choices are:

    support Indigo 5 with full debugging available by limiting to non-unicode device names

    support Indigo 5 without full debugging (and errors in the log) if unicode device names are used

    support Indigo 5 without full debugging (and no errors in the log) if unicode device names are used and I put try: … except: around the logging of dev.name or dev

    support only Indigo 6 and higher (cuts me out from using the plugin!)
Since there might only be a dozen users of the plugin anyway (I have no way of knowing), I'll need to ponder which direction to take.
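
The try/except option from the list could look roughly like the sketch below. The names safe_log and ascii_only_logger are hypothetical and not part of any Indigo API, the message is assumed to be a unicode object, and the sketch assumes the logging call raises a catchable Unicode error rather than failing somewhere out of reach.

```python
# -*- coding: utf-8 -*-

def safe_log(log_func, message):
    """Call log_func(message); on a Unicode error, retry with an escaped copy.

    log_func is a hypothetical stand-in for a call such as self.debugLog.
    message is assumed to be a unicode object.
    """
    try:
        log_func(message)
    except UnicodeError:
        # backslashreplace keeps the text pure ASCII by writing the
        # offending character as an escape sequence such as \xfc.
        escaped = message.encode("ascii", "backslashreplace").decode("ascii")
        log_func(escaped)


captured = []

def ascii_only_logger(text):
    # Hypothetical logger that rejects non-ASCII text, to exercise the fallback.
    text.encode("ascii")
    captured.append(text)

safe_log(ascii_only_logger, u"device: Heizung B\xfcro Edgeschoss")
```

With this fallback the name still shows up in the log in escaped form instead of the logging call raising an error.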

/Marty

Posted on
Sun Mar 12, 2017 2:21 pm
jay (support) offline
Site Admin
Posts: 18220
Joined: Mar 19, 2008
Location: Austin, Texas

Re: to u"" or not to u"", that is the question

Try:

Code: Select all
print d.name.encode("latin1", errors="ignore")


That should drop any characters that can't be encoded as Latin-1 (pass "replace" instead of "ignore" if you'd rather see a "?" in their place). Close enough for the user to be able to identify the device if it's only logging that's an issue.

Jay (Indigo Support)

Posted on
Mon Mar 13, 2017 3:01 am
MartyS offline
Posts: 86
Joined: May 06, 2008
Location: Charlotte, North Carolina

Re: to u"" or not to u"", that is the question

jay (support) wrote:
Try:

Code: Select all
print d.name.encode("latin1", errors="ignore")


That should drop any characters that can't be encoded as Latin-1 (pass "replace" instead of "ignore" if you'd rather see a "?" in their place). Close enough for the user to be able to identify the device if it's only logging that's an issue.

I had to change the line to:
Code: Select all
print d.name.encode("latin1", "ignore")

But other than that, the suggestion works, thanks! That would decrease the number of errors logged in v5, but the bigger issue is dumping the whole Device object into the log, which works in v6 but not in v5 when it contains unicode values.
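
For anyone comparing the error handlers, here is a small sketch in plain Python using the device name from this thread. The handler is passed positionally because, as far as I can tell, encode() only accepts the errors= keyword from Python 2.7 onward, which would explain why the keyword form failed for me. The variable names are just for illustration.

```python
# -*- coding: utf-8 -*-
name = u"Heizung B\xfcro Edgeschoss"

# Latin-1 can represent this character, so nothing is dropped here.
latin1_bytes = name.encode("latin1", "ignore")

# With a codec that cannot represent it, "ignore" silently drops it...
ascii_ignored = name.encode("ascii", "ignore")

# ...while "replace" substitutes a question mark.
ascii_replaced = name.encode("ascii", "replace")
```

Both handlers avoid the exception; "replace" at least leaves a visible marker where a character was lost.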

I'm going to shelve the issue and go with what works for the majority and if someone needs a unicode device name in v5 with this plugin then I'll look back into it.

/Marty
