Ned Batchelder: IronPython is weird

Have you fully understood how Python 2 and Python 3 deal with bytes and Unicode? Have you watched Pragmatic Unicode (also known as the Unicode Sandwich, or unipain) forwards and backwards? You're a Unicode expert! Nothing surprises you any more.

Until you try IronPython...

Turns out IronPython 2.7.7 has str as unicode!

C:\Users\Ned>"\Program Files\IronPython 2.7\ipy.exe" IronPython 2.7.7 (2.7.7.0) on .NET 4.0.30319.42000 (32-bit) Type "help", "copyright", "credits" or "license" for more information. >>> "abc" 'abc' >>> type("abc") <type 'str'> >>> u"abc" 'abc' >>> type(u"abc") <type 'str'> >>> str is unicode True >>> str is bytes False

String literals work kind of like they do in Python 2: \u escapes are recognized in u"" strings, but not "" strings, but they both produce the same type:

>>> "abc\u1234" 'abc\\u1234' >>> u"abc\u1234" u'abc\u1234'

Notice that the repr of this str/unicode type will use a u-prefix if any character is non-ASCII, but it the string is all ASCII, then the prefix is omitted.

OK, so how do we get a true byte string? I guess we could encode a unicode string? WRONG. Encoding a unicode string produces another unicode string with the encoded byte values as code points!:

>>> u"abc\u1234".encode("utf8") u'abc\xe1\x88\xb4' >>> type(_) <type 'str'>

Surely we could at least read the bytes from a file with mode "rb"? WRONG.

>>> type(open("foo.py", "rb").read()) <type 'str'> >>> type(open("foo.py", "rb").read()) is unicode True

On top of all this, I couldn't find docs that explain that this happens. The IronPython docs just say, "Since IronPython is a implementation of Python 2.7, any Python documentation is useful when using IronPython," and then links to the python.org documentation.

A decade-old article on InfoQ, The IronPython, Unicode, and Fragmentation Debate, discusses this decision, and points out correctly that it's due to needing to mesh well with the underlying .NET semantics. It seems very odd not to have documented it some place. Getting coverage.py working even minimally on IronPython was an afternoon's work of discovering each of these oddnesses empirically.

Also, that article quotes Guido van Rossum (from a comment on Calvin Spealman's blog):

You realize that Jython has exactly the same str==unicode issue, right? I've endorsed this approach for both versions from the start. So I don't know what you are so bent out of shape about.

I guess things have changed with Jython in the intervening ten years, because it doesn't behave that way now:

$ jython Jython 2.7.1b3 (default:df42d5d6be04, Feb 3 2016, 03:22:46) [Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.8.0_31 Type "help", "copyright", "credits" or "license" for more information. >>> 'abc' 'abc' >>> type(_) <type 'str'> >>> str is unicode False >>> type("abc") <type 'str'> >>> type(u"abc") <type 'unicode'> >>> u"abc".encode("ascii") 'abc' >>> u"abc" u'abc'

If you want to support IronPython, be prepared to rethink how you deal with bytes and Unicode. I haven't run the whole coverage.py test suite on IronPython, so I don't know if other oddities are lurking there.

Ned Batchelder: IronPython is weird

Trending Articles

Bath man appears in court charged with attempted murder of a man...

MACLEAN, Allan

Black Angus Grilled Artichokes

Practice Sheet of Right form of verbs for HSC Students

Police blotter for Jan. 12

99 God Status for Whatsapp, Facebook

Rajasthan Board 12th Science Result 2018 name wise- RBSE 12th commerce result...

Notorious Naushad of Ippa gang nabbed

Child Kidnapping: Amy McNeil was kidnapped on her way to school by 5 adults;...

Sonible Smartlimit v1.1.5-R2R

NCERT Solutions for Class 9th Sanskrit Chapter 3 पाथेयम्

मतलबी दोस्त स्टेट्स | Matlabi Dost Status in Hindi – Selfish Friends Status

Arrow Flash 2 – Sinhala Dubbed – Episode 23 – 20th March 2016

[GET] AI Traffic Goldmine

[E² Plugin] HDF-Radio

Universal Multi-Patch v1.3 By RADIXX11

IWAN – Thanks and Praise ( Throw Back Thursday )

RONALD P SONDERGAARD Arrested by Miami-Dade County Corrections on Mar 03, 2017

मुख मैथुन से उठाएं सेक्स का भरपूर मज़ा, जानें क्या है इसका सही तरीकामुख मैथुन...

HSSC Excise & Taxation Inspector Result 2017 Scorecard/ Category Wise Merit List