
NumFOCUS: Xarray joins NumFOCUS Sponsored Projects


Wingware News: Wing Python IDE 6.1.1: September 19, 2018

This release improves PEP 8 reformatting, streamlines remote agent installation, improves robustness of remote development in the face of network failures, adds support for debugging PythonQt, optimizes multi-process debugging, and makes a number of other minor improvements.

Wallaroo Labs: Make Python Pandas go fast

Some Background

Suppose you have a Data Analysis batch job that runs every hour on a dedicated machine. As the weeks go by, you notice that the inputs are getting larger and the time taken to run it gets longer, slowly nearing the one hour mark. You worry that subsequent executions might begin to ‘run into’ each other and cause your business pipelines to misbehave. Or perhaps you’re under SLA to deliver results for a batch of information within a given time constraint, and with the batch size slowly increasing in production, you’re approaching the maximum allotted time.

Codementor: Quick Guide: Celery Logging

This post was originally published on Distributed Python (https://www.distributedpython.com/) on August 28th, 2018. Python logging handlers define what happens with your log messages. For instance,...

py.CheckIO: Python in science


The whole world is confidently stepping into the digital age, and the tools that scientists use for research are changing with it: electron microscopes and telescopes, the transmission and analysis of experimental data in electronic form, and much more. In the article Python in science, you will learn why this language is ideal for such purposes.

Chris Warrick: Python Hackery: merging signatures of two Python functions


Today’s blog post is going to contain fairly advanced Python hackery. We’ll take two functions — one is a wrapper for the other, but also adds some positional arguments. And we’ll change the signature displayed everywhere from the uninformative f(new_arg, *args, **kwargs) to something more appropriate.

This blog post was inspired by F4D3C0D3 on #python (freenode IRC). I also took some inspiration from Gynvael Coldwind’s classic Python 101 (April Fools) video. (Audio and some comments are in Polish, but even if you don’t speak the language, it’s still worth it to click through the time bar and see some (fairly unusual) magic happen.)

Starting point

def old(foo, bar):
    """This is old's docstring."""
    print(foo, bar)
    return foo + bar

def new(prefix, foo, *args, **kwargs):
    return old(prefix + foo, *args, **kwargs)

Let’s test it.

>>> o = old('a', 'b')
a b
>>> n = new('!', 'a', 'b')
!a b
>>> print(o, n, sep=' - ')
ab - !ab
>>> help(old)
Help on function old in module __main__:

old(foo, bar)
    This is old's docstring.

>>> help(new)
Help on function new in module __main__:

new(prefix, foo, *args, **kwargs)

The last line is not exactly informative — it doesn’t tell us that we need to pass bar as an argument. Sure, you could define new as just (prefix, foo, bar)— but that means every change to old requires editing new as well. So, not ideal. Let’s try to fix this.

The existing infrastructure: functools.wraps

First, let’s start with the basic facility Python already has. The standard library already comes with functools.wraps and functools.update_wrapper.

If you’ve never heard of those two functions, here’s a crash course:

import functools

def decorator(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        print("Inside wrapper")
        f(*args, **kwargs)
    return wrapper

@decorator
def square(n: float) -> float:
    """Square a number."""
    return n * n

If we try to inspect the square function, we’ll see the original name, arguments, annotations, and the docstring. If we ran this code again, but with the @functools.wraps(f) line commented out, we would only see wrapper(*args, **kwargs).

This approach gives us a hint of what we need to do. However, if we apply wraps (or update_wrapper, which is what wraps ends up calling) to our function, it will only have foo and bar as arguments, and its name will be displayed as old.

So, let’s take a look at functools.update_wrapper. What does it do? Three things:

  • copy some attributes from the old function to the new one (__module__, __name__, __qualname__, __doc__, __annotations__)
  • update __dict__ of the new function
  • set wrapper.__wrapped__

If we try to experiment with it — by changing the list of things to copy, for example — we’ll find out that the annotations, the docstring, and the displayed name come from the copied attributes, but the signature itself is apparently taken from __wrapped__.
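
To make that experiment concrete, here is a minimal sketch (the noisy/wrapper names are purely illustrative): we copy only the docstring and annotations, yet the displayed signature still comes from __wrapped__, which update_wrapper sets unconditionally.

import functools

def noisy(x: int) -> int:
    """Return x unchanged, noisily."""
    return x

def wrapper(*args, **kwargs):
    return noisy(*args, **kwargs)

# Copy only a subset of the usual attributes; __name__ is deliberately left out.
functools.update_wrapper(wrapper, noisy, assigned=('__doc__', '__annotations__'))

# help(wrapper) still reports the name 'wrapper' (we did not copy __name__),
# but the signature shown is (x: int) -> int, because update_wrapper always
# sets wrapper.__wrapped__ = noisy and inspect follows that link.
help(wrapper)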

Further investigation reveals this fact about inspect.signature:

inspect.signature(callable, *, follow_wrapped=True)

New in version 3.5: the follow_wrapped parameter. Pass False to get a signature of callable specifically (callable.__wrapped__ will not be used to unwrap decorated callables).
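
Using the square example from the crash course above, a quick check (just an illustrative snippet) shows the difference:

import inspect

# With @functools.wraps applied, the reported signature is taken from
# __wrapped__ by default...
print(inspect.signature(square))
# ...while follow_wrapped=False reports the wrapper's own
# (*args, **kwargs) signature instead.
print(inspect.signature(square, follow_wrapped=False))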

And so, this is our end goal:

Craft a function with a specific signature (that merges old and new) and set it as new.__wrapped__.

But first, we need to talk about parallel universes.

Or actually, code objects.

Defining a function programmatically

Let’s try an experiment.

>>> def foo(bar): pass
...
>>> foo.__wrapped__ = lambda x, y: None
>>> help(foo)
Help on function foo in module __main__:

foo(x, y)

So, there are two ways to do this. The first one would be to generate a string with the signature and just use eval to get a __wrapped__ function. But that would be cheating, and honestly, quite boring. (The inspect module could help us with preparing the string.) The second one? Create code objects manually.
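
For completeness, here is roughly what the "cheating" variant could look like (make_fake_wrapped is a hypothetical helper, not part of the final solution):

def make_fake_wrapped(signature_text):
    # signature_text is something like "(prefix, foo, bar)"
    namespace = {}
    exec("def _fake{}: pass".format(signature_text), namespace)
    return namespace["_fake"]

new.__wrapped__ = make_fake_wrapped("(prefix, foo, bar)")
# help(new) now displays: new(prefix, foo, bar)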

Code objects

To create a function, we’ll need the types module. types.FunctionType gives us a function, but it asks us for a code object. As the docs state, Code objects represent byte-compiled executable Python code, or bytecode.

To create one by hand, we’ll need types.CodeType. Well, not exactly by hand — we’ll end up doing a three-way merge between source (old), dest (new) and def _blank(): pass (a function that does nothing).

Let’s look at the docstring for CodeType:

code(argcount, kwonlyargcount, nlocals, stacksize, flags, codestring,
    constants, names, varnames, filename, name, firstlineno,
    lnotab[, freevars[, cellvars]])
Create a code object.  Not for the faint of heart.

All of the arguments end up being fields of a code object (their names start with co_). For each function f, its code object is f.__code__. You can find the filename in f.__code__.co_filename, for example. The meaning of all the fields can be found in the docs for the inspect module. We’ll be interested in the following three fields:

  • argcount — number of arguments (not including keyword-only arguments, * or ** args)
  • kwonlyargcount — number of keyword-only arguments (not including the ** arg)
  • varnames — tuple of names of arguments and local variables

For all the other fields, we’ll copy them from the appropriate function (one of the three). We don’t expect anyone to call the wrapped function directly; as long as help and inspect members don’t crash when they look into it, we’re fine.
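
A quick illustrative look at those fields (a small example of my own, not from the library):

>>> def example(a, b=1, *, c=2): pass
...
>>> example.__code__.co_argcount        # a and b; keyword-only args excluded
2
>>> example.__code__.co_kwonlyargcount  # just c
1
>>> example.__code__.co_varnames        # names of arguments and locals
('a', 'b', 'c')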

Everything you need to know about function arguments

>>> import inspect
>>> def f(a, b=1, c=2, *, d=3): pass
...
>>> inspect.getfullargspec(f)
FullArgSpec(args=['a', 'b', 'c'], varargs=None, varkw=None, defaults=(1, 2), kwonlyargs=['d'], kwonlydefaults={'d': 3}, annotations={})

A function signature has the following syntax:

  1. Any positional (non-optional) arguments
  2. Variable positional arguments (*x, name stored in varargs)
  3. Arguments with defaults (keyword-maybe arguments); their values are stored in __defaults__ left-to-right
  4. Keyword-only arguments (after an asterisk); their values are stored in a dictionary. Cannot be used if varargs are defined.
  5. Variable keyword arguments (**y, name stored in varkw)

We’re going to make one assumption: we aren’t going to support a source function that uses variable arguments of any kind. So, our final signature will be composed like this:

  1. dest positional arguments
  2. source positional arguments
  3. dest keyword-maybe arguments
  4. source keyword-maybe arguments
  5. dest keyword-only arguments
  6. source keyword-only arguments

That will be saved into co_varnames. The first two arguments are counts: argcount is the number of arguments in groups 1–4, and kwonlyargcount is the number in groups 5–6. The remaining arguments to CodeType will be either safe minimal defaults, or things taken from one of the three functions.

We’ll also need to do one more thing: we must ensure __defaults__, __kwdefaults__, and __annotations__ are all in the right places. That’s also a fairly simple thing to do (it requires more tuple/dict merging). And with that, we’re done.
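
As a rough sketch of that last step (source and dest stand for the two plain function objects; this glosses over corner cases, so treat it as an approximation rather than the actual implementation): since the merged signature places dest's keyword-maybe arguments before source's, the defaults can be concatenated in the same order, and the keyword-only defaults and annotations merged as dicts.

merged_defaults = (dest.__defaults__ or ()) + (source.__defaults__ or ())
merged_kwdefaults = {**(dest.__kwdefaults__ or {}), **(source.__kwdefaults__ or {})}
merged_annotations = {**(source.__annotations__ or {}), **(dest.__annotations__ or {})}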

Final results

Before I show you the code, let’s test it out:

# old defined as before
@merge_args(old)
def new(prefix, foo, *args, **kwargs):
    return old(prefix + foo, *args, **kwargs)

And the end result — help(new) says:

new(prefix, foo, bar)
    This is old's docstring.

We did it!

The code is available on GitHub and on PyPI (pip install merge_args). There’s also an extensive test suite.

PS: You might be interested in another related post of mine, in which I reverse-engineer the compilation of a function: Gynvael’s Mission 11 (en): Python bytecode reverse-engineering

Stack Abuse: Course Review: Master the Python Interview


Introduction

Course Review: Master the Python Interview

This article is a continuation of the topic of my prior article, Preparing for a Python Developer Interview, where I gave my opinions and suggestions that I feel will put you in the best position to outperform other developers competing for a Python developer role. In this article I will be reviewing the popular Udemy course on preparing for a Python developer interview by Nicolas Georges, called Master the Python Interview - get the senior & well paid job.

Structure and Topics Covered in the Course

The course is composed of sections covering the topics listed below, where each section ends with one or more exercises or quizzes to reinforce the material.

The topics covered by Nicolas in his course are as follows:

  • Collections with Lists and Tuples
  • Intro to OOP in Python
  • Unit testing
  • Idiomatic Python - Ask for forgiveness not permission
  • Must know Python Programming Constructs
  • Must know Python Data Structures
  • More on OOP in Python
  • Data Structure Comprehensions

In the sections that follow I briefly discuss the content of each section along with things that I liked and did not like about each. I conclude with an additional section discussing things that I feel would benefit this Udemy course if they were included or done differently.

Before I get into the individual sections I would like to note that this course was taught using "legacy" Python 2.7, which I feel is a bit of a flaw in the course. The Python community is just over a year away from completely losing support from the core developer team with regard to maintenance of Python 2. For this reason I feel it necessary for Python content producers to 100 percent adopt and use Python 3.

Collections with Lists and Tuples

Collections are an enormously important topic in all high level programming languages and Python is certainly no exception to this, so I am quite glad that they were covered in this course. Nicolas does a good job of differentiating between immutability and mutability in relation to lists and tuples which, in my opinion, are the primary differentiators between the two.

Unfortunately there was a claim made about the implementation of lists and tuples that I found to be either very misleading or flat-out incorrect. In this section Nicolas states that "lists contain homogeneous data types while tuples are meant to contain heterogeneous data types". At first I thought this was simply a harmless gaffe that we are all susceptible to, but later in the section it was reiterated, and it was even reinforced in one of the section-ending quizzes.

I would like to take some time to correct this statement, as I believe Nicolas was probably trying to describe a common usage trend whereby lists often contain homogeneous data types while tuples often contain heterogeneous data types. In my experience it is true that when I use lists, the data in them is usually of the same type. However, it is important to know that both lists and tuples can in fact contain mixed data types as well as a single type.

Here is an example of lists and tuples containing the same data types which are strings representing the letters of my name:

>>> x = ['a','d', 'a', 'm']
>>> y = ('a', 'd', 'a', 'm')

And here is an example of lists and tuples containing different data types of a string representing my name and an integer representing my age:

>>> x = ['Adam', 30]
>>> y = ('Adam', 30)

Intro to OOP in Python

In this section Nicolas explains a very important feature of the Python programming language: every single element of the language is an object. From this you can extrapolate that Python is a fully object oriented language. Nicolas goes on to demonstrate and explain the usage and usefulness of many built-in functions that enable the programmer to inspect objects, like dir(), id(), help(), and others.

However, Nicolas does contradict his earlier statements about the homogeneity / heterogeneity of data types in lists during this section, which I hope gets cleaned up, as I believe most early Python users would become quite confused at this point of the course.

Unit testing

I was most impressed with this section of the course. I feel many, if not most, of the courses on programming often fail to address the importance of testing one's code. Nicolas does an excellent job covering the basics of the unittest module and even devotes considerable time explaining how to use test driven development and why it is important.

Idiomatic Python - Ask for forgiveness not permission

This is the part of the course where Nicolas begins to transition into common conventions, or idioms, of the Python programming community. I do not want to steal Nicolas's thunder by going too far into the explanation of the material covered here because I believe he does a great job explaining what it means to "ask for forgiveness and not permission" and how this convention differs in Python as opposed to other languages, such as Java.

Must know Python Programming Constructs

I was a little confused about why this section of the course exists and why it was placed in the middle of the course. The topics covered in this section go over very basic syntactic constructs like boolean expressions, conditionals, and loops. For a course targeting mid to senior level Python developers, this feels like it should be assumed knowledge, but for completeness it is not inappropriate to include it. I do think it would make better sense to put this material at the beginning of the course, however.

With the above said, I do want to leave my review of this section with something I found quite positive. I liked that Nicolas explained what it means in the language to be considered truthy / falsy, to borrow a term from the JavaScript community. Nicolas did a great job of taking the time to describe the usefulness of the built-in bool() function for showing the boolean equivalents of commonly tested values such as empty lists, empty strings, None, and others.
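
For readers who have not seen this before, a couple of illustrative calls (my own examples, not taken from the course):

>>> bool([]), bool(''), bool(None), bool(0)
(False, False, False, False)
>>> bool([0]), bool('a'), bool(42)
(True, True, True)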

Must know Python Data Structures

Here Nicolas introduces an additional collection data type known as a set, and follows with a comparison of sets and lists. During this explanation he covers the notion of what it means to be hashable.

However, one thing that I felt was missing here was an explanation of performance benefits of searching a set for inclusion of a value as compared to a list, which is a major benefit of using sets.
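
To illustrate the point with my own quick-and-dirty measurement (not from the course): membership tests against a set are dramatically faster than against a list, because sets are backed by a hash table.

import timeit

setup = "data = list(range(100000)); s = set(data)"
# Finding an item near the end of a list requires scanning it (O(n));
# the same lookup in a set is a hash-table probe (roughly O(1)).
print(timeit.timeit("99999 in data", setup=setup, number=1000))
print(timeit.timeit("99999 in s", setup=setup, number=1000))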

More on OOP in Python

This section circles back around to further elaborate on OOP in Python. Nicolas further explains the syntax and meaning of defining a custom class and creating objects from it. He introduces the concepts of defining custom instance attributes and methods as well as goes into what magic methods are and how they are used. In general I felt this section was well covered and is important knowledge for a mid-to-senior level Python developer.

Data Structure Comprehensions

The course finishes with a section on one of my favorite Pythonic features, comprehensions. Here Nicolas demonstrates how comprehensions are used and why you might use them when working with lists and dictionaries.
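
As a small taste of what that section covers (my own example), the same filtering and mapping logic can be written as a single readable expression:

nums = [1, 2, 3, 4, 5]
squares = [n * n for n in nums]                        # [1, 4, 9, 16, 25]
even_squares = {n: n * n for n in nums if n % 2 == 0}  # {2: 4, 4: 16}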

Topics to Add that would Benefit the Course

Given that the title of this course indicates that its target audience is mid-to-senior level Python developers, I feel that not enough content was aimed at describing the more mid-level to advanced features of the language. Below is a set of topics that I believe would elevate the course to better suit its target audience.

A. More idiomatic Python programming techniques are in order. An example of what I mean by this is simply unpacking of tuples and lists into component elements. I see this often demonstrated in advanced texts as well as blogs and personally find it to be congruent with the well known Python idiom that explicit is better than implicit.

I think a coding example would better demonstrate my argument here. Consider the case where you have a list of tuples where each tuple represents the length and width of a rectangle and you would like to iterate over them to calculate and display each one's area. I can think of two variations in which I might implement this: (i) one uses indexing of the tuple elements, and (ii) the other utilizes tuple unpacking into meaningfully named variables.

Using indexing:

>>> shapes = [(1,1), (2,2), (3,2)]
>>> for shape in shapes:
...     print "Area of shape %.2f" % (shape[0] * shape[1])
... 
Area of shape 1.00  
Area of shape 4.00  
Area of shape 6.00  

Using unpacking:

>>> for width, height in shapes:
...     print "Area of shape %.2f" % (width * height)
... 
Area of shape 1.00  
Area of shape 4.00  
Area of shape 6.00  

To me the second example that uses unpacking is more readable and demonstrates a greater idiomatic Python implementation.

B. A discussion of built-in Python functions that perform operations on collections would be a great addition to this course. Many of the built-in functions exist because they solve common programming problems, and they have highly optimized implementations that often give significant performance boosts. Some of the built-in functions that I think would be worth mentioning are zip, filter, and map.

For example, say you want to filter a list of numbers and only select those that are even. I can think of two common approaches. The first uses a loop to iterate over the items, along with a conditional that tests each number to see whether it is even, appending the even ones to a separate list. This is likely to be the approach taken by a junior developer who is less familiar with the language. The second uses the built-in filter() function along with a lambda function to test for even numbers.

In code these two approaches would look like so:

First method:

>>> nums = [1, 2, 3, 4, 5, 6, 7, 8]
>>> even_nums = []
>>> for num in nums:
...     if num % 2 == 0:
...             even_nums.append(num)
... 
>>> even_nums
[2, 4, 6, 8]

Second method:

>>> even_nums = filter(lambda num: num % 2 == 0, nums)
>>> even_nums
[2, 4, 6, 8]

C. Another topic that I think would be beneficial to add to the existing content is coverage of some of the advanced collection data types, such as named tuples and ordered dictionaries. I have often reached for a named tuple in cases where I wanted to represent a real world object but where a custom class, or the overly used dictionary, would be awkward or overkill. Not only are they a great way to organize data representing something in the real world, but they also have excellent performance, in particular better than a dictionary.
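
For example (my own sketch of the kind of thing I mean):

from collections import namedtuple, OrderedDict

# A lightweight record: fields are accessed by name, but the object is still
# an immutable tuple under the hood.
Rectangle = namedtuple('Rectangle', ['width', 'height'])
r = Rectangle(width=3, height=2)
print(r.width * r.height)  # 6

# An OrderedDict remembers the order in which keys were inserted.
d = OrderedDict([('first', 1), ('second', 2)])
print(list(d))  # ['first', 'second']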

D. Last but certainly not least I would really have liked to see mention of the differences between Python 2 and 3. In particular I feel it would have been important to give some pointers for migrating existing systems from Python 2 to Python 3 which is quickly becoming a priority item for many companies and leading to increased demand for senior Python developers.

Conclusion

In this article I have done my best to give a thorough and honest review of the Udemy course, Master the Python Interview - get the senior & well paid job by Nicolas Georges, which currently has seen about 2,700 enrollments.

My overall opinion of the course is that it is a bit misleading, because its title leads one to believe that the content is geared towards the mid-to-senior level Python developer, while I found it a bit lacking for that. That being said, there is some really excellent content covered in this course that will be valuable to entry and junior level Python devs.

As always I thank you for reading and welcome comments and criticisms below.

Codementor: The Best Programming Languages for Data Science and Machine Learning in 2018

Data science is a vast field, and working in it requires a suitable programming language. To learn data science and machine learning, one must know the best programming...

Mike Driscoll: Python 101: Episode #25 – Decorators

Stack Abuse: Introduction to the Python Pickle Module


Introduction

Pickling is a popular method of preserving food. According to Wikipedia, it is also a pretty ancient procedure – although the origins of pickling are unknown, the ancient Mesopotamians probably used the process 4400 years ago. By placing a product in a specific solution, it is possible to drastically increase its shelf life. In other words, it's a method that lets us store food for later consumption.

If you're a Python developer, you might one day find yourself in need of a way to store your Python objects for later use. Well, what if I told you, you can pickle Python objects too?

Serialization

Serialization is a process of transforming objects or data structures into byte streams or strings. A byte stream is, well, a stream of bytes – one byte is composed of 8 bits of zeros and ones. These byte streams can then be stored or transferred easily. This allows developers to save, for example, configuration data or a user's progress, and then store it (on disk or in a database) or send it to another location.

Python objects can also be serialized using a module called Pickle.

One of the main differences between pickling Python objects and pickling vegetables is the inevitable and irreversible change of the pickled food's flavor and texture. Meanwhile, pickled Python objects can be easily unpickled back to their original form. This process, by the way, is universally known as deserialization.

Pickling (or serialization in general) should not be confused with compression. The purpose of pickling is to translate data into a format that can be transferred from RAM to disk. Compression, on the other hand, is a process of encoding data using fewer bits (in order to save disk space).

Serialization is especially useful in any software where it's important to be able to save some progress on disk, quit the program and then load the progress back after reopening the program. Video games might be the most intuitive example of serialization's usefulness, but there are many other programs where saving and loading a user's progress or data is crucial.

Pickle vs JSON

There is a chance that you have heard of JSON (JavaScript Object Notation), which is a popular format that also lets developers save and transmit objects encoded as strings. This method of serialization has some advantages over pickling. JSON format is human-readable, language-independent, and faster than pickle.

It does have, however, some important limitations as well. Most importantly, by default, only a limited subset of Python built-in types can be represented by JSON. With Pickle, we can easily serialize a very large spectrum of Python types and, importantly, custom classes. This means we don't need to create a custom schema (like we do for JSON) and write error-prone serializers and parsers. All of the heavy lifting is done for you by Pickle.
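
A quick illustration of that difference (a minimal sketch using a made-up Point class):

import json
import pickle

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(1, 2)

try:
    json.dumps(p)              # no built-in JSON encoding for custom classes
except TypeError as error:
    print('JSON failed:', error)

restored = pickle.loads(pickle.dumps(p))  # Pickle handles the class directly
print(restored.x, restored.y)             # 1 2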

What can be Pickled and Unpickled

The following types can be serialized and deserialized using the Pickle module:

  • All native datatypes supported by Python (booleans, None, integers, floats, complex numbers, strings, bytes, byte arrays)
  • Dictionaries, sets, lists, and tuples - as long as they contain pickleable objects
  • Functions and classes that are defined at the top level of a module

It is important to remember that pickling is not a language-independent serialization method, therefore your pickled data can only be unpickled using Python. Moreover, it's important to make sure that objects are pickled using the same version of Python that is going to be used to unpickle them. Mixing Python versions, in this case, can cause many problems.
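
If you do need to exchange pickles across Python versions, one mitigation is to pin the pickle protocol explicitly (a small sketch; protocol 2, for example, can be read by both Python 2 and Python 3):

import pickle

data = {'user': 'adam', 'level': 3}

with open('compat.pkl', 'wb') as pickle_out:
    # An explicit, older protocol keeps the file readable by older interpreters.
    pickle.dump(data, pickle_out, protocol=2)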

Additionally, functions are pickled by their name references, and not by their value. The resulting pickle does not contain information on the function's code or attributes. Therefore, you have to make sure that the environment where the function is unpickled is able to import the function. In other words, if we pickle a function and then unpickle it in an environment where it's either not defined or not imported, an exception will be raised.
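
A short demonstration of what "pickled by reference" means in practice (my own example):

import pickle

def greet():
    return 'hello'

data = pickle.dumps(greet)     # stores only the reference __main__.greet
restored = pickle.loads(data)  # works here, because greet is still defined
print(restored())              # hello

# In a fresh interpreter that never defined (or imported) greet,
# pickle.loads(data) raises AttributeError instead.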

It is also very important to note that pickled objects can be used in malevolent ways. For instance, unpickling data from an untrusted source can result in the execution of a malicious piece of code.

Pickling a Python List

The following very simple example shows the basics of using the Pickle module in Python 3:

import pickle

test_list = ['cucumber', 'pumpkin', 'carrot']

with open('test_pickle.pkl', 'wb') as pickle_out:  
    pickle.dump(test_list, pickle_out)

First, we have to import the pickle module, which is done in line 1. In line 3 we define a simple, three element list that will be pickled.

In line 5 we state that our output pickle file's name will be test_pickle.pkl. By using the wb option, we tell the program that we want to write (w) binary data (b) inside of it (because we want to create a byte stream). Note that the pkl extension is not necessary – we're using it in this tutorial because that's the extension included in Python's documentation.

In line 6 we use the pickle.dump() method to pickle our test list and store it inside the test_pickle.pkl file.

I encourage you to try and open the generated pickle file in your text editor. You'll quickly notice that a byte stream is definitely not a human-readable format.

Unpickling a Python List

Now, let's unpickle the contents of the test pickle file and bring our object back to its original form.

import pickle

with open('test_pickle.pkl', 'rb') as pickle_in:  
    unpickled_list = pickle.load(pickle_in)

print(unpickled_list)  

As you can see, this procedure is no more complicated than when we pickled the object. In line 3 we open our test_pickle.pkl file again, but this time our goal is to read (r) the binary data (b) stored within it.

Next, in line 5, we use the pickle.load() method to unpickle our list and store it in the unpickled_list variable.

You can then print the contents of the list to see for yourself that it is identical to the list we pickled in the previous example. Here is the output from running the code above:

$ python unpickle.py
['cucumber', 'pumpkin', 'carrot']

Pickling and Unpickling Custom Objects

As I mentioned before, using Pickle, you can serialize your own custom objects. Take a look at the following example:

import pickle

class Veggy():  
    def __init__(self):
        self.color = ''
    def set_color(self, color):
        self.color = color

cucumber = Veggy()  
cucumber.set_color('green')

with open('test_pickle.pkl', 'wb') as pickle_out:  
    pickle.dump(cucumber, pickle_out)

with open('test_pickle.pkl', 'rb') as pickle_in:  
    unpickled_cucumber = pickle.load(pickle_in)

print(unpickled_cucumber.color)  

As you can see, this example is almost as simple as the previous one. Between lines 3 and 7 we define a simple class that contains one attribute and one method that changes this attribute. In line 9 we create an instance of that class and store it in the cucumber variable, and in line 10 we set its attribute color to "green".

Then, using the exact same functions as in the previous example, we pickle and unpickle our freshly created cucumber object. Running the code above results in the following output:

$ python unpickle_custom.py
green  

Remember that we can only unpickle the object in an environment where the class Veggy is either defined or imported. If we create a new script and try to unpickle the object without importing the Veggy class, we'll get an AttributeError. For example, execute the following script:

import pickle

with open('test_pickle.pkl', 'rb') as pickle_in:  
    unpickled_cucumber = pickle.load(pickle_in)

print(unpickled_cucumber.color)  

In the output of the script above, you will see the following error:

$ python unpickle_simple.py
Traceback (most recent call last):  
  File "<pyshell#40>", line 2, in <module>
    unpickled_cucumber = pickle.load(pickle_in)
AttributeError: Can't get attribute 'Veggy' on <module '__main__' (built-in)>  

Conclusion

As you can see, thanks to the Pickle module, serialization of Python objects is pretty simple. In our examples, we pickled a simple Python list – but you can use the exact same method to save a large spectrum of Python data types, as long as you make sure your objects contain only other pickleable objects.

Pickling has some disadvantages, the biggest of which might be the fact that you can only unpickle your data using Python – if you need a cross-language solution, JSON is definitely a better option. And finally, remember that pickles can be used to carry the code that you don't necessarily want to execute. Similarly to pickled food, as long as you get your pickles from trusted sources, you should be fine.

PyCharm: PyCharm 2018.2.4


PyCharm 2018.2.4 is now available, with some small improvements. You can download this version from our website.

New in This Version

  • Various small pipenv improvements
  • A bug in our pytest with fixtures support was fixed: previously, if yield statements were used in the fixture, PyCharm would assume that the return type of the function was a Generator. Now, the correct return type is inferred, preventing false positives.
  • And more, see the release notes

Download PyCharm 2018.2.4

Get PyCharm from the JetBrains website

If you’re on Ubuntu 16.04 or later, you can use snap to get PyCharm, and stay up to date. You can find the installation instructions on our website.

Preview PyCharm 2018.3

Are you interested in trying the next version of PyCharm already? We’re currently developing PyCharm 2018.3, and you can help us by letting us know how you like our work so far.

New in PyCharm 2018.3 EAP 3

  • Faster generation of skeletons for Docker Compose interpreters. If you have used PyCharm Professional Edition with Docker Compose you’ve probably seen that sometimes it takes a bit of time for PyCharm to index your container. This is now a lot faster.
  • And more, check out the release notes

To get the EAP version, visit the Early Access Preview (EAP) page on our website. You can also use JetBrains Toolbox to keep PyCharm – and other JetBrains products – up to date.

PyCharm 2018.3 is in development during the EAP phase, therefore not all new features are available yet. More features will be added in the coming weeks. As PyCharm 2018.3 is pre-release software, it is not as stable as the release versions. Furthermore, we may decide to change and/or drop certain features as the EAP progresses.

All EAP versions will ship with a built-in EAP license, which means that these versions are free to use for 30 days after the day that they are built. As EAPs are released weekly, you’ll be able to use PyCharm Professional Edition EAP for free for the duration of the EAP program, as long as you upgrade at least once every 30 days.

Continuum Analytics Blog: AI Opportunities for Financial Services Companies


By Michael Grant

AI is undeniably a hot topic right now, and financial services companies are not immune to the hype. And in truth, they shouldn’t be: the applications of advanced AI within financial services are numerous, and the potential for cost savings and new value generation is high. At the same time, the financial services …
Read more →

The post AI Opportunities for Financial Services Companies appeared first on Anaconda.

PyPy Development: Inside cpyext: Why emulating CPython C API is so Hard

cpyext is PyPy's subsystem which provides a compatibility layer to compile and run CPython C extensions inside PyPy. Often people ask why a particular C extension doesn't work or is very slow on PyPy. Usually it is hard to answer without going into technical details. The goal of this blog post is to explain some of these technical details, so that we can simply link here instead of explaining again and again :).
From a 10,000-foot view, cpyext is PyPy's version of "Python.h". Every time you compile an extension which uses that header file, you are using cpyext. This includes extensions explicitly written in C (such as numpy) and extensions which are generated from other compilers/preprocessors (e.g. Cython).
At the time of writing, the current status is that most C extensions "just work". Generally speaking, you can simply pip install them, provided they use the public, official C API instead of poking at private implementation details. However, the performance of cpyext is generally poor. A Python program which makes heavy use of cpyext extensions is likely to be slower on PyPy than on CPython.
Note: in this blog post we are talking about Python 2.7 because it is still the default version of PyPy: however most of the implementation of cpyext is shared with PyPy3, so everything applies to that as well.

C API Overview

In CPython, which is written in C, Python objects are represented as PyObject*, i.e. (mostly) opaque pointers to some common "base struct".
CPython uses a very simple memory management scheme: when you create an object, you allocate a block of memory of the appropriate size on the heap. Depending on the details, you might end up calling different allocators, but for the sake of simplicity, you can think of this as ending up in a call to malloc(). The resulting block of memory is initialized and cast to PyObject*: this address never changes during the object's lifetime, and the C code can freely pass it around, store it inside containers, retrieve it later, etc.
Memory is managed using reference counting. When you create a new reference to an object, or you discard a reference you own, you have to increment or decrement the reference counter accordingly. When the reference counter goes to 0, it means that the object is no longer used and can safely be destroyed. Again, we can simplify and say that this results in a call to free(), which finally releases the memory which was allocated by malloc().
Generally speaking, the only way to operate on a PyObject* is to call the appropriate API functions. For example, to convert a given PyObject* to a C integer, you can use PyInt_AsLong(); to add two objects together, you can call PyNumber_Add().
Internally, PyPy uses a similar approach. All Python objects are subclasses of the RPython W_Root class, and they are operated by calling methods on the space singleton, which represents the interpreter.
At first, it looks very easy to write a compatibility layer: just make PyObject* an alias for W_Root, and write simple RPython functions (which will be translated to C by the RPython compiler) which call the space accordingly:
def PyInt_AsLong(space, o):
    return space.int_w(o)

def PyNumber_Add(space, o1, o2):
    return space.add(o1, o2)
Actually, the code above is not too far from the actual implementation. However, there are tons of gory details which make it much harder than it looks, and much slower unless you pay a lot of attention to performance.

The PyPy GC

To understand some of cpyext challenges, you need to have at least a rough idea of how the PyPy GC works.
Contrary to popular belief, the "Garbage Collector" is not only about collecting garbage: instead, it is generally responsible for all memory management, including allocation and deallocation.
Whereas CPython uses a combination of malloc/free/refcounting to manage memory, the PyPy GC uses a completely different approach. It is designed assuming that a dynamic language like Python behaves the following way:
  • You create, either directly or indirectly, lots of objects.
  • Most of these objects are temporary and very short-lived. Think e.g. of doing a + b + c: you need to allocate an object to hold the temporary result of a + b, then it dies very quickly because you no longer need it when you do the final + c part.
  • Only a small fraction of the objects survive and stay around for a while.
So, the strategy is: make allocation as fast as possible; make deallocation of short-lived objects as fast as possible; find a way to handle the remaining small set of objects which actually survive long enough to be important.
This is done using a Generational GC: the basic idea is the following:
  1. We have a nursery, where we allocate "young objects" very quickly.
  2. When the nursery is full, we start what we call a "minor collection".
    • We do a quick scan to determine the small set of objects which survived so far
    • We move these objects out of the nursery, and we place them in the area of memory which contains the "old objects". Since the address of the objects changes, we fix all the references to them accordingly.
  3. Now the nursery contains only objects which "died young". We can discard all of them very quickly, reset the nursery, and use the same area of memory to allocate new objects from now on.
In practice, this scheme works very well and it is one of the reasons why PyPy is much faster than CPython. However, careful readers have surely noticed that this is a problem for cpyext. On one hand, we have PyPy objects which can potentially move and change their underlying memory address; on the other hand, we need a way to represent them as fixed-address PyObject* when we pass them to C extensions. We surely need a way to handle that.

PyObject* in PyPy

Another challenge is that sometimes PyObject* structs are not completely opaque: there are parts of the public API which expose specific fields of some concrete C struct to the user. For example, the definition of PyTypeObject exposes many of the tp_* slots to the user. Since the low-level layout of PyPy's W_Root objects is completely different from the one used by CPython, we cannot simply pass RPython objects to C; we need a way to handle the difference.
So, we have two issues so far: objects can move, and incompatible low-level layouts. cpyext solves both by decoupling the RPython and the C representations. We have two "views" of the same entity, depending on whether we are in the PyPy world (the movable W_Root subclass) or in the C world (the non-movable PyObject*).
PyObject* are created lazily, only when they are actually needed. The vast majority of PyPy objects are never passed to any C extension, so we don't pay any penalty in that case. However, the first time we pass a W_Root to C, we allocate and initialize its PyObject* counterpart.
The same idea applies also to objects which are created in C, e.g. by calling PyObject_New(). At first, only the PyObject* exists and it is exclusively managed by reference counting. As soon as we pass it to the PyPy world (e.g. as a return value of a function call), we create its W_Root counterpart, which is managed by the GC as usual.
Here we start to see why calling cpyext modules is more costly in PyPy than in CPython. We need to pay some penalty for all the conversions between W_Root and PyObject*.
Moreover, the first time we pass a W_Root to C we also need to allocate the memory for the PyObject* using a slowish "CPython-style" memory allocator. In practice, for all the objects which are passed to C we pay more or less the same costs as CPython, thus effectively "undoing" the speedup guaranteed by PyPy's Generational GC under normal circumstances.

Crossing the border between RPython and C

There are two other things we need to care about whenever we cross the border between RPython and C, and vice-versa: exception handling and the GIL.
In the C API, exceptions are raised by calling PyErr_SetString() (or one of many other functions which have a similar effect), which basically works by creating an exception value and storing it in some global variable. The function then signals that an exception has occurred by returning an error value, usually NULL.
On the other hand, in the PyPy interpreter, exceptions are propagated by raising the RPython-level OperationError exception, which wraps the actual app-level exception values. To harmonize the two worlds, whenever we return from C to RPython, we need to check whether a C API exception was raised and if so turn it into an OperationError.
We won't dig into details of how the GIL is handled in cpyext. For the purpose of this post, it is enough to know that whenever we enter C land, we store the current thread id into a global variable which is accessible also from C; conversely, whenever we go back from RPython to C, we restore this value to 0.
Similarly, we need to do the inverse operations whenever we cross the border between C and RPython, e.g. by calling a Python callback from C code.
All this complexity is automatically handled by the RPython function generic_cpy_call. If you look at the code you see that it takes care of 4 things:
  1. Handling the GIL as explained above.
  2. Handling exceptions, if they are raised.
  3. Converting arguments from W_Root to PyObject*.
  4. Converting the return value from PyObject* to W_Root.
So, we can see that calling C from RPython introduces some overhead. Can we measure it?
Assuming that the conversion between W_Root and PyObject* has a reasonable cost (as explained by the previous section), the overhead introduced by a single border-cross is still acceptable, especially if the callee is doing some non-negligible amount of work.
However this is not always the case. There are basically three problems that make (or used to make) cpyext super slow:
  1. Paying the border-crossing cost for trivial operations which are called very often, such as Py_INCREF.
  2. Crossing the border back and forth many times, even if it's not strictly needed.
  3. Paying an excessive cost for argument and return value conversions.
The next sections explain in more detail each of these problems.

Avoiding unnecessary roundtrips

Prior to the 2017 Cape Town Sprint, cpyext was horribly slow, and we were well aware of it: the main reason was that we never really paid too much attention to performance. As explained in the blog post, emulating all the CPython quirks is basically a nightmare, so better to concentrate on correctness first.
However, we didn't really know why it was so slow. We had theories and assumptions, usually pointing at the cost of conversions between W_Root and PyObject*, but we never actually measured it.
So, we decided to write a set of cpyext microbenchmarks to measure the performance of various operations. The result was somewhat surprising: the theory suggests that when you do a cpyext C call, you should pay the border-crossing costs only once, but what the profiler told us was that we were paying the cost of generic_cpy_call several times more than what we expected.
After a bit of investigation, we discovered this was ultimately caused by our "correctness-first" approach. For simplicity of development and testing, when we started cpyext we wrote everything in RPython: thus, every single API call made from C (like the omnipresent PyArg_ParseTuple(), PyInt_AsLong(), etc.) had to cross back the C-to-RPython border. This was especially daunting for very simple and frequent operations like Py_INCREF and Py_DECREF, which CPython implements as a single assembly instruction!
Another source of slow down was the implementation of PyTypeObject slots. At the C level, these are function pointers which the interpreter calls to do certain operations, e.g. tp_new to allocate a new instance of that type.
As usual, we have some magic to implement slots in RPython; in particular, _make_wrapper does the opposite of generic_cpy_call: it takes a RPython function and wraps it into a C function which can be safely called from C, handling the GIL, exceptions and argument conversions automatically.
This was very handy during the development of cpyext, but it might result in some bad nonsense; consider what happens when you call the following C function:
static PyObject* foo(PyObject* self, PyObject* args)
{
    PyObject* result = PyInt_FromLong(1234);
    return result;
}
  1. you are in RPython and do a cpyext call to foo: RPython-to-C;
  2. foo calls PyInt_FromLong(1234), which is implemented in RPython: C-to-RPython;
  3. the implementation of PyInt_FromLong indirectly calls PyIntType.tp_new, which is a C function pointer: RPython-to-C;
  4. however, tp_new is just a wrapper around an RPython function, created by _make_wrapper: C-to-RPython;
  5. finally, we create our RPython W_IntObject(1234); at some point during the RPython-to-C crossing, its PyObject* equivalent is created;
  6. after many layers of wrappers, we are again in foo: after we do return result, during the C-to-RPython step we convert it from PyObject* to W_IntObject(1234).
Phew! After we realized this, it was not so surprising that cpyext was very slow :). And this was a simplified example, since we are not passing a PyObject* to the API call. When we do, we need to convert it back and forth at every step. Actually, I am not even sure that what I described was the exact sequence of steps which used to happen, but you get the general idea.
The solution is simple: rewrite as much as we can in C instead of RPython, to avoid unnecessary roundtrips. This was the topic of most of the Cape Town sprint and resulted in the cpyext-avoid-roundtrip branch, which was eventually merged.
Of course, it is not possible to move everything to C: there are still operations which need to be implemented in RPython. For example, think of PyList_Append: the logic to append an item to a list is complex and involves list strategies, so we cannot replicate it in C. However, we discovered that a large subset of the C API can benefit from this.
Moreover, the C API is huge. While we invented this new way of writing cpyext code, we still need to convert many of the functions to the new paradigm. Sometimes the rewrite is not automatic or straightforward. cpyext is a delicate piece of software, so it happens often that we make a mistake and end up staring at a segfault in gdb.
However, the most important takeaway is that the performance improvements we got from this optimization are impressive, as we will detail later.

Conversion costs

The other potential big source of slowdown is the conversion of arguments between W_Root and PyObject*.
As explained earlier, the first time you pass a W_Root to C, you need to allocate its PyObject* counterpart. Suppose you have a foo function defined in C, which takes a single int argument:
for i in range(N):
    foo(i)
To run this code, you need to create a different PyObject* for each value of i: if implemented naively, it means calling N times malloc() and free(), which kills performance.
CPython has the very same problem, which is solved by using a free list to allocate ints. So, what we did was to simply steal the code from CPython and do the exact same thing. This was also done in the cpyext-avoid-roundtrip branch, and the benchmarks show that it worked perfectly.
Every type which is converted often to PyObject* must have a very fast allocator. At the moment of writing, PyPy uses free lists only for ints and tuples: one of the next steps on our TODO list is certainly to use this technique with more types, like float.
Conversely, we also need to optimize the conversion from PyObject* to W_Root: this happens when an object is originally allocated in C and returned to Python. Consider for example the following code:
import numpy as np

myarray = np.random.random(N)
for i in range(len(myarray)):
    myarray[i]
At every iteration, we get an item out of the array: the return type is an instance of numpy.float64 (a numpy scalar), i.e. a PyObject*. This is something which is implemented by numpy entirely in C, so it is completely opaque to cpyext. We don't have any control over how it is allocated, managed, etc., and we can assume that the allocation costs are the same as on CPython.
As soon as we return these PyObject* to Python, we need to allocate their W_Root equivalent. If you do it in a small loop like in the example above, you end up allocating all these W_Root inside the nursery, which is a good thing since allocation is super fast (see the section above about the PyPy GC).
However, we also need to keep track of the W_Root to PyObject* link. Currently, we do this by putting all of them in a dictionary, but this is very inefficient, especially because most of these objects die young and thus it is wasted work to do it for them. Currently, this is one of the biggest unresolved problems in cpyext, and it is what causes the two microbenchmarks allocate_int and allocate_tuple to be very slow.
We are well aware of the problem, and we have a plan for how to fix it. The explanation is too technical for the scope of this blog post as it requires a deep knowledge of the GC internals to be understood, but the details are here.
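Conceptually, the current bookkeeping looks roughly like the following sketch (illustrative Python, not PyPy's actual RPython code; allocate_pyobject is a made-up placeholder for the C-side allocation):

# Naive link table: every wrapped object that ever crossed the boundary
# stays in this dict, even if it dies young on the Python side.
_link_table = {}            # maps a W_Root (by identity) to its PyObject*

def allocate_pyobject(w_root):
    # placeholder for the real C-level allocation of a PyObject struct
    return object()

def as_pyobj(w_root):
    handle = _link_table.get(id(w_root))
    if handle is None:
        handle = allocate_pyobject(w_root)
        _link_table[id(w_root)] = handle
    return handle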

C API quirks

Finally, there is another source of slowdown which is beyond our control. Some parts of the CPython C API are badly designed and expose some of the implementation details of CPython.
The major example is reference counting. The Py_INCREF / Py_DECREF API is designed in a way which forces other implementations to emulate refcounting even in the presence of other GC management schemes, as explained above.
Another example is borrowed references. There are API functions which do not incref an object before returning it, e.g. PyList_GetItem(). This is done for performance reasons because we can avoid a whole incref/decref pair, if the caller needs to handle the returned item only temporarily: the item is kept alive because it is in the list anyway.
For PyPy, this is a challenge: thanks to list strategies, lists are often represented in a compact way. For example, a list containing only integers is stored as a C array of long. How to implement PyList_GetItem? We cannot simply create a PyObject* on the fly, because the caller will never decref it and it will result in a memory leak.
The current solution is very inefficient. The first time we do a PyList_GetItem, we convert the whole list to a list of PyObject*. This is bad in two ways: the first is that we potentially pay a lot of unneeded conversion cost in case we will never access the other items of the list. The second is that by doing that we lose all the performance benefit granted by the original list strategy, making it slower for the rest of the pure-python code which will manipulate the list later.
PyList_GetItem is an example of a bad API because it assumes that the list is implemented as an array of PyObject*: after all, in order to return a borrowed reference, we need a reference to borrow, don't we?
Fortunately, (some) CPython developers are aware of these problems, and there is an ongoing project to design a better C API which aims to fix exactly this kind of problem.
Nonetheless, in the meantime we still need to implement the current half-broken APIs. There is no easy solution for that, and it is likely that we will always need to pay some performance penalty in order to implement them correctly.
However, what we could potentially do is to provide alternative functions which do the same job but are more PyPy friendly: for example, we could think of implementing PyList_GetItemNonBorrowed or something like that: then, C extensions could choose to use it (possibly hidden inside some macro and #ifdef) if they want to be fast on PyPy.

Current performance

During the whole blog post we claimed cpyext is slow. How slow it is, exactly?
We decided to concentrate on microbenchmarks for now. It should be evident by now there are simply too many issues which can slow down a cpyext program, and microbenchmarks help us to concentrate on one (or few) at a time.
The microbenchmarks measure very simple things, like calling functions and methods with the various calling conventions (no arguments, one arguments, multiple arguments); passing various types as arguments (to measure conversion costs); allocating objects from C, and so on.
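Concretely, such a microbenchmark is little more than a timed loop around a C-level callable; here is a rough sketch of the shape (illustrative only, not the actual benchmark suite; ext stands for a hypothetical compiled extension module):

import time

def bench(fn, args=(), n=10000000):
    # Time n calls of fn; the callable itself does (almost) nothing, so the
    # measured cost is dominated by call and argument-conversion overhead.
    start = time.time()
    for _ in range(n):
        fn(*args)
    return time.time() - start

# print(bench(ext.noargs))        # empty C function, no arguments
# print(bench(ext.onearg, (1,)))  # empty C function, one integer argument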
Here are the results from the old PyPy 5.8, relative to and normalized to CPython 2.7; the lower the better:



PyPy was horribly slow everywhere, ranging from 2.5x to 10x slower. It is particularly interesting to compare simple.noargs, which measures the cost of calling an empty function with no arguments, and simple.onearg(i), which measures the cost of calling an empty function passing an integer argument: the latter is ~2x slower than the former, indicating that the conversion cost of integers is huge.
PyPy 5.8 was the last release before the famous Cape Town sprint, when we started to look at cpyext performance seriously. Here are the performance data for PyPy 6.0, the latest release at the time of writing:


The results are amazing! PyPy is now massively faster than before, and for most benchmarks it is even faster than CPython: yes, you read it correctly: PyPy is faster than CPython at doing CPython's job, even considering all the extra work it has to do to emulate the C API. This happens thanks to the JIT, which produces speedups high enough to counterbalance the slowdown caused by cpyext.
There are two microbenchmarks which are still slower though: allocate_int and allocate_tuple, for the reasons explained in the section about Conversion costs.

Next steps

Despite the spectacular results we got so far, cpyext is still slow enough to kill performance in most real-world code which uses C extensions extensively (e.g., the omnipresent numpy).
Our current approach is something along these lines:
  1. run a real-world small benchmark which exercises cpyext
  2. measure and find the major bottleneck
  3. write a corresponding microbenchmark
  4. optimize it
  5. repeat
On one hand, this is a daunting task because the C API is huge and we need to tackle functions one by one. On the other hand, not all the functions are equally important, and it is enough to optimize a relatively small subset to improve many different use cases.
Where a year ago we announced we had a working answer for running C extensions in PyPy, we now have a clear picture of what the performance bottlenecks are, and we have developed some technical solutions to fix them. It is "only" a matter of tackling them, one by one. It is worth noting that most of the work was done during two sprints, for a total of 2-3 person-months of work.
We think this work is important for the Python ecosystem. PyPy has established a baseline for performance in pure python code, providing an answer for the "Python is slow" detractors. The techniques used to make cpyext performant will let PyPy become an alternative for people who mix C extensions with Python, which, it turns out, is just about everyone, in particular those using the various scientific libraries. Today, many developers are forced to seek performance by converting code from Python to a lower language. We feel there is no reason to do this, but in order to prove it we must be able to run both their python and their C extensions performantly, then we can begin to educate them how to write JIT-friendly code in the first place.
We envision a future in which you can run arbitrary Python programs on PyPy, with the JIT speeding up the pure Python parts and the C parts running as fast as today: the best of both worlds!

Talk Python to Me: #178 Coverage.py

You know you should be testing your code, right? How do you know whether it's *well* tested? Are you testing the right things? If you're not using code coverage, chances are you're guessing.

Davy Wybiral: Running a Python Web Server on a Microcontroller

I built my own RGB smart light that can be controlled with an HTTP API using only a little bit of Python code and a WiPy 3.0.


Not Invented Here: Getting Started With Python inside PostgreSQL


PostgreSQL uses the tagline "the world's most advanced open source relational database." For PostgreSQL, part of being "advanced" means supporting multiple server-side procedural languages, both built-in and provided by third parties. Luckily for us here at the blog, one of the built-in languages is Python. Unluckily, it's not completely obvious how to get started using Python inside PostgreSQL. This post will provide a short walkthrough demonstrating how to do that. It's written for macOS, but many of the steps are generic.

Server Programming Basics

Relational database servers often support user-defined functions and/or procedures. In the best implementations, they can be used wherever a built-in function or procedure can be used. For example, they might be used in a SQL SELECT statement to compute values for a column in a result set, or they might be executed automatically as a trigger when data in a table changes, or they might even be called directly by a client to perform some complex action.

The database servers that support user-defined functions and procedures tend to offer a custom language to write them in. These languages tend to be somewhat similar to SQL. While that can ease writing queries and updates, it can also lead to baroque code when the logic gets more complicated.

In PostgreSQL, that SQL-derived language is called PL/pgSQL, in Oracle database it's called PL/SQL, and in Microsoft SQL server, it's called T-SQL. MySQL doesn't seem to give a name to its dialect, simply referring to SQL Compound-Statements. All of these languages are very different from one another (although PL/pgSQL and PL/SQL share some similarities).

PostgreSQL goes beyond that, using its extension mechanism to support multiple server-side procedural languages. Besides the built-in PL/pgSQL, the core distribution ships PL/Tcl, PL/Perl and PL/Python, and third parties provide languages such as PL/Java, PL/R and PL/V8 (JavaScript).

Installing PostgreSQL

The first step to working with any of those procedural languages is to install PostgreSQL. Because all of these languages are extensions, and thus optional, we need to also compile the extensions we wish to use. Here we'll be using MacPorts to install PostgreSQL 10 with support for Python:

$ sudo port install postgresql10 +python
--->  Computing dependencies for postgresql10
--->  Fetching archive for postgresql10
--->  Some of the ports you installed have notes:
  postgresql10 has the following notes:
    To use the postgresql server, install the postgresql10-server port

We want to use the server, not just the client, so we follow the instructions to install the server:

$ sudo port install postgresql10-server
--->  Computing dependencies for postgresql10-server
--->  Fetching archive for postgresql10-server
...
--->  Some of the ports you installed have notes:
  postgresql10-server has the following notes:
    To create a database instance, after install do
      sudo mkdir -p /opt/local/var/db/postgresql10/defaultdb
      sudo chown postgres:postgres /opt/local/var/db/postgresql10/defaultdb
      sudo su postgres -c 'cd /opt/local/var/db/postgresql10 && /opt/local/lib/postgresql10/bin/initdb -D /opt/local/var/db/postgresql10/defaultdb'

We are prompted to create the initial database, so we follow the instructions there too:

$ sudo mkdir -p /opt/local/var/db/postgresql10/defaultdb
$ sudo chown postgres:postgres /opt/local/var/db/postgresql10/defaultdb
$ sudo su postgres -c 'cd /opt/local/var/db/postgresql10 && /opt/local/lib/postgresql10/bin/initdb -D /opt/local/var/db/postgresql10/defaultdb'
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
...
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /opt/local/lib/postgresql10/bin/pg_ctl -D /opt/local/var/db/postgresql10/defaultdb -l logfile start

The instructions at the end for starting the database server will spin it up in the background and write logging information to the file logfile. For the purposes of debugging and understanding, I prefer to run the server in the foreground, like so:

$ sudo -u postgres /opt/local/lib/postgresql10/bin/postgres -D /opt/local/var/db/postgresql10/defaultdb/
2018-09-21 07:48:14.363 CDT [18023] LOG:  listening on IPv6 address "::1", port 5432
2018-09-21 07:48:14.364 CDT [18023] LOG:  listening on IPv6 address "fe80::1%lo0", port 5432
2018-09-21 07:48:14.364 CDT [18023] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2018-09-21 07:48:14.364 CDT [18023] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2018-09-21 07:48:14.381 CDT [18024] LOG:  database system was shut down at 2018-09-21 07:46:28 CDT
2018-09-21 07:48:14.384 CDT [18023] LOG:  database system is ready to accept connections
...

We now leave that program running in its own terminal window where we can watch its output as we proceed. Everything else from now on will be done in a new terminal window.

Creating Users

Now that the server is running, we're ready to connect a client and start experimenting. We'll be using the stock psql interactive terminal that comes with PostgreSQL. MacPorts installs that as /opt/local/bin/psql10. (pgcli is an excellent alternative shell.)

$ psql10
psql10: FATAL:  role "jmadden" does not exist

When we ran initdb, we were warned that trust authentication would be used for local connections. What this means is that our local macOS user name would be used to connect to the server. My local user name is 'jmadden', and there is no matching user on the server (users and roles in PostgreSQL are essentially interchangeable), so we need to create one. PostgreSQL ships a createuser command for this purpose, but it's more instructive to look under the hood and see what it does by issuing the commands ourselves.

We do this by connecting as the only user that's pre-existing and can login, the postgres super-user, and creating the role:

$ sudo -u postgres psql10
psql10 (10.5)
Type "help" for help.

postgres=# create role jmadden;
CREATE ROLE

We're not done yet, we need to grant login rights to that role:

postgres=# \du
                               List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 jmadden   | Cannot login                                               | {}
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

postgres=# alter role jmadden with createdb login createrole;
ALTER ROLE
postgres=# \du
                               List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 jmadden   | Create role, Create DB                                     | {}
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

In psql, \du is a shortcut command for "display users". In PostgreSQL, almost all information about database objects is kept in system tables, where it can be read and manipulated with SQL. In this case, the table we're updating is the pg_roles table:

postgres=# select * from pg_roles;
       rolname        | rolsuper | rolinherit | rolcreaterole | rolcreatedb | rolcanlogin | rolreplication | rolconnlimit | rolpassword | rolvaliduntil | rolbypassrls | rolconfig |  oid
----------------------+----------+------------+---------------+-------------+-------------+----------------+--------------+-------------+---------------+--------------+-----------+-------
 pg_signal_backend    | f        | t          | f             | f           | f           | f              |           -1 | ********    |               | f            |           |  4200
 postgres             | t        | t          | t             | t           | t           | t              |           -1 | ********    |               | t            |           |    10
 pg_read_all_stats    | f        | t          | f             | f           | f           | f              |           -1 | ********    |               | f            |           |  3375
 pg_monitor           | f        | t          | f             | f           | f           | f              |           -1 | ********    |               | f            |           |  3373
 jmadden              | f        | t          | t             | t           | t           | f              |           -1 | ********    |               | f            |           | 16384
 pg_read_all_settings | f        | t          | f             | f           | f           | f              |           -1 | ********    |               | f            |           |  3374
 pg_stat_scan_tables  | f        | t          | f             | f           | f           | f              |           -1 | ********    |               | f            |           |  3377
(7 rows)

Now that we can login, surely we can connect:

$ psql10
psql10: FATAL:  database "jmadden" does not exist

Nope! By default, PostgreSQL will connect our session to a database matching our user name. That database doesn't exist yet. It's a good idea to create one to hold our experiments in anyway, as opposed to using one of the existing databases (to which we probably don't have access rights anyway). We'll use the createdb command to create this database and then try to login again:

$ /opt/local/lib/postgresql10/bin/createdb jmadden
$ psql10
psql10 (10.5)
Type "help" for help.

jmadden=> \l
                               List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 jmadden   | jmadden  | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(4 rows)

Tip

Notice that the psql prompt is different for regular users and the superuser; regular users have => in the prompt, while the super user has =#.

Creating Languages

Fantastic! Now we're logged in to our own database. Let's create the first example Python function:

jmadden=> CREATE FUNCTION pymax (a integer, b integer)
  RETURNS integer
AS $$
  if a > b:
    return a
  return b
$$ LANGUAGE plpythonu;
ERROR:  language "plpythonu" does not exist
HINT:  Use CREATE LANGUAGE to load the language into the database.

Well that didn't work. If we back up to the introduction to PL/Python, it says we need to create the extension first (the "PL" means "procedural language"; we'll see what the "u" means in a little bit):

jmadden=> create extension plpythonu;
ERROR:  permission denied to create extension "plpythonu"
HINT:  Must be superuser to create this extension.

Hmm, ok, I suppose that makes sense. Lets use our superuser login to create the extension:

$ sudo -u postgres psql10
psql10 (10.5)
Type "help" for help.

postgres=# create extension plpythonu;
CREATE EXTENSION

Ok, extension created. Now lets go back to our own database and create that example function:

jmadden=> CREATE FUNCTION pymax (a integer, b integer)
  RETURNS integer
AS $$
  if a > b:
    return a
  return b
$$ LANGUAGE plpythonu;
ERROR:  language "plpythonu" does not exist
HINT:  Use CREATE LANGUAGE to load the language into the database.

Same error! Argh!

After examining the documentation a bit, it turns out that extensions are local to particular databases. So we need to use the 'jmadden' database as the superuser 'postgres' to create the extension in that database:

$ sudo -u postgres psql10 jmadden
psql10 (10.5)
Type "help" for help.

jmadden=# create extension plpythonu;
CREATE EXTENSION

Untrusted Languages

Surely now we can create that simple example function:

jmadden=> CREATE FUNCTION pymax (a integer, b integer)
  RETURNS integer
AS $$
  if a > b:
    return a
  return b
$$ LANGUAGE plpythonu;
ERROR:  permission denied for language plpythonu

At least we got a different error this time.

It turns out that the "u" in "plpythonu" means that the language is "untrusted." I'll quote the docs for what that means:

PL/Python is only available as an “untrusted” language, meaning it does not offer any way of restricting what users can do in it and is therefore named plpythonu. The writer of a function in untrusted PL/Python must take care that the function cannot be used to do anything unwanted, since it will be able to do anything that could be done by a user logged in as the database administrator. Only superusers can create functions in untrusted languages such as plpythonu.

The safest way to work with untrusted languages is to only temporarily escalate your privileges by using a superuser role to create "safe" functions, as outlined in this StackExchange post. But that's no fun. Since we're only experimenting on our own machine, and we don't want to have superuser privileges if we can avoid it (to limit the risk of accidentally damaging our database), we have another option: we can mark the language as trusted.

To do this, we'll need our superuser shell again:

postgres=# select * from pg_language;
  lanname  | lanowner | lanispl | lanpltrusted | lanplcallfoid | laninline | lanvalidator | lanacl
-----------+----------+---------+--------------+---------------+-----------+--------------+--------
 internal  |       10 | f       | f            |             0 |         0 |         2246 |
 c         |       10 | f       | f            |             0 |         0 |         2247 |
 sql       |       10 | f       | t            |             0 |         0 |         2248 |
 plpgsql   |       10 | t       | t            |         12545 |     12546 |        12547 |
 plpythonu |       10 | t       | f            |         16416 |     16417 |        16418 |
(5 rows)

postgres=# update pg_language set lanpltrusted = true where lanname = 'plpythonu';
UPDATE 1

Now our regular user can create and execute this function:

jmadden=> CREATE FUNCTION pymax (a integer, b integer)
  RETURNS integer
AS $$
  if a > b:
    return a
  return b
$$ LANGUAGE plpythonu;
CREATE FUNCTION
jmadden=> select pymax(1, 2);
 pymax
-------
     2
(1 row)

More

PL/Python can do much more than just compare integers, of course. It can query the database, processing rows in batches or iteratively. It can access and produce PostgreSQL "compound types." It can be used for row or statement level triggers. That's all beyond the scope of this post, but you can read more about it in the PostgreSQL documentation.
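As a small taste that goes beyond the original example (the following function is hypothetical and not from the post), here is a PL/Python function that uses the plpy module to run a query and count the roles that are allowed to log in; given the pg_roles output above, it would return 2:

CREATE FUNCTION count_login_roles()
  RETURNS integer
AS $$
  # plpy is provided automatically inside PL/Python functions
  rv = plpy.execute("SELECT rolname FROM pg_roles WHERE rolcanlogin")
  return rv.nrows()
$$ LANGUAGE plpythonu;

SELECT count_login_roles();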

Marc Richter: Create your own Telegram bot with Django on Heroku – Part 7 – Introducing apps and URLconf


 


In the previous part of this series, we started with the basics for kicking off a new Django project. We prepared our virtualenv, installed needed modules to it, created and integrated a new Heroku – project for it and learned how to work with variables in Heroku to control our application with an easy example. We also learned how to check our results locally before we publish it to our production space and how we can add an addon to our Heroku project by adding a PostgreSQL database to it.

Today, we will learn what an “app” is in Django and how to create it. Also, we will learn about and create a so-called URLconf / routing to direct specific URLs to specific parts of our code.

Project, App, Application, … what is all this? 🤯

In Django, there are a few terms which are a bit confusing at first. The fact that careless people sometimes tend to use them as interchangeable terms (including myself, like I did in the previous part by writing “Creating the Django app” even though we created a project 😅) confuses beginners even more. So: Let’s begin today’s article with a short definition of these terms:

A project is what we have created using django-admin startproject dtbot . in the previous part. It is the lowest entry level of the Django structure. It helps to think about what the term “project” means in general, without thinking about Django: it may be the collection of pieces which form the stack built for a specific customer, for example. For each completely new thing, like a different website or a different customer, you usually create a project of its own. That is exactly what the term describes in the Django world, too.
The tutorial describes a project like this:

(A Django project is) a collection of settings for an instance of Django, including database configuration, Django-specific options and application-specific settings.

The terms app and application are harder to distinguish. To make it even more confusing, an “app” is sometimes also called a “package”. These two terms mean exactly the same thing and are 100% interchangeable. If anything, most people tend to use the term “app” for the thing that lives in your project tree and “package” when they are talking about a packaged distribution of that code for shipping or download. But generally, these describe the same thing.
The term “application” does not really exist in the Django world as a term of its own. In my experience, people tend to talk about “the application” when “the project” is what they meant in the first place. Another confusing thing is that “app” and “application” are not really two different words; “app” is just an abbreviation of “application” in the end. Anyway, they are used differently.
An “app” is something which actively does something in your project instead of defining settings, routes or basics for other things to build upon: generating a page, receiving data, applying logic to those requests, and so on.

Again, this is how the Django Tutorial describes the difference between a project and an app:

A project is a collection of configuration and apps for a particular website. A project can contain multiple apps. An app can be in multiple projects.

Why multiple apps?

If you are not too familiar with this concept, it might look like creating several apps in one project makes everything more complicated than necessary. After all, you might complain, you are only about to create such a tiny little application that it hardly seems worth the effort of structuring everything in such a complicated way.

The last sentence of the previous quote already gives one answer: Reusability is a strong reason. Maybe you want to share the result of your development sooner or later. Or, after having finished this one bot, you might have a great idea for another one, dealing with your shopping list instead of your household budget. Or maybe these two should even interact with each other later? Or the second bot you create simply does not need huge parts of the functionality of your first bot – why should you carry around “dead” code which only makes your bot’s code more complicated and bloated? Wouldn’t it be great if you had created separate apps for things like:

  • User registration
  • Calculations and reporting
  • Analyzing the message and creating the replies

The point is: You can’t know at the beginning how your project will evolve over time or what additional ideas you might have. Separating functionality into apps may appear a bit over-complicated, but as soon as you have made your first steps with it, it won’t feel complicated anymore. Just stick with it and stay tuned until we create our app in a minute!

If you want to learn more about apps, e.g. “When does it make sense to split some functionality out into a separate app?”, I recommend reading this article: 0–100 in Django: Starting an app the right way

Creating the app

A new application is created, using a command from the manage.py script in the root of your project-dir:

(dtbot-hT9CNosh) ~/dtbot $ python manage.py startapp bot
(dtbot-hT9CNosh) ~/dtbot $

… that’s not too exciting, is it? To see what this command has changed in our project, I like to use Git to display all differences:

(dtbot-hT9CNosh) ~/dtbot $ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

    bot/

nothing added to commit but untracked files present (use "git add" to track)
(dtbot-hT9CNosh) ~/dtbot $ git add .
(dtbot-hT9CNosh) ~/dtbot $ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    new file:   bot/__init__.py
    new file:   bot/admin.py
    new file:   bot/apps.py
    new file:   bot/migrations/__init__.py
    new file:   bot/models.py
    new file:   bot/tests.py
    new file:   bot/views.py

(dtbot-hT9CNosh) ~/dtbot $

Seems as if it has created just one additional folder without touching any of our other files. Fine – this way, it isn’t messing things up.

Right now, this is of absolutely no use for us; we need to do some things before we can really start to build something in that app:

Writing a view 🧐

… a what? What’s that?
A view is … more or less: Code. A function, to be even more precise, which decides what happens to a request that hits your app. The easiest view I can possibly think of is a static reply to any request. In other words: Answering any request with the same, static page.
Before I overcomplicate this with my explanations, let’s head for an example:

Views are created in the file views.py of the app. So, let’s edit the file bot/views.py. After the app was created, the file has some default content:

from django.shortcuts import render

# Create your views here.

Just remove that and replace it with the following:

from django.http import HttpResponse

def index(request):
    return HttpResponse("Hello, world. This is the bot app.")

Do not think too hard about “How should I know this is what I need to import and use for a simple HTTP response???” for now – just accept it. This is something that comes with time; in the beginning, you need to read a lot of docs and look up examples for nearly every baby step. That’s normal, and you are not stupid: everyone needs to get used to this first!

Apart from that, it’s pretty obvious what happens here, isn’t it? First, HttpResponse is imported, which generates a response that the web server can then serve to the client.
A function named index is defined, taking one argument named request. request is not used in this function, so it is not too important what it is for now. It is needed in the function’s signature anyway, since Django passes it to every view as part of its internal workings.
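If you are curious what request is good for anyway, here is a small, hypothetical view (not part of this tutorial) that actually uses it:

from django.http import HttpResponse

def echo_method(request):
    # request is an HttpRequest instance; it carries e.g. the HTTP method,
    # headers and GET/POST parameters of the incoming request.
    return HttpResponse("You sent a %s request." % request.method)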

Alright! But – how to reach that now with our browser? We need to define a so-called URLconf for this, which is what follows next.

Creating a URLconf

A URLconf or “routing configuration” is simply a list of URIs, each pointing towards a view. For example, we could create a URLconf which calls, for all requests to the URL https://our-domain.com/bot/hello, the previously created index view from the bot app.
Even though this is not really meaningful, let’s do that to explain things step by step without expecting everything to be self-explanatory:

Open the file dtbot/urls.py in your favorite editor. You will notice that, after some comments, there is already one route in place:

from django.contrib import admin
from django.urls import path

urlpatterns = [
    path('admin/', admin.site.urls),
]

Again: ignore the imports for now and just take for granted that it works like this.
To map the URL /bot/hello to our index view in bot/views.py, we first need to step back and remember what we are doing here:

We are writing an app, which eventually can be taken and copied to other Django projects later. This app might have several URLs, pointing to different functions inside of it. Also, a Django project may have several apps installed.
Does it really make sense that the potentially complicated or even conflicting URLconf is handled in one central file of the project?
Would you really like to solve a naming conflict where one app demands the URL /conf for itself internally, when you had decided to use that very same URL for configuring your project?

Most certainly not. That’s why it is a common pattern to create sub-paths for each app and to “delegate” the URLconf of that branch to the app. This way, you need to create just one single line for each app you are using, instead of dealing with dozens of lines per app and conflicting patterns.
To do that, we change the file dtbot/urls.py in the following way:

First, we add include to the list of elements imported from django.urls:

from django.urls import path, include

Next, we register the path bot/ to be delegated to that app’s own URLconf by adding the following to the urlpatterns list:

urlpatterns = [
    path('admin/', admin.site.urls),
    path('bot/', include('bot.urls')),
]

This makes Django search the file bot/urls.py for additional URLconf configuration for everything below the URI bot/ (like bot/hook or similar).
The file bot/urls.py is not created by executing python manage.py startapp bot; we need to create that file ourselves. Let’s do so now with the following content:

from django.urls import path

from . import views

urlpatterns = [
    path('hello/', views.index),
]

And – we are done setting up our demo-URLconf for now!
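As an optional refinement that the tutorial does not require, you can give the route a name and the app a namespace; that makes it possible to reverse URLs later in templates, redirects or tests. A possible variant of bot/urls.py would then look like this:

from django.urls import path

from . import views

app_name = 'bot'

urlpatterns = [
    path('hello/', views.index, name='index'),
]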

To test this, run the HTTP server locally and access http://localhost:8000/bot/hello/ with your browser. This should display the text we entered in our index view before:

Hello, world. This is the bot app.

Stop: Hammertime !!!

I have to admit that I underestimated the extent of this article a lot! That’s why I will stop here and postpone the rest of the pre-announced content, like creating the database and showing additional Heroku tools, to the next part, so this doesn’t become too big and boring a thing to read.

Outlook for the next part of the series

We just learned about some terminology, what an app is and how it is created and made available.

In the next article of this series, we will utilize this freshly gained knowledge to create the hook for our bot, finally. Also, we will create the database. And, because of the lesson I just learned: That’s it – nothing more 😉

If you liked or disliked this article, I’d love to read that in the comments!

Enjoy coding!

Born in 1982, Marc Richter is an IT enthusiastic since 1994. He became addicted when he first put hands on their family’s pc and never stopped investigating and exploring new things since then.
He is married to Jennifer Richter and proud father of two wonderful children, Lotta and Linus.
His current professional focus is DevOps and Python development.

An exhaustive bio can be found at this blog post.

Found my articles useful? Maybe you would like to support my efforts and give me a tip then?

Spyder IDE: QtConsole 4.4 Released!


We're excited to announce a significant update to QtConsole—the package that powers Spyder's IPython Console interface—which the Spyder team maintains in collaboration with Project Jupyter. Two of the biggest changes—user-selectable syntax highlighting themes, and enhanced external editor/IDE integration—are already built right into Spyder, so they'll likely be of more interest if you use QtConsole standalone or with another editor/IDE.

However, most of the other changes should prove quite useful within Spyder as well, and many were in fact suggested and even implemented by users of our IDE. Particular highlights include a block indent/unindent feature, Select-All (Ctrl-Shift-A) being made cell-specific, Ctrl-Backspace and Ctrl-Delete behaving more intelligently across whitespace and line boundaries, Ctrl-D allowing you to easily exit ipdb, input() and the like, and numerous smaller enhancements and bug fixes.

If you'd like to learn more about what's new, please check out our article over on the Jupyter blog, where we go over the major changes in more detail, with plenty of screenshots and GIFs to illustrate each feature and how to use it.

Screenshot of the QtConsole main window, with a new syntax highlighting theme applied

To update to the newest version with your existing Spyder install, open an Anaconda Prompt (Windows), Terminal (macOS) or command line (Linux), activate the conda environment or virtualenv/venv of the Spyder install you are using, and run conda update qtconsole (or pip install --upgrade qtconsole, if not using Anaconda). If you'd like to try QtConsole out separate from Spyder or integrate it in with your own editor or IDE, it is also available as a standalone GUI by running jupyter qtconsole from the Python environment where it or Spyder is installed.

If you have any questions, problems or feedback, we'd love to hear from you. Report issues, request features or participate in QtConsole's development at its Github site, and check out its documentation for help using it. For the latest Spyder news, releases, previews and tips, you can follow our Facebook and Twitter, and help support the development on Spyder and its sister projects like QtConsole on OpenCollective.

Our new documentation and Spyder 4 beta 1 have been fully live for some time now; given the dramatic scale of the changes in both, their respective blog posts are still in the works. We'll also have an upcoming article on our official Spyder 4 feature roadmap and more, and Spyder 3.3.2 is due out soon, so keep it right here for your Spyder fix! Until then, happy Spydering and enjoy QtConsole 4.4!

Python Bytes: #96 Python Language Summit 2018

Kay Hayen: Nuitka this week #7


Nuitka Design Philosophy

Note

I wrote this as part of a discussion recently, and I think it makes sense to share my take on Nuitka and design. This is a lot text though, feel free to skip forward.

The issue with Nuitka and design, for me, is mainly that the requirements for many parts were and are largely unknown to me until I actually start working on them.

My goto generators approach worked out as originally designed, and that felt really cool for once, but the whole "C type" thing was a total unknown to me, until it all magically took form.

Rather, I know it will evolve further as I go from "bool" (complete and coming for 0.6.0) via "void" (should be complete already, but enabling will likely happen only for 0.6.1) to "int"; I am not sure how long that will take.

I really think Nuitka, unlike other software that I have designed, is more of a prototype project that gradually turns more and more into the real thing.

I have literally spent years to inject proper design in steps into the optimization phase, what I call SSA, value tracing, and it is very much there now. I am probably going to spend similar amounts of time, to execute on applying type inference results to the code generation.

So, for the goto generators, I turned something working with code strings into something working with variable declaration objects that know their type, aiming at C types generally. All the while carrying the full weight of passing every compatibility test there is.

Then e.g. suddenly cleaning up module variables to no longer have their special branch, but a pseudo C type, that makes them like everything else. Great. But when I first introduced the new thing, I postponed that, because I could sooner apply its benefits to some things and get experience from it.

While doing partial solutions, the design sometimes horribly degrades, but only until some features can carry the full weight, and/or have been explored to have their final form.

Making a whole Nuitka design upfront and then executing it, would instead give a very high probability of failing in the real world. I am therefore applying the more agile approach, where I make things work first. And then continue to work while I clean it up.

For every feature I added, I actively go out, and change the thing, that made it hard or even fail. Always. I think Nuitka is largely developed by cleanups and refactoring. Goto generators were a fine example of that, solving many of the issues by injecting variable declarations objects into code generation, made it easy to indicate storage (heap or object or stack) right there.

That is not to say that Nuitka didn't have the typical compiler design. Like parsing inputs, optimizing a tree internally, producing outputs. But that grand top level design only tells you the obvious things really and is stolen anyway from knowing similar projects like gcc.

There always were of course obvious designs for Nuitka, but that really never was what anybody would consider to make a Python compiler hard. But for actual compatibility of CPython, so many details were going to require examination with no solutions known ahead of time.

I guess I am an extreme programmer, or agile, or however they call it these days. At least for Nuitka. In my professional life, I have designed software for ATC on the drawing board, then on paper, and then in code; the design just worked and went operational right after completion, which is rare, I can tell you.

But maybe that is what keeps me exciting about Nuitka. How I need to go beyond my abilities and stable ground to achieve it.

But the complexity of Nuitka is so dramatically higher than anything I ever did. It is doing a complicated, i.e. detail rich work, and then it also is doing hard jobs where many things have to play together. And the wish to have something working before it is completed, if it ever is, makes things very different from projects I typically did.

So the first version of Nuitka already had a use, and when I publicly showed it first, was capable of handling most complex programs, and the desire was to evolve gradually.

I think I have described this elsewhere, but for large parts of Nuitka's well or badly designed solutions, there are reliable ways of demonstrating that they work correctly, far better than anything I have ever encountered. I believe the main reason I managed to get this off the ground is that: having a test "oracle", i.e. comparing against existing implementations, is what makes Nuitka special.

Like a calculator can be tested by comparing it to one of the many already perfect ones out there. That again makes Nuitka relatively easy despite the many details to get right: there is often an easy way to tell correct from wrong.

So for me, Nuitka is on the design level, something that goes through many iterations, discovery, prototyping, and is actually really exciting in that.

Compilers typically are boring. But for Nuitka that is totally not the case, because Python is not made for it. Well, that's technically untrue; let's say not for optimizing compilers, not for type inference, etc.

UI rework

Following up on discussion on the mailing list, the user interface of Nuitka will become more clear with --include-* options and --[no]follow-import* options that better express what is going to happen.

Also the default for following with extension modules is now precisely what you say, as going beyond what you intend to deliver makes no sense in the normal case.

Goto Generators

Now released as 0.5.33, and there have been few regressions so far; the one found is only addressed in the pre-release of 0.6.0, so use that instead if you encounter a C compilation error.

Benchmarks

The performance regressions fixed for 0.6.0 impact pystone by a lot, loops were slower, so were subscripts with constant integer indexes. It is a pity these were introduced in previous releases during refactorings without noticing.

We should strive to have benchmarks with trends. Right now Nuitka speedcenter cannot do that. Focus should definitely go to this. Like I said, after the 0.6.0 release, making the benchmarks more useful will be a priority.

Twitter

I continue to be active there. I just put out a poll about the comment system, and, after disabling Disqus comments, I will now focus on Twitter for web site comments too.

Follow @kayhayen

And let's not forget: having followers makes me happy. So do re-tweets.

Help Wanted

If you are interested, I am tagging issues help wanted, and there are a bunch of them, very likely including at least one you can help with.

Nuitka definitely needs more people to work on it.

Egg files in PYTHONPATH

This is a relatively old issue that has now been addressed. Basically, egg files on PYTHONPATH should be usable as sources for compilation. Nuitka now unpacks them to a cache folder so it can read source code from them, so this apparently rare use case works now, yet again improving compatibility.

Will be there for 0.6.0 release.

Certifi

It seems the requests module sometimes uses certifi. Nuitka now includes its data file, starting with the 0.6.0 release.

Compatibility with pkg_resources

It seems that getting "distributions" and taking versions from there is really a thing, and Nuitka fails pkg_resources requirement checks, in standalone mode at least, which is of course sad.

I am currently researching how to fix that and am not yet sure how to do it. But some forms of Python installs are apparently very affected by it. I am looking into its data gathering; maybe compiled modules can be registered there too. It seems to be based on file system scans of its own making, but a monkey patch is always possible to make it better.
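For the record, here is a very rough sketch of the kind of monkey patch that might work (purely hypothetical, not what Nuitka actually ships): register the compiled distributions with pkg_resources' working set at program startup.

import pkg_resources

def _register_compiled_dist(name, version):
    # Hypothetical helper: make pkg_resources aware of a distribution that
    # exists only in compiled form, so require()/get_distribution() succeed.
    dist = pkg_resources.Distribution(project_name=name, version=version)
    pkg_resources.working_set.add(dist)

# e.g. _register_compiled_dist("requests", "2.19.1")  # made-up example values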

Plans

Still working on the 0.6.0 release, cleaning up loose ends only. Release tests are looking pretty good. The UI changes and related work are well timed now, but they delay things, and there are a bunch of small, low-hanging fruits to pick while I wait for test results.

But since it fixes so many performance things, it really ought to be out any day now.

I also added the in-place operations work to 0.6.0, just because it feels very nice and improves some operations by a lot. Initially I had made the cut for 0.6.1 already, but that is no more.

Donations

If you want to help but cannot spend the time, please consider donating to Nuitka here:

Donate to Nuitka
