
Mike Driscoll: Letting Users Change a wx.ComboBox’s Contents in wxPython


This week I came across someone who was wondering if there is a way to allow the user to edit the contents of a wx.ComboBox. By editing the contents, I mean changing the names of the pre-existing choices that the ComboBox contains, not adding new items to the widget.

While editing the contents of the selected item in a ComboBox works out of the box, the widget will not save those edits automatically. So if you edit something and then choose a different option in the ComboBox, the edited item will revert to whatever it was previously and your changes will be lost.

Let’s find out how you can create a ComboBox that allows this functionality!

Creating GUI Applications with wxPython

Purchase now on Leanpub or Amazon


Changing a ComboBox


The first step when trying something new out is to write some code. You’ll need to create an instance of wx.ComboBox and pass it a list of choices as well as set the default choice. Of course, you cannot create a single widget in isolation. The widget must be inside of a parent widget. In wxPython, you almost always want the parent to be a wx.Panel that is inside of a wx.Frame.

Let’s write some code and see how this all lays out:

import wx

class MainPanel(wx.Panel):

    def __init__(self, parent):
        super().__init__(parent)

        self.cb_value = 'One'

        self.combo_contents = ['One', 'Two', 'Three']
        self.cb = wx.ComboBox(self, choices=self.combo_contents,
                              value=self.cb_value, size=(100, -1))

        self.cb.Bind(wx.EVT_TEXT, self.on_text_change)
        self.cb.Bind(wx.EVT_COMBOBOX, self.on_selection)

    def on_text_change(self, event):
        current_value = self.cb.GetValue()
        if current_value != self.cb_value and current_value not in self.combo_contents:
            # Value has been edited
            index = self.combo_contents.index(self.cb_value)
            self.combo_contents.pop(index)
            self.combo_contents.insert(index, current_value)
            self.cb.SetItems(self.combo_contents)
            self.cb.SetValue(current_value)
            self.cb_value = current_value
            
    def on_selection(self, event):
        self.cb_value = self.cb.GetValue()

class MainFrame(wx.Frame):

    def __init__(self):
        super().__init__(None, title='ComboBox Changing Demo')
        panel = MainPanel(self)
        self.Show()


if __name__ == "__main__":
    app = wx.App(False)
    frame = MainFrame()
    app.MainLoop()

The main part of the code that you are interested in is inside the MainPanel class. Here you create the widget, set its choices list and a couple of other parameters. Next you will need to bind the ComboBox to two events:

  • wx.EVT_TEXT – For text change events
  • wx.EVT_COMBOBOX – For item selection change events

The first event, wx.EVT_TEXT, is fired when you change the text in the widget by typing and it also fires when you change the selection. The other event only fires when you change selections. The wx.EVT_TEXT event fires first, so it has precedence over wx.EVT_COMBOBOX.

When you change the text, on_text_change() is called. Here you check whether the current value of the ComboBox differs from the value you expect it to be and whether it is absent from the current choices list. Together these checks tell you whether the user has edited the text. If they have, then you want to grab the index of the currently selected item in your choices list.

Then you use the list’s pop() method to remove the old string and the insert() method to add the new string in its place. Now you need to call the widget’s SetItems() method to update its choices list. Then you set its value to the new string and update the cb_value instance variable so you can check if it changes again later.

The on_selection() method is short and sweet. All it does is update cb_value to whatever the current selection is.

Give the code a try and see how it works!


Wrapping Up

Adding the ability to allow the user to update the wx.ComboBox's contents isn't especially hard. You could even subclass wx.ComboBox and create a version that does this for you all the time. Another enhancement that might be fun to add is to have the widget load its choices from a config file or a JSON file. Then you could update on_text_change() to save the edited choices to disk, and your application could reload them the next time it starts.
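
As a rough sketch of that last idea (the file name and helper functions here are my own, not from the article):

import json

CHOICES_FILE = "choices.json"  # hypothetical location for the saved choices

def load_choices(default=("One", "Two", "Three")):
    """Load the ComboBox choices from disk, falling back to the defaults."""
    try:
        with open(CHOICES_FILE) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return list(default)

def save_choices(choices):
    """Persist the current choices; on_text_change() could call this after SetItems()."""
    with open(CHOICES_FILE, "w") as f:
        json.dump(choices, f)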

Have fun and happy coding!

The post Letting Users Change a wx.ComboBox’s Contents in wxPython appeared first on The Mouse Vs. The Python.


Catalin George Festila: Python 3.7.5 : The this python package.

The this python package is simple to use:

[mythcat@desk ~]$ python3
Python 3.7.5 (default, Dec 15 2019, 17:54:26)
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better

Mike Driscoll: Top 10 Most Read Mouse vs Python Articles of 2019


2019 was a good year for my blog. While we didn’t end up getting a lot of new readers, we did receive a small bump. There has also been a lot more interest in the books that are available on this site.

For the year 2019, these are the top ten most read:

Note that none of these articles were actually written in 2019. Half of them were written in 2018 and one of them dates all the way back to 2010. Interestingly enough, my most popular article written in 2019 was about using Python to take a photo of the black hole, and it ranks way down at #28.

For 2020, I am going to work hard at creating new content and tutorials that you will find useful in your Python journey. In the meantime, I hope you’ll enjoy reading the archives while I work on some new ones!

The post Top 10 Most Read Mouse vs Python Articles of 2019 appeared first on The Mouse Vs. The Python.


Codementor: How python implements super long integers?

Python must be doing something beautiful internally to support super long integers and today we find out what's under the hood. The article goes in-depth to explain design, storage, and operations on super long integers as implemented by Python.
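
You can glimpse that variable-width storage from Python itself. On a typical 64-bit CPython build, ints are stored in 30-bit internal "digits", so the object size grows stepwise with the value (a quick demo of my own, not from the article):

import sys

# Bigger values need more internal digits, so the int object itself grows.
for value in (0, 1, 2**30, 2**60, 2**1000):
    print(f"{value.bit_length():>5} bits -> {sys.getsizeof(value)} bytes")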

Reinout van Rees: Github basic auth deprecation and jenkins


I have been getting periodic deprecation notice emails from github for the last few months:

Hi @nenskins,

You recently used a password to access an endpoint through the GitHub API using okhttp/2.7.5. We will deprecate basic authentication using password to this endpoint soon:

https://api.github.com/

We recommend using a personal access token (PAT) with the appropriate scope to access this endpoint instead. Visit https://github.com/settings/tokens for more information.

Thanks, The GitHub Team

Hm, that @nenskins user is our old jenkins instance talking to github somehow. Apparently through basic auth. Only... where? Most of the github traffic seemed to use just an access token. Jenkins calls that the "secret text" credential type. Basic auth is the "username with password" type in jenkins.

What it turned out to be was the github branch source plugin. This periodically looks at our github organisation to see if there are new projects or new branches that it missed. Normally github tells our jenkins when there's a new project or pull request or so.

Ok, on to the jenkins settings for my organisation. The confusing thing here is that the "credentials" setting says this:

Note that only "username with password" credentials are
supported. Existing credentials of other kinds will be filtered out. This
is because jenkins exercises GitHub API, and this last one does not
support other ways of authentication.

Huh? Github is refusing user/password basic auth, which is the only kind this plugin supports? I updated every plugin, but the problem still persisted.

I only got it after reading this bug report and especially this comment:

Isn't that message saying that you can continue to use basic auth so long as instead of using your actual password you use a personal access token. Generate a personal access token from the GitHub "Settings" page and store that personal access token in the Jenkins username / password credential as the password. Place your username as the username. Check that it works. It has been working that way for me.

Ah! So "github is refusing user/password basic auth" really means "github is refusing basic auth with your actual password". Using an access token as the password is actually fine.

The info in jenkins on those credentials actually mentions that somewhat:

If your organization contains private repositories, then you need to
specify a credential from an user who have access to those
repositories. This is done by creating a "username with password"
credential where the password is GitHub personal access tokens. The
necessary scope is "repo".

So I visited https://github.com/settings/tokens and generated a new token with full "repo" rights (this is actually quite restricted in scope, despite the name).

In Jenkins I added a new global username/password credential with the github username + the access token and hurray, everything worked again.

Reuven Lerner: Is Weekly Python Exercise for you?


In just a few days, I’ll be starting a new cohort of Weekly Python Exercise A1: Data structures for beginners. From my experience teaching Python for 20 years, I’d say that this is one of the best ways out there to improve your Python fluency. That’s because it combines actual practice, automated “pytest” tests, and community interactions.

But don’t believe me! Here’s what previous participants have said:

  • I was a total Python noob when I started.  I just wanted to learn the syntax, how to look at problems and find the solution. You provided both… your teaching is instrumental in drilling some concepts into our brains.
  • I learned a lot of features of the language and had a fun time doing it. I also got to apply what I learned when programming for work.
  • I expected to see Python in real world examples. I am not disappointed. During WPE, there were many examples, with a wide variety of programming blueprints.
  • The exercises are perfect for me because they are right in my “wheelhouse”. I have enough background knowledge that the context of the problems is relevant in my experience, yet I can’t just rattle off the solutions instantly.

So, WPE was right for them. But is it right for you?

  • If you have been using Python for less than one year, then WPE A1 will help you to understand how and when to use core data structures.
  • If you haven’t quite grasped when to use lists vs. tuples vs. dicts vs. sets, then WPE A1 will make this clearer.
  • If you feel like you’ll never remember or understand the many methods on core data structures, then WPE A1 will help you to understand and remember.
  • If you’re tired of going to Stack Overflow and Google multiple times each day, then WPE A1 will help.

But don’t take my word for it: You can sample WPE, which takes you through two questions and answers to show you how it works.

Moreover: I’m running a free webinar on Monday at 17:00 Israel time, where I’ll be happy to answer your questions about Weekly Python Exercise and how it works. To convert the time to your time zone, click here:

https://www.timeanddate.com/worldclock/fixedtime.html?msg=WPE+webinar&iso=20200113T17&p1=110&ah=1

You’ll then be able to join me at the Webinar via the following link:

https://zoom.us/j/193166387

There are six different WPE classes; this is one of three beginner-level classes I’ll be running in 2020. But it’s the only time I’ll be running A1 — data structures for beginners. So if you want to improve your understanding of data structures, including how and when to use them, this is your chance!

Join WPE A1: Data structures for beginners

Remember: I offer discounts for students, seniors/retirees/pensioners, and anyone living outside of the world’s 30 richest countries. E-mail me at reuven@lerner.co.il if you qualify, and I’ll send you a coupon code.

The post Is Weekly Python Exercise for you? appeared first on Reuven Lerner.

Speed Matters: scandir-rs


Because speed matters…

With the increased speed of SSDs, single threaded file access does not always fully utilize the disk. More and more, the bottleneck is the CPU itself.

When I started learning the great new language Rust I came across a crate called jwalk. This crate does directory scanning in parallel with a thread pool. The benchmarks are amazing. So I thought I'd write a Python module based on jwalk as a faster alternative to os.walk. The result is the new module scandir-rs, which can be found on pypi.org.

The API is a bit different and provides more features. But it should be easy to replace os.walk and os.scandir with scandir-rs.

Usage examples

Get statistics of a directory:

import scandir_rs as scandir

print(scandir.count.count("~/workspace", metadata_ext=True))

The same, but asynchronously in background using a class instance and a context manager:

import scandir_rs as scandir

C = scandir.count.Count("~/workspace", metadata_ext=True)
with C:
    while C.busy():
        statistics = C.statistics
        # Do something

os.walk() example:

import scandir_rs as scandir

for root, dirs, files in scandir.walk.Walk("~/workspace", iter_type=scandir.ITER_TYPE_WALK):
    ...  # Do something

os.scandir() example:

import scandir_rs as scandir

for entry in scandir.scandir.Scandir("~/workspace", metadata_ext=True):
    ...  # Do something

Benchmarks

Now let’s have a look at some benchmarks. In the tables below, the scandir_rs.walk.Walk line is the one directly comparable to os.walk.

Linux with Ryzen 5 2400G and SSD

Directory ~/workspace with

  • 22845 directories
  • 321354 files
  • 130 symlinks
  • 22849 hardlinks
  • 4 devices
  • 1 pipe
  • 4.6GB size and 5.4GB usage on disk
Time [s]   Method
0.547      os.walk (Python 3.7)
0.132      scandir_rs.count.count
0.142      scandir_rs.count.Count
0.237      scandir_rs.walk.Walk
0.224      scandir_rs.walk.toc
0.242      scandir_rs.walk.collect
0.262      scandir_rs.scandir.entries
0.344      scandir_rs.scandir.entries(metadata=True)
0.336      scandir_rs.scandir.entries(metadata_ext=True)
0.280      scandir_rs.scandir.Scandir.collect
0.262      scandir_rs.scandir.Scandir.iter
0.330      scandir_rs.scandir.Scandir.iter(metadata_ext=True)

Up to 2 times faster on Linux.

Windows 10 laptop with Core i7-4810MQ @ 2.8GHz, MTF SSD

Directory C:\Windows with

  • 84248 directories
  • 293108 files
  • 44.4GB size and 45.2GB usage on disk
Time [s]   Method
26.881     os.walk (Python 3.7)
4.094      scandir_rs.count.count
3.654      scandir_rs.count.Count
3.978      scandir_rs.walk.Walk
3.848      scandir_rs.walk.toc
3.777      scandir_rs.walk.collect
3.987      scandir_rs.scandir.entries
3.905      scandir_rs.scandir.entries(metadata=True)
4.062      scandir_rs.scandir.entries(metadata_ext=True)
3.934      scandir_rs.scandir.Scandir.collect
3.981      scandir_rs.scandir.Scandir.iter
3.821      scandir_rs.scandir.Scandir.iter(metadata_ext=True)

Up to 6.7 times faster on Windows 10.

Directory C:\testdir with

  • 185563 directories
  • 1641277 files
  • 2696 symlinks
  • 97GB size and 100.5GB usage on disk
Time [s]   Method
151.143    os.walk (Python 3.7)
7.549      scandir_rs.count.count
7.531      scandir_rs.count.Count
8.710      scandir_rs.walk.Walk
8.625      scandir_rs.walk.toc
8.599      scandir_rs.walk.collect
9.014      scandir_rs.scandir.entries
9.208      scandir_rs.scandir.entries(metadata=True)
8.925      scandir_rs.scandir.entries(metadata_ext=True)
9.243      scandir_rs.scandir.Scandir.collect
8.462      scandir_rs.scandir.Scandir.iter
8.380      scandir_rs.scandir.Scandir.iter(metadata_ext=True)

Up to 17.4 times faster on Windows 10.

Check out the scandir-rs module on github, licensed under the MIT license.


Speed Matters: A socketserver with threads vs uvloop


The uvloop project is great, with amazing performance, and is a good replacement for the default asyncio event loop on Linux. Unfortunately uvloop is not available for Windows.

Just out of interest I wanted to know how a multi threaded version competes with uvloop, which is single threaded. Is comparable performance possible, or is a solution with multiple threads even faster?

So I took the echoserver.py example from the uvloop project and extended it with support for fastthreadpool.

Here is a simplified code example of a socket server with fastthreadpool:

import fastthreadpool
from socket import (AF_INET, IPPROTO_TCP, SOCK_STREAM, SOL_SOCKET,
                    SO_REUSEADDR, TCP_NODELAY, socket)

def pool_echo_server(address, threads, size):
    sock = socket(AF_INET, SOCK_STREAM)
    sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    sock.bind(address)
    sock.listen(threads)
    with sock:
        while True:
            client, addr = sock.accept()
            pool.submit(pool_echo_client, client, size)

def pool_echo_client(client, size):
    client.setsockopt(IPPROTO_TCP, TCP_NODELAY, 1)
    b = bytearray(size)
    bl = [b]
    with client:
        try:
            while True:
                client.recvmsg_into(bl)
                client.sendall(b)
        except:
            pass

addr = ("127.0.0.1", 25000)  # example listen address, not in the original snippet
pool = fastthreadpool.Pool(8)
pool.submit(pool_echo_server, addr, 8, 4096)
pool.join()

For a complete example please have a look into the examples/bench directory.

The following benchmarks were executed on a Ryzen 7 with Linux Mint 18.3, kernel 4.13.0-37-generic and Python 3.6. echoserver.py and echoclient.py were run on the same machine, echoclient.py always with 5 parallel workers. Only the message size was modified for the different tests. I've only compared uvloop using the simple protocol with the multi threaded version.

Module    Server buffer size   Message size   Messages/s   MB/s
uvloop    -                    1000 bytes     128220       122,28
uvloop    -                    4kB            108581       424,14
uvloop    -                    64kB           26004        1625,25
threads   4kB                  1000 bytes     423226       403,62
threads   4kB                  4kB            112033       437,63
threads   8kB                  4kB            256438       1001,71
threads   16kB                 4kB            381320       1489,53
threads   4kB                  64kB           6690         418,13
threads   64kB                 64kB           66672        4167
threads   128kB                64kB           62236        3889,75
threads   256kB                64kB           60292        3768,25

The following benchmarks were executed on a Core i5-8250U with Linux Mint 18.3, kernel 5.2.3-050203-generic and Python 3.7.2. echoserver.py and echoclient.py were run on the same machine, echoclient.py always with 5 parallel workers. Only the message size was modified for the different tests.

Module             Server buffer size   Message size   Messages/s   MB/s
uvloop             -                    1000 bytes     34628        33,21
uvloop             -                    4kB            35975        140,53
uvloop             -                    64kB           2170         135,68
uvloop/streams     -                    1000 bytes     65588        62,55
uvloop/streams     -                    4kB            61856        241,63
uvloop/streams     -                    64kB           -            1625,25
uvloop/protocol    -                    1000 bytes     34628        33,21
uvloop/protocol    -                    4kB            35975        140,53
uvloop/protocol    -                    64kB           26004        1625,25
asyncio            -                    1000 bytes     128220       122,28
asyncio            -                    4kB            108581       424,14
asyncio            -                    64kB           26004        1625,25
asyncio/streams    -                    4kB            108581       424,14
asyncio/streams    -                    4kB            108581       424,14
asyncio/streams    -                    64kB           26004        1625,25
asyncio/protocol   -                    64kB           26004        1625,25
asyncio/protocol   -                    4kB            108581       424,14
asyncio/protocol   -                    64kB           26004        1625,25
threads            4kB                  1000 bytes     423226       403,62
threads            4kB                  4kB            112033       437,63
threads            8kB                  4kB            256438       1001,71
threads            16kB                 4kB            381320       1489,53
threads            4kB                  64kB           6690         418,13
threads            64kB                 64kB           66672        4167
threads            128kB                64kB           62236        3889,75
threads            256kB                64kB           60292        3768,25

The results show clearly that the multi threaded version performs much better if the buffer size on the server side is adjusted to match the message size. The reason the multi threaded example is so fast is that it uses recvmsg_into, which writes the received data into a preallocated buffer.
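
A minimal sketch of that receive pattern (a standalone illustration, not the benchmark code itself):

import socket

def echo_loop(client: socket.socket, size: int = 4096) -> None:
    # One buffer is allocated up front; recvmsg_into() fills it in place,
    # so the hot loop does no per-message allocation.
    buf = bytearray(size)
    buffers = [buf]
    while True:
        nbytes, _ancdata, _flags, _addr = client.recvmsg_into(buffers)
        if nbytes == 0:  # peer closed the connection
            break
        client.sendall(memoryview(buf)[:nbytes])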

Reinout van Rees: Pygrunn preparations


Tomorrow (Friday 2019-05-10), I'm going to the nice one-day Dutch python (and friends) pygrunn conference in Groningen (NL) again. Sold out, as usual. And rightfully so.

Anyway, to be honest, this blog entry is mostly about me testing my blog setup on a new laptop. I've got a linux laptop at work (which has its advantages) and since a few months I've got a second-hand macbook pro at home (because linux also has its disadvantages). But I never got around to setting up my blog software here till now. And I'm taking the macbook to the conference, so high time to get everything working :-)

I'm giving a talk myself on cookiecutter, the handy tool to create initial project structures. I'm a big fan of it, so I'm glad to tell the fine folk at pygrunn about the joys and advantages of using it. I gave the talk before at work and at the Amsterdam python meetup, but that was a couple of months ago. So I invested two days to polish and re-order and rehearse the presentation again. And I updated the presentation with some more photos and videos of my model railway...

Reinout van Rees: PyGrunn: monitoring and profiling Flask apps - Patrick Vogel & Bogdan Petre


(One of my summaries of a talk at the 2019 PyGrunn conference).

Patrick and Bogdan are students at Groningen University and they made the Flask Monitoring Dashboard. Some questions you might be interested in:

  • What is the performance of your flask-based apps?
  • How much is my web app being used?
  • Mean response time per endpoint?
  • Are there a lot of outliers? Most customers might have 20 items and one customer might have a couple of thousands: that'll hurt performance for only that specific customer. Important to understand.
  • Monitor performance improvements in case you deploy a new version.

What are your options?

  • Commercial monitoring like google analytics or pingdom.
  • Write your own monitoring in flask middleware.
  • No middleware.
  • Best: use the flask monitoring dashboard!

It offers several levels of monitoring that you can configure per endpoint. From just monitoring the last time the endpoint has been called to full profiling including outlier detection. They showed a webpage with the profiling information: it sure looked useful. As an example, there's a table view (hours vertically, days of the week horizontally) showing the relative usage per day/hour.

It works by "monkeypatching" the view functions. Flask has an internal dictionary with all the view functions! When profiling, a separate thread is started that periodically collects stacktrace info from the function being monitored.
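
The sampling idea boils down to something like this sketch (the general technique, not the dashboard's actual code):

import sys
import threading
import time
from collections import Counter

def sample_frames(thread_id, samples, interval=0.01, rounds=50):
    # Periodically grab the monitored thread's current stack and count
    # which call chains show up most often.
    for _ in range(rounds):
        frame = sys._current_frames().get(thread_id)
        stack = []
        while frame is not None:
            stack.append(frame.f_code.co_name)
            frame = frame.f_back
        samples[tuple(reversed(stack))] += 1
        time.sleep(interval)

samples = Counter()
worker = threading.Thread(target=time.sleep, args=(1,))
worker.start()
sample_frames(worker.ident, samples)  # the dashboard runs this in a separate thread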

Such monitoring of course has a performance impact. They're actually researching that right now. The lower levels of monitoring ("time it was last used" and "amount of usage") have no discernible impact. The two levels that do profiling have more overhead. For cpu/memory intensive tasks, the overhead is around 10%. For disk intensive tasks, it can hit 60%.

Reinout van Rees: PyGrunn: a day has only 24 ± 1 hours - Miroslav Šedivý


(One of my summaries of a talk at the 2019 PyGrunn conference).

Time zones... If you do datetime.datetime.now() you'll get a date+time without timezone information. You can get different results on your laptop (set to local time) and a server (that might be set to UTC).

You can use datetime.datetime.utcnow() that returns UTC time. But... without a timezone attached. Best is to request the time in a specific timezone.

There are gotchas regarding time. Check your time only once in a calculation. If you call .utcnow() multiple times, you can get different dates when your code runs around 0:00.

Same with time.time(): if the "ntp" daemon adjusts your system clock in the meantime, you get weird results. For that, there is time.monotonic().
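
Both gotchas in miniature:

import datetime
import time

# Gotcha 1: capture "now" once and reuse it, so a run that straddles
# midnight cannot end up with two different dates in one calculation.
now = datetime.datetime.now(datetime.timezone.utc)
start_of_day = now.replace(hour=0, minute=0, second=0, microsecond=0)
seconds_into_day = (now - start_of_day).total_seconds()

# Gotcha 2: measure durations with the monotonic clock, which NTP
# adjustments cannot move around.
t0 = time.monotonic()
time.sleep(0.1)  # stand-in for real work
print(seconds_into_day, time.monotonic() - t0)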

The original source for all time zone information is the time zone database (tzdata). You can download it and look at the files per timezone. Interesting reading! Look at Istanbul's timezone: daylight saving time was delayed by a day in a specific year because of a nationwide school exam, and it was announced only a few weeks beforehand. That's all in the time zone database.

So if you build a Docker image now and still use it in two years' time, you might run into problems because summer time might have been abolished by the EU by then. So make sure you keep your time zone library up to date.

Reinout van Rees: PyGrunn: testing your infrastructure code - Ruben Homs


(One of my summaries of a talk at the 2019 PyGrunn conference).

Servers used to be managed by proper wizards. But even wizards can be killed by a balrog. So... what happens when your sysadmin leaves?

  • The sysadmin is a single point of failure.
  • Knowledge about infrastructure is centralised.
  • It is non-reproducible.

A solution is configuration management. Chef, ansible, saltstack, puppet. Configuration that's in source control instead of information in a sysadmin's head.

  • It is a reproducible way to build your infrastructure.
  • Source code, so everyone can see how a system works.
  • You can even version your infrastructure.

He'll use saltstack as an example, that's what they're using in his company. It is a master/minion system. So a central master pushes out commands to the minion systems.

For testing, he uses a tool called "kitchen", originally intended for puppet, which can however also be used with saltstack: https://kitchen.saltstack.com/ . He showed a demo where he created a couple of virtualbox machines and automatically ran the salt scripts on them.

You can then ssh to those boxes and check if they're OK.

But... that's manual work. So he started using testinfra and pytest. Testinfra helps you test infrastructure. There are built-in checks, for instance for verifying that a package has been installed:

def test_emacs_installed(host):
    assert host.package("emacs").is_installed

You can run those tests via "kitchen". They use that to test their infrastructure setup from travis-ci.com.
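
Testinfra has similar built-in checks for services, files and more. A sketch of what a service test could look like (assuming, hypothetically, that the salt scripts install nginx):

def test_nginx_service(host):
    nginx = host.service("nginx")
    assert nginx.is_running
    assert nginx.is_enabled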

Reinout van Rees: PyGrunn: lessons from using GraphQL in production - Niek Hoekstra & Jean-Paul van Oosten


(One of my summaries of a talk at the 2019 PyGrunn conference).

GraphQL is a different way to create APIs, so: different from REST. You describe what you want to receive back, instead of having a fixed REST api. With REST you often get too much or too little, and you may need to do a lot of different calls.

REST often leads to a tight coupling between the front-end and the back-end. Changes to a REST api often break the front-end...

What they especially like about GraphQL: it is designed to have documentation built in.

What they use: "graphene", "graphene-django" and "relay". On the front-end it is "apollo" (react-native, react, native ios/android).

With graphene-django you first have to define the data you're exposing: the various object types, the types of their attributes, the relations, etc.
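
For flavour, a minimal graphene-django sketch (the Contact model and its fields are made up):

import graphene
from graphene_django import DjangoObjectType

from myapp.models import Contact  # hypothetical Django model

class ContactType(DjangoObjectType):
    class Meta:
        model = Contact
        fields = ("id", "name", "email")

class Query(graphene.ObjectType):
    contacts = graphene.List(ContactType)

    def resolve_contacts(root, info):
        # The client decides which of the exposed fields it gets back.
        return Contact.objects.all()

schema = graphene.Schema(query=Query)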

A tip: differentiate between "a user" and "me". Don't add more data to a user object if it turns out to be you. Just have a separate endpoint for "me". Way easier.

Caching: that needs to be done outside of graphene, which can only do a bit of caching right at the end on the resulting json. You're better off caching at the django object level.

A potential problem spot is the flexibility that GraphQL gives you in querying relations. You need to do quite some clever hacking to use django's select_related/prefetch_related speedups. You need to pay attention.

Uploading files is tricky. GraphQL itself does not handle file uploads. Their solution was to have a POST or PUT endpoint somewhere and to return the info about the uploaded file as GraphQL.

A downside of GraphQL: it is hard to predict the costs of a query. You can ask for addresses of contacts living at addresses of contacts and so on: you can kill the server that way. You could prevent that by, for instance, limiting the depth of the query.

There are reasons to stay with REST:

  • GraphQL is not a silver bullet. Yes, it has advantages.
  • The django/python tooling is still not very mature.
  • Determining the cost of a query is hard to predict beforehand.

But: just use GraphQL, it is fun!

Reinout van Rees: PyGrunn: embedding the python interpreter - Mark Boer


(One of my summaries of a talk at the 2019 PyGrunn conference).

Writing scripts inside applications is often hard. Some of them luckily have an embedded version of python, but not all of them.

Two important terms: extending and embedding. Lots of scientific software is made available via extending: a python wrapper. Numpy and tensorflow, for instance.

The other way around is embedding: you put python inside your application. Useful for plugins, scripting. He doesn't know if jupyter notebooks are a good example of embedding, but in any case, jupyter is doing funny things with the python interpreter.

CPython, which is the version of python we're talking about, consists of three parts:

  • Bytecode compiler
  • Python virtual machine (the one running the compiled bytecode).
  • Python's C API, which allows other programs to call python. The C API is the opposite of python: it is hard to read and write :-) Oh, and the error messages are horrendous.

But... starting python from C and sending lines to the REPL, that's quite easy: PyRun_SimpleString(). He showed a 10-line C program that reads from stdin and lets python execute it.

He then expanded it to run in a separate thread. But soon his program crashed. The solution was to explicitly acquire and release the GIL ("global interpreter lock").

A problem: multiprocessing doesn't work. At least on windows. Starting another process from within python opens another version of the whole program you're in...

A suggestion: pybind11, a handy library for helping you embed python into c++. It especially helps with managing the GIL and for embedding python modules.

Something he sees often is that embedding is used to parallelize code for the benefit of python:

  • Convert python types to c/c++ types
  • Release the GIL
  • Perform the computation
  • Acquire the GIL
  • Convert c/c++ types to the return type

A note on deployment: just include python in your installer.


Reinout van Rees: PyGrunn: data processing and visualisation of tractor data - Erik-Jan Blanksma


(One of my summaries of a talk at the 2019 PyGrunn conference).

He works for Dacom, a firm that writes software to help farmers be more effective. Precision farming is a bit of a buzzword nowadays. You can get public elevation data, you can let someone fly over your fields to take measurements or a cart can do automatic ground samples. This way you can make a "prescription map" of where to apply more fertilizer and where less will do.

Another source of data is the equipment the farmer uses to drive over his field. As an example, the presentation looks at a potato harvester.

  • Which route did the harvester take through the field?
  • What was the yield (in potatoes per hectare) in all the various spots?

Some tools and libraries that they use:

  • Numpy: very efficient numerical processing. Arrays.
  • Pandas: data series.
  • Matplotlib: graph plotting library.
  • Postgis: geographical extension to the postgres databases.

Pandas is handy for reading in data, from csv for instance. It integrates nicely with matplotlib. With a one-liner you can let it create a histogram from the data.

With the .describe() function, you get basic statistics about your data.

Another example: a map (actually a graph, but it looks like a map) with color codes for the yield. The locations where the yields are lower are immediately clear this way.

When converting data, watch out with your performance. What can be done by pandas itself is much quicker than if it has to ask python to do it. For instance, creating a datetime from a year field, a month field, etc, that takes a long time as it basically happens per row. It is way quicker to let pandas concatenate the yyyy/mm/dd + time info into one string and then convert that one string to a datetime.
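
Illustrating that tip with made-up column names:

import pandas as pd

df = pd.DataFrame({"yyyy": [2019, 2019], "mm": [5, 5],
                   "dd": [9, 10], "time": ["08:15:00", "12:34:56"]})

# Slow: one Python-level call per row.
slow = df.apply(
    lambda r: pd.Timestamp(f"{r.yyyy}-{r.mm:02d}-{r.dd:02d} {r.time}"), axis=1)

# Fast: vectorised string concatenation, then a single conversion pass.
text = (df["yyyy"].astype(str) + "-" + df["mm"].astype(str).str.zfill(2)
        + "-" + df["dd"].astype(str).str.zfill(2) + " " + df["time"])
fast = pd.to_datetime(text)
assert (slow == fast).all()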

He showed the same example for creating a geometric point. It is quickest to create a textual POINT(1.234,8.234) string from two x/y fields and only then to convert it to a point.

Use the best tool for the job. Once he had massaged the data in pandas, he exported it to a postgis database table. Postgis has lots of geographical functions, like ST_CENTROID, ST_BUFFER, and ST_MAKELINE, which he used to do the heavy geographical lifting.

He then used the "geopandas" extension to let pandas read the resulting postgis query's result. Which could again be plotted with matplotlib.

Nice!

Reinout van Rees: PyGrunn: python as a scientist's playground - Peter Kroon


(One of my summaries of a talk at the 2019 PyGrunn conference).

He's a scientist. Quite often, he searches for python packages.

  • If you're writing python packages, you can learn how someone might search for your package.
  • If you don't write python packages, you can learn how to investigate.

Scientists try to solve unsolved problems. When doing it with computers, you basically do three things.

  • Perform simulations.
  • Set up simulations.
  • Analyze results.

Newton said something about "standing on the shoulders of giants". So basically he predicted the python package index! So many libraries to build upon!

A problem is that there is so much software. There are multiple libraries that can handle graphs (directed graphs, not diagrams). He's going to use that as an example.

Rule one: PR is important. If you don't know a package exists, it won't make the list. Google, github discovery, stackoverflow, scientific literature, friends, pygrunn talks, etc.

A README is critical. Without a good readme: forget it.

The five he found: graph-tool, networkx, igraph, python-graph and scipy.sparse.csgraph.

Rule two: documentation is very important. Docs should showcase the capabilities. This goes beyond explaining it, it should show it.

I must be able to learn how to use your package from the docs. Just some API documentation is not enough, you need examples.

Watch out with technical jargon and terms. On the one hand: as a scientist you're often from a different field and you might not know your way around the terms. On the other hand, you do want to mention those terms to help with further investigation.

Bonus points for references to scientific literature.

Documentation gold standard: scikit-learn!

python-graph has no online docs, so that one's off the shortlist. The other four are pretty OK.

Rule three: it must be python3 compatible. On 1 january 2020 he's going to wipe python2 from all the machines that he has write access to.

All four packages are OK.

Rule four: it must be easy to install. So pypi (or a conda channel). You want to let pip/conda deal with your dependencies. If not, at least list them.

Pure python is desirable. If you need compilation of c/c++/fortran, you need all the build dependencies. This also applies to your dependencies.

He himself is a computer scientist, so he can compile stuff. But most scientists really can't.

He himself actually wants to do research: he doesn't want to solve packaging problems.

graph-tool is not on pypi, networkx is pure python. scipy is fortran/c, but provides wheels. igraph is C core and not on pypi.

So scipy and networkx are left.

Rule five: it must be versatile. Your package must do everything. If your package does a lot, there are fewer dependencies in the future. And I have to learn fewer packages.

If it doesn't do everything, it might still be ok if it is extendable. He might even open a pull request to add the functionality that he needs.

Note: small projects that solve one problem and solve it well are OK, too.

networkx: does too much to count. Nice. scipy.sparse.csgraph has six functions. So for now, networkx is the package of choice.

The first and third rules are hard rules: if it is a python2-only package it is out and if you can't find a package, you can't find it, period.

Conclusions

  • You need to invest effort to make ME try your package.
  • "My software is so amazing, so you should invest time and effort to use it": NO :-)
  • If it doesn't work in 15 minutes: next candidate.

Reinout van Rees: PyGrunn: advanced pytest - Òscar Vilaplana


(One of my summaries of a talk at the 2019 PyGrunn conference).

Imagine being a developer woken up at night because your latest commit broke the website. You fix the issue, run the tests of your part of the code (which pass) and push to github. That runs all the tests and it fails in a completely unrelated piece of the code. But what is happening? Is the test wrong? Is your code wrong? "3 is not 90": what does that mean?

What does it mean that this fails? What is the test's promise? If a test you wrote fails, it should fail beautifully. It should tell exactly what's wrong:

assert num_something == 2, "The number should match the number of added items"

You can use pytest fixtures to at least make the data the test is working with clearer.

You can make fixtures that work as context managers:

@pytest.fixture
def test_with_teardown():
    thing = create_something()  # setup
    yield thing
    thing.destroy()  # teardown

A tip: have fixtures per subsystem. Assuming you have multiple test directories, one per subsystem, give every subsystem its own conftest.py. Different subsystems might get different fixtures even though they use the same name ("product" for instance). This way you can tweak your main fixture items per subsystem; see the sketch after the list below.

  • Disadvantage: it is implicit instead of explicit...
  • Advantage: the fixtures can stay minimal. Otherwise your fixture has to support all use cases.
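
A sketch of that per-subsystem override (directory names and fixture bodies are made up):

# tests/api/conftest.py
import pytest

@pytest.fixture
def product():
    # The API tests want a fully persisted product.
    return {"name": "api-product", "persisted": True}

# tests/reports/conftest.py
import pytest

@pytest.fixture
def product():
    # The reporting tests get by with a lightweight stub.
    return {"name": "report-product", "persisted": False}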

You can parametrize fixtures:

@pytest.fixture(params=["no-user", "disabled-user", "read-only-user"])
def unauthorized_user(request):
    if request.param == ...:
        return ...
    if request.param == ...:
        ...

Tests using that fixture are run three times, once for every possible kind of unauthorized user!

You can make it even more elaborate: you can build a kind of "test matrix" with @pytest.mark.parametrize.
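
Stacked parametrize decorators give you exactly that matrix:

import pytest

@pytest.mark.parametrize("backend", ["sqlite", "postgres"])
@pytest.mark.parametrize("role", ["admin", "editor"])
def test_access(role, backend):
    # Runs four times, once per (role, backend) combination.
    assert role in {"admin", "editor"}
    assert backend in {"sqlite", "postgres"}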

If every test needs a temporary database or a temporary something, you can pass autouse=True to the fixture; that will apply it automatically.
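
For example:

import pytest

@pytest.fixture(autouse=True)
def temporary_database(tmp_path):
    # Every test in this fixture's scope gets a fresh database file
    # without having to request the fixture by name.
    db_file = tmp_path / "test.db"
    db_file.touch()
    yield db_file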

Pytest can help you with mocking, but sometimes you're better off setting up dependency injection. So adding a parameter to the method you're testing to accept some mocked item instead of its regular default.

If you think regular code is more important than the tests: he pro-tests :-)

You need tests because they give you a feeling of safety. If you feel safe, you dare to try things. Tests are a bit of a shared goal inside a team: you and your code want to belong. You want interaction: make sure your tests are communicative and helpful.

PyCharm: Webinar: “Advanced Debugging in PyCharm”


PyCharm’s debugger is one of its most popular features. But many just stick to the basics and don’t learn intermediate and advanced features. In this webinar, two of PyCharm’s core developers who work on the debugger show its less-known but powerful features, while talking a bit about the debugger architecture and future improvements.

  • Thursday, January 23rd
  • 5:00 PM – 6:00 PM CET (11:00 AM – 12:00 PM EST)
  • Register here
  • Aimed at intermediate Python developers


Outline

Here’s what we’re thinking about covering. Got something you’d like to see? Add a comment below and we can try to work it into the schedule.

  • Architecture overview
  • Intermediate features
    • More about breakpoints
    • Use logging instead of print()
    • Watch expressions
    • Smart stepping and stepping filters
    • Attach to process
    • Show return values
    • Run with Python Console
    • On-demand loading

Speakers

Elizaveta Shashkova is a software developer of the PyCharm IDE at JetBrains. She’s been working on the Python debugger for several years and currently she’s focused on Data Science tools.

Andrey Lisin is a software developer at JetBrains. He is the current maintainer of the PyCharm debugging subsystem. Before that, he was doing back-end development and machine learning.

Paolo Amoroso: Ideas for Python Authors

One reason I’m learning Python is its ecosystem. A culture of documentation and countless learning and training resources create opportunities to grow as a developer.

(Image: the table of contents of the Python documentation.)

There are all sorts of free and paid tutorials, books, videos, courses, and other materials on all aspects of the language, the tools, and the libraries. For example, I maintain a list of free Python books.

Still, some important intermediate to advanced topics receive little or no attention.

So, I’d like to offer some suggestions and feedback to Python authors and instructors on what may interest a hobbyist like me. Here are some ideas for topics to cover. Although I’ve found some relevant material, what I’ve seen is still missing something.

If you know of any such resources, please let me know. Not being a visual learner, I’m more interested in text-based content than videos. I also prefer books to the more structured approach of courses.

System design

Some Python books present examples longer than the typical short code snippets of up to a few dozen lines.

But design considerations are incidental in these examples, of which the learner sees only the final version. There’s often no explicit discussion of how the code is structured or ended up like that as the focus is on demonstrating other features, like teaching a web framework such as Django or Flask.

I’d like resources that teach how to design and structure medium to large Python systems, preferably not just in highly popular domains like web development. Something that explains how an idea evolves through successive iterations from a few lines of code or a rough specification to a large, production-quality program.

The emphasis should be on the process, the design decisions and tradeoffs that move development forward.

Test-driven development

Test-driven development is valuable, especially for a dynamic and interactive language like Python. I’d like to learn more on how TDD, especially the "development" part, can help shape and evolve programs besides testing them.

The books and articles I’ve seen present the steps of the methodology almost mechanically, without providing the bigger picture and a path forward in the evolution of programs.

Python security

Another topic rarely covered if at all, even in advanced books, is writing secure Python code.

Many books discuss example programs that use web frameworks, SQL databases, or network servers but, as a hobby programmer, I wouldn't feel confident deploying on the open web something that doesn't prevent or address at least the most common security issues.

Most of the security advice I see comes from the typical computing environments of the early days of Python, such as avoiding pickle-ing from untrusted sources. What’s missing is some advice on contemporary web apps and how to design security into Python systems.