Channel: Planet Python

Catalin George Festila: Python Qt4 - part 006.

Today I will deal with the QFileDialog widget.
You can read more about this widget here.
It allows us to open a dialog to load a resource - a file.
The example comes with the basic PyQt4 application window and a dialog built from the fileDialogSample Python class.
Inside this Python class I have some variables for the file: file_name, data and my_file_open.
The my_text_edit variable is the text area and my_button opens the QFileDialog.
There is also the vbox variable to put everything on the application's QVBoxLayout.
Let's see the example:
import sys
from PyQt4 import QtGui
from PyQt4 import QtCore

class fileDialogSample(QtGui.QWidget):
    def __init__(self):
        QtGui.QWidget.__init__(self)

        # make text area and button
        self.my_text_edit = QtGui.QTextEdit()
        self.my_button = QtGui.QPushButton('Select File', self)

        # open the showDialog
        self.my_button.clicked.connect(self.showDialog)

        # put all into application area
        vbox = QtGui.QVBoxLayout()
        vbox.addWidget(self.my_text_edit)
        vbox.addWidget(self.my_button)
        self.setLayout(vbox)

        # set title and geometry of application
        self.setWindowTitle('File Dialog example')
        self.setGeometry(50, 50, 300, 300)

    # make a function with settings for my QFileDialog
    def showDialog(self):
        file_name = QtGui.QFileDialog.getOpenFileName(self, 'Open file', 'C://')
        my_file_open = open(file_name)
        data = my_file_open.read()
        self.my_text_edit.setText(data)
        my_file_open.close()

# run the application
app = QtGui.QApplication(sys.argv)
my_dialog = fileDialogSample()
my_dialog.show()
sys.exit(app.exec_())
Now, just press the Select File button, pick a text file and open it.

Mike C. Fletcher: Docker on Centos7 host underwhelms


So one of the things I've been working on recently is creating a docker-based container to host a complex application for video processing that requires a lot of OS-level setup. I, of course, work on a number of Ubuntu 17.04 machines for development, and docker there has proven to be exactly what you expect. You create a Dockerfile, and if it works here, it works there, no muss, no fuss. It does what it says on the box. Honestly it's been a pleasure to work with.

Then I went to run the containers on Centos7 hosts.

Note that this is about 90% of the value proposition of Docker: portable containers you can ship to a different host.

For those who haven't done this: the default Centos7 install uses an XFS filesystem. Yay for progress and all that. Oh, and the default on Centos7 for docker is to use overlayfs, which is compatible with XFS - except it's only compatible with XFS if you happened to set the non-default ftype=1 flag when creating the filesystem, and it doesn't check for that flag. Cue much gnashing of teeth as the docker build process becomes a wonderful world of pain and suffering, with RPM db corruptions, random failures to delete directories, and generally ridiculously non-predictable behaviour.

But, you say, overlayfs + XFS ftype=0 is an unsupported configuration. True, I might answer, but dang it, it's the default install of the tool on a default install of the OS. At the very least, refuse to run if you're on a configuration which is just going to fall over and die in exotic and hard-to-debug ways. Make the default the slow-as-dirt loopback fs if you are running against XFS with the wrong ftype, better that things run slow than you violate the entire *point* of the framework. We use docker to have a stable, reliable way to replicate work across environments, if that's not possible, error out and report the failure, or choose a reliable-but-slow alternative.

Apparently the solution is to have a separate set of block devices that you use for docker, and because that's a royal PITA to configure the default is to do silly things. Sigh.
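If you want to find out whether you are on one of those broken-by-default setups before docker falls over, something along these lines should do it (a rough sketch; the /var/lib/docker path and the presence of the xfs_info tool are assumptions, not part of the original setup):

import subprocess

def xfs_ftype_ok(path='/var/lib/docker'):
    # xfs_info prints the filesystem geometry, including "ftype=0" or "ftype=1"
    out = subprocess.check_output(['xfs_info', path]).decode('utf-8')
    return 'ftype=1' in out

if __name__ == '__main__':
    print('ftype=1:', xfs_ftype_ok())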

Django Weekly: Django Weekly - This week in Django

Worthy Read

My experience from my profiler hunt motivated me to build an easy-to-use, user-friendly two-step profiling middleware. It is neither too complicated nor does it require any unnecessary installation. You can check it out at: django-profile-middleware.
profiler

The same way an ORM allows us to forget about SQL when writing queries to the database, migrations make sure we don’t write a single ‘ALTER TABLE’ in our schema changes. Some may argue that’s bad: we “lose control” over a critical part of our infrastructure, we don’t know how to write SQL anymore when needed, we’re not sure how that operation is really translated into SQL, etc, etc. Ok, these points are actually valid. However, Django migrations module is more than just a way of automatically generating and applying SQL statements, it’s also a transparent API to write your own database changes in Python. It comes with wheels for those who need it (or trust enough) and tools for those who like to get their hands dirty.
database migration

Get the report.
sponsor

The slides look at how strings can be marked for translating using Django template tags, and methods from django.utils.translation in python code, template html and js.
translation

project

2018's DjangoCon Europe will be held in beautiful Heidelberg, from the 23rd to the 27th May. There is a lot to do, but it's very much worth it – DjangoCon Europe is an extremely friendly, open, inclusive, and informative (for beginners and advanced users alike) conference. We're looking for support in the following areas, but if you have other interests and want to help out, please contact us:
djangocon

The code for this blog post is written using the Django Web Framework. The techniques discussed here will certainly work with other frameworks (and perhaps even compiled languages), but one core component that we’ll be leveraging here is the presence of an explicit mapping of URL routes to the views that handle requests to those routes.
security

This is the second part of Django project optimisation series. The first part was about profiling and Django settings, it's available here. This part will be about working with database optimisation (Django models).
optimization


Projects

SOTA-Py - 258 Stars, 9 Fork
SOTA-Py is a Python-based solver for the policy- and path-based "SOTA" problems, using the algorithm(s) described in Tractable Pathfinding for the Stochastic On-Time Arrival Problem (also in the corresponding arXiv preprint) and previous works referenced therein.

djurl - 44 Stars, 3 Fork
Simple yet helpful library for writing Django urls in an easy, short and intuitive way.

django-bot - 10 Stars, 2 Fork
A django library that makes it easier to develop bots with a common interface for messaging platforms (eg. Slack, FB messenger) and natural language parsers (eg. api.ai).

recruitr - 6 Stars, 0 Fork
Online Code Judging Tool

Catalin George Festila: The speech python module.

You can read about this Python module here.
It's a little under-documented and I have not found tutorials about it, but I tested it with a simple example.
I'm sure it can do more than what I tried in my example.
First, the install of this python module:
C:\Python27\Scripts>pip install speech
Collecting speech
Downloading speech-0.5.2.tar.gz
Installing collected packages: speech
Running setup.py install for speech ... done
Successfully installed speech-0.5.2
Let's see more about this python module:
>>> dir(speech)
['Listener', '_ListenerBase', '_ListenerCallback', '__builtins__', '__doc__', '__file__', '__name__', '__package__',
'_constants', '_ensure_event_thread', '_eventthread', '_handlerqueue', '_listeners', '_recognizer',
'_startlistening', '_voice', 'gencache', 'input', 'islistening', 'listenfor', 'listenforanything', 'pythoncom',
'say', 'stoplistening', 'thread', 'time', 'win32com']
>>> help(speech)
Help on module speech:

NAME
speech - speech recognition and voice synthesis module.

FILE
c:\python27\lib\site-packages\speech.py

DESCRIPTION
Please let me know if you like or use this module -- it would make my day!

speech.py: Copyright 2008 Michael Gundlach (gundlach at gmail)
License: Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

For this module to work, you'll need pywin32 (http://tinyurl.com/5ezco9
for Python 2.5 or http://tinyurl.com/5uzpox for Python 2.4) and
the Microsoft Speech kit (http://tinyurl.com/zflb).


Classes:
Listener: represents a command to execute when phrases are heard.

Functions:
say(phrase): Say the given phrase out loud.
input(prompt, phraselist): Block until input heard, then return text.
stoplistening(): Like calling stoplistening() on all Listeners.
islistening(): True if any Listener is listening.
listenforanything(callback): Run a callback when any text is heard.
listenfor(phraselist, callback): Run a callback when certain text is heard.
Let's make a simple example with one script that says something to us:
>>> speech.say('Hello Catalin George')
The result of this line of code will be heard on your audio device as: Hello Catalin George.
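Based on the functions listed in the help output above, a listener can probably be set up along these lines (an untested sketch; the phrase list and the callback are only illustrations):

import time
import speech

def on_hello(*args):
    # called when one of the phrases below is recognized
    speech.say('Hello to you too')
    speech.stoplistening()

# run the callback when one of these phrases is heard
speech.listenfor(['hello', 'hello there'], on_hello)

while speech.islistening():
    time.sleep(0.5)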

Codementor: Debugging in Python

Learn how to easily debug your Python application with PDB.
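For reference, the quickest way to drop into PDB is to set a trace right where you want to start stepping (a tiny sketch, not taken from the linked article):

import pdb

def divide(a, b):
    pdb.set_trace()   # execution pauses here; use n, s, p <var> and c to step around
    return a / b

divide(4, 2)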

Kushal Das: Story of mashing in Bodhi (Fedora updates system)


Bodhi mash

This is mostly Fedora release engineering work specific post. Feel free to skip if you are not into Fedora land.

I started packaging for Fedora in 2006, back in the Fedora Extras repository days. Now, we have only one big repo. Packagers can just submit their new packages or updates to this repository. To do so, first we have to get the package reviewed (in the case of new packages). Later, after getting a git repo for the spec file, one can build the package in Koji; this will create the RPMs and the SRPM on the koji server. Next, we mark the package for a push to the testing repository (or, after certain criteria, mark them to push to the stable repository). For the rawhide (the latest of everything) repository, we don't have to do this, everything we build for rawhide will be automatically pushed.

What happens after we mark the package ready for push?

Every day, someone from the Fedora release engineering team is on push duty. This person calls the bodhi-push commands, which in turn generate a fedmsg, which is then consumed by the consumer defined in Bodhi's masher.py to generate new repos from the updates, and also mark the packages properly in koji. It also composes the ostree tree. We will now dig into the source of this command to see all that happens.

The work starts at line 271. At line 290, it initiates the state. There are use cases where it resumes from the middle of a previous mash; the if-else block at line 301 helps to load or save the state of the mash accordingly. Loading or saving actually reads or writes a JSON file, plus a few other steps.

Next, at line 306, it calls a function load_updates. This function talks to the database and finds out the list of packages which need to be pushed (the final repo will be generated from all the packages in that particular tag) for updates. Now all of this work happens based on a unique tag, like f25-updates. The verify_updates function makes sure that the list of updates has the right updates for the current tag (it can happen that someone tries to mark an F24 build as an F25 update).

We have to do some more checks if this push is for the stable repositories, like whether the build has enough positive karma, or whether it has spent 7 days in the testing repository. At line 310, the perform_gating function does these gating checks.

At line 312, in the determine_and_perform_tag_actions function, we talk to Koji and update the tags in Koji as required. Either it adds a new tag or moves the build to a new tag, for example from candidate to testing. After that we update the bugzilla bug entries if there are any updates with a bug associated, and in the next few lines we remove any extra tags, and also update the comps files (from the git repo). At line 321, we create a new thread which does the real mash work using the mash command from the package with the same name. While we wait for this thread to finish on line 330, we have already created some digest mail information, and other updateinfo (in the uinfo variable, which is the content of updateinfo.xml). In line 337 we check if we are supposed to build ostree repos for Atomic; if yes, then we call the compose_atomic_trees function.

Then we sync the newly generated repo with the master mirror, and wait for it to finish (on line 345). After that it sends out fedmsg notifications, modifies the bugs in the bugzilla, adds comments to the koji updates, and also does the announcement emails (the function calls are documented).

The main reason I wanted to write this post is the upcoming change where we will stop using the mash command, and instead call pungi to do the work, as pungi can also do the ostree tree compose.

I want to especially thank Patrick, who helped me to understand this piece of code (as he does for many other things).

Amjith Ramanujam: FuzzyFinder - in 10 lines of Python


Introduction:

FuzzyFinder is a popular feature available in decent editors to open files. The idea is to start typing partial strings from the full path and the list of suggestions will be narrowed down to match the desired file. 

Examples: 

Vim (Ctrl-P)

Sublime Text (Cmd-P)

This is an extremely useful feature and it's quite easy to implement.

Problem Statement:

We have a collection of strings (filenames). We're trying to filter down that collection based on user input. The user input can be partial strings from the filename. Let's walk this through with an example. Here is a collection of filenames:
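For concreteness, here is one such collection, assembled from the filenames that appear later in this post (any list of strings would do):

collection = ['django_migrations.py',
              'django_admin_log.py',
              'main_generator.py',
              'migrations.py',
              'api_user.py',
              'user_group.py']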

When the user types 'djm' we are supposed to match 'django_migrations.py' and 'django_admin_log.py'. The simplest route to achieve this is to use regular expressions. 

Solutions:

Naive Regex Matching:

Convert 'djm' into 'd.*j.*m' and try to match this regex against every item in the list. Items that match are the possible candidates.
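A minimal sketch of that idea, assuming the collection defined above:

import re

def fuzzyfinder_naive(user_input, collection):
    suggestions = []
    pattern = '.*'.join(user_input)   # 'djm' -> 'd.*j.*m'
    regex = re.compile(pattern)
    for item in collection:
        if regex.search(item):
            suggestions.append(item)
    return suggestions

print(fuzzyfinder_naive('djm', collection))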

This got us the desired results for input 'djm'. But the suggestions are not ranked in any particular order.

In fact, for the second example with user input 'mig' the best possible suggestion 'migrations.py' was listed as the last item in the result.

Ranking based on match position:

We can rank the results based on the position of the first occurrence of the matching characters. For user input 'mig' the positions of the matching characters are as follows:

Here's the code:
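The sketch below reuses the pattern-building from the naive version and adds the match position as the primary sort key (again assuming the collection from above):

import re

def fuzzyfinder_v1(user_input, collection):
    suggestions = []
    pattern = '.*'.join(user_input)
    regex = re.compile(pattern)
    for item in collection:
        match = regex.search(item)
        if match:
            # rank by where the match starts; keep the filename for display
            suggestions.append((match.start(), item))
    return [x for _, x in sorted(suggestions)]

print(fuzzyfinder_v1('mig', collection))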

We made the list of suggestions a list of tuples, where the first item is the position of the match and the second item is the matching filename. When this list is sorted, Python will sort the tuples based on the first item and use the second item as a tie-breaker. Finally, a list comprehension iterates over the sorted list of tuples and extracts just the second item, which is the filename we're interested in.

This got us close to the end result, but as shown in the example, it's not perfect. We see 'main_generator.py' as the first suggestion, but the user wanted 'migrations.py'.

Ranking based on compact match:

When a user starts typing a partial string they will continue to type consecutive letters in an effort to find the exact match. When someone types 'mig' they are looking for 'migrations.py' or 'django_migrations.py', not 'main_generator.py'. The key here is to find the most compact match for the user input.

Once again this is trivial to do in python. When we match a string against a regular expression, the matched string is stored in the match.group(). 

For example, if the input is 'mig', the matching group from the 'collection' defined earlier is as follows:

We can use the length of the captured group as our primary rank and use the starting position as our secondary rank. To do that we add the len(match.group()) as the first item in the tuple, match.start() as the second item in the tuple and the filename itself as the third item in the tuple. Python will sort this list based on first item in the tuple (primary rank), second item as tie-breaker (secondary rank) and the third item as the fall back tie-breaker. 
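Putting that together, a sketch of the compact-match ranking looks like this:

import re

def fuzzyfinder_v2(user_input, collection):
    suggestions = []
    pattern = '.*'.join(user_input)
    regex = re.compile(pattern)
    for item in collection:
        match = regex.search(item)
        if match:
            # primary rank: length of the matched text; secondary rank: where it starts
            suggestions.append((len(match.group()), match.start(), item))
    return [x for _, _, x in sorted(suggestions)]

print(fuzzyfinder_v2('mig', collection))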

This produces the desired behavior for our input. We're not quite done yet.

Non-Greedy Matching

There is one more subtle corner case that was caught by Daniel Rocco. Consider these two items in the collection ['api_user', 'user_group']. When you enter the word 'user' the ideal suggestion should be ['user_group', 'api_user']. But the actual result is:

Looking at this output, you'll notice that api_user appears before user_group. Digging in a little, it turns out the search 'user' expands to u.*s.*e.*r; notice that user_group has two rs, so the pattern matches user_gr instead of the expected user. The longer match length forces the ranking of this match down, which again seems counterintuitive. This is easy to change by using the non-greedy version of the regex (.*? instead of .*) when building the pattern.
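With that change, the whole thing looks roughly like this:

import re

def fuzzyfinder(user_input, collection):
    suggestions = []
    pattern = '.*?'.join(user_input)   # non-greedy: 'user' -> 'u.*?s.*?e.*?r'
    regex = re.compile(pattern)
    for item in collection:
        match = regex.search(item)
        if match:
            suggestions.append((len(match.group()), match.start(), item))
    return [x for _, _, x in sorted(suggestions)]

print(fuzzyfinder('user', ['api_user', 'user_group']))  # ['user_group', 'api_user']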

Now that works for all the cases we've outlined. We've just implemented a fuzzy finder in 10 lines of code.

Conclusion:

That was the design process for implementing fuzzy matching for my side project pgcli, which is a REPL for PostgreSQL that can do auto-completion.

I've extracted fuzzyfinder into a stand-alone python package. You can install it via 'pip install fuzzyfinder' and use it in your projects.

Thanks to Micah Zoltu and Daniel Rocco for reviewing the algorithm and fixing the corner cases.

If you found this interesting, you should follow me on twitter

Epilogue:

When I first started looking into fuzzy matching in Python, I encountered this excellent library called fuzzywuzzy. But the fuzzy matching done by that library is of a different kind. It uses Levenshtein distance to find the closest matching string from a collection, which is a great technique for auto-correcting spelling errors, but it doesn't produce the desired results for matching long names from partial sub-strings.

Doug Hellmann: tarfile — Tar Archive Access — PyMOTW 3

The tarfile module provides read and write access to UNIX tar archives, including compressed files. In addition to the POSIX standards, several GNU tar extensions are supported. UNIX special file types such as hard and soft links, and device nodes are also handled. This post is part of the Python Module of the Week series.
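For example, listing the members of an existing archive looks like this (a minimal sketch, not taken from the post itself; example.tar is a placeholder name):

import tarfile

# open an existing archive read-only and list its members
with tarfile.open('example.tar', 'r') as t:
    for member in t.getmembers():
        print(member.name, member.size)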

Catalin George Festila: Python tutor - web tool for python programming.

The website comes with this intro about the tool.
Python Tutor, created by Philip Guo, helps people overcome a fundamental barrier to learning programming: understanding what happens as the computer runs each line of source code.
Using this tool, you can write Python, Java, JavaScript, TypeScript, Ruby, C, and C++ code in your web browser and visualize what the computer is doing step-by-step as it runs your code.
Over 3.5 million people in over 180 countries have used Python Tutor to visualize over 30 million pieces of code, often as a supplement to textbooks, lectures, and online tutorials.

I tested it and it worked very well.
You can use the Python programming language, versions 2.7 and 3.6.
There is no need to import Python modules; you will get an error if you try.
Just program on the fly to test and see the result.
The website comes with some examples to see how to deal with this tool.
Let's see some examples:

example with factorial :

# dumb recursive factorial
def fact(n):
    if (n <= 1):
        return 1
    else:
        return n * fact(n - 1)

print(fact(6))

example with for - else:

# find primes using a for-else construct
for n in range(2, 10):
    x_range = range(2, n)
    for x in x_range:
        if n % x == 0:
            break
    else:
        # loop fell through without finding a factor
        print(n)

example with inputs:

prefix = "Hello "

n1 = raw_input("Enter your name")

n2 = raw_input("Enter another name")

res = prefix + n1 + " and " + n2
print(res)
To run your script just press the Visualize Execution or Live Programming Mode buttons, and the code will run step by step with:
First, Back, Forward and Last.
One good feature of this tool - with a single line of JavaScript code, you can embed a Python Tutor visualization within any webpage.
Another good feature is COLLABORATE to learn together - this allows us to give and get directions with real-time Python programming.
It can be a good tool for Python chat users.
Here is a screenshot showing how this tool works with Python scripting.

Caktus Consulting Group: Readability Counts (PyCon 2017 Must-See Talk 6/6)


Part 6 in the 2017 edition of our annual PyCon Must-See Series, highlighting the talks our staff especially loved at PyCon. While there were many great talks, this is our team's shortlist.

"Readability Counts" was a good talk about why your code should be readable and how you get it there. One of the things I appreciated was that while it was very developer-focused, it was human-oriented rather than technical.

In his presentation, Trey Hunner shared four reasons why code should be readable:

  • It makes your life easier
  • Code is more often read than written
  • It is easier to maintain readable code
  • It’s easier to onboard new team members

He also shared a few best practices to achieve this, including usage of white space, line breaks, and code structure; descriptive naming; and choosing the right construct and coding idioms.

GoDjango: Django 1.11+ django.contrib.auth Class Based Views - Part 2 - Password Change and Reset

Since we can log in and log out, what about managing our password? Learn the power of using the built-in generic class-based views now in django.contrib.auth. They are simple to use once you know about them.
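A rough sketch of wiring a few of them up in a urls.py (the URL patterns here are just illustrative):

from django.conf.urls import url
from django.contrib.auth import views as auth_views

urlpatterns = [
    # password change for logged-in users
    url(r'^password-change/$', auth_views.PasswordChangeView.as_view(), name='password_change'),
    url(r'^password-change/done/$', auth_views.PasswordChangeDoneView.as_view(), name='password_change_done'),
    # password reset by email
    url(r'^password-reset/$', auth_views.PasswordResetView.as_view(), name='password_reset'),
    url(r'^password-reset/done/$', auth_views.PasswordResetDoneView.as_view(), name='password_reset_done'),
]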
Watch Now...

Daniel Bader: How to Reverse a List in Python


How to Reverse a List in Python

A step-by-step tutorial on the three main ways to reverse a Python list or array: in-place reversal, list slicing, and reverse iteration.

Reversing a list is a common operation in Python programming.

For example, imagine you had a sorted list of customer names that your program displays in alphabetical (A-Z) order. Some of your users would like to view the customer list so that the names are in reverse alphabetical order. How are you going to flip the order of this existing list on its head? Or in other words:

What’s the best way to reverse the order of a list in Python?

In this article you’ll see three different ways to achieve this result in “plain vanilla” Python, meaning without the use of any third-party libraries:

  1. Reversing a list in-place with the list.reverse() method
  2. Using the “[::-1]” list slicing trick to create a reversed copy
  3. Creating a reverse iterator with the reversed() built-in function

All examples I’m using here will be based on the following list object containing the numbers 1 through 5:

# You have this:
[1, 2, 3, 4, 5]

# And you want that:
[5, 4, 3, 2, 1]

Ready? Let’s reverse some lists together!

Option #1: Reversing a List In-Place With the list.reverse() Method

Every list in Python has a built-in reverse() method you can call to reverse the contents of the list object in-place. Reversing the list in-place means it won’t create a new list and copy the existing elements to it in reverse order. Instead, it directly modifies the original list object.

Here’s an example:

>>> mylist = [1, 2, 3, 4, 5]
>>> mylist
[1, 2, 3, 4, 5]
>>> mylist.reverse()
None
>>> mylist
[5, 4, 3, 2, 1]

As you can see, calling mylist.reverse() returned None, but modified the original list object. This implementation was chosen deliberately by the developers of the Python standard library:

The reverse() method modifies the sequence in place for economy of space when reversing a large sequence. To remind users that it operates by side effect, it does not return the reversed sequence. (Source: Python 3 Docs)

In-place reversal has some benefits and some downsides. On the plus side, it’s a fast operation—shuffling the list elements around doesn’t require much extra memory, as we’re not creating a full copy of the list.

However, reversing a list in-place overwrites the original sort order. This could be a potential downside. (Of course, to restore the original order you could simply reverse the same list again.)

From a code readability standpoint, I like this approach. The syntax is clear and easy to understand, even for developers new to Python or someone who comes from another language background.

Option #2: Using the “[::-1]” Slicing Trick to Reverse a Python List

Python’s list objects have an interesting feature called slicing. You can view it as an extension of the square-brackets indexing syntax. It includes a special case where slicing a list with “[::-1]” produces a reversed copy:

>>> mylist
[1, 2, 3, 4, 5]
>>> mylist[::-1]
[5, 4, 3, 2, 1]

Reversing a list this way takes up more memory compared to an in-place reversal because it creates a (shallow) copy of the list. And creating the copy requires allocating enough space to hold all of the existing elements.

Note that this only creates a “shallow” copy where the container is duplicated, but not the individual list elements. Instead of duplicating the list elements themselves, references to the original elements are reused in the new copy of the container. If the elements are mutable, modifying an element in the original list will also be reflected in the copy.

The biggest downside to reversing a list with the slicing syntax is that it uses a more advanced Python feature that some people would say is “arcane.” I don’t blame them—list slicing is fast, but also a little difficult to understand the first time you encounter its quirky syntax.

When I’m reading Python code that makes use of list slicing I often have to slow down and concentrate to “mentally parse” the statement, to make sure I understand what’s going on. My biggest gripe here is that the “[::-1]” slicing syntax does not communicate clearly enough that it creates a reversed copy of the original list.

Using Python’s slicing feature to reverse a list is a decent solution, but it can be difficult to read for the uninitiated. Be sure to remember the wise words of master Yoda: With great power, great responsibility comes🙂

Sidebar: How does list slicing work in Python?

Reversing a list this way takes advantage of Python’s “slicing” syntax that can be used to do a number of interesting things. List slicing uses the “[]” indexing syntax with the following “[start:stop:step]” pattern:

>>> mylist[start:end:step]
>>> mylist
[1, 2, 3, 4, 5]
>>> mylist[1:3]
[2, 3]

Adding the “[1:3]” index tells Python to give us a slice of the list from index 1 to index 2. To avoid off-by-one errors it’s important to remember that the upper bound is exclusive—this is why we only got [2, 3] as the sub-list from the [1:3] slice.

All of the indexes are optional, by the way. You can leave them out and, for example, create a full (shallow) copy of a list like this:

>>> mylist[::]
[1, 2, 3, 4, 5]

The step parameter, sometimes called the stride, is also interesting. Here’s how you can create a copy of a list that only includes every other element of the original:

>>> mylist[::2]
[1, 3, 5]

Earlier we used the same “step” trick to reverse a list using slicing:

>>> mylist[::-1]
[5, 4, 3, 2, 1]

We ask Python to give us the full list (::), but to go over all of the elements from back to front by setting the step to -1. Pretty neat, eh?

Option #3: Creating a Reverse Iterator With the reversed() Built-In Function

Reversing a list using reverse iteration with the reversed() built-in is another option. It neither reverses a list in-place, nor does it create a full copy. Instead we get a reverse iterator we can use to cycle through the elements of the list in reverse order:

>>> mylist = [1, 2, 3, 4, 5]
>>> for item in reversed(mylist):
...     print(item)
5
4
3
2
1
>>> mylist
[1, 2, 3, 4, 5]

Using reversed() does not modify the original list. In fact all we get is a “view” into the existing list that we can use to look at all the elements in reverse order. This is a powerful technique that takes advantage of Python’s iterator protocol.

So far, all we did was iterate over the elements of a list in reverse order. But how can you create a reversed copy of a list using Python’s reversed() function?

Here’s how:

>>> mylist = [1, 2, 3, 4, 5]
>>> list(reversed(mylist))
[5, 4, 3, 2, 1]

Notice how I’m calling the list() constructor on the result of the reversed() function?

Using the list constructor built-in keeps iterating until the (reverse) iterator is exhausted, and puts all the elements fetched from the iterator into a new list object. And this gives us the desired result: A reversed shallow copy of the original list.

I really like this reverse iterator approach for reversing lists in Python. It communicates clearly what is going on, and even someone new to the language would intuitively understand we’re creating a reversed copy of the list. And while understanding how iterators work at a deeper level is helpful, it’s not absolutely necessary to use this technique.

Summary: Reversing Lists in Python

List reversal is a fairly common operation in programming. In this tutorial we covered three different approaches for reversing a list or array in Python. Let’s do a quick recap of each of those approaches before I give you my final verdict on which option I recommend the most:

Option 1: list.reverse()

Python lists can be reversed in-place with the list.reverse() method. This is a great option to reverse the order of a list (or any mutable sequence) in Python. It modifies the original container in-place which means no additional memory is required. However the downside is, of course, that the original list is modified.

>>> lst = [1, 2, 3, 4, 5]
>>> lst.reverse()
>>> lst
[5, 4, 3, 2, 1]
  • Reverses the list in-place
  • Fast, doesn’t take up extra memory
  • Modifies the original list

Option 2: List Slicing Trick

You can use Python’s list slicing syntax to create a reversed copy of a list. This works well, however it is slightly arcane and therefore not very Pythonic, in my opinion.

>>> lst = [1, 2, 3, 4, 5]
>>> lst[::-1]
[5, 4, 3, 2, 1]
  • Creates a reversed copy of the list
  • Takes up memory but doesn’t modify the original

Option 3: reversed()

Python’s built-in reversed() function allows you to create a reverse iterator for an existing list or sequence object. This is a flexible and clean solution that relies on some advanced Python features—but it remains readable due to the clear naming of the reversed() function.

>>> lst = [1, 2, 3, 4, 5]
>>> list(reversed(lst))
[5, 4, 3, 2, 1]
  • Returns an iterator that returns elements in reverse order
  • Doesn’t modify the original
  • Might need to be converted into a list object again

If you’re wondering what the “best” way is to reverse a list in Python my answer will be: “It depends.” Personally, I like the first and third approach:

  • The list.reverse() method is fast, clear and speaks for itself. Whenever you have a situation where you want to reverse a list in-place and don’t want a copy and it’s okay to modify the original, then I would go with this option.

  • If that isn’t possible, I would lean towards the reversed iterator approach where you call reversed() on the list object and you either cycle through the elements one by one, or you call the list() function to create a reversed copy. I like this solution because it’s fast and clearly states its intent.

I don’t like the list slicing trick as much. It feels “arcane” and it can be difficult to see at a glance what’s going on. I try to avoid using it for this reason.

Note that there are other approaches like implementing list reversal from scratch or reversing a list using a recursive algorithm that are common interview questions, but not very good solutions for Python programming in the “real world.” That’s why I didn’t cover them in this tutorial.

If you’d like to dig deeper into the subject, be sure to watch my YouTube tutorial on list reversal in Python. It’s also embedded at the top of the article. Happy Pythoning!

Kushal Das: Switching to Emacs land


After 16 years with Vi(m), a week back I switched to Emacs as my primary editor. I used Emacs for a few days in 2010, when it was suggested to me by the #lisp channel. But I neither continued, nor understood the keystrokes well.

But, why now?

I was trying out Org Mode in Vim. Looking at the strange key-combinations, I felt it was not for me. I decided to give Emacs a proper try for the first time. If you know about our summer training, or saw me discussing the GNU/Linux world at any college, you would have seen me suggest vim to everyone as a starting point. Mostly because of two reasons.

  • It is there on default Fedora installation.
  • I still think it is easy enough for the beginners to start with.

For me, to learn anything new, I prefer to use it regularly, be it a programming language or a particular tool. I wanted to see how Org Mode works in Emacs, but to do so well I had to use it myself. That also means I will have to use Emacs regularly.

How did I start?

I used a few different sources to start reading, and also configuring my init.el file. Shakthi Kannan has some excellent articles on Emacs. I also found another site which introduces Emacs and related configurations well. I am going to suggest both sites to anyone starting with Emacs. Shakthi also has a very good reference card for the keystrokes.

In the #dgplug IRC channel, maxking suggested to start using eshell and magit inside of Emacs. I am using the presentation from Shakthi to learn magit.

The last few blog posts (including this one), and also a few commits in random places were done using this setup (inside of Emacs).

The problems

I am not going to say this was very smooth. Remembering the keystrokes is always difficult. Getting them into muscle memory is even more time consuming. But there are a few things which I found really difficult (for me).

  • Marking for cut/copy/paste.
  • Sometimes I see the same buffer in two split areas, just can not close them easily.
  • Spell check keystrokes

All the other regular Emacs users I know, are using it for more than 10 years. I am not trying to hurry, I will slowly learn the various facts and HOWTOs. The famous XKCD post fits well in this regard.

Eli Bendersky: Interacting with a long-running child process in Python


The Python subprocess module is a powerful swiss-army knife for launching and interacting with child processes. It comes with several high-level APIs like call, check_output and (starting with Python 3.5) run that are focused on child processes our program runs and waits to complete.

In this post I want to discuss a variation of this task that is less directly addressed - long-running child processes. Think about testing some server - for example an HTTP server. We launch it as a child process, then connect clients to it and run some testing sequence. When we're done we want to shut down the child process in an orderly way. This would be difficult to achieve with APIs that just run a child process to completion synchronously, so we'll have to look at some of the lower-level APIs.

Sure, we could launch a child process with subprocess.run in one thread and interact with it (via a known port, for example) in another thread. But this would make it tricky to cleanly terminate the child process when we're done with it. If the child process has an orderly termination sequence (such as sending some sort of "quit" command), this is doable. But most servers do not, and will just spin forever until killed. This is the use-case this post addresses.

Launch, interact, terminate and get all output when done

The first, simplest use case will be launching an HTTP server, interacting with it, terminating it cleanly and getting all the server's stdout and stderr when done. Here are the important bits of the code (all full code samples for this post are available here), tested with Python 3.6:

def main():
    proc = subprocess.Popen(['python3', '-u', '-m', 'http.server', '8070'],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    try:
        time.sleep(0.2)
        resp = urllib.request.urlopen('http://localhost:8070')
        assert b'Directory listing' in resp.read()
    finally:
        proc.terminate()
        try:
            outs, _ = proc.communicate(timeout=0.2)
            print('== subprocess exited with rc =', proc.returncode)
            print(outs.decode('utf-8'))
        except subprocess.TimeoutExpired:
            print('subprocess did not terminate in time')

The child process is an HTTP server using Python's own http.server module, serving contents from the directory it was launched in. We use the low-level Popen API to launch the process asynchronously (meaning that Popen returns immediately and the child process runs in the background).

Note the -u passed to Python on invocation: this is critical to avoid stdout buffering and seeing as much of stdout as possible when the process is killed. Buffering is a serious issue when interacting with child processes, and we'll see more examples of this later on.

The meat of the sample happens in the finally block. proc.terminate() sends the child process a SIGTERM signal. Then, proc.communicate waits for the child to exit and captures all of its stdout. communicate has a very convenient timeout argument starting with Python 3.3 [1], letting us know if the child does not exit for some reason. A more sophisticated technique could be to send the child a SIGKILL (with proc.kill) if it didn't exit due to SIGTERM.
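That escalation could look roughly like this (a small sketch, not part of the sample above):

proc.terminate()
try:
    outs, _ = proc.communicate(timeout=0.2)
except subprocess.TimeoutExpired:
    # SIGTERM was ignored; force the child to die and collect what we can
    proc.kill()
    outs, _ = proc.communicate()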

If you run this script, you'll see the output:

$ python3.6 interact-http-server.py
== subprocess exited with rc = -15
Serving HTTP on 0.0.0.0 port 8070 (http://0.0.0.0:8070/) ...
127.0.0.1 - - [05/Jul/2017 05:48:34] "GET / HTTP/1.1" 200 -

The return code of the child is -15 (negative means terminated by a signal, 15 is the numeric code for SIGTERM). The stdout was properly captured and printed out.

Launch, interact, get output in real time, terminate

A related use case is getting the stdout of a child process in "real-time" and not everything together at the end. Here we have to be really careful about buffering, because it can easily bite and deadlock the program. Linux processes are usually line-buffered in interactive mode and fully buffered otherwise. Very few processes are fully unbuffered. Therefore, reading stdout in chunks of less than a line is not recommended, in my opinion. Really, just don't do it. Standard I/O is meant to be used in a line-wise way (think of how all the Unix command-line tools work); if you need sub-line granularity, stdout is not the way to go (use a socket or something).

Anyway, to our example:

def output_reader(proc):
    for line in iter(proc.stdout.readline, b''):
        print('got line: {0}'.format(line.decode('utf-8')), end='')


def main():
    proc = subprocess.Popen(['python3', '-u', '-m', 'http.server', '8070'],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)

    t = threading.Thread(target=output_reader, args=(proc,))
    t.start()

    try:
        time.sleep(0.2)
        for i in range(4):
            resp = urllib.request.urlopen('http://localhost:8070')
            assert b'Directory listing' in resp.read()
            time.sleep(0.1)
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=0.2)
            print('== subprocess exited with rc =', proc.returncode)
        except subprocess.TimeoutExpired:
            print('subprocess did not terminate in time')
    t.join()

The sample is similar except for how stdout is handled; there are no more calls to communicate; instead, proc.wait just waits for the child to exit (after SIGTERM has been sent). A thread polls the child's stdout attribute, looping as long as new lines are available and printing them immediately. If you run this sample, you'll notice that the child's stdout is reported in real-time, rather than as one lump at the end.

The iter(proc.stdout.readline, b'') snippet is continuously calling proc.stdout.readline(), until this call returns an empty bytestring. This only happens when proc.stdout is closed, which occurs when the child exits. Thus, while it may seem like the reader thread might never terminate - it always will! As long as the child process is running, the thread will dutifully block on that readline; as soon as the child terminates, the readline call returns b'' and the thread exits.

If we don't want to just print the captured stdout, but rather do something with it (such as look for expected patterns), this is easy to organize with Python's thread-safe queue. The reader thread becomes:

def output_reader(proc, outq):
    for line in iter(proc.stdout.readline, b''):
        outq.put(line.decode('utf-8'))

And we launch it with:

outq = queue.Queue()
t = threading.Thread(target=output_reader, args=(proc, outq))
t.start()

Then at any point we can check if there's stuff in the queue by using its non-blocking mode (the full code sample is here):

try:
    line = outq.get(block=False)
    print('got line from outq: {0}'.format(line), end='')
except queue.Empty:
    print('could not get line from queue')

Direct interaction with the child's stdin and stdout

This sample is getting into dangerous waters; the subprocess module documentation warns against doing the things described here due to possible deadlocks, but sometimes there's simply no choice! Some programs like using their stdin and stdout for interaction. Alternatively, you may have a program with an interactive (interpreter) mode you'd like to test - like the Python interpreter itself. Sometimes it's OK to feed this program all its input at once and then check its output; this can, and should, be done with communicate - the perfect API for this purpose. It properly feeds stdin, closes it when done (which signals many interactive programs that the game is over), etc. But what if we really want to provide additional input based on some previous output of the child process? Here goes:

def main():
    proc = subprocess.Popen(['python3', '-i'],
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)

    # To avoid deadlocks: careful to: add \n to output, flush output, use
    # readline() rather than read()
    proc.stdin.write(b'2+2\n')
    proc.stdin.flush()
    print(proc.stdout.readline())

    proc.stdin.write(b'len("foobar")\n')
    proc.stdin.flush()
    print(proc.stdout.readline())

    proc.stdin.close()
    proc.terminate()
    proc.wait(timeout=0.2)

Let me reiterate what the comment in this code sample is saying:

  • When sending input to a line interpreter, don't forget to send the actual newline.
  • Always flush the stream after placing data into it, since it may be buffered.
  • Use readline to get input from the line interpreter.

We have to be very careful to avoid the following situation:

  1. We send data to the child's stdin, but it doesn't get the complete input for some reason (lack of newline, buffering etc.)
  2. We then invoke readline to wait for the reply.

Since the child is still waiting for input to complete (step 1), our step 2 may hang forever. This is a classic deadlock.

At the end of the interaction, we close the child's stdin (this is optional but useful for some kinds of child processes), call terminate and then wait. It would be better to send the child process some sort of "exit" command (quit() in the case of the Python interpreter); the terminate here is to demonstrate what we have to do if the other options are unavailable. Note that we could also use communicate here instead of wait to capture the stderr output.

Interact using non-blocking reads and stoppable threads

The final sample demonstrates a slightly more advanced scenario. Suppose we're testing a long-lived socket server, and we're interested in orchestrating complex interactions with it, perhaps with multiple concurrent clients. We'll also want a clean shut-down of the whole setup of threads and child processes. The full code sample is here; what follows is a couple of representative snippets. The key ingredient is this socket reading function, meant to be run in its own thread:

def socket_reader(sockobj, outq, exit_event):
    while not exit_event.is_set():
        try:
            buf = sockobj.recv(1)
            if len(buf) < 1:
                break
            outq.put(buf)
        except socket.timeout:
            continue
        except OSError as e:
            break

Best used with a socket that has a timeout set on it, this function will repeatedly monitor the socket for new data and push everything it receives [2] into outq, which is a queue.Queue. The function exits when either the socket is closed (recv returns an empty bytestring), or when exit_event (a threading.Event) is set by the caller.

The caller can launch this function in a thread and occasionally try to read new items from the queue in a non-blocking way:

try:
    v = outq.get(block=False)
    print(v)
except queue.Empty:
    break

When all is done, the caller can set the exit Event to stop the thread (the thread will stop on its own if the socket it's reading from is closed, but the event lets us control this more directly).

Final words

There's no single fits-all solution for the task described in this post; I presented a bunch of recipes to handle the more commonly occurring situations, but it may be the case that specific use cases may not be addressed by them. Please let me know if you run into an interesting use case these recipes helped (or did not help!) resolve. Any other feedback is also welcome, as usual.


[1] Earlier versions of Python would have to emulate this with a thread.
[2] One byte at a time in this sample, but this could easily be changed to receive larger chunks.

Continuum Analytics News: Anaconda Expert to Discuss Data Science Best Practices at MIT CDOIQ Symposium

Tuesday, July 11, 2017

CAMBRIDGE, MA - July 11, 2017 - Continuum Analytics, the creator and driving force behind Anaconda, the leading Open Data Science platform powered by Python, today announced that Solutions Architect Zach Carwile will speak at the 2017 MIT Chief Data Officer and Information Quality (MIT CDOIQ) Symposium on July 13 at 4:30pm EST. As CDOs and IQ Professionals take their place as the central players in the business of data, the MIT CDOIQ Symposium brings the brightest industry minds together to discuss and advance the big data landscape.

In his session, titled “Data Science is Just the First Step…,” Carwile will explore how organizations can empower their data science teams to build enterprise-grade data products for the analysts who drive business processes. Carwile will also discuss the benefits of containerization for data science projects and explain how establishing a robust deployment process maximizes the value and reach of data science investments.

WHO: Zach Carwile, solutions architect, Anaconda Powered By Continuum Analytics
WHAT: “Data Science is Just the First Step…”
WHEN: July 13, 4:30–5:10pm EST
WHERE: Massachusetts Institute of Technology, Tang Building (Room E51), MIT East Campus, 70 Memorial Drive, Cambridge, MA, USA 02139
REGISTER: HERE

###

About Anaconda Powered by Continuum Analytics
Anaconda is the leading Open Data Science platform powered by Python, the fastest growing data science language with more than 13 million downloads to date. Continuum Analytics is the creator and driving force behind Anaconda, empowering leading businesses across industries worldwide with tools to identify patterns in data, uncover key insights and transform basic data into a goldmine of intelligence to solve the world’s most challenging problems. Anaconda puts superpowers into the hands of people who are changing the world. Learn more at continuum.io.

###

Media Contact:
Jill Rosenthal
InkHouse
anaconda@inkhouse.com


Shopkick Tech Blog: Unum pro multis: Using oauth2_proxy and nginx for web authentication


One of our jobs here in Shopkick’s infrastructure team is to save everyone time, without sacrificing security. During a recent hackathon, we decided it would be fun to replace some of our internal systems using http basic authentication with something a bit more - how shall I put it - from this decade. Fortunately, Bitly’s oauth2_proxy provides a fairly easy way to do just that, allowing us to leverage Google authentication with OAuth, which we were already using elsewhere.

We had a few issues to contend with in starting this.

  • We use nginx for SSL termination and reverse proxying our internal sites.
  • We want to make things more, not less, secure.
  • The transition had to be as seamless as possible.
  • We had to accommodate the WebSocket protocol for Jupyter notebook.
  • We had to be able to finish the work in a single 24-hour hackathon.

You only need two things to make this work: oauth2_proxy and nginx (1.5.4+ for auth_request support). Building a simple Go application like oauth2_proxy is fairly easy, but if you prefer to grab the precompiled binary you can find one on the GitHub releases page. Either way, all you need is the oauth2_proxy binary. In the examples here, we are running oauth2_proxy on the same hosts as nginx, but that is not a requirement.

The configuration file for oauth2_proxy is fairly simple:

email_domains = ["shopkick.com"]
upstreams = ["http://<FOO>.shopkick.com"]
cookie_secret = "<REDACTEDCOOKIESECRET>"
cookie_secure = true
provider = "google"
client_id = "<REDACTED_CLIENT_ID>.apps.googleusercontent.com"

Certain configuration values are better left set as flags at runtime. Here are some that we set:

  • -set-xauthrequest - sets the X-Auth-Request-User and X-Auth-Request-Email headers which can be passed through by nginx
  • -client-secret - sets our client-secret at runtime, rather than in the config file
  • -authenticated-emails-file - sets the path to a newline-delimited list of external email addresses that are permitted to authenticate

Once you have your config in place, run oauth2_proxy like this:

$ oauth2_proxy -set-xauthrequest \
    -config <CONFIG_FILE> \
    -client-secret <REDACTED_SECRET> \
    -authenticated-emails-file <EMAIL_LIST_FILE>

The rest of your configuration is in nginx. As mentioned before, make sure you are using 1.5.4+ or something with auth_request support. Following are some example configurations.

Here is an example with WebSocket support, for Jupyter notebook in our case.

server {
    listen 0.0.0.0:443 ssl;
    server_name foo.shopkick.com;
    location /oauth2/auth {
        internal;
        proxy_pass http://127.0.0.1:4180;
    }
    location /oauth2/ {
        proxy_pass http://127.0.0.1:4180;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;
        proxy_set_header X-Auth-Request-Redirect $request_uri;
    }
    location / {
        auth_request /oauth2/auth;
        error_page 401 = https://foo.shopkick.com/oauth2/sign_in;
        proxy_pass http://foo:8080/;
        proxy_set_header Host $host;
        # Jupyter requires WebSockets, so we have to add these lines
        # in order for terminals and notebooks to function.
        proxy_http_version 1.1;
        proxy_set_header Upgrade "websocket";
        proxy_set_header Connection "Upgrade";
        proxy_read_timeout    86400;
    }
}

Example passing X-Forwarded-User and X-Email headers, based on the information provided by oauth2_proxy's -set-xauthrequest command-line flag.

server {
    listen  0.0.0.0:443 ssl;
    server_name bar.shopkick.com;
    proxy_headers_hash_max_size 2048;
    proxy_headers_hash_bucket_size 128;
    location / {
        auth_request /oauth2/auth;
        error_page 401 = https://bar.shopkick.com/oauth2/sign_in;
        proxy_set_header Host $host;
        proxy_set_header HTTP_X_FORWARDED_SSL on;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        auth_request_set $user   $upstream_http_x_auth_request_user;
        auth_request_set $email  $upstream_http_x_auth_request_email;
        proxy_set_header X-Forwarded-User  $user;
        proxy_set_header X-Email $email;
        proxy_pass http://bar:80;
    }
    location /oauth2/auth {
        internal;
        proxy_pass http://127.0.0.1:4180;
    }
    location /oauth2/ {
        proxy_pass http://127.0.0.1:4180;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;
        proxy_set_header X-Auth-Request-Redirect $request_uri;
    }
}

And finally, an example using a non-standard port for https.

server {
    listen  0.0.0.0:4443 ssl;
    server_name baz.shopkick.com;

    # Redirect to HTTPS when HTTP request comes in
    error_page 497 https://$host:4443$request_uri;
    location / {
        error_page 401 = https://baz.shopkick.com/oauth2/sign_in;
        auth_request /oauth2/auth;
        proxy_pass http://baz:4443;
        # Note that here we are passing $http_host instead of $host.
        # $http_host contains the port information, which in this
        # case is critical for the redirect URL that google hands
        # back after a successful authentication.
        proxy_set_header Host $http_host;
    }
    location /oauth2/auth {
        internal;
        proxy_pass http://127.0.0.1:4180;
    }
    location /oauth2/ {
        proxy_pass http://127.0.0.1:4180;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;
        proxy_set_header X-Auth-Request-Redirect $request_uri;
    }
}

While all of the above uses Google as the OAuth provider, everything here should work with any other supported provider: Azure, Facebook, Github, Gitlab, LinkedIn, or MyUSA. Just remember to set your provider flag or config setting.

Happy authenticating! And don't forget to subscribe!


Weekly Python Chat: Context Managers


Context managers allow programmers to reuse common setup/cleanup code in multiple places.

We'll talk about when context managers are useful, where you usually find them, and how to make your own.
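As a small taste, here is a context manager that times a block of code (a sketch to illustrate the setup/cleanup idea):

from contextlib import contextmanager
import time

@contextmanager
def timed(label):
    start = time.time()                      # setup
    try:
        yield
    finally:
        print(label, time.time() - start)    # cleanup, runs even on errors

with timed('sleeping'):
    time.sleep(0.1)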

Anwesha Das: How to upload a package in PyPI using twine ?


While I was working on my project on software licensing, I noticed that there are several packages in PyPI which do not have a requirements.txt file (the file mentioning the dependencies). License files, though not a rare species, are surely an endangered one in the Python world. These made my life so difficult (you lazy programmers, grrr).

“One has to solve her own problem.” So whenever I met a developer I used to tell them the best practices for developers. Some of them did not always give me a good vibe in return (oh, you irritating lady :( ). Most of them were saddened by the long list of best practices. (me: grrr, grrr, grr)

I wanted to upload my new project, gitcen, to PyPI. I shared this thought with my eternal rival. He gave me quite a smile.
Him thinking: "Lady, now you will understand". I smiled back.
Me thinking: "ohh man I hate you".

The code was all set and working

I used both PyCharm and jupyter notebook to write code.

The main job is done; now I only have to upload my project to PyPI, the Python Package Index.

requirements.txt

I created a requirements.txt file, where one has to add the external dependencies. It helps to recreate a similar development environment. Instead of manually typing the dependencies, one can use pip freeze, which will show all the modules installed in that particular virtual environment.

$ pip freeze
$ pip freeze > requirements.txt

This is easy (lazy coders).

setup.py

Now is the time for setup.py. It is a Python file which helps us to get the package. In the file, we call the setup function. The parameters inside the function are the different metadata of the project. The setup.py file of gitcen looks like this:

from setuptools import setup

setup(  
    name="gitcen",
    version='0.1.0',
    description="A project to find git information.",
    long_description="A project to find git information about authors' commits,most active days and time.",
    author="Anwesha Das.",
    author_email="anwesha@das.community",
    url="https://github.com/anweshadas/gitcen",
    license="GPLv3+",
    py_modules=['gitcen'],
    install_requires=[
        'Click',
        'pygit2==0.24'
    ],
    entry_points='''
        [console_scripts]
        gitcen=gitcen:main
    ''',
    classifiers=(
        'Development Status :: 3 - Alpha',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)',
        'Programming Language :: Python :: 3.6'
    )
)

A little bit of history: previously there was a package named distutils which provided the setup function, but now we use the package setuptools.

It includes different information such as name, license, entry point etc. There is a parameter called classifiers. Read more about it here.

This was lengthy but OK. Maybe these coders are not that lazy (while coding; otherwise they are, as my experience with my man goes).

Creating Source Distribution

The next was to create a compressed tarball for source distribution.

$python3 setup.py sdist

Output of the command is a tar.gz file.

Creating PyPI account

PyPI, the Python Package Index, is the official third-party software repository for the Python ecosystem. One has to create an account on it.

test.pypi.org

If one wants to test the upload, test.pypi.org is the address to go to. Know more about the steps here.

When will this thing end?

Use twine

twine helps to upload packages to PyPI. It brings together the utilities needed to interact with PyPI. The first step was to install twine. I installed it with the dnf package manager.

$ sudo dnf install python3-twine

I had to register my project, using the following command:

$ twine register dist/gitcen-0.1.0.tar.gz

Finally the moment had come! I uploaded gitcen to PyPI with the following command:

 $ twine upload dist/gitcen-0.1.0.tar.gz

I had already put my authentication information in ~/.pypirc.
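For reference, a minimal ~/.pypirc that twine can pick up looks roughly like this (the username is a placeholder; if the password line is left out, twine simply prompts for it):

[distutils]
index-servers =
    pypi

[pypi]
username = <your-pypi-username>
password = <your-pypi-password>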

Now anyone can install gitcen from PyPI

$ pip install gitcen

Going through all of this, I now understand that it is very frustrating. You have working code and you are not able to share it with the community (easily, fast). If we give it a thought, we realize that these are steps one has to memorize. As I was a first-timer, there were many new things I came across, so it made my life difficult. And yes,

Sorry coders, for not understanding your pain!

Mahmoud Hashemi: Plugin Systems

"What are plugins?" and other proceedings of the inaugural PyCon Comparative Plugin Systems BoF.

Within the programming world, and the Python ecosystem in particular, there are a lot of presumptions around plugins. Specifically, we take them for granted. "It's just a plugin." "Oh, another plugin library?"

So for PyCon 2017, I resolved to dismiss the dismissals by revisiting plugins, and it may have been the best programming decision I've made all year.

Why plugins?

For all types of software, open-source or otherwise, the scalability of development poses a problem long before scalability of performance and other technical challenges. Engaging more developers creates code contention and bugs. Too many cooks is all it takes to spoil the broth.

All growing projects need an API for code integration.

Call them plugins, modules, or extensions, from your browser to your kernel, they are the widely successful solution. Tellingly, the only thing wider than the success of plugin-based architecture is the variety of implementations.

Python's dynamic nature in particular seems to encourage inventiveness. The more the merrier, usually, but at some point we cloud a tricky space. How different could these plugin systems be? How wide is the range of functionalities, really? How does a developer choose the right plugin system for a given project? For that matter, what is a plugin system anyway? No one I talked to had clear answers.

So when PyCon 2017 rolled around, I knew exactly what I wanted to do: call together a team of developers to get to the bottom of the above, or at the very least, answer the question,

"What happens when you ask a dozen veteran Python programmers to spill their guts about plugins?"

Setting examples

Our group leapt into action by listing off plugin systems as fast as we could:

With our plate heaping with examples like these, we all felt ready to dig into our big questions.

Taxonomizing

For our first bit of analysis, we asked: What practical and fundamental attributes differentiate these approaches? If we had to create a taxonomy, what characteristics would we look for?

Generalizability

You'll notice our list of example plugin systems included several very specialized examples, from pylint to SQLAlchemy. Many projects even use totally internal plugin systems to achieve better factoring.

Bespoke plugin systems like pylint's are a valuable reference for anyone looking to account for patterns in their own system, especially generic systems like pike and stevedore.

Discovery

A plugin system's first job is locating the plugins to load. The split here is whether plugins are individually specified, or automatically discovered based on paths and patterns.

In either case, we need paths. Some systems provide search functionality, exchanging explicitness for convenience. This can be a good trade, especially when plugins number in the double digits, or whenever less technical users are concerned.
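To make the path-and-pattern style concrete, here is a rough, generic sketch (not any of the libraries named above) that discovers plugins by scanning a hypothetical plugins package:

import importlib
import pkgutil

def discover_plugins(package_name="myapp.plugins"):
    """Import every module found under a plugins package."""
    package = importlib.import_module(package_name)
    plugins = {}
    for _finder, name, _is_pkg in pkgutil.iter_modules(package.__path__):
        plugins[name] = importlib.import_module(package_name + "." + name)
    return plugins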

Install location

Closely related to discovery, our next differentiator was the degree to which the plugin system leveraged Python's own package management facilities. Some systems, like venusian, were designed to encourage pip install-ing plugins, searching for them in site-packages, alongside the application itself.

Other systems have their own search paths, locating plugins in the user directory and elsewhere on the filesystem. Still other systems are designed for plugins inside the application tree, as is the case with Django apps.
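For the pip install style in particular, one common (though by no means universal) mechanism is setuptools entry points; a hedged sketch of the consuming side, with a made-up group name:

import pkg_resources

def load_installed_plugins(group="myapp.plugins"):
    """Load every plugin advertised under a hypothetical entry-point group."""
    return {ep.name: ep.load() for ep in pkg_resources.iter_entry_points(group)}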

Plugin independence

One of the most challenging parts of plugin development is finding ways of independently reusing and testing code, while keeping in mind the code's role as an optional component of another application.

In some systems, like Django's, the tailoring is so tightly coupled that reusability doesn't make sense. But other approaches, like gather's, keep plugin code independently usable.

Dependency registration

Almost all plugins work by providing some set of hooks which are findable and callable by the core. We found another differentiator in whether and how plugins could gain access to resources from the core, and even other plugins.

Not all systems support this, preferring to keep plugins as leaf participants in the application. Those simplistic setups hit limits fast. The next best, and most common, solution is to simply pass the whole core state at the time of hook invocation, providing plugins with the same access as the core. It works, but the API becomes the whole system state.

More advanced systems allow plugins to publish an inventory of dependencies, which the core then injects. Higher granularity enables lazier evaluation for a performance boost, and more explicit structure helps create a more maintainable application overall.
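A toy illustration of that declare-and-inject idea, with entirely hypothetical names and no particular library in mind:

import types

# A pretend plugin: it declares up front which core resources it needs.
plugin = types.SimpleNamespace(
    REQUIRES=["db", "config"],
    on_startup=lambda db, config: print("plugin starting for", config["name"]),
)

# The core owns all the shared resources...
resources = {"db": object(), "config": {"name": "demo"}, "cache": object()}

# ...but hands each plugin only what it asked for.
def run_hooks(plugins, resources):
    for p in plugins:
        wanted = {name: resources[name] for name in getattr(p, "REQUIRES", [])}
        p.on_startup(**wanted)

run_hooks([plugin], resources)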

Drawing a line

With our group feeling like we were approaching the nature of things, we reversed direction, asking instead: What isn't a plugin system?

Establishing explicit boundaries and specific counterexamples proved instrumental to producing a final definition.

Is eval() a plugin system? We thought maybe, at first. But the more we thought about it, no, because the code itself was not sufficiently abstracted through a loading or namespacing system.

Is DNS a plugin system? It has names and namespaces galore. But no, because code is not being loaded in. Remote services in general are beyond the boundary of what a plugin can be. They exist out there, and we call out to them. They're callouts, not plugins.

A definition

So with our boundaries established, we were ready to offer a definition:

A plugin system is a software facility used by a running program to discover and load code, often containing hooks called by the host application.

But, by this definition, isn't Python's built-in import functionality a plugin system? Mostly, yes! Python's import system is a plugin system.

  • For discovery it uses sys.path, various "site" directories and ".pth" files, and much more.
  • For installation, it uses site-packages, user .local directories, and more.
  • As far as independent reusability, virtually every module can be made its own entrypoint.
  • As for dependency registration, every module is tossed into sys.modules with the others, but also has access to import and sys, making roughly every module an equal partner in application state.

Python's import system is a powerful one, with a plugin system of its own. But finders, loaders, and import hooks aren't Python's plugin system. For that, you need to look to the site module.
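To tie the definition back to plain Python, here is a bare-bones, hypothetical host that treats ordinary importable modules as plugins and calls a conventional register() hook on each (the module names are placeholders):

import importlib

def load_plugins(names):
    """Import each named module and call its register() hook if it has one."""
    loaded = []
    for name in names:
        module = importlib.import_module(name)      # discover and load code
        hook = getattr(module, "register", None)    # hook called by the host
        if callable(hook):
            hook()
        loaded.append(module)
    return loaded

# Hypothetical usage; these names are placeholders:
# load_plugins(["myapp_plugin_a", "myapp_plugin_b"])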

Motivation

With our hour nearly up, all these proximate details still needed to be distilled into an ultimate motivation behind plugins. To this end, we returned to one of software engineering's fundamental principles: Separation of concerns.

We want to reason about our software. We want to know what state it is in. What we all want is the ability to say, "the core is ready, proceeding to load modules/extensions/plugins." We want to defer loading some code so that we can add extra instrumentation, checks, resiliency, and error messages to that loading process. If something misbehaves, we can do better than a stack trace and an ImportError.

Python's import system is a plugin system of sorts, but because we use it all the time, we've already used up most of the concern separation potential of import. Hence, all the creativity around plugin systems, seeking a balance between feeling native to Python and still successfully separating concerns.

In conclusion

So now we have achieved a complete view of the Python plugin system ecosystem, from motivation to manifestation.

By numbers alone, it may seem on the face of it like there are more than enough Python plugin solutions. But looking at the motivation and taxonomy above, it's clear that there are still several gaps waiting to be filled.

By taking a holistic look at the implementations and motivations, the PyCon 2017 Plugins Open Session ended with the conclusion that even Python's wide selection could use expansion.

So, until next year, go forth and continue to build! The future of well-factored code depends on it.1


  1. For additional reading, I recommend doing what we did after our discussion, finding and reading this post from Eli Bendersky. While it focuses more on specific implementations and less about generalized systems, Eli's post overlaps in many very reaffirming ways, much to our relief and gratification. The worked example of building ReStructured Text plugins is a perfect complement to the post above. 


Roberto Alsina: New mini-project: Gyro

History

Facubatista: ralsina, yo, vos, cerveza, un local-wiki-server-hecho-en-un-solo-.py-con-interfaz-web en tres horas, pensalo

Facubatista: ralsina, you, me, beer, a local-wiki-server-done-in-one-.py-with-web-interface in three hours, think about it

[Screenshot: /images/gyro-1.thumbnail.png]

The next day.

So, I could not get together with Facu, but I did sort of write it, and it's Gyro.[1]

Technical Details

Gyro has two parts: a very simple backend, implemented using Sanic[2], which does a few things (a rough sketch follows the list):

  • Serve static files out of _static/
  • Serve templated markdown out of pages/
  • Save markdown to pages/
  • Keep an index of file contents updated in _static/index.js
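A rough sketch of what such a backend could look like, assuming Sanic and a pages/ directory of markdown files; the routes and file layout here are my guesses for illustration, not Gyro's actual code:

import os
from sanic import Sanic, response

app = Sanic("gyro_sketch")
app.static("/_static", "./_static")  # serve static files out of _static/

@app.route("/pages/<name>", methods=["GET"])
async def get_page(request, name):
    # Serve raw markdown; the browser-side code renders it with Showdown.
    path = os.path.join("pages", name + ".md")
    if not os.path.exists(path):
        return response.text("")
    with open(path) as f:
        return response.text(f.read())

@app.route("/pages/<name>", methods=["POST"])
async def save_page(request, name):
    # Save the markdown posted by the editor.
    with open(os.path.join("pages", name + ".md"), "w") as f:
        f.write(request.body.decode("utf-8"))
    return response.text("saved")

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8000)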

The other part is a webpage, implemented using Bootstrap [3] and JQuery [4]. That page can:

  • Show markdown, using Showdown [5]
  • Edit markdown, using SimpleMDE [6]
  • Search in your pages using Lunr [7]

And that's it. Open the site on any URL that doesn't start with _static and contains only letters and numbers:

  • http://localhost:8000/MyPage : GOOD
  • http://localhost:8000/MyDir/MyPage: BAD
  • http://localhost:8000/__foobar__: BAD

At first the page will be sort of empty, but if you edit it and save it, it won't be empty anymore. You can link to other pages (even ones you have not created) using the standard markdown syntax: [go to FooBar](FooBar)

There is really not much else to say about it; if you try it and find bugs, file an issue, and as usual patches are welcome.


[1]Why Gyro? Gyros are delicious fast food. Wiki means quick. Also, I like Gyros.
[2]Why Sanic? Ever since Alejandro Lozanoff mentioned a flask-like framework done with the intention to be fast and async, I wanted to check it out. So, since this was a toy project, why not?
[3]Why bootstrap? I know more or less what it does, and the resulting page is not totally horrible.
[4]Why JQuery? It's easy, small and I sort of know how it works.
[5]Why Showdown? It's sort of the standard to show markdown on the web.
[6]Why SimpleMDE? It looks and works awesome!
[7]Why Lunr? It works and is smaller than Tipue which is the only other similar thing I know.