The post 2018 NumFOCUS Board Selection Results appeared first on NumFOCUS.
NumFOCUS: 2018 NumFOCUS Board Selection Results
NumFOCUS: 2018 Board of Directors Selection Process Challenges
The post 2018 Board of Directors Selection Process Challenges appeared first on NumFOCUS.
Chris Warrick: Python Virtual Environments in Five Minutes
In Python, virtual environments are used to isolate projects from each other (if they require different versions of the same library, for example). They let you install and manage packages without administrative privileges, and without conflicting with the system package manager. They also allow to quickly create an environment somewhere else with the same dependencies.
Virtual environments are a crucial tool for any Python developer. And at that, a very simple tool to work with.
Let’s get started!
Install
There are two main tools used to create virtual environments:
- virtualenv has been the de facto standard tool for many years. It can be used with both Python 2 and 3, including very old versions of Python.
- venv (aka pyvenv) was added to the standard library in Python 3.3, and with the addition of ensurepip in 3.4, it’s an even easier way to get a virtual environment created.
virtualenv can be installed with your system package manager, or pip install --user virtualenv.
venv comes built-in with Python 3, although Debian/Ubuntu users will need to run sudo apt-get install python3-venv to make it work. [1]
Which one to use? It’s up to you. Both tools achieve the same goal in similar ways. And if one of them does not work, you can try the other and it might just work better.
(Terminology note: most of the time, the names of both tools are used interchargeably, “venv” was often used as an abbreviation for “virtualenv” before the stdlib tool was created)
Create
To create a virtual environment named env, use (depending on your tool of choice):
$ python3 -m virtualenv env
or
$ python3 -m venv env
Afterwards, you will end up with a folder named env that contains folders named bin (Scripts on Windows — contains executables, including python), lib (contains code), and include (contains C headers).
Both tools install pip and setuptools, but venv does not ship with wheel. In addition, the default versions tend to be more-or-less outdated. Let’s upgrade them real quick (first command is Unix, second is Windows): [2]
$ env/bin/python -m pip install --upgrade pip setuptools wheel > env\Scripts\python -m pip install --upgrade pip setuptools wheel
Where to store virtual environments?
While the tools allow you to put your virtual environments anywhere in the system, it is not a desirable thing to do. There are two options:
- Have one global place for them, like ~/virtualenvs.
- Store them in each project’s directory, like ~/git/foobar/.venv.
The first option comes with tools that make it easier, such as virtualenvwrapper. The second option is equally easy to work with, but comes with one caveat — you must add the venv directory to your .gitignore file, since you don’t want it in your repository (it’s binary bloat, and works only on your machine).
Use
There are three ways of working with virtual environments interactively (in a shell):
- activation (run source bin/activate on *nix; Scripts\activate on Windows) — it simplifies work and requires less typing, although it can sometimes fail to work properly.
- executing bin/python (Scripts\python) and other scripts directly, as activation only changes $PATH and some helper variables — those variables are not mandatory for operation, running the correct python is, and that method is failsafe.
- in subshells (IMO, it’s bad UX)
Whichever method you use, you must remember that without doing any of these things, you will still be working with the system Python.
For non-interactive work (eg. crontab entries, system services, etc.), activation and subshells are not viable solutions. In these cases, you must always use the full path to Python.
Here are some usage examples (paths can be relative, of course):
## *nix, activation ## $ source /path/to/env/bin/activate (env)$ pip install Django (env)$ deactivate ## *nix, manual execution ## $ /path/to/env/bin/pip install Django ## Windows, activation ## > C:\path\to\env\Scripts\activate (venv)> pip install Django (venv)> deactivate ## Windows, manual execution ## > C:\path\to\env\Scripts\pip install Django
The same principle applies to running Python itself, or any other script installed by a package. (With Django’s manage.py, calling it as ./manage.py requires activation, or you can run venv/bin/python manage.py.)
Moving/renaming/copying environments?
If you try to copy or rename a virtual environment, you will discover that the copied environment does not work. This is because a virtual environment is closely tied to both the Python it was created with, and the location it was created in. (The “relocatable” option does not work.) [3]
However, this is very easy to fix. Instead of moving/copying, just create a new environment in the new location. Then, run pip freeze > requirements.txt in the old environment to create a list of packages installed in it. With that, you can just run pip install -r requirements.txt in the new environment to install packages from the saved list. (Of course, you can copy requirements.txt between machines. In many cases, it will just work; sometimes, you might need a few modifications to requirements.txt to remove OS-specific stuff.)
$ oldenv/bin/pip freeze > requirements.txt $ python3 -m venv newenv $ newenv/bin/pip install -r requirements.txt (You may rm -rf oldenv now if you desire)
Note that it might also be necessary to re-create your virtual environment after a Python upgrade, [4] so it might be handy to keep an up-to-date requirements.txt for your virtual environments (for many projects, it makes sense to put that in the repository).
Frequently Asked Questions
Do I need to install the virtualenv tool for each Python I want to use it with?
In most cases, you can use virtualenv -p pythonX env to specify a different Python version, but with some Python version combinations, that approach might be unsuccessful.
I’m the only user on my system. Do I still need virtual environments?
Yes, you do. First, you will still need separation between projects, sooner or later. Moreover, if you were to install packages system-wide with pip, you might end up causing conflicts between packages installed by the system package manager and by pip. Running sudo pip is never a good idea because of this.
I’m using Docker. Do I still need virtual environments?
They are still a good idea in that case. They protect you against any bad system-wide Python packages your OS image might have (and one popular base OS is famous for those). They don’t introduce any extra overhead, while allowing to have a clean environment and the ability to re-create it outside of Docker (eg. for local development without Docker)
What about Pipenv?
Pipenv is a dependency management tool. It isn’t compatible with most workflows, and comes with many issues. In my opinion, it’s not worth using (Also, that thing about it being an officially recommended tool? Turns out it’s not true.)
I also wrote a blog post detailing concerns with that tool, titled Pipenv: promises a lot, delivers very little.
Footnotes
[1] | The thing you’re actually installing is ensurepip. In general, Debian isn’t exactly friendly with Python packaging. |
[2] | On Windows, you must run python -m pip instead of pip if you want to upgrade the package manager itself. |
[3] | All script shebangs contain the direct path to the environment’s Python executable. Many things in the virtual environment are symlinks that point to the original Python. |
[4] | Definitely after a minor version (3.x → 3.y) upgrade, sometimes after a patch version upgrade (3.x.y → 3.x.z) as well. |
NumFOCUS: NumFOCUS Sustainer Weeks
The post NumFOCUS Sustainer Weeks appeared first on NumFOCUS.
Continuum Analytics Blog: Python 3.7 Package Build Out & Miniconda Release
By Ray Donnelly & Crystal Soja We are pleased to announce that Python 3.7 packages for all supported platforms and packages of the Anaconda Distribution Repository (repo.anaconda.com) are now available. There are 865 packages built for Linux, 864 packages built for macOS, and 779 packages built for Windows. Python 3.7, released June 27, 2018, represents …
Read more →
The post Python 3.7 Package Build Out & Miniconda Release appeared first on Anaconda.
Matthew Rocklin: Dask Release 0.19.0
This work is supported by Anaconda Inc.
I’m pleased to announce the release of Dask version 0.19.0. This is a major release with bug fixes and new features. The last release was 0.18.2 on July 23rd. This blogpost outlines notable changes since the last release blogpost for 0.18.0 on June 14th.
You can conda install Dask:
conda install dask
or pip install from PyPI:
pip install dask[complete] --upgrade
Full changelogs are available here:
Notable Changes
A ton of work has happened over the past two months, but most of the changes are small and diffuse. Stability, feature parity with upstream libraries (like Numpy and Pandas), and performance have all significantly improved, but in ways that are difficult to condense into blogpost form.
That being said, here are a few of the more exciting changes in the new release.
Python Versions
We’ve dropped official support for Python 3.4 and added official support for Python 3.7.
Deploy on Hadoop Clusters
Over the past few months Jim Crist has bulit a suite of tools to deploy applications on YARN, the primary cluster manager used in Hadoop clusters.
- Conda-pack: packs up Conda environments for redistribution to distributed clusters, especially when Python or Conda may not be present.
- Skein: easily launches and manages YARN applications from non-JVM systems
- Dask-Yarn: a thin library around Skein to launch and manage Dask clusters
Jim has written about Skein and Dask-Yarn in two recent blogposts:
Implement Actors
Some advanced workloads want to directly manage and mutate state on workers. A task-based framework like Dask can be forced into this kind of workload using long-running-tasks, but it’s an uncomfortable experience.
To address this we’ve added an experimental Actors framework to Dask alongside the standard task-scheduling system. This provides reduced latencies, removes scheduling overhead, and provides the ability to directly mutate state on a worker, but loses niceties like resilience and diagnostics. The idea to adopt Actors was shamelessly stolen from the Ray Project :)
classCounter:def__init__(self):self.n=0defincrement(self):self.n+=1returnself.ncounter=client.submit(Counter,actor=True).result()>>>future=counter.increment()>>>future.result()1
You can read more about actors in the Actors documentation.
Dashboard improvements
The Dask dashboard is a critical tool to understand distributed performance. There are a few accessibility issues that trip up beginning users that we’ve addressed in this release.
Save task stream plots
You can now save a task stream record by wrapping a computation in the
get_task_stream
context manager.
fromdask.distributedimportClient,get_task_streamclient=Client(processes=False)importdaskdf=dask.datasets.timeseries()withget_task_stream(plot='save',filename='my-task-stream.html')asts:df.x.std().compute()
>>>ts.data[{'key':"('make-timeseries-edc372a35b317f328bf2bb5e636ae038', 0)",'nbytes':8175440,'startstops':[('compute',1535661384.2876947,1535661384.3366017)],'status':'OK','thread':139754603898624,'worker':'inproc://192.168.50.100/15417/2'},...
This gives you the start and stop time of every task on every worker done during that time. It also saves that data as an HTML file that you can share with others. This is very valuable for communicating performance issues within a team. I typically upload the HTML file as a gist and then share it with rawgit.com
$ gist my-task-stream.html
https://gist.github.com/f48a121bf03c869ec586a036296ece1a
Robust to different screen sizes
The Dashboard’s layout was designed to be used on a single screen, side-by-side with a Jupyter notebook. This is how many Dask developers operate when working on a laptop, however it is not how many users operate for one of two reasons:
- They are working in an office setting where they have several screens
- They are new to Dask and uncomfortable splitting their screen into two halves
In these cases the styling of the dashboard becomes odd. Fortunately, Luke Canavan and Derek Ludwig recently improved the CSS for the dashboard considerably, allowing it to switch between narrow and wide screens. Here is a snapshot.
Jupyter Lab Extension
You can now embed Dashboard panes directly within Jupyter Lab using the newly updated dask-labextension.
jupyter labextension install dask-labextension
This allows you to layout your own dashboard directly within JupyterLab. You
can combine plots from different pages, control their sizing, and so on. You
will need to provide the address of the dashboard server
(http://localhost:8787
by default on local machines) but after that
everything should persist between sessions. Now when I open up JupyterLab and
start up a Dask Client, I get this:
Thanks to Ian Rose for doing most of the work here.
Outreach
Dask Stories
People who use Dask have been writing about their experiences at Dask Stories. In the last couple months the following people have written about and contributed their experience:
- Civic Modelling at Sidewalk Labs by Brett Naul
- Genome Sequencing for Mosquitoes by Alistair Miles
- Lending and Banking at Full Spectrum by Hussain Sultan
- Detecting Cosmic Rays at IceCube by James Bourbeau
- Large Data Earth Science at Pangeo by Ryan Abernathey
- Hydrological Modelling at the National Center for Atmospheric Research by Joe Hamman
- Mobile Networks Modeling by Sameer Lalwani
- Satellite Imagery Processing at the Space Science and Engineering Center by David Hoese
These stories help people understand where Dask is and is not applicable, and provide useful context around how it gets used in practice. We welcome further contributions to this project. It’s very valuable to the broader community.
Dask Examples
The Dask-Examples repository maintains easy-to-run examples using Dask on a small machine, suitable for an entry-level laptop or for a small cloud instance. These are hosted on mybinder.org and are integrated into our documentation. A number of new examples have arisen recently, particularly in machine learning. We encourage people to try them out by clicking the link below.
Other Projects
The dask-image project was recently released. It includes a number of image processing routines around dask arrays.
This project is mostly maintained by John Kirkham.
Dask-ML saw a recent bugfix release
The TPOT library for automated machine learning recently published a new release that adds Dask support to parallelize their model training. More information is available on the TPOT documentation
Acknowledgements
Since June 14th, the following people have contributed to the following repositories:
The core Dask repository for parallel algorithms:
- Anderson Banihirwe
- Andre Thrill
- Aurélien Ponte
- Christoph Moehl
- Cloves Almeida
- Daniel Rothenberg
- Danilo Horta
- Davis Bennett
- Elliott Sales de Andrade
- Eric Bonfadini
- GPistre
- George Sakkis
- Guido Imperiale
- Hans Moritz Günther
- Henrique Ribeiro
- Hugo
- Irina Truong
- Itamar Turner-Trauring
- Jacob Tomlinson
- James Bourbeau
- Jan Margeta
- Javad
- Jeremy Chen
- Jim Crist
- Joe Hamman
- John Kirkham
- John Mrziglod
- Julia Signell
- Marco Rossi
- Mark Harfouche
- Martin Durant
- Matt Lee
- Matthew Rocklin
- Mike Neish
- Robert Sare
- Scott Sievert
- Stephan Hoyer
- Tobias de Jong
- Tom Augspurger
- WZY
- Yu Feng
- Yuval Langer
- minebogy
- nmiles2718
- rtobar
The dask/distributed repository for distributed computing:
- Anderson Banihirwe
- Aurélien Ponte
- Bartosz Marcinkowski
- Dave Hirschfeld
- Derek Ludwig
- Dror Birkman
- Guillaume EB
- Jacob Tomlinson
- Joe Hamman
- John Kirkham
- Loïc Estève
- Luke Canavan
- Marius van Niekerk
- Martin Durant
- Matt Nicolls
- Matthew Rocklin
- Mike DePalatis
- Olivier Grisel
- Phil Tooley
- Ray Bell
- Tom Augspurger
- Yu Feng
The dask/dask-examples repository for easy-to-run examples:
- Albert DeFusco
- Dan Vatterott
- Guillaume EB
- Matthew Rocklin
- Scott Sievert
- Tom Augspurger
- mholtzscher
"Menno's Musings": IMAPClient 2.1.0
IMAPClient 2.1.0 has just been released! Here's the main highlights:
- Python 3.7 is now officially suppported
- Testing against PyPy (version 2 and 3)
- Added support for the QUOTA extension
- Helper for locating special folders
- Document usage for using self-signed TLS certificates
- Document how to use the
email
package from the standard library to parse fetched emails - Handling NIL values for INTERNALDATE
As always, IMAPClient can be installed using pip (pip install imapclient
). Full documentation is available at Read the Docs.
Enjoy!
Real Python: Conditional Statements in Python
From the previous tutorials in this series, you now have quite a bit of Python code under your belt. Everything you have seen so far has consisted of sequential execution, in which statements are always performed one after the next, in exactly the order specified.
But the world is often more complicated than that. Frequently, a program needs to skip over some statements, execute a series of statements repetitively, or choose between alternate sets of statements to execute.
That is where control structures come in. A control structure directs the order of execution of the statements in a program (referred to as the program’s control flow).
Here’s what you’ll learn in this tutorial: You’ll encounter your first Python control structure, the if
statement.
In the real world, we commonly must evaluate information around us and then choose one course of action or another based on what we observe:
If the weather is nice, then I’ll mow the lawn. (It’s implied that if the weather isn’t nice, then I won’t mow the lawn.)
In a Python program, the if
statement is how you perform this sort of decision-making. It allows for conditional execution of a statement or group of statements based on the value of an expression.
The outline of this tutorial is as follows:
- First, you’ll get a quick overview of the
if
statement in its simplest form. - Next, using the
if
statement as a model, you’ll see why control structures require some mechanism for grouping statements together into compound statements or blocks. You’ll learn how this is done in Python. - Lastly, you’ll tie it all together and learn how to write complex decision-making code.
Ready? Here we go!
Introduction to the if
Statement
We’ll start by looking at the most basic type of if
statement. In its simplest form, it looks like this:
if<expr>:<statement>
In the form shown above:
<expr>
is an expression evaluated in Boolean context, as discussed in the section on Logical Operators in the Operators and Expressions in Python tutorial.<statement>
is a valid Python statement, which must be indented. (You will see why very soon.)
If <expr>
is true (evaluates to a value that is “truthy”), then <statement>
is executed. If <expr>
is false, then <statement>
is skipped over and not executed.
Note that the colon (:
) following <expr>
is required. Some programming languages require <expr>
to be enclosed in parentheses, but Python does not.
Here are several examples of this type of if
statement:
>>> x=0>>> y=5>>> ifx<y:# Truthy... print('yes')...yes>>> ify<x:# Falsy... print('yes')...>>> ifx:# Falsy... print('yes')...>>> ify:# Truthy... print('yes')...yes>>> ifxory:# Truthy... print('yes')...yes>>> ifxandy:# Falsy... print('yes')...>>> if'aul'in'grault':# Truthy... print('yes')...yes>>> if'quux'in['foo','bar','baz']:# Falsy... print('yes')...
Note: If you are trying these examples interactively in a REPL session, you’ll find that, when you hit Enter after typing in the print('yes')
statement, nothing happens.
Because this is a multiline statement, you need to hit Enter a second time to tell the interpreter that you’re finished with it. This extra newline is not necessary in code executed from a script file.
Grouping Statements: Indentation and Blocks
So far, so good.
But let’s say you want to evaluate a condition and then do more than one thing if it is true:
If the weather is nice, then I will:
- Mow the lawn
- Weed the garden
- Take the dog for a walk
In all the examples shown above, each if <expr>:
has been followed by only a single <statement>
. There needs to be some way to say “If <expr>
is true, do all of the following things.”
The usual approach taken by most programming languages is to define a syntactic device that groups multiple statements into one compound statement or block. A block is regarded syntactically as a single entity. When it is the target of an if
statement, and <expr>
is true, then all the statements in the block are executed. If <expr>
is false, then none of them are.
Virtually all programming languages provide the capability to define blocks, but they don’t all provide it in the same way. Let’s see how Python does it.
Python: It’s All About the Indentation
Python follows a convention known as the off-side rule, a term coined by British computer scientist Peter J. Landin. (The term is taken from the offside law in association football.) Languages that adhere to the off-side rule define blocks by indentation. Python is one of a relatively small set of off-side rule languages.
Recall from the previous tutorial on Python program structure that indentation has special significance in a Python program. Now you know why: indentation is used to define compound statements or blocks. In a Python program, contiguous statements that are indented to the same level are considered to be part of the same block.
Thus, a compound if
statement in Python looks like this:
1 if<expr>: 2 <statement> 3 <statement> 4 ... 5 <statement> 6 <following_statement>
Here, all the statements at the matching indentation level (lines 2 to 5) are considered part of the same block. The entire block is executed if <expr>
is true, or skipped over if <expr>
is false. Either way, execution proceeds with <following_statement>
(line 6) afterward.
Notice that there is no token that denotes the end of the block. Rather, the end of the block is indicated by a line that is indented less than the lines of the block itself.
Note: In the Python documentation, a group of statements defined by indentation is often referred to as a suite. This tutorial series uses the terms block and suite interchangeably.
Consider this script file foo.py
:
1 if'foo'in['bar','baz','qux']: 2 print('Expression was true') 3 print('Executing statement in suite') 4 print('...') 5 print('Done.') 6 print('After conditional')
Running foo.py
produces this output:
C:\Users\john\Documents>python foo.py
After conditional
The four print()
statements on lines 2 to 5 are indented to the same level as one another. They constitute the block that would be executed if the condition were true. But it is false, so all the statements in the block are skipped. After the end of the compound if
statement has been reached (whether the statements in the block on lines 2 to 5 are executed or not), execution proceeds to the first statement having a lesser indentation level: the print()
statement on line 6.
Blocks can be nested to arbitrary depth. Each indent defines a new block, and each outdent ends the preceding block. The resulting structure is straightforward, consistent, and intuitive.
Here is a more complicated script file called blocks.py
:
# Does line execute? Yes No# --- --if'foo'in['foo','bar','baz']:# xprint('Outer condition is true')# xif10>20:# xprint('Inner condition 1')# xprint('Between inner conditions')# xif10<20:# xprint('Inner condition 2')# xprint('End of outer condition')# xprint('After outer condition')# x
The output generated when this script is run is shown below:
C:\Users\john\Documents>python blocks.py
Outer condition is trueBetween inner conditionsInner condition 2End of outer conditionAfter outer condition
Note: In case you have been wondering, the off-side rule is the reason for the necessity of the extra newline when entering multiline statements in a REPL session. The interpreter otherwise has no way to know that the last statement of the block has been entered.
What Do Other Languages Do?
Perhaps you’re wondering what the alternatives are. How are blocks defined in languages that don’t adhere to the off-side rule?
The tactic used by most programming languages is to designate special tokens that mark the start and end of a block. For example, in Perl blocks are defined with pairs of curly braces ({}
) like this:
# (This is Perl, not Python)
if (<expr>) {
<statement>;
<statement>;
...
<statement>;
}
<following_statement>;
C/C++, Java, and a whole host of other languages use curly braces in this way.
Compound if Statement in C/C++, Perl, and JavaOther languages, such as Algol and Pascal, use keywords begin
and end
to enclose blocks.
Which Is Better?
Better is in the eye of the beholder. On the whole, programmers tend to feel rather strongly about how they do things. Debate about the merits of the off-side rule can run pretty hot.
On the plus side:
- Python’s use of indentation is clean, concise, and consistent.
- In programming languages that do not use the off-side rule, indentation of code is completely independent of block definition and code function. It’s possible to write code that is indented in a manner that does not actually match how the code executes, thus creating a mistaken impression when a person just glances at it. This sort of mistake is virtually impossible to make in Python.
- Use of indentation to define blocks forces you to maintain code formatting standards you probably should be using anyway.
On the negative side:
- Many programmers don’t like to be forced to do things a certain way. They tend to have strong opinions about what looks good and what doesn’t, and they don’t like to be shoehorned into a specific choice.
- Some editors insert a mix of space and tab characters to the left of indented lines, which makes it difficult for the Python interpreter to determine indentation levels. On the other hand, it is frequently possible to configure editors not to do this. It generally isn’t considered desirable to have a mix of tabs and spaces in source code anyhow, no matter the language.
Like it or not, if you’re programming in Python, you’re stuck with the off-side rule. All control structures in Python use it, as you will see in several future tutorials.
For what it’s worth, many programmers who have been used to languages with more traditional means of block definition have initially recoiled at Python’s way but have gotten comfortable with it and have even grown to prefer it.
The else
and elif
Clauses
Now you know how to use an if
statement to conditionally execute a single statement or a block of several statements. It’s time to find out what else you can do.
Sometimes, you want to evaluate a condition and take one path if it is true but specify an alternative path if it is not. This is accomplished with an else
clause:
if<expr>:<statement(s)>else:<statement(s)>
If <expr>
is true, the first suite is executed, and the second is skipped. If <expr>
is false, the first suite is skipped and the second is executed. Either way, execution then resumes after the second suite. Both suites are defined by indentation, as described above.
In this example, x
is less than 50
, so the first suite (lines 4 to 5) are executed, and the second suite (lines 7 to 8) are skipped:
1 >>> x=20 2 3 >>> ifx<50: 4 ... print('(first suite)') 5 ... print('x is small') 6 ... else: 7 ... print('(second suite)') 8 ... print('x is large') 9 ...10 (first suite)11 x is small
Here, on the other hand, x
is greater than 50
, so the first suite is passed over, and the second suite executed:
1 >>> x=120 2 >>> 3 >>> ifx<50: 4 ... print('(first suite)') 5 ... print('x is small') 6 ... else: 7 ... print('(second suite)') 8 ... print('x is large') 9 ...10 (second suite)11 x is large
There is also syntax for branching execution based on several alternatives. For this, use one or more elif
(short for else if) clauses. Python evaluates each <expr>
in turn and executes the suite corresponding to the first that is true. If none of the expressions are true, and an else
clause is specified, then its suite is executed:
if<expr>:<statement(s)>elif<expr>:<statement(s)>elif<expr>:<statement(s)>...else:<statement(s)>
An arbitrary number of elif
clauses can be specified. The else
clause is optional. If it is present, there can be only one, and it must be specified last:
>>> name='Joe'>>> ifname=='Fred':... print('Hello Fred')... elifname=='Xander':... print('Hello Xander')... elifname=='Joe':... print('Hello Joe')... elifname=='Arnold':... print('Hello Arnold')... else:... print("I don't know who you are!")...Hello Joe
At most, one of the code blocks specified will be executed. If an else
clause isn’t included, and all the conditions are false, then none of the blocks will be executed.
Note: Using a lengthy if
/elif
/else
series can be a little inelegant, especially when the actions are simple statements like print()
. In many cases, there may be a more Pythonic way to accomplish the same thing.
Here’s one possible alternative to the example above using the dict.get()
method:
>>> names={... 'Fred':'Hello Fred',... 'Xander':'Hello Xander',... 'Joe':'Hello Joe',... 'Arnold':'Hello Arnold'... }>>> print(names.get('Joe',"I don't know who you are!"))Hello Joe>>> print(names.get('Rick',"I don't know who you are!"))I don't know who you are!
Recall from the tutorial on Python dictionaries that the dict.get()
method searches a dictionary for the specified key and returns the associated value if it is found, or the given default value if it isn’t.
An if
statement with elif
clauses uses short-circuit evaluation, analogous to what you saw with the and
and or
operators. Once one of the expressions is found to be true and its block is executed, none of the remaining expressions are tested. This is demonstrated below:
>>> var# Not definedTraceback (most recent call last):
File "<pyshell#58>", line 1, in <module>varNameError: name 'var' is not defined>>> if'a'in'bar':... print('foo')... elif1/0:... print("This won't happen")... elifvar:... print("This won't either")...foo
The second expression contains a division by zero, and the third references an undefined variable var
. Either would raise an error, but neither is evaluated because the first condition specified is true.
One-Line if
Statements
It is customary to write if <expr>
on one line and <statement>
indented on the following line like this:
if<expr>:<statement>
But it is permissible to write an entire if
statement on one line. The following is functionally equivalent to the example above:
if<expr>:<statement>
There can even be more than one <statement>
on the same line, separated by semicolons:
if<expr>:<statement_1>;<statement_2>;...;<statement_n>
But what does this mean? There are two possible interpretations:
If
<expr>
is true, execute<statement_1>
.Then, execute
<statement_2> ... <statement_n>
unconditionally, irrespective of whether<expr>
is true or not.If
<expr>
is true, execute all of<statement_1> ... <statement_n>
. Otherwise, don’t execute any of them.
Python takes the latter interpretation. The semicolon separating the <statements>
has higher precedence than the colon following <expr>
—in computer lingo, the semicolon is said to bind more tightly than the colon. Thus, the <statements>
are treated as a suite, and either all of them are executed, or none of them are:
>>> if'f'in'foo':print('1');print('2');print('3')...123>>> if'z'in'foo':print('1');print('2');print('3')...
Multiple statements may be specified on the same line as an elif
or else
clause as well:
>>> x=2>>> ifx==1:print('foo');print('bar');print('baz')... elifx==2:print('qux');print('quux')... else:print('corge');print('grault')...quxquux>>> x=3>>> ifx==1:print('foo');print('bar');print('baz')... elifx==2:print('qux');print('quux')... else:print('corge');print('grault')...corgegrault
While all of this works, and the interpreter allows it, it is generally discouraged on the grounds that it leads to poor readability, particularly for complex if
statements. PEP 8 specifically recommends against it.
As usual, it is somewhat a matter of taste. Most people would find the following more visually appealing and easier to understand at first glance than the example above:
>>> x=3>>> ifx==1:... print('foo')... print('bar')... print('baz')... elifx==2:... print('qux')... print('quux')... else:... print('corge')... print('grault')...corgegrault
If an if
statement is simple enough, though, putting it all on one line may be reasonable. Something like this probably wouldn’t raise anyone’s hackles too much:
debugging=True# Set to True to turn debugging on....ifdebugging:print('About to call function foo()')foo()
Conditional Expressions
Python supports one additional decision-making entity called a conditional expression. (It is also referred to as a conditional operator or ternary operator in various places in the Python documentation.) Conditional expressions were proposed for addition to the language in PEP 308 and green-lighted by Guido in 2005.
In its simplest form, the syntax of the conditional expression is as follows:
<expr1>if<conditional_expr>else<expr2>
This is different from the if
statement forms listed above because it is not a control structure that directs the flow of program execution. It acts more like an operator that defines an expression. In the above example, <conditional_expr>
is evaluated first. If it is true, the expression evaluates to <expr1>
. If it is false, the expression evaluates to <expr2>
.
Notice the non-obvious order: the middle expression is evaluated first, and based on that result, one of the expressions on the ends is returned. Here are some examples that will hopefully help clarify:
>>> raining=False>>> print("Let's go to the",'beach'ifnotrainingelse'library')Let's go to the beach>>> raining=True>>> print("Let's go to the",'beach'ifnotrainingelse'library')Let's go to the library>>> age=12>>> s='minor'ifage<21else'adult'>>> s'minor'>>> 'yes'if('qux'in['foo','bar','baz'])else'no''no'
Note: Python’s conditional expression is similar to the <conditional_expr> ? <expr1> : <expr2>
syntax used by many other languages—C, Perl and Java to name a few. In fact, the ?:
operator is commonly called the ternary operator in those languages, which is probably the reason Python’s conditional expression is sometimes referred to as a ternary operator.
You can see in PEP 308 that the <conditional_expr> ? <expr1> : <expr2>
syntax was considered for Python but ultimately rejected in favor of the syntax shown above.
A common use of the conditional expression is to select variable assignment. For example, suppose you want to find the larger of two numbers. Of course, there is a built-in function max()
that does just this (and more) that you could use. But suppose you want to write your own code from scratch.
You could use a standard if
statement with an else
clause:
>>> ifa>b:... m=a... else:... m=b...
But a conditional expression is shorter and arguably more readable as well:
>>> m=aifa>belseb
Remember that the conditional expression behaves like an expression syntactically. It can be used as part of a longer expression. The conditional expression has lower precedence than virtually all the other operators, so parentheses are needed to group it by itself.
In the following example, the +
operator binds more tightly than the conditional expression, so 1 + x
and y + 2
are evaluated first, followed by the conditional expression. The parentheses in the second case are unnecessary and do not change the result:
>>> x=y=40>>> z=1+xifx>yelsey+2>>> z42>>> z=(1+x)ifx>yelse(y+2)>>> z42
If you want the conditional expression to be evaluated first, you need to surround it with grouping parentheses. In the next example, (x if x > y else y)
is evaluated first. The result is y
, which is 40
, so z
is assigned 1 + 40 + 2
= 43
:
>>> x=y=40>>> z=1+(xifx>yelsey)+2>>> z43
If you are using a conditional expression as part of a larger expression, it probably is a good idea to use grouping parentheses for clarification even if they are not needed.
Conditional expressions also use short-circuit evaluation like compound logical expressions. Portions of a conditional expression are not evaluated if they don’t need to be.
In the expression <expr1> if <conditional_expr> else <expr2>
:
- If
<conditional_expr>
is true,<expr1>
is returned and<expr2>
is not evaluated. - If
<conditional_expr>
is false,<expr2>
is returned and<expr1>
is not evaluated.
As before, you can verify this by using terms that would raise an error:
>>> 'foo'ifTrueelse1/0'foo'>>> 1/0ifFalseelse'bar''bar'
In both cases, the 1/0
terms are not evaluated, so no exception is raised.
Conditional expressions can also be chained together, as a sort of alternative if
/elif
/else
structure, as shown here:
>>> s=('foo'if(x==1)else... 'bar'if(x==2)else... 'baz'if(x==3)else... 'qux'if(x==4)else... 'quux'... )>>> s'baz'
It’s not clear that this has any significant advantage over the corresponding if
/elif
/else
statement, but it is syntactically correct Python.
The pass
Statement
Occasionally, you may find that you want to write what is called a code stub: a placeholder for where you will eventually put a block of code that you haven’t implemented yet.
In languages where token delimiters are used to define blocks, like the curly braces in Perl and C, empty delimiters can be used to define a code stub. For example, the following is legitimate Perl or C code:
# This is not Python
if (x)
{
}
Here, the empty curly braces define an empty block. Perl or C will evaluate the expression x
, and then even if it is true, quietly do nothing.
Because Python uses indentation instead of delimiters, it is not possible to specify an empty block. If you introduce an if
statement with if <expr>:
, something has to come after it, either on the same line or indented on the following line.
Consider this script foo.py
:
ifTrue:print('foo')
If you try to run foo.py
, you’ll get this:
C:\Users\john\Documents\Python\doc>python foo.py
File "foo.py", line 3 print('foo') ^IndentationError: expected an indented block
The pass
statement solves this problem in Python. It doesn’t change program behavior at all. It is used as a placeholder to keep the interpreter happy in any situation where a statement is syntactically required, but you don’t really want to do anything:
ifTrue:passprint('foo')
Now foo.py
runs without error:
C:\Users\john\Documents\Python\doc>python foo.py
foo
Conclusion
With the completion of this tutorial, you are beginning to write Python code that goes beyond simple sequential execution:
- You were introduced to the concept of control structures. These are compound statements that alter program control flow—the order of execution of program statements.
- You learned how to group individual statements together into a block or suite.
- You encountered your first control structure, the
if
statement, which makes it possible to conditionally execute a statement or block based on evaluation of program data.
All of these concepts are crucial to developing more complex Python code.
The next two tutorials will present two new control structures: the while
statement and the for
statement. These structures facilitate iteration, execution of a statement or block of statements repeatedly.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
PyCharm: PyCharm 2018.3 EAP 1
Development on PyCharm 2018.3 is currently in progress, but if you can’t wait for the release date, you can try our early access preview (EAP) builds from now on. The first one in this series is available for download on our website now.
New in This Version
Better Completion for SQL Aggregation Functions
Most Python applications will connect to a database at some point. In PyCharm Professional Edition you get schema-aware SQL completion — it doesn’t just complete SQL, but it also helps you navigate your database. In PyCharm 2018.3 we’ve made a small improvement: when calling an aggregation function that takes a number, the number columns are the first columns suggested by the IDE.
Take Files Directly from Another Branch
Although in an ideal world all commits are perfectly atomic, the real world is a little messier. Sometimes you made improvements to a certain file in a feature branch, and you’d then like to take that file (but only that file) from that branch while working in another branch. By choosing ‘compare with…’ that other branch in the branches popup (VCS | Git | Branches), selecting the file on the files tab, and choosing ‘Get from Branch’ you can now fix this situation with PyCharm.
Further Improvements
- Various small improvements have been made to our support for pipenv. For example, if you install a package using the PyCharm interface while you have the
Pipfile
open in the editor, you’ll now see thePipfile
be updated when the installation completes.
Interested?
Download this EAP from our website. Alternatively, you can use the JetBrains Toolbox App to stay up to date throughout the entire EAP.
If you’re on Ubuntu 16.04 or later, you can use snap to get PyCharm EAP, and stay up to date. You can find the installation instructions on our website.
PyCharm 2018.3 is in development during the EAP phase, therefore not all new features are already available. More features will be added in the coming weeks. As PyCharm 2018.3 is pre-release software, it is not as stable as the release versions. Furthermore, we may decide to change and/or drop certain features as the EAP progresses.
All EAP versions will ship with a built-in EAP license, which means that these versions are free to use for 30 days after the day that they are built. As EAPs are released weekly, you’ll be able to use PyCharm Professional Edition EAP for free for the duration of the EAP program, as long as you upgrade at least once every 30 days.
Davy Wybiral: Puck.js: Javascript+Bluetooth===Awesome
Codementor: How to run and schedule Python scripts on Raspberry Pi
Codementor: Delete files in uploadcare — python
Python Engineering at Microsoft: Python in Visual Studio Code – August 2018 Release
We are pleased to announce that the August 2018 release of the Python Extension for Visual Studio Code is now available. You can download the Python extension from the marketplace, or install it directly from the extension gallery in Visual Studio Code. You can learn more about Python support in Visual Studio Code in the VS Code documentation.
In this release we have closed a total of 38 issues including the stable release of our ptvsd 4 debugger, improvements to the language server preview, and other fixes.
Faster, more reliable debugging with ptvsd 4
In this release we are updating all users to the ptvsd 4.1.1 version of our Python debugger, providing a significant improvement to debugging performance and stability over the previous ptvsd 3.0 version. We originally announced an opt-in preview of ptvsd 4 in the February release of the Python extension, and have been continuing to improve on it based on user feedback. The new debug engine is built on top of the open source pydevd, which has allowed us to take advantage of its superior performance and support for third party libraries.
The new Python debugger supports the Logpoints feature added in the March iteration of VS Code. Logpoints allow you to essentially add print statements without having to stop execution. You can right-click on the margin and select "Add Logpoint...", and then type in your message:
Remote debugging is easier to use and improved; previously you had to install the exact version of ptvsd used in VS Code on the remote server, and you needed to modify your code to enable the debugger to attach.
Now you can install any 4.x version of ptvsd and can enable attach from the command line. To install ptvsd and start remote debugging from the command line:
pip install --upgrade ptvsd python3 -m ptvsd --host 1.2.3.4 --port 3000 -m myproject
Check out our updated remote debugging documentation for more information.
Once the server starts you can attach to it from VS Code. We are continuing to make improvements to the debugger, so stay tuned in our future releases.
Improvements to the Language Server Preview
In the July release of the Python extension we added a preview of the Microsoft Python Language Server, our Python analysis engine from Visual Studio hosted inside of VS Code. This allowed us to provide faster & richer completions including support for typeshed definitions. We have made the following improvements in this release:
- Language server now populates document outline with all symbols instead of just top-level ones. (#2050)
- Fixed issue in the language server when documentation for a function always produced "Documentation is still being calculated, please try again soon". (#2179)
- Fix null reference exception in the language server causing server initialization to fail. The exception happened when search paths contained a folder that did not exist. (#2017)
- Fixed language server issue when it could enter infinite loop reloading modules. (#2207)
- Language server now correctly handles
with
statement when__enter__
is declared in a base class. (#2240) - Fixed issue in the language server when typing dot under certain conditions produced null reference exception. (#2262)
- Language server now correctly merges data from typeshed and the Python library. (#2345)
- Code lenses now appear for unit tests when using the language server (#1948)
Various Fixes and Enhancements
We have also added small enhancements and fixed issues requested by users that should improve your experience working with Python in Visual Studio Code. The full list of improvements is listed in our changelog, some notable improvements are:
- Ensure test count values in the status bar represent the correct number of tests that were discovered and run. (#2143)
- Ensure workspace
pipenv
environment is not labeled as avirtual env
. (#2223) - Fix
visualstudio_py_testLauncher
to stop breaking out of test discovery too soon. (#2241) - Fix error when switching from new language server to the old
Jedi
language server. (#2281) - Ensure stepping out of debugged code does not take user into
PTVSD
debugger code. (#767)
Be sure to download the Python extension for VS Code now to try out the above improvements. If you run into any issues be sure to file an issue on the Python VS Code GitHub page.
PyCharm: PyCharm 2018.2.3
PyCharm 2018.2.3 is now available, with some small improvements. You can download this version from our website.
New in This Version
- A number of improvements and fixes for the integrated Python console
- Fixes for the new Quick Documentation window which was new in Pycharm 2018.2
- Fixes for few false positives: install an already existing package intention, type checks
- And much more, including improvements from the IntelliJ platform, WebStorm, and DataGrip, see the release notes for details.
Download PyCharm 2018.2.3 from our website
Try the Next Version
If you’d like to already try some additional improvements, you can try the release candidate of PyCharm 2018.2.4. This is available on our confluence page.
New in the RC
- Various small pipenv improvements
- A bug in our pytest with fixtures support was fixed: previously, if
yield
statements were used in the fixture, PyCharm would assume that the return type of the function was aGenerator
. Now, the correct return type is inferred, preventing false positives. - And more, see the release notes
Download the RC from our confluence page
If you’re on Ubuntu 16.04 or later, you can use snap to get PyCharm RC versions, and stay up to date. You can find the installation instructions on our website.
The release candidate (RC) is not an early access program (EAP) build, and does not bundle an EAP license. If you get PyCharm Professional Edition RC, you will either need a currently active PyCharm subscription, or you will receive a 30-day free trial.
EuroPython: EuroPython 2018: Videos for Friday available
We are pleased to announce the third and last batch of cut videos from EuroPython 2018 in Edinburgh, Scotland, UK.
EuroPython 2018 YouTube Playlist
In the last batch, we have included all videos for Friday, July 27 2018, the third conference day.
In total, we now have more than 130 videos available for you to watch.
All EuroPython videos, including the ones from previous conferences, are available on our EuroPython YouTube Channel.
Enjoy,
–
EuroPython 2018 Team
https://ep2018.europython.eu/
https://www.europython-society.org/
Mike Driscoll: Python 101: Episode #23 – Working with XML
Learn the basics of Python’s built-in XML modules, minidom and ElementTree. You can read the chapter this is based on here: http://python101.pythonlibrary.org/chapter23_xml.html or get the book from Leanpub: https://leanpub.com/python_101
Randy Zwitch: Mirroring an FTP Using lftp and cron
As my Developer Advocate role leads me to doing more and more Sysadmin/Data Engineer type of work, I continuously find myself looking for more efficient ways of copying data folders to where I need them. While there are a lot of great GUI ETL tools out there, for me the simplest and fastest way tends to be using linux utilities. Here’s how to mirror an FTP using lftp, with a cron repeater every five minutes.
Data are on an FTP, Need Further Processing
The problem I have is data that exists on a remote FTP, but are in a binary format that is incompatible with loading directly into MapD. My current plan is to use Python to convert the binary format into CSV, but with the data on a server that I don’t control, I need to make a copy somewhere else.
It’s also the case that the data are roughly 300GB per day, streaming in at various intervals across the day, so I need to make sure that any copying I do is thoughtful. Downloading 300GB of data per day is bad enough, doing it multiple times even worse!
lftp mirror
to the Rescue!
The best choice in my case seems to be copying the files onto a VM I own. lftp has an option mirror
to do just that. Here is the one-liner I’m using:
lftp -e"mirror -c -e --parallel=20 --verbose /pub/data/nccf/com/nwm/prod /nwmftp/prod;quit;" ftp.government.gov
lftp -e
: Execute command in quotes. In this case, the FTP allows anonymous access, so no user/pw arguments neededmirror
: Mirror command for lftp-c
: If download fails for whatever reason, keep trying (c = “continue”)-e
: Delete files on remote that are no longer on source (i.e. keep folders in perfect sync)--parallel
: Allow multiple connections for parallel downloading of multiple files--verbose
: Print lots of messages, helpful for debugging
With all of the flags in place, the last two arguments are the source (remote FTP) and destination (my VM) directories. Finally, I add a quit
statement to exit lftp
once the mirror process is over. This is mostly hygiene since I plan to run this on a cron scheduler and don’t want to leave the sessions open.
Run This Every Five Minutes, Forever
cron
really is one of the greatest timesavers ever, especially in that it allows super-repetitive work to be automated away, usually with a single line. Here is the line I added after calling crontab -e
on the command-line:
*/5 **** /home/username/pull_from_ftp.sh
Quite simply, “every 5 minutes, run pull_from_ftp.sh”. Creating pull_from_ftp.sh
is as straightforward as creating a text file:
#!/bin/bash
lftp -e"mirror -c -e --parallel=20 --verbose /pub/data/nccf/com/nwm/prod /nwmftp/prod;quit;" ftp.government.gov
That’s It? YES!
With just a few characters short of a full tweet, you can mirror an entire folder from an FTP, automatically. lftp
in combination with cran
helped me factor out hundreds of lines of pre-existing Python code, which not only removed untested, copy-pasted code from the workflow but also added parallel downloading, increasing data throughput.
Not bad for a couple of free Linux utilities :)
Guido van Rossum: So you want to learn Python?
Recently I got a review copy of "Hello World", and a colleague kindly lent me his copy of "Practical Programming". I think it's interesting to compare the two a bit, since they both claim to be teaching Python programming to people who haven't programmed before. And yet their audiences are totally different!
"Hello World", published by Manning, is written by Warren Sande and his son Carter. The subtitle is "Computer programming for kids and other beginners", but I think if you're not a kid any more you might get annoyed by the rather popular writing style. If you are a kid, well, you will probably enjoy a book written with you in mind, and you will learn plenty. The only prerequisites are reading and typing skills, a computer that wasn't built in the stone age, and a desire to learn more about what goes on inside that computer. The book uses short chapters with lots of illustrations, often cartoons and jokes. There are lots of opportunities to try out the material and learn that way. Each chapter ends with a review section, some tests, and more experiments to try. The book pays plenty of attention to typical "gotchas", so that if you get stuck at some point, there probably is help nearby to get you unstuck.
"Practical Programming" is written by Jennifer Campbell, Paul Gries, Jason Montojo, and Greg Wilson. This a team composed of three university professors and a former student of theirs. Their purported goal is to teach Computer Science (with Capital Letters), and Python is merely a teaching vehicle. But they spend about half of the book on Python itself, covering roughly the same material as any introduction to Python, including "Hello World". Their intended audience is clearly more mature than that of the Sandes, and I would think that Carter Sande and his friends would have a hard time staying focused on the material as presented by Campbell et al. -- their illustrations and diagrams are more functional but a lot less fun.
Both books present a number of projects and running examples. Again, the difference in audience makes it likely that if you love one, you'll hate the other, and vice versa. "Hello World" uses examples from computer games. The games are extremely simple though: modern computer games are some of the most complex system around, and you can't expect to approach them using PyGame and a couple hundred lines of Python. "Practical Programming" takes its example from scientific data processing with an environmental touch: for example, a numerical series is presented as whale sightings over the years and 2-dimensional data is taken from deforestation data. No doubt this is done in an attempt to appeal to a certain kind of student, though the number of potential applications is so large that some students might just as well be turned off by the specific set of choices.
In the end, "Hello World" will leave the reader with a fair amount of practical Python experience, enough to get them started on the long road to becoming a programmer if they are so inclined, or at least enough to give them some idea of what it is that programmers do. "Practical Programming" tries to go further: it presents some well-known algorithms (there's even a discussion of MergeSort), and it has introductory chapters on topics like object-oriented programming and databases. The overall focus is still on being able to use all this new knowledge in one's professional life, and I hesitate to agree with the authors' apparent view that it teaches "Computer Science". Calling it "Computer Use" would cover the contents better, I think, and that's more in line with the series title as well ("The Pragmatic Programmers", also the publisher).
So, how do you learn about Computer Science? Some would no doubt recommend "Structure and Interpretation of Computer Programs" by Abelson and Sussman here. (Someone sent me review copy of that book too.) But really, SICP (as it is often referred to) has its own agenda: convincing the reader that the most important thing computers can do is interpreting computer programs. This agenda has arguably caused the proliferation of Scheme implementations and indoctrinated many young minds with certain ideas about how to design and implement programming languages. But personally, I recommend you go straight to the source. After all these years, there is still no substitute for Knuth.
[UPDATE: fixed book titles as commenters pointed out my typos.]
Python Bytes: #94 Why don't you like notebooks?
Brett Cannon: Setting expectations for open source participation
(If you would prefer to watch a 30 minute talk on this same content, you can watch my Python US 2018 keynote)
Who am I? [1]
To start, I want to provide my bonafides so you know I'm not some crank spouting some armchair advice with nothing to back up their opinion other than having an internet connection.
For a day job I am the dev lead for the Python extension for Visual Studio Code. Originally the extension was an open source, personal project, but then Microsoft decided it would be a good idea to support the most popular extension for VS Code, so we hired its creator (Don Jayamanne), put me on the team, and then we re-launched the extension as an official open source project from Microsoft (and now we are a team of four developers and a product manager). So I participate in what I would call corporate open source for a living because even if we never received a single outside contribution ever again, the extension's development will continue thanks to Microsoft.
One day a week for a day job plus plenty of spare time I'm also a core developer on the Python programming language. I have had commit privileges to Python since April 18, 2003, so over 15 years at the time of writing this. So I also participate in community open source where if people who weren't paid to work on Python stopped contributing, the project would collapse (I'm considered extremely lucky to get a day/week of paid work time to spend on Python, so if it weren't for volunteers there would be no Python programming language).
I've also been lucky enough to have contributed a patch in some form to around 90 projects (depending on how you count). Now a lot of those are documentation changes where I probably fixed a spelling or grammar mistake, but the key point is that I have been exposed to a lot of open source teams.
So, to summarize: I contribute to and manage a corporate open source project for a living, I have been actively contributing to a major open source project for over 15 years, and I have contributed to about 90 other projects. I would like to think I know a little bit about how open source works. 😉 Hopefully that means what I am about to say comes from an informed place and you don't view me as a total crank.
What is the purpose of open source? [2]
When you first release open source code to the world, it's a code dump. With zero participation from others you're essentially setting the code free into the world, but it's very much a monologue.
But the instant you get that first contribution to your code, it becomes an open source project with an open source community of two (if you have users who are just providing feedback then it's still a community of users, it just isn't driven by open source yet). And once you get over that initial hurdle of your first external contribution, your project grows in purpose from being a way to share code with the world to being about two things:
- Collaborating on the maintenance of the project
- Having fun
Going forward, your open source project is about balancing those two things. If you're not collaborating with others then you're back to a code dump in which case you don't have to care about collaborating. And if you're not having fun then you will find yourself wanting to do something else, which means you will no longer want to participate. In both cases, though, bit rot can very easily set in. While participating in an open source project ebbs and flows can occur between these two goals, if you ever lack in one of them then the project will collapse.
How do you sustain open source? [3]
Now that we know that the purpose of an open source project and community is to sustain the project and have fun, how do you accomplish that? First, you need people.
Initially your project is going to be very small, so getting more people to help out is great simply because every little bit helps. But as the project grows you also need to attract new people in order to replace people who have left the project. People have children, suffer from burn-out, change jobs, or die. There are various reasons people stop contributing to open source, but it happens and so if you don't bring new people in then your project's participants will eventually dwindle down to zero.
But you also can't forget about those that already participate. Retaining folks must also be considered as people build up knowledge and unique skills for a project. If all of the preexisting participants left en masse then it would take awhile to bring new participants to the same level as those that left.
The trick is balancing both of these needs. For instance, existing participants probably like the way the current workflow works. But then new participants might want some newer approach that the project has not taken on yet. Do you keep things as-is to satisfy current participants or do you change things to try to attract new participants?
In case it wasn't obvious, making sure open source works is hard. 😉
So what's the goal (and why are we failing at it)? [4]
For me, the overall goal of open source is to attract and retain people to help maintain an open source project while enjoying the experience. I don't think this is a radical or controversial perspective of what the purpose of OSS is. We all want open source to be sustainable and that requires people who enjoy doing open source. If we don't have this then there simply won't be people willing to work on open source because the enjoyment has been sucked out of the experience for them.
Unfortunately we are failing at meeting this laudable goal of making open source sustainable through keeping it enjoyable to do so. An example I like to give is a tweet written by Cory Benfield when he was the maintainer of hyper-h2, urllib3, and requests:
First, I hope everyone who reads that feels somewhat disturbed by the fact that trying to do a nice thing by working on open source has led someone to becoming a worse person. That is not a reasonable thing to ask of anyone and is definitely not sustainable as it doesn't exactly make one happy to work on open source.
Look at it from the perspective of Cory's partner. By letting him spend time on open source she has ended up with a worse partner in life. How is that a reasonable trade-off? As a community we drove Cory into a position where his partner could very reasonably say "it's either open source or me", and I suspect Cory's partner would win (and rightfully so). It is simply not acceptable that Cory could even conceivably be put in that position due to him trying to do a nice thing by maintaining some open source project(s).
And this is not an isolated case. I had to take the entire month of October off in 2016 to prevent burn-out from open source. I actually now take a full month off annually from volunteering as an "open source detox" to fight off any potential burnout. I also take at least one day off every weekend from open source work to guarantee that I at least get one day a week that isn't impacted by open source.
So how have we ended up in this position where we are failing at meeting this reasonable goal of making open source sustainable by keeping it fun to do? Why are volunteers or people paid by someone other than ourselves actually disliking working on open source to the point of it making them worse people? Why do I have to take one day off a week from open source to make sure it doesn't ruin every day of my life? It all comes down to how people treat each other in open source.
Now I don't think everyone who treats someone else poorly does it on purpose. I think a large part of the problem is most people have not stopped and thought about what open source is and how that should guide how they interact with others involved in it. Open source is more than just some free source code; there are real people involved here and that simple fact cannot be ignored. Unfortunately, I think a decent amount of people do forget this by not thinking about two key details.
EVERYTHING in open source has a cost [5]
While open source, by definition, is monetarily free, that does not mean that the production of it is free. Even when someone volunteers their time, there is a cost of their time and effort.
Every time I spend work time on open source that's my employer choosing to donate my time. But it's also my choice to work for an employer that lets me put any effort into open source. If I'm not enjoying that work then that's unpleasurable effort which makes me either want to change my job or quit (just like any other job you don't enjoy doing).
And whenever I volunteer my time for open source, that is me choosing to divert my personal time from doing something else. Whether that's spending time with family, friends or doing something else that I find fun and entertaining, by spending my personal time on open source I am giving up time that I could have spent on other things. I am literally choosing open source over everything else in my life when I choose to work on it in my spare time.
And I only have so much time to give. Not only am I not spending time with my wife and cat when I contribute to open source in my spare time, but I'm choosing to give up a part of the finite time I have to be alive on top of it. While I haven't quite hit the midpoint of my expected lifespan, I will eventually die and so choosing to spend what personal time I have left being alive probably shouldn't be squandered doing something that I do not enjoy.
And this is why whenever anyone says something is "just" a quick fix is not taking into consideration what they are truly asking someone to give up for their benefit. The cost of asking someone to do something for you is disproportionate to the cost of what you're asking the other person to give up on your behalf. And this applies to everything, including email where the cost of you sending an email has a huge cost on the world's time as you're potentially asking hundreds of people to spend what little time they have to be alive to read what you have written. You have to always consider whether what you're asking of others is reasonable.
To reiterate, just because open source software is free for you doesn't mean someone else hasn't paid some price on your behalf to get you that code.
Everything in open source should be a series of kindnesses [6]
Since open source is paid for by others in effort and time, that would suggest that there really shouldn't be any expectations on your part from what you get out of an open source project that you choose to use. You could view everything in OSS as a kindness that someone has done for you and others. Putting things into that perspective takes away the feeling of expectation and entitlement. This frames open source participation as someone doing something nice for the community and project, as someone having done a kindness for you.
As soon as you start demanding or expecting something from open source you have stopped viewing it as it was intended, and that distortion can be poisonous. When I choose to donate my precious time to open source I do it voluntarily as a nice thing that I enjoy doing. I didn't do it because someone demanded it of me, and the instant I feel that my time is not appropriately appreciated as the gift that it is, I stop enjoying doing open source. And when someone stops enjoying their contributions to open source, they burn out and quit.
Taking the altruistic view of open source keeps things grounded and healthy. Viewing open source as a kindness someone else has done for you gives the appropriate perspective that this is something nice and no one has any expectations from it. It's like when I hold the door open for someone. Ultimately I don't expect anything in return (although a "thanks" is always appreciated). And the person passing through the door doesn't expect anything else from me either. But when someone in open source makes demands it's like the person passing through the door criticizing how I held the door open. It's pointless and simply leads to people no longer being willing to hold open doors for others.
To reiterate, you should view open source as a series of kind acts people have done altruistically. That means you cannot make any demands or hold any expectations of an open source project. Or as Evan Czaplicki, the creator of Elm, put it, "Remember that people do free work because it is fun, not to get stressed by strangers".
Typical scenarios in open source [7]
Knowing that open source always has a cost and should be viewed as a series of kindnesses, I want to go through some typical scenarios that occur in open source to point out how people accidentally stop viewing open source as kindnesses. I'm going to talk from the perspective of me as an OSS project maintainer, and that of Stuart, an external participant to a project (and who happens to be the person who gave me the idea of framing these examples in this fashion). I will signal who is doing a kindness for whom using arrows, e.g. "Stuart → Brett" signifies that Stuart did a favour for me.
Using open source [8]
Brett → Stuart
It can be like someone leaving out a pamphlet you find offensive.
When I release open source, I'm doing a favour for you. I obviously didn't have to release any source code that I produced for others to use for free, but I do it as a kindness for you so you can benefit from what I have already done (later on we will see how Stuart can give back by doing me a kindness by helping with the project).
For the longest time I didn't think giving away source code could really go badly. But then when a large open source project had a scandal break out about a tasteless joke made in their docs I realized that open source projects can be hurtful, discriminatory, etc. from something as simple as their documentation. Luckily this does not seem to be common, but when a project explicitly tries to exclude or is hurtful towards others then it's no longer being released as a kindness.
Providing feedback [9]
Stuart → Brett
It can be like someone telling you, "you're stupid".
Any piece of positive feedback is a wonderful kindness to receive. I'm always elated when someone tells me that they enjoy using Python, how it made things easier for them, etc. Positive feedback is always welcome (and unfortunately can be somewhat of a rarity).
There's also constructive feedback. That is also an appreciated kindness as that leads to improvement. The key here, though, is that the feedback must be constructive, not negative. If you don't have feedback that can help improve the project then your other option is to simply admit the project doesn't meet your needs.
But negative feedback is never necessary. If you called a project stupid it helps no one. It doesn't help me because I can't act on being called stupid. And it isn't a kindness because you could have simply said that my project wasn't helpful to you for whatever reason (e.g. didn't meet your needs, doesn't have the stability you require, etc.). Receiving negative feedback simply makes me come to regret spending time away from my loved ones to work on open source. If you don't understand why something is the way it is you can always ask why instead of making a declaration that something is stupid.
And people forget that someone put their time and effort into that project that is being called stupid (remember that open source always has a cost). For example, an article once reached the front page of Hacker News which started off by saying the author liked Python ... but then they proceeded to point out all the "stupid" and "dumb" parts of the language (most of which, by the way, were because of backwards-compatibility). As I read the article I was able to point out which "stupid" features were mine and which were done by friends of mine. Essentially the author wrote an entire article calling me and my friends stupid. I don't think the author meant to be insulting to me, but they were and so is everyone else who chooses to call something I put my time and effort into stupid or dumb or some other needlessly negative insult. (When confronted with the fact that he was being unnecessarily negative, the author claimed he didn't expect the article to be widely shared; the internet is public, so always assume the entire world will read what you post, especially by the people who will take the greatest offense.)
And this extends to bug reports and feature requests. It's a kindness when you report a bug so that I have a chance to try and fix it. But just because you reported a bug doesn't mean I specifically owe you anything. I should try to thank you, but there is no contract here that says I have to address your bug or feature request. My kindness was to provide my time and effort for the open source project up until now, but there's no expectation beyond that of my time and effort. Open source really should be viewed as self-service unless I choose to help further. If something doesn't work for you then you are free to take the code and modify it to meet your needs; that's a key tenant of open source. So the idea that any open source project owes anyone anything is a misunderstanding of what open source fundamentally is.
Submitting a contribution [10]
Stuart → Brett
It can be like someone trying to give you a puppy you didn't ask for or saying, "I'm cleaning up your mess".
When you submit a change to something, I'm sure your intentions are good. You have put in the effort to try and improve the project somehow and that's appreciated. But where this tends to go wrong is when people don't take into account the fact that your considerations for a change are often very different from mine. Think of a contribution like a puppy: you might view it as this cute, wonderful thing you're giving me while I'm looking at it as over a decade of feeding, walking, and vet bills.
Let's say you submit a contribution and I accept it. What that means is that I am now expected to maintain that code for a decade or more (Guido released Python to the world in February 1991 which predates Linux's release in August of that year). So while you might feel done with your contribution once it's accepted, for me it's the start of having to care and worry about it for potentially decades.
And then there's the impact on the Python community. People who identify as a Python software developer number well into the millions. Toss in people who use Python but don't identify themselves as software developers and we are probably talking about an 8-figure number. That means I have to consider the impact of your change on tens of millions of users and indirectly billions of people using Python software because it is guaranteed to break something when you are dealing with that sort of scale (the fact that Instagram's 1 billion users are using a Django web site means your change could prevent you and everyone else from seeing cute cat photos someday). And then there's the fact that your change may have just made literally tons of physical books in bookstores around the world obsolete; something else I have to consider.
Assuming your contribution makes sense, you also need to pay attention to how you communicate. It is not uncommon for someone to submit a change and point out how they are trying to "fix this broken/wrong thing". I don't think people necessarily intend to be mean, but realize that when you say something like that you are telling me I failed and you're trying to clean up my mess. The put-down of saying I did a bad job does not motivate me to want to review your change, nor does it motivate me to want to contribute further if my work is just going to be viewed as bad.
A better way to phrase it is to say you're trying to help the project out. As stated earlier, open source is about trying to come together to help maintain a project, so if you phrase things around that common goal then your contribution won't come off as hostile.
Lastly, realize that contributing is not a right. I am under no obligation to accept your or anyone else's contributions. We are all doing this open source thing to try and help each other out, but if I have to say "no" to you, that does not mean you are allowed to yell at me about blocking you from contributing. Remember, part of my work for the project that we are both trying to help out is to say "no" as necessary. I understand that getting to contribute to a project can be a very proud moment (I know I felt that when I got my first contribution into CPython), but please realize that yelling at me over it is not going to help get your contribution in and it will upset me and contribute to my burn-out (which means less people to look at your next contribution). Also realize that even as a core developer, I don't get all of my changes accepted into Python; sometimes I make changes that people disagree with and they end up being reverted.
Contribution feedback [11]
Brett → Stuart
It can be like someone saying, "you're doing it wrong" or reacting with "why don't you love me?!?"
When responding to a submitted contribution, maintainers should try to say "thank you". It's a simple acknowledgment of the time and effort someone put in to try and help out the project everyone involved in is attempting to keep going. It can be hard to have enough time to say it for every contribution, but it is a good goal to strive for.
Feedback for a contribution should also be constructive. Telling someone they are doing it wrong helps no one; it's just as bad as when someone says your project is stupid. The goal of working with someone on a contribution should be:
- To help them get their contribution to a place where you're comfortable in accepting it.
- Help them learn how to contribute again in the future more effectively.
- Pass on some knowledge to the contributor.
In other words the goal is to get the contribution in, make it easier for the contributor to contribute again in the future, and to teach someone something new.
Maintenance [12]
Brett ⇄ Guido
It can be like arguing about politics with friends.
While everything I have discussed so far is how maintainers of a project and everyone else can clash, issues also arise among maintainers themselves. I have actually come close to tears due to an argument I had over email with a fellow Python core developer. If anything it's actually harder to deal with hostility and negativity from fellow maintainers because not only are they typically very passionate about the project, but you are also most likely going to be working with them for quite a while so it can lead to stress whenever you end up interacting with them in a negative way.
Much like anyone else on the internet, maintainers should try to be self-aware enough to know when something has upset them so they can step away from the keyboard and gather their composure first. Open source is typically not a time-driven thing, so there's almost never a need to respond immediately to that email, tweet, etc. Waiting an hour or twenty-four before responding is never a bad thing and in general leads to a more level-headed response.
Scale & the abuse of maintainers [13]
I am well-aware that this blog post paints things as much worse for maintainers than external contributors and project participants, and that's because it is. Scale factors easily lead to this being a much bigger issue for maintainers than for those only casually contributing to or using a project.
Take Python for example. There are currently 91 people who have commit rights to the CPython repository. The last time I calculated the number of people who call themselves a "Python developer" I got 7 million. To have an easy number to work with that also includes every user of Python who doesn't call themselves a programmer (e.g. scientists probably typically don't label themselves a developer first and a scientist second), let's round up to 10 million. Now let's say that 1% of Python users are mean (which I suspect is way too low to be accurate; would you say that only 1 out of every 100 people at your workplace is mean?). That's 100,000 people I potentially have to contend with who are quite willing to be mean to me.
And we don't say this often enough, but maintainers get psychologically abused by people. Insulting me and a project that I have put 15 years of my life into is not exactly being done to help my mental health 😉. And it's very easy to end up in a situation where people abuse me because most interactions I have with people are either about a question they had, a problem they ran into, or a change they want to see happen; these are not interactions to talk about the weather outside or to thank me, but about something people want to see changed.
And the abuse does pile up. Some work has shown that if you don't have a 5:1 ratio of positive to negative interactions with your spouse when arguing you're heading towards divorce due to the mental toll arguments end up having on you. And from personal experience I would say that ratio is probably a little low when trying to counter-act a negative interaction with someone online (personally it feels more like 10:1). I actually on occasion go online to e.g. Twitter and say I just had a bad interaction knowing that it will lead to some nice comments in order to help cope with the negativity I'm struggling with.
Basically it's really unfortunate that negativity is driven by reactions while positivity is driven by reflection. I sometimes wish there was #ThankfulThursday or something, similar to #FlashbackFriday so there was something people could react to for thanking people for what they do instead of hoping someone stops and thinks about thanking others.
We all need to act nice towards one another
I think if everyone would just act nice towards one another with the proper perspective of what open source is and how it functions then most of the frustrations people have and the burn-out that occurs would drop significantly. As I said earlier, I have found that phrasing everything in open source as being a kindness really helps put things in the proper perspective. To make sure what I mean by a "kindness" is understood and why I think this is a healthy perspective to take, I wanted to clarify these points.
A kindness should not come with an expectation of something in return [14]
I used to say everyone should view open source as a series of favours. But as time went on I realized that favours easily end up with a connotation of expecting something in return. For instance, how often have you heard someone say, "If I do this favour for you, will you do this other thing for me in return?" This makes favours a pay-it-forward situation where you are doing something with the expectation of it being repaid later.
I switched to using kindnesses because being kind in the cultures I'm familiar with has no expectation of something in return. This does away with unhealthy expectations that people doing something for them has an expectation of something in return. It sets everyone's expectations for the kindness appropriately; that the person doing the kindness expects to be treated nicely (like any other interaction in the world), but otherwise there is no expectation of getting something in return for that kindness.
For instance, if I view sending a project a pull request as a kindness then I have no expectation of what will eventually happen to that pull request (e.g. accepted, rejected, languishing, etc.). It's reasonable to expect I will be treated nicely, but beyond that I have to go into the situation with the expectation that nothing will ever happen to my pull request and the only thing I got out of it was the experience of writing the code.
And the same goes the other way. If someone sends me a pull request and I ask for changes, my review was a kindness to the pull request creator to give them feedback to help them get their PR accepted and maybe teach them something. If they never come back and update their PR, then that's fine as I didn't do the review with an expectation that it would necessarily go anywhere.
Everyone should act like everything is being done as a kindness for them [15]
When you expand what acts should be treated as kindnesses to all acts, it has an interesting effect that no one has a reason to get angry or upset anymore when something doesn't happen. While you still have a right to expect to be treated nicely (as you should be as a reaction to any kindness done for you), once you let go of any presumption on your part about what will happen, things become a lot less stressful for everyone involved. And when the stress in a situation drops, it tends to actually lead to more motivation to help because the whole situation becomes much more pleasant for all involved.
Now when I point out that I think open source should be viewed this way, inevitably someone asks about how I would expect anyone to be motivated to work on open source. The thing is that if you're involved with open source long enough you end up noticing that people are quite happy to help because they want to help. Remember, open source is done by people giving something away for free because they choose to; you could say you're dealing with a bunch of digital hippies. 😄 In other words you don't need to worry about open source collapsing if everyone operated as if everything was just a kindness because plenty of people already operate like that. It's actually those who don't come in with that expectation who end up "poisoning the well" as it were and inevitably driving away those that would have happily volunteered their time and effort, leading to just those with the improper motivations left to keep a project running (which will inevitably fail due to no one being motivated to help out).
How should we act towards one another? [16]
In case I haven't made it clear enough, I think we should all act towards each other with kindness. If we were all open, considerate, and respectful towards one another (as the PSF Community Code of Conduct suggests), then I think open source would go much more smoothly:
- Open to people doing a kindness
- Considerate about the kindnesses being done
- Respectful of what eventually is done with your kindness
It's sort of a three-way handshake of kindness (much like TCP's three-way handshake). When you do a kindness you start the conversation, the recipient of your kindness should be open and considerate to that kindness in response, and then you should be respectful of what the recipient ultimately decides to do with your kindness as the acknowledgment.
Being rude/mean is also never needed. If you look at the purpose of yelling, being rude, or being mean you will notice that their purpose is to try and intimidate the person you are arguing with into being too uncomfortable or frustrated to continue. It has nothing to do with the points being made in the discussion but instead to try and be exclusionary without directly addressing the contents of the discussion. In other words being rude is using bullying tactics as a way of being intolerant of ideas you happen to not like and can't address directly and fairly.
If I ran the HR department of the internet [17]
It's one thing to accept that people should treat all of this as acts of kindness and be nice accordingly, but how do you help make sure you act nicely when communicating?
Assume you're asking me a favour
While this blog post has emphasized considering everything as a kindness, to help you phrase things for effective communication, assume you're asking me for a favour. For instance, if you have a bug you want fixed then assume when you file that bug that you are asking me to do you a favour and fix it. Since I'm under no obligation to fix the bug, treating the request as you asking a favour of me helps make sure you don't phrase the bug report as a demand for a fix.
Also assume you're asking your favour of me in-person. It's very easy to forget tone when writing text online, so pretending you're talking to the person at e.g. PyCon might help put you in the right frame of mind.
Assume your boss will read what you say
Make sure that whatever you write your boss and employer will be happy with it. When you do things for work in open source you are representing your company so it does matter how you communicate. I for one do pay attention to what companies people work for and how they treat others in open source (and from conversations with other project maintainers I am definitely not alone in doing this). I also have a pretty good memory so assume I won't forget which companies have an employee that has treated me poorly.
Assume your family will read what you say
Finally, assume your entire family will read what you say. Would your partner or parents approve of how you communicated? If you have kids, would you like them to pick up on how you communicate? Would your children approve of how you're communicating? If the answer is "no" then consider rephrasing how you are asking for something.
My thinking is that somehow you will care either how your boss, family, or I think of you and so you will stop and pay attention to how you are communicating.
Pay for open source with kindness [18]
In case the message in this post wasn't obvious, being nice to one another and realizing that open source is fundamentally self-service would lead to open source being more sustainable and pleasant to participate in. We shouldn't be putting the significant others of open source participants in a position where they have to have a chat about how open source is making their partner a worse person.
When I have talked about this publicly I commonly get two questions. First is what about blunt cultures? Should they be penalized by not inherently teaching members of that culture to be as kind/soft in delivery as others? To that I first say we should all give people the benefit of the doubt. So if someone comes online and discovers that they are more blunt than others expect, they should learn that fact in a kind manner and not a rude one (remember, responding to rudeness with more rudeness never solves the problem, it just aggravates it). The other thing I say is realize there's a difference between being blunt and being rude, e.g. saying "I don't think that's correct" is very different than saying "that's not correct"; one is a blunt opinion while another is someone claiming something is a fact when it isn't. While I would argue it'd make discourse among strangers easier by rephrasing things as questions for clarification, I wouldn't say bluntness is the same as rudeness.
The other question I get is about someone having to be the "tone police" and how do you prevent abuse? First, to get the point across as to why we need someone setting standards, I ask if the person would like it IF I ALWAYS TALKED LIKE THIS TO THEM, YOU BLOODY JERK! Doesn't exactly feel good, does it? 😉 So there is obviously some level of acceptability that we expect people to adhere to already.
As for the abuse worry, I think it's a red herring. Rudeness is never necessary to make your point so why do you care so much if you're told you can't be rude? For instance, do you really have to yell to make your point? Usually yelling is simply a way to try to bully the person you're arguing with, so it's simply an exclusionary tactic by trying to scare off the other person by talking louder than them or making them too uncomfortable to continue. With inclusion as a goal, allowing a communication style that provides no benefit but to exclude others simply does not need to be allowed. And if you object to rules from some principle you hold, then I'm sorry but then open source is probably not for you.
Participation in open source is a privilege, not a right. Some people forget this and make platitudes about free speech and such. The problem is people forget that in open source you can simply fork a project and go create your own community with your own participation requirements; you are never beholden to a project such that you can't ever leave. In other words this is not the equivalent of the government you live under restricting your ability to communicate since you can't "fork" your country and start a new one without a rebellion. You choose to be in a community, which means you choose to participate under their rules and expectations.
Going forward
I am hopeful that we can solve this problem of burn-out and making open source unwelcoming/uncomfortable for people as the solution isn't unknown. But I am worried that due to the social difficulties of getting people to change their general attitudes that fixing this problem is not easy. I'm even more worried for other open source communities, though, because if our great community is struggling with this I can only imagine how bad it is in other places. 😞
But we must start somewhere. So the next time you see someone being rude, kindly let them know that is how it's being perceived so they have a chance to change their behaviour. And when you interact with an open source project, realize that everyone is just trying to do kindnesses for others so don't go in with any set expectations or demands of others. If we could help others learn how to communicate better and not have unreasonable expectations then this grand social experiment we call open source will have a chance to continue to succeed.