Channel: Planet Python

Semaphore Community: Testing Python Applications with Pytest


This article is brought with ❤ to you by Semaphore.

Introduction

Testing applications has become a standard skill set required for any competent developer today. The Python community embraces testing, and even the Python standard library has good inbuilt tools to support testing. In the larger Python ecosystem, there are a lot of testing tools. Pytest stands out among them due to its ease of use and its ability to handle increasingly complex testing needs.

This tutorial will demonstrate how to write tests for Python code with pytest, and how to utilize it to cater for a wide range of testing scenarios.

Prerequisites

This tutorial uses Python 3, and we will be working inside a virtualenv.
Fortunately for us, Python 3 has inbuilt support for creating virtual environments.
To create and activate a virtual environment for this project, let's run the following commands:

mkdir pytest_project
cd pytest_project
python3 -m venv pytest-env

This creates a virtual environment called pytest-env in our working directory.

To begin using the virtualenv, we need to activate it as follows:

source pytest-env/bin/activate

As long as the virtualenv is active, any packages we install will be installed in our virtual environment, rather than in the global Python installation.

To get started, let's install pytest in our virtualenv.

pip install pytest

Basic Pytest Usage

We will start with a simple test. Pytest expects our tests to be located in files whose names match test_*.py or *_test.py. Let's create a file called test_capitalize.py, and inside it we will write a function called capital_case which takes a string as its argument and returns a capitalized version of that string. We will also write a test, test_capital_case, to ensure that the function does what it says. We prefix our test function names with test_, since this is what pytest expects our test functions to be named.

# test_capitalize.py

def capital_case(x):
    return x.capitalize()

def test_capital_case():
    assert capital_case('semaphore') == 'Semaphore'

The immediately noticeable thing is that pytest uses a plain assert statement, which is much easier to remember and use compared to the numerous assertSomething functions found in unittest.
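To make the contrast concrete, here is a small side-by-side sketch (the unittest class shown is illustrative, not part of the tutorial's code):

```python
import unittest

def capital_case(x):
    return x.capitalize()

# unittest style: pick the right method from the assertSomething family
class TestCapitalCase(unittest.TestCase):
    def test_capital_case(self):
        self.assertEqual(capital_case('semaphore'), 'Semaphore')

# pytest style: one plain assert statement does the same job
def test_capital_case():
    assert capital_case('semaphore') == 'Semaphore'
```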

To run the test, execute the pytest command:

pytest

We should see that our first test passes.

A keen reader will notice that our function could lead to a bug. It does not check the type of the argument to ensure that it is a string. Therefore, if we passed in a number as the argument to the function, it would raise an exception.

We would like to handle this case in our function by raising a custom exception with a friendly error message to the user.

Let's try to capture this in our test:

# test_capitalize.py
import pytest

def test_capital_case():
    assert capital_case('semaphore') == 'Semaphore'

def test_raises_exception_on_non_string_arguments():
    with pytest.raises(TypeError):
        capital_case(9)

The major addition here is the pytest.raises helper, which asserts that our function should raise a TypeError in case the argument passed is not a string.

Running the tests at this point should fail with the following error:

def capital_case(x):
>       return x.capitalize()
E       AttributeError: 'int' object has no attribute 'capitalize'

Since we've verified that we have not handled such a case, we can go ahead and fix it.

In our capital_case function, we should check that the argument passed is a string or a string subclass before calling the capitalize function. If it is not, we should raise a TypeError with a custom error message.

# test_capitalize.py

def capital_case(x):
    if not isinstance(x, str):
        raise TypeError('Please provide a string argument')
    return x.capitalize()

When we rerun our tests, they should be passing once again.

Using Pytest Fixtures

In the following sections, we will explore some more advanced pytest features. To do this, we will need a small project to work with.

We will be writing a wallet application that enables its users to add or spend money in the wallet. It will be modeled as a class with two instance methods: spend_cash and add_cash.

We'll get started by writing our tests first. Create a file called test_wallet.py in the working directory, and add the following contents:

# test_wallet.py
import pytest
from wallet import Wallet, InsufficientAmount

def test_default_initial_amount():
    wallet = Wallet()
    assert wallet.balance == 0

def test_setting_initial_amount():
    wallet = Wallet(100)
    assert wallet.balance == 100

def test_wallet_add_cash():
    wallet = Wallet(10)
    wallet.add_cash(90)
    assert wallet.balance == 100

def test_wallet_spend_cash():
    wallet = Wallet(20)
    wallet.spend_cash(10)
    assert wallet.balance == 10

def test_wallet_spend_cash_raises_exception_on_insufficient_amount():
    wallet = Wallet()
    with pytest.raises(InsufficientAmount):
        wallet.spend_cash(100)

First things first, we import the Wallet class and the InsufficientAmount exception that we expect to raise when the user tries to spend more cash than they have in their wallet.

When we initialize the Wallet class, we expect it to have a default balance of 0. However, when we initialize the class with a value, that value should be set as the wallet's initial balance.

Moving on to the methods we plan to implement, we test that the add_cash method correctly increments the balance with the added amount. On the other hand, we are also ensuring that the spend_cash method reduces the balance by the spent amount, and that we can't spend more cash than we have in the wallet. If we try to do so, an InsufficientAmount exception should be raised.

Running the tests at this point should fail, since we have not created our Wallet class yet. We'll proceed with creating it. Create a file called wallet.py, and we will add our Wallet implementation in it. The file should look as follows:

# wallet.py

class InsufficientAmount(Exception):
    pass

class Wallet(object):

    def __init__(self, initial_amount=0):
        self.balance = initial_amount

    def spend_cash(self, amount):
        if self.balance < amount:
            raise InsufficientAmount('Not enough available to spend {}'.format(amount))
        self.balance -= amount

    def add_cash(self, amount):
        self.balance += amount

First of all, we define our custom exception, InsufficientAmount, which will be raised when we try to spend more money than we have in the wallet. The Wallet class then follows. The constructor accepts an initial amount, which defaults to 0 if not provided. The initial amount is then set as the balance.

In the spend_cash method, we first check that we have a sufficient balance. If the balance is lower than the amount we intend to spend, we raise the InsufficientAmount exception with a friendly error message.

The implementation of add_cash then follows, which simply adds the provided amount to the current wallet balance.

Once we have this in place, we can rerun our tests, and they should be passing.

pytest -q test_wallet.py

.....
5 passed in 0.01 seconds

Refactoring our Tests with Fixtures

You may have noticed some repetition in the way we initialized the class in each test. This is where pytest fixtures come in. They let us define helper code that runs before each test that requests it, and they are perfect for setting up resources that the tests need.

Fixture functions are created by marking them with the @pytest.fixture decorator. Test functions that require fixtures should accept them as arguments. For example, for a test to receive a fixture called wallet, it should have an argument with the fixture name, i.e. wallet.

Let's see how this works in practice. We will refactor our previous tests to use test fixtures where appropriate.

# test_wallet.py
import pytest
from wallet import Wallet, InsufficientAmount

@pytest.fixture
def empty_wallet():
    '''Returns a Wallet instance with a zero balance'''
    return Wallet()

@pytest.fixture
def wallet():
    '''Returns a Wallet instance with a balance of 20'''
    return Wallet(20)

def test_default_initial_amount(empty_wallet):
    assert empty_wallet.balance == 0

def test_setting_initial_amount(wallet):
    assert wallet.balance == 20

def test_wallet_add_cash(wallet):
    wallet.add_cash(80)
    assert wallet.balance == 100

def test_wallet_spend_cash(wallet):
    wallet.spend_cash(10)
    assert wallet.balance == 10

def test_wallet_spend_cash_raises_exception_on_insufficient_amount(empty_wallet):
    with pytest.raises(InsufficientAmount):
        empty_wallet.spend_cash(100)

In our refactored tests, we can see that we have reduced the amount of boilerplate code by making use of fixtures.

We define two fixture functions, wallet and empty_wallet, which are responsible for initializing the Wallet class with different values in the tests that need it.

For the first test function, we make use of the empty_wallet fixture, which provides the test with a wallet instance whose balance is 0.
The next three tests receive a wallet instance initialized with a balance of 20. Finally, the last test receives the empty_wallet fixture. The tests can then use the fixture as if it had been created inside the test function, just as in the tests we had before.

Rerun the tests to confirm that everything works.

Utilizing fixtures helps us de-duplicate our code. If you notice a case where a piece of code is used repeatedly in a number of tests, that might be a good candidate to use as a fixture.

Some Pointers on Test Fixtures

Here are some pointers on using test fixtures:

  • Each test is provided with a newly-initialized Wallet instance, and not one that has been used in another test.

  • It is a good practice to add docstrings for your fixtures. To see all the available fixtures, run the following command:

pytest --fixtures

This lists out some inbuilt pytest fixtures, as well as our custom fixtures. The docstrings will appear as the descriptions of the fixtures.

wallet
    Returns a Wallet instance with a balance of 20
empty_wallet
    Returns a Wallet instance with a zero balance

Parametrized Test Functions

Having tested the individual methods in the Wallet class, the next step we should take is to test various combinations of these methods. This is to answer questions such as "If I have an initial balance of 30, and spend 20, then add 100, and later on spend 50, how much should the balance be?"

As you can imagine, writing out those steps in the tests would be tedious, and pytest provides quite a delightful solution: parametrized test functions.

To capture a scenario like the one above, we can write a test:

# test_wallet.py

@pytest.mark.parametrize("earned,spent,expected", [
    (30, 10, 20),
    (20, 2, 18),
])
def test_transactions(earned, spent, expected):
    my_wallet = Wallet()
    my_wallet.add_cash(earned)
    my_wallet.spend_cash(spent)
    assert my_wallet.balance == expected

This enables us to test different scenarios, all in one function. We make use of the @pytest.mark.parametrize decorator, where we can specify the names of the arguments that will be passed to the test function, and a list of arguments corresponding to the names.

The test function marked with the decorator will then be run once for each set of parameters.

For example, the test will be run the first time with the earned parameter set to 30, spent set to 10, and expected set to 20. The second time the test is run, the parameters will take the second set of arguments. We can then use these parameters in our test function.

This elegantly helps us capture the scenario:

  • My wallet initially has 0,
  • I add 30 units of cash to the wallet,
  • I spend 10 units of cash, and
  • I should have 20 units of cash remaining after the two transactions.

This is quite a succinct way to test different combinations of values without writing a lot of repeated code.

Combining Test Fixtures and Parametrized Test Functions

To make our tests less repetitive, we can go further and combine test fixtures and parametrize test functions. To demonstrate this, let's replace the wallet initialization code with a test fixture as we did before. The end result will be:

# test_wallet.py

@pytest.fixture
def my_wallet():
    '''Returns a Wallet instance with a zero balance'''
    return Wallet()

@pytest.mark.parametrize("earned,spent,expected", [
    (30, 10, 20),
    (20, 2, 18),
])
def test_transactions(my_wallet, earned, spent, expected):
    my_wallet.add_cash(earned)
    my_wallet.spend_cash(spent)
    assert my_wallet.balance == expected

We will create a new fixture called my_wallet that is exactly the same as the empty_wallet fixture we used before. It returns a wallet instance with a balance of 0. To use both the fixture and the parametrized functions in the test, we include the fixture as the first argument, and the parameters as the rest of the arguments.

The transactions will then be performed on the wallet instance provided by the fixture.

You can try out this pattern further, e.g. with the wallet instance with a non-empty balance and with other different combinations of the earned and spent amounts.
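As one hedged sketch of that idea (this variation is not from the original article), the starting balance itself can be parametrized alongside the earned and spent amounts:

```python
import pytest

# Minimal copies of the tutorial's wallet.py classes, so the sketch is
# self-contained
class InsufficientAmount(Exception):
    pass

class Wallet(object):
    def __init__(self, initial_amount=0):
        self.balance = initial_amount

    def spend_cash(self, amount):
        if self.balance < amount:
            raise InsufficientAmount('Not enough available to spend {}'.format(amount))
        self.balance -= amount

    def add_cash(self, amount):
        self.balance += amount

# Parametrize the initial balance as well as the transactions
@pytest.mark.parametrize("initial,earned,spent,expected", [
    (0, 30, 10, 20),    # an initially empty wallet
    (50, 20, 2, 68),    # a wallet that starts with a balance
])
def test_transactions_with_initial_balance(initial, earned, spent, expected):
    wallet = Wallet(initial)
    wallet.add_cash(earned)
    wallet.spend_cash(spent)
    assert wallet.balance == expected
```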

Continuous Testing on Semaphore CI

Next, let's add continuous testing to our application using Semaphore CI to ensure that we don't break our code when we make new changes.

Make sure you've committed everything on Git, and push your repository to GitHub or Bitbucket, which will enable Semaphore to fetch your code. Next, sign up for a free Semaphore account, if you don't have one already. Once you've confirmed your email, it's time to create a new project.

Follow these steps to add the project to Semaphore:

  1. Once you're logged into Semaphore, navigate to your list of projects and click the "Add New Project" button:

    Add New Project Screen

  2. Next, select the account where you wish to add the new project.

    Select Account Screen

  3. Select the repository that holds the code you'd like to build:

    Select Repository Screen

  4. Select the branch you would like to build. The master branch is the default.

    Select branch

  5. Configure your project as shown below:
    Project Configuration

  6. Once your build has run, you should see a successful build that should look something like this: Successful Build

In a few simple steps, we've set up continuous testing.

Summary

We hope that this article has given you a solid introduction to pytest, which is one of the most popular testing tools in the Python ecosystem. It's extremely easy to get started with using it, and it can handle most of what you need from a testing tool.

You can check out the complete code on GitHub.

Please reach out with any questions or feedback you may have in the comments section below.



Dan Crosta: PyGotham Talk Voting is Open!


For the first time, the PyGotham program committee is looking for you, our potential attendees, speakers, and community, to help us shape the conference by voting on the 195 talk proposals we've received. We're going to hold open voting until August 7th, after which the Program Committee will use the votes to inform our final selections for the conference.

How You Can Help

We want PyGotham to reflect the interests and desires of our community, so we ask that you share your time to help review the talk submissions. We're asking a single, simple question for each talk: would you like to see this talk in the final PyGotham schedule? You can give each talk either a +1 ("I would definitely like to see this talk"), a 0 ("I have no preference on this talk"), or a -1 ("I do not think this talk should be in PyGotham").

You can sign up for an account and begin voting at vote.pygotham.org. The talk review web site will present you with talks in random order, omitting the ones you have already voted on. For each talk, you will see this form:

+1/0/-1 voting form

Be sure to click "Save Vote" to make sure your vote is recorded. Once you do, a button will appear to jump to the next proposal.

Many thanks to Ned Jackson Lovely for sharing progcom, the US PyCon talk voting app, which we are using (with light adaptation) for PyGotham. Thanks Ned!

Codementor: How to Deploy a Django App on Heroku Easily | Codementor

Heroku is a container-based cloud platform for deploying, managing, and scaling applications. Even though there are other similar platforms, such as OpenShift by Red Hat, Windows Azure, Amazon Web Ser...

PyCharm: PyCharm 2017.2 RC


We’ve been putting the finishing touches on PyCharm 2017.2, and we have a release candidate ready! Go get it on our website.

Fixes since the last EAP:

  • Docker Compose on Windows issues with malformatted environment variables
  • Various issues in Django project creation
  • Incorrect “Method may be static” inspection
  • AttributeError during package installation
  • And a couple more, see the release notes for details

As this is a release candidate, it does not come with a 30 day EAP license. If you don’t have a license for PyCharm Professional Edition you can use a trial license.

Even though this is not called an EAP version anymore, our EAP promotion still applies! If you find any issues in this version and report them on YouTrack, you can win prizes in our EAP competition.

To get all EAP builds as soon as we publish them, set your update channel to EAP (go to Help | Check for Updates, click the ‘Updates’ link, and then select ‘Early Access Program’ in the dropdown). If you’d like to keep all your JetBrains tools up to date, try JetBrains Toolbox!

-PyCharm Team
The Drive to Develop

Reuven Lerner: Globbing and Python’s “subprocess” module


Python’s “subprocess” module makes it really easy to invoke an external program and grab its output. For example, you can say

import subprocess
print(subprocess.check_output('ls'))

and the output is then

$ ./blog.py
b'blog.py\nblog.py~\ndictslice.py\ndictslice.py~\nhexnums.txt\nnums.txt\npeanut-butter.jpg\nregexp\nshowfile.py\nsieve.py\ntest.py\ntestintern.py\n'

subprocess.check_output returns a bytestring with the filenames on my desktop. To deal with them in a more serious way, and to have the ASCII 10 characters actually function as newlines, I need to invoke the “decode” method, which results in a string:

output = subprocess.check_output('ls').decode('utf-8')
print(output)

This is great, until I want to pass one or more arguments to my “ls” command.  My first attempt might look like this:

output = subprocess.check_output('ls -l').decode('utf-8')
print(output)

But I get the following output:

$ ./blog.py
Traceback (most recent call last):
 File "./blog.py", line 5, in <module>
 output = subprocess.check_output('ls -l').decode('utf-8')
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 336, in check_output
 **kwargs).stdout
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 403, in run
 with Popen(*popenargs, **kwargs) as process:
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 707, in __init__
 restore_signals, start_new_session)
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 1333, in _execute_child
 raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'ls -l'

The most important part of this error message is the final line, in which the system complains that it cannot find the program “ls -l”. That’s right: it thought that the command plus its option was a single program name, and failed to find a program by that name.

Now, before you go and complain that this doesn’t make any sense, remember that filenames may contain space characters. And that there’s no difference between a “command” and any other file, except for the way that it’s interpreted by the operating system. It might be a bit weird to have a command whose name contains a space, but that’s a matter of convention, not technology.

Remember, though, that when a Python program is invoked, we can look at sys.argv, a list of the user’s arguments. Always, sys.argv[0] is the program’s name itself. We can thus see an analog here, in that when we invoke another program, we also need to pass that program’s name as the first element of a list, and the arguments as subsequent list elements.

In other words, we can do this:

output = subprocess.check_output(['ls', '-l']).decode('utf-8')
print(output)

and indeed, we get the following:

$ ./blog.py
total 88
-rwxr-xr-x 1 reuven 501 126 Jul 20 21:43 blog.py
-rwxr-xr-x 1 reuven 501 24 Jul 20 21:31 blog.py~
-rwxr-xr-x 1 reuven 501 401 Jul 17 13:43 dictslice.py
-rwxr-xr-x 1 reuven 501 397 Jun 8 14:47 dictslice.py~
-rw-r--r-- 1 reuven 501 54 Jul 16 11:11 hexnums.txt
-rw-r--r-- 1 reuven 501 20 Jun 25 22:24 nums.txt
-rw-rw-rw- 1 reuven 501 51011 Jul 3 13:51 peanut-butter.jpg
drwxr-xr-x 6 reuven 501 204 Oct 31 2016 regexp
-rwxr-xr-x 1 reuven 501 1669 May 28 03:03 showfile.py
-rwxr-xr-x 1 reuven 501 143 May 19 02:37 sieve.py
-rw-r--r-- 1 reuven 501 0 May 28 09:15 test.py
-rwxr-xr-x 1 reuven 501 72 May 18 22:18 testintern.py

So far, so good.  Notice that check_output can thus get either a string or a list as its first argument.  If we pass a list, we can pass additional arguments, as well:

output = subprocess.check_output(['ls', '-l', '-F']).decode('utf-8')
print(output)

As a result of adding the “-F’ flag, we now get a file-type indicator at the end of every filename:

$ ls -l -F
total 80
-rwxr-xr-x 1 reuven 501 137 Jul 20 21:44 blog.py*
-rwxr-xr-x 1 reuven 501 401 Jul 17 13:43 dictslice.py*
-rw-r--r-- 1 reuven 501 54 Jul 16 11:11 hexnums.txt
-rw-r--r-- 1 reuven 501 20 Jun 25 22:24 nums.txt
-rw-rw-rw- 1 reuven 501 51011 Jul 3 13:51 peanut-butter.jpg
drwxr-xr-x 6 reuven 501 204 Oct 31 2016 regexp/
-rwxr-xr-x 1 reuven 501 1669 May 28 03:03 showfile.py*
-rwxr-xr-x 1 reuven 501 143 May 19 02:37 sieve.py*
-rw-r--r-- 1 reuven 501 0 May 28 09:15 test.py
-rwxr-xr-x 1 reuven 501 72 May 18 22:18 testintern.py*

It’s at this point that we might naturally ask: What if I want to get a file listing of one of my Python programs? I can pass a filename as an argument, right?  Of course:

output = subprocess.check_output(['ls', '-l', '-F', 'sieve.py']).decode('utf-8')
print(output)

And the output is:

-rwxr-xr-x 1 reuven 501 143 May 19 02:37 sieve.py*

Perfect!

Now, what if I want to list all of the Python programs in this directory?  Given that this is a natural and everyday thing we do on the command line, I give it a shot:

output = subprocess.check_output(['ls', '-l', '-F', '*.py']).decode('utf-8')
print(output)

And the output is:

$ ./blog.py
ls: cannot access '*.py': No such file or directory
Traceback (most recent call last):
 File "./blog.py", line 5, in <module>
 output = subprocess.check_output(['ls', '-l', '-F', '*.py']).decode('utf-8')
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 336, in check_output
 **kwargs).stdout
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 418, in run
 output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ls', '-l', '-F', '*.py']' returned non-zero exit status 2.

Oh, no!  Python thought that I was trying to find the literal file named “*.py”, which clearly doesn’t exist.

It’s here that we discover that when Python connects to external programs, it does so on its own, without making use of the Unix shell’s expansion capabilities. Such expansion, which is often known as “globbing,” is available via the Python “glob” module in the standard library.  We could use that to get a list of files, but it seems weird that when I invoke a command-line program, I can’t rely on it to expand the argument.
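As a rough sketch (the exact filenames returned will depend on your current directory), the glob module expands such patterns in Python itself:

```python
import glob

# Expand the pattern ourselves, in Python, instead of relying on a shell.
# glob.glob returns a (possibly empty) list of matching filenames.
python_files = glob.glob('*.py')
print(sorted(python_files))
```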

But wait: Maybe there is a way to do this!  Many functions in the “subprocess” module, including check_output, have a “shell” parameter whose default value is “False”. But if I set it to “True”, then a Unix shell is invoked between Python and the command we’re running. The shell will surely expand our star, and let us list all of the Python programs in the current directory, right?

Let’s see:

output = subprocess.check_output(['ls', '-l', '-F', '*.py'], shell=True).decode('utf-8')
print(output)

And the results:

$ ./blog.py
blog.py
blog.py~
dictslice.py
dictslice.py~
hexnums.txt
nums.txt
peanut-butter.jpg
regexp
showfile.py
sieve.py
test.py
testintern.py

Hmm. We didn’t get an error.  But we also didn’t get what we wanted.  This is mighty strange.

The solution, it turns out, is to pass everything — command and arguments, including the *.py — as a single string, and not as a list. When you’re invoking commands with shell=True, you’re basically telling Python that the shell should break apart your arguments and expand them.  If you pass a list to the shell, then the parsing is done the wrong number of times, and in the wrong places, and you get the sort of mess I showed above.  And indeed, with shell=True and a string as the first argument, subprocess.check_output does the right thing:

output = subprocess.check_output('ls -l -F *.py', shell=True).decode('utf-8')
print(output)

And the output from our program is:

$ ./blog.py
-rwxr-xr-x 1 reuven 501 141 Jul 20 22:03 blog.py*
-rwxr-xr-x 1 reuven 501 401 Jul 17 13:43 dictslice.py*
-rwxr-xr-x 1 reuven 501 1669 May 28 03:03 showfile.py*
-rwxr-xr-x 1 reuven 501 143 May 19 02:37 sieve.py*
-rw-r--r-- 1 reuven 501 0 May 28 09:15 test.py
-rwxr-xr-x 1 reuven 501 72 May 18 22:18 testintern.py*

The bottom line is that you can get globbing to work when invoking commands via subprocess.check_output. But you need to know what’s going on behind the scenes, and what shell=True does (and doesn’t) do, to make it work.
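An alternative that avoids shell=True altogether (a sketch, not from the original post) is to expand the pattern with glob and pass the resulting filenames as ordinary list arguments:

```python
import glob
import subprocess

# Expand the pattern in Python, then hand real filenames to ls as a list,
# so no shell is involved at any point
files = glob.glob('*.py')
if files:
    output = subprocess.check_output(['ls', '-l', '-F'] + files).decode('utf-8')
    print(output)
else:
    print('No .py files in the current directory')
```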

The post Globbing and Python’s “subprocess” module appeared first on Lerner Consulting Blog.

Brad Lucas: Python 3


Installing

On a Mac here is an easy method to get Python 3 installed.

$ brew install python3

After this step you'll have your original Python 2.7 version as python and your new Python 3 version as python3.

Virtual Environments

I've been in the habit of creating a virtual environment in a directory called env under each project I'm working on. With 2.7 I was using virtualenv, which would conflict with the new installation. Nicely, Python 3 has its own method for creating virtual environments through the interpreter itself. Simply pass -m venv followed by the name of the virtual environment and you are good to go.

$ python3 -m venv env

Even nicer: after activating the environment, you'll find that pip and python map to pip3 and python3.

$ source env/bin/activate
(env) $ pip --version
pip 9.0.1 from /Users/brad/tmp/env/lib/python3.6/site-packages (python 3.6)
(env) $ python --version
Python 3.6.1

The Digital Cat: Refactoring with tests in Python: a practical example


This post contains a step-by-step example of a refactoring session guided by tests. When dealing with untested or legacy code refactoring is dangerous and tests can help us do it the right way, minimizing the amount of bugs we introduce, and possibly completely avoiding them.

Refactoring is not easy. It requires a double effort to understand code that others wrote, or that we wrote in the past, and moving around parts of it, simplifying it, in one word improving it, is by no means something for the faint-hearted. Like programming, refactoring has its rules and best practices, but it can be described as a mixture of technique, intuition, experience, risk.

Programming, after all, is craftsmanship.

The starting point

The simple use case I will use for this post is that of a service API that we can access, and that produces data in JSON format, namely a list of elements like the one shown here

{'age': 20, 'surname': 'Frazier', 'name': 'John', 'salary': '£28943'}

Once we convert this to a Python data structure we obtain a list of dictionaries, where 'age' is an integer, and the remaining fields are strings.

Someone then wrote a class that computes some statistics on the input data. This class, called DataStats, provides a single method stats(), whose inputs are the data returned by the service (in JSON format), and two integers called iage and isalary. Those, according to the short documentation of the class, are the initial age and the initial salary used to compute the average yearly increase of the salary on the whole dataset.

The code is the following

import math
import json

class DataStats:

    def stats(self, data, iage, isalary):
        # iage and isalary are the starting age and salary used to
        # compute the average yearly increase of salary.

        # Compute average yearly increase
        average_age_increase = math.floor(
            sum([e['age'] for e in data]) / len(data)) - iage
        average_salary_increase = math.floor(
            sum([int(e['salary'][1:]) for e in data]) / len(data)) - isalary

        yearly_avg_increase = math.floor(
            average_salary_increase / average_age_increase)

        # Compute max salary
        salaries = [int(e['salary'][1:]) for e in data]
        threshold = '£' + str(max(salaries))

        max_salary = [e for e in data if e['salary'] == threshold]

        # Compute min salary
        salaries = [int(d['salary'][1:]) for d in data]
        min_salary = [e for e in data
                      if e['salary'] == '£{}'.format(str(min(salaries)))]

        return json.dumps({
            'avg_age': math.floor(sum([e['age'] for e in data]) / len(data)),
            'avg_salary': math.floor(
                sum([int(e['salary'][1:]) for e in data]) / len(data)),
            'avg_yearly_increase': yearly_avg_increase,
            'max_salary': max_salary,
            'min_salary': min_salary
        })

The goal

It is fairly easy, even for the untrained eye, to spot some issues in the previous class. A list of the most striking ones is

  • The class exposes a single method and has no __init__(), thus the same functionality could be provided by a single function.
  • The stats() method is too big, and performs too many tasks. This makes debugging very difficult, as there is a single inextricable piece of code that does everything.
  • There is a lot of code duplication, or at least several lines that are very similar. Most notably the two operations '£' + str(max(salaries)) and '£{}'.format(str(min(salaries))), the two different lines starting with salaries =, and the several list comprehensions.

So, since we are going to use this code in some part of our Amazing New Project™, we want to possibly fix these issues.

The class, however, is working perfectly. It has been used in production for many years and there are no known bugs, so our operation has to be a refactoring, which means that we want to write something better, preserving the behaviour of the previous object.

The path

In this post I want to show you how you can safely refactor such a class using tests. This is different from TDD, but the two are closely related. The class we have has not been created using TDD, as there are no tests, but we can use tests to ensure its behaviour is preserved. This should therefore be called Test Driven Refactoring (TDR).

The idea behind TDR is pretty simple. First, we have to write a test that checks the behaviour of some code, possibly a small part with a clearly defined scope and output. This is a posthumous (or late) unit test, and it simulates what the author of the code should have provided (cough cough, it was you some months ago...).

Once you have your unit test you can go and modify the code, knowing that the behaviour of the resulting object will be the same as that of the previous one. As you can easily understand, the effectiveness of this methodology depends strongly on the quality of the tests themselves, possibly more than when developing with TDD, and this is why refactoring is hard.
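As a sketch of such a posthumous test (the input data and expected numbers below are illustrative, worked out by hand from the class's logic; the class is reproduced inline to keep the snippet self-contained):

```python
import json
import math

# The class under refactoring, reproduced from the article
class DataStats:
    def stats(self, data, iage, isalary):
        average_age_increase = math.floor(
            sum([e['age'] for e in data]) / len(data)) - iage
        average_salary_increase = math.floor(
            sum([int(e['salary'][1:]) for e in data]) / len(data)) - isalary
        yearly_avg_increase = math.floor(
            average_salary_increase / average_age_increase)
        salaries = [int(e['salary'][1:]) for e in data]
        threshold = '£' + str(max(salaries))
        max_salary = [e for e in data if e['salary'] == threshold]
        salaries = [int(d['salary'][1:]) for d in data]
        min_salary = [e for e in data
                      if e['salary'] == '£{}'.format(str(min(salaries)))]
        return json.dumps({
            'avg_age': math.floor(sum([e['age'] for e in data]) / len(data)),
            'avg_salary': math.floor(
                sum([int(e['salary'][1:]) for e in data]) / len(data)),
            'avg_yearly_increase': yearly_avg_increase,
            'max_salary': max_salary,
            'min_salary': min_salary,
        })

# A posthumous unit test: pin down the current behaviour before touching
# the code. With ages 30 and 40 and salaries 30000 and 50000, starting
# from age 20 and salary 20000, the class yields avg_age 35, avg_salary
# 40000, and floor(20000 / 15) == 1333 as the yearly increase.
def test_stats():
    data = [
        {'age': 30, 'surname': 'Smith', 'name': 'Ann', 'salary': '£30000'},
        {'age': 40, 'surname': 'Jones', 'name': 'Bob', 'salary': '£50000'},
    ]
    result = json.loads(DataStats().stats(data, 20, 20000))
    assert result['avg_age'] == 35
    assert result['avg_salary'] == 40000
    assert result['avg_yearly_increase'] == 1333
    assert result['max_salary'] == [data[1]]
    assert result['min_salary'] == [data[0]]

test_stats()
```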

Caveats

Two remarks before we start our first refactoring. The first is that such a class could easily be refactored to some functional code. As you will be able to infer from the final result there is no real reason to keep an object-oriented approach for this code. I decided to go that way, however, as it gave me the possibility to show a design pattern called wrapper, and the refactoring technique that leverages it.

The second remark is that in pure TDD it is strongly advised not to test internal methods, that is those methods that do not form the public API of the object. In general, we identify such methods in Python by prefixing their name with an underscore, and the reason not to test them is that TDD wants you to shape objects according to the object-oriented programming methodology, which considers objects as behaviours and not as structures. Thus, we are only interested in testing public methods.

It is also true, however, that sometimes even though we do not want to make a method public, that method contains some complex logic that we want to test. So, in my opinion the TDD advice should sound like "Test internal methods only when they contain some non-trivial logic".

When it comes to refactoring, however, we are somehow deconstructing a previously existing structure, and usually we end up creating a lot of private methods to help extract and generalise parts of the code. My advice in this case is to test those methods, as this gives you a higher degree of confidence in what you are doing. With experience you will then learn which tests are required and which are not.

Setup of the testing environment

Clone this repository and create a virtual environment. Activate it and install the required packages with

pip install -r requirements.txt

The repository already contains a configuration file for pytest, and you should customise it so that pytest does not descend into your virtual environment directory. Go and fix the norecursedirs parameter in that file, adding the name of the virtual environment you just created; I usually name my virtual environments with a venv prefix, which is why that variable contains the entry venv*.
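For reference, a minimal pytest.ini along these lines could look as follows (the exact content of the file in the repository may differ):

```ini
[pytest]
norecursedirs = venv* .git
```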

At this point you should be able to run pytest -svv in the parent directory of the repository (the one that contains pytest.ini), and obtain a result similar to the following

========================== test session starts ==========================
platform linux -- Python 3.5.3, pytest-3.1.2, py-1.4.34, pluggy-0.4.0
cachedir: .cache
rootdir: datastats, inifile: pytest.ini
plugins: cov-2.5.1
collected 0 items

====================== no tests ran in 0.00 seconds ======================

The given repository contains two branches: master, which is the one you are on and contains the initial setup, and develop, which points to the last step of the whole refactoring process. Every step of this post contains a reference to the commit that contains the changes introduced in that section.

Step 1 - Testing the endpoints

Commit: 27a1d8c

When you start refactoring a system, regardless of the size, you have to test the endpoints. This means that you consider the system as a black box (i.e. you do not know what is inside) and just check the external behaviour. In this case we can write a test that initialises the class and runs the stats() method with some test data, possibly real data, and checks the output. Obviously we will write the test with the actual output returned by the method, so this test is automatically passing.

Querying the server we get the following data

test_data = [
    {
        "id": 1,
        "name": "Laith",
        "surname": "Simmons",
        "age": 68,
        "salary": "£27888"
    },
    {
        "id": 2,
        "name": "Mikayla",
        "surname": "Henry",
        "age": 49,
        "salary": "£67137"
    },
    {
        "id": 3,
        "name": "Garth",
        "surname": "Fields",
        "age": 70,
        "salary": "£70472"
    }
]

and calling the stats() method with that output, with iage set to 20, and isalary set to 20000, we get the following JSON result

{
    'avg_age': 62,
    'avg_salary': 55165,
    'avg_yearly_increase': 837,
    'max_salary': [{
        "id": 3,
        "name": "Garth",
        "surname": "Fields",
        "age": 70,
        "salary": "£70472"
    }],
    'min_salary': [{
        "id": 1,
        "name": "Laith",
        "surname": "Simmons",
        "age": 68,
        "salary": "£27888"
    }]
}

Caveat: I'm using a single very short set of real data, namely a list of 3 dictionaries. In a real case I would test the black box with many different use cases, to ensure I am not just checking some corner case.

The test is the following

import json

from datastats.datastats import DataStats


def test_json():
    test_data = [
        {
            "id": 1,
            "name": "Laith",
            "surname": "Simmons",
            "age": 68,
            "salary": "£27888"
        },
        {
            "id": 2,
            "name": "Mikayla",
            "surname": "Henry",
            "age": 49,
            "salary": "£67137"
        },
        {
            "id": 3,
            "name": "Garth",
            "surname": "Fields",
            "age": 70,
            "salary": "£70472"
        }
    ]

    ds = DataStats()

    assert ds.stats(test_data, 20, 20000) == json.dumps(
        {
            'avg_age': 62,
            'avg_salary': 55165,
            'avg_yearly_increase': 837,
            'max_salary': [{
                "id": 3,
                "name": "Garth",
                "surname": "Fields",
                "age": 70,
                "salary": "£70472"
            }],
            'min_salary': [{
                "id": 1,
                "name": "Laith",
                "surname": "Simmons",
                "age": 68,
                "salary": "£27888"
            }]
        }
    )

As said before, this test is obviously passing, having been artificially constructed from a real execution of the code.

Well, this test is very important! Now we know that if we change something inside the code, altering the behaviour of the class, at least one test will fail.

Step 2 - Getting rid of the JSON format

Commit: 65e2997

The method returns its output in JSON format, and looking at the class it is pretty evident that the conversion is done by json.dumps().

The structure of the code is the following

class DataStats:

    def stats(self, data, iage, isalary):
        [code_part_1]

        return json.dumps({
            [code_part_2]
        })

Where obviously code_part_2 depends on code_part_1. The first refactoring, then, will follow this procedure

  1. We write a test called test__stats() for a _stats() method that is supposed to return the data as a Python structure. We can infer the latter manually from the JSON, or by running json.loads() from a Python shell. The test fails.
  2. We duplicate the code of the stats() method that produces the data, putting it in the new _stats() method. The test passes.
class DataStats:

    def _stats(parameters):
        [code_part_1]

        return [code_part_2]

    def stats(self, data, iage, isalary):
        [code_part_1]

        return json.dumps({
            [code_part_2]
        })
  3. We remove the duplicated code in stats(), replacing it with a call to _stats()
class DataStats:

    def _stats(parameters):
        [code_part_1]

        return [code_part_2]

    def stats(self, data, iage, isalary):
        return json.dumps(self._stats(data, iage, isalary))

At this point we could refactor the initial test test_json() that we wrote, but this is an advanced consideration, and I'll leave it for some later notes.

So now the code of our class looks like this

class DataStats:

    def _stats(self, data, iage, isalary):
        # iage and isalary are the starting age and salary used to
        # compute the average yearly increase of salary.

        # Compute average yearly increase
        average_age_increase = math.floor(
            sum([e['age'] for e in data])/len(data)) - iage
        average_salary_increase = math.floor(
            sum([int(e['salary'][1:]) for e in data])/len(data)) - isalary

        yearly_avg_increase = math.floor(
            average_salary_increase/average_age_increase)

        # Compute max salary
        salaries = [int(e['salary'][1:]) for e in data]
        threshold = '£' + str(max(salaries))

        max_salary = [e for e in data if e['salary'] == threshold]

        # Compute min salary
        salaries = [int(d['salary'][1:]) for d in data]
        min_salary = [e for e in data
                      if e['salary'] == '£{}'.format(str(min(salaries)))]

        return {
            'avg_age': math.floor(sum([e['age'] for e in data])/len(data)),
            'avg_salary': math.floor(
                sum([int(e['salary'][1:]) for e in data])/len(data)),
            'avg_yearly_increase': yearly_avg_increase,
            'max_salary': max_salary,
            'min_salary': min_salary
        }

    def stats(self, data, iage, isalary):
        return json.dumps(self._stats(data, iage, isalary))

and we have two tests that check the correctness of it.

Step 3 - Refactoring the tests

Commit: d619017

It is pretty clear that the test_data list of dictionaries is bound to be used in every test we will perform, so it is high time we moved that to a global variable. There is no point now in using a fixture, as the test data is just static data.

We could also move the output data to a global variable, but the upcoming tests are not using the whole output dictionary any more, so we can postpone the decision.

The test suite now looks like

import json

from datastats.datastats import DataStats


test_data = [
    {
        "id": 1,
        "name": "Laith",
        "surname": "Simmons",
        "age": 68,
        "salary": "£27888"
    },
    {
        "id": 2,
        "name": "Mikayla",
        "surname": "Henry",
        "age": 49,
        "salary": "£67137"
    },
    {
        "id": 3,
        "name": "Garth",
        "surname": "Fields",
        "age": 70,
        "salary": "£70472"
    }
]


def test_json():
    ds = DataStats()

    assert ds.stats(test_data, 20, 20000) == json.dumps(
        {
            'avg_age': 62,
            'avg_salary': 55165,
            'avg_yearly_increase': 837,
            'max_salary': [{
                "id": 3,
                "name": "Garth",
                "surname": "Fields",
                "age": 70,
                "salary": "£70472"
            }],
            'min_salary': [{
                "id": 1,
                "name": "Laith",
                "surname": "Simmons",
                "age": 68,
                "salary": "£27888"
            }]
        }
    )


def test__stats():
    ds = DataStats()

    assert ds._stats(test_data, 20, 20000) == {
        'avg_age': 62,
        'avg_salary': 55165,
        'avg_yearly_increase': 837,
        'max_salary': [{
            "id": 3,
            "name": "Garth",
            "surname": "Fields",
            "age": 70,
            "salary": "£70472"
        }],
        'min_salary': [{
            "id": 1,
            "name": "Laith",
            "surname": "Simmons",
            "age": 68,
            "salary": "£27888"
        }]
    }

Step 4 - Isolate the average age algorithm

Commit: 9db1803

Isolating independent features is a key target of software design. Thus, our refactoring shall aim to disentangle the code, dividing it into small separate functions.

The output dictionary contains five keys, and each of them corresponds to a value computed either on the fly (for avg_age and avg_salary) or by the method's code (for avg_yearly_increase, max_salary, and min_salary). We can start replacing the code that computes the value of each key with dedicated methods, trying to isolate the algorithms.

To isolate some code, the first thing to do is to duplicate it, putting it into a dedicated method. As we are refactoring with tests, the first thing is to write a test for this method.

def test__avg_age():
    ds = DataStats()

    assert ds._avg_age(test_data) == 62

We know that the method's output shall be 62 as that is the value we have in the output data of the original stats() method. Please note that there is no need to pass iage and isalary as they are not used in the refactored code.

The test fails, so we can dutifully go and duplicate the code we use to compute 'avg_age'

    def _avg_age(self, data):
        return math.floor(sum([e['age'] for e in data])/len(data))

and once the test passes we can replace the duplicated code in _stats() with a call to _avg_age()

        return {
            'avg_age': self._avg_age(data),
            'avg_salary': math.floor(
                sum([int(e['salary'][1:]) for e in data])/len(data)),
            'avg_yearly_increase': yearly_avg_increase,
            'max_salary': max_salary,
            'min_salary': min_salary
        }

After that, we check that no test is failing. Well done! We isolated the first feature, and our refactoring has already produced three tests.

Step 5 - Isolate the average salary algorithm

Commit: 4122201

The avg_salary key works exactly like the avg_age, with different code. Thus, the refactoring process is the same as before, and the result should be a new test__avg_salary() test

def test__avg_salary():
    ds = DataStats()

    assert ds._avg_salary(test_data) == 55165

a new _avg_salary() method

    def _avg_salary(self, data):
        return math.floor(sum([int(e['salary'][1:]) for e in data])/len(data))

and a new version of the final return value

        return {
            'avg_age': self._avg_age(data),
            'avg_salary': self._avg_salary(data),
            'avg_yearly_increase': yearly_avg_increase,
            'max_salary': max_salary,
            'min_salary': min_salary
        }

Step 6 - Isolate the average yearly increase algorithm

Commit: 4005145

The remaining three keys are computed with algorithms that, being longer than one line, couldn't be squeezed directly in the definition of the dictionary. The refactoring process, however, does not really change; as before, we first test a helper method, then we define it duplicating the code, and last we call the helper removing the code duplication.

For the average yearly increase of the salary we have a new test

def test__avg_yearly_increase():
    ds = DataStats()

    assert ds._avg_yearly_increase(test_data, 20, 20000) == 837

a new method that passes the test

    def _avg_yearly_increase(self, data, iage, isalary):
        # iage and isalary are the starting age and salary used to
        # compute the average yearly increase of salary.

        # Compute average yearly increase
        average_age_increase = math.floor(
            sum([e['age'] for e in data])/len(data)) - iage
        average_salary_increase = math.floor(
            sum([int(e['salary'][1:]) for e in data])/len(data)) - isalary

        return math.floor(average_salary_increase/average_age_increase)

and a new version of the _stats() method

    def _stats(self, data, iage, isalary):
        # Compute max salary
        salaries = [int(e['salary'][1:]) for e in data]
        threshold = '£' + str(max(salaries))

        max_salary = [e for e in data if e['salary'] == threshold]

        # Compute min salary
        salaries = [int(d['salary'][1:]) for d in data]
        min_salary = [e for e in data
                      if e['salary'] == '£{}'.format(str(min(salaries)))]

        return {
            'avg_age': self._avg_age(data),
            'avg_salary': self._avg_salary(data),
            'avg_yearly_increase': self._avg_yearly_increase(
                data, iage, isalary),
            'max_salary': max_salary,
            'min_salary': min_salary
        }

Please note that we are not removing any code duplication except the duplication we ourselves introduce while refactoring. The first goal we should aim for is to completely isolate independent features.

Step 7 - Isolate max and min salary algorithms

Commit: 17b2413

When refactoring we should always do one thing at a time, but for the sake of conciseness, I'll show here the result of two refactoring steps at once. I recommend that the reader perform them as independent steps, as I did when I wrote the code that I am posting below.

The new tests are

def test__max_salary():
    ds = DataStats()

    assert ds._max_salary(test_data) == [{
        "id": 3,
        "name": "Garth",
        "surname": "Fields",
        "age": 70,
        "salary": "£70472"
    }]


def test__min_salary():
    ds = DataStats()

    assert ds._min_salary(test_data) == [{
        "id": 1,
        "name": "Laith",
        "surname": "Simmons",
        "age": 68,
        "salary": "£27888"
    }]

The new methods in the DataStats class are

    def _max_salary(self, data):
        # Compute max salary
        salaries = [int(e['salary'][1:]) for e in data]
        threshold = '£' + str(max(salaries))

        return [e for e in data if e['salary'] == threshold]

    def _min_salary(self, data):
        # Compute min salary
        salaries = [int(d['salary'][1:]) for d in data]

        return [e for e in data
                if e['salary'] == '£{}'.format(str(min(salaries)))]

and the _stats() method is now really tiny

    def _stats(self, data, iage, isalary):
        return {
            'avg_age': self._avg_age(data),
            'avg_salary': self._avg_salary(data),
            'avg_yearly_increase': self._avg_yearly_increase(
                data, iage, isalary),
            'max_salary': self._max_salary(data),
            'min_salary': self._min_salary(data)
        }

Step 8 - Reducing code duplication

Commit: b559a5c

Now that we have the main tests in place, we can start changing the code of the various helper methods. These are now small enough to allow us to change the code without further tests. While this may be true here, in general there is no definition of what "small enough" means, just as there is no real definition of what a "unit test" is. Generally speaking, you should be confident that the change you are making is covered by the tests that you have. Were this not the case, you'd better add one or more tests until you feel confident enough.

The two methods _max_salary() and _min_salary() share a great deal of code, even though the second one is more concise

    def _max_salary(self, data):
        # Compute max salary
        salaries = [int(e['salary'][1:]) for e in data]
        threshold = '£' + str(max(salaries))

        return [e for e in data if e['salary'] == threshold]

    def _min_salary(self, data):
        # Compute min salary
        salaries = [int(d['salary'][1:]) for d in data]

        return [e for e in data
                if e['salary'] == '£{}'.format(str(min(salaries)))]

I'll start by making the threshold variable explicit in the second function. As soon as I change something, I'll run the tests to check that the external behaviour did not change.

    def _max_salary(self, data):
        # Compute max salary
        salaries = [int(e['salary'][1:]) for e in data]
        threshold = '£' + str(max(salaries))

        return [e for e in data if e['salary'] == threshold]

    def _min_salary(self, data):
        # Compute min salary
        salaries = [int(d['salary'][1:]) for d in data]
        threshold = '£{}'.format(str(min(salaries)))

        return [e for e in data if e['salary'] == threshold]

Now, it is pretty evident that the two functions are the same but for the min() and max() functions. They still use different variable names and different code to format the threshold, so my first action is to even them out, copying the code of _min_salary() to _max_salary() and changing min() to max()

    def _max_salary(self, data):
        # Compute max salary
        salaries = [int(d['salary'][1:]) for d in data]
        threshold = '£{}'.format(str(max(salaries)))

        return [e for e in data if e['salary'] == threshold]

    def _min_salary(self, data):
        # Compute min salary
        salaries = [int(d['salary'][1:]) for d in data]
        threshold = '£{}'.format(str(min(salaries)))

        return [e for e in data if e['salary'] == threshold]

Now I can create another helper called _select_salary() that duplicates that code and accepts a function, used instead of min() or max(). As I did before, first I duplicate the code, and then remove the duplication by calling the new function.

After a few passes, the code looks like this

    def _select_salary(self, data, func):
        salaries = [int(d['salary'][1:]) for d in data]
        threshold = '£{}'.format(str(func(salaries)))

        return [e for e in data if e['salary'] == threshold]

    def _max_salary(self, data):
        return self._select_salary(data, max)

    def _min_salary(self, data):
        return self._select_salary(data, min)
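To see the resulting higher-order pattern in isolation, here is a standalone sketch with hypothetical sample data (select_salary here is a plain function rather than the class method):

```python
def select_salary(data, func):
    # Strip the leading '£' and convert the salaries to integers.
    salaries = [int(d['salary'][1:]) for d in data]
    # func is min, max, or any other function of a list of numbers.
    threshold = '£{}'.format(func(salaries))
    return [e for e in data if e['salary'] == threshold]

people = [
    {"name": "Laith", "salary": "£27888"},
    {"name": "Mikayla", "salary": "£67137"},
]

print(select_salary(people, max))  # entry with the highest salary
print(select_salary(people, min))  # entry with the lowest salary
```

Passing min or max as an argument is exactly what removes the duplication: the two public methods become one-liners that only choose the selection function.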

I noticed then a code duplication between _avg_salary() and _select_salary():

    def _avg_salary(self, data):
        return math.floor(sum([int(e['salary'][1:]) for e in data])/len(data))

    def _select_salary(self, data, func):
        salaries = [int(d['salary'][1:]) for d in data]

and decided to extract the common algorithm into a method called _salaries(). As before, I write the test first

def test_salaries():
    ds = DataStats()

    assert ds._salaries(test_data) == [27888, 67137, 70472]

then I implement the method

    def _salaries(self, data):
        return [int(d['salary'][1:]) for d in data]

and eventually I replace the duplicated code with a call to the new method

    def _salaries(self, data):
        return [int(d['salary'][1:]) for d in data]

    def _select_salary(self, data, func):
        threshold = '£{}'.format(str(func(self._salaries(data))))

        return [e for e in data if e['salary'] == threshold]

While doing this I noticed that _avg_yearly_increase() contains the same code, so I fixed it there as well.

    def _avg_yearly_increase(self, data, iage, isalary):
        # iage and isalary are the starting age and salary used to
        # compute the average yearly increase of salary.

        # Compute average yearly increase
        average_age_increase = math.floor(
            sum([e['age'] for e in data])/len(data)) - iage
        average_salary_increase = math.floor(
            sum(self._salaries(data))/len(data)) - isalary

        return math.floor(average_salary_increase/average_age_increase)

It would be useful at this point to store the input data inside the class and to use it as self.data instead of passing it around to all the class's methods. This however would break the class's API, as currently DataStats is initialised without any data. Later I will show how to introduce changes that potentially break the API, and briefly discuss the issue. For the moment, however, I'll keep changing the class without modifying the external interface.

It looks like age has the same code duplication issues as salary, so with the same procedure I introduce the _ages() method and change the _avg_age() and _avg_yearly_increase() methods accordingly.

Speaking of _avg_yearly_increase(), the code of that method contains the code of the _avg_age() and _avg_salary() methods, so it is worth replacing it with two calls. As I am moving code between existing methods, I do not need further tests.

    def _avg_yearly_increase(self, data, iage, isalary):
        # iage and isalary are the starting age and salary used to
        # compute the average yearly increase of salary.

        # Compute average yearly increase
        average_age_increase = self._avg_age(data) - iage
        average_salary_increase = self._avg_salary(data) - isalary

        return math.floor(average_salary_increase/average_age_increase)

Step 9 - Advanced refactoring

Commit: cc0b0a1

The initial class didn't have any __init__() method, and was thus missing the encapsulation part of the object-oriented paradigm. There was no reason to keep the class, as the stats() method could have easily been extracted and provided as a plain function.

This is much more evident now that we have refactored the method, because we have 10 methods that accept data as a parameter. It would be nice to load the input data into the class at instantiation time, and then access it as self.data. This would greatly improve the readability of the class, and also justify its existence.

If we introduce a __init__() method that requires a parameter, however, we will change the class's API, breaking the compatibility with every other code that imports and uses it. Since we want to keep it, we have to devise a way to provide both the advantages of a new, clean class and of a stable API. This is not always perfectly achievable, but in this case the Adapter design pattern (also known as Wrapper) can perfectly solve the issue.

The goal is to change the current class to match the new API, and then build a class that wraps the first one and provides the old API. The strategy is not that different from what we did previously, only this time we will deal with classes instead of methods. With a stupendous effort of my imagination I named the new class NewDataStats. Sorry, but sometimes you just have to get the job done.
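Before diving into the actual code, the general shape of the pattern can be sketched with hypothetical names: the wrapper keeps the old entry point and delegates to the new implementation.

```python
class NewEngine:
    # The new, clean API: data is passed at instantiation time.
    def __init__(self, value):
        self.value = value

    def run(self):
        return self.value * 2


class Engine:
    # The adapter: preserves the legacy API (data passed to the
    # method) by wrapping the new class.
    def run(self, value):
        return NewEngine(value).run()
```

Existing callers of Engine().run(value) keep working unchanged, while new code can use NewEngine directly.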

The first thing, as happens very often with refactoring, is to duplicate the code, and when we insert new code we need to have tests that justify it. The tests will be the same as before, as the new class shall provide the same functionalities as the previous one, so I just create a new file, called test_newdatastats.py, and start putting there the first test, test_init().

import json

from datastats.datastats import NewDataStats


test_data = [
    {
        "id": 1,
        "name": "Laith",
        "surname": "Simmons",
        "age": 68,
        "salary": "£27888"
    },
    {
        "id": 2,
        "name": "Mikayla",
        "surname": "Henry",
        "age": 49,
        "salary": "£67137"
    },
    {
        "id": 3,
        "name": "Garth",
        "surname": "Fields",
        "age": 70,
        "salary": "£70472"
    }
]


def test_init():
    ds = NewDataStats(test_data)

    assert ds.data == test_data

This test doesn't pass, and the code that implements the class is very simple

class NewDataStats:

    def __init__(self, data):
        self.data = data

Now I can start an iterative process:

  1. I will copy one of the tests of DataStats and adapt it to NewDataStats.
  2. I will copy some code from DataStats to NewDataStats, adapting it to the new API and making it pass the test.

At this point iteratively removing methods from DataStats and replacing them with a call to NewDataStats would be overkill. I'll show you in the next section why, and what we can do to avoid that.

An example of the resulting tests for NewDataStats is the following

def test_ages():
    ds = NewDataStats(test_data)

    assert ds._ages() == [68, 49, 70]

and the code that passes the test is

    def _ages(self):
        return [d['age'] for d in self.data]

Once finished, I noticed that, since methods like _ages() no longer require an input parameter, I could convert them to properties, changing the tests accordingly.

    @property
    def _ages(self):
        return [d['age'] for d in self.data]
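As an illustration of how the tests change, here is a sketch with a minimal stand-in for the class (the real one lives in the datastats package): a property is accessed as an attribute, so the parentheses disappear.

```python
class NewDataStats:
    def __init__(self, data):
        self.data = data

    @property
    def _ages(self):
        return [d['age'] for d in self.data]


# Reduced hypothetical data, just the fields this test needs.
test_data = [{"age": 68}, {"age": 49}, {"age": 70}]


def test_ages():
    ds = NewDataStats(test_data)
    assert ds._ages == [68, 49, 70]  # no parentheses: attribute access


test_ages()
```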

It is time to replace the methods of DataStats with calls to NewDataStats. We could do it method by method, but actually the only thing that we really need to replace is stats(). So the new code is

    def stats(self, data, iage, isalary):
        nds = NewDataStats(data)
        return nds.stats(iage, isalary)

And since all the other methods are no longer used, we can safely delete them, checking that the tests do not fail. Speaking of tests, removing the methods will make many of the DataStats tests fail, so we need to remove those as well.

class DataStats:

    def stats(self, data, iage, isalary):
        nds = NewDataStats(data)
        return nds.stats(iage, isalary)

Final words

I hope this little tour of a refactoring session didn't turn out to be too trivial, and that it helped you grasp the basic concepts of this technique. If you are interested in the subject I strongly recommend the classic book by Martin Fowler, "Refactoring: Improving the Design of Existing Code", which is a collection of refactoring patterns. The reference language is Java, but the concepts are easily adapted to Python.

Feedback

Feel free to use the blog Google+ page to comment on the post, or reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.

Sandipan Dey: SIR Epidemic model for influenza A (H1N1): Modeling the outbreak of the pandemic in Kolkata, West Bengal, India in 2010 (Simulation in Python & R)

This appeared as a project in the edX course DelftX: MathMod1x Mathematical Modelling Basics and the project report can be found here. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Summary In this report, the spread of the pandemic influenza A (H1N1) that had an outbreak in Kolkata, West Bengal, India, 2010 is going to be simulated. … Continue reading SIR Epidemic model for influenza A (H1N1): Modeling the outbreak of the pandemic in Kolkata, West Bengal, India in 2010 (Simulation in Python & R)

Catalin George Festila: The pyquery python module.

This tutorial is about the pyquery python module, using python version 2.7.13.
First I used the pip command to install it.
C:\Python27>cd Scripts

C:\Python27\Scripts>pip install pyquery
Collecting pyquery
Downloading pyquery-1.2.17-py2.py3-none-any.whl
Requirement already satisfied: lxml>=2.1 in c:\python27\lib\site-packages (from pyquery)
Requirement already satisfied: cssselect>0.7.9 in c:\python27\lib\site-packages (from pyquery)
Installing collected packages: pyquery
Successfully installed pyquery-1.2.17
I tried to install it with pip under python version 3.4 but I got errors.
The development team tells us about this python module:
pyquery allows you to make jquery queries on xml documents. The API is as much as possible the similar to jquery. pyquery uses lxml for fast xml and html manipulation.
Let's try a simple example with this python module.
The basis of this example is finding links by HTML tag.
from pyquery import PyQuery

seeds = [
    'https://twitter.com',
    'http://google.com'
]

crawl_frontiers = []

def start_crawler():
    crawl_frontiers = crawler_seeds()
    print(crawl_frontiers)

def crawler_seeds():
    frontiers = []
    for index, seed in enumerate(seeds):
        frontier = {index: read_links(seed)}
        frontiers.append(frontier)

    return frontiers

def read_links(seed):
    crawler = PyQuery(seed)
    return [crawler(tag_a).attr("href") for tag_a in crawler("a")]

start_crawler()
The read_links function takes the links from each seed in the seeds array.
To do that, I read the links and put them into another array, crawl_frontiers.
The frontiers array is used just for the crawling process.
This simple example also helps you to understand arrays better.
You can read more about this python module here.

Catalin George Festila: Python Qt4 - part 001.

Today I started with PyQt4 and this python version:
Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:42:59) [MSC v.1500 32 bit (Intel)] on win32
To install PyQt4 I used this link to get the executable named PyQt4-4.11.4-gpl-Py2.7-Qt4.8.7-x32.exe.
The name of this executable tells us that it can be used with python 2.7.x versions and comes with Qt 4.8.7 for our 32-bit python.
I start with a default Example class to make a calculator interface with PyQt4.
This is my example:
#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys
from PyQt4 import QtGui

"""
Qt.Gui calculator example
"""

class Example(QtGui.QWidget):

    def __init__(self):
        super(Example, self).__init__()

        self.initUI()

    def initUI(self):
        title = QtGui.QLabel('Title')
        titleEdit = QtGui.QLineEdit()
        grid = QtGui.QGridLayout()
        grid.setSpacing(10)

        grid.addWidget(title, 0, 0)

        grid.addWidget(titleEdit, 0, 1, 1, 4)

        self.setLayout(grid)

        names = ['Cls', 'Bck', 'OFF',
                 '/', '.', '7', '8',
                 '9', '*', 'SQR', '3',
                 '4', '5', '-', '=',
                 '0', '1', '2', '+']

        positions = [(i, j) for i in range(1, 5) for j in range(0, 5)]

        for position, name in zip(positions, names):

            if name == '':
                continue
            button = QtGui.QPushButton(name)
            grid.addWidget(button, *position)

        self.move(300, 250)
        self.setWindowTitle('Calculator')
        self.show()

def main():
    app = QtGui.QApplication(sys.argv)
    ex = Example()
    sys.exit(app.exec_())

if __name__ == '__main__':
    main()
The example is simple.
First you need a QGridLayout - this makes a matrix.
I used labels, a line edit and buttons, all from QtGui: QLabel, QLineEdit and QPushButton.
The first items in this matrix - named grid - are Title and the edit area named titleEdit.
These two are added to the grid matrix with addWidget.
The next step is to put all the buttons into one array.
This array is added to the grid matrix with a for loop.
To add the items from the array to the matrix I used the zip function.
The zip function makes an iterator that aggregates elements from each of the iterables.
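A quick sketch of zip in isolation, using a short stand-in for the button data:

```python
positions = [(1, 0), (1, 1), (1, 2)]
names = ['Cls', 'Bck', 'OFF']

# zip pairs the i-th position with the i-th name,
# stopping at the end of the shorter iterable.
for position, name in zip(positions, names):
    print(position, name)
```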
I also set the window title to Calculator with setWindowTitle.
I have not implemented the event handling and the calculation part.
The main function starts the interface by using QApplication.
The goal of this tutorial was building the graphical interface with PyQt4.
This is the result of my example:

Catalin George Festila: Make one executable from a python script.

The official website of this tool tells us:
PyInstaller bundles a Python application and all its dependencies into a single package. The user can run the packaged app without installing a Python interpreter or any modules. PyInstaller supports Python 2.7 and Python 3.3+, and correctly bundles the major Python packages such as numpy, PyQt, Django, wxPython, and others.

PyInstaller is tested against Windows, Mac OS X, and Linux. However, it is not a cross-compiler: to make a Windows app you run PyInstaller in Windows; to make a Linux app you run it in Linux, etc. PyInstaller has been used successfully with AIX, Solaris, and FreeBSD, but is not tested against them.

The manual of this tool can be seen here.
C:\Python27>cd Scripts

C:\Python27\Scripts>pip install pyinstaller
Collecting pyinstaller
Downloading PyInstaller-3.2.1.tar.bz2 (2.4MB)
100% |################################| 2.4MB 453kB/s
....
Collecting pypiwin32 (from pyinstaller)
Downloading pypiwin32-219-cp27-none-win32.whl (6.7MB)
100% |################################| 6.7MB 175kB/s
...
Successfully installed pyinstaller-3.2.1 pypiwin32-219
This will also install the PyWin32 python module.
Let's make one test python script and then turn it into an executable.
I used this python script to test it:
from Tkinter import Tk, Label, Button  # on Python 3 the module is "tkinter"

class MyFirstGUI:
    def __init__(self, master):
        self.master = master
        master.title("A simple GUI")

        self.label = Label(master, text="This is our first GUI!")
        self.label.pack()

        self.greet_button = Button(master, text="Greet", command=self.greet)
        self.greet_button.pack()

        self.close_button = Button(master, text="Close", command=master.quit)
        self.close_button.pack()

    def greet(self):
        print("Greetings!")

root = Tk()
my_gui = MyFirstGUI(root)
root.mainloop()
The output of the command of pyinstaller:
C:\Python27\Scripts>pyinstaller.exe   --onefile --windowed ..\tk_app.py
92 INFO: PyInstaller: 3.2.1
92 INFO: Python: 2.7.13
93 INFO: Platform: Windows-10-10.0.14393
93 INFO: wrote C:\Python27\Scripts\tk_app.spec
95 INFO: UPX is not available.
96 INFO: Extending PYTHONPATH with paths
['C:\\Python27', 'C:\\Python27\\Scripts']
96 INFO: checking Analysis
135 INFO: checking PYZ
151 INFO: checking PKG
151 INFO: Building because toc changed
151 INFO: Building PKG (CArchive) out00-PKG.pkg
213 INFO: Redirecting Microsoft.VC90.CRT version (9, 0, 21022, 8) -> (9, 0, 30729, 9247)
2120 INFO: Building PKG (CArchive) out00-PKG.pkg completed successfully.
2251 INFO: Bootloader c:\python27\lib\site-packages\PyInstaller\bootloader\Windows-32bit\runw.exe
2251 INFO: checking EXE
2251 INFO: Rebuilding out00-EXE.toc because tk_app.exe missing
2251 INFO: Building EXE from out00-EXE.toc
2267 INFO: Appending archive to EXE C:\Python27\Scripts\dist\tk_app.exe
2267 INFO: Building EXE from out00-EXE.toc completed successfully.
Then I ran the resulting executable:
C:\Python27\Scripts>C:\Python27\Scripts\dist\tk_app.exe

C:\Python27\Scripts>
...and it works well.

The output file comes with this icon:

You can also make changes by using your own icon or by setting the file properties, according to the VS_FIXEDFILEINFO structure.
You need to have the icon file and/or the version.txt file for the VS_FIXEDFILEINFO structure.
Let's see the version.txt file:
# UTF-8
#
# For more details about fixed file info 'ffi' see:
# http://msdn.microsoft.com/en-us/library/ms646997.aspx
VSVersionInfo(
  ffi=FixedFileInfo(
    # filevers and prodvers should be always a tuple with four items: (1, 2, 3, 4)
    # Set not needed items to zero 0.
    filevers=(2017, 1, 1, 1),
    prodvers=(1, 1, 1, 1),
    # Contains a bitmask that specifies the valid bits 'flags'
    mask=0x3f,
    # Contains a bitmask that specifies the Boolean attributes of the file.
    flags=0x0,
    # The operating system for which this file was designed.
    # 0x4 - NT and there is no need to change it.
    OS=0x4,
    # The general type of file.
    # 0x1 - the file is an application.
    fileType=0x1,
    # The function of the file.
    # 0x0 - the function is not defined for this fileType
    subtype=0x0,
    # Creation date and time stamp.
    date=(0, 0)
    ),
  kids=[
    StringFileInfo(
      [
      StringTable(
        u'040904b0',
        [StringStruct(u'CompanyName', u'python-catalin'),
        StringStruct(u'ProductName', u'test'),
        StringStruct(u'ProductVersion', u'1, 1, 1, 1'),
        StringStruct(u'InternalName', u'tk_app'),
        StringStruct(u'OriginalFilename', u'tk_app.exe'),
        StringStruct(u'FileVersion', u'2017, 1, 1, 1'),
        StringStruct(u'FileDescription', u'test tk'),
        StringStruct(u'LegalCopyright', u'Copyright 2017 free-tutorials.org.'),
        StringStruct(u'LegalTrademarks', u'tk_app is a registered trademark of catafest.'),])
      ]),
    VarFileInfo([VarStruct(u'Translation', [0x409, 1200])])
  ]
)
Now, with the tk_app.py and version.txt files in the C:\Python27 folder, you can use this command:
 pyinstaller.exe --onefile --windowed --version-file=..\version.txt ..\tk_app.py
Let's see this info in the executable file:

If you want to change the icon, add --icon=tk_app.ico, where tk_app.ico is the new icon for the executable.



Catalin George Festila: About py-translate python module.

This Python module translates text in the terminal.
You can read about this API and see examples on its web page.
Features

  • Fast! Translate an entire book in less than 5 seconds.
  • Made for Python 3 but still works on Python 2
  • Fast and easy to install, easy to use
  • Supports translation from any language
  • Highly composable interface, the power of Unix pipes and filters.
  • Simple API and documentation

Installation 
C:\>cd Python27

C:\Python27>cd Scripts

C:\Python27\Scripts>pip install py-translate
Collecting py-translate
Downloading py_translate-1.0.3-py2.py3-none-any.whl (61kB)
100% |################################| 61kB 376kB/s
Installing collected packages: py-translate
Successfully installed py-translate-1.0.3

C:\Python27\Scripts>
Let's test it with a simple example:
>>> import translate
>>> dir(translate)
['TestLanguages', 'TestTranslator', '__author__', '__build__', '__builtins__', '__copyright__', '__doc__', '__file__', '__license__', '__name__', '__package__', '__path__', '__title__', '__version__', 'accumulator', 'coroutine', 'coroutines', 'languages', 'print_table', 'push_url', 'set_task', 'source', 'spool', 'tests', 'translation_table', 'translator', 'write_stream']
>>> from translate import translator
>>> translator('ro', 'en', 'Consider ca dezvoltarea personala este un pas important')
[[[u'I think personal development is an important step', u'Consider ca dezvoltarea personala este un pas important', None, None, 0]], None, u'ro']
>>>

Weekly Python StackOverflow Report: (lxxxiii) stackoverflow python report


Patricio Paez: Concatenating strings with punctuation


Creating strings of the form “a, b, c, and d” from a list [‘a’, ‘b’, ‘c’, ‘d’] is a task I faced some time ago, as I needed to include such strings in some HTML documents. The “,” and the “and” are included according to the number of elements: [‘a’, ‘b’] yields “a and b” and [‘a’] yields “a”, for example. In a recent review of the code, I changed the method from using string concatenation:

if len(items) > 1:
    text = items[0]
    for item in items[1:-1]:
        text += ', ' + item
    text += ' and ' + items[-1]
else:
    text = items[0]

to slicing the items list, adding the resulting sublists, and using str.join to include the punctuation:

first = items[:1]
middle = items[1:-1]
last = items[1:][-1:]
first_middle = [', '.join(first + middle)]
text = ' and '.join(first_middle + last)

The old method requires an additional elif branch to work when items is an empty list; the new method returns an empty string if the items list is empty. I share this tip in case it is useful to someone else.
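As a self-contained sketch, the slicing version can be wrapped in a small helper function (the function name and sample calls are mine, not from the original code):

```python
def join_with_and(items):
    """Join strings as "a, b, c and d"; an empty list yields the empty string."""
    first = items[:1]
    middle = items[1:-1]
    last = items[1:][-1:]
    # join the first and middle elements with commas,
    # then attach the last element with " and "
    first_middle = [', '.join(first + middle)]
    return ' and '.join(first_middle + last)

print(join_with_and(['a', 'b', 'c', 'd']))  # a, b, c and d
print(join_with_and(['a', 'b']))            # a and b
print(join_with_and(['a']))                 # a
print(join_with_and([]))                    # (empty string)
```

Note that the empty-list case works without any extra branch, which is the advantage the slicing version has over the concatenation version.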

Jaysinh Shukla: PyDelhi Conf 2017: A beautiful conference happened in New Delhi, India


PyDelhi Conf
2017

TL;DR

PyDelhi conf 2017 was a two-day conference which featured workshops, dev sprints, and both full-length and lightning talks. The workshop sessions carried no extra charge. Delhiites should not miss the chance to attend this conference in the future. I conducted a workshop titled “Tango with Django”, helping beginners understand the Django web framework.

Detailed Review

About the PyDelhi community

PyDelhi
Community

PyDelhi conf 2017 volunteers

The PyDelhi community was known as the NCR Python Users Group until a few years ago. The community acts as an umbrella organization for other FLOSS communities across New Delhi, India, and actively arranges monthly meetups on interesting topics. The last PyCon India, the national-level conference for the Python programming language, was impressively organized by this community, and this year too they have taken on the responsibility of managing it. I am very thankful to this community for its immense contribution to society. If you are around New Delhi, India, you should not miss the chance to attend their meetups. This community has great people who are always happy to mentor.

PyDelhi conf 2017

Conference T-shirt

Conference T-shirt

PyDelhi conf is a regional-level conference on the Python programming language organized by the PyDelhi community. This is their second year organizing the conference. Last year it was held at JNU University; this year it happened at the IIM Lucknow campus based in Noida, New Delhi, India. I enjoyed various talks, which I will mention later, the workshop sessions (because I was conducting one), and some panel discussions, because the people involved had a good level of experience. 80% of the schedule was divided equally between 30-minute talks and 2-hour workshop sessions; 10% was given to panel discussions and 10% was reserved for lightning talks. The dev sprints happened in parallel with the conference. The early slots on both days were given to workshops. One large conference hall was located on the 2nd floor of the building and two halls on the ground floor, where food and beverages were served.

Panel discussion

Panel Discussion

Desk

Registration desk

Lunch

Tea break

Keynote speakers

Mr. Ricardo Rocha

  • Mr. Ricardo Rocha: Mr. Rocha is a software engineer at CERN. I got some time to talk with him after the conference. We discussed his responsibilities at CERN, and I was impressed when he explained how he and his team manage the infrastructure. On inquiring about opportunities at CERN, he mentioned that the organization is always looking for talented developers. New grads can keep an eye on its various Summer Internship Programs, which are very similar to the Google Summer of Code program.

Mr. Chris Stucchio

  • Mr. Chris Stucchio: Mr. Stucchio is the director of Data Science at Wingify/VWO. I found him physically fit compared to most software developers (at least in India). I didn’t get much time to have a word with him.

Interesting Talks

Because I took the wrong metro train, I was late for the inaugural ceremony and missed the keynote given by Mr. Rocha. The talks below were impressively presented at the conference.

I love discussing things with people rather than sitting in on sessions. For that reason, I always miss some important talks presented at a conference, but I do not forget to watch them once they are publicly available. This year I missed the following talks.

Volunteer Party

I got a warm invitation from the organizers to join the volunteer party, but I was a little tense about my session happening the next day, so I decided to go home and improve the slides. I heard from friends that the party was awesome!

My workshop session

Tango with Django

Me conducting workshop

I conducted a workshop on the Django web framework. “Tango with Django” was chosen as the title in the hope of attracting beginners; I believe it is also the name of a famous book serving the same purpose.

Dev sprints

Dev sprints

Me hacking at dev sprints section

The dev sprints happened in parallel with the conference. Mr. Pillai was representing Junction. I decided to look into a few CPython issues but didn’t do much. There were a bunch of people hacking, but I didn’t find anything interesting. The chairs were so impressive that I have decided to buy the same ones for my home office.

Why attend this conference?

  • Free Workshops: The conference has a great slate of talks and workshops. Workshops are conducted by field experts without any extra fee. This is one of the great advantages you can leverage from this conference.

  • Student discounts: If you are a student, you will receive a discount on the conference ticket.

  • Beginner-friendly platform: If you are a novice speaker, you will get mentorship from this community, and you can conduct a session for beginners.

  • Networking: You will find senior employees of tech giants, owners of innovative start-ups, and professors from well-known universities participating in this conference. It is a good opportunity to network with them.

What was missing?

  • Lecture hall arrangement: It was inconvenient to travel frequently between the second floor and the ground floor. I found that most people spent their time on the ground floor rather than attending the talks going on upstairs.

  • No corporate stalls: Despite having corporate sponsors like Microsoft, there were no company stalls.

  • The venue for dev sprints: The rooms were designed for teleconferences, with circularly arranged wooden tables, which did not create a collaborative environment. The projects involved were not promoted much during the conference.

Thank you PyDelhi community!

I would like to thank all the known and unknown volunteers who did their best in arranging this conference. I encourage the PyDelhi community to keep organizing such an affable conference.

Proofreaders: Mr. Daniel Foerster, Mr. Dhavan Vaidya, Mr. Sayan Chowdhury, Mr. Trent Buck

Trey Hunner: Craft Your Python Like Poetry


Line length is a big deal… programmers argue about it quite a bit. PEP 8, the Python style guide, recommends a 79 character maximum line length but concedes that a line length up to 100 characters is acceptable for teams that agree to use a specific longer line length.

So 79 characters is recommended… but isn’t line length completely obsolete? After all, programmers are no longer restricted by punch cards, teletypes, and 80 column terminals. The laptop screen I’m typing this on can fit about 200 characters per line.

Line length is not obsolete

Line length is not a technical limitation: it’s a human-imposed limitation. Many programmers prefer short lines because long lines are hard to read. This is true in typography and it’s true in programming as well.

Short lines are easier to read.

In the typography world, a line length of 55 characters per line is recommended for electronic text (see line length on Wikipedia). That doesn’t mean we should use a 55 character limit though; typography and programming are different.

Python isn’t prose

Python code isn’t structured like prose. English prose is structured in flowing sentences: each line wraps into the next. In Python, statements are somewhat like sentences, but each statement begins on its own line.

Python code is more like poetry than prose. Poets and Python programmers don’t wrap lines once they hit an arbitrary length; they wrap lines when they make sense for readability and beauty.

I stand amid the roar Of a surf-tormented shore, And I hold within my hand
Grains of the golden sand— How few! yet how they creep Through my fingers to
the deep, While I weep—while I weep! O God! can I not grasp Them with a
tighter clasp? O God! can I not save One from the pitiless wave? Is all that we
see or seem But a dream within a dream?

Don’t wrap lines arbitrarily. Craft each line with care to help readers experience your code exactly the way you intended.

I stand amid the roar
Of a surf-tormented shore,
And I hold within my hand
Grains of the golden sand—
How few! yet how they creep
Through my fingers to the deep,
While I weep—while I weep!
O God! can I not grasp
Them with a tighter clasp?
O God! can I not save
One from the pitiless wave?
Is all that we see or seem
But a dream within a dream?

Examples

It’s not possible to make a single rule for when and how to wrap lines of code. PEP 8 discusses line wrapping briefly, but it covers only one case and offers three different acceptable styles, leaving the reader to choose which is best.

Line wrapping is best discussed through examples. Let’s look at a few examples of long lines and a few variations of line wrapping for each.

Example: Wrapping a Comprehension

This line of code is over 79 characters long:

employee_hours = [schedule.earliest_hour for employee in self.public_employees for schedule in employee.schedules]

Here we’ve wrapped that line of code so that it’s two shorter lines of code:

employee_hours = [schedule.earliest_hour for employee in
                  self.public_employees for schedule in employee.schedules]

We’re able to insert that line break in this line because we have an unclosed square bracket. This is called an implicit line continuation. Python knows we’re continuing a line of code whenever there’s a line break inside unclosed square brackets, curly braces, or parentheses.

This code still isn’t very easy to read because the line break was inserted arbitrarily. We simply wrapped this line just before a specific line length. We were thinking about line length here, but we completely neglected to think about readability.

This code is the same as above, but we’ve inserted line breaks in very particular places:

employee_hours = [schedule.earliest_hour
                  for employee in self.public_employees
                  for schedule in employee.schedules]

We have two line breaks here, and we’ve purposely inserted them before the for clauses in this list comprehension.

Statements have logical components that make up a whole, the same way sentences have clauses that make up the whole. We’ve chosen to break up this list comprehension by inserting line breaks between these logical components.

Here’s another way to break up this statement:

employee_hours = [
    schedule.earliest_hour
    for employee in self.public_employees
    for schedule in employee.schedules
]

Which of these methods you prefer is up to you. It’s important to make sure you break up the logical components though. And whichever method you choose, be consistent!

Example: Function Calls

This is a Django model field with a whole bunch of arguments being passed to it:

default_appointment = models.ForeignKey(othermodel='AppointmentType',
                                        null=True, on_delete=models.SET_NULL,
                                        related_name='+')

We’re already using an implicit line continuation to wrap these lines of code, but again we’re wrapping this code at an arbitrary line length.

Here’s the same Django model field with one argument per line:

default_appointment = models.ForeignKey(othermodel='AppointmentType',
                                        null=True,
                                        on_delete=models.SET_NULL,
                                        related_name='+')

We’re breaking up the component parts (the arguments) of this statement onto separate lines.

We could also wrap this line by indenting each argument instead of aligning them:

default_appointment = models.ForeignKey(
    othermodel='AppointmentType',
    null=True,
    on_delete=models.SET_NULL,
    related_name='+'
)

Notice we’re also leaving that closing parenthesis on its own line. We could additionally add a trailing comma if we wanted:

default_appointment = models.ForeignKey(
    othermodel='AppointmentType',
    null=True,
    on_delete=models.SET_NULL,
    related_name='+',
)

Which of these is the best way to wrap this line?

Personally for this line I prefer that last approach: each argument on its own line, the closing parenthesis on its own line, and a comma after each argument.

It’s important to decide what you prefer, reflect on why you prefer it, and always maintain consistency within each project/file you create. And keep in mind that consistency of your personal style is less important than consistency within a single project.

Example: Chained Function Calls

Here’s a long line of chained Django queryset methods:

books = Book.objects.filter(author__in=favorite_authors).select_related('author', 'publisher').order_by('title')

Notice that there aren’t parentheses around this whole statement, so the only place we can currently wrap our lines is inside those parentheses. We could do something like this:

books = Book.objects.filter(
    author__in=favorite_authors
).select_related(
    'author', 'publisher'
).order_by('title')

But that looks kind of weird and it doesn’t really improve readability.

We could add backslashes at the end of each line to allow us to wrap at arbitrary places:

books = Book.objects\
    .filter(author__in=favorite_authors)\
    .select_related('author', 'publisher')\
    .order_by('title')

This works, but PEP8 recommends against this.

We could wrap the whole statement in parentheses, allowing us to use implicit line continuation wherever we’d like:

books = (Book.objects
    .filter(author__in=favorite_authors)
    .select_related('author', 'publisher')
    .order_by('title'))

It’s not uncommon to see extra parentheses added in Python code to allow implicit line continuations.

That indentation style is a little odd though. We could align our code with the parenthesis instead:

books = (Book.objects
         .filter(author__in=favorite_authors)
         .select_related('author', 'publisher')
         .order_by('title'))

Although I’d probably prefer to align the dots in this case:

books = (Book.objects
             .filter(author__in=favorite_authors)
             .select_related('author', 'publisher')
             .order_by('title'))

A fully indentation-based style works too (we’ve also moved objects to its own line here):

books = (
    Book
    .objects
    .filter(author__in=favorite_authors)
    .select_related('author', 'publisher')
    .order_by('title')
)

There are yet more ways to resolve this problem. For example we could try to use intermediary variables to avoid line wrapping entirely.
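As a rough sketch of that intermediary-variable idea (using plain string methods as a stand-in, since the Django queryset above needs a database), each step of the chain gets its own named line:

```python
# One long chain, hard to wrap nicely:
# words = title.strip().lower().replace(',', '').split()

title = "  The Raven, by Edgar Allan Poe  "

# Intermediary variables give each step a descriptive name
# and avoid line wrapping entirely.
stripped = title.strip()
lowered = stripped.lower()
words = lowered.replace(',', '').split()

print(words)  # ['the', 'raven', 'by', 'edgar', 'allan', 'poe']
```

The variable names also document what each step produces, which a single wrapped chain cannot do.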

Chained methods pose a different problem for line wrapping than single method calls and require a different solution. Focus on readability when picking a preferred solution and be consistent with the solution you pick. Consistency lies at the heart of readability.

Example: Dictionary Literals

I often define long dictionaries and lists in Python code.

Here’s a dictionary definition that has been written over multiple lines, with line breaks inserted as the maximum line length is approached:

MONTHS = {'January': 1, 'February': 2, 'March': 3, 'April': 4, 'May': 5,
          'June': 6, 'July': 7, 'August': 8, 'September': 9, 'October': 10,
          'November': 11, 'December': 12}

Here’s the same dictionary with each key-value pair on its own line, aligned with the first key-value pair:

MONTHS = {'January': 1,
          'February': 2,
          'March': 3,
          'April': 4,
          'May': 5,
          'June': 6,
          'July': 7,
          'August': 8,
          'September': 9,
          'October': 10,
          'November': 11,
          'December': 12}

And the same dictionary again, with each key-value pair indented instead of aligned (with a trailing comma on the last line as well):

MONTHS = {
    'January': 1,
    'February': 2,
    'March': 3,
    'April': 4,
    'May': 5,
    'June': 6,
    'July': 7,
    'August': 8,
    'September': 9,
    'October': 10,
    'November': 11,
    'December': 12,
}

This is the strategy I prefer for wrapping long dictionaries and lists. I very often wrap short dictionaries and lists this way as well, for the sake of readability.

Python is Poetry

The moment of peak readability is the moment just after you write a line of code. Your code will be far less readable to you one day, one week, and one month after you’ve written it.

When crafting Python code, use spaces and line breaks to split up the logical components of each statement. Don’t write a statement on a single line unless it’s already very clear. If you break each statement over multiple lines for clarity, line length shouldn’t be a major concern because your lines of code will mostly be far shorter than 79 characters already.

Make sure to craft your code carefully as you write it because your future self will have a much more difficult time cleaning it up than you will right now. So take that line of code you just wrote and carefully add line breaks to it.

NumFOCUS: Meet our GSoC Students Part 3: Matplotlib, PyMC3, FEniCS, MDAnalysis, Data Retriever, & Gensim

Mike Driscoll: Python is #1 in 2017 According to IEEE Spectrum


It’s always fun to see which languages are considered to be in the top ten. This year, IEEE Spectrum named Python the #1 language in the Web and Enterprise categories. Some of the Python community over at Reddit think that the scoring of the languages is flawed because JavaScript is ranked below R in web programming. That gives me pause as well. Frankly, I don’t really see how anything ranks above JavaScript when it comes to web programming.

Regardless, it’s still interesting to read through the article.

Related Articles

Kevin Dahlhausen: Using Beets from 3rd Party Python Applications


I am thinking of using Beets as the music library in a project I am updating. The only example of using it this way is in the source code of the Beets command-line interface. That code is well written but does much more than I need, so I decided to create a simple example of using Beets in a 3rd party application.

The hardest part turned out to be determining how to create a proper configuration programmatically. The final code is short:

        config["import"]["autotag"] = False
        config["import"]["copy"] = False
        config["import"]["move"] = False
        config["import"]["write"] = False
        config["library"] = music_library_file_name
        config["threaded"] = True 

This will create a configuration that keeps the music files in place and does not attempt to autotag them.

Importing files requires subclassing importer.ImportSession. A simple importer that imports files without changing them is:

    class AutoImportSession(importer.ImportSession):
        "a minimal session class for importing that does not change files"

        def should_resume(self, path):
            return True

        def choose_match(self, task):
            return importer.action.ASIS

        def resolve_duplicate(self, task, found_duplicates):
            pass

        def choose_item(self, task):
            return importer.action.ASIS 

That’s the trickiest part of it. The full demo is:


# Copyright 2017, Kevin Dahlhausen
#
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.

from beets import config
from beets import importer
from beets.ui import _open_library

class Beets(object):
    """a minimal wrapper for using beets in a 3rd party application
       as a music library."""

    class AutoImportSession(importer.ImportSession):
        "a minimal session class for importing that does not change files"

        def should_resume(self, path):
            return True

        def choose_match(self, task):
            return importer.action.ASIS

        def resolve_duplicate(self, task, found_duplicates):
            pass

        def choose_item(self, task):
            return importer.action.ASIS

    def __init__(self, music_library_file_name):
        """ music_library_file_name = full path and name of
            music database to use """
        "configure to keep music in place and do not auto-tag"
        config["import"]["autotag"] = False
        config["import"]["copy"] = False
        config["import"]["move"] = False
        config["import"]["write"] = False
        config["library"] = music_library_file_name
        config["threaded"] = True

        # create/open the the beets library
        self.lib = _open_library(config)

    def import_files(self, list_of_paths):
        """import/reimport music from the list of paths.
            Note: This may need some kind of mutex as I
                  do not know the ramifications of calling
                  it a second time if there are background
                  import threads still running.
        """
        query = None
        loghandler = None  # or log.handlers[0]
        self.session = Beets.AutoImportSession(self.lib, loghandler,
                                               list_of_paths, query)
        self.session.run()

    def query(self, query=None):
        """return list of items from the music DB that match the given query"""
        return self.lib.items(query)

if __name__ == "__main__":

    import os

    # this demo places music.db in same lib as this file and
    # imports music from <this dir>/Music
    path_of_this_file = os.path.dirname(__file__)
    MUSIC_DIR = os.path.join(path_of_this_file, "Music")
    LIBRARY_FILE_NAME = os.path.join(path_of_this_file, "music.db")

    def print_items(items, description):
        print("Results when querying for "+description)
        for item in items:
            print("   Title: {} by '{}' ".format(item.title, item.artist))
            print("      genre: {}".format(item.genre))
            print("      length: {}".format(item.length))
            print("      path: {}".format(item.path))
        print("")

    demo = Beets(LIBRARY_FILE_NAME)

    # import music - this demo does not move, copy or tag the files
    demo.import_files([MUSIC_DIR, ])

    # sample queries:
    items = demo.query()
    print_items(items, "all items")

    items = demo.query(["artist:heart,", "title:Hold", ])
    print_items(items, 'artist="heart" or title contains "Hold"')

    items = demo.query(["genre:Hard Rock"])
    print_items(items, 'genre = Hard Rock') 

I hope this helps. It turns out to be easy to use Beets in other apps.

Full Stack Python: How to Add Hosted Monitoring to Flask Web Applications


How do you know whether your application is running properly with minimal errors after building and deploying it? The fastest and easiest way to monitor your operational Flask web application is to integrate one of the many available fantastic hosted monitoring tools.

In this post we will quickly add Rollbar monitoring to catch errors and visualize our application is running properly.

Our Tools

We can use either Python 2 or 3 to build this tutorial, but Python 3 is strongly recommended for all new applications. I used Python 3.6.2 to execute my code. We will also use the following application dependencies throughout the post:

  • Flask web framework, version 0.12.2
  • pyrollbar monitoring instrumentation library, version 0.13.12
  • blinker for signaling support in Flask applications so pyrollbar can report on all errors
  • A free Rollbar account where we will send error data and view it when it is captured
  • pip and the virtualenv virtual environment library, which come packaged with Python 3, to install and isolate the Flask and Rollbar libraries from other Python projects you are working on

If you need help getting your development environment configured before running this code, take a look at this guide for setting up Python 3 and Flask on Ubuntu 16.04 LTS.

All code in this blog post is available open source under the MIT license on GitHub under the monitor-flask-apps directory of the blog-code-examples repository. Use and abuse the source code as you desire for your own applications.

Installing Dependencies

Change into the directory where you keep your Python virtualenvs. Create a new virtual environment for this project using the following command.

python3 -m venv monitorflask

Activate the virtualenv.

source monitorflask/bin/activate

The command prompt will change after activating the virtualenv:

Activating our Python virtual environment on the command line.

Remember that you need to activate the virtualenv in every new terminal window where you want to use the virtualenv to run the project.

Flask, Rollbar and Blinker can now be installed into the now-activated virtualenv.

pip install flask==0.12.2 rollbar==0.13.12 blinker==1.4

Our required dependencies should be installed within our virtualenv after a short installation period. Look for output like the following to confirm everything worked.

Installing collected packages: blinker, itsdangerous, click, MarkupSafe, Jinja2, Werkzeug, Flask, idna, urllib3, chardet, certifi, requests, six, rollbar
  Running setup.py install for blinker ... done
  Running setup.py install for itsdangerous ... done
  Running setup.py install for MarkupSafe ... done
  Running setup.py install for rollbar ... done
Successfully installed Flask-0.12.2 Jinja2-2.9.6 MarkupSafe-1.0 Werkzeug-0.12.2 blinker-1.4 certifi-2017.4.17 chardet-3.0.4 click-6.7 idna-2.5 itsdangerous-0.24 requests-2.18.1 rollbar-0.13.12 six-1.10.0 urllib3-1.21.1

Now that we have our Python dependencies installed into our virtualenv we can create the initial version of our application.

Building Our Flask App

Create a folder for your project named monitor-flask-apps. Change into the folder and then create a file named app.py with the following code.

import re

from flask import Flask, render_template, Response
from werkzeug.exceptions import NotFound

app = Flask(__name__)
MIN_PAGE_NAME_LENGTH = 2


@app.route("/<string:page>/")
def show_page(page):
    try:
        valid_length = len(page) >= MIN_PAGE_NAME_LENGTH
        valid_name = re.match('^[a-z]+$', page.lower()) is not None
        if valid_length and valid_name:
            return render_template("{}.html".format(page))
        else:
            msg = "Sorry, couldn't find page with name {}".format(page)
            raise NotFound(msg)
    except:
        return Response("404 Not Found")


if __name__ == "__main__":
    app.run(debug=True)

The above application code has some standard Flask imports so we can create a Flask web app and render template files. We have a single function named show_page that serves a single Flask route. show_page checks whether the URL path contains only lowercase alphabetic characters for a potential page name. If the page name can be found in the templates folder then the page is rendered; otherwise an exception is thrown saying the page could not be found. We need to create at least one template file if our function is ever going to return a non-error response.

Save app.py and make a new subdirectory named templates under your project directory. Create a new file named battlegrounds.html and put the following Jinja2 template markup into it.

<!DOCTYPE html>
<html>
  <head>
    <title>You found the Battlegrounds GIF!</title>
  </head>
  <body>
    <h1>PUBG so good.</h1>
    <img src="https://media.giphy.com/media/3ohzdLMlhId2rJuLUQ/giphy.gif">
  </body>
</html>

The above Jinja2 template is basic HTML without any embedded template tags. The template creates a very plain page with a header description of "PUBG so good" and a GIF from this excellent computer game.

Time to run and test our code. Change into the base directory of your project, where the app.py file is located. Execute app.py using the python command as follows (make sure your virtualenv is still activated in the terminal where you are running this command):

python app.py

The Flask development server should start up and display a few lines of output.

Run the Flask development server locally.

What happens when we access the application running on localhost port 5000?

Testing our Flask application at the base URL receives an HTTP 404 error.

HTTP status 404 page not found, which is what we expected because we only defined a single route and it did not live at the base path.

We created a template named battlegrounds.html that should be accessible when we go to localhost:5000/battlegrounds/.

Testing our Flask application at /battlegrounds/ gets the proper template with a GIF.

The application successfully found the battlegrounds.html template but that is the only one available. What if we try localhost:5000/fullstackpython/?

If no template is found we receive a 500 error.

HTTP 500 error. That's no good.

The 404 and 500 errors are obvious to us right now because we are testing the application locally. However, what happens when the app is deployed and a user gets the error in their own web browser? They will typically quit out of frustration and you will never know what happened unless you add some error tracking and application monitoring.

We will now modify our code to add Rollbar to catch and report those errors that occur for our users.

Handling Errors

Head to Rollbar's homepage so we can add their hosted monitoring tools to our oft-erroring Flask app.

Rollbar homepage in the web browser.

Click the "Sign Up" button in the upper right-hand corner. Enter your email address, a username and the password you want on the sign up page.

Enter your basic account information on the sign up page.

After the sign up page you will see the onboarding flow where you can enter a project name and select a programming language. For project name enter "Battlegrounds" and select that you are monitoring a Python app.

Create a new project named 'Battlegrounds' and select Python as the programming language.

Press the "Continue" button at the bottom to move along. The next screen shows us a few quick instructions to add monitoring to our Flask application.

Set up your project using your server-side access token.

Let's modify our Flask application to test whether we can properly connect to Rollbar's service. Change app.py to include the following highlighted lines.

import os
import re

import rollbar
from flask import Flask, render_template, Response
from werkzeug.exceptions import NotFound


app = Flask(__name__)
MIN_PAGE_NAME_LENGTH = 2


@app.before_first_request
def add_monitoring():
    rollbar.init(os.environ.get('ROLLBAR_SECRET'))
    rollbar.report_message('Rollbar is configured correctly')


@app.route("/<string:page>/")
def show_page(page):
    try:
        valid_length = len(page) >= MIN_PAGE_NAME_LENGTH
        valid_name = re.match('^[a-z]+$', page.lower()) is not None
        if valid_length and valid_name:
            return render_template("{}.html".format(page))
        else:
            msg = "Sorry, couldn't find page with name {}".format(page)
            raise NotFound(msg)
    except:
        return Response("404 Not Found")


if __name__ == "__main__":
    app.run(debug=True)

We added a couple of new imports, os and rollbar. os allows us to grab environment variable values, such as our Rollbar secret key. rollbar is the library we installed earlier. The add_monitoring function below the Flask app instantiation initializes Rollbar using the Rollbar secret token and sends a message to the service that it started correctly. The @app.before_first_request decorator ensures this runs once, just before the first request is handled.

The ROLLBAR_SECRET token needs to be set in an environment variable. Save and quit app.py. Run export ROLLBAR_SECRET='token here' on the command line where your virtualenv is activated. This token can be found on the Rollbar onboarding screen.

I typically store all my environment variables in a file like template.env and invoke it from the terminal using the . ./template.env command. Make sure to avoid committing your secret tokens to a source control repository, especially if the repository is public!
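As a concrete sketch, a template.env file along those lines might look like the following (the token value is a placeholder, not a real secret):

```shell
# template.env -- keep this file out of source control (e.g. add it to .gitignore)
export ROLLBAR_SECRET='your-server-side-access-token-here'
```

Sourcing it with . ./template.env exports the variable into the current shell session, where os.environ.get('ROLLBAR_SECRET') can then read it.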

After exporting your ROLLBAR_SECRET key as an environment variable we can test that Rollbar is working as we run our application. Run it now using python:

python app.py

Back in your web browser press the "Done! Go to Dashboard" button. Don't worry about the "Report an Error" section code, we can get back to that in a moment.

If the event hasn't been reported yet we'll see a waiting screen like this one:

Waiting for data on the dashboard.

Once Flask starts up though, the first event will be populated on the dashboard.

First event populated on our dashboard for this project.

Okay, our first test event has been populated, but we really want to see all the errors from our application, not a test event.

Testing Error Handling

How do we make sure real errors are reported rather than just a simple test event? We just need to add a few more lines of code to our app.

import os
import re

import rollbar
import rollbar.contrib.flask
from flask import Flask, render_template, Response
from flask import got_request_exception
from werkzeug.exceptions import NotFound


app = Flask(__name__)
MIN_PAGE_NAME_LENGTH = 2


@app.before_first_request
def add_monitoring():
    rollbar.init(os.environ.get('ROLLBAR_SECRET'))
    # delete the next line if you don't want this event anymore
    rollbar.report_message('Rollbar is configured correctly')
    got_request_exception.connect(rollbar.contrib.flask.report_exception, app)


@app.route("/<string:page>/")
def show_page(page):
    try:
        valid_length = len(page) >= MIN_PAGE_NAME_LENGTH
        valid_name = re.match('^[a-z]+$', page.lower()) is not None
        if valid_length and valid_name:
            return render_template("{}.html".format(page))
        else:
            msg = "Sorry, couldn't find page with name {}".format(page)
            raise NotFound(msg)
    except:
        rollbar.report_exc_info()
        return Response("404 Not Found")


if __name__ == "__main__":
    app.run(debug=True)

The above highlighted code modifies the application so it reports all Flask errors as well as our HTTP 404 not found issues that happen within the show_page function.

Make sure your Flask development server is running and try to go to localhost:5000/b/. You will receive an HTTP 404 exception and it will be reported to Rollbar. Next go to localhost:5000/fullstackpython/ and an HTTP 500 error will occur.
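If you would rather exercise these routes without a browser, Flask's built-in test client can drive them from Python. The sketch below defines a minimal stand-in for our app (it returns plain text instead of rendering templates and does not report to Rollbar), so it is illustrative only:

```python
import re

from flask import Flask, Response

app = Flask(__name__)
MIN_PAGE_NAME_LENGTH = 2


@app.route("/<string:page>/")
def show_page(page):
    # Same validation as the real app; plain text instead of templates.
    valid_length = len(page) >= MIN_PAGE_NAME_LENGTH
    valid_name = re.match('^[a-z]+$', page.lower()) is not None
    if valid_length and valid_name:
        return Response("page: {}".format(page))
    return Response("404 Not Found")


client = app.test_client()
print(client.get("/battlegrounds/").status_code)   # 200
print(client.get("/b/").data)                      # b'404 Not Found'
```

Note that the "404 Not Found" body is returned with an HTTP 200 status, because the bare except swallows the NotFound exception; that is exactly the kind of quiet failure Rollbar helps surface.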

You should see an aggregation of errors as you test out these errors:

Rollbar dashboard showing aggregations of errors.

Woohoo, we finally have our Flask app reporting all errors that occur for any user back to the hosted Rollbar monitoring service!

What's Next?

We just learned how to catch and handle errors with Rollbar as a hosted monitoring platform in a simple Flask application. Next you will want to add monitoring to your more complicated web apps. You can also check out some of Rollbar's more advanced features in their documentation.

There is a lot more to learn about web development and deployments so keep learning by reading up on Flask and other web frameworks such as Django, Pyramid and Sanic. You can also learn more about integrating Rollbar with Python applications via their Python documentation.

Questions? Let me know via a GitHub issue ticket on the Full Stack Python repository, on Twitter @fullstackpython or @mattmakai.

See something wrong in this blog post? Fork this page's source on GitHub and submit a pull request with a fix.
