Channel: Planet Python

PyCoder’s Weekly: Issue #400 (Dec. 24, 2019)


#400 – DECEMBER 24, 2019
View in Browser »



Python Dictionary Iteration: Advanced Tips & Tricks

In this intermediate-level course, you’ll take a deep dive into how to iterate through a dictionary in Python. Dictionaries are a fundamental data structure, and you’ll be able to solve a wide variety of programming problems by iterating through them.
REAL PYTHON (video)

Introduction to ASGI: Emergence of an Async Python Web Ecosystem

“If you were thinking Python had been getting locked into data science, think again! Python web development is back with an async spin, and it’s exciting.” Great writeup!
FLORIMOND MANCA

Scout APM for Python


Check out Scout’s developer-friendly application performance monitoring solution for Python. Scout continually tracks down N+1 database queries, sources of memory bloat, performance abnormalities, and more. Get back to coding with Scout →
SCOUT APM (sponsor)

12 Trending Alternatives for Distributing Python Applications in 2020

A look at the various systems to choose from for packaging and distributing Python code.
CRISTIAN MEDINA • Shared by Cristian Medina

NumPy, SciPy, and Pandas: Correlation With Python

Learn what correlation is and how you can calculate it with Python. You’ll use SciPy, NumPy, and Pandas correlation methods to calculate three different correlation coefficients. You’ll also see how to visualize data, regression lines, and correlation matrices with Matplotlib.
REAL PYTHON

Python Performance Tips

Various tips and tricks that help improve the performance of your Python programs.
SKIP MONTANARO

Django Security Releases Issued: 3.0.1, 2.2.9, and 1.11.27

Addresses CVE-2019-19844: Potential account hijack via password reset form.
DJANGOPROJECT.COM

Python Jobs

Python Web Developer (Remote)

Premiere Digital

Senior Python Engineer (Munich, Germany)

Stylight GmbH

Software Engineer (Bristol, UK)

Envelop Risk

Python Contractor RaspPi/EPICS (Nelson, BC, Canada)

D-Pace Inc

More Python Jobs >>>

Articles & Tutorials

Running Python With Docker: How to Try the Latest CPython Release

Learn how to run different Python versions in Docker. By following the examples, you’ll see how you can play with the latest development version of Python, and how to use Dockerfiles to set up Python environments and package your own scripts.
REAL PYTHON

How to Make Python Wait

“For many types of applications, at times it is necessary to pause the running of the program until some external condition occurs. […] In this article I’m going to show you a few different ways to wait.”
MIGUEL GRINBERG

Python Tricks: A Buffet of Awesome Python Features


Discover Python’s best practices with simple examples and start writing even more beautiful + Pythonic code. “Python Tricks: The Book” shows you exactly how. You’ll master intermediate and advanced-level features in Python with practical examples and a clear narrative. Get the book + video bundle 33% off →
DAN BADER (sponsor)

Python’s Built In IDE Isn’t Just Sitting IDLE

An episode about the IDLE package built into Python and how it reduces the friction associated with learning to program by having an easy to use IDE out of the box.
PYTHONPODCAST.COM (podcast)

Creating Interactive Dashboards From Jupyter Notebooks

This article discusses how to build an interactive dashboard to analyze reddit content and display interactive graphs of the result using Voilà.
CHRIS MOFFITT

Prioritizing Simplicity in Your Python Code

This is part of a series about the Zen of Python. This article focuses on the third and fourth principles: simplicity and complexity.
MOSHE ZADKA

Working With Redis in Python With Django

This post introduces you to Redis as a key-value store and uses it in a Django project to explore its functionality.
ROBLEY GORI

Dependency Injection in Python With Pinject

Learn the basic principles of Dependency Injection and how to implement it in Python using the Pinject library.
PEPY.TECH • Shared by Petru Rares Sincraian

Precise Unit Tests With PyHamcrest

Hamcrest is a Python framework designed to make test assertions easier to write and more precise.
MOSHE ZADKA

Mocking Python Like a Boss

The Mock Generator is a library to simplify and shorten the time it takes to write Python mocks.
PETER KOGAN • Shared by Peter Kogan

Create a Simple Telegram Bot With Telethon

Quick tutorial for creating your own bot for the Telegram messenger.
MISHA BEHERSKY • Shared by Misha Behersky

Guide to Python Import Statements

How to resolve common importing problems in Python 2 and 3.
CHRIS YEH • Shared by Jonathan Willitts

Projects & Code

Events

Python Sheffield

December 31, 2019
GOOGLE.COM

STL Python

January 1, 2020
MEETUP.COM


Happy Pythoning!
This was PyCoder’s Weekly Issue #400.




Erik Marsja: Pipx: Installing, Uninstalling, & Upgrading Python Packages in Virtual Envs


The post Pipx: Installing, Uninstalling, & Upgrading Python Packages in Virtual Envs appeared first on Erik Marsja.

In this post, we will learn how to use pipx. Specifically, we will learn how to use pipx to install Python packages. We will learn how to install pipx, use pipx to install packages, how to run Python packages from a temporary environment, how to uninstall packages, and upgrade packages using pipx.

What is Pipx?

Pipx is a Python package management tool much like pip. Unlike pip, however, it is focused on installing and running Python applications. What is really neat about pipx is that it installs each Python package in its own isolated virtual environment (see also pipenv). It also lets us run the installed packages without activating the environment. This means that we can install multiple versions of a Python package and still have access to them. Moreover, pipx enables us to run a Python package from a temporary environment.

How to Install Pipx

Now, we need to have Python 3.6+, pip, and venv installed before installing pipx. To install pipx, we just run pip:

pip install --user pipx

How to Install a Python Package Using Pipx

In this section, we are going to learn how to install Python packages using pipx. More specifically, we are going to install the latest version of the command line tool dadjokes-cli.

Installing a Python Package with Pipx

Now, we are ready to install a package with pipx. It’s very easy and we just type the following command:

pipx install dadjokes-cli

Now that we have installed dadjokes-cli, we can run it from the terminal or the command prompt (Windows). On a Windows machine we type the following in the command prompt.

dadjoke.exe

Note, if we were running Linux, for example, we’d, of course, skip the “.exe”-part.

Pipx list: See all Installed Python Packages

Now, if we need to see all the installed Python packages (i.e., installed with pipx) we can type pipx list.

This will give us a list of all the Python packages we have installed using pipx.

Using Pipx to Run a Python App

Now, we cannot use pipx run with the dadjokes-cli app, but, as can be seen in the example from the pipx homepage, we can run a Python app from a temporary environment. Using that example, we run pycowsay: pipx run pycowsay.

Using pipx to run a Python app

How to Uninstall Python Packages with Pipx

Now, if we want to uninstall a Python package we have installed using Pipx we can just type pipx uninstall dadjokes-cli.

Furthermore, it is also possible to uninstall all packages by typing pipx uninstall-all.

Upgrading Python Packages with Pipx

Finally, we are going to learn how to upgrade packages we have installed using pipx. This is very simple. If we want to upgrade dadjokes-cli we can run the following command:

pipx upgrade dadjokes-cli

Of course, if we want to upgrade all Python packages we have installed we can just run pipx upgrade-all.

Conclusion

In this post, we have learned how to install Python packages using pipx. This tool makes it possible to install packages in isolated virtual environments. This way, we avoid breaking our system's Python setup, which can happen when installing packages directly with pip. Furthermore, we have also learned how to uninstall and upgrade packages using pipx.

The post Pipx: Installing, Uninstalling, & Upgrading Python Packages in Virtual Envs appeared first on Erik Marsja.

Andre Roberge: Xmas present from Thonny

Today, a new version (3.2.5) of Thonny has been released. It incorporates support for Friendly-traceback (which needs to be installed separately). Currently, the download link on Thonny's homepage still links to version 3.2.4. The latest version can be found on Github.

Thonny is a fantastic IDE for beginners, especially those learning in a classroom environment, as it offers many useful tools that teachers can use effectively to demonstrate programming concepts. Thonny is the work of Aivar Annamaa, who is apparently recognized as an excellent lecturer -- which does not surprise me given the thoughtful design of Thonny. He has been interviewed about Thonny on PythonPodcast.

Real Python has a brief course explaining how to use Thonny. While I understand the need for paywalled content on Real Python, I do find it unfortunate that most of the content about Thonny, a free IDE created by someone having no relationship to Real Python, is only available to those who pay a subscription. As I do not, I cannot vouch for the accuracy of the information given by Real Python about Thonny.


Artem Rys: Consider absl Python library to work with flags

John Cook: Calculating the period of Van der Pol oscillators


A few days ago I wrote about how to solve differential equations with SciPy's solve_ivp function, using Van der Pol's equation as the example. Van der Pol's equation is

\frac{d^2x}{dt^2} - \mu(1 - x^2)\frac{dx}{dt} + x = 0

The parameter μ controls the amount of nonlinear damping. For any initial condition, the solution approaches a periodic solution. The limiting periodic function does not depend on the initial condition [1] but does depend on μ. Here are the plots for μ = 0, 1, and 2 from the previous post.

Van der Pol oscillator solutions as a function of time

A couple of questions come to mind. First, how quickly do the solutions become periodic? Second, how does the period depend on μ? To address these questions, we'll use an optional argument to solve_ivp that we didn't need in the earlier post.

Using events in solve_ivp

For solve_ivp, an event is a function of the time t and the solution y whose roots the solver will report. To determine the period, we'll look at where the solution is zero; our event function is trivial since we want to find the roots of the solution itself.

Recall from the earlier post that we cast our second order ODE as a pair of first order ODEs, and so our solution is a vector: the function x and its derivative. So to find roots of the solution, we look at the first component of the solution vector. Here's our event function:

    def root(t, y): return y[0]

Let's set μ = 2 and find the zeros of the solution over the interval [0, 40], starting from the initial condition x(0) = 1, x′(0) = 0.

    mu = 2
    sol = solve_ivp(vdp, [0, 40], [1, 0], events=root)
    zeros = sol.t_events[0]

Here we reuse the vdp function from the previous post about the Van der Pol oscillator.

To estimate the period of the limit cycle we look at the spacing between zeros, and how that spacing is changing.

    spacing = zeros[1:] - zeros[:-1]
    deltas = spacing[1:] - spacing[:-1]

If we plot the deltas we see that the zero spacings quickly approach a constant value, the period of the limit cycle.

Van der pol period deltas
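Putting the pieces together, here is a self-contained sketch of the whole procedure. The vdp function is my reconstruction of the one from the earlier post (an assumption on my part), and I use μ = 0 as a sanity check: in that case the oscillator is a pure sinusoid with period 2π, and consecutive zeros are half a period apart.

```python
import numpy as np
from scipy.integrate import solve_ivp

mu = 0  # mu = 0 reduces Van der Pol to x'' + x = 0, whose period is 2*pi

def vdp(t, y):
    # y[0] is x, y[1] is dx/dt (reconstructed from the earlier post)
    return [y[1], mu * (1 - y[0]**2) * y[1] - y[0]]

def root(t, y):
    return y[0]

sol = solve_ivp(vdp, [0, 40], [1, 0], events=root)
zeros = sol.t_events[0]
spacing = zeros[1:] - zeros[:-1]

# consecutive zeros of the solution are half a period apart
period = 2 * spacing[-1]
print(period)  # close to 2*pi for mu = 0
```

For μ = 2 the same code (with mu changed) reproduces the zero spacings whose differences are plotted above.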

Theoretical results

If μ = 0 the Van der Pol oscillator reduces to a simple harmonic oscillator and the period is 2π. As μ increases, the period increases.

For relatively small μ we can calculate the period as above, but as μ increases this becomes more difficult numerically [2]. But one can easily show that the period is asymptotically

T ~ (3 − 2 log 2) μ

as μ goes to infinity. A more refined estimate due to Mary Cartwright is

T ~ (3 − 2 log 2) μ + 2π/μ^(1/3)

for large μ.


[1] There is a trivial solution, x = 0, corresponding to the initial conditions x(0) = x′(0) = 0. Otherwise, every set of initial conditions leads to a solution that converges to the periodic attractor.

[2] To see why large values of μ are a problem numerically, here’s a plot of a solution for μ = 100.

Solution to Van der Pol for large damping parameter mu

The solution is differentiable everywhere, but the derivative changes so abruptly at the maxima and minima that it is discontinuous for practical purposes.

Stack Abuse: Heap Sort in Python


Introduction

Heap Sort is another example of an efficient sorting algorithm. Its main advantage is that it has a great worst-case runtime of O(n*logn) regardless of the input data.

As the name suggests, Heap Sort relies heavily on the heap data structure - a common implementation of a Priority Queue.

Without a doubt, Heap Sort is one of the simplest sorting algorithms to implement, and coupled with the fact that it's fairly efficient compared to other simple implementations, it's a common one to encounter.

Heap Sort

Heap Sort works by "removing" elements from the heap part of the array one-by-one and adding them to the sorted part of the array. Before we get further into the explanation and revisit the heap data structure, we should mention a few attributes of Heap Sort itself.

It is an in-place algorithm, meaning that it requires a constant amount of additional memory, i.e. the memory needed doesn't depend on the size of the initial array itself, other than the memory needed to store that array.

For example, no copies of the original array are necessary, and there is no recursion and recursive call stacks. The simplest implementation of Heap Sort usually uses a second array to store the sorted values. We will be using this approach since it's a lot more intuitive and easy to follow in code, but it can be implemented completely in-place.

Heap Sort is unstable, meaning that it does not maintain the relative order of elements with equal values. This isn't an issue with primitive types (like integers and characters...) but it can be a problem when we sort complex types, like objects.

For example, imagine we have a custom class Person with the age and name fields, and several objects of that class in an array, including a person called "Mike" aged 19 and "David", also aged 19 - appearing in that order.

If we decided to sort that array of people by age, there would be no guarantee that "Mike" would appear before "David" in the sorted array, even though they appeared in that order in the initial array. It can happen, but it's not guaranteed.

Fun fact: Heap Sort is the sorting algorithm of choice in the Linux kernel.

The Heap Data Structure

Heaps are one of the most popular and heavily used data structures in computer science - not to mention very popular during Software Engineering interviews.

We'll talk of heaps keeping track of the smallest element (min-heap), but they can just as easily be implemented to keep track of the largest element (max-heap).

Simply put, a min-heap is a tree-based data structure in which every node is smaller than all of its children. Most often a binary tree is used. Heaps have three supported operations - delete_minimum(), get_minimum(), and add().

You can only delete the first element in the heap, after which it's "re-sorted". Heaps "re-sort" themselves after an element is added or removed, so that the smallest element is always in the first position.

Note: This in no way means that heaps are sorted arrays. The fact that every node is smaller than its children isn't enough to guarantee that the whole heap is in ascending order.

Let's look at an example of a heap:

[Image: an example min-heap]

As we can see, the above example does fit the description of a heap but is not sorted. We won't go into details of the heap implementation since that is not the focus of this article. The crucial advantage of the heap data structure we leverage when using it in Heap Sort is that the next smallest element is always the first element in the heap.

Note: Thanks to the way heaps reorganize themselves after an element is removed, moving the next smallest element to the first position, while keeping the structure a valid heap, takes O(logn) time, which makes removal a highly efficient operation.
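Python's heapq module makes the "heap, but not sorted" distinction easy to see for yourself. After heapify(), the list satisfies the heap property and the smallest element sits at index 0, yet the list as a whole is generally not in ascending order:

```python
import heapq

data = [5, 1, 8, 3, 2]
heapq.heapify(data)  # rearranges the list in place into a min-heap

print(data[0])  # 1 -- the smallest element is always first

# every parent is <= both of its children (the heap property)...
is_heap = all(
    data[i] <= data[c]
    for i in range(len(data))
    for c in (2 * i + 1, 2 * i + 2)
    if c < len(data)
)
print(is_heap)                       # True
print(data == sorted(data))          # usually False -- a heap isn't a sorted list
```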

Implementation

Sorting Arrays

Python provides methods for creating and using heaps so we don't have to implement them ourselves:

  • heappush(list, item): Adds an element to the heap, and re-sorts it afterward so that it remains a heap. Can be used on an empty list.
  • heappop(list): Pops (removes) the first (smallest) element and returns that element. The heap remains a heap after this operation, so we don't have to call heapify().
  • heapify(list): Turns the given list into a heap in place. It's worth knowing this method exists, but we won't use it here since we don't want to change our original array.

Now that we know this, the implementation for Heap Sort is fairly straight-forward:

from heapq import heappop, heappush

def heap_sort(array):
    heap = []
    for element in array:
        heappush(heap, element)

    ordered = []

    # While we have elements left in the heap
    while heap:
        ordered.append(heappop(heap))

    return ordered

array = [13, 21, 15, 5, 26, 4, 17, 18, 24, 2]
print(heap_sort(array))

Output:

[2, 4, 5, 13, 15, 17, 18, 21, 24, 26]

As we can see, the heavy lifting is done with the heap data structure, all we have to do is add all the elements we need and remove them one by one. It's almost like a coin counting machine that sorts the inputted coins by their value and we can take them out afterwards.

Sorting Custom Objects

Things get a little more complicated when using custom classes. Usually, we advise against overriding comparison operators in classes just so our sorting algorithms can handle them, and instead suggest rewriting the algorithm to take a lambda comparator.

However, since our implementation relies on the built-in heap methods, we can't do that here.

Python does provide the following methods:

  • heapq.nlargest(n, iterable, key=None): Returns a list with the n largest elements from the dataset defined by iterable.
  • heapq.nsmallest(n, iterable, key=None): Returns a list with the n smallest elements from the dataset defined by iterable.

We could use these to get the n = len(array) largest/smallest elements, but the methods themselves do not use Heap Sort and are essentially equivalent to just calling the sorted() method.

The only solution we have left for custom classes is to actually override the comparison operators. This sadly limits us to only one type of comparison per class. In our example it limits us to sorting Movie objects by year.
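One common workaround, not used in this article but worth knowing, is to push (key, index, object) tuples onto the heap instead. The index breaks ties, so the objects themselves are never compared and no operators need to be overridden; the key function below is a stand-in for any attribute you want to sort by:

```python
from heapq import heappop, heappush

def heap_sort_by(array, key):
    heap = []
    for index, element in enumerate(array):
        # the index breaks ties, so `element` itself is never compared
        heappush(heap, (key(element), index, element))

    ordered = []
    while heap:
        ordered.append(heappop(heap)[2])
    return ordered

words = ["bbb", "a", "cc"]
print(heap_sort_by(words, key=len))  # ['a', 'cc', 'bbb']
```

As a bonus, the index tie-break makes this variant stable. For this article, though, we'll stick with the operator-overriding approach.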

However, it does let us demonstrate using Heap Sort on custom classes. Let's go ahead and define the Movie class:

from heapq import heappop, heappush

class Movie:
    def __init__(self, title, year):
        self.title = title
        self.year = year

    def __str__(self):
        return str.format("Title: {}, Year: {}", self.title, self.year)

    def __lt__(self, other):
        return self.year < other.year

    def __gt__(self, other):
        return other.__lt__(self)

    def __eq__(self, other):
        return self.year == other.year

    def __ne__(self, other):
        return not self.__eq__(other)

Our heap_sort() function itself needs no changes, since heappush() and heappop() rely on the comparison operators we just defined:

def heap_sort(array):
    heap = []
    for element in array:
        heappush(heap, element)

    ordered = []

    while heap:
        ordered.append(heappop(heap))

    return ordered

And finally, let's instantiate a few movies, put them in an array, and then sort them:

movie1 = Movie("Citizen Kane", 1941)
movie2 = Movie("Back to the Future", 1985)
movie3 = Movie("Forrest Gump", 1994)
movie4 = Movie("The Silence of the Lambs", 1991)
movie5 = Movie("Gia", 1998)

array = [movie1, movie2, movie3, movie4, movie5]

for movie in heap_sort(array):
    print(movie)

Output:

Title: Citizen Kane, Year: 1941
Title: Back to the Future, Year: 1985
Title: The Silence of the Lambs, Year: 1991
Title: Forrest Gump, Year: 1994
Title: Gia, Year: 1998

Comparison to Other Sorting Algorithms

One of the main reasons Heap Sort is still used fairly often, even though it's often outperformed by a well-implemented Quick Sort, is its reliability.

Heap Sort's main advantages here are its O(n*logn) worst-case upper bound on running time and its resistance to maliciously crafted input. Linux kernel developers give the following reasoning for using Heap Sort over Quick Sort:

Sorting time of Heap Sort is O(n*logn) both on average and worst-case. While qsort is about 20% faster on average, it suffers from an exploitable O(n*n) worst-case behavior and extra memory requirements that make it less suitable for kernel use.

Furthermore, Quick Sort behaves poorly on predictable inputs and, given enough knowledge of the internal implementation, could create a security risk (mainly denial-of-service attacks), since the bad O(n*n) behavior could easily be triggered.

Another algorithm that Heap Sort is often compared to is Merge Sort, which has the same time complexity.

Merge Sort has the advantage of being stable and intuitively parallelizable, while Heap Sort is neither.

Another note is that Heap Sort is slower than Merge Sort in most cases, even though they have the same complexity, since Heap Sort has larger constant factors.

Heap Sort can, however, be implemented much more easily in-place than Merge Sort can, so it's preferred when memory is a more important factor than speed.

Conclusion

As we saw, Heap Sort isn't as popular as other efficient, general-purpose algorithms, but its predictable behavior (other than being unstable) makes it a great algorithm to use where memory and security are more important than slightly faster run-time.

It's really intuitive to implement and leveraging the built-in functionality provided with Python, all we essentially have to do is put the items in a heap and take them out - similar to a coin counter.

Python Data: Market Basket Analysis with Python and Pandas


If you’ve ever worked with retail data, you’ll most likely have run across the need to perform some market basket analysis (also called Cross-Sell recommendations).  If you aren’t sure what market basket analysis is, I’ve provided a quick overview below.

What is Market Basket Analysis?

In the simplest of terms, market basket analysis looks at retail sales data and determines what products are purchased together. For example, if you sell widgets and want to be able to recommend similar products and/or products that are purchased together, you can perform this type of analysis to be able to understand what products should be recommended when a user views a widget.

You can think of this type of analysis as generating the following ‘rules’:

  • If widget A, then recommend widget B, C and F
  • If widget L, then recommend widget X, Y and R

With these rules, you can then build recommendation engines for your website, store, and salespeople to use when selling products to customers. Market Basket Analysis requires a large amount of transaction data to work well. If you have a large amount of transactional data, you should be able to run a market basket analysis with ease. If you want to learn more about Market Basket Analysis, here's some additional reading.

In the remainder of this article, I show you how to do this type of analysis using python and pandas.

Market Basket Analysis with Python and Pandas

There are a few approaches that you can take for this type of analysis.  You can use a pre-built library like MLxtend or you can build your own algorithm. I prefer the MLxtend library myself, but recently there’s been some memory issues using pandas and large datasets with MLxtend, so there have been times that I’ve needed to roll my own.

Below, I provide an example of using MLxtend as well as an example of how to roll your own analysis.

Market Basket Analysis with MLxtend

For this example, we'll use the data set found here. This data-set contains enough data to be useful in understanding market basket analysis, but isn't so large that we can't unstack the data, which is required to use MLxtend.

To get started, you’ll need to have pandas and MLxtend installed:

pip install pandas mlxtend

Then, import your libraries:

import pandas as pd

from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

Now, let's read in the data and then drop any rows that don't have an invoice number. Lastly, we'll convert the InvoiceNo column to a string. NOTE: I downloaded the data file from here and stored it in a subdirectory named data.

df = pd.read_excel('data/Online Retail.xlsx')
df.dropna(axis=0, subset=['InvoiceNo'], inplace=True)
df['InvoiceNo'] = df['InvoiceNo'].astype('str')

In this data, there are some invoices that are 'credits' instead of 'debits', so we want to remove those. They are identified with "C" in the InvoiceNo field. We can see an example of these types of invoices with the following:

df[df.InvoiceNo.str.contains('C', na=False)].head()

To remove these credit invoices, we can find all invoices with ‘C’ in them, and take the inverse of the results. That can be accomplished with the following line of code:

df = df[~df['InvoiceNo'].str.contains('C')]

Now, we are ready to start our market basket analysis. First, we'll group by the columns that we want to consider. For the purposes of this analysis, we'll only look at the United Kingdom orders.

market_basket = df[df['Country'] =="United Kingdom"].groupby(
                ['InvoiceNo', 'Description'])['Quantity']

Next, we want to one-hot encode the data and get one transaction per row to prepare to run our MLxtend analysis.

market_basket = market_basket.sum().unstack().reset_index().fillna(0).set_index('InvoiceNo')

Let’s take a look at the output:

market_basket.head()

market basket analysis example

Looks like a bunch of zeros. What good is that? Well… it's exactly what we want to see. We've encoded our data to show when a product is sold with another product. If there is a zero, those products haven't sold together. Before we continue, we want to convert all of our numbers to either a 1 or a 0 (negative numbers are converted to zero, positive numbers are converted to 1). We can do this encoding step with the following function:

def encode_data(datapoint):
    if datapoint <= 0:
        return 0
    if datapoint >= 1:
        return 1

And now, we do our final encoding step:

market_basket = market_basket.applymap(encode_data)

Now, let's find out which items are frequently purchased together. We do this by applying the MLxtend apriori function to our dataset.

There's one thing we need to think about first: the apriori function requires us to provide a minimum level of 'support'. Support is defined as the percentage of time that an itemset appears in the dataset. If you set support = 50%, you'll only get itemsets that appear 50% of the time. I like to set support to around 5% when starting out, to be able to see some data/results, and then adjust from there. Setting the support level too high could lead to very few (or no) results, and setting it too low could require an enormous amount of memory to process the data.
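To make the definition concrete, support can be computed directly from a one-hot basket matrix like the one we just built. This uses hypothetical toy data, not the retail dataset:

```python
import pandas as pd

# rows are invoices, columns are products; 1 means the product is on the invoice
baskets = pd.DataFrame({
    'bread':  [1, 1, 0, 1],
    'butter': [1, 0, 0, 1],
    'jam':    [0, 0, 1, 0],
})

support = baskets.mean()  # fraction of baskets containing each item
print(support['bread'])   # 0.75 -- 'bread' appears in 3 of the 4 baskets
```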

In the case of this data, I originally set the min_support to 0.05 but didn’t receive any results, so I changed it to 0.03.

itemsets = apriori(market_basket, min_support=0.03, use_colnames=True)

The final step is to build your association rules using the mlxtend association_rules function. You can set the metric that you are most interested in (either lift or confidence) and set the minimum threshold for that metric (called min_threshold). The min_threshold can be thought of as the level of confidence percentage that you want to return. For example, if you set min_threshold to 1, you will only see rules with 100% confidence. I usually set this to 0.7 to start with.
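For intuition, both metrics are simple ratios of support values: confidence(A→C) = support(A∪C)/support(A), and lift(A→C) = support(A∪C)/(support(A)·support(C)). With hypothetical toy numbers (not from this dataset):

```python
# toy support values, purely illustrative
support_A  = 0.4   # antecedent appears in 40% of baskets
support_C  = 0.5   # consequent appears in 50% of baskets
support_AC = 0.3   # both appear together in 30% of baskets

confidence = support_AC / support_A            # ~0.75
lift = support_AC / (support_A * support_C)    # ~1.5; lift > 1 means A and C co-occur more than chance

print(confidence, lift)
```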

rules = association_rules(itemsets, metric="lift", min_threshold=0.5)

With this, we generate 16 rules for our market basket analysis.

MLxtend rules for market basket analysis

This gives us a good number of data points to look at for this analysis. Now, what does this tell us?

If you look in the antecedents column and the consequents column, you'll see names of products. Each rule tells us that the antecedents are sold along with the consequents. You can use this information to build a cross-sell recommendation system that promotes these products with each other on your website (or in person when doing in-person sales).

Without knowing much more about the business that generated this data, we can’t really do much more with it. If you were using your own data, you’d be able to dig a bit deeper to find those rules with higher confidence and/or lift to help you understand the items that are sold together most often and start building strategies to promote those items (or other items if you are trying to grow sales in other areas of your business).

When can you not use MLxtend?

MLxtend can be used anytime you want and it is my preferred approach for market basket analysis. That said, there’s an issue (as of the date of this article) with using pandas with large datasets when performing the step of unstacking the data with this line:

market_basket = market_basket.sum().unstack().reset_index().fillna(0).set_index('InvoiceNo')

You can see the issue here.

When you run across this issue, you'll need to find another approach to running a market basket analysis. You can probably find ways to work around the pandas unstack problem, but what I've done recently is just roll my own analysis (it's actually pretty simple to do). That's what I'll show you below.

To get started, we need to import a few more libraries:

from itertools import combinations, groupby
from collections import Counter

Let's use our original dataframe and assign it to a new df so we know we are working with a completely new data-set vs the above. We'll use the same United Kingdom filter that we did before.

df_manual = df[df['Country'] =="United Kingdom"]

Now, let's grab just the order data. For this, we'll get the InvoiceNo and StockCode columns, since all we care about is whether an item exists on an invoice. Remember, we've already removed the 'credit' invoices in the above steps, so all we have are regular invoices. NOTE: There *will* be differences in the output of this approach vs MLxtend's approach, just like there will be differences in other approaches you might use for market basket analysis.

orders = df_manual.set_index('InvoiceNo')['StockCode']

Now that we have a pandas series of items, let's calculate the item frequency and support values.

statistics = orders.value_counts().to_frame("frequency")
statistics['support']  = statistics / len(set(orders.index)) * 100

Let's filter out any rows of data that don't have support above our min_support level.

min_support=0.03 # same value we used above.

items_above_support = statistics[statistics['support'] >= min_support].index
orders_above_support = orders[orders.isin(items_above_support)]

We next need to filter out orders that only had one item on the invoice, since those items won't provide any insight into our market basket analysis.

order_counts = orders_above_support.index.value_counts()
orders_over_two_index = order_counts[order_counts>=2].index
orders_over_two = orders_above_support[orders_above_support.index.isin(orders_over_two_index)]

Now, let’s calculate our stats dataframe again with this new order data-set.

statistics = orders_over_two.value_counts().to_frame("frequency")
statistics['support'] = statistics['frequency'] / len(set(orders_over_two.index)) * 100

Time to do the fun stuff: calculating the itemsets / item pairs. We’ll create a generator function that yields our itemsets and then send our new order data-set through it. Then, we calculate the frequency of each item pair (named frequencyAC) as well as its support (named supportAC). Finally, we filter out the itemsets that fall below our min_support level.

def itemset_generator(orders):
    # Convert the series into raw (InvoiceNo, StockCode) rows
    orders = orders.reset_index().values
    # Group consecutive rows by invoice, then yield every 2-item combination
    for order_id, order_object in groupby(orders, lambda x: x[0]):
        item_list = [item[1] for item in order_object]
        for item_pair in combinations(item_list, 2):
            yield item_pair

itemsets_gen = itemset_generator(orders_over_two)
itemsets  = pd.Series(Counter(itemsets_gen)).to_frame("frequencyAC")
itemsets['supportAC'] = itemsets['frequencyAC'] / len(orders_over_two_index) * 100
itemsets = itemsets[itemsets['supportAC'] >= min_support]

Finally, we can calculate our association rules. First, let’s move the itemset pairs out of the index and create the necessary data columns for support, lift, etc.

# Create table of association rules and compute relevant metrics
itemsets = itemsets.reset_index().rename(columns={'level_0': 'antecedents', 'level_1': 'consequents'})

itemsets = (itemsets
     .merge(statistics.rename(columns={'frequency': 'freqA', 'support': 'antecedent support'}), left_on='antecedents', right_index=True)
     .merge(statistics.rename(columns={'frequency': 'freqC', 'support': 'consequents support'}), left_on='consequents', right_index=True))


itemsets['confidenceAtoC'] = itemsets['supportAC'] / itemsets['antecedent support']
itemsets['confidenceCtoA'] = itemsets['supportAC'] / itemsets['consequents support']
itemsets['lift'] = itemsets['supportAC'] / (itemsets['antecedent support'] * itemsets['consequents support'])  # note: supports here are percentages, so this lift is scaled differently than MLxtend's

itemsets = itemsets[['antecedents', 'consequents', 'antecedent support', 'consequents support', 'confidenceAtoC', 'lift']]

Finally, let’s look at our final rules. We want to look at only those items that have confidence > 0.5.

rules = itemsets
rules_over_50 = rules[rules.confidenceAtoC > 0.50].copy()
rules_over_50 = rules_over_50.sort_values('lift', ascending=False)

Looking at the rules_over_50 data, we see our final set of rules using our ‘roll your own’ approach.

(Image: the final set of association rules for the market basket analysis.)

These rules are going to be a bit different than what we get with MLxtend, but that’s OK as it gives us another set of data to look at – and the only set of data to look at when your data is too large to use MLxtend. One extension to this approach would be to add a step to replace the StockCode numbers with the item descriptions. I’ll leave it to you to do that work.
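If you want a head start on that extension, here is a small sketch of one way to do it. It assumes the filtered dataframe (df_manual in the walk-through above) has a 'Description' column alongside 'StockCode', as the Online Retail data does; the helper name is my own, not part of the original post.

```python
import pandas as pd

def add_item_names(rules, items):
    """Replace StockCode values in a rules table with item descriptions.

    `items` must have 'StockCode' and 'Description' columns; the first
    description seen for each code is used as its name.
    """
    lookup = (items.dropna(subset=['Description'])
                   .drop_duplicates('StockCode')
                   .set_index('StockCode')['Description'])
    named = rules.copy()
    named['antecedents'] = named['antecedents'].map(lookup)
    named['consequents'] = named['consequents'].map(lookup)
    return named

# Intended usage with the objects from the walk-through:
# rules_named = add_item_names(rules_over_50, df_manual)
```

Because the mapping is done on a copy, the original rules table keeps its raw StockCode values in case you still need them.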

The post Market Basket Analysis with Python and Pandas appeared first on Python Data.

Codementor: Decision Tree: Knowing The Every Possible Output


Codementor: Python For Finance(Beginner): See Behind the FX rate


Python Data: Python Data Weekly Roundup – Dec 27 2019


In this week’s Python Data Weekly Roundup:

Picks On AI Trends from Data Natives 2019

This article provides a good overview of the Data Natives 2019 – Europe meeting and the main trends being discussed for 2020 and beyond. For example, talks on topics such as “AI and its use in Healthcare” and “AI and Ethics” looked promising.

Ray for the Curious

An excellent review of “Ray”, a distributed computing system for Python. According to its documentation, Ray:

is an open-source system for scaling Python applications from single machines to large clusters. Its design is driven by the unique needs of next-generation ML/AI systems, which face several unique challenges, including diverse computational patterns, management of distributed, evolving state, and the desire to address all those needs with minimal programming effort.

Develop an Intuition for Severely Skewed Class Distributions

As always, Jason Brownlee does a great job explaining how to begin building an intuition for identifying imbalanced and skewed distributions – and how to handle / manage those distributions. One of the most difficult things to do in data science / machine learning is to understand and manage data with different distributions. You can’t always apply a model to a data-set because the distribution of that data makes the model invalid.

Scatter Plot of Binary Classification Dataset With A 1 to 10 Class Distribution – from here.

Seven differences between academia and industry for building machine learning and deep learning models

It should be no surprise that academia and industry approach data science and machine learning differently. In this article, some of those differences are described; they include accuracy, training vs. production, engineering focus (e.g., end-to-end pipeline development), and more.

Hidden Technical Debt in Machine Learning Systems  — PDF

A very good paper describing the challenges of technical debt with machine learning systems. The abstract:

Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems. We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns.

Market Basket Analysis with Python and Pandas

A recent post I wrote describing how to perform market basket analysis using python and pandas.  I provide a walk-through of using MLxtend’s apriori function as well as a ‘roll your own’ approach to market basket analysis.

The post Python Data Weekly Roundup – Dec 27 2019 appeared first on Python Data.

Talk Python to Me: #244 Top 10 Real Python Articles of 2019

We've come to the end of 2019. Python 2 has just a handful of days before it goes unsupported. And I've met up with Dan Bader from RealPython.com to look back at the year of Python articles on his website. We dive into the details behind 10 of his most important articles from the past year.

Weekly Python StackOverflow Report: (ccviii) stackoverflow python report


Armin Ronacher: Open Source Migrates With Emotional Distress


Legacy code is bad and if you keep using it, it's really your own fault. There are many variations of the same thing floating around in Open Source communities and it always comes down to the same thing: at one point something is being declared old and it has to be replaced by something newer which is better. That better typically has some really good arguments on its side: we learned from our mistakes, it was wrong to begin with or something along the lines of it being impure or that it propagated bad ideas. Maybe that new thing only supports the newest TLS/SSL and you really should no longer be using the old versions because they are insecure.

Some communities as a whole for instance are suffering from this a whole lot. Every few years a library or the entire ecosystem of that community is thrown away and replaced by something new and support for the old one ends abruptly and arbitrarily. This has happened to the packaging ecosystem, the interpreter itself, modules in the standard library etc. How well this works out depends. Zope for instance never really recovered from its Zope 2 / Zope 3 split. Perl didn't manage its 5 / 6 split either. Both of those projects ended up with two communities as a result.

Many open source communities behave exactly the same way: they are replacing something with something else without a clear migration path. However some communities manage to survive some transitions like this.

This largely works because the way open source communities are managing migrations is by cheating and the currency of payment is emotional distress. Since typically money is not involved (at least not in the sense that a user would pay for the product directly) there is no obvious monetary impact of people not migrating. So if you cause friction in the migration process it won't hurt you as a library maintainer. If anything the churn of some users might actually be better in the long run because the ones that don't migrate are likely also some of the ones that are the most annoying in the issue tracker. In fact Open Source ecosystems manage these migrations largely by trading their general clout for support of a large part of their user base to become proponents for a migration to the updated ecosystems. Open Source projects nowadays often measure their popularity through some package download counts, Github stars or other indicators. All of these are trending upwards generally and it takes a really long time for projects to lose traction because all the users count against it, even the ones that are migrating off frustratedly.

The cheat is to convince the community as a whole that the migration is very much worth it. However the under-delivery to what is promised then sets up the community for another one of these experiences later. I have seen how GTK migrated from 1, to 2 and then later to 3. At any point it was painful and when most apps finally were on the same version, the next big breaking change was coming up.

Since the migration causes a lot of emotional distress, the cheat is carried happily by the entire community. The big Python 3 migration is a good example of this: A lot of users of the language started a community effort to force participants in the ecosystem to migrate. Suffering together does not feel as bad, and putting yourself on the moral right side (the one that migrates vs the ones that are holding off) helps even more. That Python 3 effort was less based on reasonable arguments but on emotions. While the core of the argument was correct and a lot of stuff was better on Python 3, it took many iterations not to regress in many other aspects. Yet websites were started like a big "wall of shame" for libraries that did not undergo the migration yet. The community is very good at pushing through even the most controversial of changes. This tour de force then became something of a defining characteristic of the community.

A big reason why this all happens in the first place is because as an Open Source maintainer the standard response which works against almost all forms of criticism is “I'm not paid for this and I no longer want to maintain the old version of X”. And in fact this is a pretty good argument because it's both true, and very few projects actually are large enough that a fork by some third party would actually survive. Python for instance currently has a fork of 2.7 called Tauthon which got very little traction.

There are projects which are clearly managing such forceful transitions, but I think what is often forgotten is that with that transition many people leave the community who do not want to participate in it or can't. Very often a backwards incompatible replacement without a clear migration path might be able to guide the majority of people, but it will lose many on the fringes, and those people might be a worthwhile investment in the future. For a start such a reckless deprecation path will likely alienate commercial users. That might be fine for a project (since many are non profit efforts in the first place) and very successful projects will likely still retain a lot of commercial users but with that user base reduced there will be reduced investments by those too.

I honestly believe a lot of Open Source projects would have an easier time existing if they would acknowledge that these painful migrations are painful for everybody involved. Writing a new version that fixes all known issues might be fun for a developer in the first place, but if they then need to spend their mental and emotional capacity to convince their user base that migrating is worth the effort it takes out all the enjoyment in the process. I have been a part of the Python 3 migration and I can tell you that it sucked out all my enjoyment of being a part of that community. No matter on which side you were during that migration I heard very little positive about that experience.

Setting good migration paths rewards you and there are many projects to learn from for how to manage this. It's lovely as a user to be able to upgrade to a new version of a project and the upgrade is smooth. Not only that, it also encourages me as a user to give back valuable contributions because there is a high chance that I can use it without having to be afraid that upgrading is going to break all my stuff.

It's also important to realize that many projects outside the Open Source world just do not have the luxury to break backwards compatibility this easily. Especially when you work in an environment where hundreds of systems have to be interoperable migrations are really hard and you sometimes have to make decisions which seem bad. The open source community was much quicker in dropping support for older TLS standards than many others because they did not have to live with the consequences of that change really as they force everybody to upgrade. That's just not always possible for everybody else at the speeds envisioned.

I'm writing this because we're a few days away from the end of life of Python 2, at which point the community is also going to stop maintaining a lot of valuable tools like pytest, pip[1] and others for Python 2. Yet the user base of the language has only migrated to ~50%. My own libraries, which are now maintained by the pallets community, are joining in on this, something I can understand but don't agree with. I really wish the Python community all the best but I hope that someone does a post-mortem on all of this, because there are lots of things to be learned from all of this.

[1]it has correctly been pointed out that pip is not deprecating Python 2 support any time soon.

Catalin George Festila: Python 3.7.5 : Fix to python language the GitHub project.

I created a GitHub project with Django and saw that it was detected as the Tcl programming language. To fix this, you need to create a file named .gitattributes in the root folder of the repository. Add these two lines to tell GitHub it is a Python project: * linguist-vendored and *.py linguist-vendored=false. Now the project will be tagged with the Python language.
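Spelled out, the two-line .gitattributes file described above looks like this (each rule goes on its own line):

```
* linguist-vendored
*.py linguist-vendored=false
```

The first line marks everything as vendored (excluded from language statistics), and the second re-enables the Python files so Linguist counts only them.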

Erik Marsja: How to use Pandas get_dummies to Create Dummy Variables in Python


The post How to use Pandas get_dummies to Create Dummy Variables in Python appeared first on Erik Marsja.

In this post, we will learn how to use Pandas get_dummies() method to create dummy variables in Python. Dummy variables (or binary/indicator variables) are often used in statistical analyses as well as in more simple descriptive statistics.

Dummy Coding for Regression Analysis

One statistical analysis in which we may need to create dummy variables is regression analysis. In fact, regression analysis requires numerical variables, and this means that when we, whether doing research or just analyzing data, wish to include a categorical variable in a regression model, supplementary steps are required to make the results interpretable.

(Image: three dummy coded variables in Python.)

In these steps, categorical variables in the data set are recoded into a set of separate binary variables (dummy variables). Furthermore, this re-coding is called “dummy coding” and involves the creation of a table called contrast matrix. Dummy coding can be done automatically by statistical software, such as R, SPSS, or Python.

What is Categorical Data?

In this section of the creating-dummy-variables-in-Python guide, we are going to answer the question of what categorical data is. In statistics, a categorical variable (also known as a factor or qualitative variable) is a variable that takes on one of a limited, and most commonly a fixed, number of possible values. Furthermore, these variables typically assign each individual, or other unit of observation, to a particular group or nominal category. For example, gender is a categorical variable.

What is a Dummy Variable?

Now, the next question we are going to answer before working with Pandas get_dummies is “what is a dummy variable?”. Typically, a dummy variable (or column) is one which has a value of one (1) when a categorical event occurs (e.g., an individual is male) and zero (0) when it doesn’t occur (e.g., an individual is female).
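As a quick illustration of that definition (using a made-up four-row gender column), get_dummies() turns each category level into its own indicator column:

```python
import pandas as pd

# Toy example: a categorical variable with two levels.
gender = pd.Series(['male', 'female', 'female', 'male'], name='gender')

# One indicator column per level; a 1 marks the rows where that level occurs.
dummies = pd.get_dummies(gender)
print(dummies)
```

Each row ends up with exactly one 1 across the female/male columns, which is what makes the coding usable in a regression model.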

Installing Pandas

Obviously, we need to have Pandas installed to use the get_dummies() method. Pandas can be installed using pip or conda, for instance. If we want to install Pandas using conda, we type conda install pandas. On the other hand, if we want to use pip, we type pip install pandas. Note, it is typically suggested that Python packages are installed in virtual environments. Pipx can be used to install Python applications directly in virtual environments, and if we want to install, update, and use Python packages we can, as in this post, use conda or pip.

Example Data to Dummy Code

In this Pandas dummies tutorial, we will use the Salaries dataset, which contains the 2008-09 nine-month academic salary for Assistant Professors, Associate Professors, and Professors in a college in the U.S.

Import Data in Python using Pandas

Now, before we start using Pandas get_dummies() method, we need to load pandas and import the data.

import pandas as pd

data_url = 'http://vincentarelbundock.github.io/Rdatasets/csv/carData/Salaries.csv'
df = pd.read_csv(data_url, index_col=0)

df.head()
(Image: the Pandas dataframe to dummy code.)

Of course, data can be stored in multiple different file types. For instance, we could have our data stored in .xlsx, SPSS, SAS, or STATA files. See the following tutorials to learn more about importing data from different file types:

Creating Dummy Variables in Python

In this section, we are going to use pandas get_dummies to generate dummy variables in Python. First, we are going to work with the categorical variable “sex”. That is, we will start with dummy coding in Python with a categorical variable with two levels.

Second, we are going to generate dummy variables in Python with the variable “rank”. That is, in that dummy coding example we are going to work with a factor variable with three levels.

How to Make Dummy Variables in Python with Two Levels

In this section, we are going to create a dummy variable in Python using pandas get_dummies method. Specifically, we will generate dummy variables for a categorical variable with two levels (i.e., male and female).

In this create-dummy-variables-in-Python post, we are going to work with Pandas get_dummies(). As we will see in the examples below, we can change the prefix of our dummy variables and specify which columns contain our categorical variables.

First Dummy Coding in Python Example:

In the first Python dummy coding example below, we are using Pandas get_dummies to make dummy variables. Note, we are using a series as data and, thus, get two new columns named Female and Male.

pd.get_dummies(df['sex']).head()
(Image: Female and Male dummy coded columns.)

In the code, above, we also printed the first 5 rows (using Pandas head()). We will now continue and use the columns argument. Here we input a list with the column(s) we want to create dummy variables from. Furthermore, we will create the new Pandas dataframe containing our new two columns.

More Python Dummy Coding Examples:

df_dummies = pd.get_dummies(df, columns=['sex'])
df_dummies.head()
(Image: the resulting dataframe with dummy coded columns.)

In the output (using Pandas head()), we can see that Pandas get_dummies automatically added “sex” as prefix and underscore as prefix separator. If we, however, want to change the prefix as well as the prefix separator we can add these arguments to Pandas get_dummies():

 df_dummies = pd.get_dummies(df, prefix='Gender', prefix_sep='.', 
                            columns=['sex'])
df_dummies.head()
(Image: Gender, instead of sex, as the prefix for the dummy columns.)

Remove Prefix and Separator from Dummy Columns

In the next Pandas dummies example code, we are going to make dummy variables in Python, but we will set the prefix and prefix_sep arguments so that the column names will be the factor levels (categories):

df_dummies = pd.get_dummies(df, prefix='', prefix_sep='',
                            columns=['sex'])
df_dummies.head()

How to Create Dummy Variables in Python with Three Levels

In this section, of the dummy coding in Python tutorial, we are going to work with the variable “rank”. That is, we will create dummy variables in Python from a categorical variable with three levels (or 3 factor levels).  In the first dummy variable in Python code example below, we are working with Pandas get_dummies() the same way as we did in the first example.

 pd.get_dummies(df['rank']).head()

That is, we put in a Pandas Series (i.e., the column with the variable) as the only argument and then we only got a new dataframe with 3 columns (i.e., for the 3 levels).

Create a Dataframe with Dummy Coded Variables

Of course, we want to have the dummy variables in a dataframe with the data. Again, we do this by using the columns argument and a list with the column that we want to use:

 df_dummies = pd.get_dummies(df, columns=['rank'])
df_dummies.head()

In the image above, we can see that Pandas get_dummies() added “rank” as prefix and underscore as prefix separator. Next, we are going to change the prefix and the separator to “Rank” (uppercase) and “.” (dot).

df_dummies = pd.get_dummies(df, prefix='Rank', prefix_sep='.', 
                            columns=['rank'])
df_dummies.head()

Now, we may not need to have a prefix or a separator and, as in the previous Pandas create dummy variables in Python example, want to remove these. To accomplish this, we just add empty strings to the prefix and prefix_sep arguments:

df_dummies = pd.get_dummies(df, prefix='', prefix_sep='', 
                            columns=['rank'])

Creating Dummy Variables in Python for Many Columns

In the final Pandas dummies example, we are going to dummy code two columns. Specifically, we are going to add a list with two categorical variables and get 5 new columns that are dummy coded. This is, in fact, very easy and we can follow the example code from above:

Creating Multiple Dummy Variables Example Code:

 df_dummies = pd.get_dummies(df, prefix='', prefix_sep='', 
                            columns=['rank', 'sex'])
df_dummies.head()

Finally, if we want to add more columns, to create dummy variables from, we can add that to the list we add as a parameter to the columns argument.

Conclusion: Dummy Coding in Python

In this post, we have learned how to do dummy coding in Python using Pandas get_dummies() method. More specifically, we have worked with categorical data with two levels, and categorical data with three levels. Furthermore, we have learned how to add and remove prefixes from the new columns created in the dataframe.

The post How to use Pandas get_dummies to Create Dummy Variables in Python appeared first on Erik Marsja.


Mike Driscoll: PyDev of the Week: Saul Pwanson


This week we welcome Saul Pwanson (@saulfp) as our PyDev of the Week! Saul is the creator of VisiData, an interactive multitool for tabular data. If you’d like to see what Saul has been up to, then you should check out his website or his Github profile. You can also support Saul’s open source endeavors on Patreon. Let’s take a few moments to get to know Saul better!

Can you tell us a little about yourself (hobbies, education, etc):

I grew up in Chicagoland in the 80s, was on BBSes in the early 90s, and IRC in college and thereafter. I’ve been once to the Recurse Center in New York, twice to Holland, and six times to Bruno’s in Gerlach, NV. I like crossword puzzles, board games, and point-and-click adventures. One day I’d like to finish my “board simulation” of the awe-inspiring mechanics inside mitochondria.

Why did you start using Python

It was for a job at a startup back in 2004. It’s really great as a scripting language, and the standard library makes most common things easy by itself, with the rest of the ecosystem providing not just one but usually about 4 different ways of doing any task, often including one that works really well. I tip my hat to all the unsung developers of Python libraries who make interfaces to other systems that *just work*. VisiData supports so many data formats simply because the richness of the Python ecosystem makes it easy.

What other programming languages do you know and which is your favorite?

I did a lot of x86 assembly as a teenager in my BBS days, and started using both C and C++ in college. I still use C on a daily basis doing embedded development for my day job. I haven’t used C++ for about 10 years, which means I’m way out of date on it now.

My favorite language, though, is an older language called Forth, which is a brilliant little system and gets you the most bang for your buck in highly constrained environments. (We’re talking kilobytes and megahertz, orders of magnitude fewer resources than most software could even dream of fitting their runtime into). The essence of Forth is incredibly elegant, with the implementation setting things up “just so” and then everything falls into place naturally by design, with very little actual code.

Programming in Forth has encouraged me to think in very clean ways about my own code in other languages. Often if you’re looking at the VisiData source code, a particular bit of code may seem devastatingly simple and turn out to be subtly and amazingly powerful, but it wasn’t by chance. The rest of the system often has to be designed “just so” that little bit of code can be elegant. I know many modern software engineers might consider that a waste of time, but spending that effort on the core design often leads to other surprising capabilities that then just magically work.

What projects are you working on now?

  • VisiData 2.0, with an API we can get behind, to encourage a rich ecosystem of plugins and loaders.
    Some people have already started writing plugins (https://github.com/jsvine/visidata-plugins), and I dream that someday there will be a VisiData loader for every format and service that has tabular data.
  • “Where in the Data is Carmen Sanmateo?”, a data-diving game a la Noah Veltman’s Command-line Mystery or the Knight Lab SQL murders (https://mystery.knightlab.com/). You know, like a detective game for data nerds.

Which Python libraries are your favorite (core or 3rd party)?

From a quick grep on the VisiData source code, it seems that collections, functools, and itertools are used the most. As for 3rd party utils, I always have to mention python-dateutil. It just makes date parsing so easy, no matter the format, it just figures it out. My only wish is that it allowed access to the deduced format, so you could reformat other dates the same way.
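That format-deducing behavior is easy to see in a couple of lines (a small sketch; the input strings here are made up, not from VisiData):

```python
from dateutil import parser

# python-dateutil figures out the format from the string itself:
d1 = parser.parse("Dec 27, 2019")
d2 = parser.parse("2019-12-27 10:30")

print(d1.date(), d2.hour)
```

Both calls return ordinary datetime objects, regardless of how differently the inputs were written.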

What is VisiData and how did it come about?

VisiData is a playground for tabular data in the terminal. It provides a spreadsheet-like interface for many formats, including even its own internals. I first made a version back in 2011 when I was at F5 Networks. It was surprisingly flexible and unreasonably effective to use for lots of tasks, and after I left that job, I found myself missing it (for example to view and explore HDF5 files which I worked with at my next job). But I couldn’t use that version because F5 owned it, so I decided to remake it completely, and release it as open-source. Then, I could use it for my own projects and at other jobs.

But to do it “right” is a lot of thankless work, making it reliable and seamless in all kinds of situations, and hashing out all the little details and edge cases so that it feels like a tool that doesn’t just get the job done, but is so smooth that it is *fun* to use it. It’s quite a bit of work, and I would never take the time to do it just for myself. But when other people use it and appreciate it, that makes the effort worthwhile; both in a global optimization sense, and in a personal emotional satisfaction sense.

Are there any new challenges or features you expect to add to VisiData?

I really want split-pane: two separate but related VisiData windows in the same terminal, for things like internal menus (e.g. of aggregators and jointypes), or directory/file browsing (a la Norton/Midnight Commander), and a number of other interesting use cases. But it’s been complicated from a design perspective. I’m hoping that making a public statement like this will spur my subconscious to find an elegant solution like happened in my Podcast__init__ interview.

Thanks for doing the interview, Saul!

The post PyDev of the Week: Saul Pwanson appeared first on The Mouse Vs. The Python.

Real Python: Python Timer Functions: Three Ways to Monitor Your Code


While many developers recognize Python as an effective programming language, pure Python programs may run slower than their counterparts in compiled languages like C, Rust, and Java. Throughout this tutorial, you’ll see how to use a Python timer to monitor how fast your programs are running.

In this tutorial, you’ll learn how to use:

  • time.perf_counter() to measure time in Python
  • Classes to keep state
  • Context managers to work with a block of code
  • Decorators to customize a function

You’ll also gain background knowledge into how classes, context managers, and decorators work. As you see examples of each concept, you’ll be inspired to use one or several of them in your code, both for timing code execution and other applications. Each method has its advantages, and you’ll learn which to use depending on the situation. Plus, you’ll have a working Python timer that you can use to monitor your programs!

Decorators Q&A Transcript: Click here to get access to a 25-page chat log from our recent Python decorators Q&A session in the Real Python Community Slack where we discussed common decorator questions.

Python Timers

First, you’ll take a look at some example code that you’ll use throughout the tutorial. Later, you’ll add a Python timer to this code to monitor its performance. You’ll also see some of the simplest ways to measure the running time of this example.

Python Timer Functions

If you look at the built-in time module in Python, then you’ll notice several functions that can measure time.

Python 3.7 introduced several new functions, like thread_time(), as well as nanosecond versions of all the functions above, named with an _ns suffix. For example, perf_counter_ns() is the nanosecond version of perf_counter(). You’ll learn more about these functions later. For now, note what the documentation has to say about perf_counter():

Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration. (Source)

First, you’ll use perf_counter() to create a Python timer. Later, you’ll compare this with other Python timer functions and learn why perf_counter() is usually the best choice.
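As a quick illustration of the _ns variants (a sketch, not part of the tutorial's example code), you can time a short sleep with perf_counter_ns() and convert the integer nanoseconds back to seconds:

```python
import time

# perf_counter_ns() returns an integer number of nanoseconds,
# avoiding the floating-point rounding of perf_counter()
tic = time.perf_counter_ns()
time.sleep(0.1)
toc = time.perf_counter_ns()

print(f"Elapsed: {(toc - tic) / 1e9:.4f} seconds")
```

Because the return value is an integer, you can subtract and compare timestamps without losing precision on long-running clocks.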

Example: Download Tutorials

To better compare the different ways you can add a Python timer to your code, you’ll apply different Python timer functions to the same code example throughout this tutorial. If you already have code you’d like to measure, then feel free to follow the examples with that instead.

The example you’ll see in this tutorial is a short function that uses the realpython-reader package to download the latest tutorials available here on Real Python. To learn more about the Real Python Reader and how it works, check out How to Publish an Open-Source Python Package to PyPI. You can install realpython-reader on your system with pip:

$ python -m pip install realpython-reader

Then, you can import the package as reader.

You’ll store the example in a file named latest_tutorial.py. The code consists of one function that downloads and prints the latest tutorial from Real Python:

 1 # latest_tutorial.py
 2 
 3 from reader import feed
 4 
 5 def main():
 6     """Download and print the latest tutorial from Real Python"""
 7     tutorial = feed.get_article(0)
 8     print(tutorial)
 9 
10 if __name__ == "__main__":
11     main()

realpython-reader handles most of the hard work:

  • Line 3 imports feed from realpython-reader. This module contains functionality for downloading tutorials from the Real Python feed.
  • Line 7 downloads the latest tutorial from Real Python. The number 0 is an offset, where 0 means the most recent tutorial, 1 is the previous tutorial, and so on.
  • Line 8 prints the tutorial to the console.
  • Line 11 calls main() when you run the script.

When you run this example, your output will typically look something like this:

$ python latest_tutorial.py
# Python Timer Functions: Three Ways to Monitor Your Code

While many developers recognize Python as an effective programming language, pure Python programs may run slower than their counterparts in compiled languages like C, Rust, and Java. Throughout this tutorial, you’ll see how to use a Python timer to monitor how fast your programs are running.

[ ... The full text of the tutorial ... ]

The code may take a little while to run depending on the network, so you might want to use a Python timer to monitor the performance of the script.

Your First Python Timer

Let’s add a bare-bones Python timer to the example with time.perf_counter(). Again, this is a performance counter that’s well-suited for timing parts of your code.

perf_counter() measures the time in seconds from some unspecified moment in time, which means that the return value of a single call to the function isn’t useful. However, when you look at the difference between two calls to perf_counter(), you can figure out how many seconds passed between the two calls:

>>> import time
>>> time.perf_counter()
32311.48899951
>>> time.perf_counter()  # A few seconds later
32315.261320793

In this example, you made two calls to perf_counter() almost 4 seconds apart. You can confirm this by calculating the difference between the two outputs: 32315.26 - 32311.49 = 3.77.

You can now add a Python timer to the example code:

 1 # latest_tutorial.py
 2 
 3 import time
 4 from reader import feed
 5 
 6 def main():
 7     """Print the latest tutorial from Real Python"""
 8     tic = time.perf_counter()
 9     tutorial = feed.get_article(0)
10     toc = time.perf_counter()
11     print(f"Downloaded the tutorial in {toc - tic:0.4f} seconds")
12 
13     print(tutorial)
14 
15 if __name__ == "__main__":
16     main()

Note that you call perf_counter() both before and after downloading the tutorial. You then print the time it took to download the tutorial by calculating the difference between the two calls.

Note: In line 11, the f before the string indicates that this is an f-string, which is a convenient way to format a text string. :0.4f is a format specifier that says the number, toc - tic, should be printed as a decimal number with four decimals.

f-strings are only available in Python 3.6 and later. For more information, check out Python 3’s f-Strings: An Improved String Formatting Syntax.
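If the format specifier is new to you, here's a quick sketch of :0.4f in isolation, with a made-up elapsed value:

```python
elapsed = 3.14159265

# :0.4f rounds the number to four decimal places
print(f"Downloaded the tutorial in {elapsed:0.4f} seconds")
# Downloaded the tutorial in 3.1416 seconds
```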

Now, when you run the example, you’ll see the elapsed time before the tutorial:

$ python latest_tutorial.py
Downloaded the tutorial in 0.67 seconds
# Python Timer Functions: Three Ways to Monitor Your Code

[ ... The full text of the tutorial ... ]

That’s it! You’ve covered the basics of timing your own Python code. In the rest of the tutorial, you’ll learn how you can wrap a Python timer into a class, a context manager, and a decorator to make it more consistent and convenient to use.

A Python Timer Class

Look back at how you added the Python timer to the example above. Note that you need at least one variable (tic) to store the state of the Python timer before you download the tutorial. After staring at the code a little, you might also note that the three lines involving tic and toc are added only for timing purposes! Now, you’ll create a class that does the same as your manual calls to perf_counter(), but in a more readable and consistent manner.

Throughout this tutorial, you’ll create and update Timer, a class that you can use to time your code in several different ways. The final code is also available on PyPI under the name codetiming. You can install this to your system like so:

$ python -m pip install codetiming

You can find more information about codetiming later on in this tutorial, in the section named The Python Timer Code.

Understanding Classes in Python

Classes are the main building blocks of object-oriented programming. A class is essentially a template that you can use to create objects. While Python doesn’t force you to program in an object-oriented manner, classes are everywhere in the language. For a quick proof, let’s investigate the time module:

>>> import time
>>> type(time)
<class 'module'>
>>> time.__class__
<class 'module'>

type() returns the type of an object. Here you can see that modules are in fact objects created from a module class. The special attribute .__class__ can be used to get access to the class that defines an object. In fact, almost everything in Python is a class:

>>> type(3)
<class 'int'>
>>> type(None)
<class 'NoneType'>
>>> type(print)
<class 'builtin_function_or_method'>
>>> type(type)
<class 'type'>

In Python, classes are great when you need to model something that needs to keep track of a particular state. In general, a class is a collection of properties (called attributes) and behaviors (called methods). For more background on classes and object-oriented programming, check out Object-Oriented Programming (OOP) in Python 3 or the official docs.

Creating a Python Timer Class

Classes are good for tracking state. In a Timer class, you want to keep track of when a timer starts and how much time has passed since then. For the first implementation of Timer, you’ll add a ._start_time attribute, as well as .start() and .stop() methods. Add the following code to a file named timer.py:

 1 # timer.py
 2 
 3 import time
 4 
 5 class TimerError(Exception):
 6     """A custom exception used to report errors in use of Timer class"""
 7 
 8 class Timer:
 9     def __init__(self):
10         self._start_time = None
11 
12     def start(self):
13         """Start a new timer"""
14         if self._start_time is not None:
15             raise TimerError(f"Timer is running. Use .stop() to stop it")
16 
17         self._start_time = time.perf_counter()
18 
19     def stop(self):
20         """Stop the timer, and report the elapsed time"""
21         if self._start_time is None:
22             raise TimerError(f"Timer is not running. Use .start() to start it")
23 
24         elapsed_time = time.perf_counter() - self._start_time
25         self._start_time = None
26         print(f"Elapsed time: {elapsed_time:0.4f} seconds")

A few different things are happening here, so let’s walk through the code step by step.

In line 5, you define a TimerError class. The (Exception) notation means that TimerError inherits from another class called Exception. Python uses this built-in class for error handling. You don’t need to add any attributes or methods to TimerError. However, having a custom error will give you more flexibility to handle problems inside Timer. For more information, check out Python Exceptions: An Introduction.

The definition of Timer itself starts on line 8. When you first create or instantiate an object from a class, your code calls the special method .__init__(). In this first version of Timer, you only initialize the ._start_time attribute, which you’ll use to track the state of your Python timer. It has the value None when the timer isn’t running. Once the timer is running, ._start_time keeps track of when the timer started.

Note: The underscore prefix of ._start_time is a Python convention. It signals that ._start_time is an internal attribute that should not be manipulated by users of the Timer class.

When you call .start() to start a new Python timer, you first check that the timer isn’t already running. Then you store the current value of perf_counter() in ._start_time. On the other hand, when you call .stop(), you first check that the Python timer is running. If it is, then you calculate the elapsed time as the difference between the current value of perf_counter() and the one you stored in ._start_time. Finally, you reset ._start_time so that the timer can be restarted, and print the elapsed time.

Here’s how you use Timer:

>>> from timer import Timer
>>> t = Timer()
>>> t.start()
>>> t.stop()  # A few seconds later
Elapsed time: 3.8191 seconds

Compare this to the earlier example where you used perf_counter() directly. The structure of the code is fairly similar, but now the code is more clear, and this is one of the benefits of using classes. By carefully choosing your class, method, and attribute names, you can make your code very descriptive!

Using the Python Timer Class

Let’s apply Timer to latest_tutorial.py. You only need to make a few changes to your previous code:

# latest_tutorial.py

from timer import Timer
from reader import feed

def main():
    """Print the latest tutorial from Real Python"""
    t = Timer()
    t.start()
    tutorial = feed.get_article(0)
    t.stop()

    print(tutorial)

if __name__ == "__main__":
    main()

Notice that the code is very similar to what you saw earlier. In addition to making the code more readable, Timer takes care of printing the elapsed time to the console, which makes the logging of time spent more consistent. When you run the code, you’ll see pretty much the same output:

$ python latest_tutorial.py
Elapsed time: 0.64 seconds
# Python Timer Functions: Three Ways to Monitor Your Code

[ ... The full text of the tutorial ... ]

Printing the elapsed time from Timer may be consistent, but it seems that this approach is not very flexible. In the next section, you’ll see how to customize your class.

Adding More Convenience and Flexibility

So far, you’ve seen that classes are suitable for when you want to encapsulate state and ensure consistent behavior in your code. In this section, you’ll add more convenience and flexibility to your Python timer:

  • Use adaptable text and formatting when reporting the time spent
  • Apply flexible logging, either to the screen, to a log file, or other parts of your program
  • Create a Python timer that can accumulate over several invocations
  • Build an informative representation of a Python timer

First, let’s see how you can customize the text used to report the time spent. In the previous code, the text f"Elapsed time: {elapsed_time:0.4f} seconds" is hard-coded into .stop(). You can add flexibility to classes using instance variables. Their values are normally passed as arguments to .__init__() and stored as self attributes. For convenience, you can also provide reasonable default values.

To add .text as a Timer instance variable, you’ll do something like this:

def __init__(self, text="Elapsed time: {:0.4f} seconds"):
    self._start_time = None
    self.text = text

Note that the default text, "Elapsed time: {:0.4f} seconds", is given as a regular string, not as an f-string. You can’t use an f-string here because f-strings evaluate immediately, and when you instantiate Timer, your code has not yet calculated the elapsed time.

Note: If you want to use an f-string to specify .text, then you need to use double curly braces to escape the curly braces that the actual elapsed time will replace.

One example would be f"Finished {task} in {{:0.4f}} seconds". If the value of task is "reading", then this f-string would be evaluated as "Finished reading in {:0.4f} seconds".
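Spelled out as a quick sketch of this escaping rule (the task name is just for illustration):

```python
task = "reading"  # a hypothetical task name

template = f"Finished {task} in {{:0.4f}} seconds"
print(template)              # Finished reading in {:0.4f} seconds
print(template.format(3.5))  # Finished reading in 3.5000 seconds
```

The doubled braces survive the f-string evaluation as a single pair, leaving a placeholder that .format() can fill in later.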

In .stop(), you use .text as a template and .format() to populate the template:

def stop(self):
    """Stop the timer, and report the elapsed time"""
    if self._start_time is None:
        raise TimerError(f"Timer is not running. Use .start() to start it")

    elapsed_time = time.perf_counter() - self._start_time
    self._start_time = None
    print(self.text.format(elapsed_time))

After this update to timer.py, you can change the text as follows:

>>> from timer import Timer
>>> t = Timer(text="You waited {:.1f} seconds")
>>> t.start()
>>> t.stop()  # A few seconds later
You waited 4.1 seconds

Next, assume that you don’t just want to print a message to the console. Maybe you want to save your time measurements so you can store them in a database. You can do this by returning the value of elapsed_time from .stop(). Then, the calling code can choose to either ignore that return value or save it for later processing.

Perhaps you want to integrate Timer into your logging routines. To support logging or other outputs from Timer, you need to change the call to print() so that the user can supply their own logging function. This can be done similarly to how you customized the text earlier:

def __init__(self, text="Elapsed time: {:0.4f} seconds", logger=print):
    self._start_time = None
    self.text = text
    self.logger = logger

def stop(self):
    """Stop the timer, and report the elapsed time"""
    if self._start_time is None:
        raise TimerError(f"Timer is not running. Use .start() to start it")

    elapsed_time = time.perf_counter() - self._start_time
    self._start_time = None

    if self.logger:
        self.logger(self.text.format(elapsed_time))

    return elapsed_time

Instead of using print() directly, you create another instance variable, self.logger, that should refer to a function that takes a string as an argument. In addition to print(), you can use functions like logging.info() or .write() on file objects. Also note the if test, which allows you to turn off printing completely by passing logger=None.

Here are two examples that show the new functionality in action:

>>> from timer import Timer
>>> import logging
>>> t = Timer(logger=logging.warning)
>>> t.start()
>>> t.stop()  # A few seconds later
WARNING:root:Elapsed time: 3.1610 seconds
3.1609658249999484

>>> t = Timer(logger=None)
>>> t.start()
>>> value = t.stop()  # A few seconds later
>>> value
4.710851433001153

When you run these examples in an interactive shell, Python prints the return value automatically.

The third improvement you’ll add is the ability to accumulate time measurements. You may want to do this, for instance, when you’re calling a slow function in a loop. You’ll add a bit more functionality in the form of named timers with a dictionary that keeps track of every Python timer in your code.

Assume that you’re expanding latest_tutorial.py to a latest_tutorials.py script that downloads and prints the ten latest tutorials from Real Python. The following is one possible implementation:

# latest_tutorials.py

from timer import Timer
from reader import feed

def main():
    """Print the 10 latest tutorials from Real Python"""
    t = Timer(text="Downloaded 10 tutorials in {:0.2f} seconds")
    t.start()
    for tutorial_num in range(10):
        tutorial = feed.get_article(tutorial_num)
        print(tutorial)
    t.stop()

if __name__ == "__main__":
    main()

The code loops over the numbers from 0 to 9 and uses those as offset arguments to feed.get_article(). When you run the script, you’ll see a lot of information printed to your console:

$ python latest_tutorials.py
# Python Timer Functions: Three Ways to Monitor Your Code

[ ... The full text of ten tutorials ... ]
Downloaded 10 tutorials in 0.67 seconds

One subtle issue with this code is that you’re not only measuring the time it takes to download the tutorials, but also the time Python spends printing the tutorials to your screen. This might not be that important since the time spent printing should be negligible compared to the time spent downloading. Still, it would be good to have a way to precisely time what you’re after in these kinds of situations.

Note: The time spent downloading ten tutorials is about the same as the time spent downloading one tutorial. This is not a bug in your code! Instead, reader caches the Real Python feed the first time get_article() is called, and reuses the information on later invocations.

There are several ways you can work around this without changing the current implementation of Timer. However, supporting this use case will be quite useful, and can be done with just a few lines of code.

First, you’ll introduce a dictionary called .timers as a class variable on Timer, which means that all instances of Timer will share it. You implement it by defining it outside any methods:

class Timer:
    timers = dict()

Class variables can be accessed either directly on the class, or through an instance of the class:

>>> from timer import Timer
>>> Timer.timers
{}
>>> t = Timer()
>>> t.timers
{}
>>> Timer.timers is t.timers
True

In both cases, the code returns the same empty class dictionary.

Next, you’ll add optional names to your Python timer. You can use the name for two different purposes:

  1. Looking up the elapsed time later in your code
  2. Accumulating timers with the same name

To add names to your Python timer, you need to make two more changes to timer.py. First, Timer should accept the name as a parameter. Second, the elapsed time should be added to .timers when a timer stops:

class Timer:
    timers = dict()

    def __init__(
        self,
        name=None,
        text="Elapsed time: {:0.4f} seconds",
        logger=print,
    ):
        self._start_time = None
        self.name = name
        self.text = text
        self.logger = logger

        # Add new named timers to dictionary of timers
        if name:
            self.timers.setdefault(name, 0)

    # Other methods are unchanged

    def stop(self):
        """Stop the timer, and report the elapsed time"""
        if self._start_time is None:
            raise TimerError(f"Timer is not running. Use .start() to start it")

        elapsed_time = time.perf_counter() - self._start_time
        self._start_time = None

        if self.logger:
            self.logger(self.text.format(elapsed_time))
        if self.name:
            self.timers[self.name] += elapsed_time

        return elapsed_time

Note that you use .setdefault() when adding the new Python timer to .timers. This is a great feature that only sets the value if name is not already defined in the dictionary. If name is already used in .timers, then the value is left untouched. This allows you to accumulate several timers:

>>> from timer import Timer
>>> t = Timer("accumulate")
>>> t.start()
>>> t.stop()  # A few seconds later
Elapsed time: 3.7036 seconds
3.703554293999332

>>> t.start()
>>> t.stop()  # A few seconds later
Elapsed time: 2.3449 seconds
2.3448921170001995

>>> Timer.timers
{'accumulate': 6.0484464109995315}
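To see the .setdefault() behavior in isolation, here's a small sketch with a plain dictionary (the timers name is just for illustration):

```python
timers = {}

timers.setdefault("download", 0)  # "download" is new, so 0 is stored
timers["download"] += 2

timers.setdefault("download", 0)  # "download" exists, value untouched
timers["download"] += 1

print(timers)  # {'download': 3}
```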

You can now revisit latest_tutorials.py and make sure only the time spent on downloading the tutorials is measured:

# latest_tutorials.py

from timer import Timer
from reader import feed

def main():
    """Print the 10 latest tutorials from Real Python"""
    t = Timer("download", logger=None)
    for tutorial_num in range(10):
        t.start()
        tutorial = feed.get_article(tutorial_num)
        t.stop()
        print(tutorial)

    download_time = Timer.timers["download"]
    print(f"Downloaded 10 tutorials in {download_time:0.2f} seconds")

if __name__ == "__main__":
    main()

Rerunning the script will give similar output as earlier, although now you are only timing the actual download of the tutorials:

$ python latest_tutorials.py
# Python Timer Functions: Three Ways to Monitor Your Code

[ ... The full text of ten tutorials ... ]
Downloaded 10 tutorials in 0.65 seconds

The final improvement that you’ll make to Timer is to make it more informative when you’re working with it interactively. Try the following:

>>> from timer import Timer
>>> t = Timer()
>>> t
<timer.Timer object at 0x7f0578804320>

That last line is the default way Python represents objects. While you can glean some information from it, it’s usually not very useful. Instead, it would be nice to see things like the name of the Timer, or how it will report on the timings.

In Python 3.7, data classes were added to the standard library. These provide several conveniences to your classes, including a more informative representation string.

Note: Data classes are included in Python only for version 3.7 and later. However, there is a backport available on PyPI for Python 3.6.

You can install it using pip:

$ python -m pip install dataclasses

See The Ultimate Guide to Data Classes in Python 3.7 for more information.

You convert your Python timer to a data class using the @dataclass decorator. You’ll learn more about decorators later in this tutorial. For now, you can think of this as a notation that tells Python that Timer is a data class:

 1 from dataclasses import dataclass, field
 2 from typing import Any, ClassVar
 3 
 4 @dataclass
 5 class Timer:
 6     timers: ClassVar = dict()
 7     name: Any = None
 8     text: Any = "Elapsed time: {:0.4f} seconds"
 9     logger: Any = print
10     _start_time: Any = field(default=None, init=False, repr=False)
11 
12     def __post_init__(self):
13         """Initialization: add timer to dict of timers"""
14         if self.name:
15             self.timers.setdefault(self.name, 0)
16 
17     # The rest of the code is unchanged

This code replaces your earlier .__init__() method. Note how data classes use syntax that looks similar to the class variable syntax you saw earlier for defining all variables. In fact, .__init__() is created automatically for data classes, based on annotated variables in the definition of the class.

You need to annotate your variables to use a data class. You can use this to add type hints to your code. If you don’t want to use type hints, then you can instead annotate all variables with Any, just like you did above. You’ll soon see how to add actual type hints to your data class.
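As a minimal illustration of how data classes generate .__init__() from annotated variables, consider this hypothetical Point class (separate from the tutorial's Timer):

```python
from dataclasses import dataclass

@dataclass
class Point:
    # Each annotated variable becomes a parameter of the
    # generated .__init__(), with the given default value
    x: float = 0.0
    y: float = 0.0

p = Point(3, 4)
print(p)  # Point(x=3, y=4)
```

The generated representation string lists every field, which is exactly the convenience you'll get for Timer.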

Here are a few notes about the Timer data class:

  • Line 4: The @dataclass decorator defines Timer to be a data class.

  • Line 6: The special ClassVar annotation is necessary for data classes to specify that .timers is a class variable.

  • Lines 7 to 9: .name, .text, and .logger will be defined as attributes on Timer, whose values can be specified when creating Timer instances. They all have the given default values.

  • Line 10: Recall that ._start_time is a special attribute that’s used to keep track of the state of the Python timer, but should be hidden from the user. Using dataclasses.field() you say that ._start_time should be removed from .__init__() and the representation of Timer.

  • Lines 12 to 15: You can use the special .__post_init__() method for any initialization you need to do apart from setting the instance attributes. Here, you use it to add named timers to .timers.

Your new Timer data class works just like your previous regular class, except that it now has a nice representation:

>>> from timer import Timer
>>> t = Timer()
>>> t
Timer(name=None, text='Elapsed time: {:0.4f} seconds',
      logger=<built-in function print>)
>>> t.start()
>>> t.stop()  # A few seconds later
Elapsed time: 6.7197 seconds
6.719705373998295

Now you have a pretty neat version of Timer that’s consistent, flexible, convenient, and informative! Many of the improvements you’ve seen in this section can be applied to other types of classes in your projects as well.

Before ending this section, let’s have a look at the complete source code of Timer as it currently stands. You’ll notice the addition of type hints to the code for extra documentation:

# timer.py

from dataclasses import dataclass, field
import time
from typing import Callable, ClassVar, Dict, Optional

class TimerError(Exception):
    """A custom exception used to report errors in use of Timer class"""

@dataclass
class Timer:
    timers: ClassVar[Dict[str, float]] = dict()
    name: Optional[str] = None
    text: str = "Elapsed time: {:0.4f} seconds"
    logger: Optional[Callable[[str], None]] = print
    _start_time: Optional[float] = field(default=None, init=False, repr=False)

    def __post_init__(self) -> None:
        """Add timer to dict of timers after initialization"""
        if self.name is not None:
            self.timers.setdefault(self.name, 0)

    def start(self) -> None:
        """Start a new timer"""
        if self._start_time is not None:
            raise TimerError(f"Timer is running. Use .stop() to stop it")

        self._start_time = time.perf_counter()

    def stop(self) -> float:
        """Stop the timer, and report the elapsed time"""
        if self._start_time is None:
            raise TimerError(f"Timer is not running. Use .start() to start it")

        # Calculate elapsed time
        elapsed_time = time.perf_counter() - self._start_time
        self._start_time = None

        # Report elapsed time
        if self.logger:
            self.logger(self.text.format(elapsed_time))
        if self.name:
            self.timers[self.name] += elapsed_time

        return elapsed_time

Using a class to create a Python timer has several benefits:

  • Readability: Your code will read more naturally if you carefully choose class and method names.
  • Consistency: Your code will be easier to use if you encapsulate properties and behaviors into attributes and methods.
  • Flexibility: Your code will be reusable if you use attributes with default values instead of hardcoded values.

This class is very flexible, and you can use it in almost any situation where you want to monitor the time it takes for code to run. However, in the next sections, you’ll learn about using context managers and decorators, which will be more convenient for timing code blocks and functions.

A Python Timer Context Manager

Your Python Timer class has come a long way! Compared with the first Python timer you created, your code has gotten quite powerful. However, there’s still a bit of boilerplate code necessary to use your Timer:

  1. First, instantiate the class.
  2. Call .start() before the code block that you want to time.
  3. Call .stop() after the code block.

Luckily, Python has a unique construct for calling functions before and after a block of code: the context manager. In this section, you’ll learn what context managers are and how you can create your own. Then you’ll see how to expand Timer so that it can work as a context manager as well. Finally, you’ll see how using Timer as a context manager can simplify your code.

Understanding Context Managers in Python

Context managers have been a part of Python for a long time. They were introduced by PEP 343 in 2005, and first implemented in Python 2.5. You can recognize context managers in code by the use of the with keyword:

with EXPRESSION as VARIABLE:
    BLOCK

EXPRESSION is some Python expression that returns a context manager. The context manager is optionally bound to the name VARIABLE. Finally, BLOCK is any regular Python code block. The context manager will guarantee that your program calls some code before BLOCK and some other code after BLOCK executes. The latter will happen, even if BLOCK raises an exception.
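Roughly speaking, a with statement behaves like an explicit try ... finally. This is a simplified sketch of the idea (the actual expansion defined by PEP 343 also forwards exception details to .__exit__()), using a throwaway temp file for illustration:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "cm_demo.txt")

# Roughly what `with open(path, "w") as fp: fp.write("hello")` does:
manager = open(path, "w")
fp = manager.__enter__()
try:
    fp.write("hello")
finally:
    # Runs even if the block raises an exception
    manager.__exit__(None, None, None)

print(fp.closed)  # True
```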

The most common use of context managers is probably handling different resources, like files, locks, and database connections. The context manager is then used to free and clean up the resource after you’ve used it. The following example reveals the fundamental structure of timer.py by only printing lines that contain a colon. More importantly, it shows the common idiom for opening a file in Python:

>>> with open("timer.py") as fp:
...     print("".join(ln for ln in fp if ":" in ln))
...
class TimerError(Exception):
class Timer:
    timers: ClassVar[Dict[str, float]] = dict()
    name: Optional[str] = None
    text: str = "Elapsed time: {:0.4f} seconds"
    logger: Optional[Callable[[str], None]] = print
    _start_time: Optional[float] = field(default=None, init=False, repr=False)
    def __post_init__(self) -> None:
        if self.name is not None:
    def start(self) -> None:
        if self._start_time is not None:
    def stop(self) -> float:
        if self._start_time is None:
        if self.logger:
        if self.name:

Note that fp, the file pointer, is never explicitly closed because you used open() as a context manager. You can confirm that fp has closed automatically:

>>> fp.closed
True

In this example, open("timer.py") is an expression that returns a context manager. That context manager is bound to the name fp. The context manager is in effect during the execution of print(). This one-line code block executes in the context of fp.

What does it mean that fp is a context manager? Technically, it means that fp implements the context manager protocol. There are many different protocols underlying the Python language. You can think of a protocol as a contract that states what specific methods your code must implement.

The context manager protocol consists of two methods:

  1. Call .__enter__() when entering the context related to the context manager.
  2. Call .__exit__() when exiting the context related to the context manager.

In other words, to create a context manager yourself, you need to write a class that implements .__enter__() and .__exit__(). No more, no less. Let’s try a Hello, World! context manager example:

# greeter.py

class Greeter:
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        print(f"Hello {self.name}")
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        print(f"See you later, {self.name}")

Greeter is a context manager because it implements the context manager protocol. You can use it like this:

>>> from greeter import Greeter
>>> with Greeter("Nick"):
...     print("Doing stuff ...")
...
Hello Nick
Doing stuff ...
See you later, Nick

First, note how .__enter__() is called before you’re doing stuff, while .__exit__() is called after. In this simplified example, you’re not referencing the context manager. In such cases, you don’t need to give the context manager a name with as.

Next, notice how .__enter__() returns self. The return value of .__enter__() is what is bound by as. You usually want to return self from .__enter__() when creating context managers. You can use that return value as follows:

>>> from greeter import Greeter
>>> with Greeter("Emily") as grt:
...     print(f"{grt.name} is doing stuff ...")
...
Hello Emily
Emily is doing stuff ...
See you later, Emily

Finally, .__exit__() takes three arguments: exc_type, exc_value, and exc_tb. These are used for error handling within the context manager, and mirror the return values of sys.exc_info(). If an exception happens while the block is being executed, then your code calls .__exit__() with the type of the exception, an exception instance, and a traceback object. Often, you can ignore these in your context manager, in which case .__exit__() is called before the exception is reraised:

>>> from greeter import Greeter
>>> with Greeter("Rascal") as grt:
...     print(f"{grt.age} does not exist")
...
Hello Rascal
See you later, Rascal
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
AttributeError: 'Greeter' object has no attribute 'age'

You can see that "See you later, Rascal" is printed, even though there is an error in the code.

Now you know what context managers are and how you can create your own. If you want to dive deeper, then check out contextlib in the standard library. It includes convenient ways for defining new context managers, as well as ready-made context managers that can be used to close objects, suppress errors, or even do nothing! For even more information, check out Python Context Managers and the “with” Statement and the accompanying tutorial.
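For instance, contextlib.contextmanager lets you build a generator-based timer in a few lines. This hypothetical timer() helper is a sketch, separate from the tutorial's Timer class:

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(text="Elapsed time: {:0.4f} seconds"):
    tic = time.perf_counter()
    try:
        yield  # the with block runs here
    finally:
        # Runs when the with block exits, even on an exception
        print(text.format(time.perf_counter() - tic))

with timer():
    time.sleep(0.2)
```

Everything before the yield plays the role of .__enter__(), and everything after it plays the role of .__exit__().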

Creating a Python Timer Context Manager

You’ve seen how context managers work in general, but how can they help with timing code? If you can run certain functions before and after a block of code, then you can simplify how your Python timer works. So far, you’ve needed to call .start() and .stop() explicitly when timing your code, but a context manager can do this automatically.

Again, for Timer to work as a context manager, it needs to adhere to the context manager protocol. In other words, it must implement .__enter__() and .__exit__() to start and stop the Python timer. All the necessary functionality is already available, so there’s not much new code you need to write. Just add the following methods to your Timer class:

def __enter__(self):
    """Start a new timer as a context manager"""
    self.start()
    return self

def __exit__(self, *exc_info):
    """Stop the context manager timer"""
    self.stop()

Timer is now a context manager. The important part of the implementation is that .__enter__() calls .start() to start a Python timer when the context is entered, and .__exit__() uses .stop() to stop the Python timer when the code leaves the context. Try it out:

>>> from timer import Timer
>>> import time
>>> with Timer():
...     time.sleep(0.7)
...
Elapsed time: 0.7012 seconds

You should also note two more subtle details:

  1. .__enter__() returns self, the Timer instance, which allows the user to bind the Timer instance to a variable using as. For example, with Timer() as t: will create the variable t pointing to the Timer object.

  2. .__exit__() expects a triple of arguments with information about any exception that occurred during the execution of the context. In your code, these arguments are packed into a tuple called exc_info and then ignored, which means that Timer will not attempt any exception handling.

.__exit__() doesn’t do any error handling in this case. Still, one of the great features of context managers is that they’re guaranteed to call .__exit__(), no matter how the context exits. In the following example, you purposely create an error by dividing by zero:

>>> from timer import Timer
>>> with Timer():
...     for num in range(-3, 3):
...         print(f"1 / {num} = {1 / num:.3f}")
...
1 / -3 = -0.333
1 / -2 = -0.500
1 / -1 = -1.000
Elapsed time: 0.0001 seconds
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
ZeroDivisionError: division by zero

Note that Timer prints out the elapsed time, even though the code crashed. It’s possible to inspect and suppress errors in .__exit__(). See the documentation for more information.
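For example, returning a true value from .__exit__() tells Python to suppress the exception. The following sketch shows the mechanism (the Suppress name is made up for illustration; the standard library already ships contextlib.suppress for exactly this):

```python
class Suppress:
    """Illustrative context manager that swallows the given exception types."""

    def __init__(self, *exceptions):
        self.exceptions = exceptions

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        # A true return value suppresses the exception
        return exc_type is not None and issubclass(exc_type, self.exceptions)

with Suppress(ZeroDivisionError):
    1 / 0  # Raised, then swallowed by .__exit__()

survived = True  # Execution continues after the block
```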

Using the Python Timer Context Manager

Let’s see how to use the Timer context manager to time the download of Real Python tutorials. Recall how you used Timer earlier:

# latest_tutorial.py

from timer import Timer
from reader import feed

def main():
    """Print the latest tutorial from Real Python"""
    t = Timer()
    t.start()
    tutorial = feed.get_article(0)
    t.stop()
    print(tutorial)

if __name__ == "__main__":
    main()

You’re timing the call to feed.get_article(). You can use the context manager to make the code shorter, simpler, and more readable:

# latest_tutorial.py

from timer import Timer
from reader import feed

def main():
    """Print the latest tutorial from Real Python"""
    with Timer():
        tutorial = feed.get_article(0)
    print(tutorial)

if __name__ == "__main__":
    main()

This code does virtually the same as the code above. The main difference is that you don’t define the extraneous variable t, which keeps your namespace cleaner.

Running the script should give a familiar result:

$ python latest_tutorial.py
Elapsed time: 0.71 seconds
# Python Timer Functions: Three Ways to Monitor Your Code

[ ... The full text of the tutorial ... ]

There are a few advantages to adding context manager capabilities to your Python timer class:

  • Low effort: You only need one extra line of code to time the execution of a block of code.
  • Readability: Invoking the context manager is readable, and you can more clearly visualize the code block you’re timing.

Using Timer as a context manager is almost as flexible as using .start() and .stop() directly, while it has less boilerplate code. In the next section, you’ll see how Timer can be used as a decorator as well. This will make it easier to monitor the runtime of complete functions.

A Python Timer Decorator

Your Timer class is now very versatile. However, there’s one use case where it could be even more streamlined. Say that you want to track the time spent inside one given function in your codebase. Using a context manager, you have essentially two different options:

  1. Use Timer every time you call the function:

    with Timer("some_name"):
        do_something()

    If you call do_something() in many places, then this will become cumbersome and hard to maintain.

  2. Wrap the code in your function inside a context manager:

    def do_something():
        with Timer("some_name"):
            ...

    The Timer only needs to be added in one place, but this adds a level of indentation to the whole definition of do_something().

A better solution is to use Timer as a decorator. Decorators are powerful constructs that you use to modify the behavior of functions and classes. In this section, you’ll learn a little about how decorators work, how Timer can be extended to be a decorator, and how that will simplify timing functions. For a more in-depth explanation of decorators, see Primer on Python Decorators.

Understanding Decorators in Python

A decorator is a function that wraps another function to modify its behavior. This technique is possible because functions are first-class objects in Python. In other words, functions can be assigned to variables and used as arguments to other functions, just like any other object. This gives you a lot of flexibility and is the basis for several of Python’s more powerful features.
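For instance, you can assign a function to a variable and pass it to another function, just like any other value (the names below are illustrative):

```python
def shout(text):
    return text.upper()

speak = shout  # The function object is assigned to a new name

def apply_twice(func, arg):
    # The function arrives as an ordinary argument
    return func(func(arg))

result = apply_twice(speak, "hi")  # "HI" (upper() is idempotent)
```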

As a first example, you’ll create a decorator that does nothing:

def turn_off(func):
    return lambda *args, **kwargs: None

First, note that turn_off() is just a regular function. What makes this a decorator is that it takes a function as its only argument and returns a function. You can use this to modify other functions like this:

>>> print("Hello")
Hello
>>> print = turn_off(print)
>>> print("Hush")
>>> # Nothing is printed

The line print = turn_off(print) decorates print() with the turn_off() decorator. Effectively, it replaces print() with the lambda *args, **kwargs: None returned by turn_off(). The lambda expression creates an anonymous function that does nothing except return None.

For you to define more interesting decorators, you need to know about inner functions. An inner function is a function defined inside another function. One common use of inner functions is to create function factories:

def create_multiplier(factor):
    def multiplier(num):
        return factor * num
    return multiplier

multiplier() is an inner function, defined inside create_multiplier(). Note that you have access to factor inside multiplier(), while multiplier() is not defined outside create_multiplier():

>>> multiplier
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'multiplier' is not defined

Instead you use create_multiplier() to create new multiplier functions, each based on a different factor:

>>> double = create_multiplier(factor=2)
>>> double(3)
6
>>> quadruple = create_multiplier(factor=4)
>>> quadruple(7)
28

Similarly, you can use inner functions to create decorators. Remember, a decorator is a function that returns a function:

 1 def triple(func):
 2     def wrapper_triple(*args, **kwargs):
 3         print(f"Tripled {func.__name__!r}")
 4         value = func(*args, **kwargs)
 5         return value * 3
 6     return wrapper_triple

triple() is a decorator, because it's a function that expects a function as its only argument, func(), and returns another function, wrapper_triple(). Note the structure of triple() itself:

  • Line 1 starts the definition of triple() and expects a function as an argument.
  • Lines 2 to 5 define the inner function wrapper_triple().
  • Line 6 returns wrapper_triple().

This pattern is prevalent for defining decorators. The interesting parts are those happening inside the inner function:

  • Line 2 starts the definition of wrapper_triple(). This function will replace whichever function triple() decorates. The parameters are *args and **kwargs, which collect whichever positional and keyword arguments you pass to the function. This gives you the flexibility to use triple() on any function.
  • Line 3 prints out the name of the decorated function, and note that triple() has been applied to it.
  • Line 4 calls func(), the function that has been decorated by triple(). It passes on all arguments passed to wrapper_triple().
  • Line 5 triples the return value of func() and returns it.

Let’s try it out! knock() is a function that returns the word Penny. See what happens if it’s tripled:

>>> def knock():
...     return "Penny! "
...
>>> knock = triple(knock)
>>> result = knock()
Tripled 'knock'
>>> result
'Penny! Penny! Penny! '

Multiplying a text string by a number is a form of repetition, so Penny repeats three times. The decoration happens at knock = triple(knock).

It feels a bit clunky to keep repeating knock. Instead, PEP 318 introduced a more convenient syntax for applying decorators. The following definition of knock() does the same as the one above:

>>> @triple
... def knock():
...     return "Penny! "
...
>>> result = knock()
Tripled 'knock'
>>> result
'Penny! Penny! Penny! '

The @ symbol is used to apply decorators. In this case, @triple means that triple() is applied to the function defined just after it.

One of the few decorators defined in the standard library is @functools.wraps. This one is quite helpful when defining your own decorators. Since decorators effectively replace one function with another, they create a subtle issue with your functions:

>>> knock
<function triple.<locals>.wrapper_triple at 0x7fa3bfe5dd90>

@triple decorates knock(), which is then replaced by the wrapper_triple() inner function, as the output above confirms. This will also replace the name, docstring, and other metadata. Often, this will not have much effect, but it can make introspection difficult.

Sometimes, decorated functions must have correct metadata. @functools.wraps fixes exactly this issue:

import functools

def triple(func):
    @functools.wraps(func)
    def wrapper_triple(*args, **kwargs):
        print(f"Tripled {func.__name__!r}")
        value = func(*args, **kwargs)
        return value * 3
    return wrapper_triple

With this new definition of @triple, metadata are preserved:

>>> @triple
... def knock():
...     return "Penny! "
...
>>> knock
<function knock at 0x7fa3bfe5df28>

Note that knock() now keeps its proper name, even after being decorated. It’s good form to use @functools.wraps whenever you define a decorator. A blueprint you can use for most of your decorators is the following:

import functools

def decorator(func):
    @functools.wraps(func)
    def wrapper_decorator(*args, **kwargs):
        # Do something before
        value = func(*args, **kwargs)
        # Do something after
        return value
    return wrapper_decorator

To see more examples of how to define decorators, check out the examples listed in Primer on Python Decorators.

Creating a Python Timer Decorator

In this section, you’ll learn how to extend your Python timer so that you can use it as a decorator as well. However, as a first exercise, let’s create a Python timer decorator from scratch.

Based on the blueprint above, you only need to decide what to do before and after you call the decorated function. This is similar to the considerations about what to do when entering and exiting the context manager. You want to start a Python timer before calling the decorated function, and stop the Python timer after the call finishes. A @timer decorator can be defined as follows:

import functools
import time

def timer(func):
    @functools.wraps(func)
    def wrapper_timer(*args, **kwargs):
        tic = time.perf_counter()
        value = func(*args, **kwargs)
        toc = time.perf_counter()
        elapsed_time = toc - tic
        print(f"Elapsed time: {elapsed_time:0.4f} seconds")
        return value
    return wrapper_timer

Note how much wrapper_timer() resembles the early pattern you established for timing Python code. You can apply @timer as follows:

>>> @timer
... def latest_tutorial():
...     tutorial = feed.get_article(0)
...     print(tutorial)
...
>>> latest_tutorial()
# Python Timer Functions: Three Ways to Monitor Your Code

[ ... The full text of the tutorial ... ]

Elapsed time: 0.5414 seconds

Recall that you can also apply a decorator to a previously defined function:

>>> feed.get_article = timer(feed.get_article)

Since @ applies when functions are defined, you need to use the more basic form in these cases. One advantage of using a decorator is that you only need to apply it once, and it’ll time the function every time:

>>> tutorial = feed.get_article(0)
Elapsed time: 0.5512 seconds

@timer does the job. However, in a sense, you’re back to square one, since @timer does not have any of the flexibility or convenience of Timer. Can you also make your Timer class act like a decorator?

So far, you’ve used decorators as functions applied to other functions, but that’s not entirely correct. Decorators must be callables. There are many callable types in Python. You can make your own objects callable by defining the special .__call__() method in their class. The following function and class behave similarly:

>>> def square(num):
...     return num ** 2
...
>>> square(4)
16

>>> class Squarer:
...     def __call__(self, num):
...         return num ** 2
...
>>> square = Squarer()
>>> square(4)
16

Here, square is an instance that is callable and can square numbers, just like the square() function in the first example.

This gives you a way of adding decorator capabilities to the existing Timer class:

def __call__(self, func):
    """Support using Timer as a decorator"""
    @functools.wraps(func)
    def wrapper_timer(*args, **kwargs):
        with self:
            return func(*args, **kwargs)
    return wrapper_timer

.__call__() uses the fact that Timer is already a context manager to take advantage of the conveniences you’ve already defined there. Make sure you also import functools at the top of timer.py.

You can now use Timer as a decorator:

>>> @Timer(text="Downloaded the tutorial in {:.2f} seconds")
... def latest_tutorial():
...     tutorial = feed.get_article(0)
...     print(tutorial)
...
>>> latest_tutorial()
# Python Timer Functions: Three Ways to Monitor Your Code

[ ... The full text of the tutorial ... ]

Downloaded the tutorial in 0.72 seconds

Before rounding out this section, know that there’s a more straightforward way of turning your Python timer into a decorator. You’ve already seen some of the similarities between context managers and decorators. They’re both typically used to do something before and after executing some given code.

Based on these similarities, there’s a mixin class defined in the standard library called ContextDecorator. You can add decorator abilities to your context manager classes simply by inheriting ContextDecorator:

from contextlib import ContextDecorator

class Timer(ContextDecorator):
    # Implementation of Timer is unchanged

When you use ContextDecorator this way, there’s no need to implement .__call__() yourself, so you can safely delete it from the Timer class.
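As a quick standalone sketch of what inheriting ContextDecorator buys you (the tracer class here is made up for illustration):

```python
from contextlib import ContextDecorator

class tracer(ContextDecorator):
    """Records enter/exit events; works with `with` and as @tracer()."""

    events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, *exc_info):
        self.events.append("exit")
        return False

@tracer()  # Decorator ability comes from ContextDecorator, no .__call__() needed
def greet():
    tracer.events.append("body")

greet()
# tracer.events is now ["enter", "body", "exit"]
```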

Using the Python Timer Decorator

Let’s redo the latest_tutorial.py example one last time, using the Python timer as a decorator:

 1 # latest_tutorial.py
 2 
 3 from timer import Timer
 4 from reader import feed
 5 
 6 @Timer()
 7 def main():
 8     """Print the latest tutorial from Real Python"""
 9     tutorial = feed.get_article(0)
10     print(tutorial)
11 
12 if __name__ == "__main__":
13     main()

If you compare this implementation with the original implementation without any timing, then you’ll notice that the only differences are the import of Timer on line 3 and the application of @Timer() on line 6. A significant advantage of using decorators is that they’re usually straightforward to apply, as you see here.

However, the decorator still applies to the whole function. This means your code is taking into account the time it takes to print the tutorial, in addition to the time it takes to download. Let’s run the script one final time:

$ python latest_tutorial.py
# Python Timer Functions: Three Ways to Monitor Your Code

[ ... The full text of the tutorial ... ]

Elapsed time: 0.69 seconds

The location of the elapsed time output is a tell-tale sign that your code is timing the printing of the tutorial as well. As you see here, your code prints the elapsed time after the tutorial.

When you use Timer as a decorator, you’ll see similar advantages as you did with context managers:

  • Low effort: You only need one extra line of code to time the execution of a function.
  • Readability: When you add the decorator, you can note more clearly that your code will time the function.
  • Consistency: You only need to add the decorator when the function is defined. Your code will consistently time it every time it’s called.

However, decorators are not as flexible as context managers. You can only apply them to complete functions. It’s possible to add decorators to already defined functions, but this is a bit clunky and less common.

The Python Timer Code

You can expand the code block below to view the final source code for your Python timer:

# timer.py

from contextlib import ContextDecorator
from dataclasses import dataclass, field
import time
from typing import Any, Callable, ClassVar, Dict, Optional

class TimerError(Exception):
    """A custom exception used to report errors in use of Timer class"""

@dataclass
class Timer(ContextDecorator):
    """Time your code using a class, context manager, or decorator"""

    timers: ClassVar[Dict[str, float]] = dict()
    name: Optional[str] = None
    text: str = "Elapsed time: {:0.4f} seconds"
    logger: Optional[Callable[[str], None]] = print
    _start_time: Optional[float] = field(default=None, init=False, repr=False)

    def __post_init__(self) -> None:
        """Initialization: add timer to dict of timers"""
        if self.name:
            self.timers.setdefault(self.name, 0)

    def start(self) -> None:
        """Start a new timer"""
        if self._start_time is not None:
            raise TimerError(f"Timer is running. Use .stop() to stop it")
        self._start_time = time.perf_counter()

    def stop(self) -> float:
        """Stop the timer, and report the elapsed time"""
        if self._start_time is None:
            raise TimerError(f"Timer is not running. Use .start() to start it")

        # Calculate elapsed time
        elapsed_time = time.perf_counter() - self._start_time
        self._start_time = None

        # Report elapsed time
        if self.logger:
            self.logger(self.text.format(elapsed_time))
        if self.name:
            self.timers[self.name] += elapsed_time

        return elapsed_time

    def __enter__(self) -> "Timer":
        """Start a new timer as a context manager"""
        self.start()
        return self

    def __exit__(self, *exc_info: Any) -> None:
        """Stop the context manager timer"""
        self.stop()

The code is also available in the codetiming repository on GitHub.

You can use the code yourself by saving it to a file named timer.py and importing it into your program:

>>> from timer import Timer

Timer is also available on PyPI, so an even easier option is to install it using pip:

$ python -m pip install codetiming

Note that the package name on PyPI is codetiming. You’ll need to use this name both when you install the package and when you import Timer:

>>> from codetiming import Timer

Apart from this, codetiming.Timer works exactly as timer.Timer. To summarize, you can use Timer in three different ways:

  1. As a class:

    t = Timer(name="class")
    t.start()
    # Do something
    t.stop()
  2. As a context manager:

    with Timer(name="context manager"):
        # Do something
  3. As a decorator:

    @Timer(name="decorator")
    def stuff():
        # Do something

This kind of Python timer is mainly useful for monitoring the time your code spends at individual key code blocks or functions. In the next section, you’ll get a quick overview of alternatives you can use if you’re looking to optimize your code.

Other Python Timer Functions

There are many options for timing your code with Python. In this tutorial, you’ve learned how to create a flexible and convenient class that you can use in several different ways. A quick search on PyPI shows that there are already many projects available that offer Python timer solutions.

In this section, you’ll first learn more about the different functions available in the standard library for measuring time, and why perf_counter() is preferable. Then, you’ll see alternatives for optimizing your code, for which Timer is not well-suited.

Using Alternative Python Timer Functions

You’ve been using perf_counter() throughout this tutorial to do the actual time measurements, but Python’s time library comes with several other functions that also measure time. Here are some alternatives:

  • time.monotonic()
  • time.perf_counter()
  • time.process_time()
  • time.thread_time()
  • time.time()

One of the reasons why there are several functions is that Python represents time as a float. Floating-point numbers are inaccurate by nature. You may have seen results like these before:

>>> 0.1 + 0.1 + 0.1
0.30000000000000004
>>> 0.1 + 0.1 + 0.1 == 0.3
False

Python’s float follows the IEEE 754 Standard for Floating-Point Arithmetic, which tries to represent all floating-point numbers in 64 bits. Since there are infinitely many floating-point numbers, you can’t express them all with a finite number of bits.

IEEE 754 prescribes a system where the density of numbers that you can represent varies. The closer you are to 1, the more numbers you can represent. For larger numbers, there’s more space between the numbers that you can express. This has some consequences when you use a float to represent time.
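You can observe this varying density directly with math.nextafter(), which returns the next representable float in a given direction (available in Python 3.9 and later; the variable names below are illustrative):

```python
import math

# Gap to the next representable float just above 1.0
near_one = math.nextafter(1.0, 2.0) - 1.0

# Gap just above a typical time.time() value (seconds since the epoch)
t = 1_564_342_757.0
near_epoch = math.nextafter(t, 2e9) - t

# The gap near the epoch-sized value is roughly a billion times wider,
# so sub-microsecond differences simply cannot be represented there.
```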

Consider time(). The main purpose of this function is to represent the actual time right now. It does this as the number of seconds since a given point in time, called the epoch. The number returned by time() is quite big, which means that there are fewer numbers available, and the resolution suffers. Specifically, time() is not able to measure nanosecond differences:

>>> import time
>>> t = time.time()
>>> t
1564342757.0654016
>>> t + 1e-9
1564342757.0654016
>>> t == t + 1e-9
True

A nanosecond is one-billionth of a second. Note that adding a nanosecond to t does not affect the result. perf_counter(), on the other hand, uses some undefined point in time as its epoch, allowing it to work with smaller numbers and therefore obtain a better resolution:

>>> import time
>>> p = time.perf_counter()
>>> p
11370.015653846
>>> p + 1e-9
11370.015653847
>>> p == p + 1e-9
False

Here, you see that adding a nanosecond to p actually affects the outcome. For more information about how to work with time(), see A Beginner’s Guide to the Python time Module.

The challenges with representing time as a float are well known, so Python 3.7 introduced a new option. Each time measurement function now has a corresponding _ns function that returns the number of nanoseconds as an int instead of the number of seconds as a float. For instance, time() now has a nanosecond counterpart called time_ns():

>>> import time
>>> time.time_ns()
1564342792866601283

Integers are unbounded in Python, so this allows time_ns() to give nanosecond resolution for all eternity. Similarly, perf_counter_ns() is a nanosecond variant of perf_counter():

>>> import time
>>> time.perf_counter()
13580.153084446
>>> time.perf_counter_ns()
13580765666638

Since perf_counter() already provides nanosecond resolution, there are fewer advantages to using perf_counter_ns().

Note: perf_counter_ns() is only available in Python 3.7 and later. In this tutorial, you’ve used perf_counter() in your Timer class. That way, Timer can be used on older Python versions as well.

For more information about the _ns functions in time, check out Cool New Features in Python 3.7.

There are two functions in time that do not measure the time spent sleeping. These are process_time() and thread_time(), which are useful in some settings. However, for Timer, you typically want to measure the full time spent. The final function in the list above is monotonic(). The name alludes to this function being a monotonic timer, which is a Python timer that can never move backward.
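A quick way to see the difference is to time a sleep with both clocks; process_time() doesn't count the time the process spends sleeping (the 0.2-second duration below is arbitrary):

```python
import time

wall_start = time.perf_counter()
cpu_start = time.process_time()

time.sleep(0.2)

wall_elapsed = time.perf_counter() - wall_start  # Includes the sleep
cpu_elapsed = time.process_time() - cpu_start    # Sleep is not counted
```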

All these functions are monotonic except time(), which can go backward if the system time is adjusted. On some systems, monotonic() is the same function as perf_counter(), and you can use them interchangeably. However, this is not always the case. You can use time.get_clock_info() to get more information about a Python timer function. Using Python 3.7 on Linux I get the following information:

>>> import time
>>> time.get_clock_info("monotonic")
namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)',
          monotonic=True, resolution=1e-09)
>>> time.get_clock_info("perf_counter")
namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)',
          monotonic=True, resolution=1e-09)

The results could be different on your system.

PEP 418 describes some of the rationale behind introducing these functions. It includes the following short descriptions:

  • time.monotonic(): timeout and scheduling, not affected by system clock updates
  • time.perf_counter(): benchmarking, most precise clock for short period
  • time.process_time(): profiling, CPU time of the process (Source)

As you can see, it’s usually the best choice for you to use perf_counter() for your Python timer.

Estimating Running Time With timeit

Say you’re trying to squeeze the last bit of performance out of your code, and you’re wondering about the most effective way to convert a list to a set. You want to compare using set() and the set literal, {...}. You can use your Python timer for this:

>>> from timer import Timer
>>> numbers = [7, 6, 1, 4, 1, 8, 0, 6]
>>> with Timer(text="{:.8f}"):
...     set(numbers)
...
{0, 1, 4, 6, 7, 8}
0.00007373
>>> with Timer(text="{:.8f}"):
...     {*numbers}
...
{0, 1, 4, 6, 7, 8}
0.00006204

This test seems to indicate that the set literal might be slightly faster. However, these results are quite uncertain, and if you rerun the code, you might get wildly different results. That’s because you’re only trying the code once. You could, for instance, get unlucky and run the script just as your computer is becoming busy with other tasks.

A better way is to use the timeit standard library. It’s designed precisely to measure the execution time of small code snippets. While you can import and call timeit.timeit() from Python as a regular function, it is usually more convenient to use the command-line interface. You can time the two variants as follows:

$ python -m timeit --setup "nums = [7, 6, 1, 4, 1, 8, 0, 6]" "set(nums)"
2000000 loops, best of 5: 163 nsec per loop

$ python -m timeit --setup "nums = [7, 6, 1, 4, 1, 8, 0, 6]" "{*nums}"
2000000 loops, best of 5: 121 nsec per loop

timeit automatically calls your code many times to average out noisy measurements. The results from timeit confirm that the set literal is faster than set(). You can find more information about this particular issue at Michael Bassili’s blog.
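If you'd rather stay inside Python, the same comparison can be run by calling timeit.timeit() as a regular function (the iteration count here is arbitrary):

```python
import timeit

setup = "nums = [7, 6, 1, 4, 1, 8, 0, 6]"

# Each call returns the total time, in seconds, for `number` executions
t_set = timeit.timeit("set(nums)", setup=setup, number=100_000)
t_literal = timeit.timeit("{*nums}", setup=setup, number=100_000)
```

Note that a single run of each is noisy; the command-line interface repeats the measurement and reports the best of several runs for you.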

Note: Be careful when you’re using timeit on code that can download files or access databases. Since timeit automatically calls your program several times, you could unintentionally end up spamming the server with requests!

Finally, the IPython interactive shell and the Jupyter notebook have extra support for this functionality with the %timeit magic command:

In [1]: numbers = [7, 6, 1, 4, 1, 8, 0, 6]

In [2]: %timeit set(numbers)
171 ns ± 0.748 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [3]: %timeit {*numbers}
147 ns ± 2.62 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Again, the measurements indicate that using a set literal is faster. In Jupyter Notebooks you can also use the %%timeit cell-magic to measure the time of running a whole cell.

Finding Bottlenecks in Your Code With Profilers

timeit is excellent for benchmarking a particular snippet of code. However, it would be very cumbersome to use it to check all parts of your program and locate which sections take the most time. Instead, you can use a profiler.

cProfile is a profiler that you can access at any time from the standard library. You can use it in several ways, although it’s usually most straightforward to use it as a command-line tool:

$ python -m cProfile -o latest_tutorial.prof latest_tutorial.py

This command runs latest_tutorial.py with profiling turned on. You save the output from cProfile in latest_tutorial.prof, as specified by the -o option. The output data is in a binary format that needs a dedicated program to make sense of it. Again, Python has an option right in the standard library! Running the pstats module on your .prof file opens an interactive profile statistics browser:

$ python -m pstats latest_tutorial.prof
Welcome to the profile statistics browser.
latest_tutorial.prof% help

Documented commands (type help <topic>):
========================================
EOF  add  callees  callers  help  quit  read  reverse  sort  stats  strip

To use pstats you type commands at the prompt. Here you can see the integrated help system. Typically you’ll use the sort and stats commands. To get a cleaner output, strip can be useful:

latest_tutorial.prof% strip
latest_tutorial.prof% sort cumtime
latest_tutorial.prof% stats 10
         1393801 function calls (1389027 primitive calls) in 0.586 seconds

   Ordered by: cumulative time
   List reduced from 1443 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    144/1    0.001    0.000    0.586    0.586 {built-in method builtins.exec}
        1    0.000    0.000    0.586    0.586 latest_tutorial.py:3(<module>)
        1    0.000    0.000    0.521    0.521 contextlib.py:71(inner)
        1    0.000    0.000    0.521    0.521 latest_tutorial.py:6(read_latest_tutorial)
        1    0.000    0.000    0.521    0.521 feed.py:28(get_article)
        1    0.000    0.000    0.469    0.469 feed.py:15(_feed)
        1    0.000    0.000    0.469    0.469 feedparser.py:3817(parse)
        1    0.000    0.000    0.271    0.271 expatreader.py:103(parse)
        1    0.000    0.000    0.271    0.271 xmlreader.py:115(parse)
       13    0.000    0.000    0.270    0.021 expatreader.py:206(feed)

This output shows that the total runtime was 0.586 seconds. It also lists the ten functions where your code spent most of its time. Here you’ve sorted by cumulative time (cumtime), which means that your code counts time when the given function has called another function.

You can see that your code spends virtually all its time inside the latest_tutorial module, and in particular, inside read_latest_tutorial(). While this might be useful confirmation of what you already know, it’s often more interesting to find where your code actually spends time.

The total time (tottime) column indicates how much time your code spent inside a function, excluding time in sub-functions. You can see that none of the functions above spend much time doing actual work themselves. To find where the code spent most of its time, issue another sort command:

latest_tutorial.prof% sort tottime
latest_tutorial.prof% stats 10
         1393801 function calls (1389027 primitive calls) in 0.586 seconds

   Ordered by: internal time
   List reduced from 1443 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       59    0.091    0.002    0.091    0.002 {method 'read' of '_ssl._SSLSocket'}
   114215    0.070    0.000    0.099    0.000 feedparser.py:308(__getitem__)
   113341    0.046    0.000    0.173    0.000 feedparser.py:756(handle_data)
        1    0.033    0.033    0.033    0.033 {method 'do_handshake' of '_ssl._SSLSocket'}
        1    0.029    0.029    0.029    0.029 {method 'connect' of '_socket.socket'}
       13    0.026    0.002    0.270    0.021 {method 'Parse' of 'pyexpat.xmlparser'}
   113806    0.024    0.000    0.123    0.000 feedparser.py:373(get)
     3455    0.023    0.000    0.024    0.000 {method 'sub' of 're.Pattern'}
   113341    0.019    0.000    0.193    0.000 feedparser.py:2033(characters)
      236    0.017    0.000    0.017    0.000 {method 'translate' of 'str'}

You can now see that latest_tutorial.py actually spends most of its time working with sockets or handling data inside feedparser. The latter is one of the dependencies of the Real Python Reader that’s used to parse the tutorial feed.

You can use pstats to get some idea on where your code is spending most of its time and see if you can optimize any bottlenecks you find. You can also use the tool to understand the structure of your code better. For instance, the commands callees and callers will show you which functions call and are called by a given function.

You can also investigate certain functions. Let’s see how much overhead Timer causes by filtering the results with the phrase timer:

latest_tutorial.prof% stats timer
         1393801 function calls (1389027 primitive calls) in 0.586 seconds

   Ordered by: internal time
   List reduced from 1443 to 8 due to restriction <'timer'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 timer.py:13(Timer)
        1    0.000    0.000    0.000    0.000 timer.py:35(stop)
        1    0.000    0.000    0.003    0.003 timer.py:3(<module>)
        1    0.000    0.000    0.000    0.000 timer.py:28(start)
        1    0.000    0.000    0.000    0.000 timer.py:9(TimerError)
        1    0.000    0.000    0.000    0.000 timer.py:23(__post_init__)
        1    0.000    0.000    0.000    0.000 timer.py:57(__exit__)
        1    0.000    0.000    0.000    0.000 timer.py:52(__enter__)

Luckily, Timer causes only minimal overhead. Use quit to leave the pstats browser when you’re done investigating.

For a more powerful interface into profile data, check out KCacheGrind. It uses its own data format, but you can convert data from cProfile using pyprof2calltree:

$ pyprof2calltree -k -i latest_tutorial.prof

This command will convert latest_tutorial.prof and open KCacheGrind to analyze the data.

The last option you’ll see here for timing your code is line_profiler. cProfile can tell you which functions your code spends the most time in, but it won’t give you insights into which lines inside that function are the slowest. That’s where line_profiler can help you.

Note: You can also profile the memory consumption of your code. This falls outside the scope of this tutorial. However, you can have a look at memory-profiler if you need to monitor the memory consumption of your programs.

Note that line profiling takes time and adds a fair bit of overhead to your runtime. A more standard workflow is first to use cProfile to identify which functions to look at and then run line_profiler on those functions. line_profiler is not part of the standard library, so you should first follow the installation instructions to set it up.

Before you run the profiler, you need to tell it which functions to profile. You do this by adding a @profile decorator inside your source code. For example, to profile Timer.stop() you add the following inside timer.py:

@profile
def stop(self) -> float:
    # The rest of the code is unchanged

Note that you don’t import profile anywhere. Instead, it’s automatically added to the global namespace when you run the profiler. You need to delete the line when you’re done profiling, though. Otherwise, you’ll get a NameError.
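If you'd rather keep the script runnable without kernprof, a common workaround (not part of line_profiler's documented API) is to fall back to a no-op decorator when profile isn't defined. The stop_stub() function below is a hypothetical placeholder standing in for the real method:

```python
try:
    profile  # injected into builtins by kernprof at runtime
except NameError:
    def profile(func):
        """No-op stand-in so the code also runs without kernprof."""
        return func

@profile
def stop_stub():
    # Placeholder for the real Timer.stop() body
    return 42
```

With this guard in place, running the script directly with python works as usual, and running it under kernprof still picks up the real decorator.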

Next, run the profiler using kernprof, which is part of the line_profiler package:

$ kernprof -l latest_tutorial.py

This command automatically saves the profiler data in a file called latest_tutorial.py.lprof. You can see those results using line_profiler:

$ python -m line_profiler latest_tutorial.py.lprof
Timer unit: 1e-06 s

Total time: 1.6e-05 s
File: /home/realpython/timer.py
Function: stop at line 35

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    35                                           @profile
    36                                           def stop(self) -> float:
    37                                               """Stop the timer, and report the elapsed time"""
    38         1          1.0      1.0      6.2      if self._start_time is None:
    39                                                   raise TimerError(f"Timer is not running. ...")
    40
    41                                               # Calculate elapsed time
    42         1          2.0      2.0     12.5      elapsed_time = time.perf_counter() - self._start_time
    43         1          0.0      0.0      0.0      self._start_time = None
    44
    45                                               # Report elapsed time
    46         1          0.0      0.0      0.0      if self.logger:
    47         1         11.0     11.0     68.8          self.logger(self.text.format(elapsed_time))
    48         1          1.0      1.0      6.2      if self.name:
    49         1          1.0      1.0      6.2          self.timers[self.name] += elapsed_time
    50
    51         1          0.0      0.0      0.0      return elapsed_time

First, note that the time unit in this report is microseconds (1e-06 s). Usually, the most accessible number to look at is %Time, which tells you the percentage of the total time your code spends inside a function at each line. In this example, you can see that your code spends almost 70% of the time on line 47, which is the line that formats and prints the result of the timer.

Conclusion

In this tutorial, you’ve seen several different approaches to adding a Python timer to your code:

  • You used a class to keep state and add a user-friendly interface. Classes are very flexible, and using Timer directly gives you full control over how and when to invoke the timer.

  • You used a context manager to add features to a block of code and, if necessary, to clean up afterward. Context managers are straightforward to use, and adding with Timer() can help you more clearly distinguish your code visually.

  • You used a decorator to add behavior to a function. Decorators are concise and compelling, and using @Timer() is a quick way to monitor your code’s runtime.
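As a recap, all three interfaces can live on a single small class. The MiniTimer below is a hypothetical, stripped-down sketch for illustration, not the tutorial's full Timer:

```python
import functools
import time

class MiniTimer:
    """Stripped-down timer usable as a class, context manager, or decorator."""

    def __init__(self):
        self.last = None  # Most recent elapsed time in seconds

    def start(self):
        self._t0 = time.perf_counter()

    def stop(self):
        self.last = time.perf_counter() - self._t0
        return self.last

    # Context manager protocol
    def __enter__(self):
        self.start()
        return self

    def __exit__(self, *exc_info):
        self.stop()

    # Decorator protocol: wrap the function body in a with block
    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with self:
                return func(*args, **kwargs)
        return wrapper

# 1. Explicit class usage
t = MiniTimer()
t.start()
sum(range(1000))
t.stop()

# 2. Context manager
with MiniTimer() as t:
    sum(range(1000))

# 3. Decorator
@MiniTimer()
def work():
    return sum(range(1000))

result = work()
```

Implementing __call__ in terms of __enter__ and __exit__ keeps the three interfaces consistent: the decorator is just the context manager applied around every call.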

You’ve also seen why you should prefer time.perf_counter() over time.time() when benchmarking code, as well as what other alternatives are useful when you’re optimizing your code.

Now you can add Python timer functions to your own code! Keeping track of how fast your program runs in your logs will help you monitor your scripts. Do you have ideas for other use cases where classes, context managers, and decorators play well together? Leave a comment down below!

Resources

For a deeper dive into Python timer functions, check out these resources:

  • codetiming is the Python timer available on PyPI.
  • time.perf_counter() is a performance counter for precise timings.
  • timeit is a tool for comparing the runtimes of code snippets.
  • cProfile is a profiler for finding bottlenecks in scripts and programs.
  • pstats is a command-line tool for looking at profiler data.
  • KCachegrind is a GUI for looking at profiler data.
  • line_profiler is a profiler for measuring individual lines of code.
  • memory-profiler is a profiler for monitoring memory usage.


John Cook: Minimizing context switching between shell and Python


Sometimes you’re in the flow using the command line and you’d like to briefly switch over to Python without too much interruption. Or it could be the other way around: you’re in the Python REPL and need to issue a quick shell command.

One solution would be to run your shell and Python session in different terminals, but let’s assume that for whatever reason you’re working in just one terminal. For example, maybe you want the output of a shell command to be visually close when you run Python, or vice versa.

Calling Python from shell

You can run a Python one-liner from the shell by calling Python with the -c option. For example,

    $ python -c "print(3*7)"
    21

I hardly ever do this because I want to run more than a one-liner. What I find more useful is to launch Python with the -q option to suppress all the start-up verbiage and simply bring up a prompt.

    $ python -q
    >>>

More on this in my post on quiet modes.

Calling shell from Python

If you run Python with the ipython command rather than the default python you get much better shell integration. IPython lets you type a shell command at any point simply by preceding it with a !. For example, the following command tells us this is the 364th day of the year.

    In [1]: ! date +%j
    364 

You can run some of the most common shell commands, such as cd and ls without even a bang prefix. These are “magic” commands that do what you’d expect if you forgot for a moment that you’re in a Python REPL rather than a command shell.

    In [2]: cd ..
    Out[2]: '/mnt/c/Users'

IPython also supports other forms of shell integration such as capturing the output of a shell command as a Python variable, or using a Python variable as an argument to a shell command.
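For instance, files = !ls stores the lines of output in a Python list. In a plain Python session without IPython, the closest standard-library equivalent is subprocess.run(); the sketch below reuses the earlier print(3*7) one-liner so it stays portable:

```python
import subprocess
import sys

# Capture the output of a shell command as a Python variable.
# Calling the Python interpreter itself keeps the example portable.
result = subprocess.run(
    [sys.executable, "-c", "print(3*7)"],
    capture_output=True,
    text=True,
)
answer = result.stdout.strip()
print(answer)  # prints "21"
```

Passing text=True decodes stdout to a string, and capture_output=True is shorthand for redirecting both stdout and stderr to pipes.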

More on context switching and shells

Mike C. Fletcher: Playing with EGL+OpenGL Off-screen Multi-Card


So I've now spent the last day and a half playing with getting EGL offscreen rendering working on Linux. There are two major ways to do off-screen rendering with EGL and OpenGL. In the first, you use a pbuffer surface, a purpose-defined surface type for off-screen backing of a renderer. When I use the EGL enumeration API we always seem to get pbuffer-compatible visuals (and *not* window-compatible ones).

On Ubuntu 18.04 the enumeration API seems to be... problematic, lots of segfaults, particularly with the VirtualBox driver that shows up in the enumerations. On Ubuntu 19.10 the behaviour is much more reliable, with all 3 GPUs in my prime-based nVidia/Intel laptop (including the VirtualBox GPU) completing the OpenGL query for version, extensions, etc. The missing bit is being able to specify which GPU to use, as the EGL query API doesn't seem to have a way to get a "name" that a user would recognise to describe the card.

The second way to do EGL offscreen is to use a GBM device, which does *not* seem to work with nVidia binary drivers, but does seem to run on Intel and VirtualBox GPUs. The nice thing about the GBM devices is that you can tell which device you are selecting, so you can say "run with /dev/dri/card1" and know that we should wind up running on that particular GPU. Weirdly, the Intel card seems to support pbuffer when run via EGL device query, but only window when run with gbm. Because the Intel device doesn't seem to run with the query on Ubuntu 18.04, the gbm device is still useful (it may also have a performance benefit, but I haven't looked at that).

You can see the work in the develop branch of PyOpenGL on github. If you have an exotic platform which should support EGL+OpenGL, feel free to check out the branch and see what fails. One thing I should note is that I'm using "multi gpu" here more to mean "targeting a particular GPU" not "running two GPUs at once", as I expect there are function pointer caches in PyOpenGL which would mean that running two different OpenGL implementations in the same process would result in segfaults and/or rendering on the wrong context.

qutebrowser development blog: 2019 qutebrowser crowdfunding - reminder


Two months ago, I wrote:

Just like in the 2017/2018 crowdfundings, it'll be possible to get t-shirts and stickers again. I'll also add some new swag to the mix :)

Just a quick reminder: If you want physical rewards with the current perk levels, sign up to the GitHub Sponsors …


