
codingdirectional: Change python string to lower or upper case


In this article, we will create a function that takes a string and returns it in all uppercase if most of its characters are uppercase, or in all lowercase if most of its characters are lowercase or if the uppercase and lowercase counts are equal.

def solve(s):
    # Count uppercase characters and everything else in the string.
    upper = 0
    lower = 0
    for char in s:
        if char.isupper():
            upper += 1
        else:
            lower += 1
    # Uppercase wins only when it is strictly the majority;
    # ties (and a lowercase majority) fall back to lowercase.
    if upper > lower:
        return s.upper()
    return s.lower()
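
For example (the sample inputs below are made up for illustration):

print(solve("PYTHOn"))  # mostly uppercase characters -> PYTHON
print(solve("Python"))  # mostly lowercase characters -> python
print(solve("PYth"))    # equal counts -> pyth

Note that, as written, any character that is not uppercase (including spaces and digits) counts toward the lowercase total.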

This is a very simple solution; if you have a better idea, leave a comment below.


Talk Python to Me: #225 Can subinterpreters free us from Python's GIL?

Have you heard that Python is not good for writing concurrent asynchronous code? This is generally a misconception. But there is one class of parallel computing that Python is not good at: CPU-bound work running in the Python layer.

Erik Marsja: Repeated Measures ANOVA in R and Python using afex & pingouin


In this post we will learn how to carry out repeated measures Analysis of Variance (ANOVA) in R and Python. To be specific, we will use the R package afex and the Python package pingouin to carry out one-way and two-way ANOVA for within-subjects designs. The structure of the following data analysis tutorial is as follows: a brief introduction to (repeated measures) ANOVA, followed by within-subjects ANOVA in R using afex and in Python using pingouin. In the end, there will be a comparison of the results and of the pros and cons of using R or Python for data analysis (i.e., ANOVA).

What is ANOVA?

Before we go into how to carry out repeated measures ANOVA in R and Python, we are briefly going to learn what an ANOVA is. An ANOVA test is a parametric method for finding out whether the results from collected data are significant. That is, this type of test enables us to figure out whether we should reject the null hypothesis or accept the alternative hypothesis. In a between-subjects ANOVA we are testing groups to see if there is a statistical difference between them. In this post, however, we are going to learn to do repeated measures ANOVA, in which we compare means across one or more variables that are based on repeated observations. These repeated observations can either be time points or different conditions. In the repeated measures ANOVA examples below we use different conditions.


Data

In this repeated measures ANOVA example, we will use fake data (can be downloaded here). This fake data is a sample of 60 adults responding as fast as they can to visual stimuli. Thus, the dependent variable (DV) is the response time to the visual stimuli. While the subjects were categorizing visual stimuli, they were exposed to either background noise or quiet (independent variable, iv1).

In the first example, we are going to use these two conditions (iv1) when we carry out a one-way ANOVA for repeated measures. Furthermore, the visual stimuli could either be presented in the upper part, lower part, or in the middle part of the computer screen (independent variable, iv2). A CSV file with the data used in this ANOVA tutorial can be downloaded here.

The variables given in the data set are listed below (a sketch for simulating similarly structured data follows the list):

  • Sub_id = Subject ID #
  • iv1 = Noise condition; quiet or noise
  • iv2 = Location condition; upper, lower, middle
  • rt = response time (the dependent variable, DV)
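
The CSV itself is not reproduced here, but a minimal sketch of building a comparably structured long-format data set with pandas and NumPy could look like this (the condition labels and the response-time distribution are assumptions, purely for illustration):

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
rows = []
for sub_id in range(1, 61):                      # 60 subjects
    for iv1 in ('noise', 'quiet'):               # noise condition
        for iv2 in ('up', 'middle', 'down'):     # location condition
            rows.append({'Sub_id': sub_id, 'iv1': iv1, 'iv2': iv2,
                         'rt': rng.normal(0.5, 0.1)})  # made-up response times

df = pd.DataFrame(rows)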

Repeated Measures ANOVA in R

In this section we are going to learn how to do a repeated measures ANOVA in R using afex. More specifically, we are going to learn how to carry out a one-way and a two-way ANOVA using the aov_ez function. Note that when working with the aov_ez function we need to have our data in long format.

Installing afex

First, we are going to install the needed packages: afex and emmeans. In the code chunk below, the packages will only be installed if they are not already installed.

list.of.packages <- c("afex", "emmeans")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)

One-Way Repeated Measures ANOVA in R

In the first example, we are going to carry out a one-way repeated measures ANOVA in R using aov_ez. Here we want to know whether there is any difference in response time with background noise compared to without background noise. To test this, we need to conduct a within-subjects ANOVA.

In the first code chunk, below, we load the package and the data, and print the first rows of the data frame using head.

require(afex)

df <- read.csv(file='./Python_ANOVA/rmAOV2way.csv',
     header=TRUE, sep=',')

head(df)

Example ANOVA for Within-Subjects Design:

aov <- aov_ez('Sub_id', 'rt',
              fun_aggregate = mean, df, within = 'iv1')
print(aov)

Two-Way Repeated Measures ANOVA in R

In the second example, we are going to conduct a two-way repeated measures ANOVA in R. Here we want to know whether there is any difference in response time during background noise compared to without background noise, and whether there is a difference depending on where the visual stimuli are presented (up, down, middle). Finally, we are interested if there is an interaction between the noise and location conditions.

aov <- aov_ez('Sub_id', 'rt', fun_aggregate = mean,
              df, within = c('iv1', 'iv2'))
print(aov)
  

Plotting an Interaction

The R package afex also has a function to plot interactions. Now, before continuing with the Python ANOVA, we are going to use this function.

afex_plot(aov, x = "iv1", trace = "iv2",
         error = "within")
  

As can be seen in the plot, and confirmed by the ANOVA table above, there is no interaction. If we had an interaction, we could follow this up with pairwise comparisons using the package emmeans.

Here’s a Jupyter Notebook containing the above code examples.

Repeated Measures ANOVA in Python

Now that we know how to conduct a within-subjects ANOVA in R, we are going to carry out the same ANOVA in Python. In a previous post, we learned how to use the class AnovaRM from the Python package Statsmodels. In this post, however, we are going to use the package pingouin and its function rm_anova. Note that this function can handle data in both wide and long format.
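
If your own data happened to be in wide format (one column per condition), a minimal pandas sketch for reshaping it to the long format used below might look like this (the file and column names are assumptions):

import pandas as pd

wide = pd.read_csv('./Python_ANOVA/rm_wide.csv')   # hypothetical wide-format file
long = wide.melt(id_vars='Sub_id',                 # one row per subject and condition
                 var_name='iv1',
                 value_name='rt')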

One-Way Repeated Measures ANOVA in Python

In the first example, we are going to conduct a one-way ANOVA for repeated measures using Python. We start by importing pandas as pd and pingouin as pg:

import pandas as pd
import pingouin as pg


df = pd.read_csv('./Python_ANOVA/rmAOV2way.csv')
df.head()
   


Now we can carry out our repeated measures ANOVA using Python:

aov = pg.rm_anova(dv='rt', within='iv1',
                   subject='Sub_id', data=df, detailed=True)
print(aov.round(2))

Two-Way Repeated Measures ANOVA in Python

In the second example, we are going to carry out a two-way ANOVA for repeated measures using Python.

 aov = pg.rm_anova(dv='rt',
                   within=['iv1', 'iv2'],
                   subject='Sub_id', data=df)
print(aov.round(2))
        

Interaction Plot in Python using Seaborn

For completeness, even though we didn’t have a significant interaction, we are going to create an interaction plot using Seaborn:

import seaborn as sns

ax = sns.pointplot(x="iv1", y="rt", hue="iv2",
                    data=df)
        


Pingouin also comes with a function to carry out pairwise comparisons. If we had a significant interaction, we could use it. See this post for an example of how to use this function.
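
For completeness, here is a minimal sketch of such a follow-up, assuming pingouin's pairwise_ttests function (its dv, within, subject, and data parameters mirror rm_anova; padjust requests a multiple-comparison correction):

post_hocs = pg.pairwise_ttests(dv='rt',
                               within=['iv1', 'iv2'],
                               subject='Sub_id',
                               data=df, padjust='bonf')
print(post_hocs.round(3))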

Here’s a Jupyter Notebook containing the Python ANOVA examples above.

Conclusion: R vs Python

In this post, we have learned how to carry out one-way and two-way ANOVA for repeated measures using R and Python. We have used the R package afex and the Python package pingouin. Both afex and pingouin are quite similar; for instance, they both offer the Greenhouse-Geisser correction. In afex, however, you can choose to get either partial eta-squared or general eta-squared effect sizes. Furthermore, as can be seen in the ANOVA tables, the results are basically the same.

In conclusion, the packages afex and pingouin offer an easy way to carry out ANOVA for within-subjects designs in R and Python, respectively.


The post Repeated Measures ANOVA in R and Python using afex & pingouin appeared first on Erik Marsja.

Podcast.__init__: Learning To Program In Python With CodeGrades


Summary

With the increasing role of software in our world there has been an accompanying focus on teaching people to program. There are numerous approaches that have been attempted to achieve this goal with varying levels of success. Nicholas Tollervey has begun a new effort that blends the approach adopted by musicians and martial artists, using a series of grades to provide recognition for the achievements of students. In this episode he explains how he has structured the study groups, syllabus, and evaluations to help learners build projects based on their interests and guide their own education while incorporating useful skills that are necessary for a career in software. If you are interested in learning to program, teaching others, or acting as a mentor then give this a listen and then get in touch with Nicholas to help make this endeavor a success.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence. Coming up this fall is the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions, I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
  • Your host as usual is Tobias Macey and today Nicholas Tollervey is back to talk about his work on CodeGrades, a new effort that he is building to blend his backgrounds in music, education, and software to help teach kids of all ages how to program.

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what CodeGrades is and what motivated you to start this project?
    • How does it differ from other approaches to teaching software development that you have encountered?
    • Is there a particular age or level of background knowledge that you are targeting with the curriculum that you are developing?
  • What are the criteria that you are measuring against and how does that criteria change as you progress in grade levels?
  • For someone who completes the full set of levels, what level of capability would you expect them to have as a developer?
  • Given your affiliation with the Python community it is understandable that you would target that language initially. What would be involved in adapting the curriculum, mentorship, and assessments to other languages?
    • In what other ways can this idea and platform be adapted to accommodate other engineering skills? (e.g. system administration, statistics, graphic design, etc.)
  • What interesting/exciting/unexpected outcomes and lessons have you found while iterating on this idea?
  • For engineers who would like to be involved in the CodeGrades platform, how can they contribute?
  • What challenges do you anticipate as you continue to develop the curriculum and mentor networks?
  • How do you envision the future of CodeGrades taking shape in the medium to long term?


The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA

Real Python: Your Guide to the Python Print Function


If you’re like most Python users, including me, then you probably started your Python journey by learning about print(). It helped you write your very own hello world one-liner. You can use it to display formatted messages onto the screen and perhaps find some bugs. But if you think that’s all there is to know about Python’s print() function, then you’re missing out on a lot!

Keep reading to take full advantage of this seemingly boring and unappreciated little function. This tutorial will get you up to speed with using Python print() effectively. However, prepare for a deep dive as you go through the sections. You may be surprised how much print() has to offer!

By the end of this tutorial, you’ll know how to:

  • Avoid common mistakes with Python’s print()
  • Deal with newlines, character encodings, and buffering
  • Write text to files
  • Mock print() in unit tests
  • Build advanced user interfaces in the terminal

If you’re a complete beginner, then you’ll benefit most from reading the first part of this tutorial, which illustrates the essentials of printing in Python. Otherwise, feel free to skip that part and jump around as you see fit.

Note: print() was a major addition to Python 3, in which it replaced the old print statement available in Python 2.

There were a number of good reasons for that, as you’ll see shortly. Although this tutorial focuses on Python 3, it does show the old way of printing in Python for reference.


Printing in a Nutshell

Let’s jump in by looking at a few real-life examples of printing in Python. By the end of this section, you’ll know every possible way of calling print(). Or, in programmer lingo, you’d say you’ll be familiar with the function signature.

Calling Print

The simplest example of using Python print() requires just a few keystrokes:

>>> print()

You don’t pass any arguments, but you still need to put empty parentheses at the end, which tell Python to actually execute the function rather than just refer to it by name.

This will produce an invisible newline character, which in turn will cause a blank line to appear on your screen. You can call print() multiple times like this to add vertical space. It’s just as if you were hitting Enter on your keyboard in a word processor.

A newline character is a special control character used to indicate the end of a line (EOL). It usually doesn’t have a visible representation on the screen, but some text editors can display such non-printable characters with little graphics.

The word “character” is somewhat of a misnomer in this case, because a newline is often more than one character long. For example, the Windows operating system, as well as the HTTP protocol, represent newlines with a pair of characters. Sometimes you need to take those differences into account to design truly portable programs.

To find out what constitutes a newline in your operating system, use Python’s built-in os module.

This will immediately tell you that Windows and DOS represent the newline as a sequence of \r followed by \n:

>>> import os
>>> os.linesep
'\r\n'

On Unix, Linux, and recent versions of macOS, it’s a single \n character:

>>> import os
>>> os.linesep
'\n'

The classic Mac OS, however, sticks to its own “think different” philosophy by choosing yet another representation:

>>> import os
>>> os.linesep
'\r'

Notice how these characters appear in string literals. They use special syntax with a preceding backslash (\) to denote the start of an escape character sequence. Such sequences allow for representing control characters, which would be otherwise invisible on screen.

Most programming languages come with a predefined set of escape sequences for special characters such as these:

  • \\: backslash
  • \b: backspace
  • \t: tab
  • \r: carriage return (CR)
  • \n: newline, also known as line feed (LF)

The last two are reminiscent of mechanical typewriters, which required two separate commands to insert a newline. The first command would move the carriage back to the beginning of the current line, while the second one would advance the roll to the next line.

By comparing the corresponding ASCII character codes, you’ll see that putting a backslash in front of a character changes its meaning completely. However, not all characters allow for this–only the special ones.

To compare ASCII character codes, you may want to use the built-in ord() function:

>>> ord('r')
114
>>> ord('\r')
13

Keep in mind that, in order to form a correct escape sequence, there must be no space between the backslash character and a letter!

As you just saw, calling print() without arguments results in a blank line, which is a line comprised solely of the newline character. Don’t confuse this with an empty line, which doesn’t contain any characters at all, not even the newline!

You can use Python’s string literals to visualize these two:

'\n'  # Blank line
''    # Empty line

The first one is one character long, whereas the second one has no content.

Note: To remove the newline character from a string in Python, use its .rstrip() method, like this:

>>> 'A line of text.\n'.rstrip()
'A line of text.'

This strips any trailing whitespace from the right edge of the string of characters.

In a more common scenario, you’d want to communicate some message to the end user. There are a few ways to achieve this.

First, you may pass a string literal directly to print():

>>> print('Please wait while the program is loading...')

This will print the message verbatim onto the screen.

String literals in Python can be enclosed either in single quotes (') or double quotes ("). According to the official PEP 8 style guide, you should just pick one and keep using it consistently. There’s no difference, unless you need to nest one in another.

For example, you can’t use double quotes for the literal and also include double quotes inside of it, because that’s ambiguous for the Python interpreter:

"My favorite book is "PythonTricks""# Wrong!

What you want to do is enclose the text, which contains double quotes, within single quotes:

'My favorite book is "Python Tricks"'

The same trick would work the other way around:

"My favorite book is 'Python Tricks'"

Alternatively, you could use escape character sequences mentioned earlier, to make Python treat those internal double quotes literally as part of the string literal:

"My favorite book is \"Python Tricks\""

Escaping is fine and dandy, but it can sometimes get in the way. Specifically, when you need your string to contain relatively many backslash characters in literal form.

One classic example is a file path on Windows:

'C:\Users\jdoe'    # Wrong!
'C:\\Users\\jdoe'

Notice how each backslash character needs to be escaped with yet another backslash.

This is even more prominent with regular expressions, which quickly get convoluted due to the heavy use of special characters:

'^\\w:\\\\(?:(?:(?:[^\\\\]+)?|(?:[^\\\\]+)\\\\[^\\\\]+)*)$'

Fortunately, you can turn off character escaping entirely with the help of raw-string literals. Simply prepend an r or R before the opening quote, and now you end up with this:

r'C:\Users\jdoe'
r'^\w:\\(?:(?:(?:[^\\]+)?|(?:[^\\]+)\\[^\\]+)*)$'

That’s much better, isn’t it?

There are a few more prefixes that give special meaning to string literals in Python, but you won’t get into them here.

Lastly, you can define multi-line string literals by enclosing them between ''' or """, which are often used as docstrings.

Here’s an example:

"""This is an exampleof a multi-line stringin Python."""

To prevent an initial newline, simply put the text right after the opening """:

"""This is an exampleof a multi-line stringin Python."""

You can also use a backslash to get rid of the newline:

"""\This is an exampleof a multi-line stringin Python."""

To remove indentation from a multi-line string, you might take advantage of the built-in textwrap module:

>>> import textwrap
>>> paragraph = '''
...     This is an example
...     of a multi-line string
...     in Python.
...     '''
...
>>> print(paragraph)

    This is an example
    of a multi-line string
    in Python.

>>> print(textwrap.dedent(paragraph).strip())
This is an example
of a multi-line string
in Python.

This will take care of unindenting paragraphs for you. There are also a few other useful functions in textwrap for text alignment you’d find in a word processor.

Secondly, you could extract that message into its own variable with a meaningful name to enhance readability and promote code reuse:

>>> message = 'Please wait while the program is loading...'
>>> print(message)

Lastly, you could pass an expression, like string concatenation, to be evaluated before printing the result:

>>> import os
>>> print('Hello, ' + os.getlogin() + '! How are you?')
Hello, jdoe! How are you?

In fact, there are a dozen ways to format messages in Python. I highly encourage you to take a look at f-strings, introduced in Python 3.6, because they offer the most concise syntax of them all:

>>> import os
>>> print(f'Hello, {os.getlogin()}! How are you?')

Moreover, f-strings will prevent you from making a common mistake, which is forgetting to type cast concatenated operands. Python is a strongly typed language, which means it won’t allow you to do this:

>>> 'My age is ' + 42
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    'My age is ' + 42
TypeError: can only concatenate str (not "int") to str

That’s wrong because adding numbers to strings doesn’t make sense. You need to explicitly convert the number to string first, in order to join them together:

>>> 'My age is ' + str(42)
'My age is 42'

Unless you handle such errors yourself, the Python interpreter will let you know about a problem by showing a traceback.

Note: str() is a global built-in function that converts an object into its string representation.

You can call it directly on any object, for example, a number:

>>> str(3.14)
'3.14'

Built-in data types have a predefined string representation out of the box, but later in this article, you’ll find out how to provide one for your custom classes.

As with any function, it doesn’t matter whether you pass a literal, a variable, or an expression. Unlike many other functions, however, print() will accept anything regardless of its type.

So far, you only looked at the string, but how about other data types? Let’s try literals of different built-in types and see what comes out:

>>> print(42)                            # <class 'int'>
42
>>> print(3.14)                          # <class 'float'>
3.14
>>> print(1 + 2j)                        # <class 'complex'>
(1+2j)
>>> print(True)                          # <class 'bool'>
True
>>> print([1, 2, 3])                     # <class 'list'>
[1, 2, 3]
>>> print((1, 2, 3))                     # <class 'tuple'>
(1, 2, 3)
>>> print({'red', 'green', 'blue'})      # <class 'set'>
{'red', 'green', 'blue'}
>>> print({'name': 'Alice', 'age': 42})  # <class 'dict'>
{'name': 'Alice', 'age': 42}
>>> print('hello')                       # <class 'str'>
hello

Watch out for the None constant, though. Despite being used to indicate an absence of a value, it will show up as 'None' rather than an empty string:

>>> print(None)
None

How does print() know how to work with all these different types? Well, the short answer is that it doesn’t. It implicitly calls str() behind the scenes to type cast any object into a string. Afterward, it treats strings in a uniform way.

Later in this tutorial, you’ll learn how to use this mechanism for printing custom data types such as your classes.

Okay, you’re now able to call print() with a single argument or without any arguments. You know how to print fixed or formatted messages onto the screen. The next subsection will expand on message formatting a little bit.

To achieve the same result in the previous language generation, you’d normally want to drop the parentheses enclosing the text:

# Python 2
print
print 'Please wait...'
print 'Hello, %s! How are you?' % os.getlogin()
print 'Hello, %s. Your age is %d.' % (name, age)

That’s because print wasn’t a function back then, as you’ll see in the next section. Note, however, that in some cases parentheses in Python are redundant. It wouldn’t harm to include them as they’d just get ignored. Does that mean you should be using the print statement as if it were a function? Absolutely not!

For example, parentheses enclosing a single expression or a literal are optional. Both instructions produce the same result in Python 2:

>>> # Python 2
>>> print 'Please wait...'
Please wait...
>>> print('Please wait...')
Please wait...

Round brackets are actually part of the expression rather than the print statement. If your expression happens to contain only one item, then it’s as if you didn’t include the brackets at all.

On the other hand, putting parentheses around multiple items forms a tuple:

>>> # Python 2
>>> print 'My name is', 'John'
My name is John
>>> print('My name is', 'John')
('My name is', 'John')

This is a known source of confusion. In fact, you’d also get a tuple by appending a trailing comma to the only item surrounded by parentheses:

>>> # Python 2
>>> print('Please wait...')
Please wait...
>>> print('Please wait...',)  # Notice the comma
('Please wait...',)

The bottom line is that you shouldn’t call print with brackets in Python 2. Although, to be completely accurate, you can work around this with the help of a __future__ import, which you’ll read more about in the relevant section.

Separating Multiple Arguments

You saw print() called without any arguments to produce a blank line and then called with a single argument to display either a fixed or a formatted message.

However, it turns out that this function can accept any number of positional arguments, including zero, one, or more arguments. That’s very handy in a common case of message formatting, where you’d want to join a few elements together.

Arguments can be passed to a function in one of several ways. One way is by explicitly naming the arguments when you’re calling the function, like this:

>>> def div(a, b):
...     return a / b
...
>>> div(a=3, b=4)
0.75

Since arguments can be uniquely identified by name, their order doesn’t matter. Swapping them out will still give the same result:

>>> div(b=4, a=3)
0.75

Conversely, arguments passed without names are identified by their position. That’s why positional arguments need to follow strictly the order imposed by the function signature:

>>> div(3, 4)
0.75
>>> div(4, 3)
1.3333333333333333

print() allows an arbitrary number of positional arguments thanks to the *args parameter.

Let’s have a look at this example:

>>> import os
>>> print('My name is', os.getlogin(), 'and I am', 42)
My name is jdoe and I am 42

print() concatenated all four arguments passed to it, and it inserted a single space between them so that you didn’t end up with a squashed message like 'My name isjdoeand I am42'.

Notice that it also took care of proper type casting by implicitly calling str() on each argument before joining them together. If you recall from the previous subsection, a naïve concatenation may easily result in an error due to incompatible types:

>>> print('My age is: ' + 42)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    print('My age is: ' + 42)
TypeError: can only concatenate str (not "int") to str

Apart from accepting a variable number of positional arguments, print() defines four named or keyword arguments, which are optional since they all have default values. You can view their brief documentation by calling help(print) from the interactive interpreter.

Let’s focus on sep just for now. It stands for separator and is assigned a single space (' ') by default. It determines the value to join elements with.

It has to be either a string or None, but the latter has the same effect as the default space:

>>> print('hello', 'world', sep=None)
hello world
>>> print('hello', 'world', sep=' ')
hello world
>>> print('hello', 'world')
hello world

If you wanted to suppress the separator completely, you’d have to pass an empty string ('') instead:

>>> print('hello', 'world', sep='')
helloworld

You may want print() to join its arguments as separate lines. In that case, simply pass the escaped newline character described earlier:

>>> print('hello', 'world', sep='\n')
hello
world

A more useful example of the sep parameter would be printing something like file paths:

>>> print('home', 'user', 'documents', sep='/')
home/user/documents

Remember that the separator comes between the elements, not around them, so you need to account for that in one way or another:

>>> print('/home', 'user', 'documents', sep='/')
/home/user/documents
>>> print('', 'home', 'user', 'documents', sep='/')
/home/user/documents

Specifically, you can insert a slash character (/) into the first positional argument, or use an empty string as the first argument to enforce the leading slash.

Note: Be careful about joining elements of a list or tuple.

Doing it manually will result in a well-known TypeError if at least one of the elements isn’t a string:

>>> print(' '.join(['jdoe is', 42, 'years old']))
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    print(' '.join(['jdoe is', 42, 'years old']))
TypeError: sequence item 1: expected str instance, int found

It’s safer to just unpack the sequence with the star operator (*) and let print() handle type casting:

>>> print(*['jdoe is', 42, 'years old'])
jdoe is 42 years old

Unpacking is effectively the same as calling print() with individual elements of the list.

One more interesting example could be exporting data to a comma-separated values (CSV) format:

>>> print(1, 'Python Tricks', 'Dan Bader', sep=',')
1,Python Tricks,Dan Bader

This wouldn’t handle edge cases such as escaping commas correctly, but for simple use cases, it should do. The line above would show up in your terminal window. In order to save it to a file, you’d have to redirect the output. Later in this section, you’ll see how to use print() to write text to files straight from Python.
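
For instance, here is a minimal sketch (with a made-up file name and rows) that saves such comma-separated lines to a file by anticipating the file argument covered later in this section:

books = [(1, 'Python Tricks', 'Dan Bader'),
         (2, 'Real Python', 'Various Authors')]

with open('books.csv', mode='w') as file_object:
    for row in books:
        print(*row, sep=',', file=file_object)  # one CSV line per tuple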

Finally, the sep parameter isn’t constrained to a single character only. You can join elements with strings of any length:

>>> print('node', 'child', 'child', sep=' -> ')
node -> child -> child

In the upcoming subsections, you’ll explore the remaining keyword arguments of the print() function.

To print multiple elements in Python 2, you must drop the parentheses around them, just like before:

>>> # Python 2
>>> import os
>>> print 'My name is', os.getlogin(), 'and I am', 42
My name is jdoe and I am 42

If you kept them, on the other hand, you’d be passing a single tuple element to the print statement:

>>> # Python 2
>>> import os
>>> print('My name is', os.getlogin(), 'and I am', 42)
('My name is', 'jdoe', 'and I am', 42)

Moreover, there’s no way of altering the default separator of joined elements in Python 2, so one workaround is to use string interpolation like so:

>>> # Python 2
>>> import os
>>> print 'My name is %s and I am %d' % (os.getlogin(), 42)
My name is jdoe and I am 42

That was the default way of formatting strings until the .format() method got backported from Python 3.

Preventing Line Breaks

Sometimes you don’t want to end your message with a trailing newline so that subsequent calls to print() will continue on the same line. Classic examples include updating the progress of a long-running operation or prompting the user for input. In the latter case, you want the user to type in the answer on the same line:

Are you sure you want to do this? [y/n] y

Many programming languages expose functions similar to print() through their standard libraries, but they let you decide whether to add a newline or not. For example, in Java and C#, you have two distinct functions, while other languages require you to explicitly append \n at the end of a string literal.

Here are a few examples of syntax in such languages:

Language    Example
Perl        print "hello world\n"
C           printf("hello world\n");
C++         std::cout << "hello world" << std::endl;

In contrast, Python’s print() function always adds \n without asking, because that’s what you want in most cases. To disable it, you can take advantage of yet another keyword argument, end, which dictates what to end the line with.

In terms of semantics, the end parameter is almost identical to the sep one that you saw earlier:

  • It must be a string or None.
  • It can be arbitrarily long.
  • It has a default value of '\n'.
  • If equal to None, it’ll have the same effect as the default value.
  • If equal to an empty string (''), it’ll suppress the newline.

Now you understand what’s happening under the hood when you’re calling print() without arguments. Since you don’t provide any positional arguments to the function, there’s nothing to be joined, and so the default separator isn’t used at all. However, the default value of end still applies, and a blank line shows up.

Note: You may be wondering why the end parameter has a fixed default value rather than whatever makes sense on your operating system.

Well, you don’t have to worry about newline representation across different operating systems when printing, because print() will handle the conversion automatically. Just remember to always use the \n escape sequence in string literals.

This is currently the most portable way of printing a newline character in Python:

>>> print('line1\nline2\nline3')
line1
line2
line3

If you were to try to forcefully print a Windows-specific newline character on a Linux machine, for example, you’d end up with broken output:

>>> print('line1\r\nline2\r\nline3')
line3

On the flip side, when you open a file for reading with open(), you don’t need to care about newline representation either. The function will translate any system-specific newline it encounters into a universal '\n'. At the same time, you have control over how the newlines should be treated both on input and output if you really need that.
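
For example, here is a minimal sketch (the file name is made up) of keeping the original line endings on input by passing newline='' to open():

# newline='' disables universal newline translation, so any '\r\n'
# sequences in the file are returned exactly as they were written.
with open('file.txt', mode='r', newline='') as file_object:
    content = file_object.read()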

To disable the newline, you must specify an empty string through the end keyword argument:

print('Checking file integrity...', end='')
# (...)
print('ok')

Even though these are two separate print() calls, which can execute a long time apart, you’ll eventually see only one line. First, it’ll look like this:

Checking file integrity...

However, after the second call to print(), the same line will appear on the screen as:

Checking file integrity...ok

As with sep, you can use end to join individual pieces into a big blob of text with a custom separator. Instead of joining multiple arguments, however, it’ll append text from each function call to the same line:

print('The first sentence', end='. ')
print('The second sentence', end='. ')
print('The last sentence.')

These three instructions will output a single line of text:

The first sentence. The second sentence. The last sentence.

You can mix the two keyword arguments:

print('Mercury', 'Venus', 'Earth', sep=', ', end=', ')
print('Mars', 'Jupiter', 'Saturn', sep=', ', end=', ')
print('Uranus', 'Neptune', 'Pluto', sep=', ')

Not only do you get a single line of text, but all items are separated with a comma:

Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune, Pluto

There’s nothing to stop you from using the newline character with some extra padding around it:

print('Printing in a Nutshell', end='\n * ')
print('Calling Print', end='\n * ')
print('Separating Multiple Arguments', end='\n * ')
print('Preventing Line Breaks')

It would print out the following piece of text:

Printing in a Nutshell
 * Calling Print
 * Separating Multiple Arguments
 * Preventing Line Breaks

As you can see, the end keyword argument will accept arbitrary strings.

Note: Looping over lines in a text file preserves their own newline characters, which combined with the print() function’s default behavior will result in a redundant newline character:

>>> with open('file.txt') as file_object:
...     for line in file_object:
...         print(line)
...
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod

tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,

quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo

There are two newlines after each line of text. You want to strip one of them, as shown earlier in this article, before printing the line:

print(line.rstrip())

Alternatively, you can keep the newline in the content but suppress the one appended by print() automatically. You’d use the end keyword argument to do that:

>>> with open('file.txt') as file_object:
...     for line in file_object:
...         print(line, end='')
...
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo

By ending a line with an empty string, you effectively disable one of the newlines.

You’re getting more acquainted with printing in Python, but there’s still a lot of useful information ahead. In the upcoming subsection, you’ll learn how to intercept and redirect the print() function’s output.

Preventing a line break in Python 2 requires that you append a trailing comma to the expression:

print 'hello world',

However, that’s not ideal because it also adds an unwanted space, which would translate to end=' ' instead of end='' in Python 3. You can test this with the following code snippet:

print 'BEFORE'
print 'hello',
print 'AFTER'

Notice there’s a space between the words hello and AFTER:

BEFORE
hello AFTER

In order to get the expected result, you’d need to use one of the tricks explained later, which is either importing the print() function from __future__ or falling back to the sys module:

import sys

print 'BEFORE'
sys.stdout.write('hello')
print 'AFTER'

This will print the correct output without extra space:

BEFORE
helloAFTER

While using the sys module gives you control over what gets printed to the standard output, the code becomes a little bit more cluttered.

Printing to a File

Believe it or not, print() doesn’t know how to turn messages into text on your screen, and frankly it doesn’t need to. That’s a job for lower-level layers of code, which understand bytes and know how to push them around.

print() is an abstraction over these layers, providing a convenient interface that merely delegates the actual printing to a stream or file-like object. A stream can be any file on your disk, a network socket, or perhaps an in-memory buffer.

In addition to this, there are three standard streams provided by the operating system:

  1. stdin: standard input
  2. stdout: standard output
  3. stderr: standard error

Standard output is what you see in the terminal when you run various command-line programs including your own Python scripts:

$ cat hello.py
print('This will appear on stdout')
$ python hello.py
This will appear on stdout

Unless otherwise instructed, print() will default to writing to standard output. However, you can tell your operating system to temporarily swap out stdout for a file stream, so that any output ends up in that file rather than the screen:

$ python hello.py > file.txt
$ cat file.txt
This will appear on stdout

That’s called stream redirection.

The standard error is similar to stdout in that it also shows up on the screen. Nonetheless, it’s a separate stream, whose purpose is to log error messages for diagnostics. By redirecting one or both of them, you can keep things clean.
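
For example, here is a minimal sketch of keeping the two apart by sending a diagnostic message to stderr through the file argument of print(), which is covered in more detail below (the messages are made up):

import sys

print('processed 42 records')                               # goes to stdout
print('warning: 3 records were skipped', file=sys.stderr)   # goes to stderr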

Note: To redirect stderr, you need to know about file descriptors, also known as file handles.

They’re arbitrary, albeit constant, numbers associated with standard streams. Below, you’ll find a summary of the file descriptors for a family of POSIX-compliant operating systems:

Stream    File Descriptor
stdin     0
stdout    1
stderr    2

Knowing those descriptors allows you to redirect one or more streams at a time:

Command                           Description
./program > out.txt               Redirect stdout
./program 2> err.txt              Redirect stderr
./program > out.txt 2> err.txt    Redirect stdout and stderr to separate files
./program &> out_err.txt          Redirect stdout and stderr to the same file

Note that > is the same as 1>.

Some programs use different coloring to distinguish between messages printed to stdout and stderr:

(Screenshot: the output of a program executed in the Run Tool Window in PyCharm.)

While both stdout and stderr are write-only, stdin is read-only. You can think of standard input as your keyboard, but just like with the other two, you can swap out stdin for a file to read data from.

In Python, you can access all standard streams through the built-in sys module:

>>> import sys
>>> sys.stdin
<_io.TextIOWrapper name='<stdin>' mode='r' encoding='UTF-8'>
>>> sys.stdin.fileno()
0
>>> sys.stdout
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
>>> sys.stdout.fileno()
1
>>> sys.stderr
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>
>>> sys.stderr.fileno()
2

As you can see, these predefined values resemble file-like objects with mode and encoding attributes as well as .read() and .write() methods among many others.

By default, print() is bound to sys.stdout through its file argument, but you can change that. Use that keyword argument to indicate a file that was opened in write or append mode, so that messages go straight to it:

with open('file.txt', mode='w') as file_object:
    print('hello world', file=file_object)

This will make your code immune to stream redirection at the operating system level, which might or might not be desired.

For more information on working with files in Python, you can check out Reading and Writing Files in Python (Guide).

Note: Don’t try using print() for writing binary data as it’s only well suited for text.

Just call the binary file’s .write() directly:

with open('file.dat', 'wb') as file_object:
    file_object.write(bytes(4))
    file_object.write(b'\xff')

If you wanted to write raw bytes on the standard output, then this will fail too because sys.stdout is a character stream:

>>> import sys
>>> sys.stdout.write(bytes(4))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: write() argument must be str, not bytes

You must dig deeper to get a handle of the underlying byte stream instead:

>>> import sys
>>> num_bytes_written = sys.stdout.buffer.write(b'\x41\x0a')
A

This prints an uppercase letter A and a newline character, which correspond to decimal values of 65 and 10 in ASCII. However, they’re encoded using hexadecimal notation in the bytes literal.

Note that print() has no control over character encoding. It’s the stream’s responsibility to encode received Unicode strings into bytes correctly. In most cases, you won’t set the encoding yourself, because the default UTF-8 is what you want. If you really need to, perhaps for legacy systems, you can use the encoding argument of open():

with open('file.txt', mode='w', encoding='iso-8859-1') as file_object:
    print('über naïve café', file=file_object)

Instead of a real file existing somewhere in your file system, you can provide a fake one, which would reside in your computer’s memory. You’ll use this technique later for mocking print() in unit tests:

>>> import io
>>> fake_file = io.StringIO()
>>> print('hello world', file=fake_file)
>>> fake_file.getvalue()
'hello world\n'
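
To give a flavor of that, here is a minimal sketch of a unit test that captures print() output by patching sys.stdout with an in-memory stream; the greet() function under test is hypothetical:

import io
import unittest
from unittest import mock

def greet(name):
    # Hypothetical function under test.
    print(f'Hello, {name}!')

class GreetTest(unittest.TestCase):
    def test_greet_prints_greeting(self):
        with mock.patch('sys.stdout', new_callable=io.StringIO) as fake_out:
            greet('jdoe')
        self.assertEqual(fake_out.getvalue(), 'Hello, jdoe!\n')

if __name__ == '__main__':
    unittest.main()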

If you got to this point, then you’re left with only one keyword argument in print(), which you’ll see in the next subsection. It’s probably the least used of them all. Nevertheless, there are times when it’s absolutely necessary.

There’s a special syntax in Python 2 for replacing the default sys.stdout with a custom file in the print statement:

with open('file.txt', mode='w') as file_object:
    print >> file_object, 'hello world'

Because strings and bytes are represented with the same str type in Python 2, the print statement can handle binary data just fine:

with open('file.dat', mode='wb') as file_object:
    print >> file_object, '\x41\x0a'

Although, there’s a problem with character encoding. The open() function in Python 2 lacks the encoding parameter, which would often result in the dreadful UnicodeEncodeError:

>>> with open('file.txt', mode='w') as file_object:
...     unicode_text = u'\xfcber na\xefve caf\xe9'
...     print >> file_object, unicode_text
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc'...

Notice how non-Latin characters must be escaped in both Unicode and string literals to avoid a syntax error. Take a look at this example:

unicode_literal = u'\xfcber na\xefve caf\xe9'
string_literal = '\xc3\xbcber na\xc3\xafve caf\xc3\xa9'

Alternatively, you could specify source code encoding according to PEP 263 at the top of the file, but that wasn’t the best practice due to portability issues:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-

unescaped_unicode_literal = u'über naïve café'
unescaped_string_literal = 'über naïve café'

Your best bet is to encode the Unicode string just before printing it. You can do this manually:

with open('file.txt', mode='w') as file_object:
    unicode_text = u'\xfcber na\xefve caf\xe9'
    encoded_text = unicode_text.encode('utf-8')
    print >> file_object, encoded_text

However, a more convenient option is to use the built-in codecs module:

import codecs

with codecs.open('file.txt', 'w', encoding='utf-8') as file_object:
    unicode_text = u'\xfcber na\xefve caf\xe9'
    print >> file_object, unicode_text

It’ll take care of making appropriate conversions when you need to read or write files.

Buffering Print Calls

In the previous subsection, you learned that print() delegates printing to a file-like object such as sys.stdout. Some streams, however, buffer certain I/O operations to enhance performance, which can get in the way. Let’s take a look at an example.

Imagine you were writing a countdown timer, which should append the remaining time to the same line every second:

3...2...1...Go!

Your first attempt may look something like this:

import time

num_seconds = 3
for countdown in reversed(range(num_seconds + 1)):
    if countdown > 0:
        print(countdown, end='...')
        time.sleep(1)
    else:
        print('Go!')

As long as the countdown variable is greater than zero, the code keeps appending text without a trailing newline and then goes to sleep for one second. Finally, when the countdown is finished, it prints Go! and terminates the line.

Unexpectedly, instead of counting down every second, the program idles wastefully for three seconds, and then suddenly prints the entire line at once:

(Screen capture: terminal with buffered output.)

That’s because the operating system buffers subsequent writes to the standard output in this case. You need to know that there are three kinds of streams with respect to buffering:

  1. Unbuffered
  2. Line-buffered
  3. Block-buffered

Unbuffered is self-explanatory, that is, no buffering is taking place, and all writes have immediate effect. A line-buffered stream waits before firing any I/O calls until a line break appears somewhere in the buffer, whereas a block-buffered one simply allows the buffer to fill up to a certain size regardless of its content. Standard output is both line-buffered and block-buffered, depending on which event comes first.

Buffering helps to reduce the number of expensive I/O calls. Think about sending messages over a high-latency network, for example. When you connect to a remote server to execute commands over the SSH protocol, each of your keystrokes may actually produce an individual data packet, which is orders of magnitude bigger than its payload. What an overhead! It would make sense to wait until at least a few characters are typed and then send them together. That’s where buffering steps in.

On the other hand, buffering can sometimes have undesired effects as you just saw with the countdown example. To fix it, you can simply tell print() to forcefully flush the stream without waiting for a newline character in the buffer using its flush flag:

print(countdown, end='...', flush=True)

That’s all. Your countdown should work as expected now, but don’t take my word for it. Go ahead and test it to see the difference.

Congratulations! At this point, you’ve seen examples of calling print() that cover all of its parameters. You know their purpose and when to use them. Understanding the signature is only the beginning, however. In the upcoming sections, you’ll see why.

There isn’t an easy way to flush the stream in Python 2, because the print statement doesn’t allow for it by itself. You need to get a handle of its lower-level layer, which is the standard output, and call it directly:

import time
import sys

num_seconds = 3
for countdown in reversed(range(num_seconds + 1)):
    if countdown > 0:
        sys.stdout.write('%s...' % countdown)
        sys.stdout.flush()
        time.sleep(1)
    else:
        print 'Go!'

Alternatively, you could disable buffering of the standard streams either by providing the -u flag to the Python interpreter or by setting up the PYTHONUNBUFFERED environment variable:

$ python2 -u countdown.py
$ PYTHONUNBUFFERED=1 python2 countdown.py

Note that print() was backported to Python 2 and made available through the __future__ module. Unfortunately, it doesn’t come with the flush parameter:

>>> from __future__ import print_function
>>> help(print)
Help on built-in function print in module __builtin__:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout)

What you’re seeing here is a docstring of the print() function. You can display docstrings of various objects in Python using the built-in help() function.

Printing Custom Data Types

Up until now, you only dealt with built-in data types such as strings and numbers, but you’ll often want to print your own abstract data types. Let’s have a look at different ways of defining them.

For simple objects without any logic, whose purpose is to carry data, you’ll typically take advantage of namedtuple, which is available in the standard library. Named tuples have a neat textual representation out of the box:

>>> from collections import namedtuple
>>> Person = namedtuple('Person', 'name age')
>>> jdoe = Person('John Doe', 42)
>>> print(jdoe)
Person(name='John Doe', age=42)

That’s great as long as holding data is enough, but in order to add behaviors to the Person type, you’ll eventually need to define a class. Take a look at this example:

class Person:
    def __init__(self, name, age):
        self.name, self.age = name, age

If you now create an instance of the Person class and try to print it, you’ll get this bizarre output, which is quite different from the equivalent namedtuple:

>>> jdoe = Person('John Doe', 42)
>>> print(jdoe)
<__main__.Person object at 0x7fcac3fed1d0>

It’s the default representation of objects, which comprises their address in memory, the corresponding class name and a module in which they were defined. You’ll fix that in a bit, but just for the record, as a quick workaround you could combine namedtuple and a custom class through inheritance:

from collections import namedtuple

class Person(namedtuple('Person', 'name age')):
    pass

Your Person class has just become a specialized kind of namedtuple with two attributes, which you can customize.

Note: In Python 3, the pass statement can be replaced with the ellipsis (...) literal to indicate a placeholder:

def delta(a, b, c):
    ...

This prevents the interpreter from raising IndentationError due to a missing indented block of code.

That’s better than a plain namedtuple, because not only do you get printing right for free, but you can also add custom methods and properties to the class. However, it solves one problem while introducing another. Remember that tuples, including named tuples, are immutable in Python, so they can’t change their values once created.
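
For instance, here is a quick sketch of what happens if you try to mutate the namedtuple-based Person subclass defined above (the workaround with ._replace() builds a brand-new instance instead):

jdoe = Person('John Doe', 42)

try:
    jdoe.age += 1                 # fields of a named tuple can't be rebound
except AttributeError as error:
    print(error)

older_jdoe = jdoe._replace(age=jdoe.age + 1)  # returns a new Person
print(older_jdoe)                 # Person(name='John Doe', age=43)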

It’s true that designing immutable data types is desirable, but in many cases, you’ll want them to allow for change, so you’re back with regular classes again.

Note: Following other languages and frameworks, Python 3.7 introduced data classes, which you can think of as mutable tuples. This way, you get the best of both worlds:

>>> from dataclasses import dataclass
>>> @dataclass
... class Person:
...     name: str
...     age: int
...
...     def celebrate_birthday(self):
...         self.age += 1
...
>>> jdoe = Person('John Doe', 42)
>>> jdoe.celebrate_birthday()
>>> print(jdoe)
Person(name='John Doe', age=43)

The syntax for variable annotations, which is required to specify class fields with their corresponding types, was defined in Python 3.6.

From earlier subsections, you already know that print() implicitly calls the built-in str() function to convert its positional arguments into strings. Indeed, calling str() manually against an instance of the regular Person class yields the same result as printing it:

>>> jdoe = Person('John Doe', 42)
>>> str(jdoe)
'<__main__.Person object at 0x7fcac3fed1d0>'

str(), in turn, looks for one of two magic methods within the class body, which you typically implement. If it doesn’t find one, then it falls back to the ugly default representation. Those magic methods are, in order of search:

  1. def __str__(self)
  2. def __repr__(self)

The first one is recommended to return a short, human-readable text, which includes information from the most relevant attributes. After all, you don’t want to expose sensitive data, such as user passwords, when printing objects.

However, the other one should provide complete information about an object, to allow for restoring its state from a string. Ideally, it should return valid Python code, so that you can pass it directly to eval():

>>> repr(jdoe)
"Person(name='John Doe', age=42)"
>>> type(eval(repr(jdoe)))
<class '__main__.Person'>

Notice the use of another built-in function, repr(), which always tries to call .__repr__() in an object, but falls back to the default representation if it doesn’t find that method.
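
For the repr(jdoe) call above to return valid Python code like that, the plain Person class needs a .__repr__() of its own. Its definition isn't shown in this excerpt, so here is a minimal sketch of one possibility:

class Person:
    def __init__(self, name, age):
        self.name, self.age = name, age

    def __repr__(self):
        # Valid Python code that recreates an equivalent object.
        return f"Person(name={self.name!r}, age={self.age!r})"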

Note: Even though print() itself uses str() for type casting, some compound data types delegate that call to repr() on their members. This happens to lists and tuples, for example.

Consider this class with both magic methods, which return alternative string representations of the same object:

class User:
    def __init__(self, login, password):
        self.login = login
        self.password = password

    def __str__(self):
        return self.login

    def __repr__(self):
        return f"User('{self.login}', '{self.password}')"

If you print a single object of the User class, then you won’t see the password, because print(user) will call str(user), which eventually will invoke user.__str__():

>>>
>>> user = User('jdoe', 's3cret')
>>> print(user)
jdoe

However, if you put the same user variable inside a list by wrapping it in square brackets, then the password will become clearly visible:

>>>
>>> print([user])
[User('jdoe', 's3cret')]

That’s because sequences, such as lists and tuples, implement their .__str__() method so that all of their elements are first converted with repr().

Python gives you a lot of freedom when it comes to defining your own data types if none of the built-in ones meet your needs. Some of them, such as named tuples and data classes, offer string representations that look good without requiring any work on your part. Still, for the most flexibility, you’ll have to define a class and override its magic methods described above.

The semantics of .__str__() and .__repr__() didn’t change since Python 2, but you must remember that strings were nothing more than glorified byte arrays back then. To convert your objects into proper Unicode, which was a separate data type, you’d have to provide yet another magic method: .__unicode__().

Here’s an example of the same User class in Python 2:

class User(object):
    def __init__(self, login, password):
        self.login = login
        self.password = password

    def __unicode__(self):
        return self.login

    def __str__(self):
        return unicode(self).encode('utf-8')

    def __repr__(self):
        user = u"User('%s', '%s')" % (self.login, self.password)
        return user.encode('unicode_escape')

As you can see, this implementation delegates some work to avoid duplication by calling the built-in unicode() function on itself.

Both .__str__() and .__repr__() methods must return strings, so they encode Unicode characters into specific byte representations called character sets. UTF-8 is the most widespread and safest encoding, while unicode_escape is a special constant to express funky characters, such as é, as escape sequences in plain ASCII, such as \xe9.

The print statement is looking for the magic .__str__() method in the class, so the chosen charset must correspond to the one used by the terminal. For example, default encoding in DOS and Windows is CP 852 rather than UTF-8, so running this can result in a UnicodeEncodeError or even garbled output:

>>>
>>> user = User(u'\u043d\u0438\u043a\u0438\u0442\u0430', u's3cret')
>>> print user
đŻđŞđ║đŞĐéđ░

However, if you ran the same code on a system with UTF-8 encoding, then you’d get the proper spelling of a popular Russian name:

>>>
>>> user = User(u'\u043d\u0438\u043a\u0438\u0442\u0430', u's3cret')
>>> print user
никита

It’s recommended to convert strings to Unicode as early as possible, for example, when you’re reading data from a file, and use it consistently everywhere in your code. At the same time, you should encode Unicode back to the chosen character set right before presenting it to the user.

It seems as if you have more control over string representation of objects in Python 2 because there’s no magic .__unicode__() method in Python 3 anymore. You may be asking yourself if it’s possible to convert an object to its byte string representation rather than a Unicode string in Python 3. It’s possible, with a special .__bytes__() method that does just that:

>>>
>>> class User(object):
...     def __init__(self, login, password):
...         self.login = login
...         self.password = password
...
...     def __bytes__(self):  # Python 3
...         return self.login.encode('utf-8')
...
>>> user = User(u'\u043d\u0438\u043a\u0438\u0442\u0430', u's3cret')
>>> bytes(user)
b'\xd0\xbd\xd0\xb8\xd0\xba\xd0\xb8\xd1\x82\xd0\xb0'

Using the built-in bytes() function on an instance delegates the call to its __bytes__() method defined in the corresponding class.

Understanding Python Print

You know how to use print() quite well at this point, but knowing what it is will allow you to use it even more effectively and consciously. After reading this section, you’ll understand how printing in Python has improved over the years.

You’ve seen that print() is a function in Python 3. More specifically, it’s a built-in function, which means that you don’t need to import it from anywhere:

>>>
>>> print
<built-in function print>

It’s always available in the global namespace so that you can call it directly, but you can also access it through a module from the standard library:

>>>
>>> import builtins
>>> builtins.print
<built-in function print>

This way, you can avoid name collisions with custom functions. Let’s say you wanted to redefine print() so that it doesn’t append a trailing newline. At the same time, you wanted to rename the original function to something like println():

>>>
>>> import builtins
>>> println = builtins.print
>>> def print(*args, **kwargs):
...     builtins.print(*args, **kwargs, end='')
...
>>> println('hello')
hello
>>> print('hello\n')
hello

Now you have two separate printing functions just like in the Java programming language. You’ll define custom print() functions in the mocking section later as well. Also, note that you wouldn’t be able to overwrite print() in the first place if it wasn’t a function.

On the other hand, print() isn’t a function in the mathematical sense, because it doesn’t return any meaningful value other than the implicit None:

>>>
>>> value = print('hello world')
hello world
>>> print(value)
None

Such functions are, in fact, procedures or subroutines that you call to achieve some kind of side-effect, which ultimately is a change of a global state. In the case of print(), that side-effect is showing a message on the standard output or writing to a file.

Because print() is a function, it has a well-defined signature with known attributes. You can quickly find its documentation using the editor of your choice, without having to remember some weird syntax for performing a certain task.

Besides, functions are easier to extend. Adding a new feature to a function is as easy as adding another keyword argument, whereas changing the language to support that new feature is much more cumbersome. Think of stream redirection or buffer flushing, for example.

Another benefit of print() being a function is composability. Functions are so-called first-class objects or first-class citizens in Python, which is a fancy way of saying they’re values just like strings or numbers. This way, you can assign a function to a variable, pass it to another function, or even return one from another. print() isn’t different in this regard. For instance, you can take advantage of it for dependency injection:

def download(url, log=print):
    log(f'Downloading {url}')
    # ...

def custom_print(*args):
    pass  # Do not print anything

download('/js/app.js', log=custom_print)

Here, the log parameter lets you inject a callback function, which defaults to print() but can be any callable. In this example, printing is completely disabled by substituting print() with a dummy function that does nothing.

Note: A dependency is any piece of code required by another bit of code.

Dependency injection is a technique used in code design to make it more testable, reusable, and open for extension. You can achieve it by referring to dependencies indirectly through abstract interfaces and by providing them in a push rather than pull fashion.

There’s a funny explanation of dependency injection circulating on the Internet:

Dependency injection for five-year-olds

When you go and get things out of the refrigerator for yourself, you can cause problems. You might leave the door open, you might get something Mommy or Daddy doesn’t want you to have. You might even be looking for something we don’t even have or which has expired.

What you should be doing is stating a need, “I need something to drink with lunch,” and then we will make sure you have something when you sit down to eat.

John Munsch, 28 October 2009. (Source)

Composition allows you to combine a few functions into a new one of the same kind. Let’s see this in action by specifying a custom error() function that prints to the standard error stream and prefixes all messages with a given log level:

>>>
>>> from functools import partial
>>> import sys
>>> redirect = lambda function, stream: partial(function, file=stream)
>>> prefix = lambda function, prefix: partial(function, prefix)
>>> error = prefix(redirect(print, sys.stderr), '[ERROR]')
>>> error('Something went wrong')
[ERROR] Something went wrong

This custom function uses partial functions to achieve the desired effect. It’s an advanced concept borrowed from the functional programming paradigm, so you don’t need to go too deep into that topic for now. However, if you’re interested in this topic, I recommend taking a look at the functools module.

Unlike statements, functions are values. That means you can mix them with expressions, in particular, lambda expressions. Instead of defining a full-blown function to replace print() with, you can make an anonymous lambda expression that calls it:

>>>
>>> download('/js/app.js', lambda msg: print('[INFO]', msg))
[INFO] Downloading /js/app.js

However, because a lambda expression is defined in place, there’s no way of referring to it elsewhere in the code.

Note: In Python, you can’t put statements, such as assignments, conditional statements, loops, and so on, in an anonymous lambda function. It has to be a single expression!

Another kind of expression is a ternary conditional expression:

>>>
>>> user = 'jdoe'
>>> print('Hi!') if user is None else print(f'Hi, {user}.')
Hi, jdoe.

Python has both conditional statements and conditional expressions. The latter is evaluated to a single value that can be assigned to a variable or passed to a function. In the example above, you’re interested in the side-effect rather than the value, which evaluates to None, so you simply ignore it.

As you can see, functions allow for an elegant and extensible solution, which is consistent with the rest of the language. In the next subsection, you’ll discover how not having print() as a function caused a lot of headaches.

A statement is an instruction that may evoke a side-effect when executed but never evaluates to a value. In other words, you wouldn’t be able to print a statement or assign it to a variable like this:

result = print 'hello world'

That’s a syntax error in Python 2.

Here are a few more examples of statements in Python:

  • assignment: =
  • conditional: if
  • loop: while
  • assertion: assert

Note: Python 3.8 brings a controversial walrus operator (:=), which is an assignment expression. With it, you can evaluate an expression and assign the result to a variable at the same time, even within another expression!

Take a look at this example, which calls an expensive function once and then reuses the result for further computation:

# Python 3.8+
values = [y := f(x), y**2, y**3]

This is useful for simplifying the code without losing its efficiency. Typically, performant code tends to be more verbose:

y = f(x)
values = [y, y**2, y**3]

This new piece of syntax stirred up a lot of controversy. An abundance of negative comments and heated debates eventually led Guido van Rossum to step down from his position as Benevolent Dictator For Life, or BDFL.

Statements are usually comprised of reserved keywords such as if, for, or print that have fixed meaning in the language. You can’t use them to name your variables or other symbols. That’s why redefining or mocking the print statement isn’t possible in Python 2. You’re stuck with what you get.

Furthermore, you can’t print from anonymous functions, because statements aren’t accepted in lambda expressions:

>>>
>>> lambda: print 'hello world'
  File "<stdin>", line 1
    lambda: print 'hello world'
                ^
SyntaxError: invalid syntax

The syntax of the print statement is ambiguous. Sometimes you can add parentheses around the message, and they’re completely optional:

>>>
>>> print 'Please wait...'
Please wait...
>>> print('Please wait...')
Please wait...

At other times they change how the message is printed:

>>>
>>> print 'My name is', 'John'
My name is John
>>> print('My name is', 'John')
('My name is', 'John')

String concatenation can raise a TypeError due to incompatible types, which you have to handle manually, for example:

>>>
>>> values = ['jdoe', 'is', 42, 'years old']
>>> print ' '.join(map(str, values))
jdoe is 42 years old

Compare this with similar code in Python 3, which leverages sequence unpacking:

>>>
>>> values = ['jdoe', 'is', 42, 'years old']
>>> print(*values)  # Python 3
jdoe is 42 years old

There aren’t any keyword arguments for common tasks such as flushing the buffer or stream redirection. You need to remember the quirky syntax instead. Even the built-in help() function isn’t that helpful with regards to the print statement:

>>>
>>> help(print)
  File "<stdin>", line 1
    help(print)
             ^
SyntaxError: invalid syntax

Trailing newline removal doesn’t work quite right, because it adds an unwanted space. You can’t compose multiple print statements together, and, on top of that, you have to be extra diligent about character encoding.

The list of problems goes on and on. If you’re curious, you can jump back to the previous section and look for more detailed explanations of the syntax in Python 2.

However, you can mitigate some of those problems with a much simpler approach. It turns out the print() function was backported to ease the migration to Python 3. You can import it from a special __future__ module, which exposes a selection of language features released in later Python versions.

Note: You may import future functions as well as baked-in language constructs such as the with statement.

To find out exactly what features are available to you, inspect the module:

>>>
>>> import __future__
>>> __future__.all_feature_names
['nested_scopes', 'generators', 'division', 'absolute_import', 'with_statement', 'print_function', 'unicode_literals']

You could also call dir(__future__), but that would show a lot of uninteresting internal details of the module.

To enable the print() function in Python 2, you need to add this import statement at the beginning of your source code:

from __future__ import print_function

From now on the print statement is no longer available, but you have the print() function at your disposal. Note that it isn’t the same function as the one in Python 3, because it’s missing the flush keyword argument, but the rest of the arguments are the same.

Other than that, it doesn’t spare you from managing character encodings properly.

Here’s an example of calling the print() function in Python 2:

>>>
>>> from __future__ import print_function
>>> import sys
>>> print('I am a function in Python', sys.version_info.major)
I am a function in Python 2

You now have an idea of how printing in Python evolved and, most importantly, understand why these backward-incompatible changes were necessary. Knowing this will surely help you become a better Python programmer.

Printing With Style

If you thought that printing was only about lighting pixels up on the screen, then technically you’d be right. However, there are ways to make it look cool. In this section, you’ll find out how to format complex data structures, add colors and other decorations, build interfaces, use animation, and even play sounds with text!

Pretty-Printing Nested Data Structures

Computer languages allow you to represent data as well as executable code in a structured way. Unlike Python, however, most languages give you a lot of freedom in using whitespace and formatting. This can be useful, for example in compression, but it sometimes leads to less readable code.

Pretty-printing is about making a piece of data or code look more appealing to the human eye so that it can be understood more easily. This is done by indenting certain lines, inserting newlines, reordering elements, and so forth.

Python comes with the pprint module in its standard library, which will help you in pretty-printing large data structures that don’t fit on a single line. Because it prints in a more human-friendly way, many popular REPL tools, including JupyterLab and IPython, use it by default in place of the regular print() function.

Note: To toggle pretty printing in IPython, issue the following command:

>>>
In [1]: %pprint
Pretty printing has been turned OFF

In [2]: %pprint
Pretty printing has been turned ON

This is an example of Magic in IPython. There are a lot of built-in commands that start with a percent sign (%), but you can find more on PyPI, or even create your own.

If you don’t care about not having access to the original print() function, then you can replace it with pprint() in your code using import renaming:

>>>
>>> from pprint import pprint as print
>>> print
<function pprint at 0x7f7a775a3510>

Personally, I like to have both functions at my fingertips, so I’d rather use something like pp as a short alias:

from pprint import pprint as pp

At first glance, there’s hardly any difference between the two functions, and in some cases there’s virtually none:

>>>
>>> print(42)
42
>>> pp(42)
42
>>> print('hello')
hello
>>> pp('hello')
'hello'  # Did you spot the difference?

That’s because pprint() calls repr() instead of the usual str() for type casting, so that you may evaluate its output as Python code if you want to. The differences become apparent as you start feeding it more complex data structures:

>>>
>>> data = {'powers': [x**10 for x in range(10)]}
>>> pp(data)
{'powers': [0,
            1,
            1024,
            59049,
            1048576,
            9765625,
            60466176,
            282475249,
            1073741824,
            3486784401]}

The function applies reasonable formatting to improve readability, but you can customize it even further with a couple of parameters. For example, you may limit a deeply nested hierarchy by showing an ellipsis below a given level:

>>>
>>> cities = {'USA': {'Texas': {'Dallas': ['Irving']}}}
>>> pp(cities, depth=3)
{'USA': {'Texas': {'Dallas': [...]}}}

The ordinary print() also uses ellipses but for displaying recursive data structures, which form a cycle, to avoid stack overflow error:

>>>
>>> items = [1, 2, 3]
>>> items.append(items)
>>> print(items)
[1, 2, 3, [...]]

However, pprint() is more explicit about it by including the unique identity of a self-referencing object:

>>>
>>> pp(items)
[1, 2, 3, <Recursion on list with id=140635757287688>]
>>> id(items)
140635757287688

The last element in the list is the same object as the entire list.

Note: Recursive or very large data sets can be dealt with using the reprlib module as well:

>>>
>>> import reprlib
>>> reprlib.repr([x**10 for x in range(10)])
'[0, 1, 1024, 59049, 1048576, 9765625, ...]'

This module supports most of the built-in types and is used by the Python debugger.

pprint() automatically sorts dictionary keys for you before printing, which allows for consistent comparison. When you’re comparing strings, you often don’t care about a particular order of serialized attributes. Anyway, it’s always best to compare actual dictionaries before serialization.
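For instance, a small dictionary comes out with its keys in alphabetical order, no matter how it was defined:

>>>
>>> pp({'banana': 3, 'apple': 1, 'cherry': 2})
{'apple': 1, 'banana': 3, 'cherry': 2}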

Dictionaries often represent JSON data, which is widely used on the Internet. To correctly serialize a dictionary into a valid JSON-formatted string, you can take advantage of the json module. It too has pretty-printing capabilities:

>>>
>>> import json
>>> data = {'username': 'jdoe', 'password': 's3cret'}
>>> ugly = json.dumps(data)
>>> pretty = json.dumps(data, indent=4, sort_keys=True)
>>> print(ugly)
{"username": "jdoe", "password": "s3cret"}
>>> print(pretty)
{
    "password": "s3cret",
    "username": "jdoe"
}

Notice, however, that you need to handle printing yourself, because it’s not something you’d typically want to do. Similarly, the pprint module has an additional pformat() function that returns a string, in case you had to do something other than printing it.
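For example, you could capture the pretty-printed text as a plain string and pass it along to a logger or a file instead of the screen:

>>>
>>> from pprint import pformat
>>> text = pformat({'username': 'jdoe', 'password': 's3cret'})
>>> text
"{'password': 's3cret', 'username': 'jdoe'}"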

Surprisingly, the signature of pprint() is nothing like that of print(). You can’t even pass more than one positional argument, which shows how much it focuses on printing data structures.

Adding Colors With ANSI Escape Sequences

As personal computers got more sophisticated, they had better graphics and could display more colors. However, different vendors had their own idea about the API design for controlling it. That changed a few decades ago when people at the American National Standards Institute decided to unify it by defining ANSI escape codes.

Most of today’s terminal emulators support this standard to some degree. Until recently, the Windows operating system was a notable exception. Therefore, if you want the best portability, use the colorama library in Python. It translates ANSI codes to their appropriate counterparts in Windows while keeping them intact in other operating systems.

To check if your terminal understands a subset of the ANSI escape sequences, for example, related to colors, you can try using the following command:

$ tput colors

My default terminal on Linux says it can display 256 distinct colors, while xterm gives me only 8. The command would return a negative number if colors were unsupported.

ANSI escape sequences are like a markup language for the terminal. In HTML you work with tags, such as <b> or <i>, to change how elements look in the document. These tags are mixed with your content, but they’re not visible themselves. Similarly, escape codes won’t show up in the terminal as long as it recognizes them. Otherwise, they’ll appear in the literal form as if you were viewing the source of a website.

As its name implies, a sequence must begin with the non-printable Esc character, whose ASCII value is 27, sometimes denoted as 0x1b in hexadecimal or 033 in octal. You may use Python number literals to quickly verify it’s indeed the same number:

>>>
>>> 27 == 0x1b == 0o33
True

Additionally, you can obtain it with the \e escape sequence in the shell:

$ echo -e "\e"

The most common ANSI escape sequences take the following form:

Element          Description                              Example
Esc              non-printable escape character           \033
[                opening square bracket                   [
numeric code     one or more numbers separated with ;     0
character code   uppercase or lowercase letter            m

The numeric code can be one or more numbers separated with a semicolon, while the character code is just one letter. Their specific meaning is defined by the ANSI standard. For example, to reset all formatting, you would type one of the following commands, which use the code zero and the letter m:

$ echo -e "\e[0m"
$ echo -e "\x1b[0m"
$ echo -e "\033[0m"

At the other end of the spectrum, you have compound code values. To set foreground and background with RGB channels, given that your terminal supports 24-bit depth, you could provide multiple numbers:

$ echo -e "\e[38;2;0;0;0m\e[48;2;255;255;255mBlack on white\e[0m"

It’s not just text color that you can set with the ANSI escape codes. You can, for example, clear and scroll the terminal window, change its background, move the cursor around, make the text blink or decorate it with an underline.
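For instance, assuming your terminal supports them, the 2J and H codes clear the whole screen and move the cursor to the home position:

$ echo -e "\e[2J\e[H"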

In Python, you’d probably write a helper function to allow for wrapping arbitrary codes into a sequence:

>>>
>>> def esc(code):
...     return f'\033[{code}m'
...
>>> print(esc('31;1;4') + 'really' + esc(0) + ' important')

This would make the word really appear in red, bold, and underlined font:

Text formatted with ANSI escape codes

However, there are higher-level abstractions over ANSI escape codes, such as the mentioned colorama library, as well as tools for building user interfaces in the console.

Building Console User Interfaces

While playing with ANSI escape codes is undeniably a ton of fun, in the real world you’d rather have more abstract building blocks to put together a user interface. There are a few libraries that provide such a high level of control over the terminal, but curses seems to be the most popular choice.

Note: To use the curses library in Windows, you need to install a third-party package:

C:\> pip install windows-curses

That’s because curses isn’t available in the standard library of the Python distribution for Windows.

Primarily, it allows you to think in terms of independent graphical widgets instead of a blob of text. Besides, you get a lot of freedom in expressing your inner artist, because it’s really like painting a blank canvas. The library hides the complexities of having to deal with different terminals. Other than that, it has great support for keyboard events, which might be useful for writing video games.

How about making a retro snake game? Let’s create a Python snake simulator:

The retro snake game built with curses library

First, you need to import the curses module. Since it modifies the state of a running terminal, it’s important to handle errors and gracefully restore the previous state. You can do this manually, but the library comes with a convenient wrapper for your main function:

import curses

def main(screen):
    pass

if __name__ == '__main__':
    curses.wrapper(main)

Note, the function must accept a reference to the screen object, also known as stdscr, that you’ll use later for additional setup.

If you run this program now, you won’t see any effects, because it terminates immediately. However, you can add a small delay to have a sneak peek:

import time, curses

def main(screen):
    time.sleep(1)

if __name__ == '__main__':
    curses.wrapper(main)

This time the screen went completely blank for a second, but the cursor was still blinking. To hide it, just call one of the configuration functions defined in the module:

import time, curses

def main(screen):
    curses.curs_set(0)  # Hide the cursor
    time.sleep(1)

if __name__ == '__main__':
    curses.wrapper(main)

Let’s define the snake as a list of points in screen coordinates:

snake = [(0, i) for i in reversed(range(20))]

The head of the snake is always the first element in the list, whereas the tail is the last one. The initial shape of the snake is horizontal, starting from the top-left corner of the screen and facing to the right. While its y-coordinate stays at zero, its x-coordinate decreases from head to tail.

To draw the snake, you’ll start with the head and then follow with the remaining segments. Each segment carries (y, x) coordinates, so you can unpack them:

# Draw the snake
screen.addstr(*snake[0], '@')
for segment in snake[1:]:
    screen.addstr(*segment, '*')

Again, if you run this code now, it won’t display anything, because you must explicitly refresh the screen afterward:

import time, curses

def main(screen):
    curses.curs_set(0)  # Hide the cursor

    snake = [(0, i) for i in reversed(range(20))]

    # Draw the snake
    screen.addstr(*snake[0], '@')
    for segment in snake[1:]:
        screen.addstr(*segment, '*')

    screen.refresh()
    time.sleep(1)

if __name__ == '__main__':
    curses.wrapper(main)

You want to move the snake in one of four directions, which can be defined as vectors. Eventually, the direction will change in response to an arrow keystroke, so you may hook it up to the library’s key codes:

directions = {
    curses.KEY_UP: (-1, 0),
    curses.KEY_DOWN: (1, 0),
    curses.KEY_LEFT: (0, -1),
    curses.KEY_RIGHT: (0, 1),
}

direction = directions[curses.KEY_RIGHT]

How does a snake move? It turns out that only its head really moves to a new location, while all other segments shift towards it. In each step, almost all segments remain the same, except for the head and the tail. Assuming the snake isn’t growing, you can remove the tail and insert a new head at the beginning of the list:

# Move the snake
snake.pop()
snake.insert(0, tuple(map(sum, zip(snake[0], direction))))

To get the new coordinates of the head, you need to add the direction vector to it. However, adding tuples in Python results in a bigger tuple instead of the algebraic sum of the corresponding vector components. One way to fix this is by using the built-in zip(), sum(), and map() functions.

The direction will change on a keystroke, so you need to call .getch() to obtain the pressed key code. However, if the pressed key doesn’t correspond to the arrow keys defined earlier as dictionary keys, the direction won’t change:

# Change direction on arrow keystroke
direction = directions.get(screen.getch(), direction)

By default, however, .getch() is a blocking call that would prevent the snake from moving unless there was a keystroke. Therefore, you need to make the call non-blocking by adding yet another configuration:

def main(screen):
    curses.curs_set(0)    # Hide the cursor
    screen.nodelay(True)  # Don't block I/O calls

You’re almost done, but there’s just one last thing left. If you now loop this code, the snake will appear to be growing instead of moving. That’s because you have to erase the screen explicitly before each iteration.

Finally, this is all you need to play the snake game in Python:

import time, curses

def main(screen):
    curses.curs_set(0)    # Hide the cursor
    screen.nodelay(True)  # Don't block I/O calls

    directions = {
        curses.KEY_UP: (-1, 0),
        curses.KEY_DOWN: (1, 0),
        curses.KEY_LEFT: (0, -1),
        curses.KEY_RIGHT: (0, 1),
    }

    direction = directions[curses.KEY_RIGHT]
    snake = [(0, i) for i in reversed(range(20))]

    while True:
        screen.erase()

        # Draw the snake
        screen.addstr(*snake[0], '@')
        for segment in snake[1:]:
            screen.addstr(*segment, '*')

        # Move the snake
        snake.pop()
        snake.insert(0, tuple(map(sum, zip(snake[0], direction))))

        # Change direction on arrow keystroke
        direction = directions.get(screen.getch(), direction)

        screen.refresh()
        time.sleep(0.1)

if __name__ == '__main__':
    curses.wrapper(main)

This is merely scratching the surface of the possibilities that the curses module opens up. You may use it for game development like this or more business-oriented applications.

Living It Up With Cool Animations

Not only can animations make the user interface more appealing to the eye, but they also improve the overall user experience. When you provide early feedback to the user, for example, they’ll know if your program’s still working or if it’s time to kill it.

To animate text in the terminal, you have to be able to freely move the cursor around. You can do this with one of the tools mentioned previously, that is ANSI escape codes or the curses library. However, I’d like to show you an even simpler way.

If the animation can be constrained to a single line of text, then you might be interested in two special escape character sequences:

  • Carriage return: \r
  • Backspace: \b

The first one moves the cursor to the beginning of the line, whereas the second one moves it only one character to the left. They both work in a non-destructive way without overwriting text that’s already been written.
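As a quick, hedged illustration of the backspace character, the following one-digit countdown moves the cursor back with a single \b so that each new digit overwrites the previous one:

from time import sleep

for digit in range(9, -1, -1):
    print(digit, end='', flush=True)
    sleep(1)
    print('\b', end='', flush=True)  # Step back over the digit just printed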

Let’s take a look at a few examples.

You’ll often want to display some kind of a spinning wheel to indicate a work in progress without knowing exactly how much time’s left to finish:

Indefinite animation in the terminal

Many command line tools use this trick while downloading data over the network. You can make a really simple stop motion animation from a sequence of characters that will cycle in a round-robin fashion:

from itertools import cycle
from time import sleep

for frame in cycle(r'-\|/-\|/'):
    print('\r', frame, sep='', end='', flush=True)
    sleep(0.2)

The loop gets the next character to print, then moves the cursor to the beginning of the line, and overwrites whatever was there before without adding a newline. You don’t want extra space between positional arguments, so the separator argument must be blank. Also, notice the use of Python’s raw strings due to backslash characters present in the literal.

When you know the remaining time or task completion percentage, then you’re able to show an animated progress bar:

Progress bar animation in the terminal

First, you need to calculate how many hashtags to display and how many blank spaces to insert. Next, you erase the line and build the bar from scratch:

from time import sleep

def progress(percent=0, width=30):
    left = width * percent // 100
    right = width - left
    print('\r[', '#' * left, ' ' * right, ']',
          f' {percent:.0f}%',
          sep='', end='', flush=True)

for i in range(101):
    progress(i)
    sleep(0.1)

As before, each request for update repaints the entire line.

Note: There’s a feature-rich progressbar2 library, along with a few other similar tools, that can show progress in a much more comprehensive way.

Making Sounds With Print

If you’re old enough to remember computers with a PC speaker, then you must also remember their distinctive beep sound, often used to indicate hardware problems. They could barely make any more noises than that, yet video games seemed so much better with it.

Today you can still take advantage of this small loudspeaker, but chances are your laptop didn’t come with one. In such a case, you can enable terminal bell emulation in your shell, so that a system warning sound is played instead.

Go ahead and type this command to see if your terminal can play a sound:

$ echo -e "\a"

This would normally print text, but the -e flag enables the interpretation of backslash escapes. As you can see, there’s a dedicated escape sequence \a, which stands for “alert”, that outputs a special bell character. Some terminals make a sound whenever they see it.

Similarly, you can print this character in Python. Perhaps in a loop to form some kind of melody. While it’s only a single note, you can still vary the length of pauses between consecutive instances. That seems like a perfect toy for Morse code playback!

The rules are the following:

  • Letters are encoded with a sequence of dot (·) and dash (–) symbols.
  • A dot is one unit of time.
  • A dash is three units of time.
  • Individual symbols in a letter are spaced one unit of time apart.
  • Symbols of two adjacent letters are spaced three units of time apart.
  • Symbols of two adjacent words are spaced seven units of time apart.

According to those rules, you could be “printing” an SOS signal indefinitely in the following way:

while True:
    dot()
    symbol_space()
    dot()
    symbol_space()
    dot()
    letter_space()
    dash()
    symbol_space()
    dash()
    symbol_space()
    dash()
    letter_space()
    dot()
    symbol_space()
    dot()
    symbol_space()
    dot()
    word_space()

In Python, you can implement it in merely ten lines of code:

from time import sleep

speed = 0.1

def signal(duration, symbol):
    sleep(duration)
    print(symbol, end='', flush=True)

dot = lambda: signal(speed, '·\a')
dash = lambda: signal(3 * speed, '−\a')
symbol_space = lambda: signal(speed, '')
letter_space = lambda: signal(3 * speed, '')
word_space = lambda: signal(7 * speed, ' ')

Maybe you could even take it one step further and make a command line tool for translating text into Morse code? Either way, I hope you’re having fun with this!
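If you’d like to try, here’s a rough, hedged sketch of such a translator built on top of the helper functions above, with a deliberately tiny, made-up code table:

MORSE_CODE = {'s': '...', 'o': '---'}  # Hypothetical, truncated code table

def play(text):
    for word in text.lower().split():
        for i, letter in enumerate(word):
            if i > 0:
                letter_space()
            for j, symbol in enumerate(MORSE_CODE.get(letter, '')):
                if j > 0:
                    symbol_space()
                dot() if symbol == '.' else dash()
        word_space()

play('sos sos')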

Mocking Python Print in Unit Tests

Nowadays, it’s expected that you ship code that meets high quality standards. If you aspire to become a professional, you must learn how to test your code.

Software testing is especially important in dynamically typed languages, such as Python, which don’t have a compiler to warn you about obvious mistakes. Defects can make their way to the production environment and remain dormant for a long time, until that one day when a branch of code finally gets executed.

Sure, you have linters, type checkers, and other tools for static code analysis to assist you. But they won’t tell you whether your program does what it’s supposed to do on the business level.

So, should you be testing print()? No. After all, it’s a built-in function that must have already gone through a comprehensive suite of tests. What you want to test, though, is whether your code is calling print() at the right time with the expected parameters. That’s known as a behavior.

You can test behaviors by mocking real objects or functions. In this case, you want to mock print() to record and verify its invocations.

Note: You might have heard the terms: dummy, fake, stub, spy, or mock used interchangeably. Some people make a distinction between them, while others don’t.

Martin Fowler explains their differences in a short glossary and collectively calls them test doubles.

Mocking in Python can be done twofold. First, you can take the traditional path of statically-typed languages by employing dependency injection. This may sometimes require you to change the code under test, which isn’t always possible if the code is defined in an external library:

def download(url, log=print):
    log(f'Downloading {url}')
    # ...

This is the same example I used in an earlier section to talk about function composition. It basically allows for substituting print() with a custom function of the same interface. To check if it prints the right message, you have to intercept it by injecting a mocked function:

>>>
>>> def mock_print(message):
...     mock_print.last_message = message
...
>>> download('resource', mock_print)
>>> assert 'Downloading resource' == mock_print.last_message

Calling this mock makes it save the last message in an attribute, which you can inspect later, for example in an assert statement.

In a slightly alternative solution, instead of replacing the entire print() function with a custom wrapper, you could redirect the standard output to an in-memory file-like stream of characters:

>>>
>>> def download(url, stream=None):
...     print(f'Downloading {url}', file=stream)
...     # ...
...
>>> import io
>>> memory_buffer = io.StringIO()
>>> download('app.js', memory_buffer)
>>> download('style.css', memory_buffer)
>>> memory_buffer.getvalue()
'Downloading app.js\nDownloading style.css\n'

This time the function explicitly calls print(), but it exposes its file parameter to the outside world.

However, a more Pythonic way of mocking objects takes advantage of the built-in mock module, which uses a technique called monkey patching. This derogatory name stems from it being a “dirty hack” that you can easily shoot yourself in the foot with. It’s less elegant than dependency injection but definitely quick and convenient.

Note: The mock module got absorbed by the standard library in Python 3, but before that, it was a third-party package. You had to install it separately:

$ pip2 install mock

Other than that, you referred to it as mock, whereas in Python 3 it’s part of the unit testing module, so you must import from unittest.mock.

What monkey patching does is alter implementation dynamically at runtime. Such a change is visible globally, so it may have unwanted consequences. In practice, however, patching only affects the code for the duration of test execution.

To mock print() in a test case, you’ll typically use the @patch decorator and specify a target for patching by referring to it with a fully qualified name, that is including the module name:

from unittest.mock import patch

@patch('builtins.print')
def test_print(mock_print):
    print('not a real print')
    mock_print.assert_called_with('not a real print')

This will automatically create the mock for you and inject it to the test function. However, you need to declare that your test function accepts a mock now. The underlying mock object has lots of useful methods and attributes for verifying behavior.

Did you notice anything peculiar about that code snippet?

Despite injecting a mock to the function, you’re not calling it directly, although you could. That injected mock is only used to make assertions afterward and maybe to prepare the context before running the test.

In real life, mocking helps to isolate the code under test by removing dependencies such as a database connection. You rarely call mocks in a test, because that doesn’t make much sense. Rather, it’s other pieces of code that call your mock indirectly without knowing it.

Here’s what that means:

from unittest.mock import patch

def greet(name):
    print(f'Hello, {name}!')

@patch('builtins.print')
def test_greet(mock_print):
    greet('John')
    mock_print.assert_called_with('Hello, John!')

The code under test is a function that prints a greeting. Even though it’s a fairly simple function, you can’t test it easily because it doesn’t return a value. It has a side-effect.

To eliminate that side-effect, you need to mock the dependency out. Patching lets you avoid making changes to the original function, which can remain agnostic about print(). It thinks it’s calling print(), but in reality, it’s calling a mock you’re in total control of.

There are many reasons for testing software. One of them is looking for bugs. When you write tests, you often want to get rid of the print() function, for example, by mocking it away. Paradoxically, however, that same function can help you find bugs during a related process of debugging you’ll read about in the next section.

You can’t monkey patch the print statement in Python 2, nor can you inject it as a dependency. However, you have a few other options:

  • Use stream redirection.
  • Patch the standard output defined in the sys module.
  • Import print() from the __future__ module.

Let’s examine them one by one.

Stream redirection is almost identical to the example you saw earlier:

>>>
>>> def download(url, stream=None):
...     print >> stream, 'Downloading %s' % url
...     # ...
...
>>> from StringIO import StringIO
>>> memory_buffer = StringIO()
>>> download('app.js', memory_buffer)
>>> download('style.css', memory_buffer)
>>> memory_buffer.getvalue()
'Downloading app.js\nDownloading style.css\n'

There are only two differences. First, the syntax for stream redirection uses chevron (>>) instead of the file argument. The other difference is where StringIO is defined. You can import it from a similarly named StringIO module, or cStringIO for a faster implementation.

Patching the standard output from the sys module is exactly what it sounds like, but you need to be aware of a few gotchas:

from mock import patch, call

def greet(name):
    print 'Hello, %s!' % name

@patch('sys.stdout')
def test_greet(mock_stdout):
    greet('John')
    mock_stdout.write.assert_has_calls([call('Hello, John!'), call('\n')])

First of all, remember to install the mock module as it wasn’t available in the standard library in Python 2.

Secondly, the print statement calls the underlying .write() method on the mocked object instead of calling the object itself. That’s why you’ll run assertions against mock_stdout.write.

Finally, a single print statement doesn’t always correspond to a single call to sys.stdout.write(). In fact, you’ll see the newline character written separately.

The last option you have is importing print() from the __future__ module and patching it:

from __future__ import print_function
from mock import patch

def greet(name):
    print('Hello, %s!' % name)

@patch('__builtin__.print')
def test_greet(mock_print):
    greet('John')
    mock_print.assert_called_with('Hello, John!')

Again, it’s nearly identical to Python 3, but the print() function is defined in the __builtin__ module rather than builtins.

Print Debugging

In this section, you’ll take a look at the available tools for debugging in Python, starting from the humble print() function, through the logging module, to a fully fledged debugger. After reading it, you’ll be able to make an educated decision about which of them is the most suitable in a given situation.

Note: Debugging is the process of looking for the root causes of bugs or defects in software after they’ve been discovered, as well as taking steps to fix them.

The term bug has an amusing story about the origin of its name.

Tracing

Also known as print debugging or caveman debugging, it’s the most basic form of debugging. While a little bit old-fashioned, it’s still powerful and has its uses.

The idea is to follow the path of program execution until it stops abruptly, or gives incorrect results, to identify the exact instruction with a problem. You do that by inserting print statements with words that stand out in carefully chosen places.

Take a look at this example, which manifests a rounding error:

>>>
>>> def average(numbers):
...     print('debug1:', numbers)
...     if len(numbers) > 0:
...         print('debug2:', sum(numbers))
...         return sum(numbers) / len(numbers)
...
>>> 0.1 == average(3 * [0.1])
debug1: [0.1, 0.1, 0.1]
debug2: 0.30000000000000004
False

As you can see, the function doesn’t return the expected value of 0.1, but now you know it’s because the sum is a little off. Tracing the state of variables at different steps of the algorithm can give you a hint where the issue is.

In this case, the problem lies in how floating-point numbers are represented in computer memory. Remember that numbers are stored in binary form. The decimal value 0.1 turns out to have an infinite binary representation, which gets rounded.
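If you’re curious, the decimal module can reveal the exact value that ends up being stored:

>>>
>>> from decimal import Decimal
>>> Decimal(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')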

For more information on rounding numbers in Python, you can check out How to Round Numbers in Python.

This method is simple and intuitive and will work in pretty much every programming language out there. Not to mention, it’s a great exercise in the learning process.

On the other hand, once you master more advanced techniques, it’s hard to go back, because they allow you to find bugs much quicker. Tracing is a laborious manual process, which can let even more errors slip through. The build and deploy cycle takes time. Afterward, you need to remember to meticulously remove all the print() calls you made without accidentally touching the genuine ones.

Besides, it requires you to make changes in the code, which isn’t always possible. Maybe you’re debugging an application running in a remote web server or want to diagnose a problem in a post-mortem fashion. Sometimes you simply don’t have access to the standard output.

That’s precisely where logging shines.

Logging

Let’s pretend for a minute that you’re running an e-commerce website. One day, an angry customer makes a phone call complaining about a failed transaction and saying he lost his money. He claims to have tried purchasing a few items, but in the end, there was some cryptic error that prevented him from finishing that order. Yet, when he checked his bank account, the money was gone.

You apologize sincerely and make a refund, but also don’t want this to happen again in the future. How do you debug that? If only you had some trace of what happened, ideally in the form of a chronological list of events with their context.

Whenever you find yourself doing print debugging, consider turning it into permanent log messages. This may help in situations like this, when you need to analyze a problem after it happened, in an environment that you don’t have access to.

There are sophisticated tools for log aggregation and searching, but at the most basic level, you can think of logs as text files. Each line conveys detailed information about an event in your system. Usually, it won’t contain personally identifying information, though, in some cases, it may be mandated by law.

Here’s a breakdown of a typical log record:

[2019-06-14 15:18:34,517][DEBUG][root][MainThread] Customer(id=123) logged out

As you can see, it has a structured form. Apart from a descriptive message, there are a few customizable fields, which provide the context of an event. Here, you have the exact date and time, the log level, the logger name, and the thread name.

Log levels allow you to filter messages quickly to reduce noise. If you’re looking for an error, you don’t want to see all the warnings or debug messages, for example. It’s trivial to disable or enable messages at certain log levels through the configuration, without even touching the code.

With logging, you can keep your debug messages separate from the standard output. All the log messages go to the standard error stream by default, which can conveniently show up in different colors. However, you can redirect log messages to separate files, even for individual modules!

Quite commonly, misconfigured logging can lead to running out of space on the server’s disk. To prevent that, you may set up log rotation, which will keep the log files for a specified duration, such as one week, or once they hit a certain size. Nevertheless, it’s always a good practice to archive older logs. Some regulations enforce that customer data be kept for as long as five years!
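As a hedged sketch of size-based rotation, you could plug a logging.handlers.RotatingFileHandler into the basic configuration; the file name and limits below are made up for illustration:

import logging
from logging.handlers import RotatingFileHandler

# Keep at most 5 backup files of roughly 1 MB each (illustrative values)
handler = RotatingFileHandler('app.log', maxBytes=1_000_000, backupCount=5)
logging.basicConfig(level=logging.DEBUG, handlers=[handler])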

Compared to other programming languages, logging in Python is simpler, because the logging module is bundled with the standard library. You just import and configure it in as little as two lines of code:

import logging
logging.basicConfig(level=logging.DEBUG)

You can call functions defined at the module level, which are hooked to the root logger, but the more common practice is to obtain a dedicated logger for each of your source files:

logging.debug('hello')  # Module-level function

logger = logging.getLogger(__name__)
logger.debug('hello')   # Logger's method

The advantage of using custom loggers is more fine-grained control. They’re usually named after the module they were defined in through the __name__ variable.

Note: There’s a somewhat related warnings module in Python, which can also log messages to the standard error stream. However, it has a narrower spectrum of applications, mostly in library code, whereas client applications should use the logging module.

That said, you can make them work together by calling logging.captureWarnings(True).

One last reason to switch from the print() function to logging is thread safety. In the upcoming section, you’ll see that the former doesn’t play well with multiple threads of execution.

Debugging

The truth is that neither tracing nor logging can be considered real debugging. To do actual debugging, you need a debugger tool, which allows you to do the following:

  • Step through the code interactively.
  • Set breakpoints, including conditional breakpoints.
  • Introspect variables in memory.
  • Evaluate custom expressions at runtime.

A crude debugger that runs in the terminal, unsurprisingly named pdb for “The Python Debugger,” is distributed as part of the standard library. This makes it always available, so it may be your only choice for performing remote debugging. Perhaps that’s a good reason to get familiar with it.

However, it doesn’t come with a graphical interface, so using pdb may be a bit tricky. If you can’t edit the code, you have to run it as a module and pass your script’s location:

$ python -m pdb my_script.py

Otherwise, you can set up a breakpoint directly in the code, which will pause the execution of your script and drop you into the debugger. The old way of doing this required two steps:

>>>
>>> import pdb
>>> pdb.set_trace()
--Return--
> <stdin>(1)<module>()->None
(Pdb)

This shows up an interactive prompt, which might look intimidating at first. However, you can still type native Python at this point to examine or modify the state of local variables. Apart from that, there’s really only a handful of debugger-specific commands that you want to use for stepping through the code.

Note: It’s customary to put the two instructions for spinning up a debugger on a single line. This requires the use of a semicolon, which is rarely found in Python programs:

import pdb; pdb.set_trace()

While certainly not Pythonic, it stands out as a reminder to remove it after you’re done with debugging.

Since Python 3.7, you can also call the built-in breakpoint() function, which does the same thing, but in a more compact way and with some additional bells and whistles:

def average(numbers):
    if len(numbers) > 0:
        breakpoint()  # Python 3.7+
        return sum(numbers) / len(numbers)

You’re probably going to use a visual debugger integrated with a code editor for the most part. PyCharm has an excellent debugger, which boasts high performance, but you’ll find plenty of alternative IDEs with debuggers, both paid and free of charge.

Debugging isn’t the proverbial silver bullet. Sometimes logging or tracing will be a better solution. For example, defects that are hard to reproduce, such as race conditions, often result from temporal coupling. When you stop at a breakpoint, that little pause in program execution may mask the problem. It’s kind of like the Heisenberg principle: you can’t measure and observe a bug at the same time.

These methods aren’t mutually exclusive. They complement each other.

Thread-Safe Printing

I briefly touched upon the thread safety issue before, recommending logging over the print() function. If you’re still reading this, then you must be comfortable with the concept of threads.

Thread safety means that a piece of code can be safely shared between multiple threads of execution. The simplest strategy for ensuring thread-safety is by sharing immutable objects only. If threads can’t modify an object’s state, then there’s no risk of breaking its consistency.

Another method takes advantage of local memory, which makes each thread receive its own copy of the same object. That way, other threads can’t see the changes made to it in the current thread.
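A minimal sketch of that idea uses threading.local(), whose attributes are independent per thread (the names here are made up for illustration):

import threading

thread_local = threading.local()  # Each thread sees its own attributes

def task():
    thread_local.buffer = []          # Not shared with other threads
    thread_local.buffer.append('hi')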

But that doesn’t solve the problem, does it? You often want your threads to cooperate by being able to mutate a shared resource. The most common way of synchronizing concurrent access to such a resource is by locking it. This gives exclusive write access to one or sometimes a few threads at a time.

However, locking is expensive and reduces concurrent throughput, so other means for controlling access have been invented, such as atomic variables or the compare-and-swap algorithm.

Printing isn’t thread-safe in Python. The print() function holds a reference to the standard output, which is a shared global variable. In theory, because there’s no locking, a context switch could happen during a call to sys.stdout.write(), intertwining bits of text from multiple print() calls.

Note: A context switch means that one thread halts its execution, either voluntarily or not, so that another one can take over. This might happen at any moment, even in the middle of a function call.

In practice, however, that doesn’t happen. No matter how hard you try, writing to the standard output seems to be atomic. The only problem that you may sometimes observe is with messed up line breaks:

[Thread-3 A][Thread-2 A][Thread-1 A]

[Thread-3 B][Thread-1 B]


[Thread-1 C][Thread-3 C]

[Thread-2 B]
[Thread-2 C]

To simulate this, you can increase the likelihood of a context switch by making the underlying .write() method go to sleep for a random amount of time. How? By mocking it, which you already know about from an earlier section:

import sys
from time import sleep
from random import random
from threading import current_thread, Thread
from unittest.mock import patch

write = sys.stdout.write

def slow_write(text):
    sleep(random())
    write(text)

def task():
    thread_name = current_thread().name
    for letter in 'ABC':
        print(f'[{thread_name} {letter}]')

with patch('sys.stdout') as mock_stdout:
    mock_stdout.write = slow_write
    for _ in range(3):
        Thread(target=task).start()

First, you need to store the original .write() method in a variable, which you’ll delegate to later. Then you provide your fake implementation, which will take up to one second to execute. Each thread will make a few print() calls with its name and a letter: A, B, and C.

If you read the mocking section before, then you may already have an idea of why printing misbehaves like that. Nonetheless, to make it crystal clear, you can capture values fed into your slow_write() function. You’ll notice that you get a slightly different sequence each time:

['[Thread-3 A]','[Thread-2 A]','[Thread-1 A]','\n','\n','[Thread-3 B]',(...)]

Even though sys.stdout.write() itself is an atomic operation, a single call to the print() function can yield more than one write. For example, line breaks are written separately from the rest of the text, and context switching takes place between those writes.

Note: The atomic nature of the standard output in Python is a byproduct of the Global Interpreter Lock, which applies locking around bytecode instructions. Be aware, however, that many interpreter flavors don’t have the GIL, where multi-threaded printing requires explicit locking.

You can make the newline character become an integral part of the message by handling it manually:

print(f'[{thread_name} {letter}]\n', end='')

This will fix the output:

[Thread-2 A]
[Thread-1 A]
[Thread-3 A]
[Thread-1 B]
[Thread-3 B]
[Thread-2 B]
[Thread-1 C]
[Thread-2 C]
[Thread-3 C]

Notice, however, that the print() function still keeps making a separate call for the empty suffix, which translates to useless sys.stdout.write('') instruction:

['[Thread-2 A]\n','[Thread-1 A]\n','[Thread-3 A]\n','','','','[Thread-1 B]\n',(...)]

A truly thread-safe version of the print() function could look like this:

import threading

lock = threading.Lock()

def thread_safe_print(*args, **kwargs):
    with lock:
        print(*args, **kwargs)

You can put that function in a module and import it elsewhere:

from thread_safe_print import thread_safe_print

def task():
    thread_name = current_thread().name
    for letter in 'ABC':
        thread_safe_print(f'[{thread_name} {letter}]')

Now, despite making two writes per each print() request, only one thread is allowed to interact with the stream, while the rest must wait:

[
    # Lock acquired by Thread-3
    '[Thread-3 A]',
    '\n',
    # Lock released by Thread-3
    # Lock acquired by Thread-1
    '[Thread-1 B]',
    '\n',
    # Lock released by Thread-1
    (...)
]

I added comments to indicate how the lock is limiting access to the shared resource.

Note: Even in single-threaded code, you might get caught up in a similar situation. Specifically, when you’re printing to the standard output and the standard error streams at the same time. Unless you redirect one or both of them to separate files, they’ll both share a single terminal window.

Conversely, the logging module is thread-safe by design, which is reflected by its ability to display thread names in the formatted message:

>>>
>>> import logging
>>> logging.basicConfig(format='%(threadName)s %(message)s')
>>> logging.error('hello')
MainThread hello

It’s another reason why you might not want to use the print() function all the time.

Python Print Counterparts

By now, you know a lot of what there is to know about print()! The subject, however, wouldn’t be complete without talking about its counterparts a little bit. While print() is about the output, there are functions and libraries for the input.

Built-In

Python comes with a built-in function for accepting input from the user, predictably called input(). It accepts data from the standard input stream, which is usually the keyboard:

>>> name = input('Enter your name: ')
Enter your name: jdoe
>>> print(name)
jdoe

The function always returns a string, so you might need to parse it accordingly:

try:
    age = int(input('How old are you? '))
except ValueError:
    pass

The prompt parameter is completely optional, so nothing will show if you skip it, but the function will still work:

>>> x = input()
hello world
>>> print(x)
hello world

Nevertheless, throwing in a descriptive call to action makes the user experience so much better.

Note: To read from the standard input in Python 2, you have to call raw_input() instead, which is yet another built-in. Unfortunately, there’s also a misleadingly named input() function, which does a slightly different thing.

In fact, it also takes the input from the standard stream, but then it tries to evaluate it as if it was Python code. Because that’s a potential security vulnerability, this function was completely removed from Python 3, while raw_input() got renamed to input().

Here’s a quick comparison of the available functions and what they do:

Python 2        Python 3
raw_input()     input()
input()         eval(input())

As you can tell, it’s still possible to simulate the old behavior in Python 3.

Asking the user for a password with input() is a bad idea because it’ll show up in plaintext as they’re typing it. In this case, you should be using the getpass() function instead, which masks typed characters. This function is defined in a module under the same name, which is also available in the standard library:

>>> from getpass import getpass
>>> password = getpass()
Password:
>>> print(password)
s3cret

The getpass module has another function for getting the user’s name from an environment variable:

>>> from getpass import getuser
>>> getuser()
'jdoe'

Python’s built-in functions for handling the standard input are quite limited. At the same time, there are plenty of third-party packages, which offer much more sophisticated tools.

Third-Party

There are external Python packages out there that allow for building complex graphical interfaces specifically to collect data from the user. Some of their features include:

  • Advanced formatting and styling
  • Automated parsing, validation, and sanitization of user data
  • A declarative style of defining layouts
  • Interactive autocompletion
  • Mouse support
  • Predefined widgets such as checklists or menus
  • Searchable history of typed commands
  • Syntax highlighting

Demonstrating such tools is outside of the scope of this article, but you may want to try them out. I personally got to know about some of those through the Python Bytes Podcast.

Nonetheless, it’s worth mentioning a command line tool called rlwrap that adds powerful line editing capabilities to your Python scripts for free. You don’t have to do anything for it to work!

Let’s assume you wrote a command-line interface that understands three instructions, including one for adding numbers:

print('Type "help", "exit", "add a [b [c ...]]"')whileTrue:command,*arguments=input('~ ').split(' ')iflen(command)>0:ifcommand.lower()=='exit':breakelifcommand.lower()=='help':print('This is help.')elifcommand.lower()=='add':print(sum(map(int,arguments)))else:print('Unknown command')

At first glance, it seems like a typical prompt when you run it:

$ python calculator.py
Type "help", "exit", "add a [b [c ...]]"~ add 1 2 3 410~ aad 2 3Unknown command~ exit$

But as soon as you make a mistake and want to fix it, you’ll see that none of the function keys work as expected. Hitting the Left arrow, for example, results in this instead of moving the cursor back:

$ python calculator.py
Type "help", "exit", "add a [b [c ...]]"~ aad^[[D

Now, you can wrap the same script with the rlwrap command. Not only will you get the arrow keys working, but you’ll also be able to search through the persistent history of your custom commands, use autocompletion, and edit the line with shortcuts:

$ rlwrap python calculator.py
Type "help", "exit", "add a [b [c ...]]"(reverse-i-search)`a': add 1 2 3 4

Isn’t that great?

Conclusion

You’re now armed with a body of knowledge about the print() function in Python, as well as many surrounding topics. You have a deep understanding of what it is and how it works, involving all of its key elements. Numerous examples gave you insight into its evolution from Python 2.

Apart from that, you learned how to:

  • Avoid common mistakes with print() in Python
  • Deal with newlines, character encodings and buffering
  • Write text to files
  • Mock the print() function in unit tests
  • Build advanced user interfaces in the terminal

Now that you know all this, you can make interactive programs that communicate with users or produce data in popular file formats. You’re able to quickly diagnose problems in your code and protect yourself from them. Last but not least, you know how to implement the classic snake game.

If you’re still thirsty for more information, have questions, or simply would like to share your thoughts, then feel free to reach out in the comments section below.



Curtis Miller: Learn Python Statistics and Machine Learning with My New Book: Training Systems using Python Statistical Modeling

Packt Publishing has turned another one of my video courses, Training Your Systems with Python Statistical Modeling, into a book! This book is now available for purchase. It is a tutorial book for Python statistics and machine learning.

CodeGrades: CodeGrades on Podcast.__init__


CodeGrades was recently on the Podcast.__init__ show where we had lots of fun exploring the links between music and coding education as a way to explain the concepts behind CodeGrades.

Podcast.__init__

The host, Tobias, asked plenty of interesting questions, giving us a chance to explain and unpack the high-level aspects of the project, tell the story so far and describe the various upcoming milestones. We were also able to touch upon related projects such as the Mu beginner’s code editor and PyperCard GUI framework, also aimed at beginner developers.

We were very pleased with Tobias’s positive reaction to CodeGrades.

Malthe Borch: Using built-in transparent compression on MacOS


Ever since DriveSpace on MS-DOS (or really, Stacker), we've had transparent file compression, with varying degrees of automation; in fact, while the DriveSpace-compression on MS-DOS was a fully automated affair, the built-in transparent compression in newer filesystems such as ZFS, Btrfs, APFS (and even HFS+), is engaged manually on a per-file or folder basis.

But no one's using it!

On my system, compressing /Applications saved 18GB (38.7%).

MacOS doesn't actually come with a utility to do this even though the core functionality is included, so you'll need to install an open source tool in order to use it.

$ brew install afsctool

To compress a file or folder, use the -c flag like so:

$ afsctool -c /Applications

(You might need to use root for some application and/or system files).
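
If you want to measure the effect for a particular file from Python rather than eyeballing Finder, one rough sketch is to compare the logical file size with the blocks actually allocated on disk; once a file is transparently compressed, the allocated blocks shrink. The path below is just an example:

import os

def disk_usage(path):
    info = os.stat(path)
    logical = info.st_size            # size of the file's contents in bytes
    physical = info.st_blocks * 512   # 512-byte blocks actually allocated on disk
    return logical, physical

# Example path only; pick any file you have compressed with afsctool
logical, physical = disk_usage('/Applications/Safari.app/Contents/Info.plist')
print(f'logical: {logical} bytes, on disk: {physical} bytes, saved: {logical - physical} bytes')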


Kushal Das: git checkout to previous branch


We regularly move between git branches while working on projects. I always used to type in the full branch name, say to go back to develop branch and then come back to the feature branch. This generally takes a lot of typing (for the branch names etc.). I found out that we can use - like in the way we use cd - to go back to the previous directory we were in.

git checkout -

Here is a small video for demonstration.

I hope this will be useful for some people.

Continuum Analytics Blog: 4 Machine Learning Use Cases in the Automotive Sector


From parts suppliers to vehicle manufacturers, service providers to rental car companies, the automotive and related mobility industries stand to gain significantly from implementing machine learning at scale. We see the big automakers investing in…

The post 4 Machine Learning Use Cases in the Automotive Sector appeared first on Anaconda.

Python Insider: Inspect PyPI event logs to audit your account's and project's security

To help you check for security problems, PyPI is adding an advanced audit log of user actions beyond the current (existing) journal. This will, for instance, allow publishers to track all actions taken by third party services on their behalf.

This beta feature is live now on PyPI and on Test PyPI.

Background:
We're further increasing the security of the Python Package Index with another new beta feature: an audit log of sensitive actions that affect users and projects. This is thanks to a grant from the Open Technology Fund, coordinated by the Packaging Working Group of the Python Software Foundation.

Details:
Project security history display: a listing of events (such as "file removed from release version 1.0.1") with user, date/time, and IP address for each event.
We're adding a display so you can look at things that have happened in your user account or project, and check for signs someone's stolen your credentials.

In your account settings, you can view a log of sensitive actions from the last two weeks that are relevant to your user account, and if you are an Owner at least one project on PyPI, you can go to that project's Manage Project page to view a log of sensitive actions (performed by any user) relevant to that project. (And PyPI site administrators are able to view the full audit log for all users and all projects.)

Please help us test this, and report issues.

User security history display: a listing of events (such as "API token added") with additional details (such as token scope), date/time, and IP address for each event.
In beta:
We're still refining this and may fail to log, or to properly display, events in the audit log. Also, sensitive event logging and display started on 16 August 2019, so you won't see sensitive events from before that date. (Read more technical details about implementation in the GitHub issue.)

Next:
We're continuing to refine all our beta features, while working on accessibility improvements and starting to work on localization on PyPI. Follow our progress reports in more detail on Discourse.

Twisted Matrix Labs: Twisted 19.7.0 Released

On behalf of Twisted Matrix Laboratories and our long-suffering release manager Amber Brown, I am honored to announce[1] the release of Twisted 19.7.0!

The highlights of this release include:
  • A full description on the PyPI page!  Check it out here: https://pypi.org/project/Twisted/19.7.0/ (and compare to the slightly sad previous version, here: https://pypi.org/project/Twisted/19.2.1/)
  • twisted.test.proto_helpers has been renamed to "twisted.internet.testing"
    • This removes the gross special-case carve-out where it was the only "public" API in a test module, and now the rule is that all test modules are private once again.
  • Conch's SSH server now supports hmac-sha2-512.
  • The XMPP server in Twisted Words will now validate certificates!
  • A nasty data-corruption bug in the IOCP reactor was fixed. If you're doing high-volume I/O on Windows you'll want to upgrade!
  • Twisted Web no longer gives clients a traceback by default, both when you instantiate Site and when you use twist web on the command line.  You can turn this behavior back on for local development with twist web --display-tracebacks.
  • Several bugfixes and documentation fixes resolving bytes/unicode type confusion in twisted.web.
  • Python 3.4 is no longer supported.
pip install -U twisted[tls] and enjoy all these enhancements today!

Thanks for using Twisted,

-glyph

[1]: Somewhat belatedly: it came out 10 days ago. Oops!

Test and Code: 83: PyBites Code Challenges behind the scenes - Bob Belderbos


Bob Belderbos and Julian Sequeira started PyBites a few years ago.
They started doing code challenges along with people around the world and writing about it.

Then came the codechalleng.es platform, where you can do code challenges in the browser and have your answer checked by pytest tests. But how does it all work?

Bob joins me today to go behind the scenes and share the tech stack running the PyBites Code Challenges platform.

We talk about the technology, the testing, and how it went from a cool idea to a working platform.

Special Guest: Bob Belderbos.

Sponsored By:

  • PyCharm Professional: https://testandcode.com/pycharm

Support Test & Code - Python Testing & Development

Links:

  • PyBites: https://pybit.es/
  • PyBites Code Challenges coding platform: https://codechalleng.es/
  • Learning Paths: https://codechalleng.es/bites/paths
  • Julian's article on whiteboard interviews: https://pybit.es/whiteboard-interviews.html

PyCharm: PyCharm 2019.2.1 RC


PyCharm 2019.2.1 release candidate is available now!

Fixed in this Version

  • An issue that caused debugger functions like “Step into” to not work properly in our latest release was solved.
  • AltGr keymaps for certain characters that were not working are now fixed.

Further Improvements

  • New SQL completion suggestions of join conditions based on column or table name match and auto-inject SQL by literals.
  • Some JavaScript and Vue.js inspection issues were resolved.
  • And more, check out our release notes for more details.

Getting the New Version

Download the RC from Confluence.

The release candidate (RC) is not an early access program (EAP) build, and does not bundle an EAP license. If you get the PyCharm Professional Edition RC, you will either need a currently active PyCharm subscription, or you will receive a 30-day free trial.

Stack Abuse: Basics of Memory Management in Python


Introduction

Memory management is the process of efficiently allocating, de-allocating, and coordinating memory so that all the different processes run smoothly and can optimally access different system resources. Memory management also involves cleaning memory of objects that are no longer being accessed.

In Python, the memory manager is responsible for these kinds of tasks by periodically running to clean up, allocate, and manage the memory. Unlike C, Java, and other programming languages, Python manages objects by using reference counting. This means that the memory manager keeps track of the number of references to each object in the program. When an object's reference count drops to zero, which means the object is no longer being used, the garbage collector (part of the memory manager) automatically frees the memory from that particular object.

The user need not worry about memory management, as the allocation and de-allocation of memory is fully automatic. The reclaimed memory can be used by other objects.

Python Garbage Collection

As explained earlier, Python deletes objects that are no longer referenced in the program to free up memory space. This process in which Python frees blocks of memory that are no longer used is called Garbage Collection. The Python Garbage Collector (GC) runs during the program execution and is triggered if the reference count reduces to zero. The reference count increases if an object is assigned a new name or is placed in a container, like tuple or dictionary. Similarly, the reference count decreases when the reference to an object is reassigned, when the object's reference goes out of scope, or when an object is deleted.
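
To see this reference bookkeeping in action, you can use sys.getrefcount(). A small sketch follows; the exact numbers can vary between Python versions, and the reported count is one higher than you might expect because the function's own argument temporarily holds a reference:

import sys

data = [1, 2, 3]
print(sys.getrefcount(data))   # e.g. 2: the name 'data' plus the temporary argument

alias = data                   # assigning a new name increases the count
container = {'key': data}      # placing the object in a container increases it again
print(sys.getrefcount(data))   # e.g. 4

del alias                      # removing a reference decreases the count
print(sys.getrefcount(data))   # e.g. 3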

The memory is a heap that contains objects and other data structures used in the program. The allocation and de-allocation of this heap space is controlled by the Python Memory manager through the use of API functions.

Python Objects in Memory

Each variable in Python acts as an object. Objects can either be simple (containing numbers, strings, etc.) or containers (dictionaries, lists, or user defined classes). Furthermore, Python is a dynamically typed language which means that we do not need to declare the variables or their types before using them in a program.

For example:

>>> x = 5
>>> print(x)
5
>>> del x
>>> print(x)
Traceback (most recent call last):
  File "<mem_manage>", line 1, in <module>
    print(x)
NameError: name 'x' is not defined

If you look at the first two lines of the program above, the object x is known and can be used. When we delete the object x and try to use it, we get an error stating that the variable x is not defined.

You can see that garbage collection in Python is fully automated and the programmer does not need to worry about it, unlike in languages like C.

Modifying the Garbage Collector

The Python garbage collector has three generations into which objects are classified. A new object, at the start of its life cycle, belongs to the first generation. As the object survives garbage collection, it is moved up to the older generations. Each of the three generations has a threshold: when the number of allocations minus the number of de-allocations exceeds that threshold, garbage collection runs for that generation.

Earlier generations are also garbage collected more often than the higher generations. This is because newer objects are more likely to be discarded than old objects.

The gc module includes functions to change the threshold value, trigger a garbage collection process manually, disable the garbage collection process, etc. We can check the threshold values of different generations of the garbage collector using the get_threshold() method:

import gc
print(gc.get_threshold())

Sample Output:

(700, 10, 10)

As you see, here we have a threshold of 700 for the first generation, and 10 for each of the other two generations.

We can alter the threshold value for triggering the garbage collection process using the set_threshold() method of the gc module:

gc.set_threshold(900, 15, 15)

In the above example, we have increased the threshold value for all three generations. Increasing the threshold value decreases how frequently the garbage collector runs. As a developer, you normally don't need to think much about Python's garbage collection, but this can be useful when optimizing the Python runtime for your target system. One of the key benefits is that Python's garbage collection mechanism handles a lot of low-level details for the developer automatically.
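
If you want to see how close each generation currently is to its threshold, the gc module also exposes the running counts. A quick sketch; the numbers will differ on your machine:

import gc

# Number of tracked objects (roughly, allocations minus de-allocations) per generation
print(gc.get_count())      # e.g. (364, 3, 1)

# The thresholds that trigger a collection for each generation
print(gc.get_threshold())  # e.g. (700, 10, 10)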

Why Perform Manual Garbage Collection?

We know that the Python interpreter keeps track of references to objects used in a program. In earlier versions of Python (until version 1.6), the interpreter used only the reference counting mechanism to handle memory. When the reference count drops to zero, the Python interpreter automatically frees the memory. This classical reference counting mechanism is very effective, except that it fails when the program has reference cycles. A reference cycle happens when one or more objects reference each other, so their reference counts never reach zero.

Let's consider an example.

>>> def create_cycle():
...     list = [8, 9, 10]
...     list.append(list)
...     return list
... 
>>> create_cycle()
[8, 9, 10, [...]]

The above code creates a reference cycle, where the object list refers to itself. Hence, the memory for the object list will not be freed automatically when the function returns. The reference cycle problem can't be solved by reference counting. However, it can be solved by invoking the garbage collector manually in your Python application.

To do so, we can use the gc.collect() function of the gc module.

import gc
n = gc.collect()
print("Number of unreachable objects collected by GC:", n)

The gc.collect() function returns the number of unreachable objects it has collected and de-allocated.

There are two ways to perform manual garbage collection: time-based or event-based garbage collection.

Time-based garbage collection is pretty simple: the gc.collect() function is called after a fixed time interval.

Event-based garbage collection calls the gc.collect() function after an event occurs (i.e. when the application is exited or the application remains idle for a specific time period).
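
As a rough sketch of both approaches, a threading.Timer can serve as a simple time-based trigger and the atexit module covers the application-exit event; the 60-second interval below is just an illustrative choice, not a recommendation:

import atexit
import gc
import threading

# Event-based: run one final collection when the application exits
atexit.register(gc.collect)

# Time-based: run a collection every 60 seconds on a background timer
def collect_periodically(interval=60.0):
    gc.collect()
    timer = threading.Timer(interval, collect_periodically, args=(interval,))
    timer.daemon = True   # don't keep the process alive just for this timer
    timer.start()

collect_periodically()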

Let's understand the manual garbage collection work by creating a few reference cycles.

import sys, gc

def create_cycle():
    list = [8, 9, 10]
    list.append(list)

def main():
    print("Creating garbage...")
    for i in range(8):
        create_cycle()

    print("Collecting...")
    n = gc.collect()
    print("Number of unreachable objects collected by GC:", n)
    print("Uncollectable garbage:", gc.garbage)

if __name__ == "__main__":
    main()
    sys.exit()

The output is as below:

Creating garbage...
Collecting...
Number of unreachable objects collected by GC: 8
Uncollectable garbage: []

The script above creates a list object that is referred to by a variable, creatively named list, and then appends the list to itself, so the object holds a reference to itself. Its reference count therefore stays greater than zero even after the variable is deleted or goes out of scope, which means the list object cannot be reclaimed by reference counting alone. The garbage collector mechanism in Python will automatically check for, and collect, such circular references periodically.

In the above code, since the reference count never reaches 0 on its own, we force garbage collection of the objects by calling gc.collect(). However, remember not to force garbage collection frequently: evaluating each object's eligibility for collection takes processor time and resources, even after the memory is freed. Also, only start managing the garbage collector manually after your app has started completely.

Conclusion

In this article, we discussed how memory management in Python is handled automatically using reference counting and garbage collection strategies. Without garbage collection, implementing a successful memory management mechanism in Python would be impossible. Also, programmers need not worry about deleting allocated memory, as it is taken care of by the Python memory manager. This leads to fewer memory leaks and better performance.


Vinta Software: PyBay 2019: Talking about Python in SF

We are back to San Francisco! Our team will be joining PyBay's conference, one of the biggest Python events in the Bay Area. This year, we'll be giving the talk Building effective Django queries with expressions. PyBay has been a fantastic place to meet new people, connect with new ideas, and integrate into this thriving community.

Quansight Labs Blog: Spyder 4.0 beta4: Kite integration is here


Kite is sponsoring the work discussed in this blog post, and in addition supports Spyder 4.0 development through a Quansight Labs Community Work Order.

As part of our next release, we are proud to announce an additional completion client for Spyder: Kite. Kite is a novel completion client that uses machine learning techniques to find and predict the best autocompletion for a given text. Additionally, it collects improved documentation for compiled packages (e.g., Matplotlib, NumPy, and SciPy) that cannot be obtained easily using traditional code analysis packages such as Jedi.



Brett Cannon: How do you verify that PyPI can be trusted?


A co-worker of mine attended a technical talk about how Go's module mirror works and he asked me whether there was something there that Python should do.

Now Go's packaging story is rather different from Python's since in Go you specify the location of a module by the URL you fetch it from, e.g. github.com/you/hello specifies the hello module as found at https://github.com/you/hello. This means Go's module ecosystem is distributed, which leads to interesting problems of caching so code doesn't disappear off the internet (e.g. a left-pad incident), and needing to verify that a module's provider isn't suddenly changing the code they provide with something malicious.

But since the Python community has PyPI, our problems are slightly different in that we just have to worry about a single point of failure (which has its own downsides). Now obviously you can run your own mirror of PyPI (and plenty of companies do), but for the general community no one wants to bother to set something like that up and keep it maintained (do you really need your own mirror to download some dependencies for the script you just wrote to help clean up your photos from your latest trip?). But we should still care about whether PyPI has been compromised such that packages hosted there have been tampered with somewhere between when the project owner uploaded their release's files and when you download them.

Verifying PyPI is okay

So the first thing we can do is see if we can tell whether PyPI has been compromised somehow. This takes on two different levels of complexity. One is checking whether anything nefarious has occurred post-release. The fancier step is to provide a way for project owners to tell other folks what they are giving PyPI, so those folks can act as auditors.

Post-release trust

In a post-release scenario you're trusting that PyPI received a release from a project owner successfully and safely. What you're worrying about here is that at some later point PyPI gets compromised and someone, for example, swaps out the files in requests so that they can steal some Bitcoin. So what are some options here?

Trust PyPI

The simplest one is don't worry about it. 😁 PyPI is run by some very smart, dedicated folks and so if you feel comfortable trusting them to not mess up then you can simply not stress about compromises.

Trust PyPI up to when you froze your dependencies

Now perhaps you do generally trust the PyPI administrators and don't think anything has happened yet, but you wouldn't mind a fairly cheap way that's available today to make sure nothing fishy happens in the future. In that case you can record the hashes of your locked dependencies. (If you're an app developer you are locking your dependencies, right?)

Basically, what you do is have whatever tool you're using to lock your dependencies – e.g. pip-tools, pipenv, or poetry – record the hashes of the files you depend on at lock time. That way, in the future you can check for yourself that the files you downloaded from PyPI match bit-for-bit what you previously downloaded and used. Now this doesn't guarantee that what you initially downloaded when you froze your dependencies didn't contain compromised code, but at least you know going forward nothing questionable has occurred.
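
The check itself boils down to hashing the downloaded file and comparing it with the digest recorded at lock time (pip can enforce this for you when a requirements file carries hashes and you install with --require-hashes). A minimal manual sketch, where the file name and the recorded digest are placeholders:

import hashlib

def sha256_of(path):
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Placeholders: use the digest your lock file actually recorded
recorded = '0000000000000000000000000000000000000000000000000000000000000000'
downloaded = 'requests-2.22.0-py2.py3-none-any.whl'

if sha256_of(downloaded) != recorded:
    raise RuntimeError('Downloaded file does not match the locked hash!')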

Trust PyPI or an independent 3rd-party since they started running

Now we're into the "someone would have to do work to make this happen" realm; everything up until now you can do today, but this idea requires money (although PyPI still requires money to simply function as well, so please have your company donate if you use PyPI at work).

What one could do is run a 3rd-party service that records all the hashes of files that end up on PyPI. That way, if one wanted to see if the hash from PyPI hasn't changed since the 3rd-party service started running then one could simply ask the 3rd-party service for the hash for whatever file they want from PyPI, ask PyPI what they think the hash should be, and then check if the hashes match. If they do match then you should be able to trust the hashes, but if they differ then either PyPI or the 3rd-party service is compromised.
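
To make that comparison concrete: PyPI already publishes the digests it believes in through its JSON API, so a monitor only needs to fetch those and compare them with its own records. A sketch along those lines, where the auditing service and its URL are entirely hypothetical:

import json
from urllib.request import urlopen

def pypi_sha256_digests(project, version):
    # PyPI's JSON API lists a sha256 digest for every file in a release
    with urlopen(f'https://pypi.org/pypi/{project}/{version}/json') as response:
        release = json.load(response)
    return {f['filename']: f['digests']['sha256'] for f in release['urls']}

def third_party_digests(project, version):
    # Hypothetical independent service that recorded the same hashes
    with urlopen(f'https://auditor.example.org/hashes/{project}/{version}') as response:
        return json.load(response)

pypi = pypi_sha256_digests('requests', '2.22.0')
auditor = third_party_digests('requests', '2.22.0')
mismatches = [name for name in pypi if auditor.get(name) != pypi[name]]
print('Files that disagree:', mismatches or 'none')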

Now this is predicated on the idea that the 3rd-party service is truly 3rd-party. If any staff is shared between the 3rd-party service and PyPI then that's a potential point of compromise. This is also assuming that PyPI has not already been compromised. But at least in this scenario the point in time where your trust in PyPI starts from when the 3rd-party service began running and not when you locked your dependencies.

You can also extend this out to multiple 3rd-parties recording file hashes so that you can compare hashes against multiple sources. This not only makes it harder to cover up a file change, since someone would have to compromise multiple services, but if one of them is compromised you could use a quorum to decide who's right and who's wrong.

Auditing what everyone claims

This entire blog post started because of a Twitter thread about how to be able to validate what PyPI claims. At some point I joked that I was shocked no one had mentioned the blockchain yet. And that's when I was informed that Certificate Transparency logs are basically what we would want and they use something called Merkle hash trees that started with P2P networks and have been used in blockchains.

I'm not going to go into all the details as how Certificate Transparency works, but basically they use an append-only log that can be cryptographically verified as having not been manipulated (and you could totally treat recording hashes of files on PyPI as an append-only log).

There are two very nice properties of these hash trees. One is that it is very cheap to verify, when an update has been made, that all the previous entries in the log have not changed. Basically, what you need is a few key values from the previous version of the hash tree, so that when you add new values to the tree and re-balance it, it's easy to verify the old entries are still the same. This is great for monitoring previous data for manipulation while also making it easy to add to the log.

The second property is that checking an entry hasn't been tampered with can be done without having the entire tree available. Basically, you only need the nodes along a path from a leaf node to the root plus the immediate siblings of those nodes. This means that even if your hash tree has a massive number of leaf nodes, it doesn't take much to audit that a single leaf node has not changed.
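
To make the second property concrete, here is a minimal sketch of checking such an audit path with plain SHA-256. The leaves and the published root are made up for the demonstration, and real logs such as Certificate Transparency also add domain-separation prefixes that this sketch omits:

import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def verify_inclusion(leaf_data, audit_path, expected_root):
    # audit_path is a list of (sibling_hash, side) pairs from the leaf up to the root,
    # where side says whether the sibling sits to the 'left' or the 'right'
    node = h(leaf_data)
    for sibling, side in audit_path:
        node = h(sibling + node) if side == 'left' else h(node + sibling)
    return node == expected_root

# Tiny two-leaf tree as a self-contained demonstration
leaf_a = b'pkg-1.0.tar.gz sha256=...'
leaf_b = b'pkg-1.0-py3-none-any.whl sha256=...'
root = h(h(leaf_a) + h(leaf_b))
print(verify_inclusion(leaf_a, [(h(leaf_b), 'right')], root))  # True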

So all of this leads to a nice system to help keep PyPI honest if you can assume the initial hashes are reliable.

Release-in-progress trust

So all of the above scenarios assume PyPI was secure at the time of initially receiving a file but then potentially was compromised later. But how could we check that PyPI isn't already compromised?

One idea I had was that twine could upload a project release's hashes to some trusted 3rd-parties as well as to PyPI. Then the 3rd-parties could either directly compare the hashes PyPI claims to have to what they were given independently, or they could use their data to create that release's entry in the append-only hash tree log and see if the final hash matched what PyPI claims. And if a 3rd-party wasn't given some hashes by the project owner then they could simply fill in with what PyPI has. But the key point is that by having the project owner directly share hashes with 3rd-parties that are monitoring PyPI, we would have a way to detect if PyPI isn't providing files as the project owner expected.

Making PyPI harder to hack

Now obviously it would be best if PyPI was as hard to compromise as possible as well as detecting compromises on its own. There are actually two PEPs on the topic: PEP 458 and PEP 480. I'm not going to go into details since that's why we have PEPs, but people have thought through how to make PyPI hard to compromise as well as how to detect it.

But knowing that a design is available, you may be wondering why hasn't it been implemented?

What can you do to help?

There is a major reason why the ideas above have not been implemented: money. People using Python for personal projects typically don't worry about this sort of stuff because it just isn't a big concern, so people are not chomping at the bit to implement any of this for fun in their spare time. But for any business relying on packages coming from PyPI, it should be a concern since their business relies on the integrity of PyPI and the Python ecosystem. And so if you work for a company that uses packages from PyPI, then please consider having the company donate to the packaging WG (you can also find the link by going to PyPI and clicking the "Donate" button). Previous donations got us the current back-end and look of PyPI as well as the recent work to add two-factor authentication and API tokens, so they already know how to handle donations and turning them into results. So if anything I talked about here sounds worth doing, then please consider donating to help making it so they can happen.

Codementor: Creating a Docker Swarm Stack with Terraform (Terrascript Python), Persistent Volumes and Dynamic HAProxy.

This article demonstrates how to create a Docker Swarm cluster with volumes, firewall, DNS, and load balancing using Terraform wrapped by a Python script.

TechBeamers Python: Python Filter()


The Python filter() function applies another function to a given iterable (list, string, dictionary, etc.) to test which of its items to keep or discard. In simple words, it filters out the ones that don't pass the test and returns the rest as a filter object. The filter object is of the iterable type and retains those elements for which the function returned True. We can also convert it to a list, tuple, or other type using their factory functions. In this tutorial, you'll learn how to use the filter() function with different types of sequences, with examples to refer to.
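
For instance, a minimal illustration of the idea: keeping only the even numbers from a list and then materializing the lazy filter object as a list:

numbers = [1, 4, 7, 10, 13, 16]

# filter() keeps the items for which the function returns True
evens = filter(lambda n: n % 2 == 0, numbers)

print(evens)        # <filter object at 0x...> – an iterable, not a list
print(list(evens))  # [4, 10, 16]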

The post Python Filter() appeared first on Learn Programming and Software Testing.
