Daniel Bader: The 4 Major Ways to Do String Formatting in Python

The 4 Major Ways to Do String Formatting in Python

Remember the Zen of Python and how there should be “one obvious way to do something in Python”? You might scratch your head when you find out that there are *four* major ways to do string formatting in Python.

String Formatting in Python (updated for Python 3.6 and above)

In this article I’ll demonstrate how these four string formatting approaches work and what their respective strengths and weaknesses are. I’ll also give you my simple “rule of thumb” for how I pick the best general purpose string formatting approach.

Let’s jump right in, as we’ve got a lot to cover. In order to have a simple toy example for experimentation, let’s assume we’ve got the following variables (or constants, really) to work with:

>>> errno = 50159747054
>>> name = 'Bob'

And based on these variables we’d like to generate an output string with the following error message:

'Hey Bob, there is a 0xbadc0ffee error!'

Hey… now that error could really spoil a dev’s Monday morning. But we’re here to discuss string formatting. So let’s get to work.

#1 – “Old Style” String Formatting (%-operator)

Strings in Python have a unique built-in operation that can be accessed with the %-operator. This lets you do simple positional formatting very easily. If you’ve ever worked with a printf-style function in C you’ll recognize how this works instantly. Here’s a simple example:

>>> 'Hello, %s' % name
'Hello, Bob'

I’m using the %s format specifier here to tell Python where to substitute the value of name, represented as a string.

There are other format specifiers available that let you control the output format. For example it’s possible to convert numbers to hexadecimal notation or to add whitespace padding to generate nicely formatted tables and reports (cf. Python Docs: “printf-style String Formatting”).

Here, we can use the %x format specifier to convert an int value to a string and to represent it as a hexadecimal number:

>>> '%x' % errno
'badc0ffee'
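The whitespace padding mentioned above works with the same operator. Here's a minimal sketch; the `rows` data is made up purely for illustration and isn't part of the running example:

```python
# Hypothetical report rows, invented for this illustration only.
rows = [("disk", 4077), ("net", 255)]

lines = []
for label, code in rows:
    # %-8s  : left-align the string in a field 8 characters wide
    # %#06x : hexadecimal with a 0x prefix, zero-padded to 6 characters in total
    lines.append('%-8s|%#06x' % (label, code))

print('\n'.join(lines))
# disk    |0x0fed
# net     |0x00ff
```

The width in `%#06x` counts the `0x` prefix, which is why only four hex digits are shown.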

The “old style” string formatting syntax changes slightly if you want to make multiple substitutions in a single string. Because the %-operator only takes one argument you need to wrap the right-hand side in a tuple, like so:

>>> 'Hey %s, there is a 0x%x error!' % (name, errno)
'Hey Bob, there is a 0xbadc0ffee error!'
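One consequence of this tuple rule is worth a quick sketch: if the single value you want to format happens to be a tuple itself, Python treats it as the argument list. The `point` variable below is a hypothetical value chosen to show the effect:

```python
point = (3, 4)  # hypothetical value that happens to be a tuple

failure = None
try:
    # The tuple is unpacked as two arguments for a single %s placeholder.
    'Point: %s' % point
except TypeError as exc:
    failure = str(exc)

# Wrapping the value in a one-element tuple formats it as a single argument.
wrapped = 'Point: %s' % (point,)

print(failure)
print(wrapped)
```

Always passing a tuple on the right-hand side, even for one value, avoids this surprise.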

It’s also possible to refer to variable substitutions by name in your format string, if you pass a mapping to the %-operator:

>>> 'Hey %(name)s, there is a 0x%(errno)x error!' % {
...     "name": name, "errno": errno}
'Hey Bob, there is a 0xbadc0ffee error!'

This makes your format strings easier to maintain and easier to modify in the future. You don’t have to worry about making sure the order you’re passing in the values matches up with the order the values are referenced in the format string. Of course the downside is that this technique requires a little more typing.

I’m sure you’ve been wondering why this printf-style formatting is called “old style” string formatting. It was technically superseded by “new style” formatting, which we’re going to talk about in a minute. But while “old style” formatting has been de-emphasized it hasn’t been deprecated. It is still supported in the latest versions of Python.

#2 – “New Style” String Formatting (str.format)

Python 3 introduced a new way to do string formatting that was also back-ported to Python 2 (it is available from Python 2.6 onward). This “new style” string formatting gets rid of the %-operator special syntax and makes the syntax for string formatting more regular. Formatting is now handled by calling the format() method on a string object (cf. Python Docs: “str.format()”).

You can use the format() method to do simple positional formatting, just like you could with “old style” formatting:

>>> 'Hello, {}'.format(name)
'Hello, Bob'

Or, you can refer to your variable substitutions by name and use them in any order you want. This is quite a powerful feature as it allows for re-arranging the order of display without changing the arguments passed to the format() method:

>>> 'Hey {name}, there is a 0x{errno:x} error!'.format(
...     name=name, errno=errno)
'Hey Bob, there is a 0xbadc0ffee error!'
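To make the re-arranging point concrete, here's a small sketch. The second format string and the `values` dictionary are my own illustrations, not part of the original example:

```python
values = {'name': 'Bob', 'errno': 50159747054}

# The same arguments feed two differently ordered format strings.
greeting = 'Hey {name}, there is a 0x{errno:x} error!'.format(**values)
report = 'Error 0x{errno:x} was reported for {name}.'.format(**values)

# Positional arguments can also be referenced by index, and reused.
swapped = '{1}, {0}, {1}'.format('first', 'second')

print(greeting)  # Hey Bob, there is a 0xbadc0ffee error!
print(report)    # Error 0xbadc0ffee was reported for Bob.
print(swapped)   # second, first, second
```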

This also shows that the syntax to format an int variable as a hexadecimal string has changed. Now we need to pass a format spec by adding a :x suffix. The format string syntax has become more powerful without complicating the simpler use cases. It pays off to read up on this string formatting mini-language in the Python documentation (cf. Python Docs: “Format String Syntax”).
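Here's a short tour of a few format specs from that mini-language. It's only a sketch of some common cases, and the sample values are arbitrary:

```python
errno = 50159747054

right = '{:>8}|'.format('Bob')        # right-align in 8 characters
left = '{:<8}|'.format('Bob')         # left-align in 8 characters
center = '{:^9}|'.format('Bob')       # center in 9 characters
grouped = '{:,}'.format(errno)        # thousands separators
rounded = '{:.2f}'.format(3.14159)    # fixed-point, two decimals
prefixed = '{:#x}'.format(errno)      # hex with the 0x prefix added

for text in (right, left, center, grouped, rounded, prefixed):
    print(text)
#      Bob|
# Bob     |
#    Bob   |
# 50,159,747,054
# 3.14
# 0xbadc0ffee
```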

In Python 3, this “new style” string formatting is to be preferred over %-style formatting. However, starting with Python 3.6 there’s an even better way to format your strings. I’ll tell you all about it in the next section.

#3 – Literal String Interpolation (Python 3.6+)

Python 3.6 adds yet another way to format strings called Formatted String Literals. This new way of formatting strings lets you use embedded Python expressions inside string constants. Here’s a simple example to give you a feel for the feature:

>>> f'Hello, {name}!'
'Hello, Bob!'

This new formatting syntax is very powerful. Because you can embed arbitrary Python expressions you can even do inline arithmetic with it. See here for example:

>>> a = 5
>>> b = 10
>>> f'Five plus ten is {a + b} and not {2 * (a + b)}.'
'Five plus ten is 15 and not 30.'

Formatted string literals also support the existing format string syntax of the str.format() method. That allows you to solve the same formatting problems we’ve discussed in the previous two sections:

>>> f"Hey {name}, there's a {errno:#x} error!"
"Hey Bob, there's a 0xbadc0ffee error!"

Python’s new Formatted String Literals are similar to the JavaScript Template Literals added in ES2015. I think they’re quite a nice addition to the language and I’ve already started using them in my day to day (Python 3) work. You can learn more about Formatted String Literals in the official Python documentation (cf. Python Docs: “Formatted string literals”).
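Since the replacement fields hold real expressions, a few more things are possible. This is a minimal sketch that requires Python 3.6+; the `width` variable is a hypothetical column width I made up for the example:

```python
name = 'Bob'
errno = 50159747054
width = 14  # hypothetical column width, chosen for illustration

shouted = f'{name.upper()}!'      # method calls work inside the braces
quoted = f'{name!r}'              # !r applies repr() to the value
padded = f'{errno:#{width}x}|'    # the format spec can itself contain a field

print(shouted)  # BOB!
print(quoted)   # 'Bob'
print(padded)   #    0xbadc0ffee|
```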

#4 – Template Strings (standard library)

Here’s one more technique for string formatting in Python: Template Strings. It’s a simpler and less powerful mechanism, but in some cases this might be exactly what you’re looking for.

Let’s take a look at a simple greeting example:

>>> from string import Template
>>> t = Template('Hey, $name!')
>>> t.substitute(name=name)
'Hey, Bob!'

You see here that we need to import the Template class from Python’s built-in string module. Template strings are not a core language feature but they’re supplied by a module in the standard library.

Another difference is that template strings don’t allow format specifiers. So in order to get our error string example to work we need to transform our int error number into a hex-string ourselves:

>>> templ_string = 'Hey $name, there is a $error error!'
>>> Template(templ_string).substitute(
...     name=name, error=hex(errno))
'Hey Bob, there is a 0xbadc0ffee error!'
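The same pre-formatting idea extends beyond hex(): the built-in format() function accepts a format spec, so you can prepare each value before handing it to the template. A small sketch, where the `seconds` value and the template text are invented for illustration:

```python
from string import Template

errno = 50159747054
seconds = 1.23456  # hypothetical measurement, for illustration only

# ${seconds} uses the braced form so the trailing "s" is not read
# as part of the placeholder name.
t = Template('Error $error after ${seconds}s')

result = t.substitute(
    error=format(errno, '#x'),       # format the value first...
    seconds=format(seconds, '.2f'),  # ...then substitute the finished string
)
print(result)  # Error 0xbadc0ffee after 1.23s
```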

That worked great. So when should you use template strings in your Python programs? In my opinion the best use case for template strings is when you’re handling format strings generated by users of your program. Due to their reduced complexity template strings are a safer choice.

The more complex formatting mini-languages of the other string formatting techniques might introduce security vulnerabilities to your programs. For example, it’s possible for format strings to access arbitrary variables in your program.

That means, if a malicious user can supply a format string they can potentially leak secret keys and other sensitive information! Here’s a simple proof of concept of how this attack might be used:

>>> SECRET = 'this-is-a-secret'
>>> class Error:
...     def __init__(self):
...         pass
>>> err = Error()
>>> user_input = '{error.__init__.__globals__[SECRET]}'

# Uh-oh...
>>> user_input.format(error=err)
'this-is-a-secret'

See how a hypothetical attacker was able to extract our secret string by accessing the __globals__ dictionary? Scary, huh? Template Strings close this attack vector. And this makes them a safer choice if you’re handling format strings generated from user input:

>>> user_input = '${error.__init__.__globals__[SECRET]}'
>>> Template(user_input).substitute(error=err)
ValueError:
"Invalid placeholder in string: line 1, col 1"
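If you do accept templates from users, you'll probably want to handle bad input gracefully. Here's one possible sketch; `render_user_template` is a hypothetical helper name of my own, not something from the standard library:

```python
from string import Template

def render_user_template(template_text, **values):
    """Hypothetical helper: render a user-supplied template or return None."""
    try:
        return Template(template_text).substitute(**values)
    except (KeyError, ValueError):
        # KeyError: the template names a placeholder we did not provide.
        # ValueError: the template contains an invalid placeholder.
        return None

ok = render_user_template('Hey $name!', name='Bob')
attack = render_user_template('${error.__init__}', error=object())
missing = render_user_template('Hey $nickname!', name='Bob')

# safe_substitute() leaves unknown placeholders in place instead of raising.
partial = Template('$name owes $amount').safe_substitute(name='Bob')

print(ok, attack, missing, partial)
```

Whether to return None, raise, or fall back to safe_substitute() depends on your application; this sketch just shows the exceptions to expect.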

Which String Formatting Method Should I Use?

I totally get that having so much choice for how to format your strings in Python can feel very confusing. This is an excellent cue to bust out this handy flowchart infographic I’ve put together for you:

String Formatting in Python -- Flowchart

This flowchart is based on the following rule of thumb that I apply when I’m writing Python:

Dan’s Python String Formatting Rule of Thumb:

If your format strings are user-supplied, use Template Strings (#4) to avoid security issues. Otherwise, use Literal String Interpolation (#3) if you’re on Python 3.6+, and “New Style” str.format (#2) if you’re not.
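For a side-by-side comparison, here's a sketch that builds the error message from the start of the article with all four approaches. It assumes Python 3.6+ so that the f-string line runs:

```python
from string import Template

name = 'Bob'
errno = 50159747054

old_style = 'Hey %s, there is a 0x%x error!' % (name, errno)           # 1
new_style = 'Hey {}, there is a 0x{:x} error!'.format(name, errno)     # 2
f_string = f'Hey {name}, there is a {errno:#x} error!'                 # 3
template = Template('Hey $name, there is a $error error!').substitute(
    name=name, error=hex(errno))                                       # 4

# All four approaches produce the same message.
print(old_style == new_style == f_string == template)  # True
print(f_string)  # Hey Bob, there is a 0xbadc0ffee error!
```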

