Channel: Planet Python

Python Morsels: String methods in Python


Python's strings have dozens of methods, but some are much more useful than others. Let's discuss the dozen-ish must-know string methods and why the other methods aren't so essential.

The most useful string methods

Here are the dozen-ish Python string methods I recommend committing to memory.

| Method | Related Methods | Description |
|---|---|---|
| join | | Join an iterable of strings by a separator |
| split | rsplit | Split (on whitespace by default) into a list of strings |
| replace | | Replace all copies of one substring with another |
| strip | rstrip & lstrip | Remove whitespace from the beginning and end |
| casefold | lower & upper | Return a case-normalized version of the string |
| startswith | | Check if the string starts with 1 or more other strings |
| endswith | | Check if the string ends with 1 or more other strings |
| splitlines | | Split into a list of lines |
| format | | Format the string (consider an f-string before this) |
| count | | Count how many times a given substring occurs |
| removeprefix | | Remove the given prefix |
| removesuffix | | Remove the given suffix |

You might be wondering "wait, why is my favorite method not in that list?" I'll briefly explain the rest of the methods and my thoughts on them below. But first, let's look at each of the above methods.

join

If you need to convert a list to a string in Python, the string join method is what you're looking for.

>>> colors = ["purple", "blue", "green", "orange"]
>>> joined_colors = ", ".join(colors)
>>> joined_colors
'purple, blue, green, orange'

The join method can concatenate a list of strings into a single string, but it will accept any other iterable of strings as well.

>>> digits = range(10)
>>> digit_string = "".join(str(n) for n in digits)
>>> digit_string
'0123456789'
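The join method only accepts strings, which is why the digits example above converts each number with str(n) first. Here's a minimal sketch of what happens otherwise (the numbers list is just an illustrative input):

```python
numbers = [1, 2, 3]

# join only accepts strings, so passing integers raises a TypeError.
try:
    ", ".join(numbers)
    join_failed = False
except TypeError as error:
    join_failed = True
    print(f"TypeError: {error}")

# Converting each item to a string first makes join work.
joined = ", ".join(str(n) for n in numbers)
print(joined)  # 1, 2, 3
```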

split

If you need to break a string into smaller strings based on a separator, you need the string split method.

>>> time = "1:19:48"
>>> parts = time.split(":")
>>> parts
['1', '19', '48']

Your separator can be any substring. We split on ":" above, but we could also split on "->":

>>> graph = "A->B->C->D"
>>> graph.split("->")
['A', 'B', 'C', 'D']

You usually wouldn't want to call split with a space character:

>>> langston = "Does it dry up\nlike a raisin in the sun?\n"
>>> langston.split(" ")
['Does', 'it', 'dry', 'up\nlike', 'a', 'raisin', 'in', 'the', 'sun?\n']

Splitting on the space character works, but often when splitting on spaces it's actually more useful to split on all whitespace.

Calling the split method with no arguments will split on any run of consecutive whitespace characters:

>>> langston = "Does it dry up\nlike a raisin in the sun?\n"
>>> langston.split()
['Does', 'it', 'dry', 'up', 'like', 'a', 'raisin', 'in', 'the', 'sun?']

Note that split without any arguments also removes leading and trailing whitespace.

There's one more split feature that folks sometimes overlook: the maxsplit argument. When calling split with a maxsplit value, Python will split the string at most that number of times. This is handy when you only care about the first one or two occurrences of a separator in a string:

>>> line = "Rubber duck|5|10"
>>> item_name, the_rest = line.split("|", maxsplit=1)
>>> item_name
'Rubber duck'

If it's the last couple occurrences of a separator that you care about, you'll want to use the string rsplit method instead:

>>> the_rest, amount = line.rsplit("|", maxsplit=1)
>>> amount
'10'

With the exception of calling the split method without any arguments, there's no way to ignore repeated separators or leading/trailing separators, or to support multiple separators at once. If you need any of those features, you'll want to look into regular expressions (specifically the re.split function).
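A quick sketch of re.split handling the cases that str.split can't; the sample strings and patterns here are illustrative, not taken from any particular data format:

```python
import re

# Multiple separators: split on a comma or a semicolon, along with any
# whitespace around them.
record = "red, green;blue ,  yellow"
colors = re.split(r"\s*[,;]\s*", record)
print(colors)  # ['red', 'green', 'blue', 'yellow']

# Repeated separators: "+" treats a run of commas as one separator.
# Leading/trailing separators still produce empty strings at the ends,
# so those get filtered out.
messy = ",a,,b,,,c,"
parts = [part for part in re.split(r",+", messy) if part]
print(parts)  # ['a', 'b', 'c']
```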

replace

Need to replace one substring (a string within a string) with another? That's what the string replace method is for!

>>> message = "JavaScript is lovely"
>>> message.replace("JavaScript", "Python")
'Python is lovely'

The replace method can also be used for removing substrings, by replacing them with an empty string:

>>> message = "Python is lovely!!!!"
>>> message.replace("!", "")
'Python is lovely'

There's also an optional count argument, in case you only want to replace the first N occurrences:

>>> message = "Python is lovely!!!!"
>>> message.replace("!", "?", 2)
'Python is lovely??!!'
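Since Python strings are immutable, replace (like every string method) returns a new string rather than modifying the original one. A small sketch reusing the message string from the examples above:

```python
message = "Python is lovely!!!!"

# replace returns a new string; the original string is unchanged.
message.replace("!", "")
print(message)  # Python is lovely!!!!

# To keep the result, assign it (here back to the same name).
message = message.replace("!", "")
print(message)  # Python is lovely
```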

strip

The strip method is for removing whitespace from the beginning and end of a string:

>>> text = """
... Hello!
... This is a multi-line string.
... """
>>> text
'\nHello!\nThis is a multi-line string.\n'
>>> stripped_text = text.strip()
>>> stripped_text
'Hello!\nThis is a multi-line string.'

If you just need to remove whitespace from the end of the string (but not the beginning), you can use the rstrip method:

>>> line = "    Indented line with trailing spaces  \n"
>>> line.rstrip()
'    Indented line with trailing spaces'

And if you need to strip whitespace from just the beginning, you can use the lstrip method:

>>> line = "    Indented line with trailing spaces  \n"
>>> line.lstrip()
'Indented line with trailing spaces  \n'

Note that by default strip, lstrip, and rstrip remove all whitespace characters (space, tab, newline, etc.). You can also specify a specific character to remove instead. Here we're removing any trailing newline characters but leaving other whitespace intact:

>>> line = "Line 1\n"
>>> line
'Line 1\n'
>>> line.rstrip("\n")
'Line 1'

Note that strip, lstrip, and rstrip will also accept a string of multiple characters to strip.

>>> words = ['I', 'enjoy', 'Python!', 'Do', 'you?', 'I', 'hope', 'so.']
>>> [w.strip(".!?") for w in words]
['I', 'enjoy', 'Python', 'Do', 'you', 'I', 'hope', 'so']

Passing multiple characters will strip all of those characters, but they'll be treated as individual characters (not as a substring).

If you need to strip a multi-character substring instead of individual characters, see removesuffix and removeprefix below.

casefold

Need to uppercase a string? There's an upper method for that:

>>> name = "Trey"
>>> name.upper()
'TREY'

Need to lowercase a string? There's a lower method for that:

>>> name = "Trey"
>>> name.lower()
'trey'

What if you're trying to do a case-insensitive comparison between strings? You could lowercase or uppercase all of your strings for the comparison. Or you could use the string casefold method:

>>> name = "Trey"
>>> "t" in name
False
>>> "t" in name.casefold()
True

But wait, isn't casefold just the same thing as lower?

>>> name = "Trey"
>>> name.casefold()
'trey'

Almost. If you're working with ASCII characters, casefold does exactly the same thing as the string lower method.

But if you have non-ASCII characters (see Unicode character encodings in Python), there are some characters that casefold handles differently than lower.

There are a few hundred characters that normalize differently between the lower and casefold methods. If you're working with text using the International Phonetic Alphabet or text written in Greek, Cyrillic, Armenian, Cherokee, or a large handful of other languages, you should probably use casefold instead of lower.
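German isn't on that list, but its "ß" is a commonly cited example of the difference. A small sketch (the word is just an illustrative input):

```python
word = "Straße"  # German for "street"

# lower leaves "ß" alone, while casefold converts it to "ss".
print(word.lower())     # straße
print(word.casefold())  # strasse

# So a casefold-based comparison matches the "ss" spelling; lower does not.
print(word.lower() == "STRASSE".lower())        # False
print(word.casefold() == "STRASSE".casefold())  # True
```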

Do keep in mind that casefold doesn't solve all text normalization issues though. Unicode allows the same text to be represented in multiple ways (for example, an accented letter can be one code point or a letter followed by a combining accent), so you'll need to look into Unicode normalization and Python's unicodedata module if you think you'll be comparing non-ASCII text often.
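Here's a simplified sketch combining casefold with unicodedata.normalize. The caseless_equal helper is hypothetical, and the Unicode standard's full definition of canonical caseless matching is more involved than this:

```python
import unicodedata


def caseless_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare strings ignoring case and composition.

    Casefolds each string, then normalizes to NFC so that composed and
    decomposed accented characters compare equal.
    """
    def normalize(s: str) -> str:
        return unicodedata.normalize("NFC", s.casefold())

    return normalize(a) == normalize(b)


composed = "Caf\u00e9"     # "é" as a single code point
decomposed = "CAFE\u0301"  # "E" followed by a combining acute accent

print(composed == decomposed)                # False
print(caseless_equal(composed, decomposed))  # True
```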

startswith

The string startswith method can check whether one string is a prefix of another string:

>>> property_id = "UA-1234567"
>>> property_id.startswith("UA-")
True

The alternative to startswith is to slice the bigger string and do an equality check:

>>> property_id = "UA-1234567"
>>> prefix = "UA-"
>>> property_id[:len(prefix)] == prefix
True

That works, but it's awkward.

You can also quickly check whether one string starts with any of several substrings by passing a tuple of substrings to startswith.

Here we're checking whether each string in a list starts with a vowel to determine whether the article "an" or "a" should be used:

>>> names = ["Go", "Elixir", "OCaml", "Rust"]
>>> for name in names:
...     if name.startswith(("A", "E", "I", "O", "U")):
...         print(f"An {name} program")
...     else:
...         print(f"A {name} program")
...
A Go program
An Elixir program
An OCaml program
A Rust program

Note that startswith returns True if the string starts with any of the given substrings.

Many long-time Python programmers overlook the fact that startswith accepts either a single string or a tuple of strings.
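One catch: it has to be a tuple specifically. A list of prefixes raises a TypeError, so convert it with tuple() first. A small sketch (the vowels list is illustrative):

```python
vowels = ["A", "E", "I", "O", "U"]  # a list, not a tuple
name = "Elixir"

# startswith only accepts a str or a tuple of str; a list raises TypeError.
try:
    name.startswith(vowels)
    list_failed = False
except TypeError:
    list_failed = True

# Converting the list to a tuple works.
starts_with_vowel = name.startswith(tuple(vowels))
print(starts_with_vowel)  # True
```

The same rule applies to endswith.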

endswith

The endswith method can check whether one string is a suffix of another string.

The string endswith method works pretty much like the startswith method.

It works with a single string:

>>> filename = "3c9a9fd05f404aefa92817650be58036.min.js"
>>> filename.endswith(".min.js")
True

But it also accepts a tuple of strings:

>>> filename = "3c9a9fd05f404aefa92817650be58036.min.js"
>>> filename.endswith((".min.js", ".min.css"))
True

Just as with startswith, when endswith is given a tuple, it returns True if our string ends with any of the strings in that tuple.

splitlines

The splitlines method is specifically for splitting up strings into lines.

>>> text = "I'm Nobody! Who are you?\nAre you – Nobody – too?"
>>> text.splitlines()
["I'm Nobody! Who are you?", 'Are you – Nobody – too?']

Why make a separate method just for splitting into lines? Couldn't we just use the split method with \n instead?

>>> text.split("\n")
["I'm Nobody! Who are you?", 'Are you – Nobody – too?']

While that does work in some cases, sometimes newlines are represented by \r\n or simply \r instead of \n. If you don't know exactly what line endings your text uses, splitlines can be handy.

>>> text = "Maybe it just sags\r\nlike a heavy load.\r\nOr does it explode?"
>>> text.split("\n")
['Maybe it just sags\r', 'like a heavy load.\r', 'Or does it explode?']
>>> text.splitlines()
['Maybe it just sags', 'like a heavy load.', 'Or does it explode?']

But there's an even more useful reason to use splitlines: it's quite common for text to end in a trailing newline character.

>>> zen = "Flat is better than nested.\nSparse is better than dense.\n"

The splitlines method will remove a trailing newline if it finds one, whereas the split method will split on that trailing newline which would give us an empty line at the end (likely not what we actually want when splitting on lines).

>>> zen.split("\n")
['Flat is better than nested.', 'Sparse is better than dense.', '']
>>> zen.splitlines()
['Flat is better than nested.', 'Sparse is better than dense.']

Unlike split, the splitlines method can also split lines while maintaining the existing line endings, by specifying keepends=True:

>>> zen.splitlines(keepends=True)
['Flat is better than nested.\n', 'Sparse is better than dense.\n']

When splitting strings into lines in Python, I recommend reaching for splitlines instead of split.
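Two more differences are worth knowing about. Per the Python documentation, splitlines treats a broader set of characters as line boundaries (including the form feed "\x0c" and some Unicode line separators), and it handles an empty string differently than split. A small sketch with made-up strings:

```python
# splitlines treats a form feed as a line boundary; split("\n") does not.
text = "page one\x0cpage two"
print(text.splitlines())  # ['page one', 'page two']
print(text.split("\n"))   # ['page one\x0cpage two']

# An empty string has no lines according to splitlines, but split("\n")
# returns a list containing one empty string.
print("".splitlines())  # []
print("".split("\n"))   # ['']
```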

format

Python's format method is used for string formatting (a.k.a. string interpolation).

>>> version_message = "Version {version} or higher required."
>>> print(version_message.format(version="3.10"))
Version 3.10 or higher required.

Python's f-strings were an evolution of the format method.

>>> name = "Trey"
>>> print(f"Hello {name}! Welcome to Python.")
Hello Trey! Welcome to Python.

You might think that the format method doesn't have much use now that f-strings have long been part of Python. But the format method is handy for cases where you'd like to define your template string in one part of your code and use that template string in another part.

For example we might define a string-to-be-formatted at the top of a module and then use that string later on in our module:

BASE_URL = "https://api.stackexchange.com/2.3/questions/{ids}?site={site}"

# More code here

question_ids = ["33809864", "2759323", "9321955"]
url_for_questions = BASE_URL.format(
    site="stackoverflow",
    ids=";".join(question_ids),
)

We've predefined our BASE_URL template string and then later used it to construct a valid URL with the format method.
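The same approach pairs nicely with dictionaries: ** unpacks a dictionary into the keyword arguments that format expects, so one template can be reused for many records. A small sketch with a hypothetical ROW_TEMPLATE and made-up inventory data:

```python
# A template defined once, e.g. near the top of a module.
# "<8" left-aligns in 8 characters; ">4" right-aligns in 4 characters.
ROW_TEMPLATE = "{item:<8}{quantity:>4}"

inventory = [
    {"item": "apples", "quantity": 3},
    {"item": "pears", "quantity": 12},
]

# ** unpacks each dictionary into keyword arguments for format.
rows = [ROW_TEMPLATE.format(**entry) for entry in inventory]
for row in rows:
    print(row)
```

If you'd rather not unpack, str.format_map(entry) accepts the mapping directly.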

count

The string count method accepts a substring and returns the number of times that substring occurs within our string:

>>> time = "3:32"
>>> time.count(":")
1
>>> time = "2:17:48"
>>> time.count(":")
2

That's it. The count method is pretty simple.
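One subtlety worth knowing: count only counts non-overlapping occurrences, scanning from left to right. A quick sketch with illustrative strings:

```python
# After each match, the search resumes just past the end of that match,
# so overlapping occurrences are not counted.
print("aaaa".count("aa"))     # 2, not 3
print("banana".count("ana"))  # 1: the second "ana" overlaps the first
```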

Note that if you don't care about the actual number but only whether the count is greater than 0, like this:

has_underscores = text.count("_") > 0

then you don't need the count method.

Why? Because Python's in operator is a better way to check whether a string contains a substring:

has_underscores = "_" in text

This has the added benefit that the in operator will stop as soon as it finds a match, whereas count always needs to iterate through the entire string.

removeprefix

The removeprefix method will remove an optional prefix from the beginning of a string.

>>> hex_string = "0xfe34"
>>> hex_string.removeprefix("0x")
'fe34'
>>> hex_string = "ac6b"
>>> hex_string.removeprefix("0x")
'ac6b'

The removeprefix method was added in Python 3.9. Before removeprefix, it was common to check whether a string starts with a prefix (using startswith) and then remove it via slicing:

if hex_string.startswith("0x"):
    hex_string = hex_string[len("0x"):]

Now you can just use removeprefix instead:

hex_string = hex_string.removeprefix("0x")
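If you still need to support Python versions older than 3.9, the startswith-plus-slicing pattern can be wrapped in a small function. The remove_prefix name is hypothetical; this is a sketch that mimics removeprefix, not the standard library's implementation:

```python
def remove_prefix(text: str, prefix: str) -> str:
    """Hypothetical helper mimicking str.removeprefix for Python < 3.9."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


print(remove_prefix("0xfe34", "0x"))  # fe34
print(remove_prefix("ac6b", "0x"))    # ac6b
```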

The removeprefix method is a bit similar to the lstrip method, except that lstrip removes individual characters from the beginning of a string, and it removes as many of them as it finds.

So while this will remove all leading v characters from the beginning of a string:

>>> a = "v3.11.0"
>>> a.lstrip("v")
'3.11.0'
>>> b = "3.11.0"
>>> b.lstrip("v")
'3.11.0'
>>> c = "vvv3.11.0"
>>> c.lstrip("v")
'3.11.0'

This would remove at most one v from the beginning of the string:

>>> a = "v3.11.0"
>>> a.removeprefix("v")
'3.11.0'
>>> b = "3.11.0"
>>> b.removeprefix("v")
'3.11.0'
>>> c = "vvv3.11.0"
>>> c.removeprefix("v")
'vv3.11.0'

removesuffix

The removesuffix method will remove an optional suffix from the end of a string.

>>> time_readings = ["0", "5 sec", "7 sec", "1", "8 sec"]
>>> new_readings = [t.removesuffix(" sec") for t in time_readings]
>>> new_readings
['0', '5', '7', '1', '8']

It does pretty much the same thing as removeprefix, except it removes from the end instead of removing from the beginning.
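Just as lstrip and removeprefix differ, so do rstrip and removesuffix, and mixing them up is an easy bug to write. A small illustration with a made-up filename:

```python
filename = "report.txt"

# rstrip treats ".txt" as the set of characters {".", "t", "x"} and keeps
# stripping while the last character is in that set, so it also eats the
# final "t" of "report".
print(filename.rstrip(".txt"))        # repor

# removesuffix removes the exact substring ".txt", at most once (Python 3.9+).
print(filename.removesuffix(".txt"))  # report
```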

Learn these methods later

I wouldn't memorize these string …

Read the full article: https://www.pythonmorsels.com/string-methods/

