Python's strings have dozens of methods, but some are much more useful than others. Let's discuss the dozen-ish must-know string methods and why the other methods aren't so essential.
The most useful string methods
Here are the dozen-ish Python string methods I recommend committing to memory.
Method | Related Methods | Description |
---|---|---|
join | | Join iterable of strings by a separator |
split | rsplit | Split (on whitespace by default) into list of strings |
replace | | Replace all copies of one substring with another |
strip | rstrip & lstrip | Remove whitespace from the beginning and end |
casefold | lower & upper | Return a case-normalized version of the string |
startswith | | Check if string starts with 1 or more other strings |
endswith | | Check if string ends with 1 or more other strings |
splitlines | | Split into a list of lines |
format | | Format the string (consider an f-string before this) |
count | | Count how many times a given substring occurs |
removeprefix | | Remove the given prefix |
removesuffix | | Remove the given suffix |
You might be wondering "wait why is my favorite method not in that list?" I'll briefly explain the rest of the methods and my thoughts on them below. But first, let's look at each of the above methods.
join
If you need to convert a list to a string in Python, the string join method is what you're looking for.
>>> colors = ["purple", "blue", "green", "orange"]
>>> joined_colors = ", ".join(colors)
>>> joined_colors
'purple, blue, green, orange'
The join method can concatenate a list of strings into a single string, but it will accept any other iterable of strings as well.
>>> digits = range(10)
>>> digit_string = "".join(str(n) for n in digits)
>>> digit_string
'0123456789'
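The str(n) call in that example matters: join only accepts strings. Here's a minimal sketch of what happens otherwise (the numbers list is a made-up example):

```python
numbers = [2, 1, 3, 4, 7]

# join raises a TypeError if any item isn't a string
try:
    " ".join(numbers)
except TypeError as error:
    print(f"TypeError: {error}")

# Converting each item with str (here via map) avoids the error
joined_numbers = " ".join(map(str, numbers))
print(joined_numbers)  # 2 1 3 4 7
```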
split
If you need to break a string into smaller strings based on a separator, you need the string split method.
>>> time = "1:19:48"
>>> parts = time.split(":")
>>> parts
['1', '19', '48']
Your separator can be any substring.
We're splitting by a : above, but we could also split by ->:
>>> graph = "A->B->C->D"
>>> graph.split("->")
['A', 'B', 'C', 'D']
You usually wouldn't want to call split with a space character:
>>> langston = "Does it dry up\nlike a raisin in the sun?\n"
>>> langston.split(" ")
['Does', 'it', 'dry', 'up\nlike', 'a', 'raisin', 'in', 'the', 'sun?\n']
Splitting on the space character works, but often when splitting on spaces it's actually more useful to split on all whitespace.
Calling the split method with no arguments will split on any run of consecutive whitespace characters:
>>> langston = "Does it dry up\nlike a raisin in the sun?\n"
>>> langston.split()
['Does', 'it', 'dry', 'up', 'like', 'a', 'raisin', 'in', 'the', 'sun?']
Note that split without any arguments also removes leading and trailing whitespace.
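To make that difference concrete, here's a small comparison of splitting on a single space versus splitting with no arguments (the padded string is a made-up example):

```python
padded = "  Rubber   duck  "

# Splitting on a single space keeps an empty string for every extra space
by_space = padded.split(" ")
print(by_space)  # ['', '', 'Rubber', '', '', 'duck', '', '']

# Splitting with no arguments collapses whitespace runs and trims both ends
by_whitespace = padded.split()
print(by_whitespace)  # ['Rubber', 'duck']
```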
There's one more split feature that folks sometimes overlook: the maxsplit argument.
When calling split with a maxsplit value, Python will split the string at most that number of times.
This is handy when you only care about the first one or two occurrences of a separator in a string:
>>> line = "Rubber duck|5|10"
>>> item_name, the_rest = line.split("|", maxsplit=1)
>>> item_name
'Rubber duck'
If it's the last couple occurrences of a separator that you care about, you'll want to use the string rsplit method instead:
>>> the_rest, amount = line.rsplit("|", maxsplit=1)
>>> amount
'10'
With the exception of calling the split method without any arguments, there's no way to ignore repeated separators or trailing/leading separators, or to support multiple separators at once.
If you need any of those features, you'll want to look into regular expressions (specifically the re.split function).
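As a rough sketch of what that can look like, here's re.split handling two different separators, repeated separators, and stray whitespace at once. The raw string and the pattern are made-up examples, not a one-size-fits-all recipe:

```python
import re

# Hypothetical messy input: mixed separators, repeats, and a trailing comma
raw = "apples, pears;;bananas ,kiwis,"

# Split on runs of commas/semicolons plus any surrounding whitespace
pieces = re.split(r"\s*[,;]+\s*", raw)
print(pieces)  # ['apples', 'pears', 'bananas', 'kiwis', '']

# A trailing separator still yields an empty string, so filter those out
fruits = [piece for piece in pieces if piece]
print(fruits)  # ['apples', 'pears', 'bananas', 'kiwis']
```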
replace
Need to replace one substring (a string within a string) with another?
That's what the string replace method is for!
>>> message = "JavaScript is lovely"
>>> message.replace("JavaScript", "Python")
'Python is lovely'
The replace method can also be used for removing substrings, by replacing them with an empty string:
>>> message = "Python is lovely!!!!"
>>> message.replace("!", "")
'Python is lovely'
There's also an optional count argument, in case you only want to replace the first N occurrences:
>>> message = "Python is lovely!!!!"
>>> message.replace("!", "?", 2)
'Python is lovely??!!'
strip
The strip method is for removing whitespace from the beginning and end of a string:
>>> text = """
... Hello!
... This is a multi-line string.
... """
>>> text
'\nHello!\nThis is a multi-line string.\n'
>>> stripped_text = text.strip()
>>> stripped_text
'Hello!\nThis is a multi-line string.'
If you just need to remove whitespace from the end of the string (but not the beginning), you can use the rstrip method:
>>> line = " Indented line with trailing spaces \n"
>>> line.rstrip()
' Indented line with trailing spaces'
And if you need to strip whitespace from just the beginning, you can use the lstrip method:
>>> line = " Indented line with trailing spaces \n"
>>> line.lstrip()
'Indented line with trailing spaces \n'
Note that by default strip, lstrip, and rstrip remove all whitespace characters (space, tab, newline, etc.).
You can also pass a specific character to remove instead.
Here we're removing any trailing newline characters but leaving other whitespace intact:
>>> line = "Line 1\n"
>>> line
'Line 1\n'
>>> line.rstrip("\n")
'Line 1'
Note that strip, lstrip, and rstrip will also accept a string of multiple characters to strip.
>>> words = ['I', 'enjoy', 'Python!', 'Do', 'you?', 'I', 'hope', 'so.']
>>> [w.strip(".!?") for w in words]
['I', 'enjoy', 'Python', 'Do', 'you', 'I', 'hope', 'so']
Passing multiple characters will strip all of those characters, but they'll be treated as individual characters (not as a substring).
If you need to strip a multi-character substring instead of individual characters, see removesuffix and removeprefix below.
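Here's a quick sketch of why that distinction matters, using a made-up domain name. Note that removesuffix requires Python 3.9 or later:

```python
domain = "intercom.com"

# rstrip treats ".com" as the set of characters {".", "c", "o", "m"}
stripped = domain.rstrip(".com")
print(stripped)  # inter

# removesuffix (Python 3.9+) removes the exact substring, at most once
without_suffix = domain.removesuffix(".com")
print(without_suffix)  # intercom
```

The rstrip call keeps eating characters from the end until it hits one that isn't in the given set, which is rarely what you want when removing a file extension or domain suffix.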
casefold
Need to uppercase a string?
There's an upper method for that:
>>> name = "Trey"
>>> name.upper()
'TREY'
Need to lowercase a string?
There's a lower method for that:
>>> name = "Trey"
>>> name.lower()
'trey'
What if you're trying to do a case-insensitive comparison between strings?
You could lowercase or uppercase all of your strings for the comparison.
Or you could use the string casefold method:
>>> name = "Trey"
>>> "t" in name
False
>>> "t" in name.casefold()
True
But wait, isn't casefold just the same thing as lower?
>>> name = "Trey"
>>> name.casefold()
'trey'
Almost.
If you're working with ASCII characters, casefold does exactly the same thing as the string lower method.
But if you have non-ASCII characters (see Unicode character encodings in Python), there are some characters that casefold handles differently.
There are a few hundred characters that normalize differently between the lower and casefold methods.
If you're working with text using the International Phonetic Alphabet or text written in Greek, Cyrillic, Armenian, Cherokee, or a large handful of other languages, you should probably use casefold instead of lower.
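As a small illustration of the difference, consider the German sharp s (ß). This sketch compares two spellings of the same word; the variable names are just for illustration:

```python
german = "Straße"  # "street", spelled with the sharp s (ß)
ascii_spelling = "STRASSE"

# lower leaves ß alone, so the two spellings don't match
print(german.lower())  # straße
lower_match = german.lower() == ascii_spelling.lower()
print(lower_match)  # False

# casefold expands ß to "ss", so the comparison succeeds
print(german.casefold())  # strasse
casefold_match = german.casefold() == ascii_spelling.casefold()
print(casefold_match)  # True
```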
Do keep in mind that casefold doesn't solve all text normalization issues though.
It's possible to represent the same text with different sequences of Unicode characters, so you'll need to look into Unicode data normalization and Python's unicodedata module if you think you'll be comparing non-ASCII text often.
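Here's a minimal sketch of that kind of normalization, assuming NFC normalization followed by casefold is good enough for your data. The normalized_casefold helper is hypothetical (it's not part of the standard library), and it's a simplification that won't cover every caseless-matching edge case:

```python
import unicodedata

composed = "caf\u00e9"     # é as a single code point
decomposed = "cafe\u0301"  # e followed by a combining acute accent

# Both render as "café", but the strings are not equal
print(composed == decomposed)  # False


def normalized_casefold(text):
    """Normalize to NFC, then casefold (a hypothetical helper)."""
    return unicodedata.normalize("NFC", text).casefold()


same_text = normalized_casefold(composed) == normalized_casefold(decomposed)
print(same_text)  # True
```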
startswith
The string startswith method can check whether one string is a prefix of another string:
>>> property_id = "UA-1234567"
>>> property_id.startswith("UA-")
True
The alternative to startswith is to slice the bigger string and do an equality check:
>>> property_id = "UA-1234567"
>>> prefix = "UA-"
>>> property_id[:len(prefix)] == prefix
True
That works, but it's awkward.
You can also quickly check whether one string starts with any of several substrings by passing a tuple of substrings to startswith.
Here we're checking whether each string in a list starts with a vowel to determine whether the article "an" or "a" should be used:
>>> names = ["Go", "Elixir", "OCaml", "Rust"]
>>> for name in names:
...     if name.startswith(("A", "E", "I", "O", "U")):
...         print(f"An {name} program")
...     else:
...         print(f"A {name} program")
...
A Go program
An Elixir program
An OCaml program
A Rust program
Note that startswith returns True if the string starts with any of the given substrings.
Many long-time Python programmers overlook the fact that startswith will accept either a single string or a tuple of strings.
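One gotcha worth a quick sketch: it really does need to be a tuple. If your prefixes live in a list (as in this made-up example), convert them first:

```python
vowels = ["A", "E", "I", "O", "U"]  # a list, not a tuple
name = "Elixir"

# Passing a list raises a TypeError
try:
    name.startswith(vowels)
except TypeError as error:
    print(f"TypeError: {error}")

# Converting the list to a tuple works
starts_with_vowel = name.startswith(tuple(vowels))
print(starts_with_vowel)  # True
```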
endswith
The endswith method can check whether one string is a suffix of another string.
The string endswith method works pretty much like the startswith method.
It works with a single string:
>>> filename = "3c9a9fd05f404aefa92817650be58036.min.js"
>>> filename.endswith(".min.js")
True
But it also accepts a tuple of strings:
>>> filename = "3c9a9fd05f404aefa92817650be58036.min.js"
>>> filename.endswith((".min.js", ".min.css"))
True
Just as with startswith, when endswith is given a tuple, it returns True if our string ends with any of the strings in that tuple.
splitlines
The splitlines method is specifically for splitting up strings into lines.
>>> text = "I'm Nobody! Who are you?\nAre you – Nobody – too?"
>>> text.splitlines()
["I'm Nobody! Who are you?", 'Are you – Nobody – too?']
Why make a separate method just for splitting into lines?
Couldn't we just use the split method with \n instead?
>>> text.split("\n")
["I'm Nobody! Who are you?", 'Are you – Nobody – too?']
While that does work in some cases, sometimes newlines are represented by \r\n or simply \r instead of \n.
If you don't know exactly what line endings your text uses, splitlines can be handy.
>>> text = "Maybe it just sags\r\nlike a heavy load.\r\nOr does it explode?"
>>> text.split("\n")
['Maybe it just sags\r', 'like a heavy load.\r', 'Or does it explode?']
>>> text.splitlines()
['Maybe it just sags', 'like a heavy load.', 'Or does it explode?']
But there's an even more useful reason to use splitlines: it's quite common for text to end in a trailing newline character.
>>> zen = "Flat is better than nested.\nSparse is better than dense.\n"
The splitlines method will remove a trailing newline if it finds one, whereas the split method will split on that trailing newline, which gives us an empty string at the end (likely not what we actually want when splitting on lines).
>>> zen.split("\n")
['Flat is better than nested.', 'Sparse is better than dense.', '']
>>> zen.splitlines()
['Flat is better than nested.', 'Sparse is better than dense.']
Unlike split, the splitlines method can also split lines while maintaining the existing line endings by specifying keepends=True:
>>> zen.splitlines(keepends=True)
['Flat is better than nested.\n', 'Sparse is better than dense.\n']
When splitting strings into lines in Python, I recommend reaching for splitlines instead of split.
format
Python's format method is used for string formatting (a.k.a. string interpolation).
>>> version_message = "Version {version} or higher required."
>>> print(version_message.format(version="3.10"))
Version 3.10 or higher required.
Python's f-strings were an evolution of the format method.
>>> name = "Trey"
>>> print(f"Hello {name}! Welcome to Python.")
Hello Trey! Welcome to Python.
You might think that the format method doesn't have much use now that f-strings have long been part of Python.
But the format method is handy for cases where you'd like to define your template string in one part of your code and use that template string in another part.
For example, we might define a string-to-be-formatted at the top of a module and then use that string later on in our module:
BASE_URL = "https://api.stackexchange.com/2.3/questions/{ids}?site={site}"

# More code here

question_ids = ["33809864", "2759323", "9321955"]
url_for_questions = BASE_URL.format(
    site="stackoverflow",
    ids=";".join(question_ids),
)
We've predefined our BASE_URL template string and then later used it to construct a valid URL with the format method.
count
The string count method accepts a substring and returns the number of times that substring occurs within our string:
>>> time = "3:32"
>>> time.count(":")
1
>>> time = "2:17:48"
>>> time.count(":")
2
That's it. The count method is pretty simple.
Note that if you don't care about the actual number, but only whether the count is greater than 0, you don't need the count method:
has_underscores = text.count("_") > 0
Why?
Because Python's in operator is a better way to check whether a string contains a substring:
has_underscores = "_" in text
This has the added benefit that the in operator will stop as soon as it finds a match, whereas count always needs to iterate through the entire string.
removeprefix
The removeprefix method will remove an optional prefix from the beginning of a string.
>>> hex_string = "0xfe34"
>>> hex_string.removeprefix("0x")
'fe34'
>>> hex_string = "ac6b"
>>> hex_string.removeprefix("0x")
'ac6b'
The removeprefix method was added in Python 3.9.
Before removeprefix, it was common to check whether a string startswith a prefix and then remove it via slicing:
if hex_string.startswith("0x"):
    hex_string = hex_string[len("0x"):]
Now you can just use removeprefix instead:
hex_string = hex_string.removeprefix("0x")
The removeprefix method is a bit similar to the lstrip method, except that lstrip removes individual characters from the beginning of a string and removes as many as it finds.
So while this will remove all leading v characters from the beginning of a string:
>>> a = "v3.11.0"
>>> a.lstrip("v")
'3.11.0'
>>> b = "3.11.0"
>>> b.lstrip("v")
'3.11.0'
>>> c = "vvv3.11.0"
>>> c.lstrip("v")
'3.11.0'
This would remove at most one v from the beginning of the string:
>>> a = "v3.11.0"
>>> a.removeprefix("v")
'3.11.0'
>>> b = "3.11.0"
>>> b.removeprefix("v")
'3.11.0'
>>> c = "vvv3.11.0"
>>> c.removeprefix("v")
'vv3.11.0'
removesuffix
The removesuffix method will remove an optional suffix from the end of a string.
>>> time_readings = ["0", "5 sec", "7 sec", "1", "8 sec"]
>>> new_readings = [t.removesuffix(" sec") for t in time_readings]
>>> new_readings
['0', '5', '7', '1', '8']
It does pretty much the same thing as removeprefix, except it removes from the end instead of removing from the beginning.
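Like removeprefix, removesuffix was added in Python 3.9. If you're stuck on an older Python version, here's a rough sketch of an equivalent. The remove_suffix function is a hypothetical helper, not something from the standard library:

```python
def remove_suffix(text, suffix):
    """Return text without the given suffix (hypothetical pre-3.9 fallback)."""
    # The "suffix and" guard matters: text[:-0] would be an empty string
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(remove_suffix("5 sec", " sec"))  # 5
print(remove_suffix("1", " sec"))      # 1
print(remove_suffix("5 sec", ""))      # 5 sec
```

The slicing is slightly trickier than the removeprefix equivalent because a negative slice of zero length needs special handling.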
Learn these methods later
I wouldn't memorize these string …