A substring is a contiguous sequence of one or more characters in a string. In this article, we will discuss different ways to find all occurrences of a substring in a string in Python.
- Find All Occurrences of a Substring in a String Using For Loop in Python
- Find All Occurrences of a Substring in a String Using While Loop in Python
- Find All Occurrences of a Substring in a String Using the find() Method in Python
- Find All Occurrences of a Substring in a String Using the startswith() Method in Python
- Find All Occurrences of a Substring in a String Using Regular Expressions in Python
- Find All Occurrences of a Substring in a String Using the flashtext Module in Python
- Conclusion
Find All Occurrences of a Substring in a String Using For Loop in Python
With the help of a for loop, we can iterate through the characters in a string. To find all the occurrences of a substring in a string in Python using a for loop, we will use the following steps.
- First, we will find the length of the input string and store it in the variable str_len.
- Next, we will find the length of the substring and store it in the variable sub_len.
- We will also create a list named sub_indices to store the starting index of the occurrences of the substring.
- After this, we will iterate through the input string using a for loop.
- During iteration, we will check if the substring of length sub_len starting from the current index equals the input substring or not.
- If yes, we will store the current index in the sub_indices list using the append() method. The append() method, when invoked on sub_indices, takes the current index as its input argument and appends it to sub_indices.
After execution of the for loop, we will get the starting index of all the occurrences of the input substring in the sub_indices list. You can observe this in the following example.
myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
str_len = len(myStr)
sub_len = len(substring)
sub_indices = []
# The + 1 ensures that a match ending at the last character is also checked.
for i in range(str_len - sub_len + 1):
    if myStr[i:i + sub_len] == substring:
        sub_indices.append(i)
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))
Output:
The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[5, 40, 74]
Here, you can observe that we have obtained the starting indices of the substring python in the entire string.
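To make it clearer what this approach reports when matches overlap, here is a minimal sketch that wraps the steps above in a helper function. The function name find_all_overlapping and the test strings are assumptions made for illustration; they are not part of the original example.

```python
def find_all_overlapping(text, sub):
    """Return every index at which sub starts in text, overlaps included."""
    sub_len = len(sub)
    # The + 1 makes sure a match ending at the last character is checked too.
    return [i for i in range(len(text) - sub_len + 1)
            if text[i:i + sub_len] == sub]


print(find_all_overlapping("aaaa", "aa"))               # [0, 1, 2]
print(find_all_overlapping("I like python", "python"))  # [7]
```

In the first call, the matches starting at index 0, 1, and 2 share characters with each other, which is what overlapping means here.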
Find All Occurrences of a Substring in a String Using While Loop in Python
The for loop checks every index of the string, even the indices that lie inside a match we have already found. This costs extra comparisons, and it also reports overlapping occurrences of the substring in the output. To skip over each match and find only the non-overlapping occurrences of a substring in Python, we can use a while loop. For this, we will use the following steps.
- First, we will find the length of the input string and store it in the variable str_len.
- Next, we will find the length of the substring and store it in the variable sub_len.
- We will also create an empty list named sub_indices to store the starting index of the occurrences of the substring and a variable temp initialized to 0.
- After this, we will iterate through the input string using a while loop.
- During iteration, we will check if the substring of length sub_len starting from the index temp equals the input substring or not.
- If yes, we will store temp in the sub_indices list using the append() method. Then, we will increment temp by sub_len.
- If we don’t find the required substring at the index temp, we will increment temp by 1.
- After execution of the while loop, we will get the starting index of all the occurrences of the input substring in the sub_indices list.
You can observe this in the following example.
myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
str_len = len(myStr)
sub_len = len(substring)
sub_indices = []
temp = 0
while temp <= str_len - sub_len:
    if myStr[temp:temp + sub_len] == substring:
        sub_indices.append(temp)
        temp = temp + sub_len
    else:
        temp = temp + 1
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))
Output:
The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[5, 40, 74]
With this approach, we only get the non-overlapping occurrences of the substring in the input string.
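The sample string above contains no overlapping matches, so both loops print the same list. The following sketch uses inputs where the difference is visible. The helper name find_all_non_overlapping and the guard against an empty substring are assumptions added for illustration.

```python
def find_all_non_overlapping(text, sub):
    """Return start indices of non-overlapping occurrences of sub in text."""
    if not sub:
        # An empty substring would never advance the position below.
        raise ValueError("sub must be a non-empty string")
    sub_len = len(sub)
    indices = []
    pos = 0
    while pos <= len(text) - sub_len:
        if text[pos:pos + sub_len] == sub:
            indices.append(pos)
            pos += sub_len  # jump past the match we just recorded
        else:
            pos += 1
    return indices


print(find_all_non_overlapping("aaaa", "aa"))      # [0, 2]
print(find_all_non_overlapping("abababa", "aba"))  # [0, 4]
```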
Find All Occurrences of a Substring in a String Using the find() Method in Python
The find() method is used to find the first occurrence of any substring in a string in Python. The syntax of the find() method is as follows.
myStr.find(sub_string, start, end)
Here,
- myStr is the input string in which we have to find the location of the sub_string.
- The start and end parameters are optional. They accept the starting index and end index of the string between which we have to search for the sub_string.
When we invoke the find() method on a string, it takes a substring as its input argument. After execution, it returns the start index of the first occurrence of the substring if it is found. Otherwise, it returns -1. You can observe this in the following example.
myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
print(myStr.find("python"))
print(myStr.find("Aditya"))
Output:
5
-1
Here, you can see that the find() method returns the start index of the first occurrence of the substring python. On the other hand, it returns -1 for the substring Aditya as it is not present in myStr.
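The optional start and end arguments are what make find() useful for locating more than the first match. Here is a small sketch of how they behave. The string and variable names are chosen only for illustration.

```python
text = "python and more python"

first = text.find("python")              # search the whole string
second = text.find("python", first + 1)  # search from index 1 onwards
print(first, second)                     # 0 16

# With both start and end, only text[1:10] is searched, so nothing is found.
print(text.find("python", 1, 10))        # -1
```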
To find all occurrences of a substring in a string using the find() method in Python, we will use the following steps.
- First, we will find the length of the input string and store it in the variable str_len.
- Next, we will find the length of the substring and store it in the variable sub_len. We will also create a list named sub_indices to store the starting index of the occurrences of the substring.
- After this, we will iterate through the input string using a for loop.
- During iteration, we will invoke the find() method on the input string with the substring as its first input argument, the current index as its second input argument, and the current index + sub_len as its third input argument. Basically, we are checking if the substring from the current index to the current index + sub_len is the string we are searching for or not.
- If the find() method returns a value other than -1, we will append it to sub_indices. This is because the find() method returns the start index of a substring if it is found in the string. Then, we will move to the next iteration of the for loop.
- If the find() method returns -1, we will move to the next iteration of the for loop.
After execution of the for loop, we will get the starting index of all the occurrences of the input substring in the sub_indices list. You can observe this in the following example.
myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
str_len = len(myStr)
sub_len = len(substring)
sub_indices = []
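# The + 1 ensures that a match ending at the last character is also checked.
for temp in range(str_len - sub_len + 1):
    # Search only the window of length sub_len that starts at temp.
    index = myStr.find(substring, temp, temp + sub_len)
    if index != -1:
        sub_indices.append(index)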
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))
Output:
The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[5, 40, 74]
Again, the above approach gives the indices of overlapping occurrences in the output. To find the non-overlapping occurrences of a substring in a string in Python, we can use a while loop and the find() method as shown below.
myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
str_len = len(myStr)
sub_len = len(substring)
sub_indices = []
temp = 0
while temp <= str_len - sub_len:
    index = myStr.find(substring, temp, temp + sub_len)
    if index != -1:
        sub_indices.append(index)
        temp = temp + sub_len
    else:
        temp = temp + 1
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))
Output:
The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[5, 40, 74]
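The examples above restrict find() to a window of length sub_len, so the loop still visits almost every index. As an alternative sketch, not the approach used above, we can let find() search the rest of the string and jump directly from one match to the next. The function name find_all and its overlapping parameter are assumptions introduced for illustration.

```python
def find_all(text, sub, overlapping=False):
    """Collect start indices of sub in text by repeatedly calling str.find()."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    indices = []
    # After a match, resume one character later (overlapping matches allowed)
    # or right after the match (non-overlapping matches only).
    step = 1 if overlapping else len(sub)
    pos = text.find(sub)
    while pos != -1:
        indices.append(pos)
        pos = text.find(sub, pos + step)
    return indices


myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
print(find_all(myStr, "python"))                  # [5, 40, 74]
print(find_all("aaaa", "aa"))                     # [0, 2]
print(find_all("aaaa", "aa", overlapping=True))   # [0, 1, 2]
```

Here, the scanning between matches is done inside find() itself, so the Python-level loop runs only once per match.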
Find All Occurrences of a Substring in a String Using the startswith() Method in Python
The startswith() method is used to check whether a string starts with a certain substring or not in Python. The syntax for the startswith() method is as follows.
myStr.startswith(sub_string, start, end)
Here,
- myStr is the input string that we have to check if it starts with the sub_string.
- The start and end parameters are optional. They accept the starting index and end index of the portion of the string in which we have to check whether sub_string starts at the index start.
When we invoke the startswith() method on a string, it takes a substring as its input argument. After execution, it returns True if the string starts with the substring. Otherwise, it returns False. You can observe this in the following example.
myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
print(myStr.startswith("python"))
print(myStr.startswith("I am"))
Output:
False
True
Here, you can observe that the startswith() method returns False for the substring python. On the other hand, it returns True for the substring I am. This is because myStr starts with I am and not with python.
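The optional start argument is what the rest of this section relies on, so here is a short sketch of its effect. The shortened string and the result variable names are assumptions chosen for illustration.

```python
myStr = "I am pythonforbeginners."

at_beginning = myStr.startswith("python")        # checks from index 0
at_index_5 = myStr.startswith("python", 5)       # checks from index 5
in_short_slice = myStr.startswith("python", 5, 8)  # only myStr[5:8] is considered

print(at_beginning)    # False
print(at_index_5)      # True
print(in_short_slice)  # False, because "pyt" is shorter than "python"
```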
To find all the occurrences of a substring in a string in Python using the startswith() method, we will use the following steps.
- First, we will find the length of the input string and store it in the variable str_len.
- Next, we will find the length of the substring and store it in the variable sub_len. We will also create a list named sub_indices to store the starting index of the occurrences of the substring.
- After this, we will iterate through the input string using a for loop.
- During iteration, we will invoke the startswith() method on the input string with the substring as its first input argument and the current index as its second input argument.
- If the startswith() method returns False, it means that the substring doesn’t start at the current index. Hence, we will move to the next iteration of the for loop.
- If the startswith() method returns True, it means that the substring starts at the current index. Hence, we will append the current index to sub_indices. After this, we will move to the next iteration of the for loop.
After execution of the for loop, we will get the starting index of all the occurrences of the input substring in the sub_indices list. You can observe this in the following example.
myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
str_len = len(myStr)
sub_len = len(substring)
sub_indices = []
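# The + 1 ensures that a match ending at the last character is also checked.
for temp in range(str_len - sub_len + 1):
    # True only if the substring starts exactly at index temp.
    if myStr.startswith(substring, temp):
        sub_indices.append(temp)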
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))
Output:
The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[5, 40, 74]
The approaches using the for loop give the indices of overlapping occurrences of a substring in a string. To find the indices of non-overlapping occurrences of a substring in a string in Python, you can use the startswith() method and the while loop as shown below.
myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
str_len = len(myStr)
sub_len = len(substring)
sub_indices = []
temp = 0
while temp <= str_len - sub_len:
    index = myStr.startswith(substring, temp)
    if index:
        sub_indices.append(temp)
        temp = temp + sub_len
    else:
        temp = temp + 1
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))
Output:
The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[5, 40, 74]
Find All Occurrences of a Substring in a String Using Regular Expressions in Python
Regular expressions provide us with a powerful way to search and manipulate text data in Python. We can also find all occurrences of a substring in a string in Python using the finditer() function provided in the re module. The syntax of the finditer() function is as follows.
re.finditer(sub_string, input_string)
Here, the input_string is the string in which we have to search the occurrences of sub_string. Note that sub_string is interpreted as a regular expression pattern, so a substring containing special characters such as . or * should first be passed through re.escape().
The finditer() function takes the substring as its first input argument and the original string as its second argument. After execution, it returns an iterator containing the match objects for the non-overlapping occurrences of the substring. The match objects contain information about the start and end indices of the substring. We can obtain the start and end indices of a match by invoking the start() method and the end() method on the match object.
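Before collecting only the start indices, it may help to look at what a match object exposes. The following sketch prints the details of each match. The sample string and the spans variable are assumptions chosen for illustration.

```python
import re

text = "python tips for python fans"

for match in re.finditer("python", text):
    # start() and end() give the slice boundaries, span() gives both at once,
    # and group() gives the matched text itself.
    print(match.start(), match.end(), match.span(), match.group())

spans = [match.span() for match in re.finditer("python", text)]
print(spans)  # [(0, 6), (16, 22)]
```

The end index is exclusive, so text[0:6] and text[16:22] are both equal to python.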
To find all occurrences of a substring in a string in Python using the finditer() function, we will use the following steps.
- First, we will create a list named sub_indices to store the starting index of the occurrences of the substring.
- After that, we will obtain the iterator containing the match objects for the substring.
- Once we get the iterator, we will use a for loop to iterate through the match objects.
- During iteration, we will invoke the start() method on the current match object. It will return the start index of the substring in the original string. We will append the index to sub_indices.
After execution of the for loop, we will get the starting index of all the occurrences of the given substring in the input string. You can observe this in the following example.
import re
myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
sub_indices = []
match_objects = re.finditer(substring, myStr)
for temp in match_objects:
    index = temp.start()
    sub_indices.append(index)
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))
Output:
The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[5, 40, 74]
Instead of using the for loop, you can also use list comprehension to find all occurrences of a substring in a string in Python as shown below.
import re
myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
match_objects = re.finditer(substring, myStr)
sub_indices = [temp.start() for temp in match_objects]
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))
Output:
The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[5, 40, 74]
In the above example, we have first obtained the match objects using the finditer() function. After that, we used list comprehension and the start() method to find the starting indices of the substring in myStr.
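Two details are worth keeping in mind with the regular expression approach: the substring is treated as a pattern, and finditer() only reports non-overlapping matches. The sketch below handles both cases. The helper name regex_find_all and its overlapping parameter are assumptions introduced for illustration, and the lookahead trick is one common way to get overlapping matches rather than the only one.

```python
import re


def regex_find_all(text, sub, overlapping=False):
    """Return start indices of the literal string sub in text using re.finditer()."""
    # re.escape() makes characters such as "." match literally.
    pattern = re.escape(sub)
    if overlapping:
        # A lookahead matches without consuming characters, so the next
        # search can begin one position later and overlaps are reported.
        pattern = "(?=" + pattern + ")"
    return [match.start() for match in re.finditer(pattern, text)]


# Without escaping, "." matches any character, so "115" is also reported.
print([m.start() for m in re.finditer("1.5", "price: 1.5 or 115")])  # [7, 14]
print(regex_find_all("price: 1.5 or 115", "1.5"))                    # [7]

print(regex_find_all("aaaa", "aa"))                    # [0, 2]
print(regex_find_all("aaaa", "aa", overlapping=True))  # [0, 1, 2]
```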
Find All Occurrences of a Substring in a String Using the flashtext Module in Python
Instead of using the above-discussed approaches, you can use the flashtext module to find all occurrences of a substring in a string in Python. You can install the flashtext module using pip with the following command.
pip3 install flashtext
To find all the occurrences of a substring in a string using the flashtext module, we will use the following steps.
- First, we will create a keyword processor object using the KeywordProcessor() function.
- After creating the keyword processor, we will add the substring to the keyword processor object using the add_keyword() method. The add_keyword() method, when invoked on the keyword processor object, will take the substring as its input argument.
- Then, we will invoke the extract_keywords() method on the keyword processor object. It returns a list of tuples. Each tuple contains the substring as its first element, the start index of the substring as its second element, and the end index as its third element.
- Finally, we will create an empty list named sub_indices and extract the starting indices of the substring from the list of tuples using a for loop.
After execution of the for loop, we will get the starting index of all the occurrences of the given substring in the input string in sub_indices. You can observe this in the following example.
from flashtext import KeywordProcessor
myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
sub_indices = []
kwp = KeywordProcessor()
kwp.add_keyword(substring)
result_list = kwp.extract_keywords(myStr, span_info=True)
for tuples in result_list:
    sub_indices.append(tuples[1])
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))
Output:
The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[40, 74]
Instead of using the for loop in the last step, you can also use list comprehension to find all occurrences of a substring in a string in Python as shown below.
from flashtext import KeywordProcessor
myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
kwp = KeywordProcessor()
kwp.add_keyword(substring)
result_list = kwp.extract_keywords(myStr, span_info=True)
sub_indices = [tuples[1] for tuples in result_list]
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))
Output:
The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[40, 74]
In this approach, you can observe that the keyword processor only extracts two instances of the substring. This is because the keyword processor searches for entire words. The occurrence at index 5 is part of the longer word pythonforbeginners, so it is not included in the results.
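If you want a similar whole-word result without installing an extra module, a rough analogue can be sketched with the re module and the \b word-boundary anchor. This is an approximation given for comparison only. It is not how flashtext works internally, and the two can differ in details such as case sensitivity and which characters count as word boundaries.

```python
import re

myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"

# \b matches only at the edge of a word, so "python" inside
# "pythonforbeginners" is skipped.
pattern = r"\b" + re.escape(substring) + r"\b"
whole_word_indices = [match.start() for match in re.finditer(pattern, myStr)]
print(whole_word_indices)  # [40, 74]
```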
Conclusion
In this article, we have discussed different ways to find all occurrences of a substring in a string in Python. Out of all these methods, I would suggest you use the approach using regular expressions with list comprehension. It is concise, and the search itself is performed by the re module rather than by a Python-level loop over every index. If the substring can contain special regular expression characters, escape it with re.escape() first, and if performance matters for your use case, time the alternatives on your own data.
To learn more about Python programming, you can read this article on how to remove all occurrences of a character in a list in Python. You might also like this article on how to check if a Python string contains a number.
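If you want to check the relative speed of these approaches yourself, the timeit module from the standard library can be used as in the following sketch. The function names, the repeated sample text, and the number of runs are assumptions chosen for illustration. The timings depend on your machine and input, so no expected numbers are shown.

```python
import re
import timeit

text = "I provide free python tutorials for you to learn python. " * 50
sub = "python"


def with_slicing():
    # Compare a slice at every index, as in the for loop approach.
    n = len(sub)
    return [i for i in range(len(text) - n + 1) if text[i:i + n] == sub]


def with_find():
    # Jump from match to match using str.find().
    indices = []
    pos = text.find(sub)
    while pos != -1:
        indices.append(pos)
        pos = text.find(sub, pos + len(sub))
    return indices


def with_regex():
    # Regular expression with list comprehension.
    return [m.start() for m in re.finditer(re.escape(sub), text)]


# All three approaches should agree before we compare their speed.
assert with_slicing() == with_find() == with_regex()

for func in (with_slicing, with_find, with_regex):
    seconds = timeit.timeit(func, number=100)
    print(f"{func.__name__}: {seconds:.4f} s for 100 runs")
```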
Stay tuned for more informative articles.
Happy Learning!
The post Find All Occurrences of a Substring in a String in Python appeared first on PythonForBeginners.com.