Channel: Planet Python

Python for Beginners: Find All Occurrences of a Substring in a String in Python


A substring is a contiguous sequence of one or more characters in a string. In this article, we will discuss different ways to find all occurrences of a substring in a string in Python.

Find All Occurrences of a Substring in a String Using For Loop in Python

With the help of a for loop, we can iterate through the indices of a string. To find all the occurrences of a substring in a string in Python using a for loop, we will use the following steps.

  • First, we will find the length of the input string and store it in the variable str_len.
  • Next, we will find the length of the substring and store it in the variable sub_len.
  • We will also create a list named sub_indices to store the starting index of the occurrences of the substring.
  • After this, we will iterate through the input string using a for loop. 
  • During iteration, we will check if the substring of length sub_len starting from the current index equals the input substring or not.
  • If yes, we will store the current index in the sub_indices list using the append() method. The append() method, when invoked on sub_indices, takes the current index as its input argument and appends it to sub_indices.

After execution of the for loop, we will get the starting index of all the occurrences of the input substring in the sub_indices list. You can observe this in the following example.

myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
str_len = len(myStr)
sub_len = len(substring)
sub_indices = []
for i in range(str_len - sub_len + 1):  # + 1 so that a match at the very end of the string is also checked
    if myStr[i:i + sub_len] == substring:
        sub_indices.append(i)
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))

Output:

The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[5, 40, 74]

Here, you can observe that we have obtained the starting indices of the substring python in the entire string.
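
The loop above can also be wrapped in a small helper. The following is a minimal sketch, assuming a helper name find_all_overlapping that is hypothetical and not part of the original example. It iterates up to and including the last possible start index, and the first call shows that this approach reports overlapping occurrences.

```python
def find_all_overlapping(text, sub):
    """Return the start index of every occurrence of sub in text, overlaps included."""
    sub_len = len(sub)
    indices = []
    # The last possible start index is len(text) - sub_len, hence the + 1.
    for i in range(len(text) - sub_len + 1):
        if text[i:i + sub_len] == sub:
            indices.append(i)
    return indices


print(find_all_overlapping("aaaa", "aa"))                 # [0, 1, 2]
print(find_all_overlapping("learn python", "python"))     # [6]
```

The second call matches a substring that ends exactly at the end of the string, which is the case the + 1 in the range is there for.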

Find All Occurrences of a Substring in a String Using While Loop in Python

Checking every index of the string with a for loop is costly in terms of time. It also reports overlapping occurrences of the substring in the output. To skip the characters that have already been matched and find only the non-overlapping occurrences of a substring in a string in Python, we can use a while loop. For this, we will use the following steps.

  • First, we will find the length of the input string and store it in the variable str_len.
  • Next, we will find the length of the substring and store it in the variable sub_len.
  • We will also create an empty list named sub_indices to store the starting index of the occurrences of the substring and a variable temp initialized to 0.
  • After this, we will iterate through the input string using a while loop. 
  • During iteration, we will check if the substring of length sub_len starting from the index temp equals the input substring or not.
  • If yes, we will store temp in the sub_indices list using the append() method. Then, we will increment temp by sub_len.
  • If we don’t find the required substring at the index temp, we will increment temp by 1.  
  • After execution of the while loop, we will get the starting index of all the occurrences of the input substring in the sub_indices list.

You can observe this in the following example.

myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
str_len = len(myStr)
sub_len = len(substring)
sub_indices = []
temp = 0
while temp <= str_len - sub_len:
    if myStr[temp:temp + sub_len] == substring:
        sub_indices.append(temp)
        temp = temp + sub_len
    else:
        temp = temp + 1
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))

Output:

The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[5, 40, 74]

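With this approach, we get only the non-overlapping occurrences of the substring in the input string. For the string used here, the output is the same as before because none of the occurrences of python overlap.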
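
To see the difference between the two loops, we can use an input in which the occurrences do overlap. The following is a minimal sketch; the helper name find_all_non_overlapping and the guard against an empty substring are assumptions added for illustration (without the guard, an empty substring would never advance the index).

```python
def find_all_non_overlapping(text, sub):
    """Return the start indices of the non-overlapping occurrences of sub in text."""
    if not sub:
        # An empty substring would keep matching at the same position forever.
        raise ValueError("sub must not be empty")
    sub_len = len(sub)
    indices = []
    pos = 0
    while pos <= len(text) - sub_len:
        if text[pos:pos + sub_len] == sub:
            indices.append(pos)
            pos += sub_len  # skip past the match that was just found
        else:
            pos += 1
    return indices


print(find_all_non_overlapping("aaaa", "aa"))  # [0, 2]
```

Checking every index instead would also report index 1 for the same input, because the occurrence starting there overlaps the first one.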

Find All Occurrences of a Substring in a String Using the find() Method in Python

The find() method is used to find the first occurrence of a substring in a string in Python. The syntax of the find() method is as follows.

myStr.find(sub_string, start, end)

Here, 

  • myStr is the input string in which we have to find the location of the sub_string.
  • The start and end parameters are optional. They accept the start index and the end index of the string between which we have to search for the sub_string. The character at the end index itself is not included in the search.

When we invoke the find() method on a string, it takes a substring as its input argument. After execution, it returns the start index of the substring if it is found. Otherwise, it returns -1. You can observe this in the following example.

myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
print(myStr.find("python"))
print(myStr.find("Aditya"))

Output:

5
-1

Here, you can see that the find() method returns the start index of the first occurrence of the substring python. On the other hand, it returns -1 for the substring Aditya as it is not present in myStr.
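
The optional start and end arguments can be illustrated in the same way. This is a small sketch using the same input string; the index values passed to find() are chosen only for illustration.

```python
myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."

# Searching from index 6 skips the first occurrence, which starts at index 5.
print(myStr.find("python", 6))       # 40
# The end index is exclusive, so the occurrence at 40..45 does not fit before index 40.
print(myStr.find("python", 6, 40))   # -1
# With end set to 46, the occurrence at 40..45 lies completely inside the search window.
print(myStr.find("python", 6, 46))   # 40
```

Note that the returned index always refers to the whole string, not to the position inside the search window.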

To find all occurrences of a substring in a string using the find() method in Python, we will use the following steps.

  • First, we will find the length of the input string and store it in the variable str_len.
  • Next, we will find the length of the substring and store it in the variable sub_len. We will also create a list named sub_indices to store the starting index of the occurrences of the substring.
  • After this, we will iterate through the input string using a for loop.
  • During iteration, we will invoke the find() method on the input string with the substring as its first input argument, the current index as its second input argument, and the current index + sub_len as its third input argument. In effect, we check whether the slice from the current index to the current index + sub_len is the substring we are searching for.
  • If the find() method returns a value other than -1, we will append it to sub_indices, because find() returns the start index of the substring when it is found. Then, we will move to the next iteration of the for loop.
  • If the find() method returns -1, we will move to the next iteration of the for loop.

After execution of the for loop, we will get the starting index of all the occurrences of the input substring in the sub_indices list. You can observe this in the following example.

myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
str_len = len(myStr)
sub_len = len(substring)
sub_indices = []
for temp in range(str_len - sub_len + 1):  # + 1 so that a match at the very end of the string is also checked
    index = myStr.find(substring, temp, temp + sub_len)
    if index != -1:
        sub_indices.append(index)
    else:
        continue
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))

Output:

The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[5, 40, 74]

Again, the above approach also reports overlapping occurrences of the substring in the output. To find only the non-overlapping occurrences of a substring in a string in Python, we can use a while loop and the find() method as shown below.

myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
str_len = len(myStr)
sub_len = len(substring)
sub_indices = []
temp = 0
while temp <= str_len - sub_len:
    index = myStr.find(substring, temp, temp + sub_len)
    if index != -1:
        sub_indices.append(index)
        temp = temp + sub_len
    else:
        temp = temp + 1
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))

Output:

The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[5, 40, 74]
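
The examples above restrict each find() call to a window of sub_len characters. As an alternative, we can let find() search the rest of the string and jump directly to the next match. The following is a minimal sketch of that idea; the helper name find_all_with_find and its overlapping parameter are hypothetical and not part of the original examples.

```python
def find_all_with_find(text, sub, overlapping=False):
    """Collect start indices by repeatedly calling str.find() from the last match."""
    if not sub:
        raise ValueError("sub must not be empty")
    indices = []
    # After a match, resume one character later (overlapping) or after the match.
    step = 1 if overlapping else len(sub)
    pos = text.find(sub)
    while pos != -1:
        indices.append(pos)
        pos = text.find(sub, pos + step)
    return indices


myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
print(find_all_with_find(myStr, "python"))                    # [5, 40, 74]
print(find_all_with_find("aaaa", "aa"))                       # [0, 2]
print(find_all_with_find("aaaa", "aa", overlapping=True))     # [0, 1, 2]
```

Here, the loop body runs once per match instead of once per character, and the search between matches is left to find().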

Find All Occurrences of a Substring in a String Using the startswith() Method in Python

The startswith() method is used to check whether a string starts with a certain substring in Python. The syntax of the startswith() method is as follows.

myStr.startswith(sub_string, start, end)

Here, 

  • myStr is the input string for which we have to check whether it starts with the sub_string.
  • The start and end parameters are optional. When they are given, the method checks whether the part of myStr between the start index and the end index starts with sub_string, that is, whether sub_string begins at the index start.

When we invoke the startswith() method on a string, it takes a substring as its input argument. After execution, it returns True if the string starts with the substring. Otherwise, it returns False. You can observe this in the following example.

myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
print(myStr.startswith("python"))
print(myStr.startswith("I am"))

Output:

False
True

Here, you can observe that the startswith() method returns False for the substring python. On the other hand, it returns True for the substring I am. This is because myStr starts with I am and not with python.
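
The optional start and end arguments change the position at which the check is made. The following short sketch uses a shortened input string and index values chosen only for illustration.

```python
myStr = "I am pythonforbeginners."

# The substring python begins at index 5 of myStr.
print(myStr.startswith("python", 5))       # True
# At index 6 the remaining text begins with "ython", so the check fails.
print(myStr.startswith("python", 6))       # False
# With end=10 only the characters at indices 5..9 ("pytho") are considered.
print(myStr.startswith("python", 5, 10))   # False
```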

To find all the occurrences of a substring in a string in Python using the startswith() method, we will use the following steps.

  • First, we will find the length of the input string and store it in the variable str_len.
  • Next, we will find the length of the substring and store it in the variable sub_len. We will also create a list named sub_indices to store the starting index of the occurrences of the substring.
  • After this, we will iterate through the input string using a for loop.
  • During iteration, we will invoke the startswith() method on the input string with the substring as its first input argument and the current index as its second input argument.
  • If the startswith() method returns False, it means that the substring doesn’t start at the current index. Hence, we will move to the next iteration of the for loop.
  • If the startswith() method returns True, it means that the substring starts at the current index. Hence, we will append the current index to sub_indices. After this, we will move to the next iteration of the for loop.

After execution of the for loop, we will get the starting index of all the occurrences of the input substring in the sub_indices list. You can observe this in the following example.

myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
str_len = len(myStr)
sub_len = len(substring)
sub_indices = []
for temp in range(str_len - sub_len + 1):  # + 1 so that a match at the very end of the string is also checked
    index = myStr.startswith(substring, temp)
    if index:
        sub_indices.append(temp)
    else:
        continue
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))

Output:

The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[5, 40, 74]

The approaches using the for loop also report overlapping occurrences of a substring in a string. To find only the non-overlapping occurrences of a substring in a string in Python, you can use the startswith() method and the while loop as shown below.

myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
str_len = len(myStr)
sub_len = len(substring)
sub_indices = []
temp = 0
while temp <= str_len - sub_len:
    index = myStr.startswith(substring, temp)
    if index:
        sub_indices.append(temp)
        temp = temp + sub_len
    else:
        temp = temp + 1
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))

Output:

The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[5, 40, 74]
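
When overlapping occurrences are acceptable, the check with startswith() can also be written as a list comprehension. This is a minimal sketch using the same input string; it checks every index up to and including the last possible start index.

```python
myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"

# Keep every index at which the substring starts.
sub_indices = [
    i
    for i in range(len(myStr) - len(substring) + 1)
    if myStr.startswith(substring, i)
]
print(sub_indices)  # [5, 40, 74]
```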

Find All Occurrences of a Substring in a String Using Regular Expressions in Python

Regular expressions provide us with a concise and powerful way to search text data in Python. We can also find all occurrences of a substring in a string in Python using the finditer() function provided in the re module. The syntax of the finditer() function is as follows.

re.finditer(sub_string, input_string)

Here, the input_string is the string in which we have to search the occurrences of sub_string.

The finditer() function takes a pattern as its first input argument and the original string as its second argument. Because the first argument is treated as a regular expression, a substring that may contain special characters such as . or + should be passed through re.escape() first. After execution, finditer() returns an iterator over the match objects for the non-overlapping matches of the substring. The match objects contain information about the start and end indices of the substring. We can obtain these indices by invoking the start() method and the end() method on a match object.

To find all occurrences of a substring in a string in Python using the finditer() function, we will use the following steps.

  • First, we will create a list named sub_indices to store the starting index of the occurrences of the substring.
  • After that, we will obtain the iterator containing the match objects for the substring. 
  • Once we get the iterator, we will use a for loop to iterate through the match objects. 
  • During iteration, we will invoke the start() method on the current match object. It will return the start index of the substring in the original string. We will append the index to sub_indices.

After execution of the for loop, we will get the starting index of all the occurrences of the given substring in the input string. You can observe this in the following example.

import re

myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
sub_indices = []
match_objects = re.finditer(re.escape(substring), myStr)  # re.escape() makes the substring match literally
for temp in match_objects:
    index = temp.start()
    sub_indices.append(index)
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))

Output:

The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[5, 40, 74]

Instead of using the for loop, you can also use list comprehension to find all occurrences of a substring in a string in Python as shown below.

import re

myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
match_objects = re.finditer(re.escape(substring), myStr)  # re.escape() makes the substring match literally
sub_indices = [temp.start() for temp in match_objects]
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))

Output:

The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[5, 40, 74]

In the above example, we have first obtained the match objects using the finditer() method. After that, we used list comprehension and the start() method to find the starting indices of the substring in myStr.
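
Two details of this approach are worth illustrating: a substring that contains regular expression metacharacters has to be escaped, and finditer() reports only non-overlapping matches unless the pattern is wrapped in a lookahead. The following is a minimal sketch; the helper name regex_find_all and its overlapping parameter are hypothetical and used only for illustration.

```python
import re


def regex_find_all(text, sub, overlapping=False):
    """Return the start indices of sub in text using re.finditer()."""
    # re.escape() makes characters such as "+" or "." match literally.
    pattern = re.escape(sub)
    if overlapping:
        # A zero-width lookahead consumes no characters, so the scan
        # can report matches that overlap each other.
        pattern = "(?=" + pattern + ")"
    return [match.start() for match in re.finditer(pattern, text)]


print(regex_find_all("a+b a+b", "a+b"))                  # [0, 4]
print(regex_find_all("aaaa", "aa"))                      # [0, 2]
print(regex_find_all("aaaa", "aa", overlapping=True))    # [0, 1, 2]
```

Without re.escape(), the pattern a+b would mean one or more a characters followed by b, and it would not match the literal text a+b.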

Find All Occurrences of a Substring in a String Using the flashtext Module in Python

Instead of using the approaches discussed above, you can use the flashtext module to find all occurrences of a substring in a string in Python. You can install the flashtext module with pip using the following command.

pip3 install flashtext

To find all the occurrences of a substring in a string using the flashtext module, we will use the following steps.

  • First, we will create a keyword processor object by calling KeywordProcessor().
  • After creating the keyword processor, we will add the substring to the keyword processor object using the add_keyword() method. The add_keyword() method, when invoked on the keyword processor object, will take the substring as its input argument.
  • Then, we will invoke the extract_keywords() method on the keyword processor object with the input string as its first argument and span_info set to True. It returns a list of tuples. Each tuple contains the substring as its first element, the start index of the substring as its second element, and the end index as its third element.
  • Finally, we will create an empty list named sub_indices and extract the starting indices of the substring from the list of tuples using a for loop.

After execution of the for loop, we will get the starting index of all the occurrences of the given substring in the input string in sub_indices. You can observe this in the following example.

from flashtext import KeywordProcessor

myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
sub_indices = []
kwp = KeywordProcessor()
kwp.add_keyword(substring)
result_list = kwp.extract_keywords(myStr,span_info=True)
for tuples in result_list:
    sub_indices.append(tuples[1])
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))

Output:

The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[40, 74]

Instead of using the for loop in the last step, you can also use list comprehension to find all occurrences of a substring in a string in Python as shown below.

from flashtext import KeywordProcessor

myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"
kwp = KeywordProcessor()
kwp.add_keyword(substring)
result_list = kwp.extract_keywords(myStr, span_info=True)
sub_indices = [tuples[1] for tuples in result_list]
print("The string is:", myStr)
print("The substring is:", substring)
print("The starting indices of the occurrences of {} in the string are:{}".format(substring, sub_indices))

Output:

The string is: I am pythonforbeginners. I provide free python tutorials for you to learn python.
The substring is: python
The starting indices of the occurrences of python in the string are:[40, 74]

In this approach, you can observe that the keyword processor extracts only two instances of the substring. This is because the keyword processor searches for entire words. The occurrence of python at index 5 is part of the longer word pythonforbeginners, so it is not included in the results.
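
If you want a similar whole-word result without installing an extra module, the word boundary anchor \b of the re module can be used. This is only a rough sketch that approximates the whole-word behaviour observed above for this input; it is not a reimplementation of flashtext, and the two may differ on other inputs.

```python
import re

myStr = "I am pythonforbeginners. I provide free python tutorials for you to learn python."
substring = "python"

# \b matches at a word boundary, so the occurrence of "python" that is
# followed by "forbeginners" is skipped.
pattern = r"\b" + re.escape(substring) + r"\b"
sub_indices = [match.start() for match in re.finditer(pattern, myStr)]
print(sub_indices)  # [40, 74]
```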

Conclusion

In this article, we have discussed different ways to find all occurrences of a substring in a string in Python. Out of all these methods, I would suggest you use the approach using regular expressions with list comprehension. It is concise, and the scanning is done inside the re module instead of in a loop written in Python. Keep in mind that finditer() reports non-overlapping matches and treats its first argument as a pattern.

To learn more about Python programming, you can read this article on how to remove all occurrences of a character in a list in Python. You might also like this article on how to check if a Python string contains a number.

Stay tuned for more informative articles.

Happy Learning!

The post Find All Occurrences of a Substring in a String in Python appeared first on PythonForBeginners.com.

