How to match alphabets in Python?


In this article, I am going to explain how to write a Python program that matches alphabetic characters and checks whether a string is a palindrome. Both tasks are covered in the same program.

We will also give a full explanation of the re module in Python.

Introduction

Text preprocessing is an important task in Natural Language Processing (NLP). For example, you may want to remove all punctuation marks from text documents before they are used for text classification, or you may want to extract numbers from a text string. Writing manual scripts for such preprocessing tasks takes a lot of effort and is prone to errors. Regular expressions, which are available in most programming languages, make these text preprocessing tasks much easier.
A Regular Expression is a text string that describes a search pattern which can be used to match or replace patterns inside a string with a minimal amount of code. In this tutorial, we will implement different types of regular expressions in the Python language.
To work with regular expressions, Python's built-in re module can be used. Import it with the following statement:
import re

Program:




print('---------------------CASE-1-----------------------\n')

str01 = 'abut-1-tuba'
punct = '-1@'  # characters to strip before the palindrome test
no_punct = ""

# Keep every character that is not in the unwanted set.
for char in str01:
    if char not in punct:
        no_punct = no_punct + char

# Reverse the cleaned string so the test ignores the removed characters.
str02 = no_punct[::-1]

print('String given:', str01)
print('String after removing bad characters:', no_punct)

if no_punct == str02:
    print('\nYES, it is a palindrome.')
else:
    print('\nNO, it is not a palindrome.')
 
print('-------------------------CASE-2-------------------\n')

str01 = '@allula'
punct = '-1@'  # characters to strip before the palindrome test
no_punct = ""

# Keep every character that is not in the unwanted set.
for char in str01:
    if char not in punct:
        no_punct = no_punct + char

# Reverse the cleaned string so the test ignores the removed characters.
str02 = no_punct[::-1]

print('String given:', str01)
print('String after removing bad characters:', no_punct)

if no_punct == str02:
    print('\nYES, it is a palindrome.')
else:
    print('\nNO, it is not a palindrome.')
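
The same cleanup can also be done with the re module instead of a manual loop. The following is only a minimal sketch: the helper name is_palindrome is hypothetical, and it assumes that every non-letter character should be ignored and that uppercase and lowercase letters count as equal, which is a slightly broader rule than the fixed punct string used above.

```python
import re

def is_palindrome(raw: str) -> bool:
    """Return True if raw reads the same both ways once non-letters are dropped."""
    # Drop everything that is not an ASCII letter, then lowercase the rest
    # so that 'A' and 'a' compare as equal.
    letters = re.sub(r"[^a-zA-Z]", "", raw).lower()
    return letters == letters[::-1]

for sample in ("abut-1-tuba", "@allula"):
    print(sample, "->", is_palindrome(sample))
```

With the two strings from the program above, the first is reported as a palindrome and the second is not.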

Searching Patterns in a String

One of the most common NLP tasks is to search if a string contains a certain pattern or not. For instance, you may want to perform an operation on the string based on the condition that the string contains a number.
To search for a pattern within a string, the match and search functions of the re package are used.

The match Function

Initialize a variable text with a text string as follows:
text = "text"
Let's write a regular expression that matches a string of any length made up of any characters:
result = re.match(r".*", text)
The first parameter of the match function is the regular expression that you want to search for. The pattern is written with an r prefix, which makes it a raw string literal so that backslashes reach the regex engine unchanged. Like any other string, the pattern is enclosed in single or double quotes.
The above expression will match the text string, since the dot matches any character except a newline and the asterisk allows any number of repetitions. If a match is found, the match function returns a match object (shown as re.Match in current Python 3 releases), as shown below:
type(result)
Output:
<class 're.Match'>
Now to find the matched string, you can use the following command:
result.group(0)
Output:
'text'
If no match is found, the match function returns None.
The previous expression matches a string of any length made up of any characters, so it will also match an empty string of length zero. To test this, update the value of the text variable with an empty string:
text = ""
Now, if you execute the same expression again, a match will still be found:
result = re.match(r".*", text)
Since the asterisk allows zero repetitions, even an empty string is matched.
To match a string with a length of at least 1, the following expression is used:
result = re.match(r".+", text)
Here the plus sign specifies that the string should have at least one character.
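
The difference between the asterisk and the plus sign can be checked side by side. This is a small sketch using the two sample strings from this section; the variable names star and plus are only illustrative.

```python
import re

for text in ("text", ""):
    star = re.match(r".*", text)  # zero or more characters
    plus = re.match(r".+", text)  # one or more characters
    # A failed match returns None, so compare against None
    # before calling group() on the result.
    print(repr(text), star is not None, plus is not None)
```

For the empty string, only the pattern with the asterisk produces a match object; the pattern with the plus sign returns None.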

Searching Alphabets

The match function can be used to find alphabetic letters at the start of a string. Let's initialize the text variable with the following text:
text = "your text"
Now, to find the letters, both uppercase and lowercase, we can use the following regular expression:
result = re.match(r"[a-zA-Z]+", text)
This expression matches any letter from lowercase a to lowercase z or from uppercase A to uppercase Z. The plus sign specifies that the match should contain at least one character. Let's print the match found by the above expression:
print(result.group(0))
Output:
your
In the output, you can see that only the first word, i.e. your, is returned. This is because the match function only matches at the beginning of the string and returns a single match. In the regex we asked for a run of lowercase and uppercase letters from a to z. The first run found was your. After the word your there is a space, which is not a letter, so the matching stopped and the expression returned just your.
However, there is a problem with this. If a string starts with a digit instead of a letter, the match function will return None even if there are letters after the digit. Let's see this in action:
text = "2000 text"
result = re.match(r"[a-zA-Z]+", text)
type(result)
Output:
<class 'NoneType'>
In the above script, we have updated the text variable and now it starts with a digit. We then used the match function to search for letters in the string. Though the text string contains letters, None is returned, since the match function only tries to match at the beginning of the string.
To solve this problem we can use the search function.

The search Function

The search function is similar to the match function, i.e. it tries to match the specified pattern. However, unlike the match function, it scans the whole string and returns the first position where the pattern matches, instead of trying only at the beginning. Therefore, the search function will return a match even if the string doesn't start with a letter but contains a letter elsewhere, as shown below:
text = "2000 text"
result = re.search(r"[a-zA-Z]+", text)
print(result.group(0))
Output:
text
The search function returns "text" since this is the first run of letters found in the text string.
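
To compare the behaviour directly, the sketch below runs one pattern through match, search and findall. The sample string is an assumption chosen for illustration, and findall is an additional function of the re module that returns every non-overlapping match as a list.

```python
import re

pattern = r"[a-zA-Z]+"
text = "2000 new data"  # illustrative sample that starts with digits

at_start = re.match(pattern, text)    # tries only at position 0
first_hit = re.search(pattern, text)  # scans for the first match
all_hits = re.findall(pattern, text)  # collects every match

print(at_start)            # None
print(first_hit.group(0))  # new
print(all_hits)            # ['new', 'data']
```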

Matching String from the Start

To check if a string starts with a specific word, you can use the caret, i.e. ^, followed by the word to match, with the search function as shown below. Suppose we have the following string:
text = "your text"
If we want to find out whether the string starts with "text", we can use the search function as follows:
result = re.search(r"^text", text)
type(result)
In the output, NoneType will be shown since the text string doesn't contain "text" directly at the start.
Now let's change the content of the text variable so that it begins with "new", and then check whether "new" is found at the beginning or not. Execute the following script:
text = "new data"
if re.search(r"^new", text):
    print("Match found")
else:
    print("Match not found")
Output:
Match found
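
If the word to look for comes from a variable, it may contain characters that have a special meaning in a regex. A minimal sketch of a reusable check follows; the helper name starts_with is hypothetical, and re.escape is used so that the word is always treated literally.

```python
import re

def starts_with(word: str, text: str) -> bool:
    """Return True if text begins with the literal word."""
    # re.escape neutralises metacharacters such as '.' or '$' inside word.
    return re.search(r"^" + re.escape(word), text) is not None

print(starts_with("new", "new data"))   # True
print(starts_with("data", "new data"))  # False
```

For plain literal prefixes like these, the built-in str.startswith method gives the same answer; the regex form becomes useful once the prefix itself is a pattern.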

Matching Strings from the End

To check whether a string ends with a specific word or not, we can use the word in the regular expression, followed by the dollar sign. The dollar sign marks the end of the string. Take a look at the following example:
text = "new data"
if re.search(r"2000$", text):
    print("Match found")
else:
    print("Match not found")
In the above script, we tried to find if the text string ends with "2000", which is not the case.
Output:
Match not found
Now if we update the string so that it ends with "2000", the above script will return "Match found" as shown below:
text = "In 2000"
if re.search(r"2000$", text):
    print("Match found")
else:
    print("Match not found")
Output:
Match found
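
One detail worth knowing when text is read from a file: by default the dollar sign also matches just before a single trailing newline, while \Z matches only at the very end of the string. The short sketch below illustrates this; the sample string with a trailing newline is an assumption for demonstration.

```python
import re

text = "In 2000\n"  # e.g. a line read from a file, newline still attached

print(bool(re.search(r"2000$", text)))   # True: $ tolerates the final newline
print(bool(re.search(r"2000\Z", text)))  # False: \Z requires the true end
```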

Substituting text in a String

So far we have been using regex to find out whether a pattern exists in a string. Let's move on to another regex function, i.e. substituting text in a string. The sub function is used for this purpose.
Let's take a simple example of the sub function. Suppose we have the following string:
text = "Pulp Fiction was released in 1994"
To replace the string "Pulp Fiction" with "Forrest Gump" (another movie released in 1994) we can use the sub function as follows:
result = re.sub(r"Pulp Fiction", "Forrest Gump", text)
The first parameter to the sub function is the regular expression that finds the pattern to substitute. The second parameter is the new text that you want as a replacement for the old text and the third parameter is the text string on which the substitute operation will be performed.
If you print the result variable, you will see the new string.
Now let's substitute all the letters in our string with the character "X". Execute the following script:
text = "Year 2080"
result = re.sub(r"[a-z]", "X", text)
print(result)
Output:
YXXX 2080
It can be seen from the output that all the letters have been replaced except the capital one. This is because we specified a-z only and not A-Z. There are two ways to solve this problem. You can either specify A-Z in the regular expression along with a-z as follows:
result = re.sub(r"[a-zA-Z]", "X", text)
Or you can pass the additional parameter flags to the sub function and set its value to re.I which refers to case insensitive, as follows:
result = re.sub(r"[a-z]", "X", text, flags=re.I)
More details about the different types of flags can be found in the official Python documentation for the re module.
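
As a quick check of the case-insensitive flag, the sketch below uses re.subn, a variant of sub in the same module that also returns how many replacements were made. The sample string is an assumption for illustration.

```python
import re

text = "Year 2080"  # illustrative sample with one capital letter

# Without the flag only lowercase letters are replaced.
print(re.sub(r"[a-z]", "X", text))  # YXXX 2080

# With re.I the same pattern also covers uppercase letters.
result, count = re.subn(r"[a-z]", "X", text, flags=re.I)
print(result, count)  # XXXX 2080 4
```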

Patterns


You can list several characters to match or substitute in a string by placing them inside square brackets, which is called a character class. In fact, we did this when we matched capital and small letters. Let's group multiple punctuation marks and remove them from a string:

text = "'@new' was ? one _ in % $ year 2000."
result = re.sub(r"[,@\'?\.$%_]", "", text, flags=re.I)
print(result)
Output:
new was  one  in   year 2000
You can see that the string in the text variable had multiple punctuation marks, and we grouped all of them in the regular expression using square brackets. The dot and the single quote are written with a backslash here. Outside square brackets an unescaped dot matches any character, so escaping it is required there; inside a character class the dot is already literal, so the backslash is optional. Likewise, the single quote only needs escaping when the pattern itself is enclosed in single quotes.
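
Removing punctuation this way leaves runs of spaces behind, as the output above shows. A possible follow-up step, sketched here under the assumption that single spaces between words are wanted, collapses each run of whitespace with a second sub call. The variable names no_punct and tidy are only illustrative.

```python
import re

text = "'@new' was ? one _ in % $ year 2000."

# Step 1: delete the listed punctuation marks (no escapes are needed here
# because the pattern is in double quotes and '.' is literal inside [...]).
no_punct = re.sub(r"[,@'?.$%_]", "", text)

# Step 2: squeeze every run of whitespace into a single space.
tidy = re.sub(r"\s+", " ", no_punct).strip()

print(tidy)  # new was one in year 2000
```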

