Gist kshirsagarsiddharth/8c443217be7bec28dae1e8546a9078e8, created December 27, 2022 09:10
import string


def clean_text(text):
    # Characters to delete: all ASCII punctuation (this covers special
    # characters such as # and $) plus every whitespace character except
    # the space itself (tab, newline, carriage return, vertical tab, form feed).
    # Same set as string.printable.replace(' ', '')[62:], because
    # string.printable starts with the 10 digits and 52 ASCII letters.
    to_delete = string.punctuation + string.whitespace.replace(' ', '')
    translator = str.maketrans('', '', to_delete)
    # Use the translate method to remove the characters
    cleaned = text.translate(translator)
    # Remove leading and trailing whitespace
    return cleaned.strip()


# Test the function
text = "This is a sample text with punctuation (like commas and exclamation points)! It also includes letters (both uppercase and lowercase), numbers (like 123 and 456), and special characters (like # and $). Some people might find it confusing or difficult to read, but with the right tools and techniques, it's easy to clean and analyze this text."
print(clean_text(text))
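A deletion table makes `str.translate` remove characters outright rather than replace them, so a tab or newline that is the only separator between two words disappears and the words are joined. Deleting a standalone symbol also leaves a double space behind (for example, `like # and $` in the sample text becomes `like  and `). The sketch below is one possible alternative, assuming word boundaries should be preserved; `clean_text_keep_spaces` is a hypothetical helper, not part of the gist. It uses the three-argument form of `str.maketrans`, which maps each character of the first string to the character at the same position in the second and deletes every character of the third. It then collapses runs of spaces with `split`/`join`.

```python
import string


def clean_text_keep_spaces(text):
    # Hypothetical variant: map tab, newline, carriage return, vertical tab
    # and form feed to a plain space instead of deleting them.
    other_ws = string.whitespace.replace(' ', '')
    # str.maketrans(x, y, z): x[i] -> y[i]; characters in z are deleted.
    translator = str.maketrans(other_ws, ' ' * len(other_ws), string.punctuation)
    cleaned = text.translate(translator)
    # split() with no argument splits on any whitespace run and drops
    # leading/trailing whitespace, so this collapses and strips in one step.
    return ' '.join(cleaned.split())


sample = "Hello,\nworld!\tNumbers: 123 & 456."
print(clean_text_keep_spaces(sample))  # Hello world Numbers 123 456
```

Mapping whitespace in the same table as the punctuation deletion keeps the whole cleanup to a single `translate` pass before the final split.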