@kshirsagarsiddharth
Created December 27, 2022 09:38
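
```python
import re


def clean_social_media_data(text):
    """Strip hashtags, mentions, emojis, URLs and punctuation from a social media post."""
    # Remove hashtags and mentions
    text = re.sub(r'#\w+', '', text)
    text = re.sub(r'@\w+', '', text)
    # Remove emojis (note: this drops every non-ASCII character, including accented letters)
    text = re.sub(r'[^\x00-\x7F]+', '', text)
    # Remove URL links
    text = re.sub(r'http\S+', '', text)
    # Remove punctuation (letters, digits, underscores and whitespace are kept)
    text = re.sub(r'[^\w\s]', '', text)
    # Collapse whitespace runs left behind by the removals, then trim both ends
    text = re.sub(r'\s+', ' ', text).strip()
    return text


# Test the function
text = "I had a great time at the party last night! 😎 #party #friends @siddharth @sid"
print(clean_social_media_data(text))
# Output: I had a great time at the party last night
```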
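To see what each substitution contributes, the same pipeline can be written as an ordered list of compiled patterns and applied one step at a time, recording the text after every step. This is a minimal sketch: the helper name `trace_cleaning`, the `STEPS` table and the final whitespace-collapsing step are illustrative additions, not part of the original gist. The patterns and their order are the gist's own.

```python
import re

# Same substitutions and order as clean_social_media_data, compiled once.
# The "collapse whitespace" step is an added assumption for tidier output.
STEPS = [
    ("hashtags", re.compile(r"#\w+"), ""),
    ("mentions", re.compile(r"@\w+"), ""),
    ("non-ASCII (emojis)", re.compile(r"[^\x00-\x7F]+"), ""),
    ("URLs", re.compile(r"http\S+"), ""),
    ("punctuation", re.compile(r"[^\w\s]"), ""),
    ("collapse whitespace", re.compile(r"\s+"), " "),
]


def trace_cleaning(text):
    """Hypothetical helper: apply each step in turn and record (label, text) pairs."""
    history = [("input", text)]
    for label, pattern, replacement in STEPS:
        text = pattern.sub(replacement, text)
        history.append((label, text))
    # Final trim of leading/trailing whitespace, as in the original function
    history.append(("strip", text.strip()))
    return history


if __name__ == "__main__":
    samples = [
        "I had a great time at the party last night! 😎 #party #friends @siddharth @sid",
        "Read this https://example.com/post?id=1 #news",
        "Café with @sid was great!",
    ]
    for sample in samples:
        for label, value in trace_cleaning(sample):
            print(f"{label:>20}: {value!r}")
        print()
```

The last sample shows a side effect worth knowing about: because `[^\x00-\x7F]+` removes every non-ASCII character rather than only emojis, `"Café with @sid was great!"` ends up as `"Caf with was great"`.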