Skip to content

Instantly share code, notes, and snippets.

@aychedee
Last active December 15, 2015 14:39
Show Gist options
  • Save aychedee/5276197 to your computer and use it in GitHub Desktop.
Save aychedee/5276197 to your computer and use it in GitHub Desktop.
Regex to flexibly match anything that looks url like
import re
url_regex = re.compile(r"""[^\s] # not whitespace
[a-zA-Z0-9:/\-]+ # the protocol and domain name
\.(?!\.) # A literal '.' not followed by another
[\w\-\./\?=&%~#]+ # country and path components
[^\s] # not whitespace""", re.VERBOSE)
test_corpus = """
http://www.example.com and some text. www.example.com/?q=4897&7845%20 example.co.nz example.com/#my-tab
"""
print url_regex.findall(test_corpus)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment