-
regex: Regular expressions are text patterns that define the form a text string should have.
- useful for checking email patterns
- matching both "color" and "colour"
- extracting specific info like postal codes
LOL: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." (attributed to Jamie Zawinski)
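A minimal sketch of the "color"/"colour" case, assuming Python's built-in re module. In regex syntax, ? means "zero or one of the preceding item", which is different from the single-char file-name wildcard described below.

```python
import re

# "u?" makes the "u" optional, so one pattern covers both spellings.
pattern = re.compile(r"colou?r")

text = "The color of the flag; the colour of the sea."
matches = pattern.findall(text)

print(matches)  # ['color', 'colour']
```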
-
How regex started (birth of grep): Ken Thompson's work didn't end with just writing a paper. He included support for these regular expressions in his version of the QED editor. To search with a regular expression in QED, you had to write:
g/<regular expression>/p
Here g means global search and p means print. If we replace <regular expression> with the short form re, we get g/re/p, and therefore the beginnings of the venerable UNIX command-line tool grep.
? : matches a single char (file?.xml matches file1.xml and file9.xml but not file99.xml)
* : matches any number of chars (including none)
In file?.xml, the literals are file and .xml, and the metacharacter is ? (or *).
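The ? and * above are shell-style wildcards (globs) rather than regex syntax. As a sketch, Python's standard fnmatch module applies the same rules; fnmatchcase is used here so the result does not depend on the OS's case handling.

```python
from fnmatch import fnmatchcase

names = ["file1.xml", "file9.xml", "file99.xml", "file.xml"]

# "?" stands for exactly one character.
single = [n for n in names if fnmatchcase(n, "file?.xml")]

# "*" stands for any number of characters, including zero.
many = [n for n in names if fnmatchcase(n, "file*.xml")]

print(single)  # ['file1.xml', 'file9.xml']
print(many)    # ['file1.xml', 'file9.xml', 'file99.xml', 'file.xml']
```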
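/a\w*/ ==> matches an 'a' followed by any number of word chars, anywhere in the text (so it also matches "at" inside "cat"); /\ba\w*/ matches only words starting with 'a'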
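A quick check with Python's re module of what a\w* actually matches. The \b (word boundary) in the second pattern is one way to restrict matches to words that start with 'a'.

```python
import re

text = "an apple and a cat"

# Without an anchor, "a\w*" also matches the "at" inside "cat".
loose = re.findall(r"a\w*", text)

# \b requires the match to begin at the start of a word.
strict = re.findall(r"\ba\w*", text)

print(loose)   # ['an', 'apple', 'and', 'a', 'at']
print(strict)  # ['an', 'apple', 'and', 'a']
```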
Metachars and literals can coexist in a pattern, but what if we need to use a metachar as a literal? There are 3 ways to do it:
- escape the metachar by preceding it with a backslash
- in Python, use re.escape()
- quote with \Q and \E (not supported by Python's re module)
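A small sketch of the first two approaches in Python (the \Q...\E form is left out because Python's re does not support it). The sample strings are made up for illustration.

```python
import re

price = "Total: $5.00 (incl. tax)"

# 1) Escape metachars by hand: "\$" is a literal dollar, "\." a literal dot.
manual = re.search(r"\$5\.00", price)

# 2) Let re.escape() insert the backslashes for us.
literal = "(incl. tax)"
auto = re.search(re.escape(literal), price)

print(manual.group())  # $5.00
print(auto.group())    # (incl. tax)
```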
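There are 12 metachars that should be escaped when you need to use them as literal chars:
- \ Backslash
- ^ Caret
- $ Dollar sign
- . Dot
- | Pipe symbol
- ? Question mark
- * Asterisk
- + Plus sign
- ( Opening parenthesis
- ) Closing parenthesis
- [ Opening square bracket
- { Opening curly brace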
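A sketch that checks, with Python's re module, that a leading backslash turns each of the 12 metachars into a literal, and shows what happens to the dot when it is left unescaped.

```python
import re

# The 12 metachars: \ ^ $ . | ? * + ( ) [ {
METACHARS = "\\^$.|?*+()[{"

for ch in METACHARS:
    # A backslash in front makes the pattern match the char itself.
    assert re.fullmatch("\\" + ch, ch), ch

# Unescaped, "." is a wildcard that matches any char except newline.
print(bool(re.fullmatch(".", "x")))    # True
print(bool(re.fullmatch(r"\.", "x")))  # False
```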
Character classes let us define a single char position that matches if any one of the chars in the set is present.
For example, to match both "license" and "licence" --> /licen[sc]e/. We can also use ranges of letters, e.g. [b-e], or of digits, e.g. [2-9]. Ranges can be combined: [0-9a-zA-Z]
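A minimal Python sketch of a two-char class and of combined ranges; the sample strings are invented for illustration.

```python
import re

text = "A license (US) or a licence (UK), not a licenxe."

# [sc] matches exactly one char: either "s" or "c".
spellings = re.findall(r"licen[sc]e", text)

# Combined ranges: digits plus lower-case and upper-case letters.
alnum = re.findall(r"[0-9a-zA-Z]+", "id=A7b, code: 42!")

print(spellings)  # ['license', 'licence']
print(alnum)      # ['id', 'A7b', 'code', '42']
```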
- Negated classes: [^0-9] matches any char that is not a digit, but there has to be a char in that position, e.g. /hello[^0-9]/ won't match "hello" on its own, as there is no char after it
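A quick Python check of the negated-class behaviour: the class still consumes exactly one char, so it cannot match at the end of the string.

```python
import re

pattern = re.compile(r"hello[^0-9]")

ok = pattern.search("hello!")      # "!" is not a digit -> match
digit = pattern.search("hello5")   # "5" is a digit -> no match
bare = pattern.search("hello")     # no char left for [^0-9] -> no match

print(bool(ok), bool(digit), bool(bare))  # True False False
```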
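| Element | Description |
|---|---|
| . | matches any char except newline |
| \d | matches any decimal digit, equivalent to [0-9] |
| \D | matches any non-digit, equivalent to [^0-9] |
| \s | matches any whitespace char, equivalent to [ \t\n\r\f\v] |
| \S | matches any non-whitespace char, equivalent to [^ \t\n\r\f\v] |
| \w | matches any alphanumeric char or underscore, equivalent to [0-9a-zA-Z_] |

(In Python 3, \d, \s and \w also match non-ASCII digits, whitespace and letters in str patterns unless the re.ASCII flag is set; the equivalents above hold for ASCII.)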
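A short sketch using the shorthand classes in Python on an invented log line, plus a check of the Unicode behaviour: "\u0663" is ARABIC-INDIC DIGIT THREE, which \d matches by default but not with re.ASCII.

```python
import re

log = "user_42  logged in\tat 09:15"

digits = re.findall(r"\d+", log)      # runs of digits
words = re.findall(r"\w+", log)       # runs of letters, digits, underscore
squashed = re.sub(r"\s+", " ", log)   # collapse each whitespace run to one space

print(digits)    # ['42', '09', '15']
print(words)     # ['user_42', 'logged', 'in', 'at', '09', '15']
print(squashed)  # user_42 logged in at 09:15

# Unicode digits match \d unless re.ASCII is given.
unicode_three = "\u0663"
print(bool(re.fullmatch(r"\d", unicode_three)))            # True
print(bool(re.fullmatch(r"\d", unicode_three, re.ASCII)))  # False
```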
[^/] -> matches any char that is not a forward slash (a backslash still matches; use [^\\/] to exclude both)
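A sketch in Python showing [^/]+ splitting a path into segments, and that a backslash is only excluded once it is added to the class. The paths are made-up examples.

```python
import re

path = "usr/local/bin"

# [^/]+ grabs runs of chars that are not "/", i.e. the path segments.
segments = re.findall(r"[^/]+", path)

# Backslashes are NOT excluded by [^/]; add them to the class to exclude both.
win_path = "C:\\Users\\me/docs"
slash_only = re.findall(r"[^/]+", win_path)
both = re.findall(r"[^\\/]+", win_path)

print(segments)    # ['usr', 'local', 'bin']
print(slash_only)  # ['C:\\Users\\me', 'docs']
print(both)        # ['C:', 'Users', 'me', 'docs']
```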