If you have encountered regular expressions, they may have seemed like a random string of gibberish. Their syntax can look confusing, but they are also extremely useful, and understanding them will make you a much more effective programmer. To fully understand the regex world, you first need to learn the basic concepts, on which you can later build.
A regular expression is a string that describes a pattern, e.g., email addresses or phone numbers. In JavaScript, regular expressions are objects. JavaScript provides the built-in RegExp type that allows you to work with regular expressions effectively.
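As a minimal sketch of what "regular expressions are objects" means in practice, the snippet below creates the same pattern twice: once as a literal and once through the RegExp constructor. The sample phone text and the variable names are made up for illustration.

```javascript
// Two equivalent ways to create the same pattern (three digits, a dash, four digits).
const literalPattern = /\d{3}-\d{4}/;
const constructedPattern = new RegExp('\\d{3}-\\d{4}'); // backslashes are doubled inside a string

const samplePhone = 'Call 555-0199 today'; // made-up sample text

console.log(literalPattern instanceof RegExp);     // true
console.log(literalPattern.test(samplePhone));     // true
console.log(constructedPattern.test(samplePhone)); // true
console.log(constructedPattern.source);            // \d{3}-\d{4}
```

The literal form is usually easier to read; the constructor form is useful when the pattern is only known at run time.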
In this tutorial, you’ll learn the basics of JavaScript regular expressions. By the end, you’ll know how to use some of them effectively to search and replace strings. The tutorial covers:
- Anchors
- Quantifiers
- Grouping Constructs
- Bracket Expressions
- Character Classes
- The OR Operator
- Flags
- Character Escapes
Anchors have special meaning in regular expressions. They do not match any character. Instead, they match a position before or after characters:
- ^ (caret) matches the beginning of the text.
- $ (dollar) matches the end of the text.
See the following example:
let str = 'JavaScript';
console.log(/^J/.test(str));
output:
true
Feature | Syntax | Description | Example |
---|---|---|---|
String anchor | ^ (caret) | Matches at the start of the string the regex pattern is applied to. | ^. matches a in abc\ndef |
String anchor | $ (dollar) | Matches at the end of the string the regex pattern is applied to. | .$ matches f in abc\ndef |
Line anchor | ^ (caret) | With the m (multiline) flag, matches after each line break in addition to matching at the start of the string, thus matching at the start of each line in the string. | ^. matches a and d in abc\ndef |
Line anchor | $ (dollar) | With the m (multiline) flag, matches before each line break in addition to matching at the end of the string, thus matching at the end of each line in the string. | .$ matches c and f in abc\ndef |
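The rows of the table above can be checked with a short sketch. It assumes the line-anchor behaviour is switched on with the m (multiline) flag, which is how JavaScript enables it; the variable names are only illustrative.

```javascript
const twoLines = 'abc\ndef'; // the sample text used in the table above

// Without the m flag, ^ and $ match only at the start and end of the whole string.
const firstChar = twoLines.match(/^./g);   // ['a']
const lastChar = twoLines.match(/.$/g);    // ['f']

// With the m flag, ^ and $ also match at the start and end of every line.
const lineStarts = twoLines.match(/^./gm); // ['a', 'd']
const lineEnds = twoLines.match(/.$/gm);   // ['c', 'f']

console.log(firstChar, lastChar, lineStarts, lineEnds);
```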
Quantifiers match a number of instances of a character, group, or character class in a string. The simplest quantifier is the exact count {n}: a number in curly braces. When you append it to a character or character class, it specifies how many times that character or character class must occur.
For example, the regular expression /\d{4}/ matches a four-digit number. It is the same as /\d\d\d\d/:
let str = 'ECMAScript 2020';
let re = /\d{4}/;
let result = str.match(re);
console.log(result);
output:
["2020"]
Feature | Syntax | Description | Example |
---|---|---|---|
Range | {n,m} | Matches a character or character class from n to m times. | /\d{2,4}/g finds numbers that have two, three, or four digits |
Shorthand + | + (plus) | Means one or more. It is the same as {1,}. | /\d+/ matches a number of any length |
Shorthand ? | ? (question mark) | Means zero or one. It is the same as {0,1}. | /colou?r/ matches both color and colour |
Lazy quantifier | ?? | Makes the preceding item optional. Lazy, so the optional item is excluded in the match if possible. | abc?? matches ab or abc |
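To see the quantifiers from the table side by side, here is a small sketch. The sample sentence and variable names are invented for the example.

```javascript
// Made-up sample text containing numbers of different lengths and both spellings.
const sampleText = 'Rooms 7, 42, 256 and 10000 use the color or colour code';

// {2,4} is greedy: it takes as many digits as it can, up to four at a time.
const twoToFourDigits = sampleText.match(/\d{2,4}/g); // ['42', '256', '1000']

// + means one or more, so every run of digits is matched whole.
const allNumbers = sampleText.match(/\d+/g);          // ['7', '42', '256', '10000']

// ? makes the preceding character optional.
const bothSpellings = sampleText.match(/colou?r/g);   // ['color', 'colour']

// The greedy form takes the optional c; the lazy form (??) leaves it out when it can.
const greedyC = 'abc'.match(/abc?/)[0];  // 'abc'
const lazyC = 'abc'.match(/abc??/)[0];   // 'ab'

console.log(twoToFourDigits, allNumbers, bothSpellings, greedyC, lazyC);
```

Note that /\d{2,4}/g returns 1000 for the five-digit number: the quantifier stops after four digits, and the single digit left over is too short to match on its own.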
Alternatives in a pattern often share a common part, which could be ordinary characters or regexp constructs such as anchors. In such cases, you can factor out the common part and group the alternatives using a pair of parentheses. Similar to a(b+c)d = abd+acd in maths, you get a(b|c)d = abd|acd in regular expressions. For example:
// without grouping
> 'red reform read arrest'.replace(/reform|rest/g, 'X')
< "red X read arX"
// with grouping
> 'red reform read arrest'.replace(/re(form|st)/g, 'X')
< "red X read arX"
Feature | Syntax | Description | Example |
---|---|---|---|
Capturing group | (regex) | Parentheses group the regex between them. They allow you to apply regex operators to the entire grouped regex. | (abc){3} matches abcabcabc. First group matches abc. |
Backreference | \1 through \9 | Matches the same text that was matched by the 1st through 9th numbered capturing group | (abc)=\1 matches abc=abc, but not abc=def |
Non-capturing group | (?:regex) | Non-capturing parentheses group the regex so you can apply regex operators, but do not capture anything. | (?:abc){3} matches abcabcabc. No groups. |
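The three constructs in the table can be compared in a short sketch. The doubled-word pattern and the name-swapping replacement are illustrative examples chosen here, not taken from the table.

```javascript
// Capturing group: the quantifier applies to the whole group, and group 1 is remembered.
const repeated = 'abcabcabc'.match(/(abc){3}/);
console.log(repeated[0]); // abcabcabc
console.log(repeated[1]); // abc

// Non-capturing group: same match, but nothing is stored besides the full match.
const notCaptured = 'abcabcabc'.match(/(?:abc){3}/);
console.log(notCaptured.length); // 1

// Backreference: \1 must repeat exactly what group 1 matched.
const doubledWord = /\b(\w+) \1\b/;
console.log(doubledWord.test('it is is fine')); // true
console.log(doubledWord.test('it is fine'));    // false

// In a replacement string, $1 and $2 refer to the captured groups.
const swapped = 'John Smith'.replace(/(\w+) (\w+)/, '$2, $1');
console.log(swapped); // Smith, John
```

A non-capturing group is a reasonable default when you only need grouping, since it keeps the numbering of the groups you do care about stable.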
Brackets indicate a set of characters to match. Any single character between the brackets will match; for example, [a-h] matches any one letter from a to h. Ranges can also be digits like [0-9] or capital letters like [A-Z].
var regex = /[a-z]ear/;
console.log(regex.test('fear'));
// returns true
Feature | Syntax | Description | Example |
---|---|---|---|
Literal opening bracket | "[" (opening square bracket) | An opening square bracket is a literal character that adds an opening square bracket to the character class. | [ab[cd]ef] matches aef], bef], [ef], cef], and def]. |
Backslash escapes a metacharacter | \ (backslash) followed by any of ^-]\ | A backslash escapes special characters to suppress their special meaning. | [\^\]] matches ^ or ] |
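A brief sketch of bracket expressions in use, including a negated set and escaped metacharacters. The sample strings are invented for the example.

```javascript
// A range matches any one character in it: only lowercase letters qualify here.
const rangeWords = 'fear near 9ear Year'.match(/[a-z]ear/g); // ['fear', 'near']

// A caret right after the opening bracket negates the set.
const notDigits = 'a1b2'.match(/[^0-9]/g);                   // ['a', 'b']

// Escaped with a backslash, ^ and ] are ordinary members of the set.
const caretOrBracket = 'x^y]z'.match(/[\^\]]/g);             // ['^', ']']

console.log(rangeWords, notDigits, caretOrBracket);
```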
Character classes distinguish between kinds of characters, such as letters and digits. For example:
const chessStory = 'He played the King in a8 and she moved her Queen in c2.';
const regexpCoordinates = /\w\d/g;
console.log(chessStory.match(regexpCoordinates));
// expected output: Array [ 'a8', 'c2']
Feature | Syntax | Description | Example |
---|---|---|---|
Literal character | Any character except ^-]\ | All characters except the listed special characters are literal characters that add themselves to the character class. | [abc] matches a, b or c |
Meta-characters | \d, \w and \s | Meta-characters are characters with a special meaning. There are many of them, but these are the most important ones: \d matches any digit, \w matches any word character (letter, digit, or underscore), and \s matches any whitespace character. | \d matches any digit character (same as [0-9]) |
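The most common shorthand classes are \d (digit), \w (word character), and \s (whitespace), with \D as the negated form of \d. The sketch below applies them to a made-up sentence.

```javascript
const note = 'Gate B12 opens at 9 am'; // made-up sample text

const digits = note.match(/\d/g);           // ['1', '2', '9']
const words = note.match(/\w+/g);           // ['Gate', 'B12', 'opens', 'at', '9', 'am']
const spaceCount = note.match(/\s/g).length; // 5

// The uppercase form negates the class: \D matches anything that is not a digit.
const leadingNonDigits = note.match(/\D+/)[0]; // 'Gate B'

console.log(digits, words, spaceCount, leadingNonDigits);
```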
In a regular expression, the OR operator is the vertical bar |, called alternation: the pattern a|b matches either a or b. It is the pattern counterpart of JavaScript's logical OR (||) operator (logical disjunction), which for a set of operands is true if and only if one or more of its operands is true. The || operator is typically used with Boolean (logical) values, in which case it returns a Boolean value. However, it actually returns the value of one of the specified operands, so if it is used with non-Boolean values, it returns a non-Boolean value. For example:
const a = 3;
const b = -2;
console.log(a > 0 || b > 0);
// expected output: true
Feature | Syntax | Description | Example |
---|---|---|---|
Alternation | expr1 (OR) expr2 | A value that can be converted to true is called truthy. A value that can be converted to false is called falsy. | If expr1 can be converted to true, returns expr1; else, returns expr2. |
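Inside a regular expression, OR is written with a single vertical bar | (alternation) rather than ||. The following sketch, with made-up sample strings, shows it matching either alternative and why grouping matters when anchors are involved.

```javascript
const pets = 'I have a cat, a dog and a catfish'; // made-up sample text

// cat|dog matches either word wherever it occurs (including inside "catfish").
const catOrDog = pets.match(/cat|dog/g); // ['cat', 'dog', 'cat']

// Without a group, | splits the whole pattern: this means "^cat" OR "dog$".
const looseAnchors = /^cat|dog$/;
console.log(looseAnchors.test('catalog')); // true, because ^cat matches

// With a group, both anchors apply to each alternative.
const strictAnchors = /^(?:cat|dog)$/;
console.log(strictAnchors.test('catalog')); // false
console.log(strictAnchors.test('dog'));     // true

console.log(catOrDog);
```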
Flags are optional tokens that modify how a regular expression searches. Each flag is denoted by a single alphabetic character and serves a different purpose. For example, the flag i, which stands for ignore case, carries out a case-insensitive search. Similarly, the flag g, which stands for global, extends the search to find all matches for a given expression inside a string, instead of stopping at the first match.
Syntax | Description |
---|---|
i | With this flag the search is case-insensitive: no difference between A and a. |
g | With this flag the search looks for all matches, without it – only the first match is returned. |
s | Enables “dotall” mode, which allows a dot . to match the newline character \n |
u | Enables full Unicode support, including correct processing of surrogate pairs |
y | “Sticky” mode: matches only at the exact position in the text given by the lastIndex property |
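Each flag in the table can be tried out in isolation. The sketch below uses invented sample strings; the emoji is written as a code point escape so the example does not depend on file encoding.

```javascript
const flagSample = 'JavaScript and javascript'; // made-up sample text

// i: case-insensitive; g: all matches instead of only the first.
const firstOnly = flagSample.match(/javascript/i)[0]; // 'JavaScript'
const allMatches = flagSample.match(/javascript/gi);  // ['JavaScript', 'javascript']

// s: the dot also matches a newline.
const dotWithoutS = /a.b/.test('a\nb'); // false
const dotWithS = /a.b/s.test('a\nb');   // true

// u: a surrogate pair is treated as one character.
const emoji = '\u{1F600}';
const withoutU = /^.$/.test(emoji); // false, the dot sees two code units
const withU = /^.$/u.test(emoji);   // true

// y: the match must start exactly at lastIndex.
const sticky = /dog/y;
sticky.lastIndex = 4;
const stickyAtFour = sticky.test('cat dog'); // true, "dog" starts at index 4
sticky.lastIndex = 0;
const stickyAtZero = sticky.test('cat dog'); // false, "dog" does not start at index 0

console.log(firstOnly, allMatches, dotWithoutS, dotWithS, withoutU, withU, stickyAtFour, stickyAtZero);
```

Flags can be combined, as in /javascript/gi above, and their order does not matter.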
The / character is used as a delimiter for RegExp literals. Although you can manually escape metacharacters where needed in a pattern you write yourself, when constructing a regexp from an arbitrary string you have to escape all the metacharacters in it. For example:
function escapeRegExp(string) {
  // $& stands for the matched text, so each metacharacter gets a backslash in front of it
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
For more examples you can visit https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
Feature | Syntax | Description | Example |
---|---|---|---|
Character escape | \n, \r and \t | Match an LF character, CR character and a tab character respectively. | \r\n matches a Windows CRLF line break. |
Control character escape | \cA through \cZ | Match an ASCII character Control+A through Control+Z, equivalent to \x01 through \x1A | \cM\cJ matches a Windows CRLF line break |
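To tie the escaping function to the escapes in the table, here is a hedged usage sketch. The escapeRegExp function is repeated so the snippet runs on its own; the user input and sample strings are hypothetical.

```javascript
// Repeated from above so this snippet is self-contained.
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = '1+1=2 (really?)'; // hypothetical text typed by a user
const escapedInput = escapeRegExp(userInput);
console.log(escapedInput); // 1\+1=2 \(really\?\)

// Unescaped, + is a quantifier, so the pattern does not match the literal text.
const unescapedMatches = new RegExp('1+1=2').test('1+1=2');             // false
const escapedMatches = new RegExp(escapeRegExp('1+1=2')).test('1+1=2'); // true

// The escaped string can be used safely in a dynamically built pattern.
const replaced = 'Is 1+1=2 (really?) true'.replace(new RegExp(escapedInput, 'g'), 'X');
console.log(replaced); // Is X true

// \r\n and \cM\cJ both describe a Windows CRLF line break.
const windowsText = 'line one\r\nline two';
const windowsLines = windowsText.split(/\r\n/);  // ['line one', 'line two']
const controlForm = /\cM\cJ/.test(windowsText);  // true

console.log(unescapedMatches, escapedMatches, windowsLines, controlForm);
```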
Mohamed Mesahel, PMP Current Student and Web Developer at Full-stack Coding Bootcamp, University of Washington.