@MohamedMesahel
Last active September 11, 2021 00:21
Regex Tutorial - JavaScript

If you have encountered regular expressions, they may have looked like a random string of gibberish. Although their syntax can be confusing, they are extremely useful, and understanding them will make you a more effective programmer. To find your way around regex, you first need to learn the basic concepts, on which you can later build.

A regular expression is a sequence of characters that describes a pattern, such as the shape of an email address or a phone number. In JavaScript, regular expressions are objects: the built-in RegExp type lets you create them and work with them.
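
As a small illustration of regular expressions being objects, the sketch below builds the same pattern twice, once as a literal and once with the RegExp constructor. The phone-number-like pattern and the sample text are made up for the example.

```javascript
// The same pattern written as a literal and built with the RegExp constructor.
const literal = /\d{3}-\d{4}/;
// Inside a string, each backslash is doubled so the pattern receives "\d".
const constructed = new RegExp('\\d{3}-\\d{4}');

const sample = 'Call 555-1234 today'; // hypothetical sample text

const literalIsObject = literal instanceof RegExp;
const sameSource = literal.source === constructed.source;
const literalMatch = literal.test(sample);
const constructedMatch = constructed.test(sample);

console.log(literalIsObject, sameSource, literalMatch, constructedMatch);
// prints: true true true true
```

The literal form is usually easier to read; the constructor form is useful when the pattern is only known at run time.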

Summary

This tutorial introduces some of the building blocks of JavaScript regular expressions. After working through it, you'll know how to use them to search and replace strings.

Table of Contents

Regex Components

Anchors

Anchors have special meaning in regular expressions. They do not match any character. Instead, they match a position before or after characters:

  • ^ : The caret anchor matches the beginning of the text.
  • $ : The dollar anchor matches the end of the text.

See the following example:

let str = 'JavaScript';
console.log(/^J/.test(str));

output:

true

| Feature | Syntax | Description | Example |
| --- | --- | --- | --- |
| String anchor | ^ (caret) | Matches at the start of the string the regex pattern is applied to. | ^. matches a in abc\ndef |
| String anchor | $ (dollar) | Matches at the end of the string the regex pattern is applied to. | .$ matches f in abc\ndef |
| Line anchor | ^ (caret), with the m flag | Matches after each line break in addition to matching at the start of the string, thus matching at the start of each line in the string. | ^. matches a and d in abc\ndef |
| Line anchor | $ (dollar), with the m flag | Matches before each line break in addition to matching at the end of the string, thus matching at the end of each line in the string. | .$ matches c and f in abc\ndef |
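
The difference between string anchors and line anchors can be checked with a short sketch. In JavaScript, the line-anchor behaviour is switched on with the m (multiline) flag; the variable names here are only illustrative.

```javascript
const text = 'abc\ndef';

// Without the m flag, ^ and $ anchor only to the whole string.
const stringStart = text.match(/^./g);
const stringEnd = text.match(/.$/g);

// With the m flag, they also anchor around each line break.
const lineStarts = text.match(/^./gm);
const lineEnds = text.match(/.$/gm);

console.log(stringStart, stringEnd); // [ 'a' ] [ 'f' ]
console.log(lineStarts, lineEnds);   // [ 'a', 'd' ] [ 'c', 'f' ]
```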

Quantifiers

Quantifiers specify how many instances of a character, group, or character class must be present for a match.

Exact count {n}

A number in curly braces, {n}, is the simplest quantifier. When you append it to a character or character class, it specifies exactly how many times that item must occur.

For example, the regular expression /\d{4}/ matches a four-digit number. It is the same as /\d\d\d\d/:

let str = 'ECMAScript 2020';
let re = /\d{4}/;

let result = str.match(re);

console.log(result);

output:

["2020"]

| Feature | Syntax | Description | Example |
| --- | --- | --- | --- |
| Range | {n,m} | Matches a character or character class from n to m times. | /\d{2,4}/g finds numbers that have two, three or four digits |
| Shorthand + | + (plus) | Means one or more; it is shorthand for {1,}. | \d+ matches a run of one or more digits |
| Shorthand ? | ? (question mark) | Means zero or one; it is the same as {0,1}. | /colou?r/ matches both color and colour |
| Lazy quantifier | ?? | Makes the preceding item optional. Lazy, so the optional item is excluded from the match if possible. | abc?? matches ab in abc, and takes the c only when the rest of the pattern requires it |
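
The quantifiers in the table can be tried side by side. This is a minimal sketch; the sample strings are invented for the illustration.

```javascript
const numbers = '7 42 365 2020';

// {2,4}: runs of two to four digits (the lone 7 is skipped).
const twoToFour = numbers.match(/\d{2,4}/g);

// +: one or more digits, so every number is found.
const oneOrMore = numbers.match(/\d+/g);

// ?: the u is optional.
const optionalU = 'color colour'.match(/colou?r/g);

// Greedy ? takes the optional c; lazy ?? leaves it out when it can.
const greedy = 'abc'.match(/abc?/)[0];
const lazy = 'abc'.match(/abc??/)[0];

console.log(twoToFour); // [ '42', '365', '2020' ]
console.log(oneOrMore); // [ '7', '42', '365', '2020' ]
console.log(optionalU); // [ 'color', 'colour' ]
console.log(greedy, lazy); // abc ab
```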

Grouping Constructs

The alternatives in a pattern often have something in common. It could be common characters or regexp qualifiers like the anchors. In such cases, you can group them using a pair of parentheses metacharacters. Similar to a(b+c)d = abd+acd in maths, you get a(b|c)d = abd|acd in regular expressions. For example:

// without grouping
> 'red reform read arrest'.replace(/reform|rest/g, 'X')
< "red X read arX"
// with grouping
> 'red reform read arrest'.replace(/re(form|st)/g, 'X')
< "red X read arX"

| Feature | Syntax | Description | Example |
| --- | --- | --- | --- |
| Capturing group | (regex) | Parentheses group the regex between them. They allow you to apply regex operators to the entire grouped regex. | (abc){3} matches abcabcabc. First group matches abc. |
| Backreference | \1 through \9 | Substituted with the text matched between the 1st through 9th numbered capturing group. | (abc |
| Non-capturing group | (?:regex) | Non-capturing parentheses group the regex so you can apply regex operators, but do not capture anything. | (?:abc){3} matches abcabcabc. No groups. |
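
To make the three grouping constructs concrete, here is a small sketch. The doubled-word pattern is an illustrative use of a backreference chosen for this example, not something taken from the table.

```javascript
// Capturing group: the group's last repetition is available as result[1].
const repeated = /(abc){3}/.exec('abcabcabc');

// Non-capturing group: same match, but nothing is captured.
const nonCapturing = /(?:abc){3}/.exec('abcabcabc');

// Backreference: \1 must repeat exactly what group 1 matched.
const doubledWord = /\b(\w+) \1\b/;
const hasDouble = doubledWord.test('it is is fine');
const noDouble = doubledWord.test('it is fine');

console.log(repeated[0], repeated[1]); // abcabcabc abc
console.log(nonCapturing.length);      // 1 (only the full match)
console.log(hasDouble, noDouble);      // true false
```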

Bracket Expressions

Brackets indicate a set of characters to match. Any individual character between the brackets will match; for example, [a-h] matches any single letter from a to h. Ranges can also be digits like [0-9] or capital letters like [A-Z].

var regex = /[a-z]ear/;
console.log(regex.test('fear'));
// returns true

| Feature | Syntax | Description | Example |
| --- | --- | --- | --- |
| Literal opening bracket | [ (opening square bracket) | An opening square bracket is a literal character that adds an opening square bracket to the character class. | [ab[cd]ef] matches aef], bef], [ef], cef], and def]. |
| Backslash escapes a metacharacter | \ (backslash) followed by any of ^, -, ] or \ | A backslash escapes special characters to suppress their special meaning. | [\^\]] matches ^ or ] |
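
The bracket rules above can be checked with a few one-liners. This is only a sketch with made-up input strings; the last pattern relies on JavaScript treating a ] outside a character class as a literal when neither the u nor the v flag is set.

```javascript
// A range inside brackets matches one character from that range.
const range = 'fear gear 9ear'.match(/[a-h]ear/g);

// A backslash makes ^ and ] literal members of the class.
const escaped = 'x^y]z'.match(/[\^\]]/g);

// An unescaped [ inside a class is a literal; the class here is [ab[cd].
const literalBracket = 'aef] [ef] xef]'.match(/[ab[cd]ef]/g);

console.log(range);          // [ 'fear', 'gear' ]
console.log(escaped);        // [ '^', ']' ]
console.log(literalBracket); // [ 'aef]', '[ef]' ]
```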

Character Classes

Character classes distinguish between kinds of characters, such as letters and digits. For example:

const chessStory = 'He played the King in a8 and she moved her Queen in c2.';
const regexpCoordinates = /\w\d/g;
console.log(chessStory.match(regexpCoordinates));
// expected output: Array [ 'a8', 'c2']

| Feature | Syntax | Description | Example |
| --- | --- | --- | --- |
| Literal character | Any character except ^-]\ | All characters except the listed special characters are literal characters that add themselves to the character class. | [abc] matches a, b or c |
| Meta-characters | \n, \r and \t | Meta-characters are characters with a special meaning. There are many meta-characters; only the most important ones are covered here. | \d matches any digit character (same as [0-9]) |
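
A short sketch of the shorthand classes mentioned above, using \d and \w from the chess example plus the \t escape from the table. The sample sentence is invented for the illustration.

```javascript
const sample = 'Room 42\tis open'; // contains a tab character

// \d matches a single digit, the same as [0-9].
const digits = sample.match(/\d/g);
const sameAsRange = sample.match(/[0-9]/g);

// \w matches a letter, digit or underscore; + collects whole words.
const words = sample.match(/\w+/g);

// \t matches the tab character.
const hasTab = /\t/.test(sample);

console.log(digits, sameAsRange); // [ '4', '2' ] [ '4', '2' ]
console.log(words);               // [ 'Room', '42', 'is', 'open' ]
console.log(hasTab);              // true
```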

The OR Operator

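Inside a regular expression, OR is written with a single pipe, |, as in the a(b|c)d example above: the pattern matches if any one of the alternatives matches. This should not be confused with JavaScript's logical OR (||) operator, which works on values rather than patterns. The logical OR (logical disjunction) of a set of operands is true if and only if one or more of its operands is true. It is typically used with Boolean (logical) values, in which case it returns a Boolean value. However, the || operator actually returns the value of one of its operands, so when it is used with non-Boolean values it returns a non-Boolean value. For example: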

const a = 3;
const b = -2;

console.log(a > 0 || b > 0);
// expected output: true

| Feature | Syntax | Description | Example |
| --- | --- | --- | --- |
| Alternation | expr1 (OR) expr2 | If a value can be converted to true, the value is so-called truthy. If a value can be converted to false, the value is so-called falsy. | If expr1 can be converted to true, returns expr1; else, returns expr2. |
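
For comparison, the sketch below shows alternation inside a pattern, written with a single |, next to the value-level || operator described above. The sample strings are hypothetical.

```javascript
const pets = 'I have a cat, a dog and a bird';

// Alternation in a regex: match either alternative.
const catOrDog = pets.match(/cat|dog/g);

// Grouping limits the alternation so the anchors apply to both words.
const exactWord = /^(?:cat|dog)$/;
const dogIsExact = exactWord.test('dog');
const hotdogIsExact = exactWord.test('hotdog');

// Logical OR on values: returns the first truthy operand.
const fallback = '' || 'default';

console.log(catOrDog);                  // [ 'cat', 'dog' ]
console.log(dogIsExact, hotdogIsExact); // true false
console.log(fallback);                  // default
```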

Flags

Flags are optional tokens that modify how a regular expression searches. Each flag is denoted by a single alphabetic character and changes the search in a different way. For example, the flag i, which stands for ignore case, makes the search case-insensitive. Similarly, the flag g, which stands for global, extends the search to find all matches for the expression in a string instead of stopping at the first match.

| Syntax | Description |
| --- | --- |
| i | With this flag the search is case-insensitive: no difference between A and a. |
| g | With this flag the search looks for all matches; without it, only the first match is returned. |
| s | Enables “dotall” mode, which allows a dot . to match the newline character \n. |
| u | Enables full Unicode support. The flag enables correct processing of surrogate pairs. |
| y | “Sticky” mode: searching at the exact position in the text. |
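
Each flag in the table can be observed with a one-line check. This is a sketch with invented sample strings; the sticky example sets lastIndex by hand to pick the position at which the search must match.

```javascript
const text = 'Apple apple APPLE';

// g finds every match; adding i also ignores case.
const lowerOnly = text.match(/apple/g);
const anyCase = text.match(/apple/gi);

// s lets the dot match a newline.
const withoutS = /a.b/.test('a\nb');
const withS = /a.b/s.test('a\nb');

// u treats a surrogate pair as a single character.
const face = '\u{1F600}'; // one emoji, stored as two UTF-16 code units
const withoutU = /^.$/.test(face);
const withU = /^.$/u.test(face);

// y matches only at lastIndex, not anywhere after it.
const sticky = /apple/y;
sticky.lastIndex = 6;
const stickyHit = sticky.test(text);
sticky.lastIndex = 5;
const stickyMiss = sticky.test(text);

console.log(lowerOnly, anyCase);    // [ 'apple' ] [ 'Apple', 'apple', 'APPLE' ]
console.log(withoutS, withS);       // false true
console.log(withoutU, withU);       // false true
console.log(stickyHit, stickyMiss); // true false
```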

Character Escapes

The / character is used as a delimiter for RegExp literals, so a literal slash inside a pattern must be escaped as \/. Although you can manually escape metacharacters where needed, a pattern that is built dynamically from a string needs a way to escape every metacharacter in that string. For example:

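// Escape every regex metacharacter so the string is matched literally.
function escapeRegExp(string) {
  // $& in the replacement stands for the matched character.
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}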
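
One way the helper above might be used is to build a pattern from text that should be matched literally. In this sketch the function is repeated so the block runs on its own, and the input strings are hypothetical.

```javascript
// Same helper as above, repeated so this block is self-contained.
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = '1+1=2 (really?)'; // hypothetical user-supplied text
const escaped = escapeRegExp(userInput);

// Without escaping, the dot in 'a.b' matches any character.
const haystack = 'a.b axb a.b';
const unescapedHits = haystack.match(new RegExp('a.b', 'g'));
const escapedHits = haystack.match(new RegExp(escapeRegExp('a.b'), 'g'));

console.log(escaped);       // 1\+1=2 \(really\?\)
console.log(unescapedHits); // [ 'a.b', 'axb', 'a.b' ]
console.log(escapedHits);   // [ 'a.b', 'a.b' ]
```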

For more examples you can visit https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions

| Feature | Syntax | Description | Example |
| --- | --- | --- | --- |
| Character escape | \n, \r and \t | Match an LF character, CR character and a tab character respectively. | \r\n matches a Windows CRLF line break. |
| Control character escape | \cA through \cZ | Match an ASCII character Control+A through Control+Z, equivalent to \x01 through \x1A. | \cM\cJ matches a Windows CRLF line break. |
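
The escapes in the table can be compared directly. This is a minimal sketch with a made-up Windows-style string; it also shows the escaped delimiter \/ matching a literal slash.

```javascript
const windowsText = 'first\r\nsecond\r\nthird';

// \r\n matches a CRLF line break, so it can be used to split lines.
const lines = windowsText.split(/\r\n/);

// \cM\cJ and \x0D\x0A describe the same two characters.
const controlForm = /\cM\cJ/.test('\r\n');
const hexForm = /\x0D\x0A/.test('\r\n');

// A literal slash must be escaped inside a regex literal.
const hasSlash = /\//.test('a/b');

console.log(lines);                // [ 'first', 'second', 'third' ]
console.log(controlForm, hexForm); // true true
console.log(hasSlash);             // true
```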

Author

Mohamed Mesahel, PMP Current Student and Web Developer at Full-stack Coding Bootcamp, University of Washington.

Github

Mohamed Mesahel
