@nicolasdao
Last active August 1, 2024 17:59
My most common JS RegEx. Just sick of restarting from scratch each time I need a freaking RegEx. Keywords: regex regexp regular expression rege reg

REGEX GUIDE

The dot matches (almost) any character

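The belief that the dot can match any character leads to many mistakes. The exceptions to this rule are line-terminator characters. In JavaScript, these are:

  • \n: Line feed (new line).
  • \r: Carriage return (a Windows line break is the pair \r\n).
  • \u2028 and \u2029: Unicode line separator and paragraph separator.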

A typical mistake occurs when removing content between a start and an end pattern in a multiline document. For example, let's assume that we have the following piece of text:

Hello,

I'm Charles
delelemestart
I hate you all, and I want you to go to hell.
delelemeend
How can I help you my dear friend?

If we wish to delete Charles' nasty block of thought, the following will not replace anything:

text.replace(/delelemestart(.*?)delelemeend/g,'')

Instead, we need to use:

text.replace(/delelemestart((.|\n)*?)delelemeend/g,'')
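
Two other ways to get the same result, sketched below under a couple of assumptions: the sample text is a shortened, hypothetical version of the one above, and the runtime supports the s (dotAll) flag, which was added in ES2018. The [\s\S] character class works without any flag.

```javascript
// Hypothetical sample text, shortened from the example above.
const sample = [
  "I'm Charles",
  'delelemestart',
  'I hate you all.',
  'delelemeend',
  'How can I help you my dear friend?'
].join('\n')

// Option 1: the s (dotAll) flag makes the dot match line terminators too (ES2018+).
const withDotAll = sample.replace(/delelemestart.*?delelemeend\n?/gs, '')

// Option 2: [\s\S] matches any character, line terminators included, with no flag.
const withCharClass = sample.replace(/delelemestart[\s\S]*?delelemeend\n?/g, '')

console.log(withDotAll)
// I'm Charles
// How can I help you my dear friend?
```

The trailing \n? also swallows the line break after the end marker, so no empty line is left behind.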

Non-greedy RegEx

Example:

"This is another <Hello World> test -> Yeaaaaaaaahh!!!".match(/<(.*?)>/)[1];
//> 'Hello World'

If you want to support multiline, replace (.*?) with ((.|\n)*?)

How do you match the content located between 2 specific symbols? For example, you want to retrieve the text between < and > in the following text: This is another <Hello World> test -> Yeaaaaaaaahh!!!.

In the example above, you expect the result of your regex to be Hello World, but bad surprise, you're getting Hello World> test - (everything up to the last > in the string).

You're most likely using the standard greedy capture:

"This is another <Hello World> test -> Yeaaaaaaaahh!!!".match(/<(.*)>/)[1];
//> 'Hello World> test -'

What you need instead is a non-greedy capture:

"This is another <Hello World> test -> Yeaaaaaaaahh!!!".match(/<(.*?)>/)[1];
//> 'Hello World'

If you need all the matches in your text, use the g option:

"This is another <Hello World> test -> Yeaaaaaaaahh!!! So <happy>".match(/<(.*?)>/g);
//> ['<Hello World>', '<happy>']
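
With the g option, match returns the full matches only, delimiters included. If you want the captured groups for every match instead, one option is String.prototype.matchAll (ES2020). A minimal sketch, reusing the sentence above:

```javascript
const sentence = 'This is another <Hello World> test -> Yeaaaaaaaahh!!! So <happy>'

// matchAll requires the g flag and yields one match object per match.
// Index 1 of each match object is the first capture group.
const captured = Array.from(sentence.matchAll(/<(.*?)>/g), m => m[1])

console.log(captured)
//> [ 'Hello World', 'happy' ]
```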

Left-Non-Greedy RegEx

Basic

The above is great, and is generally referred to as a right-non-greedy regex: the match starts at the first opening delimiter and stops at the closest closing one. To highlight the difference, let's have a look at the following example:

"This is another <Hello <World> test -> Yeaaaaaaaahh!!!".match(/<(.*?)>/)[1];
//> 'Hello <World'

The result of the above regex is Hello <World, but what if we wanted World? Exclude the opening delimiter from the captured characters with [^<]:

"This is another <Hello <World> test -> Yeaaaaaaaahh!!!".match(/<([^<]*?)>/)[1];
//> 'World'

Advanced

The above example works well when the delimiters are single characters. However, this trick stops working when the delimiters are made of words or multiple characters. The following example is quite difficult to solve with a regex:

"This is another _bla_Hello _bla_World_blip_ test -> Yeaaaaaaaahh!!!"

Where the opening delimiter is _bla_ and the ending delimiter is _blip_.

The trick is to replace the opening delimiter with a rare character (i.e., a character that should probably never appear in the string, such as ░). If the context of your problem allows for such a character, then the following trick will work:

"This is another _bla_Hello _bla_World_blip_ test -> Yeaaaaaaaahh!!!".replace(/_bla_/g, '░').match(/░([^░]*?)_blip_/)[1];
//> 'World'
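
If you can't guarantee that any character is rare enough, another option is a negative lookahead that refuses to step over a new opening delimiter. This is only a sketch of that alternative, not a replacement for the trick above:

```javascript
const input = 'This is another _bla_Hello _bla_World_blip_ test -> Yeaaaaaaaahh!!!'

// (?:(?!_bla_).)*? consumes one character at a time, but only while
// the opening delimiter _bla_ does not start at the current position.
const inner = input.match(/_bla_((?:(?!_bla_).)*?)_blip_/)[1]

console.log(inner)
//> 'World'
```

It is slower than the single-character trick on long strings, because the lookahead runs at every position.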

Matching Series Of Words

// This matches any string that starts (^) with '/ar/' (\/ar\/) or (|) '/es/' (\/es\/).
// The group (?:...) makes ^ apply to both alternatives; test() returns a boolean.
const regex = /^(?:\/ar\/|\/es\/)/
regex.test("/ar/learn/overview/") // true
regex.test("/es/learn/overview/") // true
regex.test("/it/learn/overview/") // false

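One thing to watch when anchoring an alternation: | has the lowest precedence, so a leading ^ only applies to the first alternative unless the alternatives are grouped. The sketch below compares both forms on a hypothetical path that contains '/es/' in the middle:

```javascript
const loose = /^\/ar\/|\/es\//       // ^ applies to '/ar/' only
const strict = /^(?:\/ar\/|\/es\/)/  // ^ applies to both alternatives

const looseResult = loose.test('/it/es/learn/')   // true: '/es/' is found in the middle
const strictResult = strict.test('/it/es/learn/') // false: the string does not start with it
const strictStart = strict.test('/es/learn/overview/') // true

console.log(looseResult, strictResult, strictStart)
//> true false true
```
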
Splitting a String

Not Including The Delimiter

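This is the easiest. Note that the delimiter here is _ plus the one character that follows it, which is why the w disappears: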

"hello_world".split(/_.{1}/g)
// > [ 'hello', 'orld' ]

Including The Delimiter

This is where the regex magic happens. Use a positive lookahead:

"hello_world".split(/(?=_.{1})/g)
// > [ 'hello', '_world' ]
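
Two related variants, as a quick sketch. A capture group in the separator keeps the delimiter as its own array element, and a lookbehind (ES2018+, so check your target runtime) leaves the delimiter attached to the left-hand piece:

```javascript
// A capture group in the separator adds the captured text to the result.
const withSeparators = 'hello_world'.split(/(_)/)
// > [ 'hello', '_', 'world' ]

// A positive lookbehind splits right after the delimiter.
const delimiterOnLeft = 'hello_world'.split(/(?<=_)/)
// > [ 'hello_', 'world' ]

console.log(withSeparators, delimiterOnLeft)
```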

Negating a Reg Exp

Simply use a negative lookahead:

(?!regexp)

or, if you're negating a set of characters, a negated character class:

[^characters]

Taking the example from Matching Series Of Words:

// This matches any string that DOES NOT start (^) with '/ar/' (\/ar\/) or (|) '/es/' (\/es\/)
const regex = /^(?!\/ar\/|\/es\/)/
regex.test("/ar/learn/overview/") // false
regex.test("/es/learn/overview/") // false
regex.test("/it/learn/overview/") // true

// The following removes all non-alphanumeric characters (name is any string):
const slurp = name.replace(/[^a-zA-Z0-9]/g, '')

Replacing Characters With a Pattern Including The Matching Characters

Use $&, which stands for the whole matched substring:

"https://neap.co".replace(/(neap|co)/g, 'hello_$&')
// 'https://hello_neap.hello_co'
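
When you need individual capture groups rather than the whole match, the replacement string also accepts $1, $2, ... and, for named groups, $<name> (ES2018+). A small sketch with a hypothetical date string:

```javascript
const isoDate = '2024-08-01' // hypothetical input

// $1, $2 and $3 refer to the capture groups by position.
const byIndex = isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, '$3/$2/$1')

// $<name> refers to a named capture group.
const byName = isoDate.replace(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, '$<d>/$<m>/$<y>')

console.log(byIndex, byName)
//> 01/08/2024 01/08/2024
```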

Validate A String With Specific Characters

For example, if you want to only allow uppercase and lowercase letters as well as +, -, & and % in your input: /^[a-zA-Z\+\-&%]+$/
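
A possible way to use it, with regex.test. The helper name isAllowed and the sample inputs are made up for this sketch. Inside a character class only the - needs escaping here, so the \+ can be written as a plain +:

```javascript
const allowed = /^[a-zA-Z+\-&%]+$/

// Hypothetical helper: true when every character of the input is allowed.
const isAllowed = input => allowed.test(input)

console.log(isAllowed('R&D+growth%')) // true
console.log(isAllowed('R&D 2024'))    // false: the space and the digits are not allowed
console.log(isAllowed(''))            // false: + requires at least one character
```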

Frequent Password Validation Rules

Minimum eight characters, at least one lowercase letter, one uppercase letter and one number:

^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).{8,}$

Minimum eight characters, at least one lowercase letter, one uppercase letter, one number and one special character:

^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#\$%\^&\*\(\)_\-\+=\[{\]}\\\|;:'",<\.>\/\?`~]).{8,}$
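
Each (?=.*[...]) is a lookahead that checks, from the start of the string, that at least one character of the class exists somewhere; .{8,} then enforces the length. A minimal sketch of the first rule in use (the helper name isValidPassword and the sample passwords are made up):

```javascript
const basicRule = /^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).{8,}$/

// Hypothetical helper wrapping the first rule above.
const isValidPassword = pwd => basicRule.test(pwd)

console.log(isValidPassword('Passw0rd'))  // true
console.log(isValidPassword('password1')) // false: no uppercase letter
console.log(isValidPassword('Pw0rd'))     // false: fewer than eight characters
```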

Converting a URL To A Regex

url.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
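
The replace above escapes every character that has a special meaning in a regex, so the result can be passed to new RegExp. A sketch of the full round trip, where escapeRegExp is just a name chosen here and the URL is hypothetical:

```javascript
// Escapes regex metacharacters so the string is matched literally.
const escapeRegExp = str => str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

const url = 'https://neap.co/search?q=a+b' // hypothetical URL
const urlRegex = new RegExp(`^${escapeRegExp(url)}$`)

const matchesItself = urlRegex.test(url)                           // true
const matchesLookalike = urlRegex.test('https://neapXco/search?q=a+b') // false: the dot is literal now

// Without escaping, ? and + act as quantifiers and the pattern
// does not even match the URL it was built from.
const unescapedMatchesItself = new RegExp(`^${url}$`).test(url)    // false

console.log(matchesItself, matchesLookalike, unescapedMatchesItself)
//> true false false
```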

Replacing characters except the nth ones?

Use the offset argument of the replacer function (the position of the match in the string). The next example skips the first character:

'HelloWorld'.replace(/[A-Z]/g, (l,idx) => idx ? ` ${l.toLowerCase()}` : l) // 'Hello world'
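
Keep in mind that this argument is a position in the string, not a match counter. To skip the nth match instead, one option is to count matches yourself. A small sketch with a hypothetical rule (upper-case every 'a' except the second one):

```javascript
let seen = 0

// ++seen is the 1-based rank of the current match.
const result = 'banana'.replace(/a/g, m => (++seen === 2 ? m : m.toUpperCase()))

console.log(result)
//> 'bAnanA'
```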