Regex (regular expressions) is a powerful pattern-matching tool that can be adapted to a wide variety of uses.

Here are some of the tokens you can combine to build a pattern that suits your needs.

Character Classes

  • . - any character except newline
  • \w\d\s - word, digit, whitespace
  • \W\D\S - not word, digit, whitespace
  • [abc] - any of a, b, or c
  • [^abc] - not a, b, or c
  • [a-g] - any character from a to g
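One quick way to try these character classes is Python's built-in `re` module. The sample string and the `matches` helper below are made up for illustration; note that in Python `\w` is Unicode-aware, so for plain ASCII input it means letters, digits, and underscore.

```python
import re

# Made-up sample text used to exercise each token.
TEXT = "Cab 42_go!"


def matches(pattern: str, text: str = TEXT) -> list[str]:
    """Return every non-overlapping match of `pattern` in `text`."""
    return re.findall(pattern, text)


print(matches(r"\d"))         # ['4', '2']
print(matches(r"\w+"))        # ['Cab', '42_go']  (letters, digits and _)
print(matches(r"\s"))         # [' ']
print(matches(r"\W"))         # [' ', '!']
print(matches(r"[abc]"))      # ['a', 'b']  (case-sensitive, so 'C' is skipped)
print(matches(r"[^abc]"))     # every character except 'a' and 'b'
print(matches(r"[a-g]"))      # ['a', 'b', 'g']
print(matches(r".", "a\nb"))  # ['a', 'b']  (. does not match the newline)
```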

Anchors

  • ^abc$ - start / end of the string
  • \b\B - word, not-word boundary
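Anchors match positions rather than characters. A minimal Python sketch, with hypothetical helper names, showing `^...$` pinning a pattern to the whole string and `\b` / `\B` distinguishing a standalone word from the same letters inside a longer word:

```python
import re


def exactly_abc(s: str) -> bool:
    # ^ and $ pin the pattern to the start and end of the string.
    return re.search(r"^abc$", s) is not None


def word_starts(word: str, text: str) -> list[int]:
    # \b...\b only matches `word` when it stands alone as a whole word.
    return [m.start() for m in re.finditer(rf"\b{re.escape(word)}\b", text)]


def inner_starts(word: str, text: str) -> list[int]:
    # \B before `word` only matches when a word character precedes it.
    return [m.start() for m in re.finditer(rf"\B{re.escape(word)}", text)]


print(exactly_abc("abc"), exactly_abc("abcd"))  # True False
print(word_starts("cat", "cat concat cat."))    # [0, 11]
print(inner_starts("cat", "cat concat cat."))   # [7]  (the "cat" in "concat")
```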

Escaped Characters

  • \.\*\\ - escaped special characters
  • \t\n\r - tab, linefeed, carriage return
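Escaping turns a metacharacter into a literal, which is why the scenario below writes `example\.com` rather than `example.com`. A short Python sketch of the difference, plus `re.escape()`, which adds the backslashes to a literal string for you:

```python
import re

# Unescaped, "." is a wildcard, so this pattern also accepts "exampleXcom".
print(re.fullmatch(r"example.com", "exampleXcom") is not None)   # True
# Escaped as "\.", it only accepts a literal dot.
print(re.fullmatch(r"example\.com", "exampleXcom") is not None)  # False
print(re.fullmatch(r"example\.com", "example.com") is not None)  # True

# \t and \n match a tab and a newline character.
parts = re.split(r"[\t\n]", "a\tb\nc")
print(parts)  # ['a', 'b', 'c']

# re.escape() backslash-escapes the special characters in a literal string.
literal = re.escape("1+1=2?")
print(re.fullmatch(literal, "1+1=2?") is not None)  # True
```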

Groups & Lookaround

  • (abc) - capture group
  • \1 - backreference to group #1
  • (?:abc) - non-capturing group
  • (?=abc) - positive lookahead
  • (?!abc) - negative lookahead
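The group and lookaround tokens are easiest to see side by side. This Python sketch uses made-up strings: a backreference finds a doubled word, a non-capturing group keeps `findall()` from reporting it, and the two lookaheads test what follows "foo" without consuming it.

```python
import re

# (\w+) captures a word and \1 requires the same text again: a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
print(doubled.group(1))  # is

# (?:ab)+ groups without capturing, so findall() only reports the (c) group.
non_capturing = re.findall(r"(?:ab)+(c)", "ababc")
print(non_capturing)  # ['c']

# Positive lookahead: "foo" only where "bar" follows; "bar" is not consumed.
ahead = [m.start() for m in re.finditer(r"foo(?=bar)", "foobar foobaz")]
print(ahead)  # [0]

# Negative lookahead: "foo" only where "bar" does NOT follow.
not_ahead = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]
print(not_ahead)  # [7]
```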

Quantifiers & Alternation

  • a*a+a? - 0 or more, 1 or more, 0 or 1
  • a{5}a{2,} - exactly five, two or more
  • a{1,3} - between one & three
  • a+?a{2,}? - match as few as possible
  • ab|cd - match ab or cd
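A Python sketch of the quantifiers and alternation above, using made-up inputs and a hypothetical `full()` helper. The `<b>bold</b>` pair shows the greedy/lazy difference that the scenario below relies on with `.+?`.

```python
import re

html = "<b>bold</b>"
# Greedy .+ grabs as much as it can, then backtracks to the last ">".
greedy = re.search(r"<.+>", html).group()
# Lazy .+? stops at the first ">" that lets the match succeed.
lazy = re.search(r"<.+?>", html).group()
print(greedy, lazy)  # <b>bold</b> <b>


def full(pattern: str, s: str) -> bool:
    """True if `pattern` matches the whole of `s`."""
    return re.fullmatch(pattern, s) is not None


print(full(r"ba*", "b"), full(r"ba+", "b"))             # True False
print(full(r"colou?r", "color"), full(r"colou?r", "colour"))  # True True
print(full(r"a{5}", "aaaaa"), full(r"a{2,}", "a"))      # True False
print(full(r"a{1,3}", "aaa"), full(r"a{1,3}", "aaaa"))  # True False
print(re.findall(r"ab|cd", "abxcdyab"))                 # ['ab', 'cd', 'ab']
```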

Scenario 1

You would like to match all variants of a domain, including both the http and https protocols, subdomains, and subdirectories. In this example, we will match "example.com" with the following regex:

(https?:\/\/(.+?\.)?example\.com(\/[A-Za-z0-9\-\._~:\/\?#\[\]@!$&'\(\)\*\+,;\=]*)?)

The expression above matches each of the following URLs:

https://subdomain.example.com/...
https://example.com/folder/...
http://www.example.com/...
https://www.example.com/...
https://example.com/subdirectory...
http://example.com
https://example.com
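To check the expression yourself, here is a Python sketch that compiles the Scenario 1 pattern verbatim (it needs no changes for Python's `re`). The concrete paths stand in for the elided ones in the list and are made up, as are the two non-matching URLs and the sample sentence. `fullmatch()` tests a whole string, while `finditer()` pulls matching URLs out of running text.

```python
import re

# The Scenario 1 pattern, copied verbatim into a raw string.
DOMAIN_RE = re.compile(
    r"(https?:\/\/(.+?\.)?example\.com(\/[A-Za-z0-9\-\._~:\/\?#\[\]@!$&'\(\)\*\+,;\=]*)?)"
)

# Concrete stand-ins for the URLs listed above (the paths are made up).
should_match = [
    "https://subdomain.example.com/page",
    "https://example.com/folder/file.html",
    "http://www.example.com/",
    "https://www.example.com/index",
    "https://example.com/subdirectory",
    "http://example.com",
    "https://example.com",
]
should_not_match = ["https://example.org", "ftp://example.com"]

for url in should_match + should_not_match:
    # fullmatch() checks the pattern against the entire string.
    print(f"{url:40} {DOMAIN_RE.fullmatch(url) is not None}")

# finditer() instead finds every matching URL inside a longer text.
text = "Docs at https://docs.example.com/guide?x=1 and http://example.com."
found = [m.group(0) for m in DOMAIN_RE.finditer(text)]
print(found)  # ['https://docs.example.com/guide?x=1', 'http://example.com']
```

The trailing period after the second URL is left out of the match because the optional path group must start with `/`.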

Explanation

(https?:\/\/(.+?\.)?example\.com(\/[A-Za-z0-9\-\._~:\/\?#\[\]@!$&'\(\)\*\+,;\=]*)?)

1st Capturing Group

The outer parentheses wrap the entire expression, so this group captures the whole matched URL.

http matches the characters http literally (case-sensitive)

s matches the character s literally (case-sensitive)

? Quantifier: matches the preceding s between zero and one times, as many times as possible, giving back as needed (greedy), so both http and https are accepted

: matches the character : literally (case-sensitive)

\/ matches the character / literally (case-sensitive)

\/ matches the character / literally (case-sensitive)
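The scheme part of the expression can be tried on its own. In this Python sketch (`SCHEME_RE` is a hypothetical name), `s?` admits exactly `http://` and `https://`. Python treats `/` as an ordinary character, so `\/` and `/` behave the same here; the escape matters in JavaScript, where `/.../` delimits a regex literal.

```python
import re

# The scheme prefix from the expression, compiled on its own.
SCHEME_RE = re.compile(r"https?:\/\/")

for s in ["http://", "https://", "httpss://", "ftp://"]:
    print(s, SCHEME_RE.fullmatch(s) is not None)  # True, True, False, False

# Without the backslashes, the pattern behaves identically in Python.
print(re.fullmatch(r"https?://", "https://") is not None)  # True
```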

2nd Capturing Group

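(.+?\.)?

.+? matches any character (except for line terminators) between one and unlimited times, as few times as possible, expanding as needed (lazy)

\. matches the character . literally (case-sensitive)

? Quantifier: the whole group matches between zero and one times, as many times as possible, giving back as needed, so URLs without a subdomain still match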

example matches the characters example literally (case-sensitive)

\. matches the character . literally (case-sensitive)

com matches the characters com literally (case-sensitive)
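Because the subdomain part is its own capturing group, you can read it back after a match. A Python sketch with made-up URLs: group 1 is the whole URL, and group 2 holds the optional subdomain prefix (including its trailing dot) or `None` when there is none.

```python
import re

# The Scenario 1 pattern, copied verbatim.
DOMAIN_RE = re.compile(
    r"(https?:\/\/(.+?\.)?example\.com(\/[A-Za-z0-9\-\._~:\/\?#\[\]@!$&'\(\)\*\+,;\=]*)?)"
)

for url in [
    "https://subdomain.example.com/page",
    "http://www.example.com/",
    "https://a.b.example.com",
    "https://example.com",
]:
    m = DOMAIN_RE.fullmatch(url)
    # group(1) is the whole URL; group(2) is the subdomain prefix or None.
    print(m.group(1), "->", repr(m.group(2)))
```

With these inputs, group 2 comes back as `'subdomain.'`, `'www.'`, `'a.b.'`, and `None`; the lazy `.+?` keeps expanding until the remaining text is exactly `example.com`, so it can span several labels.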