This article explains how to use regular expressions in Java, with examples. Regular expressions are very useful, but they can also be somewhat complex.
What is a regular expression in Java?
A regular expression is a sequence of characters that defines a search pattern, which can then be used to find, validate, or replace text automatically.
For example, if we want to remove every occurrence of the word “the” from a text, we can describe it with a pattern and replace each match. A regular expression allows us to search for or replace a sequence.
What do we need in Java to create a regular expression?
To use regular expressions in Java, we need to import the java.util.regex package, which was introduced in version 1.4 of Java.
The regex package provides us with the following types:
- Matcher: This class allows us to match an input against the sequence of characters defined by the Pattern.
- MatchResult: An interface that represents the result of a match operation.
- Pattern: The compiled representation of the regular expression.
- PatternSyntaxException: An unchecked exception thrown to indicate a syntax error in a regular expression pattern.
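As a minimal sketch of how these four types fit together, consider the following. The class name `RegexTypesDemo`, its helper methods, and the sample inputs are illustrative assumptions, not part of the original article.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexTypesDemo {

    // Returns the first run of digits in the input, or null when there is none.
    static String firstNumber(String input) {
        Pattern pattern = Pattern.compile("\\d+");    // compiled regular expression
        Matcher matcher = pattern.matcher(input);     // matching engine bound to this input
        if (!matcher.find()) {
            return null;
        }
        MatchResult result = matcher.toMatchResult(); // snapshot of the current match
        return result.group();
    }

    // Returns true when the expression compiles, false on a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("order 42 shipped")); // 42
        System.out.println(isValidRegex("[abc"));            // false: unclosed character class
    }
}
```

Because PatternSyntaxException is unchecked, the try/catch is optional; it is only worth adding when the expression comes from user input.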
How to create a regular expression in Java
To create a regular expression, we will use quantifiers and metacharacters.
Quantifiers for a regular expression in Java
We have special characters that indicate the number of repetitions of the expression. The following table shows the characters:
Quantifier | Description |
---|---|
n+ | Find one or more occurrences of “n” |
n* | Find zero or more occurrences of “n” |
n? | Find zero or one occurrence of “n” |
n{x} | Find “n” occurring exactly x times |
n{x,} | Find “n” occurring at least x times |
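The quantifiers in the table can be checked with a short sketch. The class `QuantifierDemo` and its helper `fullMatch` are hypothetical names used for illustration; `Pattern.matches` requires the whole input to match the expression.

```java
import java.util.regex.Pattern;

class QuantifierDemo {

    // Pattern.matches succeeds only when the entire input matches the expression.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("n+", "nnn"));   // true: one or more
        System.out.println(fullMatch("n+", ""));      // false: at least one is required
        System.out.println(fullMatch("n*", ""));      // true: zero or more
        System.out.println(fullMatch("n?", "nn"));    // false: at most one is allowed
        System.out.println(fullMatch("n{3}", "nnn")); // true: exactly three
        System.out.println(fullMatch("n{2,}", "n"));  // false: at least two are required
    }
}
```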
Metacharacters in a regular expression in Java
Metacharacter | Description |
---|---|
| | Symbol to indicate OR |
. | Matches any character |
^ | Matches at the beginning of the string |
$ | Matches at the end of the string |
\d | Matches a digit |
\s | Matches a whitespace character (space, tab, line break) |
\b | Matches at a word boundary (the beginning or end of a word) |
\uxxxx | Matches the Unicode character specified by the hexadecimal number xxxx |
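A small sketch can make these metacharacters concrete. The class `MetacharacterDemo` and the helper `found` are hypothetical names; `found` reports whether the expression occurs anywhere in the input.

```java
import java.util.regex.Pattern;

class MetacharacterDemo {

    // Returns true when the expression is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^Say", "Say Hi"));     // true: the input starts with "Say"
        System.out.println(found("Hi$", "Say Hi"));      // true: the input ends with "Hi"
        System.out.println(found("\\d", "Say Hi"));      // false: there is no digit
        System.out.println(found("\\s", "Say Hi"));      // true: there is a space
        System.out.println(found("\\bHi\\b", "Say Hi")); // true: "Hi" is a whole word
        System.out.println(found("\\bay", "Say Hi"));    // false: "ay" does not start a word
        System.out.println(found("\\u0048i", "Say Hi")); // true: code point 0x48 is the letter H
        System.out.println(found("Hi|Bye", "Say Hi"));   // true: one of the alternatives occurs
    }
}
```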
Metacharacters and examples with regular expressions
Regular expression | Description |
---|---|
. | Matches any character. |
^regex | Matches regex at the beginning of a line. |
regex$ | Matches regex at the end of a line. |
[abc] | Defines a set: matches a, b, or c. |
[abc][vz] | Matches a, b, or c followed by either v or z. |
[^abc] | When ^ appears as the first character inside [], it negates the set: matches any character except a, b, or c. |
[e-f] | A hyphen defines a range: matches any letter from e to f. |
Y|X | Set an OR, find either Y or X. |
HO | Finds H followed directly by O. |
$ | Checks whether the end of a line follows. |
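The character sets and ranges above can be tried out with a sketch like the following. `CharacterClassDemo` and `fullMatch` are hypothetical names; `String.matches` requires the whole string to match.

```java
class CharacterClassDemo {

    // String.matches succeeds only when the entire string matches the expression.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[abc]", "b"));      // true: b is in the set
        System.out.println(fullMatch("[abc][vz]", "bz")); // true: b followed by z
        System.out.println(fullMatch("[^abc]", "b"));     // false: b is excluded
        System.out.println(fullMatch("[^abc]", "d"));     // true: d is not a, b, or c
        System.out.println(fullMatch("[e-f]", "f"));      // true: f is inside the range
        System.out.println(fullMatch("[e-f]", "g"));      // false: g is outside the range
        System.out.println(fullMatch("Y|X", "X"));        // true: either Y or X
        System.out.println(fullMatch("HO", "HO"));        // true: H followed by O
    }
}
```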
Grouping in regular expressions
We can group part of a regular expression with parentheses. Besides grouping, the parentheses create a capturing group: the part of the string that matches the group is stored and can be referenced later through a back reference.
To refer to a specific group in a replacement string, we use the $ symbol followed by the group number, for example:
Remove the word “qué” and keep the words “hola” and “adios”:
// TEXT is the input string to process
String pattern = "(hola)(qué)(adios)";
// Keep groups 1 and 3, drop group 2
System.out.println(TEXT.replaceAll(pattern, "$1$3"));
That is, the previous code replaces each match with the contents of groups 1 and 3 and drops group 2.
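As a further sketch of capturing groups, the following reads a group from a match and reorders groups in a replacement. The class `GroupDemo`, its methods, and the sample strings are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {

    // Swaps two words separated by a single space by referencing groups 2 and 1.
    static String swapWords(String input) {
        return input.replaceAll("(\\w+) (\\w+)", "$2 $1");
    }

    // Returns the text captured by the given group of the first match, or null.
    static String capture(String regex, String input, int group) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(group) : null;
    }

    public static void main(String[] args) {
        System.out.println(swapWords("hola adios"));                     // adios hola
        System.out.println(capture("(\\w+)@(\\w+)", "user@example", 2)); // example
    }
}
```

Group 0 always refers to the whole match, so numbered groups start at 1.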
Backslash in regular expressions in Java
In regular expressions, the backslash (\) is used as an escape character to indicate that the next character should be treated literally instead of having a special meaning. For example, if you want to match a period (.), which has a special meaning in regular expressions, you need to use a backslash before it: \. (written as "\\." in a Java string literal).
However, the backslash itself also has a special meaning in regular expressions, so if you want to match a backslash character, you need to escape it with another backslash: \\ (written as "\\\\" in a Java string literal).
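The escaping rules can be sketched as follows. Note that a Java string literal needs its own level of escaping, so each backslash of the expression is written twice in source code. `EscapeDemo` and `found` are hypothetical names; `Pattern.quote` is the standard method for treating a whole string literally.

```java
import java.util.regex.Pattern;

class EscapeDemo {

    // Returns true when the expression is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found(".", "abc"));            // true: an unescaped dot matches any character
        System.out.println(found("\\.", "abc"));          // false: an escaped dot needs a literal period
        System.out.println(found("\\.", "a.c"));          // true
        System.out.println(found("\\\\", "C:\\temp"));    // true: matches the single backslash in the path
        System.out.println(found(Pattern.quote("1+1"), "1+1=2")); // true: "+" is treated literally
    }
}
```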
Use OR in a pattern
To use OR in a pattern, we use |, for example:
.*(coche|azul).*
In the previous example, it would match any text that contains the word “coche” (car) or “azul” (blue).
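A minimal sketch of this alternation, assuming a hypothetical helper `matchedWord` that returns which alternative was captured by the group:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {

    private static final Pattern CAR_OR_BLUE = Pattern.compile(".*(coche|azul).*");

    // Returns the alternative captured by group 1, or null when neither word occurs.
    static String matchedWord(String text) {
        Matcher matcher = CAR_OR_BLUE.matcher(text);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(matchedWord("mi coche es rojo")); // coche
        System.out.println(matchedWord("el cielo azul"));    // azul
        System.out.println(matchedWord("una casa"));         // null
    }
}
```

The leading and trailing .* are needed here because matches() must cover the whole text; with find() the expression (coche|azul) alone would be enough.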
Negation of a pattern
If we need the negation of a pattern, we can use a negative lookahead, (?!pattern), which succeeds only when the pattern does not match at the current position. For example, if we want to find the word “plátano” that is not followed by “s”:
plátano(?!s)
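The same idea can be sketched in Java. To keep the source plain ASCII, this example uses the word "banana" as a stand-in for “plátano”; the class `LookaheadDemo` and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

class LookaheadDemo {

    private static final Pattern SINGULAR = Pattern.compile("banana(?!s)");

    // Returns true when "banana" occurs without an "s" directly after it.
    static boolean hasSingular(String text) {
        return SINGULAR.matcher(text).find();
    }

    // Replaces only the singular occurrences; the lookahead itself consumes no characters.
    static String maskSingular(String text) {
        return SINGULAR.matcher(text).replaceAll("X");
    }

    public static void main(String[] args) {
        System.out.println(hasSingular("one banana"));      // true
        System.out.println(hasSingular("two bananas"));     // false
        System.out.println(maskSingular("banana bananas")); // X bananas
    }
}
```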
Specific actions in a pattern
We can add three different actions or modes to our regular expression:
- (?i) or Pattern.CASE_INSENSITIVE: makes our expression case-insensitive.
- (?m) or Pattern.MULTILINE: enables the multiline mode, which makes the caret (^) and dollar ($) match at the beginning and end of each line.
- (?s) or Pattern.DOTALL: enables the single line mode, which makes the dot (.) match any character, including newline characters.
If we want to apply all three modes, we can use the following format: (?ism).
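The effect of each mode can be observed with a short sketch. `FlagDemo`, `found`, and `count` are hypothetical names introduced for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagDemo {

    // Returns true when the expression is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Counts how many times the compiled pattern matches in the input.
    static int count(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int total = 0;
        while (matcher.find()) {
            total++;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(found("hi", "Say HI"));      // false: case matters by default
        System.out.println(found("(?i)hi", "Say HI"));  // true: case-insensitive mode
        // Without MULTILINE, ^ and $ only anchor to the whole input.
        System.out.println(count(Pattern.compile("^\\w+$"), "one\ntwo"));                    // 0
        System.out.println(count(Pattern.compile("^\\w+$", Pattern.MULTILINE), "one\ntwo")); // 2
        System.out.println(found("a.b", "a\nb"));       // false: the dot skips line breaks
        System.out.println(found("(?s)a.b", "a\nb"));   // true: DOTALL lets the dot match them
    }
}
```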
Use Pattern and Matcher to create regular expressions
We have already seen what we need to build a regular expression. Now let’s see how to apply one using Pattern and Matcher.
First, we compile the regular expression with Pattern.compile. Then we obtain a Matcher from the compiled Pattern by passing the String to examine to its matcher method.
For example:
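import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("\\w+");   // compile the expression once
Matcher matcher = pattern.matcher("Say Hi"); // bind it to the input text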
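Once a Matcher exists, a common next step is to call find() in a loop and read each match with group(). The following is a minimal sketch; the class `FindAllDemo` and the method `words` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {

    // Collects every run of word characters found in the text, in order.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\w+").matcher(text);
        while (matcher.find()) {         // each call advances to the next match
            result.add(matcher.group()); // group() returns the text of the current match
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Say Hi")); // [Say, Hi]
    }
}
```

Note that matches() would return false for "Say Hi" with this expression, because the space prevents the whole input from matching; find() only needs one match somewhere in the text.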
Examples of regular expressions in Java
Next, we will see different examples of regular expressions in Java:
Find all the words in a text string
The pattern for this would be:
@Test
void given_text_find_any_word() {
    String text = "Say Hi";
    Pattern pattern = Pattern.compile("\\w+");
    Matcher matcher = pattern.matcher(text);
    assertTrue(matcher.find());
}
Find all the words ending in ‘Id’ followed by ‘:null’
@Test
void given_text_find_any_word_that_end_with_id_and_null() {
    String text = "carId:null";
    Pattern pattern = Pattern.compile("\\w*Id:null");
    Matcher matcher = pattern.matcher(text);
    assertTrue(matcher.find());
}
In the previous example, we defined a regular expression to find all the words ending in “Id:null”.
Find Dates with a Regular Expression
@Test
void given_text_find_dates() {
    String text = "2014-02-02";
    Pattern pattern = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");
    Matcher matcher = pattern.matcher(text);
    assertTrue(matcher.find());
}
The previous example searches a text for any date in the specified format (yyyy-mm-dd). Note that it only checks the layout of the digits; it does not verify that the month or day falls within a valid range.
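Building on the date expression, named groups make it possible to change the format of a date in a replacement. This is a sketch under assumptions: the class `DateFormatDemo`, the method `toDayFirst`, and the target format dd/mm/yyyy are chosen for illustration only.

```java
import java.util.regex.Pattern;

class DateFormatDemo {

    // Same yyyy-mm-dd layout as above, with a name attached to each part.
    private static final Pattern ISO_DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Rewrites every yyyy-mm-dd date in the text as dd/mm/yyyy.
    static String toDayFirst(String text) {
        return ISO_DATE.matcher(text).replaceAll("${day}/${month}/${year}");
    }

    public static void main(String[] args) {
        System.out.println(toDayFirst("Released on 2014-02-28.")); // Released on 28/02/2014.
    }
}
```

Named groups are referenced as ${name} in the replacement string, which tends to be easier to read than $1, $2, $3 when an expression has several groups.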
Find the character sequence that matches a UUID format
@Test
void given_text_find_uuid() {
    String text = "123e4567-e89b-12d3-a456-426614174000";
    Pattern pattern = Pattern.compile("[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(text);
    assertTrue(matcher.find());
}
UUIDs are commonly used as IDs in databases. The previous expression finds any UUID in a text that is written in the usual 8-4-4-4-12 hexadecimal layout.
Find the character sequence that matches an email address
@Test
void given_text_find_email() {
    String text = "refactorizando.web@gmail.com";
    Pattern pattern = Pattern.compile("^[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,6}$", Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(text);
    assertTrue(matcher.find());
}
The previous example defines a pattern for the characters allowed in an email address. Because the pattern is anchored with ^ and $, it validates that the whole string is an email address rather than finding addresses inside a longer text.
Note that the double backslash (\\) in the Java string is used to escape the dot symbol.
Find duplicated words
@Test
void given_text_find_duplicated_words() {
    String text = "Hola que que";
    // Backslashes must be doubled inside a Java string literal
    Pattern pattern = Pattern.compile("\\b(\\w+)\\s+\\1\\b", Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(text);
    assertTrue(matcher.find());
}
\b marks a word boundary, so only whole words are compared, and \1 is a back reference to the text matched by the first group.
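The same back reference can also be used to remove the repetition instead of only detecting it. The following is a minimal sketch; the class `DuplicateWordsDemo` and the method `collapse` are hypothetical names, and the expression is a variation of the one above that accepts more than one repetition.

```java
class DuplicateWordsDemo {

    // Replaces a word followed by one or more repetitions of itself with a single copy.
    // (?i) makes the comparison case-insensitive; $1 keeps the first occurrence.
    static String collapse(String text) {
        return text.replaceAll("(?i)\\b(\\w+)(\\s+\\1\\b)+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("Hola que que"));       // Hola que
        System.out.println(collapse("this is is is fine")); // this is fine
    }
}
```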
Conclusion about regular expressions in Java
Regular expressions in Java make it easier to handle tasks that involve working with strings, such as finding or replacing a UUID or changing the format of a date.