Regular Expression (RegEx) is a powerful tool for searching and replacing text,
which I have built into X-Word Replacer since its first version in 2018.
I rarely used it myself until recently, when I needed to match a pattern in a text.
I discovered that RegEx is very useful for tasks related to web page text processing, especially when replacing multiple patterns at once.
For example, in this blog post, can you find all email addresses, such as my personal email, LHViet88@Gmail.com, in just a couple of seconds?
X-Word-Replacer
My first extension was a Word Count add-on for Firefox, which I developed in 2013.
X-Word Replacer is my second extension, released for Chrome in 2018.
As the name suggests, X-Word Replacer helps users replace words or phrases on a web page in Chromium-based browsers in real time.
It is a simple tool, but it is very useful for web developers, content creators, and anyone who needs to work with text processing on the web.
The UI of X-Word Replacer is straightforward, with:
- A top bar for action buttons: Search and Highlight, Add New Rules, and Replace.
- A list of rules: checkboxes for enabling/disabling, input fields for searching and replacing, and a button for deleting.
- A footer for advanced settings, such as Match Case (case-sensitive), RegEx, Search/replace in inputs and textareas, and Search in the whole webpage or do nothing.
sucoivasannu-5428@yopmail.com
Below are some use cases for X-Word Replacer:
- Replace a word or phrase with another word or phrase
- Highlight a word or phrase
- Multi-highlight and replace with RegEx
- Count the number of words or phrases on a page
One use case that I found from the Reddit community is correcting wrong machine translations to read untranslated web novels.
Regular Expression (RegEx) in X-Word-Replacer
In most cases, we know the exact word or phrase we want to search or replace, such as changing "a" to "b" or correcting wrong words to the right ones.
But sometimes, we need to search for a pattern of strings, not an exact word or phrase.
For example, we might want to search, highlight, and count all possible email addresses in this post.
Here is the third email in a Quote, with the embedded hyperlink leading to the extension X Word Replacer: fogugita.icubiqip@gotgel.org
A possible regular expression for matching email addresses is:
\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
Note that the extension already applies the /g flag for global search, so you only need to enter the regex value itself, without any flags.
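To try the email pattern outside the extension, here is a minimal Python sketch. It assumes Python's re module behaves like the extension's JavaScript regex engine for this pattern, and it writes the top-level domain class as [A-Za-z], because a | inside a character class is matched as a literal character. The function name find_emails and the sample text are made up for illustration.

```python
import re

# Email pattern from this post, with the top-level domain class written as [A-Za-z].
EMAIL_PATTERN = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"


def find_emails(text: str) -> list[str]:
    """Return every email-like match, as a global (/g) search would."""
    return re.findall(EMAIL_PATTERN, text)


# Sample text built from the addresses that appear in this post.
sample = (
    "Write to LHViet88@Gmail.com or sucoivasannu-5428@yopmail.com; "
    "the quoted one is fogugita.icubiqip@gotgel.org."
)

emails = find_emails(sample)
print(len(emails), emails)
```

The pattern has no capturing groups, so re.findall returns whole addresses, and the trailing period after the last address is left out of the match.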
You can also search for any string that follows a rule you can describe, for example:
- An accountant searching for all words that start with "a" and contain a "z", such as "Amazing", "Azure", "Analyze", etc., which can be matched with the pattern \ba\w*z\w*\b
- A security researcher investigating a security incident and trying to collect evidence across multiple web pages or reports, such as:
  - Hashes, such as the highlighted hashes in this post: https://www.trendmicro.com/en_us/research/24/c/cve-2024-21412--darkgate-operators-exploit-microsoft-windows-sma.html
  - IP addresses.
  - URLs, domains, etc.
  - Email addresses, such as "sucoivasannu-5428@yopmail.com", etc.
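The use cases above can be sketched in Python as follows. The word pattern is the one from this post. The SHA-256 pattern is my own assumption (64 hexadecimal characters), not something the extension ships with, and the report text is invented for the example.

```python
import hashlib
import re

# Pattern from the post: words that start with "a" and contain a "z".
# IGNORECASE is used so that "Amazing" and "analyze" both match.
A_Z_WORD = re.compile(r"\ba\w*z\w*\b", re.IGNORECASE)

# Assumed pattern (not from the post): a SHA-256 hash is 64 hexadecimal characters.
SHA256_HASH = re.compile(r"\b[A-Fa-f0-9]{64}\b")

# Stand-in for a hash that would appear on a report page.
digest = hashlib.sha256(b"sample").hexdigest()
report = f"Amazing Azure tools analyze apples. Payload hash: {digest}"

words = A_Z_WORD.findall(report)
hashes = SHA256_HASH.findall(report)

print(words)
print(hashes == [digest])
```

Other hash types would need a different length in the quantifier, so treat the hash pattern as a starting point only.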
To ensure the RegEx pattern is correct, you can test it in tools like RegExr and Regex101, which are very helpful for both beginners and experts alike.
Also make sure that the Use Regular Expression checkbox is checked,
and that the Search input contains only the RegEx value.
If your full regex pattern is something like /a\w*z\w*/gi, enter only a\w*z\w* in the Search field:
no / at the beginning and no /gi at the end.
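If you often copy patterns in the /pattern/flags form, the stripping step can be sketched like this. The helper strip_regex_literal is hypothetical and is not part of the extension; it only illustrates which part belongs in the Search field.

```python
import re

# Matches a JavaScript-style literal: a leading "/", the pattern body,
# a closing "/", and optional lowercase flag letters.
LITERAL = re.compile(r"^/(?P<body>.+)/(?P<flags>[a-z]*)$", re.DOTALL)


def strip_regex_literal(value: str) -> tuple[str, str]:
    """Split '/body/flags' into (body, flags); return bare patterns unchanged."""
    match = LITERAL.match(value)
    if match is None:
        # Already a bare pattern: nothing to strip.
        return value, ""
    return match.group("body"), match.group("flags")


body, flags = strip_regex_literal(r"/a\w*z\w*/gi")
print(body)   # the value to enter in the Search field
print(flags)  # handled by the extension's options instead
```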
- By default, the regex pattern you input is applied globally (multiple matches). In the background, the extension adds the /g flag, so your input \ba\w*z\w*\b becomes /\ba\w*z\w*\b/g.
- The Match Case (regex /i) option controls case sensitivity. When it is unchecked, the extension adds the /i flag and all regex patterns are case-insensitive, so \ba\w*z\w*\b becomes /\ba\w*z\w*\b/gi. Check it to make matching case-sensitive.
- The Input/TextArea option allows you to replace strings in input fields and text areas (search and highlight are not available for input fields).
- The Web page option should be checked in most cases to search or replace in the whole webpage. If unchecked, the extension will do nothing with plain text on the webpage.
- The Raw HTML (modify html may break your page) option is only for testing purposes if you want to edit the raw HTML of a webpage.
For example, to replace all <p> tags with <h4> tags, add one rule with <p> in the search field and <h4> in the replace field, and a second rule with </p> and </h4>.
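The options above can be pictured with a small Python sketch. This is not the extension's actual code: apply_rules and its match_case parameter are hypothetical names, re.sub stands in for a global (/g) replace, and re.IGNORECASE stands in for the /i flag.

```python
import re


def apply_rules(text: str, rules: list[tuple[str, str]], match_case: bool = False) -> str:
    """Apply each (search, replace) rule to the whole text, one after another."""
    # An unchecked Match Case box corresponds to the /i flag.
    flags = 0 if match_case else re.IGNORECASE
    for search, replacement in rules:
        # re.sub replaces every match, which mirrors the /g flag.
        text = re.sub(search, replacement, text, flags=flags)
    return text


# The two Raw HTML rules from the example: opening tags, then closing tags.
html = "<p>First</p><P>Second</P>"
rules = [(r"<p>", "<h4>"), (r"</p>", "</h4>")]

insensitive = apply_rules(html, rules)
sensitive = apply_rules(html, rules, match_case=True)

print(insensitive)
print(sensitive)
```

With case-insensitive matching both paragraphs are converted; with match_case=True the uppercase <P> element is left alone, which is the difference the Match Case checkbox makes.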
Conclusion
There are numerous potential use cases for RegEx in X-Word Replacer, and I hope this post helps you understand how to utilize RegEx in the extension.
If you have any questions or feedback, feel free to leave a comment below or contact me via email.
And finally, how many email addresses did you find in this post? 😉
Troubleshooting
Sometimes a pattern is not highlighted correctly, and the cause may be the RegEx pattern itself.
For example, suppose you want to highlight IP addresses on this page: https://www.volexity.com/blog/2024/05/15/detecting-compromise-of-cve-2024-3400-on-palo-alto-networks-globalprotect-devices/
If you use this pattern:
\b((25[0-5]|(2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(25[0-5]|(2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]))\b
it can match all IP addresses in the text,
but because the alternatives (|) are wrapped in capturing groups, it treats an IP address such as 172.233.228.93 as multiple groups and therefore splits the match into multiple parts.
[Screenshot: original text]
[Screenshot: result when the inaccurate search pattern is used]
To fix it, find a better pattern or revise the current one so that it captures the whole IP address as a single match instead of as separate groups, for example by using non-capturing groups (?:...) instead of capturing groups (...):
\b(?:(?:25[0-5]|(?:2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:25[0-5]|(?:2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]))\b
[Screenshot: result when the correct search pattern is used]
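A similar effect can be reproduced in Python, which may help when debugging your own patterns. This is only an analogy: the extension runs JavaScript, and I am assuming its splitting behaviour comes from the capturing groups in the same way that Python's re.findall returns groups instead of whole matches. The sample text is invented.

```python
import re

# The two patterns from this section, copied as they appear in the post.
capturing = (
    r"\b((25[0-5]|(2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]))\.){3}"
    r"(25[0-5]|(2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]))\b"
)
non_capturing = (
    r"\b(?:(?:25[0-5]|(?:2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]))\.){3}"
    r"(?:25[0-5]|(?:2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]))\b"
)

text = "Traffic from 172.233.228.93 reached 10.0.0.1."

# With capturing groups, findall returns one tuple of group values per match.
grouped = re.findall(capturing, text)
# Without capturing groups, findall returns the whole matched addresses.
whole = re.findall(non_capturing, text)
# Both patterns still match the same spans of text.
spans = [m.group(0) for m in re.finditer(capturing, text)]

print(grouped)
print(whole)
print(spans == whole)
```

Both patterns find the same addresses; the difference is only in how the groups are reported, which is why switching to (?:...) fixes the highlighting without changing what is matched.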
Demo