Match everything except for specified strings
You could use a look-ahead assertion:
This example matches three digits other than
But if your regular expression implementation does not have this feature (see Comparison of Regular Expression Flavors), you will probably have to build the expression from basic features yourself.
A compatible regular expression with basic syntax only would be:
This also matches any three-digit sequence that is not
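Here is a small Python check that the two approaches accept the same strings. The excluded value 999 is only a stand-in chosen for illustration. `fullmatch` is used so that each pattern has to cover the whole three-character string:

```python
import re

# Illustrative excluded value: 999 (a stand-in, not taken from the text).
LOOKAHEAD = re.compile(r"(?!999)\d{3}")
# Basic-syntax equivalent: at least one of the three digits is not a 9.
BASIC = re.compile(r"[0-8]\d\d|\d[0-8]\d|\d\d[0-8]")


def accepted(pattern):
    """Return every zero-padded three-digit string the pattern matches in full."""
    candidates = (f"{n:03d}" for n in range(1000))
    return {s for s in candidates if pattern.fullmatch(s)}


lookahead_set = accepted(LOOKAHEAD)
basic_set = accepted(BASIC)
print(len(lookahead_set), lookahead_set == basic_set)  # 999 True
```

Without anchoring (for example with `re.search`), both patterns can still find a three-digit run inside a longer number. In "1999", for instance, they both find "199". In practice you will usually want anchors or word boundaries.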
Suppose you want to match lines that contain a word A but not a word B. For example, take this text:
1. I have two pets - dog and a cat
2. I have a pet - dog
To search for lines of text that HAVE a dog for a pet and DON’T have a cat, you can use this regular expression:
It will find only the second line:
2. I have a pet - dog
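The sketch below shows one possible pattern for "has dog, lacks cat" in Python. It is an assumption and not necessarily the pattern used in the answer above. A positive look-ahead requires the wanted word, and a negative look-ahead rules out the unwanted one:

```python
import re

lines = [
    "1. I have two pets - dog and a cat",
    "2. I have a pet - dog",
]

# One possible pattern (an assumption, not necessarily the answer's own):
#   (?=.*\bdog\b)  positive look-ahead: "dog" must appear somewhere on the line
#   (?!.*\bcat\b)  negative look-ahead: "cat" must not appear anywhere on the line
DOG_NOT_CAT = re.compile(r"^(?=.*\bdog\b)(?!.*\bcat\b).*$")

matches = [line for line in lines if DOG_NOT_CAT.search(line)]
print(matches)  # ['2. I have a pet - dog']
```

The `\b` word boundaries stop words such as "catalog" from counting as "cat".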
Regular expression to match a line that doesn’t contain a word?
Given an input file like:
hoho
hihi
haha
hede

grep "<Regex for 'doesn't contain hede'>" input

Desired output:
hoho
hihi
haha
The notion that regex doesn’t support inverse matching is not entirely true. You can mimic this behavior by using negative look-arounds:
The regex above will match any string, or line without a line break, not containing the (sub)string ‘hede’. As mentioned, this is not something regex is “good” at (or should be used for), but it is still possible.
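Below is a minimal Python stand-in for the grep call in the question. It assumes the anchored pattern ^((?!hede).)*$, which the answer builds up step by step, and uses the question's four sample lines. The helper name is hypothetical:

```python
import re

# Anchored "tempered dot": before each character is consumed, the engine
# checks that "hede" does not start at that position.
NO_HEDE = re.compile(r"^((?!hede).)*$")


def lines_without_hede(text):
    """Hypothetical helper: keep the lines that do not contain "hede"."""
    return [line for line in text.splitlines() if NO_HEDE.match(line)]


sample = "hoho\nhihi\nhaha\nhede"
print(lines_without_hede(sample))  # ['hoho', 'hihi', 'haha']
```

With GNU grep, look-aheads generally require the -P (Perl-compatible) option, because basic and extended grep syntax do not support them.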
And if you need to match line-break characters as well, use the DOT-ALL modifier (the trailing s in the following pattern):
or use it inline:
(the /.../ are the regex delimiters, i.e., not part of the pattern)
If the DOT-ALL modifier is not available, you can mimic the same behavior with the character class
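In Python, the DOT-ALL modifier is the `re.DOTALL` flag, and its inline form is `(?s)`, which Python requires at the start of the pattern. For the character-class fallback, this sketch uses `[\s\S]`. That is a commonly used "any character" class and is an assumption here. The comparison below runs each variant on a two-line string:

```python
import re

text = "hoho\nhihi"       # two lines, no "hede"
with_hede = "hoho\nhede"  # "hede" sits on the second line

plain = re.compile(r"^((?!hede).)*$")
dotall_flag = re.compile(r"^((?!hede).)*$", re.DOTALL)
dotall_inline = re.compile(r"(?s)^((?!hede).)*$")
# Assumed fallback: a class that matches any character, including "\n".
char_class = re.compile(r"^((?!hede)[\s\S])*$")

for name, pattern in [("plain", plain), ("flag", dotall_flag),
                      ("inline", dotall_inline), ("class", char_class)]:
    print(name, bool(pattern.match(text)), bool(pattern.match(with_hede)))
# plain False False
# flag True False
# inline True False
# class True False
```

The plain pattern rejects both strings because its dot stops at the line break. The other three variants accept the clean text and still reject the one containing "hede".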
A string is just a list of n characters. Before and after each character, there’s an empty string. So a list of n characters will have n+1 empty strings. Consider the string "ABhedeCD":
    ┌────┬───┬────┬───┬────┬───┬────┬───┬────┬───┬────┬───┬────┬───┬────┬───┬────┐
S = │ e1 │ A │ e2 │ B │ e3 │ h │ e4 │ e │ e5 │ d │ e6 │ e │ e7 │ C │ e8 │ D │ e9 │
    └────┴───┴────┴───┴────┴───┴────┴───┴────┴───┴────┴───┴────┴───┴────┴───┴────┘

index      0        1        2        3        4        5        6        7
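You can observe the n+1 empty strings directly in Python. An empty pattern matches once at every position of "ABhedeCD", which gives 9 matches for 8 characters. These correspond to e1 through e9 in the table above:

```python
import re

s = "ABhedeCD"

# An empty pattern matches the empty string at every position:
# before the first character, between each pair, and after the last.
positions = [m.start() for m in re.finditer(r"", s)]

print(len(s), positions)  # 8 [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Position k holds the empty string labelled e(k+1), so e1 is at 0 and e9 is at 8.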
The e’s are the empty strings. The regex (?!hede). looks ahead to see if there’s no substring "hede" to be seen, and if that is the case (so something else is seen), then the . (dot) will match any character except a line break. Look-arounds are also called zero-width assertions because they don’t consume any characters. They only assert/validate something.
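This short Python demonstration shows the zero-width behaviour. On its own, the look-ahead succeeds with an empty match and consumes nothing. When a dot follows it, exactly one character is consumed, but only if the check passes:

```python
import re

lookahead = re.compile(r"(?!hede)")
guarded_dot = re.compile(r"(?!hede).")

# The look-ahead alone succeeds without consuming anything: the match is empty.
print(lookahead.match("hoho").span())      # (0, 0)

# Followed by a dot, it consumes exactly one character when the check passes...
print(guarded_dot.match("hoho").group())   # h
# ...and fails outright when "hede" starts at the current position.
print(guarded_dot.match("hede"))           # None
```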
So, in my example, every empty string is first validated to see if there’s no "hede" up ahead, before a character is consumed by the . (dot). The regex (?!hede). will do that only once, so it is wrapped in a group and repeated zero or more times: ((?!hede).)*. Finally, the start- and end-of-input are anchored to make sure the entire input is consumed: ^((?!hede).)*$
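The Python sketch below contrasts the unanchored group ((?!hede).)* with the anchored ^((?!hede).)*$ on "ABhedeCD". It also walks the empty-string positions to find where the look-ahead first fails. The helper `first_failing_gap` is hypothetical and only simulates the checks the engine makes at each position:

```python
import re

unanchored = re.compile(r"((?!hede).)*")
anchored = re.compile(r"^((?!hede).)*$")
guard = re.compile(r"(?!hede)")  # the look-ahead on its own


def first_failing_gap(s):
    """Hypothetical helper: index of the first empty-string position where
    (?!hede) fails, or None if the look-ahead passes at every position."""
    for pos in range(len(s) + 1):
        if guard.match(s, pos) is None:  # "hede" starts right here
            return pos
    return None


s = "ABhedeCD"

# Without anchors the repetition just stops before "hede", so a search
# still "succeeds" with the part it managed to consume.
print(repr(unanchored.search(s).group()))  # 'AB'

# With ^ and $ the whole input must be consumed, which is impossible here.
print(anchored.search(s))                  # None
print(anchored.search("ABCD").group())     # ABCD

# Position 2 is the empty string labelled e3 in the table (e1 is position 0).
gap = first_failing_gap(s)
print(gap, f"e{gap + 1}")                  # 2 e3
```

In Python, `re.fullmatch` enforces the same whole-input requirement without writing ^ and $ explicitly.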
As you can see, the input "ABhedeCD" will fail because on e3, the regex (?!hede) fails (there is "hede" up ahead!).