Matching non-English characters in regular expressions

Posted by anakunda on 2017-11-19 06:43

Hello, can I get some advice on how to correctly build regular expressions that match non-English characters from an extended character set, such as á, ü, etc.? Simply putting the letters as-is into the regexp doesn't match them in the searched text, either because of a code page mismatch or something else.
Thank you for any helpful info.

grahams
ActiveState Staff
Fri, 2017-11-24 11:34

The fundamental issue is characters vs bytes:
http://www.perlmonks.org/?node_id=330567
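
To make the distinction concrete, here is a minimal sketch (not taken from the linked node) showing that the same data has a different length depending on whether Perl sees it as raw bytes or as decoded characters. The variable names are only illustrative, and it assumes the input is UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "\xC3\xA1" is the UTF-8 encoding of "á" (U+00E1): two bytes, one character.
my $bytes = "\xC3\xA1";

# Assumption: the data is UTF-8. Use the real encoding name if it differs.
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
# prints "bytes: 2, characters: 1"
```

A regex written in terms of the single character "á" cannot match the undecoded two-byte form, which is one likely reason a literal letter in a pattern fails to match.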

Since a regex matches characters, to deal with UTF-encoded byte strings you typically need to decode them into characters first using the Encode module, as in this example:
http://www.perlmonks.org/?node_id=462670
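
As a rough sketch of that approach (my own illustration, not the code from the linked node): decode the incoming bytes first, then match against a pattern written with the literal letters. It assumes the script file itself is saved as UTF-8 (hence `use utf8`) and that the input is UTF-8 too; `has_accented_a_or_u` is a hypothetical helper name.

```perl
use strict;
use warnings;
use utf8;                      # the literals in this source file are UTF-8
use Encode qw(decode);

# Hypothetical helper: decode UTF-8 bytes, then match on characters.
sub has_accented_a_or_u {
    my ($raw_bytes) = @_;
    # Assumption: input is UTF-8; substitute e.g. 'cp1252' if that is the real encoding.
    my $text = decode('UTF-8', $raw_bytes);
    return $text =~ /[áü]/ ? 1 : 0;
}

# Bytes as they might arrive from a file or socket ("München" in UTF-8).
my $raw = "M\xC3\xBCnchen";
print has_accented_a_or_u($raw) ? "match\n" : "no match\n";
```

When reading from a file, opening it with an encoding layer such as `open(my $fh, '<:encoding(UTF-8)', $path)` performs the same decoding on input, so the explicit `decode` call would not be needed in that case.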