How to Find Private Use UTF-8 Characters in a Text?
In the UTF-8 encoding table and Unicode characters, I use the Supplementary Private Use Area because it contains single characters that I'm sure won't be used in any text. The fact i
Solution 1:
There are three private use areas:
- One in the Basic Multilingual Plane, \uE000-\uF8FF,
- Plane 15, \u{F0000}-\u{FFFFD}, and
- Plane 16, \u{100000}-\u{10FFFD}.
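As a minimal sketch of the three ranges listed above, the check below tests a single code point against them numerically. The names PRIVATE_USE_RANGES and isPrivateUse are hypothetical and used only for illustration; they are not part of any library.

```javascript
// Inclusive [start, end] code point ranges of the three private use areas.
const PRIVATE_USE_RANGES = [
  [0xE000, 0xF8FF],     // Basic Multilingual Plane
  [0xF0000, 0xFFFFD],   // Plane 15
  [0x100000, 0x10FFFD], // Plane 16
];

// Hypothetical helper: true if the numeric code point lies in any of the ranges.
function isPrivateUse(codePoint) {
  return PRIVATE_USE_RANGES.some(
    ([start, end]) => codePoint >= start && codePoint <= end
  );
}

console.log(isPrivateUse(0xE000));  // true
console.log(isPrivateUse(0xF0001)); // true
console.log(isPrivateUse(0x41));    // false ("A")
console.log(isPrivateUse(0xFFFFE)); // false (just past the end of the Plane 15 range)
```

To test a character taken from a string, pass it as isPrivateUse(ch.codePointAt(0)).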
You may use /[\uE000-\uF8FF\u{F0000}-\u{FFFFD}\u{100000}-\u{10FFFD}]/gu to match all occurrences of these characters with an ES6-compliant regex.
See "Regex modifier /u in JavaScript?" to learn more about the u modifier. Here, it is necessary to support the \u{XXXXX} notation.
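The ES5-compliant pattern, which has no u modifier and therefore spells out the Plane 15 and Plane 16 ranges as UTF-16 surrogate pairs, is: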
/(?:[\uE000-\uF8FF]|[\uDB80-\uDBBE\uDBC0-\uDBFE][\uDC00-\uDFFF]|[\uDBBF\uDBFF][\uDC00-\uDFFD])/g
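The surrogate bounds in the ES5 pattern follow from the standard UTF-16 encoding of supplementary code points. The sketch below recomputes them and checks that both patterns find the same number of matches on a small sample; toSurrogates, toHex, es6Pattern, es5Pattern, and sample are hypothetical names chosen for this illustration.

```javascript
// Hypothetical helper: split a supplementary code point (>= 0x10000)
// into its UTF-16 high and low surrogate code units.
function toSurrogates(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);  // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [high, low];
}

const toHex = (n) => n.toString(16).toUpperCase();

console.log(toSurrogates(0xF0000).map(toHex).join(" "));  // DB80 DC00
console.log(toSurrogates(0xFFFFD).map(toHex).join(" "));  // DBBF DFFD
console.log(toSurrogates(0x100000).map(toHex).join(" ")); // DBC0 DC00
console.log(toSurrogates(0x10FFFD).map(toHex).join(" ")); // DBFF DFFD

const es6Pattern = /[\uE000-\uF8FF\u{F0000}-\u{FFFFD}\u{100000}-\u{10FFFD}]/gu;
const es5Pattern = /(?:[\uE000-\uF8FF]|[\uDB80-\uDBBE\uDBC0-\uDBFE][\uDC00-\uDFFF]|[\uDBBF\uDBFF][\uDC00-\uDFFD])/g;

// U+FFFFE is a noncharacter outside the private use ranges, so neither pattern should match it.
const sample = "a\uE000b\u{F0000}c\u{FFFFD}d\u{FFFFE}e\u{10FFFD}";
console.log((sample.match(es6Pattern) || []).length); // 4
console.log((sample.match(es5Pattern) || []).length); // 4
```

The last high surrogate of each plane (DBBF and DBFF) gets its own alternative because the final two code points of every plane, XFFFE and XFFFF, are noncharacters and must be excluded.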
To get an array of hex codes for the matched code points, use some additional JavaScript code:
const str = "\u{f0001} hahrehr \u{f0002} eryteryte \u{f0003}\n yfukguk\u{f0004}\nggikggk</";
const regex = /[\uE000-\uF8FF\u{F0000}-\u{FFFFD}\u{100000}-\u{10FFFD}]/gu;
// match() returns null when nothing matches, so fall back to an empty array.
const matches = str.match(regex) || [];
console.log(
  // Each match is exactly one code point; print it in hex, left-padded to at least 4 digits.
  matches.map((ch) => ch.codePointAt(0).toString(16).padStart(4, "0"))
);
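If the position of each private use character matters as well, a sketch along the following lines could report it. It assumes an engine with String.prototype.matchAll (ES2020); findPrivateUse and privateUsePattern are hypothetical names, not part of the original answer.

```javascript
const privateUsePattern = /[\uE000-\uF8FF\u{F0000}-\u{FFFFD}\u{100000}-\u{10FFFD}]/gu;

// Hypothetical helper: list every private use character in `text`
// with its UTF-16 index and its code point in U+XXXX form.
function findPrivateUse(text) {
  return Array.from(text.matchAll(privateUsePattern), (m) => ({
    index: m.index,
    codePoint:
      "U+" + m[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  }));
}

console.log(findPrivateUse("ab\uE001cd\u{F0002}"));
// two entries: index 2 with U+E001, index 5 with U+F0002
console.log(findPrivateUse("plain text").length); // 0
```

Note that index counts UTF-16 code units, so a Plane 15 or Plane 16 character advances the following indices by two.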