How To Find Private Char Utf8 In A Text?
Working from the UTF-8 encoding table and Unicode characters, I use the Supplementary Private Use Area because it contains single characters that I am sure won't be used in any text. The fact I
Solution 1:
There are three private use areas:
- One in the Basic Multilingual Plane, \uE000-\uF8FF,
- Plane 15, \u{F0000}-\u{FFFFD}, and
- Plane 16, \u{100000}-\u{10FFFD}.
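These three ranges can also be checked numerically on code points. A minimal sketch, assuming two helper names, isPrivateUse and findPrivateUse, that are hypothetical and not part of any library:

```javascript
// The three private use areas as inclusive [first, last] code point ranges.
const PRIVATE_USE_RANGES = [
  [0xE000, 0xF8FF],     // Basic Multilingual Plane
  [0xF0000, 0xFFFFD],   // Plane 15
  [0x100000, 0x10FFFD], // Plane 16
];

// Hypothetical helper: true when the code point lies in one of the ranges.
function isPrivateUse(codePoint) {
  return PRIVATE_USE_RANGES.some(([first, last]) => codePoint >= first && codePoint <= last);
}

// Hypothetical helper: collect every private use character in a string.
// for...of iterates by whole code points, not by UTF-16 code units.
function findPrivateUse(text) {
  const found = [];
  for (const ch of text) {
    if (isPrivateUse(ch.codePointAt(0))) found.push(ch);
  }
  return found;
}
```

For example, findPrivateUse("a\u{F0001}b\uE000") returns the two private use characters and skips the letters.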
You may use
/[\uE000-\uF8FF\u{F0000}-\u{FFFFD}\u{100000}-\u{10FFFD}]/gu
to match all the occurrences of these characters with the ES6 compliant regex.
See "Regex modifier /u in JavaScript?" to learn more about the u modifier. Here, it is required to enable the \u{XXXXX} notation.
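The u flag also makes the regex engine treat a supplementary character as one unit rather than as two UTF-16 code units. A small sketch of the difference, using U+F0001 as an arbitrary sample character (the variable names are only illustrative):

```javascript
// U+F0001 lies in Plane 15, so JavaScript stores it as a surrogate pair.
const ch = "\u{F0001}";

const codeUnits = ch.length;              // counts UTF-16 code units
const codePoints = Array.from(ch).length; // counts whole code points

// With the u flag, "." consumes the whole code point.
const dotWithU = /^.$/u.test(ch);
// Without it, "." consumes a single code unit, so the pattern cannot span the pair.
const dotWithoutU = /^.$/.test(ch);

console.log(codeUnits, codePoints, dotWithU, dotWithoutU); // 2 1 true false
```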
The ES5 compliant pattern is
/(?:[\uE000-\uF8FF]|[\uDB80-\uDBBE\uDBC0-\uDBFE][\uDC00-\uDFFF]|[\uDBBF\uDBFF][\uDC00-\uDFFD])/g
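The surrogate ranges in that pattern follow from the standard UTF-16 encoding arithmetic. As a sketch, the hypothetical helper toSurrogatePair below (not a built-in) splits a supplementary code point into its high and low surrogates, and applying it to the first and last code points of Planes 15 and 16 reproduces the boundaries used in the pattern:

```javascript
// Hypothetical helper: split a supplementary code point (>= 0x10000)
// into its UTF-16 [high, low] surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);  // top 10 bits of the offset
  const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits of the offset
  return [high, low];
}

const toHex = (n) => n.toString(16).toUpperCase();

// First and last code points of the Plane 15 and Plane 16 private use areas.
const bounds = [0xF0000, 0xFFFFD, 0x100000, 0x10FFFD]
  .map((cp) => toSurrogatePair(cp).map(toHex));

console.log(bounds);
// [["DB80","DC00"], ["DBBF","DFFD"], ["DBC0","DC00"], ["DBFF","DFFD"]]
```

The last two code points of each plane (ending in FFFE and FFFF) are noncharacters, which is why the high surrogates \uDBBF and \uDBFF are paired only with low surrogates up to \uDFFD.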
To get an array of the hex codes of the matched code points, use some additional JavaScript code:
const str = "\u{f0001} hahrehr \u{f0002} eryteryte \u{f0003}\n yfukguk\u{f0004}\nggikggk</";
const regex = /[\uE000-\uF8FF\u{F0000}-\u{FFFFD}\u{100000}-\u{10FFFD}]/gu;
// match() returns null when nothing matches, so fall back to an empty array
const matches = str.match(regex) || [];
console.log(
  // With the u flag each match is one whole code point; pad its hex code to at least 4 digits
  matches.map((ch) => ch.codePointAt(0).toString(16).padStart(4, "0"))
);