How To Find Private Char Utf8 In A Text?

February 10, 2024

Among the Unicode characters in the UTF-8 encoding table, I use the Supplementary Private Use Area because it contains single characters that I'm sure won't be used in any text. The fact i

Solution 1:

There are three private use areas:

One in the Basic Multilingual Plane, \uE000-\uF8FF,
Plane 15, \u{F0000}-\u{FFFFD}, and
Plane 16, \u{100000}-\u{10FFFD}.
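The three ranges above can also be checked numerically, without a regex. The following is a minimal sketch; the names `PRIVATE_USE_RANGES`, `isPrivateUse`, and `hasPrivateUse` are hypothetical helpers introduced here for illustration and are not part of the original answer.

```javascript
// Inclusive code point bounds of the three private use areas listed above.
// (Hypothetical helper names, chosen for this sketch.)
const PRIVATE_USE_RANGES = [
  [0xE000, 0xF8FF],     // Basic Multilingual Plane
  [0xF0000, 0xFFFFD],   // Plane 15
  [0x100000, 0x10FFFD], // Plane 16
];

// True when the numeric code point falls inside any private use range.
function isPrivateUse(codePoint) {
  return PRIVATE_USE_RANGES.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
}

// True when the text contains at least one private use character.
function hasPrivateUse(text) {
  // for...of iterates a string by code point, not by UTF-16 code unit.
  for (const ch of text) {
    if (isPrivateUse(ch.codePointAt(0))) return true;
  }
  return false;
}
```

For example, `hasPrivateUse("abc\u{F0001}")` should return `true`, while `hasPrivateUse("abc")` should return `false`.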

You may use

/[\uE000-\uF8FF\u{F0000}-\u{FFFFD}\u{100000}-\u{10FFFD}]/gu

to match all the occurrences of these characters with the ES6 compliant regex.

See "Regex modifier /u in JavaScript?" to learn more about the u modifier. Here, it is needed to support the \u{XXXXX} notation.

ES5 has no u modifier, so the characters in Planes 15 and 16 must be matched as surrogate pairs. The ES5 compliant pattern is

/(?:[\uE000-\uF8FF]|[\uDB80-\uDBBE\uDBC0-\uDBFE][\uDC00-\uDFFF]|[\uDBBF\uDBFF][\uDC00-\uDFFD])/g
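The surrogate ranges in the ES5 pattern follow from the standard UTF-16 encoding of the Plane 15 and Plane 16 boundaries. As a rough sketch of that arithmetic (the names `toSurrogatePair`, `toHex`, and `boundaries` are hypothetical and used only for illustration):

```javascript
// Convert a supplementary code point (above U+FFFF) to its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);  // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [high, low];
}

const toHex = (n) => n.toString(16).toUpperCase();

// First and last code points of Plane 15 and Plane 16 private use areas.
const boundaries = [0xF0000, 0xFFFFD, 0x100000, 0x10FFFD]
  .map((cp) => toSurrogatePair(cp).map(toHex).join(" "));

console.log(boundaries);
// → [ 'DB80 DC00', 'DBBF DFFD', 'DBC0 DC00', 'DBFF DFFD' ]
```

This is why the pattern limits the low surrogate to \uDFFD only after the last high surrogate of each plane (\uDBBF and \uDBFF): the final two code points of each plane are noncharacters, not private use characters.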

To get an array of hex codes for the matched code points, use some additional JavaScript code:

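// Sample text containing four Plane 15 private use characters.
const str = "\u{f0001} hahrehr \u{f0002} eryteryte \u{f0003}\n yfukguk\u{f0004}\nggikggk</";
const regex = /[\uE000-\uF8FF\u{F0000}-\u{FFFFD}\u{100000}-\u{10FFFD}]/gu;
// match() returns null when nothing matches, so fall back to an empty array.
const hexCodes = (str.match(regex) || [])
  // With the u flag, each match is exactly one code point; pad to at least 4 hex digits.
  .map((ch) => ch.codePointAt(0).toString(16).padStart(4, "0"));
console.log(hexCodes);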
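If the position of each private use character is needed as well, one possible approach is `String.prototype.matchAll` (ES2020), which exposes the index of every match. This is only a sketch; `findPrivateUse` and `PRIVATE_USE` are hypothetical names, and the reported index counts UTF-16 code units, not code points.

```javascript
// Same character class as the ES6 regex above.
const PRIVATE_USE = /[\uE000-\uF8FF\u{F0000}-\u{FFFFD}\u{100000}-\u{10FFFD}]/gu;

// Return the position and code point of every private use character in the text.
function findPrivateUse(text) {
  return Array.from(text.matchAll(PRIVATE_USE), (m) => ({
    index: m.index, // offset in UTF-16 code units
    codePoint: "U+" + m[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  }));
}

console.log(findPrivateUse("a\u{F0001}b\uE000"));
// → [ { index: 1, codePoint: 'U+F0001' }, { index: 4, codePoint: 'U+E000' } ]
```

The second match is reported at index 4 rather than 3 because the Plane 15 character occupies two UTF-16 code units.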
