Regex That Splits Long Text In Separate Sentences With Match()
Solution 1:
I simplified it a lot. It matches either a line break (new line) or a run of text that ends with punctuation or at the end of the line:
var tregex = /\n|([^\r\n.!?]+([.!?]+|$))/gim;
I also believe the m (multiline) flag is important, since it lets $ match at the end of each line rather than only at the end of the whole input.
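As a rough sketch of how this regex might be used with match(), the snippet below wraps it in a helper. The name splitSentences, the sample text, and the trim/filter cleanup are assumptions added for illustration, not part of the original answer. It also runs the same pattern without the m flag to show what changes.

```javascript
// Regex from Solution 1, unchanged.
const tregex = /\n|([^\r\n.!?]+([.!?]+|$))/gim;

// Hypothetical helper: collect all matches, trim them, and drop the
// entries that were only line breaks or whitespace.
function splitSentences(text, regex = tregex) {
  const matches = text.match(regex) || []; // match() returns null when nothing matches
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

const sample = "Hello world. How are you?\nNo punctuation here\nDone!";

console.log(splitSentences(sample));
// → ["Hello world.", "How are you?", "No punctuation here", "Done!"]

// Same pattern without "m": $ only matches at the very end of the input,
// so the unpunctuated middle line can no longer match and is dropped.
const noMultiline = /\n|([^\r\n.!?]+([.!?]+|$))/g;
console.log(splitSentences(sample, noMultiline));
// → ["Hello world.", "How are you?", "Done!"]
```

The raw match() result also contains the "\n" matches and keeps leading spaces, which is why the sketch trims and filters afterwards.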
Solution 2:
You can use the following regex:
/((?:\S[^\.\?\!]*)[\.\?\!]*)/g
Let's break this down:
"g" is the global flag, meaning the regex keeps matching after the first occurrence.
Working from the inside out, (?:...) is a non-capturing group: it lets us group an expression but leaves the matched text out of the captured results. Inside it, \S matches a single non-whitespace character, so each sentence starts at a non-space, and [^\.\?\!]* then matches any run of characters that are not a period, question mark, or exclamation point.
You stated you wanted to keep the punctuation, so the next part, [\.\?\!]*, is a character class containing these same punctuation symbols. It sits inside the outer capturing group, so the punctuation is included in the match. EDIT: I added the asterisk so that it matches any number of punctuation marks, or none at all, at the end of a sentence.
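To see the breakdown in action, here is a minimal sketch that runs this regex through match(). The helper name matchSentences and the sample string are assumptions made for the example.

```javascript
// Regex from Solution 2, unchanged.
const sentenceRegex = /((?:\S[^\.\?\!]*)[\.\?\!]*)/g;

// Hypothetical helper: return every sentence-like match, or an empty
// array when the text contains no non-whitespace characters.
function matchSentences(text) {
  return text.match(sentenceRegex) || [];
}

console.log(matchSentences("Hello world. How are you? Wait... what?! Fine"));
// → ["Hello world.", "How are you?", "Wait...", "what?!", "Fine"]
```

Because each match must start with \S, the spaces between sentences are skipped without any trimming. The pattern treats every period as a sentence end, so a decimal such as "3.14" comes back as "3." and "14".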
Check out the matched groups using http://www.pagecolumn.com/tool/regtest.htm or a similar JavaScript regex tester.
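The groups can also be inspected locally. The following sketch uses matchAll() for that; the helper name inspectGroups and the sample input are hypothetical.

```javascript
// Regex from Solution 2, unchanged.
const groupRegex = /((?:\S[^\.\?\!]*)[\.\?\!]*)/g;

// Hypothetical helper: list each match with its position and capture group.
function inspectGroups(text) {
  const rows = [];
  for (const m of text.matchAll(groupRegex)) {
    rows.push({
      index: m.index,       // where the match starts in the text
      full: m[0],           // the whole match
      group1: m[1],         // the outer capturing group
      groups: m.length - 1, // number of capturing groups (the (?:...) group is not counted)
    });
  }
  return rows;
}

console.log(inspectGroups("One. Two?"));
// → [ { index: 0, full: "One.", group1: "One.", groups: 1 },
//     { index: 5, full: "Two?", group1: "Two?", groups: 1 } ]
```

Since the outer parentheses wrap the entire pattern, group 1 always equals the full match, so match() with the g flag returns the same strings.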