JavaScript: Regex Help Needed Please
Solution 1:
Try
/^https?:\/\/(?:www\.)?google(?:\.[a-z]{2,3}){1,2}\/.*[&\?]q=[^&]+?/i
The (?:\.[a-z]{2,3}){1,2} part matches one or two domain suffixes, such as .com.au, .co.uk, etc.
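As a quick check, here is a small sketch that runs the pattern above against a few URLs. The sample URLs and the variable names are made up for illustration; only the regex comes from the answer.

```javascript
// Solution 1's pattern, unchanged.
var googleSearch = /^https?:\/\/(?:www\.)?google(?:\.[a-z]{2,3}){1,2}\/.*[&\?]q=[^&]+?/i;

// Hypothetical sample URLs (not from the original question).
var samples = [
  'http://www.google.com/search?q=regex',      // single suffix
  'https://google.co.uk/search?hl=en&q=regex', // two suffixes, no "www."
  'http://www.google.com.au/search?q=regex',   // two suffixes
  'http://www.google.com/search?hl=en',        // no q= parameter
  'ftp://www.google.com/search?q=regex'        // wrong scheme
];

// One boolean per sample, in the same order.
var results = samples.map(function (s) { return googleSearch.test(s); });
console.log(results);
```

The first three samples should match and the last two should not.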
Solution 2:
^https?://www\.google\.[a-z]{2,3}/q=
This assumes that a two- or three-letter TLD is acceptable. If you write it as a regex literal between forward slashes (/), you need to escape the forward slashes inside the pattern.
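To illustrate the escaping point, here is a sketch of the same pattern written both as a regex literal and through the RegExp constructor. The variable names and sample URLs are hypothetical.

```javascript
// Regex literal: each "/" inside the pattern is escaped as "\/".
var literal = /^https?:\/\/www\.google\.[a-z]{2,3}\/q=/;

// RegExp constructor: "/" needs no escaping, but each "\" is doubled
// because the pattern sits inside a string literal.
var constructed = new RegExp('^https?://www\\.google\\.[a-z]{2,3}/q=');

// Hypothetical sample URLs.
var direct = 'http://www.google.com/q=regex';
var withPath = 'http://www.google.com/search?q=regex';

// Both forms should agree on every input.
var directResults = [literal.test(direct), constructed.test(direct)];
var withPathResults = [literal.test(withPath), constructed.test(withPath)];
console.log(directResults, withPathResults);
```

Note that, as written, this pattern expects q= immediately after the slash that follows the TLD, so the second sample (with a /search? path) does not match in either form.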
Solution 3:
Don't try to tailor a regular expression. It will be unmaintainable -- if you can't find the problem with it today, what hope does the maintainer have to find a problem with it tomorrow?
Parse the URL properly, perhaps with a regular expression that won't need to be maintained because the core URL syntax doesn't change.
From RFC 3986:
The following line is the regular expression for breaking-down a well-formed URI reference into its components.
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9
The numbers in the second line above are only to assist readability; they indicate the reference points for each subexpression (i.e., each paired parenthesis). We refer to the value matched for subexpression <n> as $<n>. For example, matching the above expression to
http://www.ics.uci.edu/pub/ietf/uri/#Related
results in the following subexpression matches:
$1 = http:
$2 = http
$3 = //www.ics.uci.edu
$4 = www.ics.uci.edu
$5 = /pub/ietf/uri/
$6 = <undefined>
$7 = <undefined>
$8 = #Related
$9 = Related
Using that, you can check your URL in JavaScript by doing the following:
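// RFC 3986 expression, with the slashes escaped for a regex literal.
var match = url.match(/^(([^:/?#]+):)?(\/\/([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$/);
if (!match) { throw new Error('not a URL'); }
// Use a separate name so the input string `url` is not overwritten.
var parts = {
  protocol: match[2],
  authority: match[4], // host, port, username, password
  path: match[5],
  query: match[6],     // includes the leading "?"
  fragment: match[8]   // includes the leading "#"
};
if (parts.protocol !== 'http' && parts.protocol !== 'https') {
  throw new Error('bad protocol');
}
// The dots are escaped so that they match only a literal ".".
if (!/^www\.google\.[a-z]+$/.test(parts.authority || '')) {
  throw new Error('bad host');
}
if (!/[?&]q=/.test(parts.query || '')) {
  throw new Error('bad query');
}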
It's more code, but it's much easier to debug and maintain, and as a bonus, you can tailor your explanation of why the URL is problematic.
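To make the "tailored explanation" point concrete, here is one way the same checks could be wrapped in a function that reports the reason instead of throwing. This is only a sketch: the function name explainGoogleSearchUrl, the constant URI_PARTS, and the sample URLs are made up for illustration, and the host check escapes its dots so they match only a literal ".".

```javascript
// RFC 3986 Appendix B expression, with the slashes escaped for a regex literal.
var URI_PARTS = /^(([^:/?#]+):)?(\/\/([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$/;

// Hypothetical helper: returns null when the URL passes every check,
// otherwise a short string naming the first check that failed.
function explainGoogleSearchUrl(input) {
  var match = String(input).match(URI_PARTS);
  if (!match) { return 'not a URL'; }

  var protocol = match[2];        // e.g. "http"
  var authority = match[4] || ''; // host, port, username, password
  var query = match[6] || '';     // includes the leading "?"

  if (protocol !== 'http' && protocol !== 'https') { return 'bad protocol'; }
  if (!/^www\.google\.[a-z]+$/.test(authority)) { return 'bad host'; }
  if (!/[?&]q=/.test(query)) { return 'bad query'; }
  return null;
}

console.log(explainGoogleSearchUrl('http://www.google.com/search?q=regex'));
console.log(explainGoogleSearchUrl('ftp://www.google.com/search?q=regex'));
console.log(explainGoogleSearchUrl('http://www.example.com/search?q=regex'));
console.log(explainGoogleSearchUrl('http://www.google.com/search?hl=en'));
```

Because the authority also carries any port or user info, a URL such as http://www.google.com:80/search?q=x would be reported as a bad host by this sketch; whether that is desirable depends on what you need to accept.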
Solution 4:
Add ^https?:// to the front of the pattern you already have
- ^ anchors the pattern to the beginning of the string
- http is just http
- s? means 1 or 0 s's
- : is just itself
- forward slashes need to be escaped with a backslash (\/)
So this is the whole pattern:
(^https?:\/\/www\.google.*?[?&]q=[^&]+)
What I like about the pattern you have: it does not assume that TLDs are two or three characters long.
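Here is a small sketch of how the whole pattern behaves. The helper name firstGroup and the sample URLs are made up for illustration.

```javascript
// Solution 4's pattern, unchanged, as a regex literal.
var pattern = /(^https?:\/\/www\.google.*?[?&]q=[^&]+)/;

// Hypothetical helper: returns the captured text, or null when there is no match.
function firstGroup(url) {
  var m = pattern.exec(url);
  return m ? m[1] : null;
}

// The capture stops at the "&" that follows the q= value.
console.log(firstGroup('https://www.google.de/search?hl=de&q=regex&start=10'));
// No "www." prefix, so the pattern does not match.
console.log(firstGroup('http://google.com/search?q=regex'));
// Anything may follow "google", including extra host labels.
console.log(firstGroup('http://www.google.evil.example/?q=x'));
```

The last sample shows the flip side of not constraining the TLD: because .*? accepts anything after "google", a host like www.google.evil.example also passes. If that matters for your use, the host needs a stricter check.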
Solution 5:
var regex = /^https?:\/\/(www\.)?google\.[a-z]{2,3}\/([^/]*[\&]|[\?])q=.+$/i;