Javascript: Regex Help Needed Please

October 31, 2022 Post a Comment

When it comes to Regex I am dumber than a door nail, so when making a Firefox extension I asked a friend for help and he gave me this: if( doc.location.href.match(/(www\.google.*?[

Solution 1:

Try

/^https?:\/\/(?:www\.)?google(?:\.[a-z]{2,3}){1,2}\/.*[&\?]q=[^&]+?/i

The (?:\.[a-z]{2,3}){1,2} is to match like .com.au, .co.uk etc.

Solution 2:

^https?://www\.google\.[a-z]{2,3}/q=

assuming just 2-3 letters for tld would be ok. If you're using it between forward slashes (/), you'd want to escape them on this regex.

Solution 3:

Don't try to tailor a regular expression. It will be unmaintainable -- if you can't find the problem with it today, what hope does the maintainer have to find a problem with it tomorrow?

Parse the URL properly, perhaps by using the regular expression which won't need to be maintained because the core URL syntax doesn't change.

From RFC 3986:

The following line is the regular expression for breaking-down a well-formed URI reference into its components.
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9
The numbers in the second line above are only to assist readability; they indicate the reference points for each subexpression (i.e., each paired parenthesis). We refer to the value matched for subexpression <n> as $<n>. For example, matching the above expression to
 http://www.ics.uci.edu/pub/ietf/uri/#Related
results in the following subexpression matches:
 $1 = http:
 $2 = http
 $3 = //www.ics.uci.edu
 $4 = www.ics.uci.edu
 $5 = /pub/ietf/uri/
 $6 = <undefined>
 $7 = <undefined>
 $8 = #Related
 $9 = Related

Using that, you can check your URL in JavaScript by doing the following:

var match = url.match(/^(([^:/?#]+):)?(\/\/([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$/);
if (!match) { throw new Error('not a URL'); }
var url = {
  protocol: match[2],
  authority: match[4],  // host, port, username, password
  path: match[5],
  query: match[6],
  fragment: match[8]
};
if (url.protocol !== 'http' && url.protocol !== 'https') {
  throw new Error('bad protocol');
}
if (!/^www.google.[a-z]+$/.test(url.authority || '')) {
  throw new Error('bad host');
}
if (!/[?&]q=/.test(url.query || '')) {
  throw new Error('bad query');
}

It's more code, but it's much easier to debug, maintain, and as a bonus, you can tailor your explanation of why the URL is problematic.

Solution 4:

Add ^https?:// to the front of the pattern you already have

^ anchors the pattern to the beginning of the string
http is just http
s? means 1 or 0 s's
: is just itself
backslashes need to be escaped

so this is the whole pattern:

(^https?:\/\/www\.google.*?[?&]q=[^&]+)

what i like about the pattern you have: it does not assume that TLDs are two or three characters long.

Solution 5:

var regex = /^https?:\/\/(www\.)?google\.[a-z]{2,3}\/([^/]*[\&]|[\?])q=.+$/i;

JavaScript OfficialSite