URL Blacklist filter format

The format of filters for the URLBlacklist and URLWhitelist policies, as of Chrome 15, is:

    [scheme://][.]host[:port][/path]
  • Scheme can be http, https, ftp, chrome, etc. This field is optional, and must be followed by '://'.
  • An optional '.' (dot) can prefix the host field to disable subdomain matching; see below for details.
  • The host field is required, and must be a valid hostname or an IP address. It can also take the special '*' value; see below for details.
  • An optional port can come after the host. It must be a valid port value from 1 to 65535.
  • An optional path can come at the end. Any string can be used here.
The format is very similar to the URL format, with some exceptions:
  • user:pass fields can be included but will be ignored (e.g. http://user:pass@ftp.example.com/pub/bigfile.iso).
  • The host can be '*'. It can also have a '.' as a prefix.
  • URL parameters will be ignored.
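The format and its exceptions can be sketched as a small Python parser. This is only an illustration under stated assumptions, not Chrome's actual parsing code: the names `Filter` and `parse_filter` are hypothetical, "URL parameters" is assumed to mean the query string, and the host is not validated beyond its basic shape.

```python
import re
from typing import NamedTuple, Optional


class Filter(NamedTuple):
    """Hypothetical record holding the parts of one filter string."""
    scheme: Optional[str]  # None means "any scheme"
    exact_host: bool       # True when the host was prefixed with '.'
    host: str              # hostname, IP address, or '*'
    port: Optional[int]    # None means "any port"
    path: str              # '' means "any path"


# [scheme://][.]host[:port][/path], with an optional user:pass@ that is ignored.
_FILTER_RE = re.compile(
    r"^(?:(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*)://)?"
    r"(?:[^/@]*@)?"
    r"(?P<dot>\.)?"
    r"(?P<host>\*|[^/:*]+)"
    r"(?::(?P<port>\d+))?"
    r"(?P<path>/.*)?$"
)


def parse_filter(text: str) -> Filter:
    match = _FILTER_RE.match(text)
    if match is None:
        raise ValueError(f"not a valid filter: {text!r}")
    port = int(match.group("port")) if match.group("port") else None
    if port is not None and not 1 <= port <= 65535:
        raise ValueError(f"port out of range in filter: {text!r}")
    scheme = match.group("scheme")
    # Assumption: "URL parameters" are the query string, which is dropped here.
    path = (match.group("path") or "").split("?", 1)[0]
    return Filter(
        scheme=scheme.lower() if scheme else None,
        exact_host=match.group("dot") is not None,
        host=match.group("host").lower(),
        port=port,
        path=path,
    )
```

For instance, `parse_filter("*:8080")` yields a filter with host `'*'` and port `8080`, while `parse_filter(".www.example.com")` sets `exact_host` to `True`.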
The filter selected for a URL is the most specific match found:
  1. First, the filters with the longest host match will be selected;
  2. Among these, filters with a non-matching scheme or port are discarded;
  3. Among these, the filter with the longest matching path is selected;
  4. If no valid filter is left at step 3, the host is reduced by removing the left-most subdomain, and the search restarts from step 1;
  5. If a filter is available at step 3, its decision (block or allow) is enforced. If no filter ever matches, the default is to allow the request.
The special '*' host is searched last, and matches all hosts.
When both a blacklist and a whitelist filter apply at step 3 (with the same path length), the whitelist filter takes precedence.
If a filter has a '.' (dot) prefixing the host, only exact host matches will be filtered:
  • "example.com" matches "example.com", "www.example.com" and "sub.www.example.com";
  • ".www.example.com" only matches exactly "www.example.com".
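The selection steps and the dot-prefix rule can be sketched in Python as follows. This is a minimal illustration, not Chrome's implementation: `Rule`, `host_candidates` and `is_allowed` are hypothetical names, filters are assumed to be already split into their fields, and the caller is assumed to pass the scheme's default port when the URL has none. For simplicity, IP addresses are reduced label by label like hostnames, which a real implementation would likely avoid.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    """Hypothetical, already-parsed filter plus its decision."""
    allow: bool                   # True: whitelist filter, False: blacklist filter
    host: str                     # hostname, IP address, or '*'
    exact_host: bool = False      # True when the filter host had a leading '.'
    scheme: Optional[str] = None  # None matches any scheme
    port: Optional[int] = None    # None matches any port
    path: str = ""                # '' matches any path


def host_candidates(host: str) -> List[str]:
    """Return the host, then each shorter suffix, then '*' (searched last)."""
    labels = host.split(".")
    return [".".join(labels[i:]) for i in range(len(labels))] + ["*"]


def is_allowed(rules: List[Rule], scheme: str, host: str, port: int, path: str) -> bool:
    for candidate in host_candidates(host):
        # Step 1: filters for the longest remaining host; a dot-prefixed
        # filter only counts when the candidate is the full request host.
        matching = [r for r in rules
                    if r.host == candidate and not (r.exact_host and candidate != host)]
        # Step 2: discard filters with a non-matching scheme or port.
        matching = [r for r in matching
                    if r.scheme in (None, scheme) and r.port in (None, port)]
        # Step 3: keep filters whose path is a prefix of the request path,
        # then keep only those with the longest path.
        matching = [r for r in matching if path.startswith(r.path)]
        if matching:
            longest = max(len(r.path) for r in matching)
            best = [r for r in matching if len(r.path) == longest]
            # Step 5: enforce the decision; a whitelist filter wins a tie.
            return any(r.allow for r in best)
        # Step 4: nothing left, so retry with the next shorter host.
    return True  # No filter ever matched: the request is allowed.


rules = [
    Rule(allow=False, host="example.com"),
    Rule(allow=True, host="mail.example.com", scheme="https"),
    Rule(allow=True, host="example.com", exact_host=True),
]
print(is_allowed(rules, "https", "mail.example.com", 443, "/inbox"))  # True
print(is_allowed(rules, "http", "mail.example.com", 80, "/"))         # False
print(is_allowed(rules, "http", "example.com", 80, "/"))              # True
```

In the second call the "mail.example.com" filter is discarded at step 2 because of its scheme, so the search falls back to "example.com", where only the blocking filter applies.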
Example of searching for a match for "http://mail.example.com/mail/inbox":
  1. First find filters for "mail.example.com", and go to step 2. If that fails, then try again with "example.com", "com" and finally "".
  2. Among the current filters, remove those that have a scheme which is not http.
  3. Among the current filters, remove those that specify a port other than 80 (the default port for http);
  4. Among the current filters, remove those whose path is not a prefix of "/mail/inbox";
  5. Pick the filter with the longest path prefix, and apply it. If no such filter exists, go back to step 1 and try the next subdomain.
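As a rough illustration of how the example URL is broken down before the search, the Python sketch below derives the scheme, the sequence of hosts to try, the effective port, and the path. The function name `lookup_key` and the `DEFAULT_PORTS` table are hypothetical, and the final empty string is assumed to stand for the catch-all '*' filter.

```python
from urllib.parse import urlsplit

# Assumed default ports, used when the URL does not name a port explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def lookup_key(url: str):
    """Split a URL into the pieces the filter search compares against."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    labels = host.split(".")
    # Full host first, then each shorter suffix, then "" as the last resort.
    hosts = [".".join(labels[i:]) for i in range(len(labels))] + [""]
    return parts.scheme, hosts, port, parts.path


scheme, hosts, port, path = lookup_key("http://mail.example.com/mail/inbox")
print(scheme, port, path)  # http 80 /mail/inbox
print(hosts)               # ['mail.example.com', 'example.com', 'com', '']
```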
Some examples:
  • "example.com" blocks all requests to that domain and any subdomain;
  • "http://example.com" blocks all HTTP requests to that domain and any subdomain; requests with other schemes (such as https or ftp) are still allowed;
  • "https://*" blocks all HTTPS requests to any domain;
  • "mail.example.com" blocks this domain but not "www.example.com" nor "example.com";
  • ".example.com" blocks exactly "example.com", and won't block subdomains;
  • "*" blocks all requests; only whitelisted URLs will be allowed;
  • "*:8080" blocks all requests to port 8080;
  • "example.com/stuff" blocks all requests to "example.com" and any of its subdomains that have "/stuff" as a prefix of the path;
  • "192.168.1.2" blocks requests to this exact IP address.
Example: allowing only a small set of sites:
  • Block "*"
  • Allow selected sites: "mail.example.com", "wikipedia.org", "google.com"
Example: block all access to a domain, except to the mail server using HTTPS and to the main page:
  • Block "example.com"
  • Allow "https://mail.example.com"
  • Allow ".example.com", and optionally ".www.example.com"
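The two scenarios above could be written down as policy values along the following lines. This Python sketch assumes the URLBlacklist and URLWhitelist policies are each supplied as a list of filter strings and simply prints them as JSON; the variable names are hypothetical, and how the values are actually deployed depends on the platform and is not covered here.

```python
import json

# Allow only a small set of sites: block everything, then whitelist a few hosts.
allow_only_selected = {
    "URLBlacklist": ["*"],
    "URLWhitelist": ["mail.example.com", "wikipedia.org", "google.com"],
}

# Block a domain, except the mail server over HTTPS and the main page.
block_domain_with_exceptions = {
    "URLBlacklist": ["example.com"],
    "URLWhitelist": [
        "https://mail.example.com",
        ".example.com",
        ".www.example.com",  # optional, per the example above
    ],
}

print(json.dumps(allow_only_selected, indent=2))
print(json.dumps(block_domain_with_exceptions, indent=2))
```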