URL Blocklist filter format
The format of filters for the URLBlocklist and URLAllowlist policies, as of Chrome 52, is made up of the following fields, in order:
- Scheme can be http, https, ftp, chrome, etc. This field is optional, and must be followed by '://'.
- An optional '.' (dot) can prefix the host field to disable subdomain matching; see below for details.
- The host field is required and must be a valid hostname or an IP address. It can also take the special '*' value; see below for details.
- An optional port can come after the host. It must be a valid port value from 1 to 65535.
- An optional path can come after the port. Any string can be used here.
- An optional query can come at the end. It is a set of key-value and key-only tokens delimited by '&'. In a key-value token, the key and value are separated by '='. A query token can optionally end with a '*' to indicate a prefix match. Token order is ignored during matching.
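The field list above can be sketched as a small parser. This is only an illustration under stated assumptions, not Chrome's implementation: the names `Filter` and `parse_filter` are hypothetical, the query is assumed to start at the first '?', and IPv6 literals are not handled.

```python
import re
from typing import NamedTuple, Optional


class Filter(NamedTuple):
    """Hypothetical container for the fields described above."""
    scheme: Optional[str]    # None means the filter names no scheme
    match_subdomains: bool   # False when the host was prefixed with '.'
    host: str                # hostname, IP address, or the special '*'
    port: Optional[int]      # None means the filter names no port
    path: str                # '' means the filter names no path
    query: Optional[str]     # raw query tokens, or None


def parse_filter(text: str) -> Filter:
    """Split a filter string into its fields (illustrative sketch only)."""
    scheme = None
    match = re.match(r"([A-Za-z][A-Za-z0-9+.-]*)://", text)
    if match:
        # The optional scheme must be followed by '://'.
        scheme = match.group(1).lower()
        text = text[match.end():]

    # Assumption: the query starts at the first '?'.
    rest, has_query, query = text.partition("?")
    authority, slash, path = rest.partition("/")

    # A leading '.' disables subdomain matching.
    match_subdomains = not authority.startswith(".")
    if not match_subdomains:
        authority = authority[1:]

    host, port = authority, None
    if ":" in authority:
        host, port_text = authority.rsplit(":", 1)
        port = int(port_text)
        if not 1 <= port <= 65535:
            raise ValueError("port must be between 1 and 65535")
    if not host:
        raise ValueError("the host field is required")

    return Filter(scheme, match_subdomains, host, port,
                  slash + path, query if has_query else None)
```

For instance, `parse_filter("*:8080")` yields the host '*' with port 8080 and no scheme, path, or query.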
The format is very similar to the URL format, with some exceptions:
- user:pass fields can be included but will be ignored (e.g. http://user:pass@example.com/pub/bigfile.iso).
- If a reference separator '#' is present, it is ignored along with everything that appears after it.
- The host can be '*'. It can also have a '.' as a prefix.
- The host can have a '/' or '.' as a suffix. In that case, the suffix is ignored.
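The ignored parts listed above can be illustrated with a small normalization step. This is a minimal sketch, not Chrome's code: `normalize_filter` is a hypothetical helper, and it assumes the user:pass part ends at the last '@' before the path.

```python
import re


def normalize_filter(text: str) -> str:
    """Drop the parts of a filter that are ignored (illustrative sketch)."""
    # A reference separator '#' is ignored along with everything after it.
    text = text.split("#", 1)[0]

    # Split into optional "scheme://", the host part, and the remainder.
    match = re.match(r"([A-Za-z][A-Za-z0-9+.-]*://)?([^/?]*)(.*)", text, re.S)
    prefix = match.group(1) or ""
    authority = match.group(2)
    rest = match.group(3)

    # user:pass fields are accepted but ignored.
    authority = authority.rsplit("@", 1)[-1]

    # A '.' suffix on the host is ignored.
    host, colon, port = authority.partition(":")
    if len(host) > 1 and host.endswith("."):
        host = host[:-1]

    # A lone '/' suffix after the host is ignored as well.
    if rest == "/":
        rest = ""

    return prefix + host + colon + port + rest
```

Under these assumptions, "example.com.", "example.com/" and "example.com" all normalize to the same filter, while a leading '.' or a '*' host is left untouched.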
The filter selected for a URL is the most specific match found:
1. First, the filters with the longest host match are selected.
2. Among these, filters with a non-matching scheme or port are discarded.
3. Among these, the filters with the longest matching path are selected.
4. Among these, the filter with the longest set of query tokens is selected.
5. If no valid filter is left at step 3, the host is reduced by removing the left-most subdomain, and the search starts again from step 1.
6. If a filter is available at step 3, its decision (block or allow) is enforced. If no filter ever matches, the default is to allow the request.
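The selection procedure above can be sketched roughly as follows. This is a simplified illustration, not Chrome's implementation: `Rule`, `is_allowed`, and `DEFAULT_PORTS` are hypothetical names, rules are assumed to be already parsed, query tokens are compared exactly (no '*' prefix handling), and the wildcard host '*' is tried last. Ties between a block rule and an allow rule are resolved in favour of the allow rule, as the document states.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Rule:
    """Hypothetical pre-parsed filter."""
    allow: bool                        # True: allowlist entry, False: blocklist entry
    host: str                          # lower-case host, or "*"
    match_subdomains: bool = True      # False when the filter host began with '.'
    scheme: Optional[str] = None       # None: any scheme
    port: Optional[int] = None         # None: any port
    path: str = ""                     # '' is a prefix of every path
    query: Tuple[str, ...] = ()        # exact tokens only (simplification)


# Assumption: default ports used when the URL names none.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def is_allowed(url: str, rules: Sequence[Rule]) -> bool:
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    url_tokens = set(parts.query.split("&")) if parts.query else set()

    # Hosts from most to least specific, with the special '*' host last.
    labels = host.split(".")
    candidates = [".".join(labels[i:]) for i in range(len(labels))] + ["*"]

    for candidate in candidates:
        # Step 1: filters for the longest host not yet tried.
        matches = [r for r in rules if r.host == candidate
                   and (r.match_subdomains or candidate == host)]
        # Step 2: discard filters with a non-matching scheme or port.
        matches = [r for r in matches
                   if r.scheme in (None, scheme) and r.port in (None, port)]
        # Keep only filters whose path and query tokens match the URL.
        matches = [r for r in matches
                   if parts.path.startswith(r.path) and set(r.query) <= url_tokens]
        if not matches:
            continue  # Step 5: retry with a shorter host.
        # Steps 3 and 4: longest path, then most query tokens; allow wins ties.
        best = max(matches, key=lambda r: (len(r.path), len(r.query), r.allow))
        return best.allow  # Step 6: enforce the decision.

    return True  # No filter ever matched: the default is to allow.
```

With a block rule for "example.com" and an allow rule for "https://mail.example.com", this sketch allows "https://mail.example.com/inbox" but blocks "http://mail.example.com/inbox", because the more specific allow rule is discarded at step 2 and the search falls back to "example.com".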
The special '*' host is searched last and matches all hosts.
When both a blocklist and an allowlist filter apply at step 4 (with the same path length and number of query tokens), the allowlist filter takes precedence.
If a filter has a '.' (dot) prefixing the host, only exact host matches will be filtered:
- "example.com" matches "example.com", "www.example.com" and "sub.www.example.com";
- ".www.example.com" only matches exactly "www.example.com".
For a custom scheme (shown here as "custom"), only wildcard patterns are valid:
- The patterns "custom://*" and "custom:*" are valid and match "custom:app".
- The patterns "custom:app" and "custom://app" are invalid.
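The effect of a leading '.' on host matching, described above, can be sketched as a single predicate. The function name `host_matches` is hypothetical, and the sketch only covers the host field; it lower-cases both sides because the document treats hosts as case insensitive as of Chrome 52.

```python
def host_matches(filter_host: str, url_host: str) -> bool:
    """Return True if the filter's host field covers the URL's host (sketch)."""
    filter_host = filter_host.lower()
    url_host = url_host.lower()

    if filter_host == "*":
        # The special '*' host matches all hosts.
        return True
    if filter_host.startswith("."):
        # A leading '.' disables subdomain matching: exact host only.
        return url_host == filter_host[1:]
    # Otherwise the host itself and any of its subdomains match.
    return url_host == filter_host or url_host.endswith("." + filter_host)
```

Note that the suffix test prepends a '.', so "example.com" does not accidentally match "badexample.com".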
The scheme and, as of Chrome 52, the host are case-insensitive, while the path and query are case-sensitive:
- "http://example.com" matches "HTTP://Example.com", "http://example.COM" and "http://example.com";
- "http://example.com/path?query=1" does not match "http://example.com/path?Query=1" or "http://example.com/Path?query=1", but matches "http://Example.com/path?query=1".
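The case rules above can be illustrated by building a comparison key in which only the scheme and host are lower-cased. This is a sketch using Python's standard `urllib.parse.urlsplit` as a stand-in for Chrome's URL parsing; `matching_key` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def matching_key(url: str) -> tuple:
    """Build a key where scheme and host are case-folded, path and query are not."""
    parts = urlsplit(url)
    return (
        parts.scheme.lower(),            # scheme: case-insensitive
        (parts.hostname or "").lower(),  # host: case-insensitive
        parts.port,
        parts.path,                      # path: case-sensitive
        parts.query,                     # query: case-sensitive
    )
```

Two URLs that differ only in the case of the scheme or host produce the same key, while a difference in the case of the path or query produces different keys.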
Example of searching for a match for "http://mail.example.com/mail/inbox":
1. First, find the filters for "mail.example.com" and go to step 2. If that fails, try again with "example.com", then "com", and finally "".
2. Among the current filters, remove those that specify a scheme other than http.
3. Among the current filters, remove those that have an exact port number that is not 80.
4. Among the current filters, remove those whose path is not a prefix of "/mail/inbox".
5. Pick the filter with the longest path prefix and apply it. If no such filter exists, go back to step 1 and try the next, shorter host.
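The sequence of hosts tried in the walkthrough above can be generated by dropping the left-most label one at a time. A minimal sketch follows; `host_candidates` is a hypothetical name, and the final empty string stands for the last lookup in the example.

```python
def host_candidates(host: str) -> list:
    """List the hosts to try, from most specific to the final empty host."""
    labels = host.lower().split(".")
    # Drop one left-most label per step; the last slice is empty and joins to "".
    return [".".join(labels[i:]) for i in range(len(labels) + 1)]


print(host_candidates("mail.example.com"))
# → ['mail.example.com', 'example.com', 'com', '']
```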
Examples of blocklist filters:
- "example.com" blocks all requests to that domain and any subdomain;
- "http://example.com" blocks all HTTP requests to that domain and any subdomain; requests with other schemes (such as https or ftp) are still allowed;
- "https://*" blocks all HTTPS requests to any domain;
- "mail.example.com" blocks this domain but not "www.example.com" nor "example.com";
- ".example.com" blocks exactly "example.com", and won't block subdomains;
- "*" blocks all requests; only allowlisted URLs will be allowed;
- "*:8080" blocks all requests to port 8080;
- "example.com/stuff" blocks all requests to "example.com" and any of its subdomains whose path has "/stuff" as a prefix;
- "192.168.1.2" blocks requests to this exact IP address;
- Any request with the query "?video=100" is blocked by "*?v*", "*?video*", "*?video=*" and "*?video=100*";
- "*?a=1&b=2" blocks any request with the query "?b=2&a=1", "?a=1&b=2", "?a=1&c=3&b=2", ...;
- For a blocklist, any occurrence of the key-value pair is sufficient. For example, blocklisting "youtube.com/watch?v=xyz" blocks "youtube.com/watch?v=123&v=xyz".
- For an allowlist, every occurrence of the key must have a matching value. For example, allowlisting "youtube.com/watch?v=V2" does not allow "youtube.com/watch?v=V1&v=V2", but it does allow "youtube.com/watch?v=V2&v=V2".
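The query rules in the examples above (order-independent tokens, a trailing '*' for prefix match, and the different treatment of repeated keys for blocklists and allowlists) can be sketched as follows. The names `token_matches` and `query_matches` are hypothetical, and one behaviour is an assumption not stated in the text: a token without a trailing '*' must equal a URL token exactly.

```python
def token_matches(filter_token: str, url_token: str) -> bool:
    """Match one filter token against one URL query token (sketch)."""
    if filter_token.endswith("*"):
        # A trailing '*' indicates a prefix match.
        return url_token.startswith(filter_token[:-1])
    # Assumption: without '*', the token must match exactly.
    return url_token == filter_token


def query_matches(filter_query: str, url_query: str, allowlist: bool = False) -> bool:
    """Check every filter token against the URL's query; token order is ignored."""
    url_tokens = [t for t in url_query.lstrip("?").split("&") if t]
    for token in filter_query.lstrip("?").split("&"):
        if not token:
            continue
        # Blocklist and allowlist both need at least one matching occurrence.
        if not any(token_matches(token, u) for u in url_tokens):
            return False
        if allowlist and "=" in token:
            # Allowlist: every occurrence of the key must have a matching value.
            key = token.split("=", 1)[0]
            same_key = [u for u in url_tokens if u.split("=", 1)[0] == key]
            if not all(token_matches(token, u) for u in same_key):
                return False
    return True
```

For example, `query_matches("v=xyz", "v=123&v=xyz")` is true in blocklist mode, while `query_matches("v=V2", "v=V1&v=V2", allowlist=True)` is false because one occurrence of `v` carries a different value.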
Example: allowing only a small set of sites:
- Block "*"
- Allow selected sites: "mail.example.com", "wikipedia.org", "google.com"
Example: block all access to a domain, except to the mail server using HTTPS and to the main page:
- Block "example.com"
- Allow "https://mail.example.com"
- Allow ".example.com", and optionally ".www.example.com"
Example: block all access to YouTube, except for selected videos:
- Block "youtube.com"
- Allow "youtube.com/watch?v=V1"
- Allow "youtube.com/watch?v=V2"
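The three examples above can be written down as policy data. The sketch below simply pairs the URLBlocklist and URLAllowlist policy names with the filters from the examples and serializes them as JSON; the `EXAMPLES` dictionary, its keys, and `to_policy_json` are hypothetical, and how the resulting values are delivered to Chrome depends on the platform and management tooling.

```python
import json

# Hypothetical names for the three examples; the filter strings come from the text.
EXAMPLES = {
    "only_selected_sites": {
        "URLBlocklist": ["*"],
        "URLAllowlist": ["mail.example.com", "wikipedia.org", "google.com"],
    },
    "domain_except_mail_and_main_page": {
        "URLBlocklist": ["example.com"],
        "URLAllowlist": [
            "https://mail.example.com",
            ".example.com",
            ".www.example.com",
        ],
    },
    "youtube_selected_videos": {
        "URLBlocklist": ["youtube.com"],
        "URLAllowlist": ["youtube.com/watch?v=V1", "youtube.com/watch?v=V2"],
    },
}


def to_policy_json(name: str) -> str:
    """Serialize one example as a JSON object keyed by policy name."""
    return json.dumps(EXAMPLES[name], indent=2)
```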