Incorrect case sensitivity in the BigQuery DOMAIN function

When running a BigQuery query over data containing URLs, we noticed that the DOMAIN function does not behave the same way for an uppercase URL as it does for the same URL in lowercase.

This can be demonstrated with this simple query:

SELECT
    domain('WWW.FOO.COM.AU'),
    domain(LOWER('http://WWW.FOO.COM.AU/')),
    domain('http://WWW.FOO.COM.AU/')

The result for the uppercase URL does not seem correct, and the documentation says nothing about how case in URLs is handled.

Query result

1 answer

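DOMAIN is one of the URL functions in legacy SQL. I would recommend using standard SQL instead (uncheck "Use Legacy SQL" under "Show Options"). Adapting the regular expression from fooobar.com/questions/548930/..., you can define a domain-extraction function, for example: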

CREATE TEMPORARY FUNCTION GetDomain(url STRING) AS (
  REGEXP_EXTRACT(url, r'^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)'));

WITH T AS (
  SELECT url
  FROM UNNEST(['WWW.FOO.COM.AU:8080', 'google.com',
               'www.abc.xyz', 'http://example.com']) AS url)
SELECT
  url,
  GetDomain(url) AS domain
FROM T;

+---------------------+----------------+
|         url         |     domain     |
+---------------------+----------------+
| www.abc.xyz         | abc.xyz        |
| WWW.FOO.COM.AU:8080 | WWW.FOO.COM.AU |
| google.com          | google.com     |
| http://example.com  | example.com    |
+---------------------+----------------+
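Note that GetDomain preserves the input's case: for WWW.FOO.COM.AU:8080 it returns WWW.FOO.COM.AU with the www prefix still attached, because `(?:www\.)?` only matches lowercase. Since host names are case-insensitive, one option is to lowercase the URL before applying the same regex. The following is a minimal sketch, not part of the original answer; GetDomainCI is a hypothetical name, and the pattern is unchanged from GetDomain.

```sql
-- Sketch (assumption): identical regex to GetDomain, but the input is
-- lowercased first so 'WWW.' and 'HTTP://' prefixes match like lowercase ones.
-- GetDomainCI is a hypothetical name used for illustration.
CREATE TEMPORARY FUNCTION GetDomainCI(url STRING) AS (
  REGEXP_EXTRACT(LOWER(url), r'^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)'));

WITH T AS (
  SELECT url
  FROM UNNEST(['WWW.FOO.COM.AU:8080', 'http://WWW.FOO.COM.AU/',
               'www.abc.xyz', 'http://example.com']) AS url)
SELECT
  url,
  GetDomainCI(url) AS domain  -- always lowercase; the original casing is lost
FROM T;
```

With this variant, both WWW.FOO.COM.AU:8080 and http://WWW.FOO.COM.AU/ yield foo.com.au, matching what the lowercase input produces. The trade-off is that the returned domain is always lowercase, which is usually what you want when grouping or joining on domains.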

Source: https://habr.com/ru/post/1649341/

