Split url using SQL and add to database

I am trying to split a URL into its parts (domain, category, subcategory, etc.) and insert each part into a table. For instance:

 "www.mydomain.com/toolsanddownloads/dailymealplanner.html?languageid=6" 

The goal is to perform a 404 redirect if the page does not exist. I am trying to write a SQL statement using a CTE to get every part of the URL:

 ;with cte AS (
     SELECT CASE
                WHEN RIGHT(RTRIM(URL),1) = '/' THEN LEFT(URL,LEN(URL)-1)
                WHEN RIGHT(RTRIM(URL),5) = '.html' THEN LEFT(URL,LEN(URL)-5)
                ELSE URL
            END AS URL1,
            StartPos = CharIndex('//', URL)+2
     FROM [dbo].[404RedirectTemp]
 )
 SELECT URL1,
        SUBSTRING(URL1, 8, CHARINDEX('/', URL1, 9) - 8) AS DomainName,
        REVERSE(SUBSTRING(REVERSE(URL1), CHARINDEX('?', REVERSE(URL1)) + 1, CHARINDEX('/', REVERSE(URL1)) - CHARINDEX('?', REVERSE(URL1)) -1)) AS CategoryName,
        SUBSTRING(URL1, CHARINDEX('?', URL1) + 1, LEN(URL1)) AS QueryParameter
 FROM cte;

I always get the last segment as the category name, which is wrong, because some of the URLs look like http://www.mydomain.com/toolsanddownloads/dailymealplanner.html?languageid=6

while others look like:

 "www.mydomain.com/toolsanddownloads"
 "www.mydomain.com/toolsanddownloads/dailymealplanner.html"

What I want to achieve: regardless of how many sections a URL has, I want to get them as columns: domain, categories, subcategories, brand, product.

If a URL has only a category, I should get just the category; if it has a category and a subcategory, I should get both.

I have over 4000 URLs in a temp table. I want to go through each of them and update another table for the 404 redirects.

1 answer

How about converting the URL sections to rows and treating the row position (depth) as an array index? For instance:

First, set up a sample environment:

 create table #url (id int, url varchar(500));
 insert into #url select 1, 'http://stackoverflow.com/questions/18660573/split-url-using-sql-and-add-to-database';
 insert into #url select 2, 'www.mydomain.com/toolsanddownloads';
 insert into #url select 3, 'www.mydomain.com/toolsanddownloads?test=2&b=4';
 insert into #url select 4, 'www.mydomain.com/toolsanddownloads/dailymealplanner.html';

Clean the data a bit (done in the temp table, so that only the separators are left):

 update #url set url = replace(url, 'http://','');
 update #url set url = replace(url, '?','/^');
 update #url set url = replace(url, '&','^');
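To see what the cleanup does, here is a small check on a single literal. This is only a sketch, and the alias `cleaned` is an illustrative name. Replacing `?` with `/^` turns the query string into its own `/`-separated section marked by a leading `^`. Replacing `&` matters because a raw `&` is not valid inside XML text, so converting the string to XML afterwards would otherwise fail.

```sql
-- Apply the same three replacements to one sample URL.
select cleaned = replace(
                     replace(
                         replace('http://www.mydomain.com/toolsanddownloads?test=2&b=4',
                                 'http://', ''),   -- drop the scheme
                         '?', '/^'),               -- query string becomes its own section
                     '&', '^');                    -- '&' would break the XML conversion
-- returns: www.mydomain.com/toolsanddownloads/^test=2^b=4
```

Note that only `http://` is stripped; `https://` URLs would need one more `replace`.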

Now the fun stuff:

 with rslt as (
     -- depth = position of the section within its URL (1 = domain)
     SELECT row_number() OVER (partition by id ORDER BY (SELECT 1)) depth,
            value = y.i.value('.', 'nvarchar(4000)')
     FROM (
         -- wrap every '/'-separated section in <i>...</i> so it can be shredded as XML
         SELECT id, x = CONVERT(XML, '<i>' + REPLACE(url, '/', '</i><i>') + '</i>').query('.')
         from #url
     ) AS a
     CROSS APPLY x.nodes('i') AS y(i)
 )
 select case when value like '^%' then 'querystring'
             when depth = 1 then 'Domain'
             when depth = 2 then 'categories'
             when depth = 3 then 'subcategories'
             when depth = 4 then 'brand'
             when depth = 5 then 'product'
        end section,
        -- drop a file extension such as .html, but leave the domain untouched
        case when depth > 1 and charindex('.', value) > 0
             then left(value, charindex('.', value) - 1)
             else value
        end section
 from rslt;

The results are as follows:

 Domain         stackoverflow.com
 categories     questions
 subcategories  18660573
 brand          split-url-using-sql-and-add-to-database
 Domain         www.mydomain.com
 categories     toolsanddownloads
 Domain         www.mydomain.com
 categories     toolsanddownloads
 querystring    ^test=2^b=4
 Domain         www.mydomain.com
 categories     toolsanddownloads
 subcategories  dailymealplanner
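Since the question asks for the sections as columns rather than rows, the same rows could be pivoted with conditional aggregation. The following is a minimal sketch, not part of the original answer. The table `#url_demo` and the output column names are illustrative assumptions, at most five path sections are handled, and file extensions such as `.html` are left in place.

```sql
-- Hypothetical demo table so the sketch is self-contained.
create table #url_demo (id int, url varchar(500));
insert into #url_demo values
    (1, 'www.mydomain.com/toolsanddownloads'),
    (2, 'www.mydomain.com/toolsanddownloads/dailymealplanner.html?languageid=6');

-- Same cleanup as above, in a single statement.
update #url_demo
set url = replace(replace(replace(url, 'http://', ''), '?', '/^'), '&', '^');

with parts as (
    -- One row per '/'-separated section; depth is its position in the URL.
    -- ORDER BY (SELECT 1) relies on nodes() returning document order,
    -- which is the usual behaviour but is not formally guaranteed.
    select a.id,
           depth = row_number() over (partition by a.id order by (select 1)),
           value = y.i.value('.', 'nvarchar(4000)')
    from (
        select id,
               x = convert(xml, '<i>' + replace(url, '/', '</i><i>') + '</i>')
        from #url_demo
    ) as a
    cross apply a.x.nodes('i') as y(i)
)
-- Collapse the rows back to one row per URL, one column per section.
-- Sections starting with '^' are query strings, so they are kept out
-- of the path columns and reported separately.
select id,
       domain        = max(case when depth = 1 then value end),
       categories    = max(case when depth = 2 and value not like '^%' then value end),
       subcategories = max(case when depth = 3 and value not like '^%' then value end),
       brand         = max(case when depth = 4 and value not like '^%' then value end),
       product       = max(case when depth = 5 and value not like '^%' then value end),
       querystring   = max(case when value like '^%' then value end)
from parts
group by id;

drop table #url_demo;
```

Sections that a URL does not have come back as NULL, so a URL with only a category fills just `domain` and `categories`. A result in this shape could then feed an `insert` or `update` against the redirect table.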

Source: https://habr.com/ru/post/1500948/

