I would like to use PCRE to take a list of URIs and filter it so that only the first URI from each domain is kept.
Start
http://abcd.tld/products/widget1
http://abcd.tld/products/widget2
http://abcd.tld/products/review
http://1234.tld/
Done
http://abcd.tld/products/widget1
http://1234.tld/
Any ideas, dear StackOverflow members?
You can use simple tools like uniq. See kobi's example in the comments:
grep -o "^[^/]*//[^/]*/" urls.txt | sort | uniq
While it is INSANELY inefficient, this can be done ...
(?<!^http://\2/.*?$.*)^(http://(.*?)/.*?$)
Please do not use this.
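If you want something PCRE-flavored that actually works, a Perl one-liner that remembers the hosts it has already seen is far more practical. A minimal sketch (the %seen hash name and the urls.txt filename are my own); on the sample input it prints exactly the "Done" list above:

perl -ne 'print if m{^(\w+://[^/]+)} and !$seen{$1}++' urls.txt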
Don't slice URIs up with a regex; parse them with a real URL library instead. In Ruby:

require 'uri'

# links is assumed to be an Array of URL strings
unique_links = {}
links.each do |l|
  u = URI.parse(l)
  unique_links[u.host] ||= l # ||= keeps the first link seen for each host
end
unique_links.values # returns an Array of the unique links
If you can work with the whole file as one string, rather than line by line, then something like this should work (I'm not sure about the character ranges):
s!(\w+://[a-zA-Z0-9.]+/\S+/)([^ /]+)\n(\1[^ /]+\n)+!\1\2!
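For example, you could apply that substitution to the slurped file with Perl (-0777 reads the whole file as one string); this is a sketch where I've written the replacement as $1$2 and added a trailing newline so the kept line stays terminated:

perl -0777 -pe 's!(\w+://[a-zA-Z0-9.]+/\S+/)([^ /]+)\n(\1[^ /]+\n)+!$1$2\n!g' urls.txt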
If you have (g)awk on your system:

awk -F"/" '{
  # rebuild the URL minus its last path component
  s = $1
  for (i = 2; i < NF; i++) {
    s = s "/" $i
  }
  # keep only the first URL seen for each prefix
  if (!(s in a)) {
    a[s] = $NF
  }
}
END {
  for (i in a) print i "/" a[i]
}' file
Output

$ ./shell.sh
http://abcd.tld/products/widget1
http://1234.tld/
Source: https://habr.com/ru/post/1771040/