I originally asked this question: Regular expression in gVim to remove duplicate domains from a list
However, I understand that I am more likely to find a working solution if I broaden my scope in terms of which kinds of solution I am willing to accept.
So I am rephrasing my question in the hope of getting a better solution. Here goes:
I have a large list of URLs in a TXT file (I use Windows Vista 32-bit), and I need to remove duplicate DOMAINS (and every URL belonging to each duplicate), keeping only the first occurrence of each domain. This particular file contains approximately 6,000,000 URLs in the following format (the real URLs obviously have no spaces in them; I only had to do this because I don't have enough posts here to publish a lot of "live" URLs):
http://www.exampleurl.com/something.php
http://exampleurl.com/somethingelse.htm
http://exampleurl2.com/another-url
http://www.exampleurl2.com/a-url.htm
http://exampleurl2.com/yet-another-url.html
http://exampleurl.com/
http://www.exampleurl3.com/here_is_a_url
http://www.exampleurl5.com/something
After processing, the list should look like this:
http://www.exampleurl.com/something.php
http://exampleurl2.com/another-url
http://www.exampleurl3.com/here_is_a_url
http://www.exampleurl5.com/something
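For illustration, the rule described above (keep the first URL seen for each domain, drop every later one) can be sketched in Python. This is only a minimal sketch under assumptions that the example lists suggest but do not state outright: a leading `www.` is ignored when comparing domains, host names are compared case-insensitively, and the order of the input is preserved. The names `domain_key`, `dedupe_by_domain` and `dedupe_file` are made up for this example.

```python
from urllib.parse import urlsplit


def domain_key(url):
    """Return the host of `url`, lower-cased and without a leading 'www.'."""
    host = urlsplit(url.strip()).hostname or ""
    # Assumption: www.example.com and example.com count as the same domain.
    return host[4:] if host.startswith("www.") else host


def dedupe_by_domain(lines):
    """Yield only the first URL seen for each domain, in input order."""
    seen = set()
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        key = domain_key(url)
        if key not in seen:
            seen.add(key)
            yield url


def dedupe_file(src_path, dst_path):
    """Stream src_path line by line and write the kept URLs to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for url in dedupe_by_domain(src):
            dst.write(url + "\n")


sample = [
    "http://www.exampleurl.com/something.php",
    "http://exampleurl.com/somethingelse.htm",
    "http://exampleurl2.com/another-url",
    "http://www.exampleurl2.com/a-url.htm",
    "http://exampleurl2.com/yet-another-url.html",
    "http://exampleurl.com/",
    "http://www.exampleurl3.com/here_is_a_url",
    "http://www.exampleurl5.com/something",
]
result = list(dedupe_by_domain(sample))
for url in result:
    print(url)
```

Run on the eight sample lines, this prints the four URLs shown in the desired output above. Only the set of domains already seen is held in memory, not the URLs themselves, so `dedupe_file` reads the input one line at a time; whether that is fast enough for 6,000,000 lines on a given machine is something I have not measured.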
, , .
- , - , , .
, , -, Windows, , , Windows, " ", ( - ).