I need a regular expression for use in gVim that removes duplicate domains from a list of URLs. (gVim can be downloaded here: http://www.vim.org/download.php)
I have a list of over 6,000,000 URLs in a .txt file (which opens in gVim for editing).
URLs are in this format:
http://www.example.com/some-url.php
http://example2.com/another_url.html
http://example3.com/
http://www.example4.com/anotherURL.htm
http://www.example.com/some-url2.htm
http://example.com/some-url3.html
http://www.example2.com/somethingelse.php
http://example5.com
In other words, there is no specific format for the URLs: some have www, some do not, and the paths all differ.
The regular expression should remove every URL whose domain has already appeared earlier in the list, leaving only the first instance found for each domain. As the example shows, www.example.com and example.com count as the same domain.
For the example list above, the result should be:
http://www.example.com/some-url.php
http://example2.com/another_url.html
http://example3.com/
http://www.example4.com/anotherURL.htm
http://example5.com
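To make the expected behavior unambiguous, here is a rough sketch of the same rule in Python. It only illustrates the result I am after and is not the gVim regular expression I am asking for. The function names `domain_key` and `dedupe_by_domain` are made up for this illustration, and treating a leading `www.` as insignificant is an assumption taken from the example above.

```python
from urllib.parse import urlsplit


def domain_key(url):
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = urlsplit(url.strip()).hostname or ""
    if host.startswith("www."):
        host = host[4:]  # assumption: www.example.com == example.com
    return host


def dedupe_by_domain(urls):
    """Keep only the first URL seen for each domain, preserving order."""
    seen = set()
    kept = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines
        key = domain_key(url)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept


if __name__ == "__main__":
    sample = [
        "http://www.example.com/some-url.php",
        "http://example2.com/another_url.html",
        "http://example3.com/",
        "http://www.example4.com/anotherURL.htm",
        "http://www.example.com/some-url2.htm",
        "http://example.com/some-url3.html",
        "http://www.example2.com/somethingelse.php",
        "http://example5.com",
    ]
    for line in dedupe_by_domain(sample):
        print(line)
```

Run on the sample list, this prints the five URLs shown above, in the same order.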
Some references on regular expressions in gVim:
http://supportweb.cs.bham.ac.uk/documentation/tutorials/docsystem/build/tutorials/gvim/gvim.html#Vi-Regular-Expressions
http://www.softpanorama.org/Editors/Vimorama/vim_regular_expressions.shtml