What is the best and fastest way to compare two URLs?

I have two tables with a list of URLs obtained from different sources.

I want to find shared records and put them in a separate table.

This is what I do:

  • Compute the MD5 hash of each URL.
  • Save the hashes in a column.
  • Load one table into an array, loop through it, and insert the rows from the other table whose MD5 hash matches.

EDIT: Should I strip "http://" and "www" from the URLs?

Is there a better and faster way to do this with the setup described above?

I am using PHP + MySQL.

+3
3 answers

MD5 is relatively slow for this purpose. Take a look at a faster non-cryptographic hash such as MurmurHash.
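A minimal sketch of how the hash could be swapped in PHP. The helper name `url_hash` is hypothetical, and the sketch assumes the `murmur3f` algorithm is only offered by newer PHP builds (8.1 and later, as far as I know), so it falls back to MD5 when the algorithm is missing:

```php
<?php
// Hypothetical helper: use MurmurHash3 when this PHP build offers it,
// otherwise fall back to MD5.
function url_hash(string $url): string
{
    // hash_algos() lists every algorithm the hash extension supports.
    $algo = in_array('murmur3f', hash_algos(), true) ? 'murmur3f' : 'md5';
    return hash($algo, $url);
}

$a = url_hash('http://example.com/page');
$b = url_hash('http://example.com/page');
$c = url_hash('http://example.com/other');

var_dump($a === $b); // identical input always gives an identical hash
var_dump($a === $c); // different input should give a different hash
```

Whichever algorithm is chosen, both tables have to be hashed with the same one, otherwise the hashes will never match.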

+3

You can do this in a single query:

INSERT INTO table3 SELECT table1.url FROM table1 INNER JOIN table2 ON table1.hash = table2.hash

This SQL query selects the URLs that appear in both table1 and table2 and inserts them into table3.

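EDIT: If the same page can be written as different URLs (for example, with different GET parameters), decide what counts as "the same" before comparing table1 and table2. Strip "http" and "www" if you want "https://somesite" and "http://somesite", or "www.somesite.com" and "somesite.com", to be treated as equal.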
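One way to make such variants compare equal is to normalize each URL before hashing it. The sketch below is only an illustration: the function name `normalize_url` and the exact rules (drop the scheme, drop a leading "www.", lowercase the host, ignore a bare trailing slash) are assumptions, not something the question specifies.

```php
<?php
// Hypothetical normalizer: makes http/https and www/non-www variants equal.
function normalize_url(string $url): string
{
    $url = trim($url);
    // Drop the scheme ("http://" or "https://"), case-insensitively.
    $url = preg_replace('#^https?://#i', '', $url);
    // Drop a leading "www.".
    $url = preg_replace('#^www\.#i', '', $url);
    // The host ends at the first "/", "?" or "#"; only the host is lowercased,
    // because paths and query strings can be case-sensitive.
    $hostLen = strcspn($url, '/?#');
    $host = strtolower(substr($url, 0, $hostLen));
    $rest = substr($url, $hostLen);
    // Treat a bare trailing slash as equivalent to no path at all.
    if ($rest === '/') {
        $rest = '';
    }
    return $host . $rest;
}

// Both variants collapse to the same string, so their hashes match too.
var_dump(normalize_url('https://www.SomeSite.com/') === normalize_url('http://somesite.com'));
```

The value stored in the hash column would then be the hash of `normalize_url($url)` rather than of the raw URL, applied identically to both tables.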

0
SELECT * FROM table1 WHERE hash IN (SELECT hash FROM table2)

You might also want to take a look at the concept of joining tables.
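As a rough illustration of the join approach, the sketch below fills table3 with one set-based statement instead of a PHP loop. It uses an in-memory SQLite database through PDO purely as a stand-in so that it runs without a server (this needs the pdo_sqlite extension); with MySQL only the DSN and credentials would change. The table and column names follow the answers above, and the sample URLs are made up.

```php
<?php
// Stand-in database: in-memory SQLite via PDO (an assumption for this sketch).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE table1 (url TEXT, hash TEXT)');
$pdo->exec('CREATE TABLE table2 (url TEXT, hash TEXT)');
$pdo->exec('CREATE TABLE table3 (url TEXT)');

// Made-up sample data: only b.example appears in both tables.
$insert1 = $pdo->prepare('INSERT INTO table1 (url, hash) VALUES (?, ?)');
$insert2 = $pdo->prepare('INSERT INTO table2 (url, hash) VALUES (?, ?)');
foreach (['http://a.example/', 'http://b.example/'] as $u) {
    $insert1->execute([$u, md5($u)]);
}
foreach (['http://b.example/', 'http://c.example/'] as $u) {
    $insert2->execute([$u, md5($u)]);
}

// The database matches the rows on hash and copies the shared URLs.
$pdo->exec(
    'INSERT INTO table3 (url)
     SELECT t1.url FROM table1 AS t1
     INNER JOIN table2 AS t2 ON t1.hash = t2.hash'
);

$shared = $pdo->query('SELECT url FROM table3')->fetchAll(PDO::FETCH_COLUMN);
print_r($shared);
```

On large tables, an index on the hash column in both tables generally lets the database match rows without scanning every pair.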

0

Source: https://habr.com/ru/post/1736130/

