Indexing URLs in SQL Server 2005

Question

What is the best way to store and index URLs in SQL Server 2005?

I have a WebPage table that stores metadata and content about web pages. I also have many other tables related to the WebPage table. They all use the URL as a key.

The problem is that URLs can be very large, and using them as a key makes the indexes larger and slower. How much slower I don't know, but I have read many times that indexing on large fields should be avoided. Assuming a URL is nvarchar(400), these are enormous fields to use as a primary key.

What are the alternatives?

How much pain is there likely to be in using the URL as a key instead of a smaller field?

I have looked at giving the WebPage table an identity column and using that as the primary key. This keeps all the related indexes smaller and more efficient, but it makes importing data a bit of a pain: each import into a related table must first look up the ID for a URL before inserting its rows.

I have also experimented with hashing the URL to create a smaller index, but I'm still not sure it is the best approach. It would not be a unique index and would be subject to a small number of collisions, so I'm unsure what foreign key would be used in that case...
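
For illustration, the hash idea could look something like the sketch below. It assumes a simplified WebPage table (the real one has more columns) and uses CHECKSUM as the hash; the column name UrlHash and the index name are hypothetical. The hash only narrows the search, so the lookup still compares the full URL to rule out collisions, and related tables would reference the integer Id rather than the hash.

```sql
-- Simplified, hypothetical version of the WebPage table.
CREATE TABLE dbo.WebPage (
    Id      int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Url     nvarchar(400) NOT NULL,
    UrlHash AS CHECKSUM(Url)  -- 4-byte computed hash of the URL
);

-- Small, non-unique index on the hash instead of the full URL.
CREATE INDEX IX_WebPage_UrlHash ON dbo.WebPage (UrlHash);

-- Lookup: seek on the hash, then compare the URL itself
-- so that colliding hashes cannot return the wrong row.
DECLARE @Url nvarchar(400);
SET @Url = N'http://www.stackoverflow.com/';

SELECT Id
FROM dbo.WebPage
WHERE UrlHash = CHECKSUM(@Url)
  AND Url = @Url;
```

Note that CHECKSUM follows the column's collation, so under a case-insensitive collation URLs differing only in case hash to the same value.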

There will be millions of web page records stored in the database, and there will be a lot of batch updating. There will also be quite a lot of reading and aggregating of the data.

Any thoughts?

+3

performance sql-server

Andrew Rimmer Oct 5 '08 at 15:54

6

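Break the URL into its components, using the RFC as a guide, and store the host name with its labels reversed (as Google does):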

stackoverflow.com      -> com.stackoverflow  
blog.stackoverflow.com -> com.stackoverflow.blog
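
A minimal sketch of how the host reversal shown above might be done in T-SQL on SQL Server 2005. The function name dbo.ReverseHost and the varchar(255) length are assumptions for illustration, not something from the original answer; it expects a bare host name without scheme, port, or path.

```sql
-- Hypothetical helper: reverses the dot-separated labels of a host name.
CREATE FUNCTION dbo.ReverseHost (@Host varchar(255))
RETURNS varchar(255)
AS
BEGIN
    DECLARE @Result varchar(255), @Pos int;
    SET @Result = '';
    SET @Pos = CHARINDEX('.', @Host);

    WHILE @Pos > 0
    BEGIN
        -- Move the leading label to the front of the accumulated tail.
        SET @Result = '.' + LEFT(@Host, @Pos - 1) + @Result;
        SET @Host = SUBSTRING(@Host, @Pos + 1, 255);
        SET @Pos = CHARINDEX('.', @Host);
    END

    -- What remains in @Host is the last label (for example the TLD).
    RETURN @Host + @Result;
END
```

Called as `SELECT dbo.ReverseHost('blog.stackoverflow.com')`, this yields com.stackoverflow.blog, matching the mapping above.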

Google , , , .

http://en.wikipedia.org/wiki/Uniform_Resource_Locator

+2

jason saldo Oct 5 '08 at 18:15

-. .

GUID .

+1

David Robbins Oct 5 '08 at 16:00

"Assuming a URL is nvarchar(400)"

URLs do not need to be nvarchar; varchar is sufficient.

+1

Eyvind Oct 7 '08 at 9:22

. IDENTITY GUID WebPage. . id , .

varchar .

0

Jan Oct 7 '08 at 8:52

. .

URI , , URI. , ( , ). URI, , - www.somedomain.com/p.aspx?id=123456789, URI , , .

, URI "" , URI "", , "", URI .

0

One monkey Oct 7 '08 at 9:14


Dylan Beattie · Accepted Answer · 2008-10-05T16:17:15+0000

. :

. , URL- .

, , , , , , .

In SQL Server 2005 you could create a function GetUrlId, something like:

CREATE FUNCTION dbo.GetUrlId (@Url nvarchar(400))
RETURNS int
AS
BEGIN
  -- @UrlId stays NULL when the URL is not in the Url table.
  DECLARE @UrlId int;
  SELECT @UrlId = Id FROM Url WHERE Url = @Url;
  RETURN @UrlId;
END

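This returns the ID for URLs already in the Url table and NULL for any URL that has not been recorded yet. You can then call it inline during an import, something like: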

INSERT INTO 
  UrlHistory(UrlId, Visited, RemoteIp) 
VALUES 
  (dbo.GetUrlId('http://www.stackoverflow.com/'), @Visited, @RemoteIp)
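
For larger batches, the same lookup can be written as a set-based join instead of a per-row function call. This is a different technique from the function above, sketched under assumptions: the staging table dbo.UrlHistoryImport and the column types are hypothetical, Url.Id is assumed to be an IDENTITY column, and both Url and UrlHistory are assumed to live in the dbo schema.

```sql
-- Hypothetical staging table holding one imported batch.
CREATE TABLE dbo.UrlHistoryImport (
    Url      nvarchar(400) NOT NULL,
    Visited  datetime      NOT NULL,
    RemoteIp varchar(15)   NOT NULL
);

-- Step 1: register URLs that are not in the Url table yet
-- (assumes Url.Id is generated by IDENTITY).
INSERT INTO dbo.Url (Url)
SELECT DISTINCT i.Url
FROM dbo.UrlHistoryImport AS i
WHERE NOT EXISTS (SELECT 1 FROM dbo.Url AS u WHERE u.Url = i.Url);

-- Step 2: resolve every URL to its Id with a single join.
INSERT INTO dbo.UrlHistory (UrlId, Visited, RemoteIp)
SELECT u.Id, i.Visited, i.RemoteIp
FROM dbo.UrlHistoryImport AS i
JOIN dbo.Url AS u ON u.Url = i.Url;
```

Because every URL is inserted in step 1 before the join runs, step 2 never produces a NULL UrlId.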

, , , , .
