How to create a faceted search using SQL Server

I have an application that queries SQL Server and returns data filtered by selections made in the application, like any typical faceted search. I have looked at some of the off-the-shelf solutions, but they are expensive, and I would prefer to build something custom; I just don't know where to start.

The database structure looks like this: (schema diagram not reproduced here)

Rows in the PRODUCT table are linked to tags in the TAG table. The values in the TAG table will be something like this:

    ID  NAME
    --  -----
    1   Blue
    2   Green
    3   Small
    4   Large
    5   Red

They will be associated with products through the ProductTag table.

I need to return two sets of data from this setup:

  • Products related to selected tags only, whether single or multiple
  • The remaining tags that are still available for the products already narrowed down by one or more selected tags.

I would like to do it all in SQL Server, if possible as two separate stored procedures.

Most websites have this feature built in these days, e.g. http://www.gnc.com/family/index.jsp?categoryId=2108294&cp=3593186.3593187 (they call it "Narrow").

I have spent some time looking into how to do this, and I assume that if the stored procedures are built this way, each needs one parameter that takes CSV values, for example:

    EXEC [dbo].[GetFacetedProducts] @Tags_Selected = '1,3,5'
    EXEC [dbo].[GetFacetedTags] @Tags_Selected = '1,3,5'

So, with this architecture, does anyone know what kinds of queries need to be written for these stored procedures, or see any flaw in the architecture? Has anyone built a faceted search like this before? If so, what kinds of queries are needed? I guess I'm just having trouble wrapping my head around it, and there isn't much out there that shows how to do something like this.

+4
3 answers

There are examples elsewhere of how to turn a CSV parameter into a table variable. Assuming you have that part done, your queries come down to the following:
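
For completeness, here is a minimal sketch of that CSV-to-table-variable step. It assumes SQL Server 2016 or later (database compatibility level 130 or higher), where the built-in STRING_SPLIT function is available; on older versions you would need a hand-written split function or a table-valued parameter instead. The variable names simply mirror the ones used in this thread.

```sql
-- Sketch only: assumes SQL Server 2016+ for STRING_SPLIT.
DECLARE @Tags_Selected varchar(max) = '1,3,5';
DECLARE @selectedTags TABLE (ID int PRIMARY KEY);

-- Split on commas, skip empty items, and keep only values that parse as int.
-- DISTINCT guards against the same tag being passed twice.
INSERT INTO @selectedTags (ID)
SELECT DISTINCT TRY_CAST(value AS int)
FROM STRING_SPLIT(@Tags_Selected, ',')
WHERE LTRIM(RTRIM(value)) <> ''
  AND TRY_CAST(value AS int) IS NOT NULL;

SELECT ID FROM @selectedTags;
```

The empty-item check matters because casting an empty string to int yields 0 rather than NULL, which would otherwise add a bogus tag ID of 0 to the list.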

GetFacetedProducts: Find the products to which all of the passed-in tags are assigned.

If you wrote this by hand, you would end up with something like:

    SELECT P.*
    FROM Product P
    INNER JOIN ProductTag PT1 ON PT1.ProductID = P.ID AND PT1.TagID = 1
    INNER JOIN ProductTag PT2 ON PT2.ProductID = P.ID AND PT2.TagID = 3
    INNER JOIN ProductTag PT3 ON PT3.ProductID = P.ID AND PT3.TagID = 5

While this selects only the products that have all of those tags, it will not work with a dynamic list. In the past, some people would build the SQL as a string and execute it dynamically; do not do this.

Instead, assume that the same tag cannot be applied to a product twice, so we can rephrase the question: find the products where the number of matching tags (from the dynamic list) equals the number of tags in the dynamic list.

    DECLARE @selectedTags TABLE (ID int)
    DECLARE @tagCount int

    -- Stand-in for the parsed CSV parameter
    INSERT INTO @selectedTags VALUES (1)
    INSERT INTO @selectedTags VALUES (3)
    INSERT INTO @selectedTags VALUES (5)

    SELECT @tagCount = COUNT(*) FROM @selectedTags

    -- Products whose number of matching tags equals the number of selected tags
    SELECT P.ID
    FROM Product P
    JOIN ProductTag PT ON PT.ProductID = P.ID
    JOIN @selectedTags T ON T.ID = PT.TagID
    GROUP BY P.ID, P.Name
    HAVING COUNT(PT.TagID) = @tagCount

This returns only the IDs of the products that match all of your tags. You can join the result back to the Product table if you want more than just the ID; otherwise you are done.
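
Putting the pieces together, the first stored procedure could look roughly like the sketch below. It is one possible arrangement, not the only one: it assumes SQL Server 2016+ for STRING_SPLIT, the table and column names used in the queries above (Product.ID, ProductTag.ProductID, ProductTag.TagID), and that a (ProductID, TagID) pair appears at most once in ProductTag.

```sql
-- Sketch only: assumes SQL Server 2016+ and unique (ProductID, TagID) pairs.
CREATE PROCEDURE dbo.GetFacetedProducts
    @Tags_Selected varchar(max)
AS
BEGIN
    SET NOCOUNT ON;

    -- Parse the CSV parameter into a table variable.
    DECLARE @selectedTags TABLE (ID int PRIMARY KEY);
    INSERT INTO @selectedTags (ID)
    SELECT DISTINCT TRY_CAST(value AS int)
    FROM STRING_SPLIT(@Tags_Selected, ',')
    WHERE LTRIM(RTRIM(value)) <> ''
      AND TRY_CAST(value AS int) IS NOT NULL;

    DECLARE @tagCount int = (SELECT COUNT(*) FROM @selectedTags);

    -- Full product rows for every product that carries all selected tags.
    SELECT P.*
    FROM Product P
    WHERE P.ID IN (
              SELECT PT.ProductID
              FROM ProductTag PT
              JOIN @selectedTags T ON T.ID = PT.TagID
              GROUP BY PT.ProductID
              HAVING COUNT(*) = @tagCount
          );
END;
GO

EXEC dbo.GetFacetedProducts @Tags_Selected = '1,3,5';
```

Note that with an empty tag list this version returns no rows; if "no tags selected" should mean "all products", that case needs its own branch.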

As for the second query: once you have the matching product IDs, you need a list of all tags on those products that are not already in your selected list:

    -- Uses @selectedTags and @tagCount from the previous query
    SELECT DISTINCT PT2.TagID
    FROM ProductTag PT2
    WHERE PT2.ProductID IN (
            SELECT P.ID
            FROM Product P
            JOIN ProductTag PT ON PT.ProductID = P.ID
            JOIN @selectedTags T ON T.ID = PT.TagID
            GROUP BY P.ID, P.Name
            HAVING COUNT(PT.TagID) = @tagCount
          )
      AND PT2.TagID NOT IN (SELECT ID FROM @selectedTags)
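
Wrapped up as the second stored procedure, this could be sketched as follows. Returning the tag name and a per-tag product count goes slightly beyond the query above (which returns only tag IDs) and is my own addition for illustration; the same assumptions apply as before: SQL Server 2016+ for STRING_SPLIT, a Tag table with ID and Name columns, and unique (ProductID, TagID) pairs in ProductTag.

```sql
-- Sketch only: assumes SQL Server 2016+ and unique (ProductID, TagID) pairs.
CREATE PROCEDURE dbo.GetFacetedTags
    @Tags_Selected varchar(max)
AS
BEGIN
    SET NOCOUNT ON;

    -- Parse the CSV parameter into a table variable.
    DECLARE @selectedTags TABLE (ID int PRIMARY KEY);
    INSERT INTO @selectedTags (ID)
    SELECT DISTINCT TRY_CAST(value AS int)
    FROM STRING_SPLIT(@Tags_Selected, ',')
    WHERE LTRIM(RTRIM(value)) <> ''
      AND TRY_CAST(value AS int) IS NOT NULL;

    DECLARE @tagCount int = (SELECT COUNT(*) FROM @selectedTags);

    -- Remaining tags on the matching products, with how many of
    -- those products carry each tag.
    SELECT T.ID, T.Name, COUNT(*) AS ProductCount
    FROM ProductTag PT
    JOIN Tag T ON T.ID = PT.TagID
    WHERE PT.ProductID IN (
              SELECT PT1.ProductID
              FROM ProductTag PT1
              JOIN @selectedTags S ON S.ID = PT1.TagID
              GROUP BY PT1.ProductID
              HAVING COUNT(*) = @tagCount
          )
      AND PT.TagID NOT IN (SELECT ID FROM @selectedTags)
    GROUP BY T.ID, T.Name
    ORDER BY T.Name;
END;
GO

EXEC dbo.GetFacetedTags @Tags_Selected = '1,3,5';
```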
+2

An RDBMS is the wrong tool for faceted search. Faceted search is a multidimensional search that is difficult to express in set-based SQL. Using a data cube or something similar may give you some of the desired functionality, but it will take quite a bit of work to build.

When we faced similar requirements, we eventually decided to use the Apache Solr search engine, which supports faceting as well as many other search-oriented features.

+2

You can do faceted search in SQL Server. However, do not try to use your live product data tables. Instead, create a denormalized fact table that has one row per product and one column per tag, so that each intersection holds the product's value for that tag. You can repopulate it periodically from your main product tables.

With that in place, it is easy and relatively efficient to get facet counts over the matching records for each tag the user checks.
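
To make the idea concrete, here is a rough sketch of such a fact table for the five example tags from the question. The table name dbo.ProductFacet and its column names are hypothetical; the tag IDs (1 Blue, 2 Green, 3 Small, 4 Large, 5 Red) come from the question. With a real tag set you would generate the columns rather than hard-code them.

```sql
-- Sketch only: dbo.ProductFacet and its columns are hypothetical names.
CREATE TABLE dbo.ProductFacet (
    ProductID int NOT NULL PRIMARY KEY,
    Blue  bit NOT NULL,
    Green bit NOT NULL,
    Small bit NOT NULL,
    Large bit NOT NULL,
    Red   bit NOT NULL
);

-- Periodic refill: one row per product, one flag per tag.
TRUNCATE TABLE dbo.ProductFacet;
INSERT INTO dbo.ProductFacet (ProductID, Blue, Green, Small, Large, Red)
SELECT P.ID,
       MAX(CASE WHEN PT.TagID = 1 THEN 1 ELSE 0 END),
       MAX(CASE WHEN PT.TagID = 2 THEN 1 ELSE 0 END),
       MAX(CASE WHEN PT.TagID = 3 THEN 1 ELSE 0 END),
       MAX(CASE WHEN PT.TagID = 4 THEN 1 ELSE 0 END),
       MAX(CASE WHEN PT.TagID = 5 THEN 1 ELSE 0 END)
FROM Product P
LEFT JOIN ProductTag PT ON PT.ProductID = P.ID
GROUP BY P.ID;

-- Facet counts when the user has checked Blue and Small:
-- total matches plus how many of them carry each remaining tag.
SELECT COUNT(*)                AS Matches,
       SUM(CAST(Green AS int)) AS Green,
       SUM(CAST(Large AS int)) AS Large,
       SUM(CAST(Red   AS int)) AS Red
FROM dbo.ProductFacet
WHERE Blue = 1 AND Small = 1;
```

The CAST is needed because SUM does not accept bit columns directly.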

The approach I described works well for small cases, for example 1,000 product rows and 50-100 tags (attributes). There is also an interesting feature in the upcoming SQL Server 2014, which can hold tables in memory; this should allow much larger fact tables.

I have also used Solr, and as STW points out, it is the "right" tool for faceted search. It is an order of magnitude faster than SQL Server.

However, there are some serious drawbacks to using Solr. The main problem is that you need to set up not only another platform (Solr) but everything that comes with it: Java and some kind of Java servlet container (of which there are several). Although Solr runs pretty well on Windows, you will soon find yourself immersed in a world of command lines, hand-edited configuration files and environment variables that will remind you of everything that was wonderful about the 1980s ... or maybe not. And once all of that works, you need to export your product data to it using one of various methods; there is a SQL Server connector that works quite well, but many people prefer to post the data as XML. Then you need to build some kind of web-service process in your application to send it the user's query, parse the resulting list of matches and counts, and return it to your application (again, XML is probably the best method).

So, if your dataset is relatively small, I would stick with SQL Server. You can still get sub-second response times, and SQL Server 2014 will hopefully allow much larger datasets. If your dataset is large, then Solr will give very fast results (it really is very fast), but be prepared to invest heavily in learning and supporting an entirely new platform.

0

Source: https://habr.com/ru/post/1498100/

