Structuring large SQL rowsets and using them in .NET

Take a look at this pseudo-schema (note that it is a simplification, so please try not to comment too much on the "appropriateness" of the schema itself). Assume the FKs are indexed.

    TABLE Lookup (
        Lookup_ID int not null PK
        Name nvarchar(255) not null
    )

    TABLE Document (
        Document_ID int not null PK
        Previous_ID null FK REFERENCES Document(Document_ID)
    )

    TABLE Document_Lookup (
        Document_ID int not null FK REFERENCES Document(Document_ID)
        Lookup_ID int not null FK REFERENCES Lookup(Lookup_ID)
    )

Volumes: Document, 4 million rows, of which 90% have a null Previous_ID; Lookup, 6,000 rows; an average of 20 lookups attached to each document, giving Document_Lookup 80 million rows.

Now, in the .NET service there is a structure that represents a Lookup row, as follows:

    struct Lookup
    {
        public int ID;
        public string Name;
        public List<int> DocumentIDs;
    }

The Lookup rows are stored in a Dictionary<int, Lookup>, where the key is the lookup ID. The important point is that this dictionary should contain only entries that at least one document refers to, i.e. the DocumentIDs list must have Count > 0.

My task is effectively to fill this dictionary. So the simple approach is:

    SELECT dl.Lookup_ID, l.Name, dl.Document_ID
    FROM Document_Lookup dl
    INNER JOIN Lookup l ON l.Lookup_ID = dl.Lookup_ID
    INNER JOIN Document d ON d.Document_ID = dl.Document_ID
    WHERE d.Previous_ID IS NULL
    ORDER BY dl.Lookup_ID, dl.Document_ID

The result could then be used to fill the dictionary quite efficiently.
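A minimal sketch of that fill loop, assuming the rows arrive ordered by Lookup_ID as in the query above. `LookupFiller` and `Fill` are hypothetical names, and the rows are passed in as tuples so the sketch runs without a database; in the service they would presumably be read from a data reader instead.

```csharp
using System.Collections.Generic;

public struct Lookup
{
    public int ID;
    public string Name;
    public List<int> DocumentIDs;
}

// Hypothetical helper, not part of the original service.
public static class LookupFiller
{
    // rows: (Lookup_ID, Name, Document_ID), assumed sorted by Lookup_ID.
    public static Dictionary<int, Lookup> Fill(
        IEnumerable<(int LookupId, string Name, int DocumentId)> rows)
    {
        var result = new Dictionary<int, Lookup>();
        List<int> currentDocs = null;
        int currentId = 0;

        foreach (var row in rows)
        {
            // A change of Lookup_ID starts a new entry. Because the rows are
            // sorted, the rows that follow need no dictionary lookup.
            if (currentDocs == null || row.LookupId != currentId)
            {
                currentId = row.LookupId;
                currentDocs = new List<int>();
                result.Add(currentId, new Lookup
                {
                    ID = currentId,
                    Name = row.Name,
                    DocumentIDs = currentDocs
                });
            }
            currentDocs.Add(row.DocumentId);
        }
        return result;
    }
}
```

Every entry is created only when a row for it is seen, so each DocumentIDs list has Count > 0 by construction. If the input is not sorted and a Lookup_ID reappears later, `Dictionary.Add` throws, which surfaces the broken assumption early.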

Question: Does the underlying rowset transfer (TDS?) perform any optimization? It seems to me that queries which de-normalize data are very common, so the chance that a field value does not change from one row to the next is high, and it would make sense to optimize the stream by not sending field values that have not changed. Does anyone know whether such an optimization exists? (It appears that it does not.)

What more sophisticated query could be used to eliminate the duplication (I am thinking specifically of the repeated Name value)? I have heard of something called a "nested rowset"; can one of those be produced? Would it be more performant? How would I access it in .NET?

I could run two queries: one to populate the Lookup dictionary, and then a second to populate the lists. I would then add code to knock out the unused Lookups. However, imagine that my predictions are wrong and Lookup ends up with 1 million rows, only a quarter of which are referenced by any document?

+6
2 answers
  • As long as the names are relatively short in practice, optimization may not be necessary.

  • The simplest optimization is to split it into two queries: one to get the names, and the other to get the Document_ID list (possibly in a different order, if that makes it easier to populate your data structures).

Example:

    /* First get the name of each Lookup */
    SELECT DISTINCT dl.Lookup_ID, l.Name
    FROM Document_Lookup dl
    INNER JOIN Lookup l ON l.Lookup_ID = dl.Lookup_ID
    INNER JOIN Document d ON d.Document_ID = dl.Document_ID
    WHERE d.Previous_ID IS NULL
    ORDER BY dl.Lookup_ID

    /* Now get the list of Document_IDs for each */
    SELECT dl.Lookup_ID, dl.Document_ID
    FROM Document_Lookup dl
    INNER JOIN Lookup l ON l.Lookup_ID = dl.Lookup_ID
    INNER JOIN Document d ON d.Document_ID = dl.Document_ID
    WHERE d.Previous_ID IS NULL
    ORDER BY dl.Lookup_ID, dl.Document_ID

  • There are also various tricks you could use to massage these into a single table, but I suspect they are not worth it.

  • The hierarchical rowsets you are thinking of come from the MSDASHAPE OLE DB provider. They can do what you are suggesting, but they would restrict you to using the OLE DB provider for SQL, which may not be what you want.

  • Finally, consider using XML.

For instance:

    select l.lookup_ID as "@l",
           l.name as "@n",
           (
               select dl.Document_ID as "node()",
                      ' ' as "node()"
               from Document_Lookup dl
               where dl.lookup_ID = l.lookup_ID
               for xml path(''), type
           ) as "*"
    from Lookup l
    where l.lookup_ID in (select dl.lookup_ID from Document_Lookup dl)
    for xml path('dl')

returns:

    <dl l="1" n="One">1 2 </dl>
    <dl l="2" n="Two">2 </dl>
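
On the .NET side, output of this shape could be turned into the dictionary roughly as follows. This is only a sketch: `LookupXml` and `Parse` are hypothetical names, the XML is taken as a string for simplicity, and the fragment is wrapped in an artificial `<root>` element because FOR XML PATH without a ROOT directive returns sibling elements rather than a single document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public struct Lookup
{
    public int ID;
    public string Name;
    public List<int> DocumentIDs;
}

// Hypothetical helper, not part of the original answer.
public static class LookupXml
{
    public static Dictionary<int, Lookup> Parse(string fragment)
    {
        // Wrap the sibling <dl> elements so they parse as one document.
        var root = XElement.Parse("<root>" + fragment + "</root>");

        return root.Elements("dl").ToDictionary(
            e => (int)e.Attribute("l"),
            e => new Lookup
            {
                ID = (int)e.Attribute("l"),
                Name = (string)e.Attribute("n"),
                // The element text is a space-separated list of Document_IDs.
                DocumentIDs = e.Value
                    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                    .Select(s => int.Parse(s))
                    .ToList()
            });
    }
}
```

Each Name then crosses the wire once per lookup rather than once per document, at the cost of sending the IDs as text and parsing them on the client.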
+1

When you ask about "nested rowsets", are you referring to the use of the DbDataReader.NextResult() method?

If your query has two "outputs" (two SELECT statements that return separate result sets), you can loop through the first using DbDataReader.Read(), and when that returns false, you can call DbDataReader.NextResult() and then use DbDataReader.Read() again to continue.

    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // load data from the first result set
        }
        if (reader.NextResult())
        {
            while (reader.Read())
            {
                // look up the record loaded from the first result set
                // load data from the second result set
            }
        }
    }

I have often done this to reduce duplicate data in similar situations, and it works very well:

    SELECT * FROM tableA WHERE [condition];

    SELECT * FROM tableB
    WHERE EXISTS (SELECT * FROM tableA WHERE [condition] AND tableB.FK = tableA.PK);

Disclaimer: I have not tried this with a result set as large as yours.

The disadvantage is that you need to map the second result set onto the first, using a hash table or an ordered list.
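
That mapping step might look like the sketch below, which uses a dictionary as the hash table. `TwoResultSetMerge` and `Merge` are hypothetical names, and the two result sets are passed in as in-memory sequences so the sketch runs without a database; in practice the first sequence would be the rows read before NextResult() and the second the rows read after it.

```csharp
using System.Collections.Generic;

public struct Lookup
{
    public int ID;
    public string Name;
    public List<int> DocumentIDs;
}

// Hypothetical helper, not part of the original answer.
public static class TwoResultSetMerge
{
    public static Dictionary<int, Lookup> Merge(
        IEnumerable<(int LookupId, string Name)> names,
        IEnumerable<(int LookupId, int DocumentId)> links)
    {
        var result = new Dictionary<int, Lookup>();

        // First result set: one row per Lookup, so each Name is sent once.
        foreach (var n in names)
        {
            result[n.LookupId] = new Lookup
            {
                ID = n.LookupId,
                Name = n.Name,
                DocumentIDs = new List<int>()
            };
        }

        // Second result set: attach each Document_ID through a hash lookup.
        // The struct is copied out, but the copy shares the same List<int>.
        foreach (var link in links)
        {
            if (result.TryGetValue(link.LookupId, out var lookup))
            {
                lookup.DocumentIDs.Add(link.DocumentId);
            }
        }
        return result;
    }
}
```

Unlike a single sorted rowset, this does not require the second result set to be ordered, but it pays one hash lookup per row. Links whose Lookup_ID is missing from the first result set are silently skipped here, which is a design choice of the sketch rather than something the answer specifies.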

0

Source: https://habr.com/ru/post/902107/

