How to formulate IQueryable to query a recursive database table?

I have a database table as follows:

Entity --------------------- ID int PK ParentID int FK Code varchar Text text 

The ParentID field is a foreign key with a different record in the same table (recursive). Thus, the structure represents the Tree.

I am trying to write a method to query this table and get 1 specific entity based on the path. The path will be a string representing the Code properties of the object and the parent objects. Thus, the approximate path would be "foo/bar/baz" , which means one particular Entity, of which Code == "baz" , the parent Code == "bar" and the parent element of the parent Code == "foo" .

My attempt:

 public Entity Single(string path) { string[] pathParts = path.Split('/'); string code = pathParts[pathParts.Length -1]; if (pathParts.Length == 1) return dataContext.Entities.Single(e => e.Code == code && e.ParentID == 0); IQueryable<Entity> entities = dataContext.Entities.Where(e => e.Code == code); for (int i = pathParts.Length - 2; i >= 0; i--) { string parentCode = pathParts[i]; entities = entities.Where(e => e.Entity1.Code == parentCode); // incorrect } return entities.Single(); } 

I know this is wrong because Where inside the for loop just adds more conditions to the current Entity instead of the parent Entity , but how can I fix this? In words, I would like for-loop to say "and the parent code must be x, and the parent of this parent code must be y, and the parent of this parent of this parent code must be z .... etc." . Also, for performance reasons, I would like it to be one IQueryable, so there will be only one query in the database.

+4
source share
3 answers

How to formulate IQueryable to query a recursive database table? I would like it to be one IQueryable, so there will be only one query to the database.

I do not think that in the Entity Framework it is currently possible to move a hierarchical table using a single translated query. The reason is because you need to implement a loop or recursion, and as far as I know, none of them can be translated into an EF object store request.

UPDATE

@Bazzz and @Steven made me think, and I must admit that I was completely wrong: it is possible and quite easy to build IQueryable for these requirements.

The following function can be called recursively to build a query:

 public static IQueryable<TestTree> Traverse(this IQueryable<TestTree> source, IQueryable<TestTree> table, LinkedList<string> parts) { var code = parts.First.Value; var query = source.SelectMany(r1 => table.Where(r2 => r2.Code == code && r2.ParentID == r1.ID), (r1, r2) => r2); if (parts.Count == 1) { return query; } parts.RemoveFirst(); return query.Traverse(table, parts); } 

The root query is a special case; here is a working example of calling Traverse :

 using (var context = new TestDBEntities()) { var path = "foo/bar/baz"; var parts = new LinkedList<string>(path.Split('/')); var table = context.TestTrees; var code = parts.First.Value; var root = table.Where(r1 => r1.Code == code && !r1.ParentID.HasValue); parts.RemoveFirst(); foreach (var q in root.Traverse(table, parts)) Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code); } 

The database is requested only once using this generated code:

 exec sp_executesql N'SELECT [Extent3].[ID] AS [ID], [Extent3].[ParentID] AS [ParentID], [Extent3].[Code] AS [Code] FROM [dbo].[TestTree] AS [Extent1] INNER JOIN [dbo].[TestTree] AS [Extent2] ON ([Extent2].[Code] = @p__linq__1) AND ([Extent2].[ParentID] = [Extent1].[ID]) INNER JOIN [dbo].[TestTree] AS [Extent3] ON ([Extent3].[Code] = @p__linq__2) AND ([Extent3].[ParentID] = [Extent2].[ID]) WHERE ([Extent1].[Code] = @p__linq__0) AND ([Extent1].[ParentID] IS NULL)',N'@p__linq__1 nvarchar(4000),@p__linq__2 nvarchar(4000),@p__linq__0 nvarchar(4000)',@p__linq__1=N'bar',@p__linq__2=N'baz',@p__linq__0=N'foo' 

And although I like the execution plan for the raw request (see below) a little better, the approach is really and possibly useful.

End UPDATE

Using IEnumerable

The idea is to grab the relevant data from the table at a time, and then do a move in the application using LINQ to Objects.

Here's a recursive function that will get a node from the sequence:

 static TestTree GetNode(this IEnumerable<TestTree> table, string[] parts, int index, int? parentID) { var q = table .Where(r => r.Code == parts[index] && (r.ParentID.HasValue ? r.ParentID == parentID : parentID == null)) .Single(); return index < parts.Length - 1 ? table.GetNode(parts, index + 1, q.ID) : q; } 

You can use the following:

 using (var context = new TestDBEntities()) { var path = "foo/bar/baz"; var q = context.TestTrees.GetNode(path.Split('/'), 0, null); Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code); } 

This will execute one database query for each part of the path, so if you want the database to be requested only once, use this instead:

 using (var context = new TestDBEntities()) { var path = "foo/bar/baz"; var q = context.TestTrees .ToList() .GetNode(path.Split('/'), 0, null); Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code); } 

The obvious optimization is to exclude codes that are not in our path before passing:

 using (var context = new TestDBEntities()) { var path = "foo/bar/baz"; var parts = path.Split('/'); var q = context .TestTrees .Where(r => parts.Any(p => p == r.Code)) .ToList() .GetNode(parts, 0, null); Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code); } 

This request should be fast enough if most of your objects do not have similar codes. However, if you absolutely need maximum performance, you can use raw requests.

SQL Server Raw Query

For SQL Server, a CTE-based query is likely to be the best:

 using (var context = new TestDBEntities()) { var path = "foo/bar/baz"; var q = context.Database.SqlQuery<TestTree>(@" WITH Tree(ID, ParentID, Code, TreePath) AS ( SELECT ID, ParentID, Code, CAST(Code AS nvarchar(512)) AS TreePath FROM dbo.TestTree WHERE ParentID IS NULL UNION ALL SELECT TestTree.ID, TestTree.ParentID, TestTree.Code, CAST(TreePath + '/' + TestTree.Code AS nvarchar(512)) FROM dbo.TestTree INNER JOIN Tree ON Tree.ID = TestTree.ParentID ) SELECT * FROM Tree WHERE TreePath = @path", new SqlParameter("path", path)).Single(); Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code); } 

Limiting data to the root of a node is easy and can be very useful in terms of performance:

 using (var context = new TestDBEntities()) { var path = "foo/bar/baz"; var q = context.Database.SqlQuery<TestTree>(@" WITH Tree(ID, ParentID, Code, TreePath) AS ( SELECT ID, ParentID, Code, CAST(Code AS nvarchar(512)) AS TreePath FROM dbo.TestTree WHERE ParentID IS NULL AND Code = @parentCode UNION ALL SELECT TestTree.ID, TestTree.ParentID, TestTree.Code, CAST(TreePath + '/' + TestTree.Code AS nvarchar(512)) FROM dbo.TestTree INNER JOIN Tree ON Tree.ID = TestTree.ParentID ) SELECT * FROM Tree WHERE TreePath = @path", new SqlParameter("path", path), new SqlParameter("parentCode", path.Split('/')[0])) .Single(); Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code); } 

Footnote

All this has been tested with .NET 4.5, EF 5, SQL Server 2012. Data configuration script:

 CREATE TABLE dbo.TestTree ( ID int not null IDENTITY PRIMARY KEY, ParentID int null REFERENCES dbo.TestTree (ID), Code nvarchar(100) ) GO INSERT dbo.TestTree (ParentID, Code) VALUES (null, 'foo') INSERT dbo.TestTree (ParentID, Code) VALUES (1, 'bar') INSERT dbo.TestTree (ParentID, Code) VALUES (2, 'baz') INSERT dbo.TestTree (ParentID, Code) VALUES (null, 'bla') INSERT dbo.TestTree (ParentID, Code) VALUES (1, 'blu') INSERT dbo.TestTree (ParentID, Code) VALUES (2, 'blo') INSERT dbo.TestTree (ParentID, Code) VALUES (null, 'baz') INSERT dbo.TestTree (ParentID, Code) VALUES (1, 'foo') INSERT dbo.TestTree (ParentID, Code) VALUES (2, 'bar') 

All examples in my test returned a "baz" object with identifier 3. He suggested that the entity actually exists. Error handling is beyond the scope of this publication.

UPDATE

To address the @Bazzz comment, the path data is shown below. The code is unique in level, not worldwide.

 ID ParentID Code TreePath ---- ----------- --------- ------------------- 1 NULL foo foo 4 NULL bla bla 7 NULL baz baz 2 1 bar foo/bar 5 1 blu foo/blu 8 1 foo foo/foo 3 2 baz foo/bar/baz 6 2 blo foo/bar/blo 9 2 bar foo/bar/bar 
+4
source

The trick is to do it the other way around and create the following query:

 from entity in dataContext.Entities where entity.Code == "baz" where entity.Parent.Code == "bar" where entity.Parent.Parent.Code == "foo" where entity.Parent.Parent.ParentID == 0 select entity; 

A slightly naive (hard-coded) solution would look like this:

 var pathParts = path.Split('/').ToList(); var entities = from entity in dataContext.Entities select entity; pathParts.Reverse(); for (int index = 0; index < pathParts.Count+ index++) { string pathPart = pathParts[index]; switch (index) { case 0: entities = entities.Where( entity.Code == pathPart); break; case 1: entities = entities.Where( entity.Parent.Code == pathPart); break; case 2: entities = entities.Where(entity.Parent.Parent.Code == pathPart); break; case 3: entities = entities.Where( entity.Parent.Parent.Parent.Code == pathPart); break; default: throw new NotSupportedException(); } } 

Doing this dynamically by building expression trees is not trivial, but can be done by carefully examining what the C # compiler generates (for example, using ILDasm or Reflector). Here is an example:

 private static Entity GetEntityByPath(DataContext dataContext, string path) { List<string> pathParts = path.Split(new char[] { '/' }).ToList<string>(); pathParts.Reverse(); var entities = from entity in dataContext.Entities select entity; // Build up a template expression that will be used to create the real expressions with. Expression<Func<Entity, bool>> templateExpression = entity => entity.Code == "dummy"; var equals = (BinaryExpression)templateExpression.Body; var property = (MemberExpression)equals.Left; ParameterExpression entityParameter = Expression.Parameter(typeof(Entity), "entity"); for (int index = 0; index < pathParts.Count; index++) { string pathPart = pathParts[index]; var entityFilterExpression = Expression.Lambda<Func<Entity, bool>>( Expression.Equal( Expression.Property( BuildParentPropertiesExpression(index, entityParameter), (MethodInfo)property.Member), Expression.Constant(pathPart), equals.IsLiftedToNull, equals.Method), templateExpression.Parameters); entities = entities.Where<Entity>(entityFilterExpression); // TODO: The entity.Parent.Parent.ParentID == 0 part is missing here. } return entities.Single<Entity>(); } private static Expression BuildParentPropertiesExpression(int numberOfParents, ParameterExpression entityParameter) { if (numberOfParents == 0) { return entityParameter; } var getParentMethod = typeof(Entity).GetProperty("Parent").GetGetMethod(); var property = Expression.Property(entityParameter, getParentMethod); for (int count = 2; count <= numberOfParents; count++) { property = Expression.Property(property, getParentMethod); } return property; } 
+3
source

You need a recursive function instead of your loop. Something like this should do the job:

 public EntityTable Single(string path) { List<string> pathParts = path.Split('/').ToList(); string code = pathParts.Last(); var entities = dataContext.EntityTables.Where(e => e.Code == code); pathParts.RemoveAt(pathParts.Count - 1); return GetRecursively(entities, pathParts); } private EntityTable GetRecursively(IQueryable<EntityTable> entity, List<string> pathParts) { if (!(entity == null || pathParts.Count == 0)) { string code = pathParts.Last(); if (pathParts.Count == 1) { return entity.Where(x => x.EntityTable1.Code == code && x.ParentId == x.Id).FirstOrDefault(); } else { pathParts.RemoveAt(pathParts.Count - 1); return this.GetRecursively(entity.Where(x => x.EntityTable1.Code == code), pathParts); } } else { return null; } } 

As you can see, I'm just returning the final parent node. If you would like to get a list of all EntityTable objects, I would force the recursive method to return the list of identifiers of the found nodes, and in the end, in the Single (...) method, start a simple LINQ query to get your IQueryable object using this list of identifiers.

Edit: I tried to complete your task, but I think there is a fundamental problem: there are times when you cannot define one path. For example, you have two templates "foo / bar / baz" and "foo / bar / baz / bak", where the "baz" objects are different. If you search for the path "foo / bar / baz", you will always find two matching patches (one of them will be partial for the path with four entities). Although you can correctly get your "baz" object, it is too confusing, and I would just rethink this: either put a unique constraint so that each object can be used only once, or store the full path in the "Code" column.

+1
source

Source: https://habr.com/ru/post/1446624/


All Articles