How to formulate IQueryable to query a recursive database table? I would like it to be one IQueryable, so there will be only one query to the database.
I do not think that in the Entity Framework it is currently possible to move a hierarchical table using a single translated query. The reason is because you need to implement a loop or recursion, and as far as I know, none of them can be translated into an EF object store request.
UPDATE
@Bazzz and @Steven made me think, and I must admit that I was completely wrong: it is possible and quite easy to build IQueryable for these requirements.
The following function can be called recursively to build a query:
public static IQueryable<TestTree> Traverse(this IQueryable<TestTree> source, IQueryable<TestTree> table, LinkedList<string> parts) { var code = parts.First.Value; var query = source.SelectMany(r1 => table.Where(r2 => r2.Code == code && r2.ParentID == r1.ID), (r1, r2) => r2); if (parts.Count == 1) { return query; } parts.RemoveFirst(); return query.Traverse(table, parts); }
The root query is a special case; here is a working example of calling Traverse :
using (var context = new TestDBEntities()) { var path = "foo/bar/baz"; var parts = new LinkedList<string>(path.Split('/')); var table = context.TestTrees; var code = parts.First.Value; var root = table.Where(r1 => r1.Code == code && !r1.ParentID.HasValue); parts.RemoveFirst(); foreach (var q in root.Traverse(table, parts)) Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code); }
The database is requested only once using this generated code:
exec sp_executesql N'SELECT [Extent3].[ID] AS [ID], [Extent3].[ParentID] AS [ParentID], [Extent3].[Code] AS [Code] FROM [dbo].[TestTree] AS [Extent1] INNER JOIN [dbo].[TestTree] AS [Extent2] ON ([Extent2].[Code] = @p__linq__1) AND ([Extent2].[ParentID] = [Extent1].[ID]) INNER JOIN [dbo].[TestTree] AS [Extent3] ON ([Extent3].[Code] = @p__linq__2) AND ([Extent3].[ParentID] = [Extent2].[ID]) WHERE ([Extent1].[Code] = @p__linq__0) AND ([Extent1].[ParentID] IS NULL)',N'@p__linq__1 nvarchar(4000),@p__linq__2 nvarchar(4000),@p__linq__0 nvarchar(4000)',@p__linq__1=N'bar',@p__linq__2=N'baz',@p__linq__0=N'foo'
And although I like the execution plan for the raw request (see below) a little better, the approach is really and possibly useful.
End UPDATE
Using IEnumerable
The idea is to grab the relevant data from the table at a time, and then do a move in the application using LINQ to Objects.
Here's a recursive function that will get a node from the sequence:
static TestTree GetNode(this IEnumerable<TestTree> table, string[] parts, int index, int? parentID) { var q = table .Where(r => r.Code == parts[index] && (r.ParentID.HasValue ? r.ParentID == parentID : parentID == null)) .Single(); return index < parts.Length - 1 ? table.GetNode(parts, index + 1, q.ID) : q; }
You can use the following:
using (var context = new TestDBEntities()) { var path = "foo/bar/baz"; var q = context.TestTrees.GetNode(path.Split('/'), 0, null); Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code); }
This will execute one database query for each part of the path, so if you want the database to be requested only once, use this instead:
using (var context = new TestDBEntities()) { var path = "foo/bar/baz"; var q = context.TestTrees .ToList() .GetNode(path.Split('/'), 0, null); Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code); }
The obvious optimization is to exclude codes that are not in our path before passing:
using (var context = new TestDBEntities()) { var path = "foo/bar/baz"; var parts = path.Split('/'); var q = context .TestTrees .Where(r => parts.Any(p => p == r.Code)) .ToList() .GetNode(parts, 0, null); Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code); }
This request should be fast enough if most of your objects do not have similar codes. However, if you absolutely need maximum performance, you can use raw requests.
SQL Server Raw Query
For SQL Server, a CTE-based query is likely to be the best:
using (var context = new TestDBEntities()) { var path = "foo/bar/baz"; var q = context.Database.SqlQuery<TestTree>(@" WITH Tree(ID, ParentID, Code, TreePath) AS ( SELECT ID, ParentID, Code, CAST(Code AS nvarchar(512)) AS TreePath FROM dbo.TestTree WHERE ParentID IS NULL UNION ALL SELECT TestTree.ID, TestTree.ParentID, TestTree.Code, CAST(TreePath + '/' + TestTree.Code AS nvarchar(512)) FROM dbo.TestTree INNER JOIN Tree ON Tree.ID = TestTree.ParentID ) SELECT * FROM Tree WHERE TreePath = @path", new SqlParameter("path", path)).Single(); Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code); }
Limiting data to the root of a node is easy and can be very useful in terms of performance:
using (var context = new TestDBEntities()) { var path = "foo/bar/baz"; var q = context.Database.SqlQuery<TestTree>(@" WITH Tree(ID, ParentID, Code, TreePath) AS ( SELECT ID, ParentID, Code, CAST(Code AS nvarchar(512)) AS TreePath FROM dbo.TestTree WHERE ParentID IS NULL AND Code = @parentCode UNION ALL SELECT TestTree.ID, TestTree.ParentID, TestTree.Code, CAST(TreePath + '/' + TestTree.Code AS nvarchar(512)) FROM dbo.TestTree INNER JOIN Tree ON Tree.ID = TestTree.ParentID ) SELECT * FROM Tree WHERE TreePath = @path", new SqlParameter("path", path), new SqlParameter("parentCode", path.Split('/')[0])) .Single(); Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code); }
Footnote
All this has been tested with .NET 4.5, EF 5, SQL Server 2012. Data configuration script:
CREATE TABLE dbo.TestTree ( ID int not null IDENTITY PRIMARY KEY, ParentID int null REFERENCES dbo.TestTree (ID), Code nvarchar(100) ) GO INSERT dbo.TestTree (ParentID, Code) VALUES (null, 'foo') INSERT dbo.TestTree (ParentID, Code) VALUES (1, 'bar') INSERT dbo.TestTree (ParentID, Code) VALUES (2, 'baz') INSERT dbo.TestTree (ParentID, Code) VALUES (null, 'bla') INSERT dbo.TestTree (ParentID, Code) VALUES (1, 'blu') INSERT dbo.TestTree (ParentID, Code) VALUES (2, 'blo') INSERT dbo.TestTree (ParentID, Code) VALUES (null, 'baz') INSERT dbo.TestTree (ParentID, Code) VALUES (1, 'foo') INSERT dbo.TestTree (ParentID, Code) VALUES (2, 'bar')
All examples in my test returned a "baz" object with identifier 3. He suggested that the entity actually exists. Error handling is beyond the scope of this publication.
UPDATE
To address the @Bazzz comment, the path data is shown below. The code is unique in level, not worldwide.
ID ParentID Code TreePath ---- ----------- --------- ------------------- 1 NULL foo foo 4 NULL bla bla 7 NULL baz baz 2 1 bar foo/bar 5 1 blu foo/blu 8 1 foo foo/foo 3 2 baz foo/bar/baz 6 2 blo foo/bar/blo 9 2 bar foo/bar/bar