I am currently exploring the possibility of using MapReduce to support incremental views in SQL Server.
Basically, I want to use MapReduce to build materialized views.
I'm a bit stuck thinking about how to partition my map outputs. I don't really have a big data situation, since the data is at most 50 GB, but I do have a lot of complexity and the performance problems that come with it. I want to see whether a MapReduce / NoSQL approach might work out.
Regarding MapReduce, I am currently having trouble with partitioning. Since I use SQL Server as the data source, data locality is not really an issue, so I do not need to ship data around; every worker should be able to fetch its own partition of the data based on the map definition.
I intend to expose the data entirely through LINQ, perhaps with something like the Entity Framework, just to provide a familiar interface. There is somewhat more to it than that, but it is the route I am currently exploring.
Now, how do I partition my data? I have a primary key, and I have the map and reduce definitions as expression trees (ASTs, if you are not familiar with LINQ).
First, how do I go about splitting the entire input and partitioning the original problem? (I think I will need window functions in SQL Server such as ROW_NUMBER and NTILE.)
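To make the first question concrete, here is a minimal sketch of the kind of split I have in mind, written in Python for brevity rather than in LINQ. It assumes an integer primary key whose values are already sorted, and the name `ntile_ranges` is made up for illustration. It imitates what `NTILE(n) OVER (ORDER BY key)` would assign, then collapses each bucket to a key range that a worker could use in a `WHERE key BETWEEN low AND high` filter.

```python
from typing import List, Tuple


def ntile_ranges(sorted_keys: List[int], buckets: int) -> List[Tuple[int, int]]:
    """Split sorted primary keys into contiguous (low, high) key ranges.

    Follows NTILE-style sizing: when the rows do not divide evenly,
    the first `remainder` buckets each receive one extra row.
    """
    if buckets < 1:
        raise ValueError("buckets must be >= 1")

    base, remainder = divmod(len(sorted_keys), buckets)
    ranges: List[Tuple[int, int]] = []
    start = 0
    for i in range(buckets):
        size = base + (1 if i < remainder else 0)
        if size == 0:
            break  # more buckets requested than rows available
        chunk = sorted_keys[start:start + size]
        ranges.append((chunk[0], chunk[-1]))  # inclusive bounds of this bucket
        start += size
    return ranges
```

For example, `ntile_ranges([1, 2, 3, 5, 8, 13, 21], 3)` yields three ranges, with the first one holding the extra row. Whether key ranges are the right unit of work is exactly what I am unsure about.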
Secondly, and more importantly, how can I do this incrementally? That is, if I add to or change the original problem, how can I minimize the amount of recalculation that has to take place?
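One direction I can imagine, sketched below in Python under some loud assumptions: each partition's reduce result is cached, a change only marks the partition containing that key as dirty, and the final answer is obtained by reducing the cached partial results again. This only works if the reduce function can be applied to its own outputs (a sum can, for instance). The class `IncrementalView` and the `fetch_rows(low, high)` callback are hypothetical names, and how changed keys get detected on the SQL Server side is left open.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

KeyRange = Tuple[int, int]


class IncrementalView:
    """Caches one reduce result per key range and recomputes only dirty ranges."""

    def __init__(self, ranges: List[KeyRange],
                 map_fn: Callable[[Any], Any],
                 reduce_fn: Callable[[List[Any]], Any]) -> None:
        self.ranges = ranges
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn  # assumed re-reducible, e.g. sum
        self.partials: Dict[int, Any] = {}
        self.dirty: Set[int] = set(range(len(ranges)))  # nothing computed yet

    def partition_of(self, key: int) -> int:
        for index, (low, high) in enumerate(self.ranges):
            if low <= key <= high:
                return index
        # Keys outside every range are not handled in this sketch.
        raise KeyError(key)

    def mark_changed(self, key: int) -> None:
        """Record that the row with this primary key was added or modified."""
        self.dirty.add(self.partition_of(key))

    def refresh(self, fetch_rows: Callable[[int, int], List[Any]]):
        """Recompute dirty partitions, then re-reduce all cached partials."""
        recomputed = sorted(self.dirty)
        for index in recomputed:
            low, high = self.ranges[index]
            mapped = [self.map_fn(row) for row in fetch_rows(low, high)]
            self.partials[index] = self.reduce_fn(mapped)
        self.dirty.clear()
        ordered = [self.partials[i] for i in range(len(self.ranges))]
        return self.reduce_fn(ordered), recomputed
```

With two partitions and a sum as the reduce, changing a single row triggers one partition's worth of map work plus a cheap re-reduce over two cached values. What I do not know is how to keep the ranges balanced as rows are inserted, or whether this is anywhere close to the right design.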
I have been looking at CouchDB for inspiration, and it seems to have a way of doing this, but how can I get some of those benefits with SQL Server?