Are Mondrian/OLAP tools suited to crossjoining large dimensions/sets?

Summary: most of the MDX crossjoin examples I've seen involve relatively small sets, for example tens or hundreds of members. But I find myself wanting to crossjoin (specifically, "non-empty crossjoin") sets that each contain thousands or tens of thousands of members, and so far it isn't working. I'm wondering whether this can be done at all, or whether I might need to use something other than Mondrian/OLAP.

To be specific, I have a cube that records interactions between Firms (n = 7000) and Clients (n = 27000). Currently, both Firm and Client are completely flat hierarchies: there is an "All" level and an individual-member level, with no levels in between. There is a central fact table and separate dimension tables for firms and clients.

My users, at least, want summary reports along these lines, crossjoining all non-empty interactions between Firms and Clients:

select [Measures].[Amount] on columns, NonEmptyCrossJoin([Firm].Children, [Client].Children) on rows from MyCube 

But this query and its variants do not work in my Mondrian test setup. Either I get an OutOfMemoryError (with a 2 GB Java heap), or Java appears to spend an impossibly long time in mondrian.rolap.RolapResult$AxisMember.mergeTuple(TupleCursor). (I could provide a fuller stack trace if that would help.) By "impossibly long" I mean that Java will churn on the query for hours and hours before I give up.

At first I expected the above query to run fine, because conceptually it could be done fairly efficiently by simply executing an SQL query along these lines:

 select Firm, Client, Sum(Amount) as n from fact, firm, client where fact.firmid = firm.firmid and fact.clientid = client.clientid group by Firm, Client 

(Indeed, if I run something similar directly in MySQL, it takes no more than 15 seconds.)
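For illustration, here is a minimal sketch of that SQL-side approach. It uses Python's built-in sqlite3 in place of MySQL and a few made-up rows. The fact/firm/client tables and the firmid/clientid keys follow the query above; the `name` and `amount` columns, the helper names and the data are hypothetical. The point is that GROUP BY over the fact table only ever touches firm/client pairs that actually occur. The full crossjoin of 7,000 firms and 27,000 clients, by contrast, has 7,000 × 27,000 = 189,000,000 tuples.

```python
import sqlite3


def nonempty_firm_client_totals(conn):
    """Sum the amount for every (firm, client) pair that has at least one fact row.

    GROUP BY only produces groups for pairs present in the fact table, so empty
    pairs of the full crossjoin are never materialized.
    """
    sql = """
        SELECT firm.name AS firm, client.name AS client, SUM(fact.amount) AS n
        FROM fact
        JOIN firm   ON fact.firmid   = firm.firmid
        JOIN client ON fact.clientid = client.clientid
        GROUP BY firm.name, client.name
        ORDER BY firm.name, client.name
    """
    return conn.execute(sql).fetchall()


def build_demo_db():
    """Hypothetical star schema with 3 firms, 3 clients and a sparse fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE firm   (firmid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE client (clientid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact   (firmid INTEGER, clientid INTEGER, amount REAL);
    """)
    conn.executemany("INSERT INTO firm VALUES (?, ?)",
                     [(1, "FirmA"), (2, "FirmB"), (3, "FirmC")])
    conn.executemany("INSERT INTO client VALUES (?, ?)",
                     [(10, "Client1"), (20, "Client2"), (30, "Client3")])
    conn.executemany("INSERT INTO fact VALUES (?, ?, ?)",
                     [(1, 10, 5.0), (1, 10, 2.5), (2, 30, 4.0), (3, 20, 1.0)])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    rows = nonempty_firm_client_totals(conn)
    for row in rows:
        print(row)
    print(len(rows), "non-empty pairs out of", 3 * 3, "possible")
```

Here only 3 of the 9 possible pairs come back. The work is proportional to the fact rows, not to the size of the crossjoin.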

But judging from the debug logs, Mondrian doesn't seem to attempt this optimization. Instead, it appears to do the crossjoin internally, and in a particularly slow way. I set mondrian.native.crossjoin.enable=true in my mondrian.properties file, but this doesn't seem to be one of the kinds of crossjoin that Mondrian can "make native". (If I turn on mondrian.native.unsupported.alert=ERROR, I get the corresponding exception.)

I have to wonder whether I need to prevent users from trying to crossjoin such large dimensions/sets, or whether Mondrian is simply not the tool I'm looking for here. But maybe I'm just doing something wrong.

4 answers

As a follow-up, I tried setting up a similar cube in SQL Server Analysis Services (SQL Server 2008), and it seems that icCube's suggestion to try a different OLAP tool makes sense:

Even before I had learned much about SSAS best practices, performance on this kind of MDX improved dramatically. A query along these lines

 select [Measures].[Amount] on columns, NON EMPTY crossjoin([Firms].[Firm Name].Children, [Clients].[Client Name].Children) on rows from MyCube 

went from infeasible under Mondrian to taking about ten seconds under SQL Server. Presumably this is because MS Business Intelligence Development Studio steered me toward creating a MOLAP cube by default, or perhaps SSAS simply has a smarter query planner.

Anyway, maybe this is fast enough for me. If not, I'm still not sure how much further SSAS can be optimized in this case. (One disappointing thing is that even when I re-run the query a second time, it still takes about 10 seconds; I had hoped caching would have a more dramatic effect.)

Tangentially, you may notice that in the MDX just mentioned, I replaced my original NonEmptyCrossJoin with a plain crossjoin combined with NON EMPTY. This is because, at least in the SQL Server world, NonEmptyCrossJoin appears to be considered deprecated bad practice. (This is noted in Microsoft's MDX Language Reference. Mosha, one of the former SSAS developers, describes the situation in an article titled "MDX: NonEmpty, Exists and evil NonEmptyCrossJoin". The short version is that NonEmptyCrossJoin has muddled semantics and limited applicability, and since SQL Server 2005 or so the query optimizer has been smart enough to make your query fast without NonEmptyCrossJoin.) So I substituted the more modern, approved equivalent in the MDX above. (It still works with NonEmptyCrossJoin, though NonEmptyCrossJoin doesn't speed things up at all.)
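For this query, NonEmptyCrossJoin(...) and NON EMPTY crossjoin(...) ask for the same thing: the firm/client pairs whose Amount is non-empty. The difference in speed comes from how that set is evaluated. The toy Python sketch below contrasts enumerating every tuple of the crossjoin and then filtering with a single fact-driven pass. It is not how Mondrian or SSAS actually implement these functions, and the names and amounts are made up.

```python
from collections import defaultdict
from itertools import product


def aggregate(facts):
    """Fact-driven evaluation, analogous to SQL GROUP BY: one pass over the fact
    rows, so the cost grows with the number of facts only."""
    totals = defaultdict(float)
    for firm, client, amount in facts:
        totals[(firm, client)] += amount
    return dict(totals)


def crossjoin_then_filter(firms, clients, facts):
    """Naive evaluation: enumerate every (firm, client) tuple of the crossjoin and
    keep those with a non-empty cell. The loop runs len(firms) * len(clients)
    times no matter how sparse the facts are."""
    totals = aggregate(facts)
    return {pair: totals[pair] for pair in product(firms, clients) if pair in totals}


if __name__ == "__main__":
    firms = ["FirmA", "FirmB", "FirmC"]
    clients = ["Client1", "Client2", "Client3"]
    facts = [("FirmA", "Client1", 5.0), ("FirmA", "Client1", 2.5),
             ("FirmB", "Client3", 4.0), ("FirmC", "Client2", 1.0)]
    # Same non-empty result either way; only the amount of work differs.
    print(crossjoin_then_filter(firms, clients, facts) == aggregate(facts))
```

With 7,000 firms and 27,000 clients, the naive loop would visit 189,000,000 tuples to find the non-empty ones. The fact-driven pass touches each fact row once.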


I'm not 100% sure, but have you tried setting:

mondrian.native.nonempty.enable = true

This optimization appears to push some operations like these down to the SQL level, so it might help.


I'll answer the OLAP part. There are three broad families of OLAP tools: ROLAP, MOLAP and HOLAP.

ROLAP (relational) tools are built on top of a relational database. An MDX query that is not answered from the cache is executed against the relational database as SQL statements. These tools scale well because the work is delegated to the database, but their performance depends on that underlying database. QoS can be tricky, since it is the database's QoS.

MOLAP (in-memory) tools copy the data into their own internal structures in memory. Here QoS and response times are more stable and faster, since all processing is done on a single server. The problem with MOLAP is scalability, since you can run out of memory (> 100 million).
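The ROLAP/MOLAP distinction can be sketched in a few lines of Python, with sqlite3 standing in for the relational database. The class names and the fact(firmid, clientid, amount) table are hypothetical. Real servers are far more elaborate (caching, compressed storage, pre-computed aggregations), so treat this only as an illustration of where the work happens.

```python
import sqlite3
from collections import defaultdict


class RolapCube:
    """ROLAP-style: each query is translated to SQL and delegated to the database,
    so performance depends on the database."""

    def __init__(self, conn):
        self.conn = conn

    def nonempty_cells(self):
        rows = self.conn.execute(
            "SELECT firmid, clientid, SUM(amount) FROM fact GROUP BY firmid, clientid")
        return {(firm, client): total for firm, client, total in rows}


class MolapCube:
    """MOLAP-style: facts are copied into in-memory structures once, at load time.
    Queries never hit the database again, but memory grows with the data."""

    def __init__(self, conn):
        self.cells = defaultdict(float)
        for firm, client, amount in conn.execute("SELECT firmid, clientid, amount FROM fact"):
            self.cells[(firm, client)] += amount

    def nonempty_cells(self):
        return dict(self.cells)


def demo_connection():
    """Hypothetical fact table with a handful of rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact (firmid INTEGER, clientid INTEGER, amount REAL)")
    conn.executemany("INSERT INTO fact VALUES (?, ?, ?)",
                     [(1, 10, 5.0), (1, 10, 2.5), (2, 30, 4.0), (3, 20, 1.0)])
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(RolapCube(conn).nonempty_cells())
    print(MolapCube(conn).nonempty_cells())
```

Both cubes return the same cells. The ROLAP one pays the query cost in the database every time, while the MOLAP one pays it once at load time and then holds every non-empty cell in memory.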

HOLAP tools mix ROLAP and MOLAP. I have no direct experience with them, but in theory they can offer the best of both worlds.

Looking at your numbers, you should not run into any problems with MOLAP tools; this is really a small cube.

So before leaving the OLAP world, give MOLAP servers a try. For a list of OLAP servers, you can check Wikipedia.


Mondrian OLAP does not support large databases.

Well, I am developing an OLAP tool (BJIn OLAP), an open-source, Java-based OLAP tool. It uses an SQL-like syntax rather than MDX.

Documentation here

Trial version here


Source: https://habr.com/ru/post/901870/

