Pentaho Analysis Services: Developer Notes

Mondrian Developer Notes

Logging Levels and Information
Default aggregate table recognition rules
Snowflakes and the DimensionUsage level attribute
Memory Monitoring
Implementing Roles

Logging Levels and Information

Some of the Mondrian classes are instrumented with Apache Log4J Loggers. For some of these classes there are certain logging setting that provide information for not just the code developer but also for someone setting up a Mondrian installation. The following is a list of some of those log setting and the associated information.

Category	Level	Description
`mondrian.rolap.aggmatcher.AggTableManager`	`INFO`	A list of the RolapStar fact table names (aliases) and for each fact table, a list of all of its associated aggregate tables.
`mondrian.rolap.aggmatcher.AggTableManager`	`DEBUG`	A verbose output of all RolapStar fact tables, their measures columns, and dimension tables and columnns, along with all of each fact table's aggregate tables, columns and dimension tables.
`mondrian.rolap.aggmatcher.DefaultDef`	`DEBUG`	For each candidate aggregate table, the Matcher regular expressions for matching: table name and the fact count, foreign key, level and measure columns. Helpful in finding out why an aggregate table was not recognized.
`mondrian.rolap.agg.AggregationManager`	`DEBUG`	For each aggregate Sql query, if an aggregate table can be used to fulfill the query, which aggregate it was along with bitKeys and column names.
`mondrian.rolap.RolapUtil`	`DEBUG`	Prints out all Sql statements and their execution time. If one set the Mondrian property, `mondrian.rolap.generate.formatted.sql` to true, then the Sql is pretty printed (very nice).
`mondrian.rolap.RolapConnection`	`DEBUG`	Prints out each MDX query prior to its execution. (No pretty printing, sigh.)
`mondrian.rolap.RolapSchema`	`DEBUG`	Prints out each Rolap Schema as it is being loaded.

There are more classes with logging, but their logging is at a lower, more detailed level of more use to code developers.

Log levels can be set in either a log4j.properties file or log4j.xml file. You have to make sure you tell Mondrian which one to use. For the log4j.properties, entries might look like:

log4j.category.mondrian.rolap.RolapConnection=DEBUG
log4j.category.mondrian.rolap.RolapUtil=DEBUG

while for the log4.xml:

<category name="mondrian.rolap.RolapConnection">
<priority value="DEBUG"/>
</category>
<category name="mondrian.rolap.RolapUtil">
<priority value="DEBUG"/>
</category>

Default aggregate table recognition rules

The default Mondrian rules for recognizing aggregate tables are specified by creating an instance of the rule schema found in the file: MONDRIAN_HOME/src/main/rolap/aggmatcher/DefaultRulesSchema.xml. The instance of this schema that is built into the mondrian.jar after a build is in the same directory, MONDRIAN_HOME/src/main/rolap/aggmatcher/DefaultRules.xml.

There are six different default rules that are used to match and map a candidate aggregate table: table name, ignore column, fact count column, foreign key column, level column and measure column. All of these rules are defined by creating an instance of the DefaultRulesSchema.xml grammar. The DefaultRulesSchema.xml instance, the DefaultRules.xml file mentioned above, that by default is built as part of the mondrian.jar does not contain an ignore column rule. This grammar has base/supporting classes that are common to the above rules. In XOM terms, these are classes and super classes of the rule elements.

The first XOM class dealing with matching is the CaseMatcher class. This has an attribute "charcase" that takes the legal values of

"ignore" (default)
"exact"
"upper"
"lower"

When the value of the attribute is "ignore", then the regular expression formed by an element extending the CaseMatcher class will be case independent for both any parameters used to instantiate the regular expression template as well as for the text in the post-instantiated regular expression. On the other hand, when the "charcase" attribute take any of the other three values, it is only the parameter values themselves that are "exact", unchanged, "lower", converted to lower case, or "upper", converted to upper case.

The class NameMatcher extends the CaseMatcher class. This class has pre-template and post-template attributes whose default values is the empty string. These attributes are prepended/appended to a parameter to generate a regular expression. As an example, the TableMatcher element extends NameMatcher class. The parameter in this case is the fact table name and the regular expression would be:

pre-template-attribute${fact_table_name}post-template-attribute

For Mondrian, the builtin rule has the pre template value "agg_.+_" and the post template attribute value is the default so the regular expression becomes:

agg_.+_${fact_table_name}

Also, the NameMatcher has an attribute called basename which is optional. If set, then its value must be a regular expression with a single capture group. A capture group is an regular expression component surrounded by "(" and ")". As an example, "(.*)" is a capture group and if this was the total regular expression, then it would match anything and the single capture would match the same. On the other hand if the total regular expression was "RF_(.*)_TBL", then a name such as "RF_SHIPPMENTS_TBL" would match the regular expression while the capture group would be "SHIPPMENTS". Now, if the basename attribute is defined, then it is applied to each fact table name allowing one to strip away information and get to the "base" name. This might be needed because a DBA might prepend or append a tag to all of your fact table names and the DBA might wish to have a different tag prepend or append to all of your aggregate table names (RF_SHIPPMENTS_TBL as the fact table and RA_SHIPPMENTS_AGG_14 as an example aggregate name (the DBA prepended the "RA_" and you appended the "_AGG_14")).

Both the FactCountMatch and ForeignKeyMatch elements also extend the NameMatcher class. In these cases, the builtin Mondrian rule has no pre or post template attribute values, no regular expression, The FactCountMatch takes no other parameter from the fact table (the fact table does not have a fact count column) rather it takes a fact count attribute with default value "fact_count", and this is used to create the regular expression. For the ForeignKeyMatch matcher, its the fact table's foreign key that is used as the regular expression.

The ignore, asdf level and measure column matching elements have one or more Regex child elements. These allow for specifying multiple possible matches (if any match, then its a match). The IgnoreMap, LevelMap and MeasureMap elements extend the RegexMapper which holds an array of Regex elements. The Regex element extends CaseMatcher It has two attributes, space with default value '_' which says how space characters should be mapped, and dot with default value '_' which says how '.' characters should be mapped. If a name were the string "Unit Sales.Case" then (with the default values for the space and dot attributes and with CaseMatcher mapping to lower case ) this would become "unit_sales_case".

The IgnoreMap element has NO template parameter names. Each Regex value is simply a regular expression. As an example (Mondrian by default does not include an IgnoreMap by default), a regular expression that matches all aggregate table columns then end with '_DO_NOT_USE' would be:

.*_DO_NOT_USE

One might want to use an IgnoreMap element to filter out aggregate columns if, for example, the aggregate table is a materialized view, since with each "normal" column of such a materialized view there is an associated support column used by the database which has no significance to Mondrian. In the process of recognizing aggregate tables, Mondrian logs a warning message for each column whose use can not be determined. Materialized views have so many of these support columns that if, in fact, there was a column whose use was desired but was not recognized (for instance, the column name is misspelt) all of the materialized view column warning message mask the one warning message that one really needs to see.

The IgnoreMap regular expressions are applied before any of the other column matching actions. If one sets the IgnoreMap regular expression to, for example,

.*

then all columns are marked as "ignore" and there are no other columns left to match anything else. One must be very careful when choosing IgnoreMap regular expressions not just for your current columns but for columns that might be created in the future. Its best to document this usage in your organization.

The following is what the element might look like in a DefaultRules.xml file:

    <IgnoreMap id="ixx" >
      <Regex id="physical" charcase="ignore">
          .*_DO_NOT_USE
      </Regex>
    </IgnoreMap>

The LevelMap element has the four template parameter names (hardcoded):

hierarchy_name
level_name
level_column_name
usage_prefix

These are names that can be used in creating template regular expressions. The builtin Mondrian default rules for level matching defines three Regex child elements for the LevelMap element. These define the template regular expressions:

${hierarchy_name}_${level_name}
${hierarchy_name}_${level_column_name}
${usage_prefix}${level_column_name}
${level_column_name}

Mondrian while attempting to match a candidate aggregate table against a particular fact table, iterates through the fact table's cube's hierarchy name, level name and level colum names looking for matches.

The MeasureMap element has the three template parameter names (hardcoded):

measure_name
measure_column_name
aggregate_name

which can appear in template regular expressions. The builtin Mondrian default rules for measure matching defines three Regex child elements for the MeasureMap element. These are

${measure_name}
${measure_column_name}
${measure_column_name}_${aggregate_name}

and Mondrian attempts to match a candidate aggregate table's column names against these as it iterators over a fact table's measures.

A grouping of FactCountMatch, ForeignKeyMatch, TableMatcher, LevelMap, and MeasureMap make up a AggRule element, a rule set. Each AggRule has a tag attribute which is a unique identifier for the rule. There can be multiple AggRule elements in the outer AggRules element. Each AggRule having its own tag attribute. When Mondrian runs, it selects (via the mondrian.rolap.aggregates.rule.tag property) which rule set to use.

One last wrinkle, within a AggRule the FactCountMatch, ForeignKeyMatch, TableMatcher, LevelMap, and MeasureMap child elements can be either defined explicitly within the AggRule element or by reference FactCountMatchRef, ForeignKeyMatchRef, TableMatcherRef, LevelMapRef, and MeasureMapRef The references are defined as child elements of the top level AggRules element. With references the same rule element can be used by more than one AggRule (code reuse).

Below is an example of a default rule set with rather different matching rules.

<AggRules tag="your_mamas_dot_com">
  <AggRule tag="default" >
    <FactCountMatch id="fca" factCountName="FACT_TABLE_COUNT"
      charcase="exact" />
    <ForeignKeyMatch id="fka" pretemplate="agg_" />
    <TableMatch id="ta" pretemplate="agg_" posttemplate="_.+"/>
    <LevelMap id="lxx" >
      <Regex id="logical" charcase="ignore" space="_" dot="_">
          ${hierarchy_name}_${level_name}
      </Regex>
      <Regex id="mixed" charcase="ignore" >
          ${hierarchy_name}_${level_name}_${level_column_name}
      </Regex>
      <Regex id="mixed" charcase="ignore" >
          ${hierarchy_name}_${level_column_name}
      </Regex>
      <Regex id="usage" charcase="exact" >
          ${usage_prefix}${level_column_name}
      </Regex>
      <Regex id="physical" charcase="exact" >
          ${level_column_name}_.+
      </Regex>
    </LevelMap>
    <MeasureMap id="mxx" >
      <Regex id="one" charcase="lower" >
          ${measure_name}(_${measure_column_name}(_${aggregate_name})?)?
      </Regex>
      <Regex id="two" charcase="exact" >
        ${measure_column_name}(_${aggregate_name})?
      </Regex>
    </MeasureMap>
  </AggRule>
</AggRules>

First, all fact count columns must be called FACT_TABLE_COUNT exactly, no ignoring case. Next, foreign key columns match the regular expression

agg_${foreign_key_name}

that is, the fact table foreign key column name with "agg_" prepened such as agg_time_id. The aggregate table names match the regular expression

agg_${fact_table_name}_.+

For the FoodMart sales_fact_1997 fact table, an aggregate could be named,

agg_sales_fact_1997_01
agg_sales_fact_1997_lost_time_id
agg_sales_fact_1997_top

If the hierarchy, level and level column names were:

hierarchy_name="Sales Location"
level_name="State"
level_column_name="state_location"
usage_prefix=null

then the following aggregate table column names would be recognizing as level column names:

SALES_LOCATION_STATE
Sales_Location_State_state_location
state_location_level.

If in the schema file the DimensionUsage for the hierarchy had a usagePrefix attribute,

usage_prefix="foo_"

then with the above level and level column names and usage_prefix the following aggregate table column names would be recognizing as level column names:

SALES_LOCATION_STATE
Sales_Location_State_state_location
state_location_level.
foo_state_location.

In the case of matching measure columns, if the measure template parameters have the following values:

measure_name="Unit Sales"
measure_column_name="m1"
aggregate_name="Avg"

then possible aggregate columns that could match are:

unit_sales_m1
unit_sales_m1_avg
m1
m1_avg

The intent of the above example default rule set is not that they are necessarily realistic or usable, rather, it just shows what is possible.

Snowflakes and the DimensionUsage level attribute

Mondrian supports dimensions with all of their levels lumped into a single table (with all the duplication of data that that entails), but also snowflakes. A snowflake dimension is one where the fact table joins to one table (generally the lowest) and that table then joins to a table representing the next highest level, and so on until the top level's table is reached. For each level there is a separate table.

As an example snowflake, below is a set of Time levels and four possible join element blocks, relationships between the tables making up the Time dimension. (In a schema file, the levels must appear after the joins.)

<Level name="Calendar Year" table="TimeYear" column="YEAR_SID"
  nameColumn="YEAR_NAME" levelType="TimeYears" uniqueMembers="true"/>
<Level name="Quarter" table="TimeQtr" column="QTR_SID"
  nameColumn="QTR_NAME" levelType="TimeQuarters" uniqueMembers="true"/>
<Level name="Month" table="TimeMonth" column="MONTH_SID"
  nameColumn="MONTH_ONLY_NAME" levelType="TimeMonths" uniqueMembers="false"/>
<Level name="Day" table="TimeDay" column="DAY_SID" nameColumn="DAY_NAME"
  levelType="TimeDays" uniqueMembers="true"/>


  <Join leftAlias="TimeYear" leftKey="YEAR_SID" 
        rightAlias="TimeQtr" rightKey="YEAR_SID" >
    <Table name="RD_PERIOD_YEAR" alias="TimeYear" /> 
    <Join leftAlias="TimeQtr" leftKey="QTR_SID" 
        rightAlias="TimeMonth" rightKey="QTR_SID" >
        <Table name="RD_PERIOD_QTR" alias="TimeQtr" />
        <Join leftAlias="TimeMonth" leftKey="MONTH_SID" 
            rightAlias="TimeDay" rightKey="MONTH_SID" >
            <Table name="RD_PERIOD_MONTH" alias="TimeMonth" />
            <Table name="RD_PERIOD_DAY" alias="TimeDay" />
        </Join>
    </Join>
  </Join>

  <Join leftAlias="TimeQtr" leftKey="YEAR_SID" 
        rightAlias="TimeYear" rightKey="YEAR_SID" >
    <Join leftAlias="TimeMonth" leftKey="QTR_SID" 
        rightAlias="TimeQtr" rightKey="QTR_SID" >
        <Join leftAlias="TimeDay" leftKey="MONTH_SID" 
            rightAlias="TimeMonth" rightKey="MONTH_SID" >
            <Table name="RD_PERIOD_DAY" alias="TimeDay" />
            <Table name="RD_PERIOD_MONTH" alias="TimeMonth" />
        </Join>
        <Table name="RD_PERIOD_QTR" alias="TimeQtr" />
    </Join>
    <Table name="RD_PERIOD_YEAR" alias="TimeYear" /> 
  </Join>

  <Join leftAlias="TimeMonth" leftKey="MONTH_SID" 
        rightAlias="TimeDay" rightKey="MONTH_SID" >
    <Join leftAlias="TimeQtr" leftKey="QTR_SID"
        rightAlias="TimeMonth" rightKey="QTR_SID" >
        <Join leftAlias="TimeYear" leftKey="YEAR_SID" 
            rightAlias="TimeQtr" rightKey="YEAR_SID" >
            <Table name="RD_PERIOD_YEAR" alias="TimeYear" /> 
            <Table name="RD_PERIOD_QTR" alias="TimeQtr" />
        </Join>
        <Table name="RD_PERIOD_MONTH" alias="TimeMonth" />
    </Join>
    <Table name="RD_PERIOD_DAY" alias="TimeDay" />
  </Join>

  <Join leftAlias="TimeDay" leftKey="MONTH_SID" 
        rightAlias="TimeMonth" rightKey="MONTH_SID" >
    <Table name="RD_PERIOD_DAY" alias="TimeDay" />
    <Join leftAlias="TimeMonth" leftKey="QTR_SID" 
        rightAlias="TimeQtr" rightKey="QTR_SID" >
        <Table name="RD_PERIOD_MONTH" alias="TimeMonth" />
        <Join leftAlias="TimeQtr" leftKey="YEAR_SID" 
            rightAlias="TimeYear" rightKey="YEAR_SID" >
            <Table name="RD_PERIOD_QTR" alias="TimeQtr" />
            <Table name="RD_PERIOD_YEAR" alias="TimeYear" />
        </Join>
    </Join>
  </Join>

Viewed as trees these can be represented as follows:

            |
    ---------------
    |             |
   Year     --------------
            |            |
         Quarter     ---------
                     |       |
                   Month    Day

                  |
           ----------------
           |              |
        --------------   Year
        |            |
    ---------     Quarter
    |       |
   Day     Month

                  |
           ----------------
           |              |
        --------------   Day
        |            |
    ---------      Month
    |       |
   Year   Quarter

            |
    ---------------
    |             |
   Day      --------------
            |            |
          Month      ---------
                     |       |
                   Quarter  Year

It turns out that these join block are equivalent; what table joins to what other table using what keys. In addition, they are all (now) treated the same by Mondrian. The last join block is the canonical representation; left side components are levels of greater depth than right side components, and components of greater depth are higher in the join tree than those of lower depth:

            |
    ---------------
    |             |
   Day      --------------
            |            |
          Month      ---------
                     |       |
                   Quarter  Year

Mondrian reorders these join blocks into the canonical form and uses that to build subtables in the RolapStar.

In addition, if a cube had a DimensionUsage of this Time dimension with, for example, its level attribute set to Month, then the above tree is pruned

              |
        --------------
        |            |
      Month      ---------
                 |       |
               Quarter  Year

and the pruned tree is what is used to create the subtables in the RolapStar. Of course, the fact table must, in this case, have a MONTH_SID foreign key.

Note that the Level element's table attribute MUST use the table alias and NOT the table name.

Memory monitoring, Java5 and memory usage

With Java5, developers using its memory monitoring capabilities need to make sure the code they create will best use this new feature. In particular, if a given algorithm which uses significant memory is surrounded by block in which a MemoryMonitor.Listener has been registered with the MemoryMonitor, then the code must periodically check if a memory notification has occurred. If the algorithm has long stretches of allocating memory for data structures that will exist throughout the life-time of the algorithm's execution during which it does not check for memory notifications, then it is possible that an OutOfMemoryError could still occur. You can see for the ResultSet object where, basically, all memory is created in its constructor, throughout the Member determination and value evaluation code, the Query object's checkCancelOrTimeout method is called repeatedly.

The Java5 memory management mechanism is not fool proof, so to speak. If one, as an example, attempts to allocate a very big array, an OutOfMemoryError will occur. This technique works best when memory is allocated incrementally between checks for memory notifications allowing the developer to take steps before a possible OOME gotterdammerung.

One last issue, if a developer needs to embed Mondrian in a Web or Application server and the server has its own way of dealing with Java5 memory notification system, then it is important that Mondrian be a good application citizen in the server. It is much like the use of JAAS in an application. A JVM allows for a single JAAS instance and most servers register their mechanism with the JVM. It is bad for the application in the server to use its own JAAS rather than register with the server's. So, if Mondrian is in a Web or Application server that has its own dealings with the Java5 memory notification system and the server expects applications to use its mechanism, then the developer must create an instance of the MemoryManager interface that communicates the Webserver/Appserver mechanism and uses a System property to instruct the MemoryManagerFactory to create that specialized version.

Implementing Roles

The developer can create their own Roles by implementing the Role interface or by taking an existing Role instance and wrapping it in an object derived from and overriding some of the methods of the DelegatingRole and DelegatingRole.HierarchyAccess classes. In both cases, some care must be taken not to stray too far from the semantics of the default Mondrian Role implementation, the RoleImpl class.

When implementing one's own Role the Role interface has methods that return an Access object for Schema, Cube, Dimension, Hierarchy, Level, Member and NamedSet all of which must have implementations. One reason one might wish to create one's own Role implementations is to avoid defining Roles in the Schema definition. This allows the Mondrian container to dynamically generate new Roles while using the same Schema definitio nand, therefore, the same in-memory caches associated with that Schema object. Such Roles do not need to be registered with the Schema object; they are associated with the Connection. Another reason one might wish to implement one's own Roles is that there might be an existing permission system and, rather than have duplicate information: in the permission system and the Schema definition, one simply creates Roles based upon permission system information.

If one wishes simply to alter or extend the semantics of the existing Role implementation, the RoleImpl class, then using the DelegatingRole class is a reasonably utilitarian approach. This requires that one create a Role implementation derived from the DelegatingRole class, in the Mondrian container call the Schema.lookupRole(String) method to get the Role whose semantics are to be modified, create an instance of the Role derived from the DelegatingRole that wraps the underlying Role, and, finally, set the Connection's Role by calling Connection.setRole(Role) with this wrapping Role.

The following code is an example where the underlying Role is wrapped in a class that extends the DelegatingRole class. Here, the user has no access to the store where "Joe Bob" is the manager.

public class RoleExample extends DelegatingRole {
  .....
  public static class HierarchyAccessExample
    extends DelegatingRole.HierarchyAccess {
    .....
    public Access getAccess(Member member) {
      Access access = hierarchyAccess.getAccess(member);
      return getAccess(member, access);
    }
  .....
  }
  .....
  public Access getAccess(Member member) {
    Access access = role.getAccess(member);
    return getAccess(member, access);
  }
  .....
  // no one see's information about the store where "Joe Bob" is manager.
  protected Access getAccess(Member member, Access access) {
    final String storeNamelevel =
      "[Store].[Store Country].[Store State].[Store City].[Store Name]";
    if (member.getLevel().getUniqueName().equals(storeNamelevel)) {
      Object o = member.getProperty("Store Manager");
      return (o != null && o.equals("Joe Bob")) ? Access.NONE : access;
    } else {
      return access;
    }
  }
}

In this case, special care must be taken if one over rides one of the methods that yield a Member's Access. This is because there are two such methods. The first is in the Role itself Role.getAccess(Member) and the second is Role.HierarchyAccess.getAccess(Member). Internally, Mondrian is certain code paths calls one of the methods while in other code paths it calls the other, thus they should be overridden in a consistent manner.

Author: Julian Hyde, Richard Emberson; last updated August, 2006.
Version: $Id: //open/mondrian/doc/developer_notes.html#12 $ (log)
Copyright (C) 2005-2007 Julian Hyde

Contents

Logging Levels and Information

Default aggregate table recognition rules

Snowflakes and the DimensionUsage level attribute

Memory monitoring, Java5 and memory usage

Implementing Roles