Generating a Planning Generation Table with Groovy

If you haven’t read our earlier blog post on exporting the outline, check it out first.
A generation table has its uses on its own, but I get the most value when I combine it with a fact table and/or the parent/child table we’ve already created.

Normally I wouldn’t resort to Groovy for something like this; I’d just do it in SQL. But short of creating a stored procedure, and hoping your source and destination live on the same database, you’ll need a technology that can connect to separate databases and add columns dynamically based on how many you need.
For example, if you’re exporting all dimensions you might need 10 generations, but if you’re only working with a select few you might only need 2. How will you know when you need another generation column? If you’re not generating the table dynamically, you could be silently excluding metadata without even knowing it.

We’ll start with our config file, which needs to sit alongside your script. I called mine Config.groovy, but the name is unimportant. What matters is that you define whatever environments you need and that the connection details actually line up with your source and target databases.

//Config.groovy
environments {
    dev {
        sourceDatabaseConnection {
            url = "jdbc:oracle:thin:@HOSTNAME:1521:DATABASE"
            userName = "USERNAME"
            passWord = 'PASSWORD'
            dataSource = "oracle.jdbc.pool.OracleDataSource"
        }
        targetDatabaseConnection {
            url = "jdbc:oracle:thin:@HOSTNAME:1521:DATABASE"
            userName = "USERNAME"
            passWord = 'PASSWORD'
            dataSource = "oracle.jdbc.pool.OracleDataSource"
        }
    }
}
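To sanity-check the config wiring, here’s a minimal, self-contained sketch of how ConfigSlurper resolves an environments block. The config text below is a trimmed-down, hypothetical version of the file above; note how the environment name passed to the constructor selects which block’s values surface at the top level.

```groovy
// Parse a config string the same way the script parses Config.groovy.
// The environment name ("dev") picks which environments block is merged in.
def configText = '''
environments {
    dev {
        sourceDatabaseConnection {
            url = "jdbc:oracle:thin:@HOSTNAME:1521:DATABASE"
            userName = "USERNAME"
        }
    }
}
'''
def config = new ConfigSlurper("dev").parse(configText)

// The dev values are now addressable without the environments/dev prefix.
assert config.sourceDatabaseConnection.url == "jdbc:oracle:thin:@HOSTNAME:1521:DATABASE"
assert config.sourceDatabaseConnection.userName == "USERNAME"
```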

Next we create the Groovy script itself. I called mine ExportGenerationHierarchy.groovy; again, the name is unimportant, just remember what it is. Start by defining your connections.

import groovy.sql.Sql

def configEnvironment = "dev"
def configFileName = "Config.groovy"

ConfigObject configObject = new ConfigSlurper(configEnvironment).parse(new File(configFileName).toURI().toURL())

def con_Source = Sql.newInstance(configObject.sourceDatabaseConnection.url,
        configObject.sourceDatabaseConnection.userName,
        configObject.sourceDatabaseConnection.passWord,
        configObject.sourceDatabaseConnection.dataSource)

def con_Destination = Sql.newInstance(configObject.targetDatabaseConnection.url,
        configObject.targetDatabaseConnection.userName,
        configObject.targetDatabaseConnection.passWord,
        configObject.targetDatabaseConnection.dataSource)

We’ll need to get the maximum number of generations. We used a similar technique in the prior article on outline extraction.
The magic is Oracle’s hierarchical CONNECT BY syntax, which we wrap in a subquery. Groovy runs this SQL against the source database, pulls back MAX(GEN_NUM), and returns it into a variable.

int max_gen_num = GetMaxGenNum(con_Source, sqlFilter)
def GetMaxGenNum(def con_Source, def sqlFilter) {
    def sql_get_max_gen_num = """
        SELECT
          MAX(GEN_NUM) AS MAX_GEN_NUM
        FROM (
        SELECT
                OD.OBJECT_NAME AS DIMNAME,
                SYS_CONNECT_BY_PATH(O.OBJECT_NAME, '|') AS PATH,
                LEVEL AS GEN_NUM,
                O.OBJECT_NAME AS MEMBER_NAME,
                OP.OBJECT_NAME AS PARENT_NAME,
            O.HAS_CHILDREN
        FROM
          HSP_OBJECT O
          INNER JOIN HSP_MEMBER M
            ON O.OBJECT_ID = M.MEMBER_ID
          INNER JOIN HSP_OBJECT OD
            ON M.DIM_ID = OD.OBJECT_ID
            AND OD.OBJECT_TYPE = 2
          INNER JOIN HSP_OBJECT OP
            ON O.PARENT_ID = OP.OBJECT_ID
            START WITH
                OD.OBJECT_ID = O.OBJECT_ID
            CONNECT BY NOCYCLE PRIOR
                O.OBJECT_ID = O.PARENT_ID
            ) SQ1
        WHERE HAS_CHILDREN = 0
        """
    sql_get_max_gen_num += sqlFilter

    int max_gen_num

    con_Source.eachRow(sql_get_max_gen_num) {
        max_gen_num = it.MAX_GEN_NUM
    }
    return max_gen_num
}
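To make the query’s intent concrete, here is the same idea in plain Groovy. The paths below are hypothetical leaf-member paths in the ‘|’-delimited form SYS_CONNECT_BY_PATH produces; each path’s depth is that member’s generation, and the query simply takes the maximum depth across all leaves.

```groovy
// Hypothetical leaf-member paths as SYS_CONNECT_BY_PATH would emit them.
// LEVEL in the SQL corresponds to the number of member names in the path.
def leafPaths = [
    '|Period|YearTotal|Q1|Jan',
    '|Period|YearTotal|Q2',
    '|Period|BegBalance'
]

// tokenize('|') drops the empty leading token, so size() == generation depth
def maxGenNum = leafPaths.collect { it.tokenize('|').size() }.max()
assert maxGenNum == 4   // deepest path has 4 members: Period..Jan
```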

Next we drop and recreate our target table. We want to be notified of any issues, but there is one specific error we can safely ignore: ORA-00942 ("table or view does not exist"), so we catch that.

def CreateTable(int max_gen_num, def con_Destination) {
    println "Dropping Table: TMP_PLANNING_GEN_EXPORT"
    def sql_drop_deleteme_ct = "DROP TABLE TMP_PLANNING_GEN_EXPORT"
    try { con_Destination.execute(sql_drop_deleteme_ct) }
    catch (SQLSyntaxErrorException e) {
        if (e.message.toUpperCase().contains("ORA-00942")) {println "Drop not executed. Table does not exist."}
        else {throw e}
    }

    println "Creating Table: TMP_PLANNING_GEN_EXPORT"
    def sql_create_deleteme_ct = """
        CREATE TABLE TMP_PLANNING_GEN_EXPORT (
            DIMNAME VARCHAR2(80),
            PATH VARCHAR2(4000),
            GEN_NUM VARCHAR2(80)"""
    (1..max_gen_num).each { sql_create_deleteme_ct += ",\r\n    GEN${it} VARCHAR2(80)" }
    sql_create_deleteme_ct += ",\r\n    LEVEL0 VARCHAR2(80)\r\n)"
    con_Destination.execute(sql_create_deleteme_ct)
}
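You can see what the dynamic DDL ends up looking like by running the string-building portion on its own. Here max_gen_num is hard-coded to 3 purely for illustration:

```groovy
// Build the CREATE TABLE statement the same way CreateTable does,
// with max_gen_num fixed at 3 so we can inspect the generated columns.
int max_gen_num = 3
def ddl = """CREATE TABLE TMP_PLANNING_GEN_EXPORT (
    DIMNAME VARCHAR2(80),
    PATH VARCHAR2(4000),
    GEN_NUM VARCHAR2(80)"""
(1..max_gen_num).each { ddl += ",\n    GEN${it} VARCHAR2(80)" }
ddl += ",\n    LEVEL0 VARCHAR2(80)\n)"

// Exactly GEN1..GEN3 exist, plus the fixed LEVEL0 column.
assert ddl.contains("GEN1 VARCHAR2(80)")
assert ddl.contains("GEN3 VARCHAR2(80)")
assert !ddl.contains("GEN4")
```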

Finally we get to load some data. Two things to note here: we’ll have two connections open at the same time, and we need to avoid loading the entire result set into memory. That means using batches on both sides, reading and writing.
To accomplish this we use two methods.

def LoadData(def con_Source, def con_Destination, def max_gen_num, def sqlFilter, def bRun, def batchSize) {

    def sql_get_gen_table = """
        SELECT
          DIMNAME,
          PATH,
          GEN_NUM,
        """
    (1..max_gen_num).each { sql_get_gen_table += "REGEXP_SUBSTR (PATH, '[^|]+', 1, ${it}) AS GEN${it}," }
    sql_get_gen_table += """
          MEMBER_NAME AS LEVEL0
        FROM (
        SELECT
                OD.OBJECT_NAME AS DIMNAME,
                SYS_CONNECT_BY_PATH(O.OBJECT_NAME, '|') AS PATH,
                LEVEL AS GEN_NUM,
                O.OBJECT_NAME AS MEMBER_NAME,
                OP.OBJECT_NAME AS PARENT_NAME,
            O.HAS_CHILDREN
        FROM
          HSP_OBJECT O
          INNER JOIN HSP_MEMBER M
            ON O.OBJECT_ID = M.MEMBER_ID
          INNER JOIN HSP_OBJECT OD
            ON M.DIM_ID = OD.OBJECT_ID
            AND OD.OBJECT_TYPE = 2
          INNER JOIN HSP_OBJECT OP
            ON O.PARENT_ID = OP.OBJECT_ID
            START WITH
                OD.OBJECT_ID = O.OBJECT_ID
            CONNECT BY NOCYCLE PRIOR
                O.OBJECT_ID = O.PARENT_ID
            ) SQ1
        WHERE HAS_CHILDREN = 0
        """
    sql_get_gen_table += sqlFilter

    def sql_insert_deleteme_ct = """INSERT INTO TMP_PLANNING_GEN_EXPORT (DIMNAME, PATH, GEN_NUM,"""
    (1..max_gen_num).each { sql_insert_deleteme_ct += "GEN${it}," }
    sql_insert_deleteme_ct += "LEVEL0) VALUES (?,?,?,"
    (1..max_gen_num).each { sql_insert_deleteme_ct += "?," }
    sql_insert_deleteme_ct += "?)"

    int totalInserts = 0

    if(bRun) {
        con_Source.withStatement { stmt -> stmt.fetchSize = batchSize }
        // Open one batched prepared statement and stream source rows into it.
        // Calling withBatch once per row would flush a batch of one each time,
        // and rows() would pull the whole result set into memory first.
        con_Destination.withBatch(batchSize, sql_insert_deleteme_ct) { stmt ->
            con_Source.eachRow(sql_get_gen_table) { row ->
                totalInserts++
                stmt.addBatch(row.toRowResult().values().toArray())
            }
        }
    }
    }
    return totalInserts
}
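The REGEXP_SUBSTR calls are just splitting the path into one column per generation. Mirrored in plain Groovy with a hypothetical path:

```groovy
// REGEXP_SUBSTR(PATH, '[^|]+', 1, n) extracts the n-th '|'-delimited token;
// tokenize('|') does the same split in one call.
def path = '|Period|YearTotal|Q1|Jan'
def members = path.tokenize('|')

assert members == ['Period', 'YearTotal', 'Q1', 'Jan']
assert members[0] == 'Period'   // GEN1: the dimension itself
assert members[3] == 'Jan'      // GEN4: the level-0 member
```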

The withStatement call lets us set the JDBC fetch size for reading.

con_Source.withStatement { stmt -> stmt.fetchSize = batchSize}

And withBatch lets us set a batch size for writing. Wrapping the source read inside a single withBatch block means inserts are flushed in groups of batchSize rather than row by row. Combined, they make a pretty useful pair.

            con_Destination.withBatch(batchSize, sql_insert_deleteme_ct) { stmt ->
                con_Source.eachRow(sql_get_gen_table) { row ->
                    stmt.addBatch(row.toRowResult().values().toArray())
                }
            }
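If the batching behavior is hard to picture, Groovy’s collate shows the grouping that withBatch effectively performs on the insert side: rows accumulate until a batch fills, then the whole group goes over in one round trip, with a final partial batch at the end.

```groovy
// Group 10 "rows" into batches of 3, the way withBatch(3, ...) would
// flush inserts: three full batches plus one final partial batch.
def rows = (1..10).toList()
def batches = rows.collate(3)

assert batches.size() == 4
assert batches.first() == [1, 2, 3]
assert batches.last() == [10]   // the leftover rows still get sent
```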

And finally, here’s what it looks like when we pull it together.

//ExportGenerationHierarchy.groovy
import groovy.sql.Sql
import groovy.time.TimeCategory
import groovy.time.TimeDuration
import java.sql.SQLSyntaxErrorException

def timeStart = new Date()

def configEnvironment = "dev"
def configFileName = "Config.groovy"

ConfigObject configObject = new ConfigSlurper(configEnvironment).parse(new File(configFileName).toURI().toURL())

def con_Source = Sql.newInstance(configObject.sourceDatabaseConnection.url,
        configObject.sourceDatabaseConnection.userName,
        configObject.sourceDatabaseConnection.passWord,
        configObject.sourceDatabaseConnection.dataSource)

def con_Destination = Sql.newInstance(configObject.targetDatabaseConnection.url,
        configObject.targetDatabaseConnection.userName,
        configObject.targetDatabaseConnection.passWord,
        configObject.targetDatabaseConnection.dataSource)

int batchSize = 100
def totalInserts = 0

boolean bRun = true

//def sqlFilter = ""
def sqlFilter = "AND DIMNAME = 'Period'"

int max_gen_num = GetMaxGenNum(con_Source, sqlFilter)

CreateTable(max_gen_num, con_Destination)

totalInserts = LoadData(con_Source, con_Destination, max_gen_num, sqlFilter, bRun, batchSize)

ReportRuntimeStats(timeStart, totalInserts, batchSize)

def ReportRuntimeStats(def timeStart, def totalInserts, def batchSize) {
    def timeStop = new Date()
    TimeDuration duration = TimeCategory.minus(timeStop, timeStart)
    println "Total Inserts: " + totalInserts
    println "Batch Size:    " + batchSize
    println "Duration:      " + duration

}

def CreateTable(int max_gen_num, def con_Destination) {
    println "Dropping Table: TMP_PLANNING_GEN_EXPORT"
    def sql_drop_deleteme_ct = "DROP TABLE TMP_PLANNING_GEN_EXPORT"
    try { con_Destination.execute(sql_drop_deleteme_ct) }
    catch (SQLSyntaxErrorException e) {
        if (e.message.toUpperCase().contains("ORA-00942")) {println "Drop not executed. Table does not exist."}
        else {throw e}
    }

    println "Creating Table: TMP_PLANNING_GEN_EXPORT"
    def sql_create_deleteme_ct = """
        CREATE TABLE TMP_PLANNING_GEN_EXPORT (
            DIMNAME VARCHAR2(80),
            PATH VARCHAR2(4000),
            GEN_NUM VARCHAR2(80)"""
    (1..max_gen_num).each { sql_create_deleteme_ct += ",\r\n    GEN${it} VARCHAR2(80)" }
    sql_create_deleteme_ct += ",\r\n    LEVEL0 VARCHAR2(80)\r\n)"
    con_Destination.execute(sql_create_deleteme_ct)
}

def GetMaxGenNum(def con_Source, def sqlFilter) {
    def sql_get_max_gen_num = """
        SELECT
          MAX(GEN_NUM) AS MAX_GEN_NUM
        FROM (
        SELECT
                OD.OBJECT_NAME AS DIMNAME,
                SYS_CONNECT_BY_PATH(O.OBJECT_NAME, '|') AS PATH,
                LEVEL AS GEN_NUM,
                O.OBJECT_NAME AS MEMBER_NAME,
                OP.OBJECT_NAME AS PARENT_NAME,
            O.HAS_CHILDREN
        FROM
          HSP_OBJECT O
          INNER JOIN HSP_MEMBER M
            ON O.OBJECT_ID = M.MEMBER_ID
          INNER JOIN HSP_OBJECT OD
            ON M.DIM_ID = OD.OBJECT_ID
            AND OD.OBJECT_TYPE = 2
          INNER JOIN HSP_OBJECT OP
            ON O.PARENT_ID = OP.OBJECT_ID
            START WITH
                OD.OBJECT_ID = O.OBJECT_ID
            CONNECT BY NOCYCLE PRIOR
                O.OBJECT_ID = O.PARENT_ID
            ) SQ1
        WHERE HAS_CHILDREN = 0
        """
    sql_get_max_gen_num += sqlFilter

    int max_gen_num

    con_Source.eachRow(sql_get_max_gen_num) {
        max_gen_num = it.MAX_GEN_NUM
    }

    return max_gen_num

}

def LoadData(def con_Source, def con_Destination, def max_gen_num, def sqlFilter, def bRun, def batchSize) {

    def sql_get_gen_table = """
        SELECT
          DIMNAME,
          PATH,
          GEN_NUM,
        """
    (1..max_gen_num).each { sql_get_gen_table += "REGEXP_SUBSTR (PATH, '[^|]+', 1, ${it}) AS GEN${it}," }
    sql_get_gen_table += """
          MEMBER_NAME AS LEVEL0
        FROM (
        SELECT
                OD.OBJECT_NAME AS DIMNAME,
                SYS_CONNECT_BY_PATH(O.OBJECT_NAME, '|') AS PATH,
                LEVEL AS GEN_NUM,
                O.OBJECT_NAME AS MEMBER_NAME,
                OP.OBJECT_NAME AS PARENT_NAME,
            O.HAS_CHILDREN
        FROM
          HSP_OBJECT O
          INNER JOIN HSP_MEMBER M
            ON O.OBJECT_ID = M.MEMBER_ID
          INNER JOIN HSP_OBJECT OD
            ON M.DIM_ID = OD.OBJECT_ID
            AND OD.OBJECT_TYPE = 2
          INNER JOIN HSP_OBJECT OP
            ON O.PARENT_ID = OP.OBJECT_ID
            START WITH
                OD.OBJECT_ID = O.OBJECT_ID
            CONNECT BY NOCYCLE PRIOR
                O.OBJECT_ID = O.PARENT_ID
            ) SQ1
        WHERE HAS_CHILDREN = 0
        """
    sql_get_gen_table += sqlFilter

    def sql_insert_deleteme_ct = """INSERT INTO TMP_PLANNING_GEN_EXPORT (DIMNAME, PATH, GEN_NUM,"""
    (1..max_gen_num).each { sql_insert_deleteme_ct += "GEN${it}," }
    sql_insert_deleteme_ct += "LEVEL0) VALUES (?,?,?,"
    (1..max_gen_num).each { sql_insert_deleteme_ct += "?," }
    sql_insert_deleteme_ct += "?)"

    int totalInserts = 0

    if(bRun) {
        con_Source.withStatement { stmt -> stmt.fetchSize = batchSize }
        // Open one batched prepared statement and stream source rows into it.
        // Calling withBatch once per row would flush a batch of one each time,
        // and rows() would pull the whole result set into memory first.
        con_Destination.withBatch(batchSize, sql_insert_deleteme_ct) { stmt ->
            con_Source.eachRow(sql_get_gen_table) { row ->
                totalInserts++
                stmt.addBatch(row.toRowResult().values().toArray())
            }
        }
    }

    return totalInserts

}

This is the output we can expect when we set the filter to a single dimension, in this case Period. As we can see, four generations are needed for Period.

def sqlFilter = "AND DIMNAME = 'Period'"

[Screenshot: query results for the Period dimension]

And only three generations are needed for Year. It’s definitely working.

def sqlFilter = "AND DIMNAME = 'Year'"

[Screenshot: query results for the Year dimension]

My favorite part of the exercise is that it showcases a few cool concepts.

  • ETL using Groovy.
  • Using a config file with ConfigSlurper.
  • Dynamic SQL Using Groovy.

And how long does it take for my entire outline? I have over 100k members in total; here’s how it looks:

[Screenshot: runtime stats for the full outline export]

One final note: any time you run custom SQL with Groovy, be careful. You don’t want to blow up your database. Don’t say I didn’t warn you.
