Tuesday, March 1, 2011

Telco Churn Case Study Reloaded: Oracle Data Miner

The previous three-part series on mining star schemas (using a telco churn case study) demonstrated how to leverage SQL and the Oracle Data Mining API to capture a complex methodology in a simpler, more performant way.  I have been informed that there is this concept called a "user interface" which some people find enticing :-).  This post demonstrates the telco churn methodology using the shiny new Oracle Data Miner interface.  Oracle Data Miner is bundled with SQL Developer 3.0 (currently available as an early adopter release) and runs against Oracle Database 11gR2 with the Oracle Data Mining option enabled.
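
Before diving in, a quick sanity check can confirm that the Oracle Data Mining option is present in the target database. A minimal sketch (verify against your own instance; querying v$option requires the appropriate privileges):

    -- Returns TRUE when the Oracle Data Mining option is enabled
    SELECT value FROM v$option WHERE parameter = 'Data Mining';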

In addition to the four tables created in the previous post, this example includes a fifth table, CHURN_RISK, to demonstrate a deployment scenario.  The ODMtelcosetup.sql script can be invoked from within a clean data mining user schema to create and populate all five tables.  Alternatively, if the rest of the schema is already set up as described in the earlier post, just run the CHURN_RISK create table statement from within the script.
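
For reference, a deployment target along these lines is all that is needed.  The sketch below is hypothetical (the column names are illustrative assumptions); the authoritative definition is the one in ODMtelcosetup.sql:

    -- Hypothetical sketch only; see ODMtelcosetup.sql for the real DDL.
    -- One row per at-risk customer, with the model's churn probability.
    CREATE TABLE churn_risk (
      customer_id        NUMBER PRIMARY KEY,
      churn_probability  NUMBER
    );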

With the schema prepared, use Oracle Data Miner to import the pre-defined ODMTelcoChurn.xml workflow that captures the telco churn methodology.  The Oracle Data Miner navigator can be accessed from the Tools->Data Miner menu in SQL Developer 3.0.  From within the Data Miner navigator, right-click the mining user and select New Project.  The first connection attempt via the Data Miner navigator will kick off some setup work, including installation of a repository to support Oracle Data Miner (the SYS account password is required for this step).  Once an empty project is created, right-click the project, select Import Workflow, and browse to the ODMTelcoChurn.xml file.  Once imported, the workflow should look like this:


Next, right-click the CHURN_RISK node (far right) and select Run.  This will start a job in the database that will:
  1. Prepare the customer data
  2. Train and test a classification model named CHURN
  3. Score current, high-valued customers with the CHURN model
  4. Identify those current, high-valued customers who are at risk for churn
  5. Store the at-risk customers in the CHURN_RISK table
The transformation node that deserves the most attention is the COLLECT CDR node at the top.  This node captures usage minutes for each customer, broken out to different levels of granularity.  Thanks to the way ODM handles transactional data in a star schema, this operation is very simple to specify in Oracle Data Miner; the same operation requires dozens of nodes and expensive data transformations in other data mining tools.
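
To make this concrete, the same idea can be expressed directly in SQL with Oracle's COLLECT aggregate and the DM_NESTED_NUMERICAL type (ODM's representation for transactional data).  A minimal sketch, assuming an illustrative CDR fact table named CDRS with CUSTOMER_ID, CALL_TYPE, and MINUTES columns (the actual names come from the schema in the earlier post):

    -- Roll up per-customer usage minutes by call type into a single
    -- nested column that ODM can mine directly.
    SELECT customer_id,
           CAST(COLLECT(DM_NESTED_NUMERICAL(call_type, tot_minutes))
                AS DM_NESTED_NUMERICALS) AS usage_minutes
    FROM  (SELECT customer_id, call_type, SUM(minutes) AS tot_minutes
           FROM   cdrs
           GROUP  BY customer_id, call_type)
    GROUP BY customer_id;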

After running the workflow, the CHURN node contains a Generalized Linear Model (GLM) built on the customer data.  The GLM model can be inspected, and the results of a test on held-aside data can be viewed, via right-click menu entries on that node.  A right-click on the CHURN_RISK update table node offers the ability to view its data.  This data, the list of high-valued, at-risk customers, can be integrated into any business process.
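
For those who prefer SQL over menus, the same results can be pulled from the database directly.  A sketch, assuming the underlying model is actually named CHURN (Oracle Data Miner may generate its own model name; check the node's properties):

    -- Inspect the GLM coefficients behind the model (name assumed).
    SELECT attribute_name, attribute_value, coefficient
    FROM   TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_GLM('CHURN'))
    ORDER  BY ABS(coefficient) DESC;

    -- Retrieve the deployed list of high-valued, at-risk customers.
    SELECT * FROM churn_risk;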

More details on the individual workflow nodes are available in the full white paper.