Tuesday, July 5, 2011

Pentaho Community Edition (Data Integration)

Pentaho comes in two editions: Community Edition and Enterprise Edition. Community Edition is free, and it is the one I want to discuss.

Pentaho seems to provide more comprehensive coverage of BI than Eclipse BIRT and Jaspersoft. It has the following components:

  • Data Integration - Kettle

  • Analysis Service (OLAP) - Mondrian

  • Reporting

  • Data Mining - Weka

  • Dashboard

  • Large Volume Data Handling (through Hadoop)


Since there is already a comparison of reporting functionality between Pentaho, Eclipse BIRT and JasperReports, I am not going to get deep into its reporting functionality.

The component that I've tried is data integration. It lets me do some data integration tasks without writing my own custom code. Here I just give a brief introduction to this component.



Data Integration - Kettle
Data Integration is the first thing I tried when I picked up Pentaho. It has a GUI tool (called Spoon) that is built with Eclipse RCP. With the GUI tool, it's very easy to define a data integration process.

There are two main elements in a Pentaho data integration process: the Transformation and the Job. A Transformation, as the name suggests, is a process that manipulates data: exporting, cleansing, reformatting, importing, and so on. A Job may contain one or more Transformations and adds sanity checks (such as whether a file exists) and utilities (e.g., emailing the result).
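To make the distinction concrete, here is a minimal sketch in plain Python. It is not Kettle's API, and every name in it (`transform_rows`, `run_job`, the `name`/`amount` CSV layout) is made up for illustration; in Spoon you would build the same thing graphically. The point is only that the transformation touches the data, while the job wraps it with a file-exists check and a final notification.

```python
import csv
import os


def transform_rows(rows):
    """Transformation: cleanse and reformat rows (hypothetical example logic)."""
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        if not name:  # cleansing: drop rows that have no name
            continue
        cleaned.append({"name": name.title(), "amount": float(row["amount"])})
    return cleaned


def run_job(input_path, notify=print):
    """Job: sanity check first, then the transformation, then a report."""
    if not os.path.exists(input_path):  # sanity check before any data work
        notify("input file missing: " + input_path)
        return None
    with open(input_path, newline="") as f:
        result = transform_rows(csv.DictReader(f))
    notify("loaded %d rows" % len(result))  # stand-in for emailing the result
    return result
```

Calling `run_job("customers.csv")` (a hypothetical file name) would either report the missing file or return the cleaned rows. Keeping the check and the notification out of `transform_rows` mirrors the way a Job wraps a Transformation.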

Both Transformations and Jobs are made up of steps (in a Job they are called job entries). Pentaho already includes many types of steps that perform the most common tasks. These steps serve as bricks that you can use to build up the whole data integration process. In the GUI, you can easily drag and drop steps and connect them with hops to form the process. You can even preview the transformed data at some steps to make sure they are doing the right thing.
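The idea of steps connected by hops can be sketched with Python generators. Again, this is only an analogy under my own assumptions, not how Kettle is implemented: the step names, the `name;city` input format, and the `preview` helper are all hypothetical. Each function plays the role of one step, and passing one generator into the next plays the role of a hop.

```python
from itertools import islice


def read_input(lines):
    """Input step: emit one row (a dict) per 'name;city' line."""
    for line in lines:
        name, city = line.split(";")
        yield {"name": name, "city": city}


def trim_fields(rows):
    """Cleansing step: strip whitespace from every field."""
    for row in rows:
        yield {key: value.strip() for key, value in row.items()}


def filter_city(rows, city):
    """Filter step: pass on only the rows for the given city."""
    for row in rows:
        if row["city"] == city:
            yield row


def run_transformation(lines, city):
    # Each nested call acts as a hop: rows stream from one step into the next.
    return list(filter_city(trim_fields(read_input(lines)), city))


def preview(rows, limit=2):
    """Look at the first few rows leaving a step, in the spirit of Spoon's preview."""
    return list(islice(rows, limit))
```

For example, `preview(trim_fields(read_input(lines)), 1)` shows what the data looks like after the cleansing step, before the filter is applied.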

From the welcome page of the Spoon GUI tool, you can find a "Get Started" document that helps you build your first working example. In addition, the Pentaho community web site provides useful documentation on how to use this data integration tool.
