Paxcel Labs BI

KickStart to ETL using Talend

We had already discussed the basic concepts of ETL in a previous blog. Lets see how ETL can be achieved through the powerful open source tool “Talend”

Talend Open Studio for Data Integration: It is a powerful ETL tool that is free of cost !!.It allows you to  efficiently and effectively manage all facets of data extraction, data transformation, and data loading. This leading open source ETL tool boosts developer productivity with a rich set of features including:

  •   An Eclipse-based graphical integrated development environment that enables easy data profiling and modeling, drag-and-drop job design, and efficient reuse of completed work across projects and modules.
  •  More than 400 built-in connectors that allow you to easily link a wide array of sources and targets.
  • Robust ETL functionality such as string manipulations, management of slowly changing dimensions, and automatic lookup handling
  •  The ability to execute extract, load, and transform (ELT) processes as well as ETL processes, even within the same job.

Talend Open Studio for Data Integration comprises three major applications (Business Modeler, Job Designer, and Metadata Manager) within a single graphical development environment based on Eclipse, which is easily adapted to corporate needs.

It also provides a rich user manual for how to use various components of Talend and can be accessed from the following link

Case Study

Input –A program generated file that gives the response of surveys conducted on people. Number of people for each questionnaire varies and hence the column count is not fixed. This file is a positional  file with a specification of which char gives which column.

–column count can be captured from header record or from length as response of each person is of fixed length

Sample record for a questionnaire

questionid question response_from_users

1What do you prefer iphone or S2                                   1 2 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2

0-1  specify question id

2-50 Question

51-60 Filler(blanks)

60-* response of each person which is of one char(1 or 2)

Output-Table in MySql with fixed columns for question id,question but varying number for response

Steps involved

Create  a positional file in talend with refence to your input file and specify the position which would act like delimiters(except for the last columns position after 60 as that is not fixed).

Similarly create a table in MySql with fixed columns(later we would alter it) and create a database connection for it in Talend.

For this conversion we develop a routine(in Java  since Talend is written in Java it lets you use ur coding skills and develop your own components and functionality)and  pass the columns of  file as parameters

This routine takes the last field (after 60th position) and counts the number of columns (by dividing it by 2)  and then applies an alter command to add these new columns on the fly. These columns are then inserted in a table for each row .

To call this routing, Talend provide a component called tJavaRow which gets called for each row from input.

**To load any library use tLibraryLoad

And its all done..

So talend lets you integrate your java code in a job and doesn’t restrict you to its own functionalities.


Comments are closed.