Monday, 22 June 2015

Configure XML Input Stage with EXTERNAL SOURCE Stage

In this article we will look at two important DataStage stages through a simple scenario:
  1. XML Input Stage
  2. External Source Stage
Here the External Source stage is used to read the path of the stored XML file. The advantage of the External Source stage is that we can use Unix commands to list or read files. The XML Input stage then uses this file path to convert the data stored in the XML file into tabular form. Let's study these two stages in detail by implementing the above scenario step by step.
Step #1: Design your job structure as shown below.
Job Design for XML Input
In the above figure, the External Source stage is the input stage, named extrnl_src_books_det. The target is a Dataset file named ds_trgt_book_det.
The XML Input stage, named XML_Input, is used to convert the XML data into tabular form.
Step #2: Double-click the External Source stage; the following window will pop up.
Configure External Source Stage
To configure the External Source stage, we can use Unix commands to read a file path. Select Specific Program(s), since we are going to get the list of stored XML files by running a command in Source Program, as shown in the above figure.
Filepath is a job parameter used here; its value is G:\Study\Project.
ls #Filepath#*oks.XML lists all files whose names end with oks.XML.
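To see what the command in Source Program is doing, here is a minimal Python sketch (not DataStage code) that imitates the same wildcard expansion. The function name expand_source_program is hypothetical. Note that the pattern is appended directly to the parameter value, so the sketch assumes the value of Filepath ends with a path separator.

```python
import glob


def expand_source_program(filepath, pattern="*oks.XML"):
    """Imitate `ls #Filepath#*oks.XML`: append the pattern to the
    parameter value and return the matching paths in sorted order."""
    # glob.escape keeps special characters in the directory part literal,
    # so only the wildcard in `pattern` is expanded.
    return sorted(glob.glob(glob.escape(filepath) + pattern))


if __name__ == "__main__":
    # "./" stands in for the job parameter value; replace it with your own folder.
    for path in expand_source_program("./"):
        print(path)
```

Each printed line corresponds to one file path that the stage would receive from the command.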
Step #3: To read this file path, we need to define the metadata as shown below.
Define Column
After defining the metadata, click View Data in the upper right corner of the above window; it will show the following output.
C:\Study\Project\Books.XML
This is the file we will use in the next steps. In this way we used the External Source stage to read the file path of an XML file. Our next task is to convert the data in this XML file into tabular form.
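Conceptually, the stage turns each line printed by the command into one record with a single column. The Python sketch below (again only an illustration, with a hypothetical function name) shows that mapping, assuming one path per line and the column name File_name used for the metadata in this job.

```python
def rows_from_output(stdout_text, column="File_name"):
    """Turn the text printed by the source program into one record per line."""
    rows = []
    for line in stdout_text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            rows.append({column: line})
    return rows


if __name__ == "__main__":
    # The path is the one shown by View Data in this article.
    print(rows_from_output("C:\\Study\\Project\\Books.XML\n"))
```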
Step #4: To load the above XML file, we first need to create a table definition for it. Follow the procedure below to import the XML table definition.
  1. Click Import -> Table Definition -> XML Table Definitions on the menu bar.
  2. The XML Meta Data Importer window will pop up.
  3. Open the XML file through File -> Open and select the required XML file.
  4. The column names will appear.
  5. Expand each column and tick the check box against Text.
  6. In the Table Definition pane, the Description shows the path (address) of each column inside the XML file.
  7. Save this as BooksTD.
  8. Refer to the image below for a better understanding.
XML Table Definition
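To make the role of the Description concrete, here is a minimal Python sketch (not DataStage code) in which each column carries a path to its value and those paths are used to build rows. The XML layout, the column names and the path strings are all hypothetical, since the actual content of Books.XML is not shown in this article. The sketch also assumes that the second step of each path is the repeating element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real Books.XML may look different.
SAMPLE_XML = """<books>
  <book><title>Book A</title><author>Author A</author></book>
  <book><title>Book B</title><author>Author B</author></book>
</books>"""

# Column name -> Description (a path ending in text()), hypothetical values.
TABLE_DEFINITION = {
    "title": "/books/book/title/text()",
    "author": "/books/book/author/text()",
}


def split_xpath(xpath):
    """Return the element steps of an absolute path, dropping a trailing text()."""
    steps = [s for s in xpath.split("/") if s]
    if steps and steps[-1] == "text()":
        steps = steps[:-1]
    return steps


def extract_rows(xml_text, table_definition, repeat_depth=2):
    """Build one row per repeating element, one value per column."""
    root = ET.fromstring(xml_text)
    paths = {col: split_xpath(xp) for col, xp in table_definition.items()}
    first = next(iter(paths.values()))
    # Path from the root element down to the repeating element, e.g. "book".
    repeat_path = "/".join(first[1:repeat_depth])
    rows = []
    for record in root.findall(repeat_path):
        row = {}
        for col, steps in paths.items():
            # Remaining steps are relative to the repeating element.
            row[col] = record.findtext("/".join(steps[repeat_depth:]))
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE_XML, TABLE_DEFINITION):
        print(row)
```

Each printed dictionary plays the part of one output row, with one key per column of the table definition.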
Step #5: Now we have to configure the XML Input stage. Double-click it; the following window will pop up.
Configure XML Input Stage
Set XML Source Column to File_name, the column we declared in the metadata for the External Source stage in Step #3 (shown by the green box).
Since the External Source stage outputs the XML file path rather than the XML content itself, select URL/File path, as shown by the red box in the above image.
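The difference between a column that holds a file path and a column that holds the XML text itself can be sketched in Python as follows. This is only an illustration of the idea; the function name load_root and its flag are hypothetical and are not part of DataStage.

```python
import xml.etree.ElementTree as ET


def load_root(value, content_is_path=True):
    """Return the root element for a source column value.

    If the column holds a file path, the file is opened and parsed;
    otherwise the value itself is treated as the XML text.
    """
    if content_is_path:
        return ET.parse(value).getroot()
    return ET.fromstring(value)


if __name__ == "__main__":
    # Here the value is the XML text itself, so the flag is switched off.
    print(load_root("<books><book/></books>", content_is_path=False).tag)
```

In this job the column holds a path, so the first branch is the one that matches the URL/File path choice.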
Step #6: In this step we load the XML table definition created in Step #4.
On the Output tab we need to define the columns. To do so, click the Load button at the bottom and load the XML table definition we saved earlier, i.e. BooksTD. After loading it, you can see the metadata in the following two images.
Load XML Table Definition

Load XML Table Definition 1
Step #7: Compile and run the job. View the output of the target Dataset file and you will see the XML file data in tabular form.
Hope this tutorial on configuring the XML Input stage in DataStage is useful to you.

5 comments:

  1. thanks you so much

  2. thanks you so much

  3. Thank you so much. This saved my day :)

  4. is this designed in server or parallel job

  5. hi, I am trying to open the file attached in step#3, define column one, but I'm unable to open it
