Introduction
This tutorial gives novice developers a quick start with loading XML data using a DataStage parallel job.
Steps
Step 1:
Create a simple XML file named test.xml:
<xml> <customer>Mike</customer> <customer>Anna</customer> </xml>
Step 2:
Create a new DataStage parallel job with 3 stages linked together: a sequential file stage, an XML input stage (located under the Real Time category), and a peek stage.
Step 3:
The first trick is to load the entire XML file into a single column of a single row. You do this by creating a column in the sequential file stage of type LongVarChar[Max=9999]. The max size is arbitrary in this example; it only needs to be large enough to hold the whole file. Set the input file to test.xml. Next, remove all properties in the [Format] tab and add these two:
In the Record level:
Record type=implicit
In the Field defaults:
Delimiter=none
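To make the goal of these two settings concrete, here is a minimal Python sketch of the same idea. It is an analogy only, not DataStage code: with no record or field delimiter, the whole file arrives as one value in one row. The function name read_as_single_row and the column name xml_doc are hypothetical, chosen for illustration.

```python
from pathlib import Path


def read_as_single_row(path, column="xml_doc", max_len=9999):
    """Return the whole file as one row holding one column.

    Python analogy only: `column` and `max_len` are hypothetical names
    that mirror the LongVarChar[Max=9999] column defined in step 3.
    """
    text = Path(path).read_text(encoding="utf-8")
    if len(text) > max_len:
        # This sketch rejects oversized documents so the problem is visible.
        raise ValueError(f"document is {len(text)} chars, max is {max_len}")
    # One row, one column: the entire document is a single value.
    return [{column: text}]
```

Calling read_as_single_row("test.xml") on the file from step 1 would return a list with exactly one row, whose single column contains the full XML text.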
Step 4:
Now that the XML is in a single column, set the XML input stage properties. In the [Transformation settings] tab under the [Stage] tab, check the [Repetition element required] box. In the [Input] tab, select the column that you defined in step 3 and check the [XML document] box. In the [Output] tab, define a column named [customer] of type varchar[max=255] and set it as the key. In the description box, enter the XML path, in this case /xml/customer/text()
Tip: To reference XML attributes, use @. For example, /xml/customer/@id would equal 1 when using this XML: <xml><customer id="1">Mike</customer></xml>
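To check what these two paths select before building the job, you can try them outside DataStage. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document, including id="2" for Anna, and the function name extract_customers are assumptions made for illustration. ElementTree supports only a subset of XPath, so .text and .get("id") stand in for text() and @id.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the tutorial's customers with id attributes added.
XML_DOC = (
    '<xml><customer id="1">Mike</customer>'
    '<customer id="2">Anna</customer></xml>'
)


def extract_customers(xml_text):
    """Return (id, name) pairs, one per /xml/customer element."""
    root = ET.fromstring(xml_text)  # root is the <xml> element
    # "customer" relative to the root corresponds to /xml/customer.
    return [
        (c.get("id"), c.text)  # stand-ins for @id and text()
        for c in root.findall("customer")
    ]


print(extract_customers(XML_DOC))  # prints [('1', 'Mike'), ('2', 'Anna')]
```

Each repeated customer element yields one output row, which is the same row-per-element behavior the job should show in the peek log. For elements without an id attribute, as in test.xml from step 1, c.get("id") returns None.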
Step 5:
Compile and run. Peek will produce log records that list the customers from the XML file.