Saturday, 24 October 2015

SaveInputRecord in Datastage Example

You can use the Transformer stage to add aggregated information to output rows.

Aggregation operations make use of a cache that stores input rows. You can monitor the number of entries in the cache by setting a threshold level in the Loop Variable tab of the Stage Properties window. If the threshold is reached when the job runs, a warning is issued into the log, and the job continues to run.

Input row group aggregation included with input row data

You can save input rows to a cache area, so that you can process this data in a loop.

For example, you have input data that has a column holding a price value. You want to add a column to the output rows. The new column indicates what percentage the price value is of the total value for prices in all rows in that group. The value for the new Percentage column is calculated by the following expression.

(price * 100)/sum of all prices in group

In the example, the data is sorted and is grouped on the value in Col1.

Col1	Col2	Price
1000	abc	100.00
1000	def	20.00
1000	ghi	60.00
1000	jkl	20.00
2000	zyx	120.00
2000	wvu	110.00
2000	tsr	170.00

The percentage for each row in the group where Col1 = 1000 is calculated by the following expression.

(price * 100)/200

The percentage for each row in the group where Col1 = 2000 is calculated by the following expression.

(price * 100)/400

The output is shown in the following table.

Col1	Col2	Price	Percentage
1000	abc	100.00	50.00
1000	def	20.00	10.00
1000	ghi	60.00	30.00
1000	jkl	20.00	10.00
2000	zyx	120.00	30.00
2000	wvu	110.00	27.50
2000	tsr	170.00	42.50

This scenario uses key break facilities that are available on the Transformer stage. You can use these facilities to detect when the value of an input column changes, and so group rows as you process them.

This scenario is implemented by storing the grouped rows in an input row cache and processing them when the value in a key column changes. In the example, the grouped rows are processed when the value in the column named Col1 changes from 1000 to 2000. Two functions, SaveInputRecord() and GetSavedInputRecord(), are used to add input rows to the cache and retrieve them. SaveInputRecord() is called when a stage variable is evaluated, and returns the count of rows in the cache (starting at 1 when the first row is added). GetSavedInputRecord() is called when a loop variable is evaluated.

To implement this scenario in the Transformer stage, make the following settings:

Stage variable

Define the following stage variables:

NumSavedRows: SaveInputRecord()
IsBreak: LastRowInGroup(inlink.Col1)
TotalPrice: IF IsBreak THEN SummingPrice + inlink.Price ELSE 0
SummingPrice: IF IsBreak THEN 0 ELSE SummingPrice + inlink.Price
NumRows: IF IsBreak THEN NumSavedRows ELSE 0

Loop condition

Enter the following expression as the loop condition:

@ITERATION <= NumRows

The loop continues to iterate for the count specified in the NumRows variable.

Loop variables

Define the following loop variable:

SavedRowIndex: GetSavedInputRecord()

Output link metadata and derivations

Define the output link columns and their derivations:

Col1 - inlink.Col1
Col2 - inlink.Col2
Price - inlink.Price
Percentage - (inlink.Price * 100)/TotalPrice

SaveInputRecord() is called in the first Stage Variable (NumSavedRows). SaveInputRecord() saves the current input row in the cache, and returns the count of records currently in the cache. Each input row in a group is saved until the break value is reached. At the last value in the group, NumRows is set to the number of rows stored in the input cache. The Loop Condition then loops round the number of times specified by NumRows, calling GetSavedInputRecord() each time to make the next saved input row current before re-processing each input row to create each output row. The usage of the inlink columns in the output link refers to their values in the currently retrieved input row, so will change on each output loop.

Caching selected input rows

You can call the SaveInputRecord() within an expression, so that input rows are only saved in the cache when the expression evaluates as true.

For example, you can implement the scenario described, but save only input rows where the price column is not 0. The settings are as follows:

Stage variable

Define the following stage variables:

IgnoreRow: IF (inlink.Price = 0) THEN 1 ELSE 0
NumSavedRows: IF IgnoreRecord THEN SavedRowSum ELSE SaveInputRecord()
IsBreak: LastRowInGroup(inlink.Col1)
SavedRowSum: IF IsBreak THEN 0 ELSE NumSavedRows
TotalPrice: IF IsBreak THEN SummingPrice + inlink.Price ELSE 0
SummingPrice: IF IsBreak THEN 0 ELSE SummingPrice + inlink.Price
NumRows: IF IsBreak THEN NumSavedRows ELSE 0

Loop condition

Enter the following expression as the loop condition:

@ITERATION <= NumRows

Loop variables

Define the following loop variable:

SavedRowIndex: GetSavedInputRecord()

Output link metadata and derivations

Define the output link columns and their derivations:

Col1 - inlink.Col1
Col2 - inlink.Col2
Price - inlink.Price
Percentage - (inlink.Price * 100)/TotalPrice

This example produces output similar to the previous example, but the aggregation does not include Price values of 0, and no output rows with a Price value of 0 are produced.

Outputting additional generated rows

This example is based on the first example, but, in this case, you want to identify any input row where the Price is greater than or equal to 100. If an input row has a Price greater than or equal to 100, then a 25% discount is applied to the Price and a new additional output row is generated. The Col1 value in the new row has 1 added to it to indicate an extra discount entry. The original input row is still output as normal. Therefore any input row with a Price of greater than or equal to 100 will produce two output rows, one with the discounted price and one without.

The input data is as shown in the following table:

Col1	Col2	Price
1000	abc	100.00
1000	def	20.00
1000	ghi	60.00
1000	jkl	20.00
2000	zyx	120.00
2000	wvu	110.00
2000	tsr	170.00

The required table is shown in the following table:

Col1	Col2	Price	Percentage
1000	abc	100.00	50.00
1001	abc	75.00	50.00
1000	def	20.00	10.00
1000	ghi	60.00	30.00
1000	jkl	20.00	10.00
2000	zyx	120.00	30.00
2001	zyx	90.00	30.00
2000	wvu	110.00	27.50
2001	wvu	82.50	27.50
2000	tsr	170.00	42.50
2001	tsr	127.50	42.50

To implement this scenario in the Transformer stage, make the following settings:

Stage variable

Define the following stage variables:

NumSavedRowInt: SaveInputRecord()
AddRow: IF (inlink.Price >= 100) THEN 1 ELSE 0
NumSavedRows: IF AddRow THEN SaveInputRecord() ELSE NumSavedRowInt
IsBreak: LastRowInGroup(inlink.Col1)
TotalPrice: IF IsBreak THEN SummingPrice + inlink.Price ELSE 0
SummingPrice: IF IsBreak THEN 0 ELSE SummingPrice + inlink.Price
NumRows: IF IsBreak THEN NumSavedRows ELSE 0

Loop condition

Enter the following expression as the loop condition:

@ITERATION <= NumRows

The loop continues to iterate for the count specified in the NumRows variable.

Loop variables

Define the following loop variables:

SavedRowIndex: GetSavedInputRecord()
AddedRow: LastAddedRow
LastAddedRow: IF (inlink.Price < 100) THEN 0 ELSE IF (AddedRow = 0) THEN 1 ELSE 0

Output link metadata and derivations

Define the output link columns and their derivations:

Col1 - IF (inlink.Price < 100) THEN inlink.Col1 ELSE IF (AddedRow = 0) THEN inlink.Col1 ELSE inlink.Col1 + 1
Col2 - inlink.Col2
Price - IF (inlink.Price < 100) THEN inlink.Price ELSE IF (AddedRow = 0) THEN inlink.Price ELSE inlink.Price * 0.75
Percentage - (inlink.Price * 100)/TotalPrice

SaveInputRecord is called either once or twice depending on the value of Price. When SaveInputRecord is called twice, in addition to the normal aggregation, it produces the extra output record with the recalculated Price value. The Loop variable AddedRow is used to evaluate the output column values differently for each of the duplicate input rows.

Runtime errors

The number of calls to SaveInputRecord() and GetSavedInputRecord() must match for each loop. You can call SaveInputRecord() multiple times to add to the cache, but once you call GetSavedInputRecord(), then you must call it enough times to empty the input cache before you can call SaveInputRecord() again. The examples described can generate runtime errors in the following circumstances by not observing this rule:

If your Transformer stage calls GetSavedInputRecord before SaveInputRecord, then a fatal error similar to the following example is reported in the job log:
```
APT_CombinedOperatorController,0: Fatal Error: get_record() called on 
record 1 but only 0 records saved by save_record()
```
If your Transformer stage calls GetSavedInputRecord more times than SaveInputRecord is called, then a fatal error similar to the following example is reported in the job log:
```
APT_CombinedOperatorController,0: Fatal Error: get_record() called on 
record 3 but only 2 records saved by save_record()
```
If your Transformer stage calls SaveInputRecord but does not call GetSavedInputRecord, then a fatal error similar to the following example is reported in the job log:
```
APT_CombinedOperatorController,0: Fatal Error: save_record() called on 
record 3, but only 0 records retrieved by get_record()
```
If your Transformer stage does not call GetSavedInputRecord as many times as SaveInputRecord, then a fatal error similar to the following example is reported in the job log:
```
APT_CombinedOperatorController,0: Fatal Error: save_record() called on 
record 3, but only 2 records retrieved by get_record()
```

Saturday, 17 October 2015

SAP Data Loading using BAPI Stage

Defining Keywords of SAP R/3

ABAP (Advanced Business Application Programming)

The language developed by SAP for application development purposes. All R/3 applications are written in ABAP.

IDOC (Intermediate Document)

IDoc is a standard data structure for electronic data interchange (EDI) between application programs written for the popular SAP business system or between an SAP application and an external program. It provides a standard format for exchanging data with R/3 Enterprise. This interface should be used primarily for bulk data transfers when the desired data set is already represented by an available IDoc. It is used to load data into R/3 system.

BAPI (Business Application Programming Interface)

A precisely defined interface providing access to processes and data in business application systems. BAPIs are defined as API methods of SAP objects. These objects and their methods are stored in the Business Objects Repository. Most suited to transactional environments where the efficiency of mass data transfers is not a requirement.

WebSphere DataStage Pack for SAP R/3

The WebSphere® DataStage™ Pack for SAP R/3 helps you extract data from and load data into SAP R/3 and all mySAP Business Suite application modules. The IBM WebSphere DataStage Pack for SAP R/3 includes the following stages and utility:

ABAP Extract

Lets WebSphere DataStage extract data from the R/3 repository using the ABAP extraction program generated by the stage.

IDoc Extract

Let’s WebSphere DataStage capture IDocs from R/3 source systems to be used as source data for WebSphere DataStage job.

IDoc Load

Generates IDocs to load data into R/3 Enterprise.

BAPI

Loads data into and extracts data from R/3 Enterprise.

Administrator for SAP

Manages the configurations of R/3 connection and IDoc type objects.

Data Loading through BAPI Stage in Datatage

1. BAPI stage Architecture

2. BAPI Structure

BAPI is a business API that exposes business objects to external applications. The business objects can be executed synchronously or asynchronously. Invoked via RPC. There are several hundred delivered BAPIs, companies can create their own BAPIs. Structure of a business object and BAPI is in the below picture.

BAPI’s in R/3 implemented as function modules.

BAPI supports Remote Function Call (RFC) protocol.

BAPI are processed without returning screen dialog to calling application.

BAPI Interface defined by:

Import Parameters
Export Parameters
Import/Export (table) parameters

3. BAPI Load Design Steps

Ø The GUI client logs on to the SAP R/3 application server and retrieves metadata information from BOR.

Ø The GUI client stores BAPI metadata in the DataStage Repository.

Ø The run-time server reads the metadata from the DataStage Repository and dynamically builds a BAPI call.

Ø The run-time server makes a BAPI call and processes the generated dataset.

Ø The returned values are written to the appropriate logs.

4. BAPI Stage in WebSphere

5. Sample BAPI Load job

6. BAPI Stage Properties

This stage dialog has three pages – STAGE, INPUT and OUTPUT

Stage Page

· This page displays the name of the stage you are editing and contains the General, BAPI and BAPI Parameters pages. The selected SAP connection parameters are displayed on the General tab on the Stage page.

· Use the BAPI tab of the Stage page to select the BAPI name to load (see Selecting BAPI).

· Using BAPI Paremeters tab pass the necessary parameter as per the requirement.

· The NLS page defines a character set map to use with the stage. This page opens only if you have installed NLS for InfoSphere DataStage (see Defining character set mapping).

6.1. BAPI Stage: Stage Page-General Tab

DataStage Connection to SAP - The InfoSphere DataStage connection to the SAP R/3 system that is defined on the server machine and shared by all users connected to that machine.

-> Name - The name of the selected connection to the SAP R/3 system that stores the BAPI. The logical connection includes RFC client logon details.

-> Select - Click to choose an InfoSphere DataStage connection to an existing SAP R/3 system. The Select InfoSphere DataStage Connection to SAP dialog opens. The selected connection provides all needed connection and default logon details that are needed to communicate with the corresponding SAP R/3 system. The default SAP Logon Details are used if Use connection defaults is selected.

-> New – Using this option you can create a new connection.

-> Use job parameter – Use the DS parameters in SAP Logon Details (User, Password, Client and Language) defined in your job/project. After that, again select on the connection option to parameterize the SAP system using the option ‘Use job parameter’ option and write the parameter name as defined in the job/project and then click OK.

-> Clear job parameter – This option is required to clear the existing job parameter and entry a new one.

-> Clear connection – This is for clear the existing connection.

Description - Additional information about the selected connection.

-> Application Server - The name of the host system running R/3. If the selected connection uses load balancing, Message Server is used instead.

-> System Number - The number assigned to the SAP R/3 system used to connect to R/3. If the selected connection uses load balancing, System ID is used instead.

SAP Logon Details - The fields in this area are read-only unless you clear the Use connection defaults box.

– User Name - The user name that is used to connect to SAP.

– Password - A password for the specified user name.

– Client Number - The number of the client system used to connect to SAP.

– Language - The language used to connect to SAP.

– Use connection defaults - Clear to remove the default SAP Logon Details settings so you can use different logon information for only this stage in this job. If selected, the displayed logon details are obtained from the selected connection and are disabled.

Enter an optional description to describe the purpose of the stage in the Description field.
Validate All - Select to check the stage and link properties and the column lists for consistency and completeness. Always check this once any of the connection details is updated.

To start with BAPI load stage, open the stage and then click on General tab.

After that click on button to select the available connections to SAP system configured before.

Below pop up will appear with list of configured SAP systems. Select the required system connection.

6.2. BAPI Stage: Stage Page-BAPI Tab

Using the BAPI tab we can select the BAPI to be loaded.

To configure BAPI name:

Click on BAPI Exploer tab of the Stage page to open the Select Connection Details dialog.

Select the proper connection details to Find the BAPI name. After Click OK a pop up window will come up.

A list of all released BAPI defined on the R/3 system will appear to select from ‘Show Released BAPIs’. Or we can go with the specific BAPI name using the option ‘Show RFMs’. Select the ‘Show RFMs’ and type the BAPI name and click OK.

Click OK to set proper BAPI name for the stage.

6.3. BAPI Stage: Stage Page-BAPI Parameters-Import Tab

When a BAPI is first selected, only required parameters are active on the BAPI Import tab of the Input page. For input links, fields that are required for import parameter appear on the Columns page.

Activate/Deactivate the parameter

Green icons beside the parameter names indicate that the parameters are active, That is, used to dynamically build BAPI calls at run time. (Red icons indicate That parameters are inactive, unused when calling BAPIs.) Green icons display I or E to indicate whether table parameters are activated for Import or Export. To activate or deactivate parameters, do one of the following:

. Double-click a parameter name

. Right-click for a shortcut menu

6.4. BAPI Stage: Stage Page-BAPI Parameters-Table Tab

Click the Tables tab on the Input BAPI page to view the table parameters of a BAPI. Fields that are required for table parameters appear on the Columns page.

Activate/Deactivate the parameter for Input or Output

Green icons beside the parameter names indicate that the parameters are active, that is, used to load that table using build BAPI calls at run time. (Red icons indicate that parameters are inactive, unused when calling BAPIs.)

To activate any of the parameter Right-click for a shortcut menu and select it.

· Activate Selected Parameter for Input – select this option to load any particular table in SAP system

· Activate Selected Parameter for Output - select this option to capture the data into file/dataset/table

· Activate Selected Parameter for Input and Output- it will load SAP system and the same time you can capture the same record into file/dataset/table

· De-Activate Selected Parameter – to de activate the parameter

6.5. BAPI Stage: Return Table

In the table list there is Return table which return the parameters. This table is very important for de-bugging purpose. It writes the status of the loaded records whether it is loaded success or fail.

By the filed Message type (T_RETURN_BAPIRET2_TYPE): S Success, E Error, W Warning, I Info, A Abort

The structure of the Return table shown as below

6.6. BAPI Stage: Stage Page-BAPI Parameters-Export Tab

Click the Export tab on the Input BAPI page to view the export parameters of a BAPI. By default, export parameters are optional and initially inactive. Fields that are required for export parameters appear in a grid on the Input Logs Page, not on the Input Columns page, because they contain return values from the BAPI call. The reference structure being used for each parameter appears with a short text Description of the parameter. This helps you decide whether to activate a particular
parameter.

7. Data Validation in SAP environment

Once the Datastage load job execute successfully, you need to do the data validation in SAP environment. To do data validation use the following steps.

Enter SAP table name into the table name section and press enter.

Provide the primary key value, create date and time. Click on execute button (F8)

Below Screen shows all processed records on the given date and time.

Provide the primary key value, create date and time. Click on Number of Entries button. It will provide the count of the processed records for the given date and time.

8. BAPI Load Error Analysis

By selecting the Return table in the Output, you can capture the BAPI Return table log report into file/dataset. It writes the status of the loaded records whether it is loaded success or fail. By the filed Message type (T_RETURN_BAPIRET2_TYPE): S Success, E Error, W Warning, I Info, A Abort.

The detailed error report is mentioned in the field T_RETURN_BAPIRET2_MESSAGE

9. BAPI stage and it’s disadvantage

It is experienced that for huge volume of data the BAPI stage performs very slow. For a volume like more than 10K, it slow down the entire SAP load process. Though we don’t know the actual reason for this delay of the performance. But it is also experienced and tested if the total input data can divided into some logical chunks and call the BAPI for one chunk at a time, we can get some better performance compare to process all the data in a single shot.

For each and every BAPISEQNO generate a unique number using the below derivation.

StageVariabe = @PARTITIONNUM + (@NUMPARTITIONS * @INROWNUM - 1) +1

For example, see the below table

BapiSeqNo	StageVariable
7000	11
7001	12
7002	13
7003	14
7004	15
7005	16

The logical group can be made using the below derivation.

Batch = Mod(StageVariable,p_Batch) + 1

Where p_Batch is the no. of logical group, user want to create based on the input volume of the data.

Batch shows the formulated group number associated with each and individual BAPISEQNO.

For example, the user want to divide the above dataset into 3 logical group. That means p_Batch = 3

BapiSeqNo	StageVariable	Batch
7000	11	3
7001	12	1
7002	13	2
7003	14	3
7004	15	1
7005	16	2

Finally we can design a sequence job where using loops we can easily pass the Batch list so it can execute the BAPI load job for one group at a time.