The Lookup stage is most appropriate when the reference data for all lookup stages in a job is small enough to fit into available physical memory. Each lookup reference requires a contiguous block of shared memory. If the Data Sets are larger than available memory resources, the JOIN or MERGE stage should be used.
Lookup stages do not require data on the input link or reference links to be sorted. Be aware, though, that large in-memory lookup tables will degrade performance because of their paging requirements. Each record of the output data set contains columns from a source record plus columns from all the corresponding lookup records where corresponding source and lookup records have the same value for the lookup key columns. The lookup key columns do not have to have the same names in the primary and the reference links.
The optional reject link carries source records that do not have a corresponding entry in the input lookup tables.
You can also perform a range lookup, which compares the value of a source column to a range of values between two lookup table columns. If the source column value falls within the required range, a row is passed to the output link. Alternatively, you can compare the value of a lookup column to a range of values between two source columns. Range lookups must be based on column values, not constant values. Multiple ranges are supported.
There are some special partitioning considerations for Lookup stages. You need to ensure that the data being looked up in the lookup table is in the same partition as the input data referencing it. One way of doing this is to partition the lookup tables using the Entire method.
Lookup stage Configuration:Equal lookup


You can specify what action need to perform if lookup fails.
Scenario1: Continue

Choose entire partition on the reference link



Scenario2:Fail

stg_Lkp,0: Failed a key lookup for record 2 Key Values: CUSTOMER_ID: 3
Scenari03:Drop






Business scenario:we have input data with customer id and customer name and transaction date.We have customer dimension table with customer address information.Customer can have multiple records with different start and active dates and we want to select the record where incoming transcation date falls between start and end date of the customer from dim table.
Ex Input Data:
CUSTOMER_ID | CUSTOMER_NAME | TRANSACTION_DT |
1 | UMA | 2011-03-01 |
1 | UMA | 2010-05-01 |
CUSTOMER_ID | CITY | ZIP_CODE | START_DT | END_DT |
1 | BUENA PARK | 90620 | 2010-01-01 | 2010-12-31 |
1 | CYPRESS | 90630 | 2011-01-01 | 2011-04-30 |
CUSTOMER_ID | CUSTOMER_NAME | TRANSACTION_DT | CITY | ZIP_CODE |
1 | UMA | 2011-03-01 | CYPRESS | 90630 |
1 | UMA | 2010-05-01 | BUENA PARK | 90620 |



You need to specify return multiple rows from the reference link otherwise you will get following warning in the job log.Even though we have two distinct rows base on customer_id,start_dt and end_dt columns but datastage is considering duplicate rows based on customer_id key only.
stg_Lkp,0: Ignoring duplicate entry; no further warnings will be issued for this table

Compile and Run the job:


Scenario 2:Specify range on reference link:


This concludes lookup stage configuration for different scenarios.
- Like Join and Merge stage, Look up stage has multiple input links, one is primary and others are reference, according to which lookup operation takes place.
- But it doesn’t have condition like Merge stage i.e. ‘Reject Links’ should be equal to update input links also it not require data on any of the input links to be sorted. This stage provides four conditions depend on which future of output data depends. We will see these conditions in ‘Step 4’.
Table 1
ID
|
First Name
|
Last Name
|
Location
|
Network ID
|
EmailID
|
1
|
Jach
|
Simmons
|
Chicago
|
JS524
|
Letsc@gmail.com
|
2
|
Shumas
|
Jane
|
LA
|
Sj145
|
Jaene@ymail.com
|
3
|
Jonty
|
Waughn
|
Sydney
|
JW927
|
JontyW@sdbh.com
|
4
|
Suhana
|
Safar
|
Maxico
|
SS99
|
Sas@gmail.com
|
ID
|
Dept
|
Dept Head
|
1
|
Electronics
|
Paul
|
3
|
CS
|
Jack
|
4
|
TS
|
Summur
|
5 |
IT
|
Sean
|
ID
|
Training Cent
|
1
|
CKG
|
2
|
AMD
|
3
|
WC
|

Consider Employee table as Primary link as shown. Depending upon each record in Primary link, Look Up Stage performs look up operation on Reference Link according with key column.
Consider Employee’s department detail i.e. table 2 and Employee’s Training Center detail i.e. Table 3 as data on two reference links.
Step 2 : Now we are going to configure Look Up Stage (Named as lkp_emp_det in design). Double click on Look Up Stage. Following window will pop up. Left pane is for all inputs and right pane is for output. First link detail table is for Primary link. Second and third are for reference links. Order of these reference links can be changed by using


Step 3 : In left pane, map Key column (here ‘ID’) by just dragging it to the respective key column in reference links. Map all remaining required column to right pane as shown.
Step 4 : One of the most important step is to set look up conditions which we can do by using 2nd option on Title bar. Just click on it, following window will pop up.

There is a list of reference links in ‘Link Name’ column. In ‘Condition’ column we can give conditions for each reference link. If this condition will not meet then what will happen to that data is decided by ‘Condition Not Met’ column and if lookup fails it is decided by ‘Lookup Failure’ column.
Continue : Data will be sent to the Output link.
Reject : Data will be sent to the Reject link.
Drop : Data will neither go to Output link nor to Reject link.
Fail : Job will fail.
In our case, let’s first try for without condition in ‘Condition’ column and ‘Continue’ and ’Reject’ in other columns.
Step 5 : Compile and run the JOB. Let’s see what the output is.
Output : Stream link

It’s showing two records. As we have given ‘Lookup Failure’ condition as ‘Reject’, those records from primary link which are not matched with reference link data are collected in Reject Link ‘rjct_primary_rec’ as shown below.
Output : Reject Link

Step 6 : Let’s try to configure for ‘Condition’ column in ‘Lookup Stage Conditions’ pane.
Just put condition as ID=3 and ‘Reject’ under ‘Condition Not Met’ as shown below.

Except ID=3 all records will get rejected and get stored in ‘Reject link’. Here data for ID=2 get rejected and we will get output for Stream link as shown.

Output for ‘Reject Link’

Note : Reject Link shows rejected record from primary input link only.
Nice Post .. i was able to solve the problem with your suggestion --"You need to specify return multiple rows from the reference link " ..
ReplyDeleteThanks a lot.
thank you for this article. what if i have a situation where i want to drop if condition is met (conversely, continue if condition is not met)?
ReplyDelete