Thursday, 12 June 2014

QualityStage Address Verification Interface

Scenario: Improving international address data

You can use IBM® InfoSphere® QualityStage® Address Verification Interface to improve the quality of your address data without becoming an expert in international postal standards.
The following case study shows how a sample company uses the Address Verification stage to parse and validate worldwide address data.
Gleaming Green Cleansers (GGC) makes environmentally safe cleaning products. The company began as a small family business in Denmark in 1995 with a few products for cleaning the kitchen. Over the next decade, the cleansers became popular all over Europe. In the last few years, GGC acquired companies in France and Germany that manufactured complementary environmentally safe cleaning products for the bathroom and laundry. The expanded product line is so successful that GGC plans to release the products in Japan.

Business need: To manage international address data

As Gleaming Green Cleansers expanded beyond Denmark, the need to maintain accurate international address data became critical for the supply chain. The manufacturing plants, distributors, and retail outlets are spread among several countries and regions. Each area has unique postal standards and a different language. When GGC releases its products in Japan, the company has the additional challenge of handling a different character set.
The Address Verification stage meets the current and future business needs of Gleaming Green Cleansers in the following ways:
  • Validates address data for each European country or region in which GGC does business:
    • Formats each address according to the postal standards of the country or region
    • Assesses the deliverability of each address so that products reach their destinations
  • Parses inconsistent address data from each acquired company as a step towards standardizing the format
  • Transliterates Japanese address data to convert the information into a common representation by using the Latin character set
IBM InfoSphere QualityStage Address Verification Interface provides a comprehensive solution for companies to maintain high-quality address data from all over the world.


Parsing, validating, and transliterating address data

You can configure the Address Verification stage to parse or validate address data. The stage also transliterates the data. You configure the stage in the IBM® InfoSphere® DataStage® and QualityStage® Designer client.
You can parse the addresses that are in your input stage. You assign address components, such as street or postal code, to a column.
You can validate the addresses against postal reference files to check the correctness of the data. The validation process assesses address deliverability and provides a status such as very likely, fair chance, or unlikely to be deliverable.
You can also generate geographic location information and a summary report as part of address validation. A Validation Summary report shows the following items:
  • Total number of records processed
  • Number and percentage of records that the stage passed, failed, validated, corrected, or suggested another address for
  • Number and percentage of records that the stage failed because of postal code, city, street, country, or region
If you choose the validation processing type, ensure that you have access to current postal validation reference files.
Transliteration is performed on the address data after processing. Transliteration converts addresses from one representation (script) to another. You can transliterate addresses in non-Latin scripts, such as Greek or Hebrew, to the Latin character set. You can also transliterate from Latin to a non-Latin, native character set. Use transliterated addresses to store data consistently in one common writing system.
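To make the idea of converting between scripts concrete, the following Python sketch transliterates Greek text to Latin characters. It is an illustration only and does not reproduce the rules that the Address Verification stage applies. The mapping table is a simplified, letter-by-letter assumption that ignores digraph rules (for example, it does not treat "ου" specially), and the function name is hypothetical.

```python
import unicodedata

# Simplified letter-by-letter Greek-to-Latin table (an assumption for this
# sketch; real romanization standards also define rules for letter pairs).
GREEK_TO_LATIN = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "o",
}


def transliterate_greek(text):
    """Return text with Greek letters replaced by Latin equivalents."""
    # Decompose accented letters so that accents become separate marks.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop accent marks (this affects all scripts)
        latin = GREEK_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)  # digits, spaces, Latin letters pass through
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


city = transliterate_greek("Αθήνα")
```

In this sketch, `city` holds the Latin form `Athina`, which can be stored alongside addresses from other countries in one writing system.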
When you configure the Address Verification stage, you can use the Fast Path navigation in the stage editor as a shortcut. The Fast Path covers the four required tabs: Stage > Processing (where you select the parse or validation processing type), Stage > Options, Input > Address Columns, and Output > Mapping. You can also select and modify any of the other available tabs.

Configuring the Address Verification stage to process address data

You specify the type of process that you want and define the detail level of the output information and the format of the output address data.

Before you begin

Create a job that includes a stage that contains input address data, the Address Verification stage, and a stage to receive the output data. You can use a file stage, a database stage, or a processing stage to contain the input or output data. For example, you might use a database stage for the input data and a sequential file stage for the output data.
To improve performance of the Address Verification stage, sort the input data before you add the data to the job. Data that is sorted at a granular level improves job performance more than data that is sorted only at the country level. For example, to ensure that the job runs as quickly as possible, you might sort input data in the following order:
  1. By country
  2. By region or province
  3. By city
  4. By postal code
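If you prepare the input data with a script before you load it, the sort order above can be sketched in Python as follows. The column names (`country`, `region`, `city`, `postal_code`) and the function name are hypothetical; substitute the names that your own input data uses.

```python
from operator import itemgetter

# Hypothetical column names, listed from the coarsest to the finest level.
SORT_KEYS = ("country", "region", "city", "postal_code")


def sort_addresses(records):
    """Return the records ordered by country, region, city, postal code."""
    return sorted(records, key=itemgetter(*SORT_KEYS))


records = [
    {"country": "FR", "region": "Ile-de-France", "city": "Paris", "postal_code": "75011"},
    {"country": "DK", "region": "Hovedstaden", "city": "Copenhagen", "postal_code": "2100"},
    {"country": "FR", "region": "Ile-de-France", "city": "Paris", "postal_code": "75001"},
    {"country": "DE", "region": "Bayern", "city": "Munich", "postal_code": "80331"},
]

sorted_records = sort_addresses(records)
```

Because the sort key is a tuple, records are compared by country first, and the finer levels break ties only within the same country, region, and city.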

Procedure

  1. In the IBM® InfoSphere® DataStage® and QualityStage® Designer, double-click the Address Verification stage. The stage editor opens to the Stage > Processing page.
  2. On the Processing page, complete the required fields.
  3. In the Fast Path navigation area, click the forward arrow. The stage editor moves to the tabs for Fast Path: 2 of 4.
  4. On the Options page, specify how detailed you want the output information to be and the format of the output address data.
  5. In the Fast Path navigation area, click the forward arrow. The stage editor moves to the tabs for Fast Path: 3 of 4.

Assigning input columns to address fields

You assign input columns to address fields in the stage editor. For example, you can assign the column for a house number and the column for a street name.

Before you begin

Select and configure a processing type.

About this task

You can assign multiple columns to one address line. Place each additional column name after the one that is assigned first. For example, to assign two columns, named Recipient and Title, to a field, you can first place the Title column name in the address field. Then, place the Recipient column name in the same field.
You do not have to assign every input column. For example, the input data might contain a column, such as Customer Since 1988, that you do not want to use in an address.
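The effect of assigning several columns to one address line, in order, can be pictured with the following Python sketch. It is a conceptual model only: the function name is hypothetical, and joining the values with a single space is an assumption, because the way that the stage combines columns internally is not described here.

```python
def build_address_line(record, assigned_columns):
    """Join the values of the assigned columns in assignment order."""
    values = (str(record.get(name, "")).strip() for name in assigned_columns)
    # Skip empty values so that no stray separators appear in the line.
    return " ".join(value for value in values if value)


record = {
    "Title": "Dr.",
    "Recipient": "Anna Jensen",
    "Customer Since 1988": "Y",  # not assigned, so it is ignored
}

# Title is assigned first, then Recipient, as in the example above.
line = build_address_line(record, ["Title", "Recipient"])
```

Here `line` contains `Dr. Anna Jensen`. The unassigned Customer Since 1988 column does not contribute to the address line.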

Procedure

  1. On the Input > Address Columns page of the stage editor, assign input columns from the left pane to address fields in the right pane.
    Use one of the following assignment methods:
    • Double-click
      1. In the right pane, select an address field.
      2. In the left pane, double-click an input column.
         The column populates the address field that you selected.
    • Click and drag
      1. In the left pane, click an input column name.
      2. Drag the column name to an address field in the right pane.
  2. If your address data contains fewer or more address lines than the number of address fields that are shown in the right pane, set the "Number of address lines to assign" field to the number of lines in your data.
  3. In the Fast Path navigation area, click the forward arrow. The stage editor moves to the tabs for Fast Path: 4 of 4.

Mapping columns of data to output links

You map processed columns to output links to choose the processed data that you want to see in the output stages. Output links can include columns that originate from the input source or columns that are generated by the Address Verification stage.

Before you begin

Assign input columns to address fields.

About this task

You can map columns individually or map all of the columns at once. Output columns that are generated by the Address Verification stage are indicated by the suffix _QSAV, such as Organization_QSAV or Street_QSAV.
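When you inspect output data downstream, the _QSAV suffix gives you a simple way to tell generated columns from columns that passed through from the input source. The following Python sketch shows one way to separate the two groups. The function name and the sample pass-through column names are hypothetical.

```python
GENERATED_SUFFIX = "_QSAV"


def split_columns(column_names):
    """Separate stage-generated columns from pass-through input columns."""
    generated = [c for c in column_names if c.endswith(GENERATED_SUFFIX)]
    passthrough = [c for c in column_names if not c.endswith(GENERATED_SUFFIX)]
    return generated, passthrough


# CustomerID and RawAddress are hypothetical input column names.
columns = ["CustomerID", "RawAddress", "Street_QSAV", "PostCode_QSAV"]
generated, passthrough = split_columns(columns)
```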

Procedure

  1. From the Output name column on the Output > Mapping page, select the output data link. This link shows the records that the stage processed successfully.
  2. Select columns of processed data from the left pane by using one of the following methods:
    • Select the columns that meet your business needs by clicking each column.
    • Select all of the columns by clicking the heading of the Columns table.
  3. Drag the selected columns of processed data to the table in the right pane, which shows the columns for the output link.
  4. If you set the Include Separate Output for Errors option to Yes when you selected the processing type, select the output data error link. This link shows records that the stage could not process because they contain incomplete data.
  5. Select the columns of error data and drag them to the right pane.
  6. Click OK.

Output columns

You can map all the available columns or select the columns that meet your business needs.
The following columns can be mapped to your data:
  • Organization_QSAV
  • Department_QSAV
  • Contact_QSAV
  • Function_QSAV
  • Building_QSAV
  • Subbuilding_QSAV (suite, unit, or floor)
  • HouseNumber_QSAV
  • Street_QSAV
  • POBOX_QSAV
  • Locality_QSAV (can be the city name)
  • DependentLocality_QSAV
  • DoubleDependentLocality_QSAV
  • PostCode_QSAV
  • PostalCodePrimary_QSAV
  • PostalCodeSecondary_QSAV
  • SuperAdministrativeArea_QSAV (the largest geographic unit of a country or region)
  • AdministrativeArea_QSAV (state, province, or other unit of a country or region)
  • SubAdministrativeArea_QSAV (the smallest geographic unit of a country or region, such as a county)
  • Country_QSAV
  • ISO3166_2_QSAV
  • ISO3166_3_QSAV
  • ISO3166_N_QSAV
  • Address_QSAV (complete and formatted address)
  • Residue_QSAV (information removed in processing)
  • GeoAccuracy_QSAV
  • Latitude_QSAV
  • Longitude_QSAV
  • GeoDistance_QSAV

Errors

Some error conditions cause the stage to stop processing. When one of these conditions occurs, the InfoSphere® DataStage® and QualityStage® Director job log view displays the error.

Error messages written to the Director log

Error messages are sent to the Director job log when a file, such as an input, output, or report file, cannot be opened. Files cannot be opened if the specified directory does not exist or if write permission to the specified directory is not set.
Error messages are as follows:
Table 1. Error messages that are displayed in the log file
Error ID             Error message
IIS-DSEE-AVIF-00101  The liblqtcr.so library has not yet been successfully initialized. Check that the directory specified for the reference files is correct, or you may need to complete the installation process again
IIS-DSEE-AVIF-00102  The directory that you specified does not contain the reference files.
IIS-DSEE-AVIF-00103  The installed unicode.lfs reference file does not match the installed software level. Contact your IBM Account Team to update the unicode reference file.
IIS-DSEE-AVIF-00104  The installed country.lfs reference file does not match the installed software level. Contact your IBM Account Team to update the country reference file.
IIS-DSEE-AVIF-00105  The installed context.lfs reference file does not match the installed software level. Contact your IBM Account Team to update the context reference file.
IIS-DSEE-AVIF-00106  The installed format.lfs reference file does not match the installed software level. Contact your IBM Account Team to update the format reference file.
IIS-DSEE-AVIF-00107  The installed lexicon.lfs reference file does not match the installed software level. Contact your IBM Account Team to update the lexicon reference file.
IIS-DSEE-AVIF-00108  The reference file version is invalid. Contact your IBM Account Team to update the reference file version
IIS-DSEE-AVIF-00110  An exception has occurred.
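Several of these errors come from missing directories, missing reference files, or missing write permission, so a quick check before you run the job can save a failed run. The following Python sketch is one possible pre-flight check. The function names are hypothetical, the list of reference files is taken only from the messages in Table 1 and might not be complete, and the sketch assumes that the .lfs files are located directly in the reference file directory.

```python
import os

# File names taken from the error messages in Table 1; the set that your
# installation requires might be larger.
REFERENCE_FILES = (
    "unicode.lfs", "country.lfs", "context.lfs", "format.lfs", "lexicon.lfs",
)


def check_reference_dir(path):
    """Return a list of problems found in the reference file directory."""
    if not os.path.isdir(path):
        return [f"directory does not exist: {path}"]
    return [
        f"missing reference file: {name}"
        for name in REFERENCE_FILES
        if not os.path.isfile(os.path.join(path, name))
    ]


def check_output_dir(path):
    """Return a list of problems that would prevent writing output files."""
    if not os.path.isdir(path):
        return [f"directory does not exist: {path}"]
    if not os.access(path, os.W_OK):
        return [f"no write permission: {path}"]
    return []
```

An empty list from both functions means that none of the checked conditions were found. The sketch does not verify that the reference files match the installed software level, which is what messages 00103 through 00108 report.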

Error messages written to the output error link

When you validate addresses and include the ErrorCode_QSAV and ErrorMessage_QSAV columns in the output, you see any errors associated with input addresses.
Table 2. Error codes and messages for unprocessed records
Value  Description
10     An exception has occurred.
12     The input record contains invalid data. The problem might be non-UTF8 or non-Unicode data contained in the record.
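Because value 12 points to data that is not valid UTF-8 or Unicode, you might screen raw input before it reaches the stage and translate error codes when you review the error link. The following Python sketch shows both ideas. The function names are hypothetical, and the sketch assumes that ErrorCode_QSAV values can be read as integers.

```python
# Descriptions copied from Table 2.
ERROR_DESCRIPTIONS = {
    10: "An exception has occurred.",
    12: "The input record contains invalid data.",
}


def describe_error(code):
    """Return the Table 2 description for an ErrorCode_QSAV value."""
    return ERROR_DESCRIPTIONS.get(int(code), f"Unknown error code: {code}")


def is_valid_utf8(raw):
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


good = "Århus".encode("utf-8")
bad = "Århus".encode("latin-1")  # single-byte encoding, not valid UTF-8
```

In this sketch, `is_valid_utf8(good)` is true and `is_valid_utf8(bad)` is false, so the second value is the kind of input that could be converted to UTF-8 before the job runs.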