Scenario: Improving international address data
You can use IBM® InfoSphere® QualityStage® Address Verification Interface to
improve the quality of your address data without becoming
an expert in international postal standards.
The following case study shows how a sample company uses the Address
Verification stage to parse and validate worldwide address data.
Gleaming Green Cleansers (GGC) makes environmentally safe cleaning
products. The company began as a small family business in Denmark
in 1995 with a few products for cleaning the kitchen. Over the next
decade, the cleansers became popular all over Europe. In the last
few years, GGC acquired companies in France and Germany that manufactured
complementary environmentally safe cleaning products for the bathroom
and laundry. The expanded product line is so successful that GGC plans
to release the products in Japan.
Business need: To manage international address data
As Gleaming Green Cleansers expanded beyond Denmark, the need to maintain
accurate international address data became critical for the supply
chain. The manufacturing plants, distributors, and retail outlets
are spread among several countries and regions. Each area has unique
postal standards and a different language. When GGC releases its
products in Japan, it faces the additional challenge of handling
a different character set.
The Address Verification stage meets
the current and future business needs of Gleaming Green Cleansers
in the following ways:
- Validates address data for each European country or region in which GGC does business:
  - Formats each address according to the postal standards of the country or region
  - Assesses the deliverability of each address so that products reach their destinations
- Parses inconsistent address data from each acquired company as a step towards standardizing the format
- Transliterates Japanese address data to convert the information into a common representation by using the Latin character set
IBM InfoSphere QualityStage Address Verification
Interface provides
a comprehensive solution for companies to maintain high-quality address
data from all over the world.
Parsing, validating, and transliterating address data
You can configure the Address Verification stage to parse
or validate address data. The stage also transliterates the data.
You configure the stage in the IBM® InfoSphere® DataStage® and QualityStage® Designer client.
You can parse the addresses that are in your input stage. You assign
address components, such as street or postal code, to a column.
You can validate the addresses against postal reference files to
check the correctness of the data. The validation process assesses
address deliverability and provides a status such as very likely,
fair chance, or unlikely to be deliverable.
You can also generate geographic location information and a
summary report as part of address validation. A Validation Summary
report shows the following items:
- Total number of records processed
- Number and percentage of records that the stage passed, failed, validated, corrected, or suggested another address for
- Number and percentage of records that the stage failed because of postal code, city, street, country, or region
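The counts and percentages in such a report can be derived from one outcome per record. The following Python sketch shows the arithmetic only. The outcome labels and the `summarize` function are hypothetical and do not represent the report format that the stage produces.

```python
from collections import Counter

def summarize(outcomes):
    """Return the total record count and {outcome: (count, percentage)}."""
    total = len(outcomes)
    counts = Counter(outcomes)
    # Percentages are relative to all processed records, rounded to one decimal.
    summary = {
        label: (n, round(100.0 * n / total, 1))
        for label, n in counts.items()
    }
    return total, summary

# Hypothetical outcome labels for five processed records.
total, summary = summarize(
    ["validated", "validated", "corrected", "failed", "validated"]
)
for label, (count, percent) in sorted(summary.items()):
    print(f"{label}: {count} of {total} ({percent}%)")
```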
If you choose the validation processing type, ensure that you have
access to current postal validation reference files.
Transliteration is performed on the address data after the parse or
validation processing. Transliteration converts addresses from one
representation (script) to another. You can transliterate addresses in
non-Latin languages, such as Greek or Hebrew, to the Latin character set.
Or you can transliterate from Latin to a non-Latin, native character set.
Use transliterated addresses to store data consistently in one common writing system.
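To illustrate what converting from one script to another means, the following Python sketch maps a few Greek letters to Latin equivalents. It is a toy example under stated assumptions: the mapping table is deliberately incomplete, the romanization choices are only one of several possible schemes, and the `transliterate` function is hypothetical. It does not reproduce the rules that the Address Verification stage applies.

```python
import unicodedata

# Deliberately small, hypothetical Greek-to-Latin table; real schemes cover far more.
GREEK_TO_LATIN = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "η": "i",
    "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n",
    "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s", "τ": "t",
}

def transliterate(text):
    # Decompose accented letters, then drop the combining accent marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    out = []
    for ch in stripped:
        latin = GREEK_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)                   # leave unmapped characters unchanged
        elif ch.isupper():
            out.append(latin.capitalize())   # preserve initial capitals
        else:
            out.append(latin)
    return "".join(out)

print(transliterate("Αθήνα"))
```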
When you configure the Address Verification stage, you can use
the Fast Path navigation in the stage editor as a shortcut. The Fast Path
steps through the four required tabs, beginning with the tab where you
select the parse or validation processing type. You can select and modify
any of the other available tabs in addition to the Fast Path tabs.
- Configuring the Address Verification stage to process address data
You specify the type of process that you want and define the detail level of the output information and the format of the output address data.
- Assigning input columns to address fields
You assign input columns to address fields in the stage editor. For example, you can assign the column for a house number and the column for a street name.
- Mapping columns of data to output links
You map processed columns to output links to choose the processed data that you want to see in the output stages. Output links can include columns that originate from the input source or columns that are generated by the Address Verification stage.
Configuring the Address Verification stage to process address data
You specify the type of process that you want and define
the detail level of the output information and the format of the output
address data.
Before you begin
Create a job that includes a stage that contains input
address data, the Address Verification stage, and a stage to receive
the output data. You can use a file stage, a database stage, or a
processing stage to contain the input or output data. For example,
you might use a database stage for the input data and a sequential
file stage for the output data.
To improve performance of the Address Verification
stage, sort the input data before you add the data to the job. Data
that is sorted at a granular level improves job performance more than
data that is sorted only at the country level. For example, to ensure
that the job runs as quickly as possible, you might sort input data
in the following order:
- By country
- By region or province
- By city
- By postal code
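If you prepare the input data outside the job, the sort order above can be sketched in Python as follows. The column names and the `sort_for_verification` function are hypothetical; substitute the names that your input source uses.

```python
from operator import itemgetter

# Hypothetical column names, ordered from the coarsest key to the finest.
SORT_KEYS = ("country", "region", "city", "postal_code")

def sort_for_verification(records):
    """Sort address records by country, then region, then city, then postal code."""
    return sorted(records, key=itemgetter(*SORT_KEYS))

records = [
    {"country": "FR", "region": "Île-de-France", "city": "Paris", "postal_code": "75011"},
    {"country": "DE", "region": "Bayern", "city": "München", "postal_code": "80331"},
    {"country": "FR", "region": "Île-de-France", "city": "Paris", "postal_code": "75001"},
]
sorted_records = sort_for_verification(records)
```

In this sample, the German record sorts first, and the two Paris records are ordered by postal code.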
Procedure
- In the IBM® InfoSphere® DataStage® and QualityStage® Designer client, double-click the Address Verification stage. The stage editor opens to the Processing page.
- On the Processing page, complete the required fields.
- In the Fast Path navigation area, click the forward arrow. The stage editor moves to the tabs for Fast Path: 2 of 4.
- On the Options page, specify how detailed you want the output information to be and the format of the output address data.
- In the Fast Path navigation area, click the forward arrow. The stage editor moves to the tabs for Fast Path: 3 of 4.
Assigning input columns to address fields
You assign input columns to address fields in the stage
editor. For example, you can assign the column for a house number
and the column for a street name.
About this task
You can assign multiple columns to one address line. Place
each additional column name after the one that is assigned first.
For example, to assign two columns, named Recipient and Title, to
a field, you can first place the Title column name in the address
field. Then, place the Recipient column name in the same field.
You do not have to assign every input column. For example, the input data
might contain a column, such as Customer Since 1988, that you do not
want to use in an address.
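The effect of assigning several columns to one field, in order, can be pictured with the following Python sketch. It is an illustration only: the `build_address_field` function, the sample values, and the use of a single space as the separator are assumptions, because the way the stage joins the assigned columns is not described here.

```python
def build_address_field(record, assigned_columns):
    """Join the values of the assigned columns, in assignment order, into one field."""
    values = (record.get(name, "").strip() for name in assigned_columns)
    # Skip empty values so that a missing column leaves no stray separator.
    return " ".join(v for v in values if v)

record = {
    "Title": "Dr.",
    "Recipient": "Anna Jensen",
    "Customer Since 1988": "yes",  # never assigned, so it never reaches the address
}
# Title is assigned first, then Recipient, as in the example above.
contact_line = build_address_field(record, ["Title", "Recipient"])
```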
Mapping columns of data to output links
You map processed columns to output links to choose the
processed data that you want to see in the output stages. Output links
can include columns that originate from the input source or columns
that are generated by the Address Verification stage.
About this task
The stage provides methods to map data. Output columns
that are generated by the Address Verification stage are indicated
by the suffix _QSAV, such as Organization_QSAV or Street_QSAV.
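When you review the columns on an output link, the suffix makes it straightforward to tell generated columns from columns that pass through from the input. A minimal Python sketch, in which the `split_columns` function and the `CustomerID` and `RawAddress` column names are hypothetical:

```python
GENERATED_SUFFIX = "_QSAV"

def split_columns(column_names):
    """Separate stage-generated columns from columns passed through from the input."""
    generated = [c for c in column_names if c.endswith(GENERATED_SUFFIX)]
    passthrough = [c for c in column_names if not c.endswith(GENERATED_SUFFIX)]
    return generated, passthrough

generated, passthrough = split_columns(
    ["CustomerID", "Street_QSAV", "Organization_QSAV", "RawAddress"]
)
```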
Output columns
You can map all the available columns or select the columns
that meet your business needs.
The following columns can be mapped to your data:
- Organization_QSAV
- Department_QSAV
- Contact_QSAV
- Function_QSAV
- Building_QSAV
- Subbuilding_QSAV (suite, unit, or floor)
- HouseNumber_QSAV
- Street_QSAV
- POBOX_QSAV
- Locality_QSAV (can be the city name)
- DependentLocality_QSAV
- DoubleDependentLocality_QSAV
- PostCode_QSAV
- PostalCodePrimary_QSAV
- PostalCodeSecondary_QSAV
- SuperAdministrativeArea_QSAV (the largest geographic unit of a country or region)
- AdministrativeArea_QSAV (state, province, or other unit of a country or region)
- SubAdministrativeArea_QSAV (the smallest geographic unit of a country or region, such as a county)
- Country_QSAV
- ISO3166_2_QSAV
- ISO3166_3_QSAV
- ISO3166_N_QSAV
- Address_QSAV (complete and formatted address)
- Residue_QSAV (information removed in processing)
- GeoAccuracy_QSAV
- Latitude_QSAV
- Longitude_QSAV
- GeoDistance_QSAV
Errors
Some error conditions cause the stage to stop processing.
When one of these conditions occurs, the InfoSphere® DataStage® and QualityStage® Director job
log view displays the error.
Error messages written to the Director log
Error messages are sent to the Director job log when a file, such as an
input, output, or report file, cannot be opened. Files cannot be opened
if the specified directory does not exist or if write permission for the
specified directory is not granted.
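You can check for both conditions before you run the job. The following Python sketch is one way to do that, assuming that you run it on the computer where the job writes its files and under the same user account. The `check_output_directory` function is hypothetical and is not part of the product.

```python
import os

def check_output_directory(path):
    """Return a list of problems that would prevent a file from being created in path."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"directory does not exist: {path}")
    elif not os.access(path, os.W_OK):
        problems.append(f"no write permission: {path}")
    return problems
```

An empty list means that neither condition was detected for the directory.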
Error messages are as follows:
Error ID | Error Message |
---|---|
IIS-DSEE-AVIF-00101 | The liblqtcr.so library has not yet been successfully initialized. Check that the directory specified for the reference files is correct, or you may need to complete the installation process again |
IIS-DSEE-AVIF-00102 | The directory that you specified does not contain the reference files. |
IIS-DSEE-AVIF-00103 | The installed unicode.lfs reference file does not match the installed software level. Contact your IBM Account Team to update the unicode reference file. |
IIS-DSEE-AVIF-00104 | The installed country.lfs reference file does not match the installed software level. Contact your IBM Account Team to update the country reference file. |
IIS-DSEE-AVIF-00105 | The installed context.lfs reference file does not match the installed software level. Contact your IBM Account Team to update the context reference file. |
IIS-DSEE-AVIF-00106 | The installed format.lfs reference file does not match the installed software level. Contact your IBM Account Team to update the format reference file. |
IIS-DSEE-AVIF-00107 | The installed lexicon.lfs reference file does not match the installed software level. Contact your IBM Account Team to update the lexicon reference file. |
IIS-DSEE-AVIF-00108 | The reference file version is invalid. Contact your IBM Account Team to update the reference file version |
IIS-DSEE-AVIF-00110 | An exception has occurred. |
Error messages written to the output error link
When you validate addresses and include the ErrorCode_QSAV and ErrorMessage_QSAV
columns in the output, you see any errors associated with input addresses.
Value | Description |
---|---|
10 | An exception has occurred. |
12 | The input record contains invalid data. The problem might be non-UTF8 or non-Unicode data contained in the record. |
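Because value 12 can indicate data that is not UTF-8, it can help to screen the raw input before the job runs. The following Python sketch flags records whose bytes do not decode as UTF-8. The `find_invalid_utf8` function and the sample records are hypothetical, and the check is not guaranteed to catch every condition that causes the stage to report value 12.

```python
def find_invalid_utf8(raw_records):
    """Return the indexes of records whose bytes are not valid UTF-8."""
    bad = []
    for index, raw in enumerate(raw_records):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(index)
    return bad

raw_records = [
    "Nørregade 10, København".encode("utf-8"),
    "Nørregade 10, København".encode("latin-1"),  # same text, wrong encoding
]
invalid = find_invalid_utf8(raw_records)
```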