Datastage for Learners: Interview Questions

SQL :-

1. What is the difference between primary key, foreign key and surrogate key ?
Ans :- A primary key is a special constraint on a column or set of
columns. A primary key constraint ensures that the column(s) so
designated have no NULL values, and that every value is unique.
Physically, a primary key is implemented by the database system
using a unique index, and all the columns in the primary key
must have been declared NOT NULL. A table may have only one
primary key, but it may be composite (consist of more than one
column).
A surrogate key is any column or set of columns that can be
declared as the primary key instead of a "real" or natural key.
Sometimes there can be several natural keys that could be
declared as the primary key, and these are all called candidate
keys. So a surrogate is a candidate key. A table could actually
have more than one surrogate key, although this would be
unusual. The most common type of surrogate key is an
incrementing integer, such as an auto_increment column in MySQL,
or a sequence in Oracle, or an identity column in SQL Server.

1. Both keys contain unique value for a record in a table.
2. Primary keys are used in OLTP whereas surrogate keys are used in OLAP
schemas.
3. Primary keys hold some business meaning whereas surrogate does not hold
any business meaning.
4. Primary key may contain numeric as well as non-numeric values whereas
surrogate keys contain only (simple)numeric values.

Now question comes that why do we use surrogate keys in OLAP rather then
using primary keys?
Answer is very simple. There are two main reasons for this:-

1. Surrogate keys are simple numeric values, as simple as normal counting.
So most of the time they save storage space.
2. As surrogate keys are simple and short, it speed-up the join performance.
3. Best thing is that same pattern of surrogate keys can be used across all
the tables present in a star/schema.

Like a primary key, a foreign key is also a type of constraint placed on one or more columns in a table. The foreign key establishes a link between the key columns and related columns in another table. (You can also link the foreign key columns to columns within the same table.) The table that contains the foreign key is considered the child table, and the table that the foreign key references is the parent table. The foreign key restricts what data can be stored in the foreign key columns in the child table, based on the data in the referenced columns in the parent table.

Other ways that foreign keys differ from primary keys are that you can create more than one foreign key on a table and you can define foreign keys on columns that permit NULL values. In addition, SQL Server does not automatically index the foreign key columns like it does for primary keys. If you want to index the foreign key columns, you must do so as a separate step.
2. How many types of join ?
There mainly exist 5 types of join in SQL

1. JOIN or INNER JOIN :

In this kind of a JOIN, we get all records that match the condition in both the tables, and records in both the tables that do not match are not reported.
In other words, INNER JOIN is based on the single fact that : ONLY the matching entries in BOTH the tables SHOULD be listed.

2. OUTER JOIN :

Outer Join retrieves
Either, the matched rows from one table and all rows in the other table Or, all rows in all tables (it doesn't matter whether or not there is a match).
There are three kinds of Outer Join :
2.1 LEFT OUTER JOIN or LEFT JOIN
This join returns all the rows from the left table in conjunction with the matching rows from the right table. If there are no columns matching in the right table, it returns NULL values.
2.2 RIGHT OUTER JOIN or RIGHT JOIN
This join returns all the rows from the right table in conjunction with the matching rows from the left table. If there are no columns matching in the left table, it returns NULL values.
2.3 FULL OUTER JOIN or FULL JOIN
This join combines left outer join and right outer join. It returns row from either table when the conditions are met and returns null value when there is no match.
In other words, OUTER JOIN is based on the fact that : ONLY the matching entries in ONE OF the tables (RIGHT or LEFT) or BOTH of the tables(FULL) SHOULD be listed.

3. NATURAL JOIN :

It is based on the two conditions :

the JOIN is made on all the columns with the same name for equality.
Removes duplicate columns from the result.

This seems to be more of theoretical in nature and as a result (probably) most DBMS don't even bother supporting this.

4. CROSS JOIN :

It is the Cartesian product of the two tables involved. The result of a CROSS JOIN will not make sense in most of the situations. Moreover, we wont need this at all (or needs the least, to be precise).

5. SELF JOIN :

It is not a different form of JOIN, rather it is a JOIN (INNER, OUTER, etc) of a table to itself.

JOINs based on Operators

Depending on the operator used for a JOIN clause, there can be two types of JOINs. They are

Equi JOIN
Theta JOIN

1. Equi JOIN :

For whatever JOIN type (INNER, OUTER, etc), if we use ONLY the equality operator (=), then we say that the JOIN is an Equi JOIN.

2. Theta JOIN :

This is same as Equi JOIN but it allows all other operators like >, <, >= etc

3. Implementation of cross join ?[product & customer(many to many)]
4. Difference between Union and full outer join.
5. What is the ordering of where,group by,having,order by clause ?
6. What is the difference between view and materialized view? Can we apply index on materialized view ?

Datastage :-

1. What is the difference between Join , Merge and Lookup?Where we use it for what purpose?
2. How many type of partitioning ? Where we use it for what purpose? [e.g:- Aggregator -> Hash, Lookup -> Entire etc]
3. Implement scd 3 using change apply?
click here.
4. What is the difference between Sequential file stage and dataset? Why dataset is used for?[explain what does control file do in a dataset]
5. What is the use of intermediate link in join stage?
6. What is the difference between copy,filter and modify stage?
7. What is the disadvantage of transformer?
9. How to do pivoting using transformer ?
10. How can we call unix script in a job ?
11. How many input link a join can have ?
12. How to use parameter in a job ? Where value files used to get stored ?
13. Name of the links in lookup,join,merge and cdc stage and their function.
14. How to remove duplicate without using RD?
15. What is change key column in Sort stage?
16. Use of aggregretor stage.
17. Use of Change apply stage.
18. How many types of lookup in datastage?
19. What is the maximum size limit of hash file and why?
20. What is remote column propagation ? Describe the implementation process step by step. Why it is used? What is it's advantage and limitation ?
21. State the difference between MPP and SMP.
22. Explain APT config file and architecture of datastage.
23. How to set some stage execute in parallel ?
24. How to capture log in datastage ?
25. Why peak stage has been used ?

Datawarehousing :-

1. How many type of dimension and their example.
2. How many type of fact and their example.
3. Fact less fact.
4. What is star , snowflake and hybrid schema ? what schema depend on ER model and what will depend on DWH ?

Unix :-
1. How to replace a value using echo?

$ var="12345678abc"
$ replace="test"
$ echo ${var//12345678/$replace}
testabc

$ echo $LINE | sed -e "s/12345678/$replace/g"

2. How to copy content of file1 and file2 to file2?

cat file2 >> file1

Datastage for Learners

Friday, 19 June 2015

Interview Questions

No comments:

Post a Comment