Tuesday, 25 October 2016

Temp Tablespace


select distinct a.tablespace_name,
       b.total_size,
       b.used_size - c.free_space       used_size,
       b.space_available + c.free_space space_available
from   dba_tables a,
       (select tablespace_name,
               sum(maxbytes)/1024/1024/1024                total_size,
               sum(bytes)/1024/1024/1024                   used_size,
               (sum(maxbytes) - sum(bytes))/1024/1024/1024 space_available
        from   dba_data_files
        group  by tablespace_name) b,
       (select ddf.tablespace_name,
               sum(dfs.bytes)/1024/1024/1024 free_space
        from   dba_data_files ddf,
               dba_free_space dfs
        where  ddf.file_id = dfs.file_id
        group  by ddf.tablespace_name) c
where  a.tablespace_name = b.tablespace_name
and    a.tablespace_name = c.tablespace_name
--and  a.tablespace_name like '%SRCSTGP04%' -- if the tablespace is known, filter here
order  by 4;
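Note that the query above reads DBA_DATA_FILES, so it reports permanent tablespaces. For true temporary tablespace usage, a minimal sketch against Oracle's standard V$TEMP_SPACE_HEADER view (sizes in GB) would look like:

```sql
-- Used and free space per temporary tablespace, in GB.
SELECT tablespace_name,
       SUM(bytes_used)/1024/1024/1024 used_gb,
       SUM(bytes_free)/1024/1024/1024 free_gb
FROM   v$temp_space_header
GROUP  BY tablespace_name
ORDER  BY tablespace_name;
```

Querying V$ views requires the SELECT ANY DICTIONARY (or equivalent) privilege.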

Sunday, 11 September 2016

Difference between Informatica and Datastage

I have used both DataStage and Informatica. In my opinion, DataStage is far more powerful and scalable than Informatica. Informatica has more developer-friendly features, but when it comes to scalability and performance it is much inferior to DataStage.
Here are a few areas where Informatica falls short -
1. Partitioning - DataStage PX provides many more robust partitioning options than Informatica. You can also re-partition the data whichever way you want.
2. Parallelism - Informatica does not support full pipeline parallelism (although it claims to).
3. File Lookup - Informatica supports flat file lookup, but the caching is poor. DataStage supports hash files, lookup filesets and datasets for much more efficient lookups.
4. Merge/Funnel - DataStage has very rich functionality for merging or funnelling streams. In Informatica the only way is to do a Union, which, by the way, is always a Union-all.

Thursday, 1 September 2016

Datastage Interview Question

http://www.datawarehousing-praveen.com/2013/10/datastage-interview-questions-part-1_720.html

Tuesday, 9 August 2016

MDM

http://news.sap.com/consolidation-harmonization-and-central-management/

https://scn.sap.com/thread/1034712

http://www.sourcemediaconferences.com/MDM/pdf/Tues/MDS/Shewale_Young.pdf

Monday, 14 March 2016

Data Warehousing



Q1 :- Four-Step Dimensional Design Process


Objective: design a dimensional database by working through four steps in a particular order:

1. Select the business process to model

2. Declare the grain of the business process

3. Choose the dimensions that apply to each fact table row.

4. Identify the numeric facts that will populate each fact table row
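The four steps above can be illustrated with a hypothetical retail point-of-sale star schema (all table and column names here are invented for illustration):

```sql
-- Step 1: business process = retail point-of-sale transactions
-- Step 2: grain = one row per product scanned on a sales receipt
-- Step 3: dimensions that describe that grain (date, product, store)
-- Step 4: numeric facts measured at that grain
CREATE TABLE sales_fact (
    date_key       NUMBER       NOT NULL,  -- FK to date dimension
    product_key    NUMBER       NOT NULL,  -- FK to product dimension
    store_key      NUMBER       NOT NULL,  -- FK to store dimension
    receipt_number VARCHAR2(20),           -- degenerate dimension
    quantity_sold  NUMBER,                 -- fact
    sales_amount   NUMBER(12,2)            -- fact
);
```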

https://dwbi1.wordpress.com/2010/03/05/transaction-dimension/
http://byobi.com/blog/2013/09/dimensional-modeling-junk-vs-degenerate/
http://dwhlaureate.blogspot.in/2012/08/junk-dimension.html
https://bintelligencegroup.wordpress.com/2012/06/05/different-types-of-dimensions-and-facts-in-data-warehouse/
http://www.disoln.org/2013/12/Design-Approach-to-Handle-Late-Arriving-Dimensions-and-Late-Arriving-Facts.html

5. MDM :- Consolidation, Harmonization, Centralization, Distribution

Friday, 8 January 2016

Unix Interview Questions

1. Delete all spaces or tabs from a line
:- cat file.txt | tr -d ' \t'
:- sed 's/[ \t]*$//' file_A > new_file_A   (this variant strips only trailing spaces/tabs)
2. Squeeze consecutive spaces or tabs in a line into one
:- cat file.txt | tr -s ' \t'
3. Remove blank line from file
:- sed '/^$/d' file.txt
:- grep -v '^$' filename > newfilename
4. Search for a pattern across multiple XML files

:- grep -lR 'pattern' * |egrep '(xml)$'
5. Print fields 10 to 12 from a CSV file
:- cut -d',' -f10-12 filename.csv

6. Print the line number with the line containing a pattern in a file

:- grep -nf pattern.txt file.txt
7. Search for repeating lines in a file [- is used after -f to read patterns from the stream]

:-  head -1  Crane4C1.dat| grep -nf -  Crane4C1.dat

8. Remove CTRL+M from a file

:- sed -e "s/^M//" file.txt   (enter ^M as Ctrl+V then Ctrl+M; alternatively tr -d '\r' < file.txt)

9. How to find CTRL+M from a file

:-  cat -v filename.txt

10. Print lines 10 to 20 of a file
:- sed -n 10,20p filename.txt

11. Looping in AWK
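Item 11 has no answer above; a minimal sketch of a C-style for loop in awk (the file name is hypothetical), printing each comma-separated field on its own line:

```shell
# Loop over the fields of each record with a C-style for loop.
# NF is the number of fields; $i is the i-th field.
awk -F',' '{ for (i = 1; i <= NF; i++) print $i }' file.csv
```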

12. Create duplicate record from a file using awk.
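A one-line sketch for item 12 (the file name is hypothetical): print every input record twice.

```shell
# Each input line is printed twice, duplicating every record.
awk '{ print; print }' file.txt
```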



13. Show the content between two tags in an XML file using sed.

sed -n '/<Tag>/,/<\/Tag>/p' file.xml

14. Find the record (line) counts of all files in a directory that match a pattern.

wc -l `find /path/to/directory/*PATTERN* -type f`
 
15. Remove duplicate records from a file.
 
sort file | uniq

(uniq -d would instead print only the duplicated lines.)