Monday, April 30, 2012

Informatica - 1

HOW TO USE A COBOL FILE FOR TRANSFORMATION
 
Informatica can read data files whose layout is described by a COBOL copybook. These files 
mostly come from mainframe-based source systems. Many of the world's leading business 
systems, e.g. airlines, banks and insurance companies, still run on IBM mainframes, so 
these systems are a major source of information for data warehouses, and thus for our 
Informatica mappings.  
To use a COBOL copybook structure as a source, you have to put the copybook in an 
empty skeleton COBOL program: 
IDENTIFICATION DIVISION.
PROGRAM-ID. RAGHAV.

ENVIRONMENT DIVISION.
SELECT FILE-ONE ASSIGN TO "MYFILE". 

DATA DIVISION.
FILE SECTION.
FD FILE-ONE.

COPY "RAGHAV_COPYBOOK.CPY".

WORKING-STORAGE SECTION.

PROCEDURE DIVISION.

STOP RUN. 

The copybook file can be a plain record structure.
Read more about defining copybooks around here.
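
To give a feel for what such a plain record structure describes, here is a minimal Python
sketch. It assumes a hypothetical copybook with three fixed-width fields; the field names,
widths and the sample record are invented for illustration and are not taken from
RAGHAV_COPYBOOK.CPY. It also assumes plain text data, whereas real mainframe files are
often EBCDIC encoded and may contain packed fields, which this sketch does not handle.

```python
from decimal import Decimal

# Hypothetical layout (an assumption), mirroring a copybook such as:
#   01 CUSTOMER-REC.
#      05 CUST-ID    PIC 9(5).
#      05 CUST-NAME  PIC X(10).
#      05 BALANCE    PIC 9(5)V99.
# Each entry: (field name, width in characters, implied decimals or None for text)
LAYOUT = [
    ("CUST-ID", 5, 0),
    ("CUST-NAME", 10, None),
    ("BALANCE", 7, 2),
]

def parse_record(line):
    """Slice one fixed-width text record into a dict of field values."""
    record = {}
    offset = 0
    for name, width, scale in LAYOUT:
        raw = line[offset:offset + width]
        offset += width
        if scale is None:
            # Alphanumeric field: drop the trailing padding spaces.
            record[name] = raw.rstrip()
        elif scale == 0:
            # Unsigned integer field.
            record[name] = int(raw)
        else:
            # Numeric field with an implied decimal point (the V in the picture).
            record[name] = Decimal(int(raw)).scaleb(-scale)
    return record

sample = "00042JOHN SMITH0012345"  # made-up record, 22 characters wide
print(parse_record(sample))
```

The point is only that the copybook fixes the position and type of every field, which is
the information Informatica needs in order to build the source definition.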

The Need for Scheduling and Commonly Used Schedulers
 
Every data warehousing environment needs some kind of scheduler setup so that jobs can
run at periodic intervals without human intervention.  Another important benefit is the
repeatability of jobs set up this way.  Without the help of a scheduler, things would
become very ad hoc and thus prone to errors and mix-ups. 
Oracle provides a built-in scheduling facility, accessible through its DBMS_SCHEDULER package.
Unix provides a basic scheduling facility through the cron utility. Similarly, Informatica 
also provides basic scheduling facilities in the Workflow Manager client.
 
The features provided by these scheduling tools are fairly limited: typically launching
a job at a given time, basic dependency management and the like. 
 
However, real-world data warehousing solutions need a lot more sophisticated 
functionality than what these basic features offer.  Hence the need for full-fledged 
scheduling tools, e.g. Tivoli Workload Scheduler, Redwood Cronacle, Control-M, 
Cisco Tidal etc.
 
Most of these tools provide sophisticated launch control and dependency management 
features, and therefore allow the data warehouse to be managed at a finer level.
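
As a rough illustration of what dependency management means, here is a minimal Python
sketch that orders jobs so that each one starts only after the jobs it depends on have
finished. The job names are hypothetical, and the sketch uses the standard library's
graphlib module (Python 3.9 or later); it is not how any of the tools named above work
internally.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each key maps to the jobs that must finish before it starts.
DEPENDENCIES = {
    "extract_orders": set(),
    "extract_customers": set(),
    "load_staging": {"extract_orders", "extract_customers"},
    "build_facts": {"load_staging"},
    "refresh_reports": {"build_facts"},
}

def launch_waves(dependencies):
    """Group jobs into waves; jobs in the same wave have no pending dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose predecessors are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unblocking its dependents
    return waves

for number, wave in enumerate(launch_waves(DEPENDENCIES), start=1):
    print(number, wave)
```

Jobs in the same wave could be launched in parallel. A real scheduler adds much more on
top of this, e.g. calendars, retries and alerts.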
 
Some of the tools, e.g. Tidal for Informatica and Redwood for Oracle, also support the
corresponding tool's API and therefore integrate even better with it.  

Sunday, January 8, 2012

bigData - 1


  1. What is bigData ?
  2. How is bigData different from a Data Warehouse ?
  3. What kind of data sources categorize into bigData ?
  4. Why is it required to have a different kind of software/hardware solutions to handle bigData ?
  5. What kind of solutions are being developed to handle bigData ?
  6. What is Hadoop ? Is it the same as bigData ? different ? how ?
  7. What are the upcoming new technology suites for handling bigData ?

Monday, January 2, 2012

Data Modelling - 1


  1. What are aggregates and why do you need them ?
  2. What is the role of a logical data modeler and physical data modeler ?
  3. How do you gather requirements for creating a data model ?
  4. What is the naming convention you follow ?
  5. What are the different notations that can be used ?
  6. What is cardinality ?
  7. What are identifying and non identifying relationships ?
  8. Why do you need reports from data model ?
  9. What kind of metadata do you capture in a data model ?


    Other useful links - helps in finding answers -
    http://www.agiledata.org/essays/dataModeling101.html