DATA ENGINEERING

DATA ARCHITECTURE, INFORMATION MANAGEMENT

I am opening this discussion in reponse to a terminolgy and an approach problem I got this week.

In the project I'm working on, I created a database with two different layers to support a data warehouse as an unique data source; the first layer is a generalization of the business composed of seven master entities and sub tables; and the second one is the actual set of dimensions and aggregations.
Some of my managers complained about this strutucture and I replied it was a more reliable data source than simply extracting data from external sources, plus we would be working on a single data source which would allow us to obtain an endless number of business performance indicators. As the first layer is highly strucutured I simply refused to call it a staging area, as it represents the business data.

My questions are; One - is it a correct approach to have a database with those two layers?
Two - Why shouldn't it be correct to call the resulting DB and ODS? (operational data store), as it refers to business data? Should I really call it Staging DB?

Comments?

Thank you so much in advance,
Marco

Share

Reply to This

Replies to This Discussion

Mr Ribeiro

I'm a little late joining this party. You made this posting over a month ago. I dont see any feedback here, so I will take a moment and discuss this issue with you and our fellow network members...just in case someone else finds this an interesting topic.

The approach that I've found to work the best is to have two separate areas, a Staging Area and a Core Warehouse.

Staging Area
The Staging Area is intended to manage issues such as timing, history, quality and error management. In essence, the staging area prepares the data for prime-time. The Staging area may contain unstructured tables that resemble the files being loaded, structured tables that resemble those in the Core Warehouse and Error Management structures. Once your data is prepared, you can mimize down time in the Core Warehouse by doing a simple 'Truncade and Load'


Core Warehouse
You always want the Core Warehouse to be up and available as much as possible, and to have prime-time data available to the end users. You will often find that your users are somewhat finicky, and justifiably so. Nothing is more irritating than to run a report/cube off a repository, and then discover later, if you run the same report, the results have changed, when they should be the same. Whether you should use separate schemas for Marts or include the de-normalized structures or only use Star-Based structures (ala Kimball vs Inmon) in the same schema as your Core Warehouse depends on the answers that one must ask about resources, access methods and the talent base associated with your enterprise.

In other words, Marco, Kimbal would probably argue that you should nver model in 3rd-5th Normal Forms, that one should always use Star Schemas. My point is --- it depends. Nonetheless, two areas, Staging and Core Warehouse generally allow for good Data Management methods to be put into place.

Its often that Data Architect is only afforded 'one bite at the apple', if you dont manage your warehouse properly, dont suprised if your users go elsewhere at the first sense of failure....

Hope someone reads this...

Have fun folks!
Will

Reply to This

RSS

© 2009   Created by Marco Ribeiro on Ning.   Create a Ning Network!

Badges  |  Report an Issue  |  Privacy  |  Terms of Service