How do I model this?

Jorgen Andersson

I'm going to create a rather large report that collects the data from two different sources. One is a badly normalized database, where the data I need is spread over a large number of columns in many tables. The other is an Entity–attribute–value model[^]. I don't have any problems in fetching the data from either source, but the data is going to be organised in categories and subcategories, or possibly a tree, that are currently not existing in the EAV model and not corresponding at all with the data model on the other database, so they need to be created from scratch. I'm looking for ideas on how to make this in a fairly futureproof way, I don't really have any keys to use either. It's just the column names in one source and the attribute names in the other. I realize there is no right way of doing this (but many wrong). So any opinions on how to do this, and as important, how not to do this, is appreciated. And no, I can't do much about the structure of the sources, that's way out of scope.

Wrong is evil and must be defeated. - Jeff Ello

Mycroft Holmes

Instant answer would be a mapping table/application. Probably not what you want to hear :-O

Never underestimate the power of human stupidity RAH

Jorgen Andersson

:laugh: That's what I'm building at the moment. But you're quite right, it's not what I wanted to hear. :) To many error sources, and yet another place to add the same texts again. I suspect that this is one of those cases where garbage in becomes garbage out.

Wrong is evil and must be defeated. - Jeff Ello

Lost User

There's a number of ways to approach this. Sounds like you're at the "exploratory" stage. In that case, I might suggest MS Power BI for the Desktop. [Power BI Desktop | Microsoft Power BI](https://powerbi.microsoft.com/en-us/desktop/) You can use Power BI to pull data from multiple sources, build relationships, and get a "feel for the data" and perhaps produce your report in question (with or without using any number of "cleansing" techniques). Moving forward, look into (data warehouse) "star schemas" for modeling ideas.

"(I) am amazed to see myself here rather than there ... now rather than then". ― Blaise Pascal

Lost User

Jörgen Andersson wrote:

I'm looking for ideas on how to make this in a fairly futureproof way, I don't really have any keys to use either. It's just the column names in one source and the attribute names in the other.I realize there is no right way of doing this (but many wrong).So any opinions on how to do this

You got two groups of unsorted and ungrouped column-names coming from a repository that need to be mapped to a model? ..then I'd go for a relational model, even if it only exists in memory. Which would mean following the steps to normalize the data (upto BCNF). That's a tedious but proven way. Any group of properties that make up a model (regardless of the way the data is physically stored) would be a candidate. I would seriously consider creating that theoretical model and then create an in-memory database (using SQLite or similar) to hold it. The added advantage of that approach is having an abstraction of the data that you can already use to design reporting and stuff. Another added advantage is that any redundancies or inconsistencies in the database would be caught. Downside is that you need (access to) a domain-expert to build and validate the model.

Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^][](X-Clacks-Overhead: GNU Terry Pratchett)

Jorgen Andersson

Yes, this would probably be the right way of doing it. :sigh:

Wrong is evil and must be defeated. - Jeff Ello

Lost User

Jörgen Andersson wrote:

Yes, this would probably be the right way of doing it. :sigh:

Not the most exciting thing to do, I agree :thumbsup:

Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^][](X-Clacks-Overhead: GNU Terry Pratchett)