Validating data between RDBMS (MySQL / Oracle / DB2) and Hadoop (HDFS / Hive)
-
I got this assignment question in my database course and I have no idea how to approach it. Can anyone please help me?

========================================================
"Mr. A" and "Mr. B" are data warehousing experts working for company "XYZ". They are currently developing an ETL-Validator framework for big-data technology, i.e. validating data between an RDBMS (MySQL / Oracle / DB2) and Hadoop (HDFS / Hive). The source database (RDBMS) contains millions of records, and all the records from the source have already been migrated to the target database (Hadoop - Hive). They need your help in implementing the following scenarios.

A. Column-level comparison between the source and target databases (i.e. comparing each column of the source database with the corresponding column of the target database). Your task is to:

1. Assume a suitable database on the source side and design a table structure (student / retail banking / telecommunication / insurance, or any other) having at least ten columns.
2. Assuming a buffer size of 500, propose an efficient strategy to reduce the number of comparisons between source and target column records.
3. Write an SQL query for the solution proposed in step 2.
4. Draw the query tree for the query of step 3.
5. Write pseudocode or a program (Java / C#) for the solution proposed in steps 2 and 3.

B. As the foreign key constraint is not implemented in the target database (Hadoop - Hive), implement a foreign key validator for the target database.

1. Assuming the table used in A.1 is already present in the target DB, construct one more table on the target side that references the primary key of the table used in A.1.
2. Assuming a buffer size of 500, propose an efficient strategy with minimal comparisons to validate the foreign key constraint.
==============================================================
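For anyone landing on this thread later: one standard strategy for part A.2 (not the assignment's official solution, just a hedged sketch) is blocked checksum comparison. Instead of comparing millions of rows one by one, sort both sides by primary key, group rows into buffers of 500, compute one checksum per buffer on each side, and do a row-by-row comparison only inside buffers whose checksums differ. In HiveQL the per-buffer checksum can typically be expressed by grouping on something like `floor(id / 500)` and aggregating a hash over the concatenated columns. The in-memory simulation below is a minimal sketch; all names (`block_checksums`, `mismatched_blocks`) are invented for illustration.

```python
import hashlib

BUFFER_SIZE = 500  # buffer size given in the assignment


def block_checksums(rows, block_size=BUFFER_SIZE):
    """Split rows (already sorted by primary key) into fixed-size
    blocks and compute one digest per block."""
    checksums = []
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]
        digest = hashlib.md5(repr(block).encode("utf-8")).hexdigest()
        checksums.append((start, digest))
    return checksums


def mismatched_blocks(source_rows, target_rows, block_size=BUFFER_SIZE):
    """Return the start offsets of blocks whose checksums differ.
    Assumes both sides have equal row counts (all rows were migrated)
    and identical primary-key ordering."""
    src = block_checksums(source_rows, block_size)
    tgt = block_checksums(target_rows, block_size)
    return [s_start
            for (s_start, s_sum), (_, t_sum) in zip(src, tgt)
            if s_sum != t_sum]


# Toy data: 2000 identical rows, then corrupt one migrated row.
source = [(i, f"name{i}") for i in range(2000)]
target = list(source)
target[1234] = (1234, "corrupted")

bad = mismatched_blocks(source, target)
# bad == [1000]: only the 500-row block holding index 1234 now needs
# a row-by-row comparison; the other 3 blocks are skipped entirely.
```

The payoff: with N rows and buffer size 500, a clean migration costs only N/500 checksum comparisons instead of N row comparisons, and drill-down work is proportional to the number of corrupted buffers.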
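For part B, the usual SQL formulation of a foreign key validator is an anti-join, which in Hive can be written as a `LEFT OUTER JOIN` that keeps child rows with no matching parent (`SELECT c.fk FROM child c LEFT JOIN parent p ON c.fk = p.pk WHERE p.pk IS NULL`). A hedged procedural sketch of the same idea, honoring the 500-row buffer constraint, is below; the function name `find_orphan_fks` and the toy data are assumptions for illustration.

```python
BUFFER_SIZE = 500  # buffer size given in the assignment


def find_orphan_fks(child_fk_values, parent_pk_values, buffer_size=BUFFER_SIZE):
    """Return child foreign-key values with no matching parent primary
    key ("orphans"). Parent keys are loaded once into a hash set, so
    each child key costs O(1) to check instead of a scan of the parent
    table; child keys are consumed one buffer at a time."""
    parent_keys = set(parent_pk_values)
    orphans = []
    buffer = []
    for fk in child_fk_values:
        buffer.append(fk)
        if len(buffer) == buffer_size:        # full buffer: validate it
            orphans.extend(k for k in buffer if k not in parent_keys)
            buffer.clear()
    orphans.extend(k for k in buffer if k not in parent_keys)  # partial tail
    return orphans


# Toy data: 1000 valid references plus two dangling ones.
parents = range(1, 1001)
children = list(range(1, 1001)) + [5000, 6000]

orphans = find_orphan_fks(children, parents)
# orphans == [5000, 6000]: an empty result would mean the FK
# constraint holds for every row in the child table.
```

If the parent key set itself is too large for memory, the same idea works with both sides sorted by key and a merge-style pass, again reading 500 keys per buffer.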
Nobody here is going to do your homework for you. It is set to test what you know, not what a bunch of random strangers on the Internet know. If you genuinely don't know where to start, then talk to your teacher.
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer