Translate data from one schema to another

Tags: database, sql-server, help, question, sysadmin
treefirmy
#1

I know it is possible to use SSIS and other tools to translate data from one schema to another, but does anyone know of any well-known patterns or code examples of doing this? One wrinkle to the problem is that the original constraints (foreign keys, unique keys, etc.) were not enforced at the database level, so the translation has to consider this. In other words, fix things up as best as possible by doing things like inserting rows into parent tables when foreign keys are missing. The data in question isn't very large; a max of around 200MB.

The spec would like the solution to be in two phases: one to translate into the new schema with data written out to the file system, and another where the data is loaded into the new database, MS SQL Server. Unless there is a free tool that can do this and do it well, the solution will need to be hand coded. Someone had already looked at using a mapping tool with poor results.
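
As a rough illustration only: the "insert rows into parent tables when foreign keys are missing" fix-up might look something like the T-SQL below when done on the target database. The Orders/Customers tables, their columns, and the constraint name are all hypothetical, not part of the schema being discussed.

    -- Hypothetical example: Orders.CustomerId should reference Customers.CustomerId,
    -- but the old database never enforced it, so some orders may be orphaned.
    -- Insert placeholder parents for the orphans, then add the constraint.
    INSERT INTO dbo.Customers (CustomerId, CustomerName)
    SELECT DISTINCT o.CustomerId, 'UNKNOWN (migration placeholder)'
    FROM dbo.Orders AS o
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.Customers AS c
                      WHERE c.CustomerId = o.CustomerId);

    ALTER TABLE dbo.Orders
        ADD CONSTRAINT FK_Orders_Customers
        FOREIGN KEY (CustomerId) REFERENCES dbo.Customers (CustomerId);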

Mycroft Holmes
#2

Having done this job a number of times over the years, I doubt there could be a tool to do this. So many of the decisions are judgement calls that an automated tool could not cut it.

treefirmy wrote:

doing things like inserting rows into parent tables

This example - insert the parent or delete the child - is typical. These types of jobs are where we really earn our money; if they ever automate it I'll be astonished, but I would like to meet the AI that does it.

Never underestimate the power of human stupidity RAH

treefirmy
#3

Thanks for the info. Are there any online articles, etc., that might be helpful for me to read? Since you have done this many times, is there any wisdom you might impart? This will be my first time. What should I expect to run into other than memory issues? Unfortunately, the data is coming in as a DataSet. Is that even reasonable? I've been told that DataSet is space-inefficient. Are there other problems with DataSet to watch out for? My idea was to work with the data in place:

1. Scrub the data so that it passes the value constraints of the new schema (rules come from an XML file)
2. Rename tables and columns
3. Create new tables, filling them with columns and rows of data (not sure this will even work - maybe I'd have to create a new DataSet and move the data into it)
4. Fix up foreign key constraints
5. Either drop rows that have duplicate primary keys or somehow make them unique; both of these options seem very expensive time-wise (a database-side sketch of this step is below)
6. Binary-serialize the data to a file

For the upload, I was going to use the SqlBulkCopy class. I considered just writing out table data in a format that bcp.exe could understand, but bcp.exe only allows 263 rows. (Wow, that limit is weird: http://msdn.microsoft.com/en-us/library/ms162802.aspx.)
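
On step 5, if the duplicate-key cleanup were done on the database side instead of in the DataSet (as later replies in this thread suggest), one common approach is a ROW_NUMBER-based delete. The staging table and columns below are hypothetical; this assumes the rows have already been loaded into SQL Server with no primary key defined yet.

    -- Hypothetical example: keep one row per intended key value and delete the
    -- rest, before the PRIMARY KEY constraint is created on the new table.
    WITH Ranked AS (
        SELECT ItemId,
               ROW_NUMBER() OVER (PARTITION BY ItemId
                                  ORDER BY ModifiedDate DESC) AS rn
        FROM dbo.Items_Staging
    )
    DELETE FROM Ranked
    WHERE rn > 1;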

Mycroft Holmes
#4

Whenever I do this I do it at the database level; I would not even consider doing it through any other interface. My usual process is something like this (a sketch of one such scripted step follows below):

• Take a copy of the production DB.
• Create a target DB.
• Create as much of the data structure as you understand, based on the current crap and what the business wants to do.
• Start with the static tables (Country, City, ###Type etc.) and script them into the new database.
• When all the static stuff is done, take a backup of your target DB - this is your new start point.
• Now start on the ugly stuff, working through your data structures. Create additional tables to hold any new records created to support the new data integrity.
• EVERY step requires a script, a reset script, and an endless acceptance that NOTHING is final till the next start point. If you get a structure in and are happy with it, take more backups as a new start point. Always be prepared to trash a start point if you get it wrong. NEVER accept that a prior point is sacrosanct; if it is wrong, scrap it and fix the problem - the other guy probably didn't do this.
• If you do a single action in the entire process that is not scripted and repeatable, you are screwed and should quit now, before you start. Do not let your PM/boss push you into accepting a wrong decision.
• Once you have the whole thing completed, script out your target database. Create a new database with the script and run your transfer script using the latest production data, then go fix the NEW problems raised by the latest data. This is a diminishing-return problem and needs to be repeated till it... well, diminishes.
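
A minimal sketch of what one of those scripted, repeatable static-table steps might look like, assuming the production copy is attached as a second database on the same server. The database, table, and column names are hypothetical.

    -- Hypothetical repeatable step: reload the static Country table in the
    -- target DB from the copied source DB. Rerunning it resets the step.
    USE TargetDB;

    DELETE FROM dbo.Country;   -- reset, so the script can be run again safely

    INSERT INTO dbo.Country (CountryCode, CountryName)
    SELECT DISTINCT LTRIM(RTRIM(cntry_cd)), LTRIM(RTRIM(cntry_nm))
    FROM SourceCopyDB.dbo.tblCountry
    WHERE cntry_cd IS NOT NULL;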

Never underestimate the power of human stupidity RAH

treefirmy
#5

Thanks for the info - that seems like a sane way to go. Of course, things are a little more involved, as we don't have the actual data that will be fed to the tool. The tool is supposed to run on customer data, possibly thousands of customers with 10 to 100,000 'databases' each ('database' == ISAM database). The company of course has sample 'databases' and can create 'databases', but it does not have the actual data that this tool will be fed.

Using a script seems like a much better way to go, as the script could be modified by a DB admin at the customer site, whereas doing it in code means rewriting/refactoring and recompiling (well, unless they could somehow jam the change into the XML file - yeah, I know, doubtful). I'm naïve about SQL scripting - how would one go about actually stuffing the ISAM data into the new SQL Server tables? I've only ever written C# code to call stored procs. Is this what you mean?

The company already tried to have this project done offshore with terrible results (they completely ignored memory considerations). Unfortunately, I'm not sure I would be able to convince them to go the script way, as they seem to want this done in code. Hopefully this all makes some sort of sense - I'm not even close to being DB savvy.

Mycroft Holmes
#6

treefirmy wrote:

I'm not even close to being DB savvy.

Then you are in serious trouble. I would not attempt this without the services of a highly experienced and skilled data specialist, as the company has already found out. This sounds like it is related to a previous post I read a couple of weeks ago with "1000s" of databases. I think your company needs to totally rethink its data structure/storage/delivery processes. The 1st question I would ask is: do you own the data? The 2nd: do you have a mandate to change it? If either of these is no, then stop now; the company is doing a half-assed job AGAIN. Or are you writing an interface between the crap data and something else? If so, what is the else?

Never underestimate the power of human stupidity RAH

treefirmy
#7

The customer owns the data in real terms. The data is only supposed to be changed by the older version of the application, so it's not as if it is an open database where others are allowed to change it. (On the 1000s of 'databases': my PM, thank goodness, understands that this is not the right way to go and will 'force' the developers to use an extra DB table so there will be just one DB.) On the mandate issue, I know the new application is going to go to SQL Server no matter what - well, unless the entire scheme is completely unworkable and there is a vast amount of evidence to show the critical problems. If that is true, I can probably kiss my job goodbye.

No, this won't be just an interface; it's supposed to literally be a scrub, translation, write data to disk, and then upload with the tool to the new SQL Server DB. Scrubbing and translating data makes me nervous - if I were a customer and I knew this was happening, I would have serious reservations. What's the yikes factor given the information above? High, I'm guessing :-O

No one ever lost a fortune by underestimating the intelligence of the American public.

Mycroft Holmes
#8

So my 2nd post still applies. You cannot get the original data, but you have the current schema. I would create a SQL database/table to take the current customer data and just grab a substantial set of data to work with. I would then treat that as the source database, and work with it to create your new data structures. I am presuming there is one table coming from the source system with a humongous amount of redundant data in it! If this is the case, I usually write a script to move the source FILE into a data structure - I never do the transformation in the data transfer tool. That is, get the data into the database first, then clean it up (a rough sketch of that load step is below).
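
A minimal sketch of that "get the data into the database first" load, assuming the ISAM files have been exported to a delimited text file. The file path, staging table, and column layout are all hypothetical.

    -- Hypothetical staging load: pull the exported ISAM file into a raw staging
    -- table as-is, then do all scrubbing/translation afterwards in T-SQL.
    CREATE TABLE dbo.RawCustomerData (
        Field1 VARCHAR(255) NULL,
        Field2 VARCHAR(255) NULL,
        Field3 VARCHAR(255) NULL
    );

    BULK INSERT dbo.RawCustomerData
    FROM 'C:\migration\export\customer_data.txt'   -- hypothetical export path
    WITH (
        FIELDTERMINATOR = '|',
        ROWTERMINATOR   = '\n',
        TABLOCK
    );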

Never underestimate the power of human stupidity RAH

treefirmy
#9

Excellent. How do I convince my PM that this is the way to go? What rationale should be used? (My PM has managed enterprise-type data migration, but only with SSIS/tools. I think some of the other developers managed the offshore team and either guided them to the scrub-in-code scheme or at least allowed it.) Keep in mind that this - scrub/translate in SQL Server - might be a very hard sell, as I'm not experienced in writing SQL scripts and have written C++ and C# code for 10+ yrs. Thanks so much for the information. P.S. There are a number of files which make up the ISAM database.

Mycroft Holmes
#10

treefirmy wrote:

scrub/translate in SQL Server, might be a very hard sell as I'm not experienced in writing SQL

My recommendation is based on my experience, and I'm afraid I DO have extensive experience in SQL. Seriously, I would recommend getting in a consultant to do this job; it is not trivial, and screwing it up will (and already has, once) totally ruin any app built on it! The main reason not to use SSIS for the transformation is my own lack of skill with the tool: I find it particularly difficult to audit a process that happens on the server in what is essentially a black box. Another is that we recently did some comparisons between using SSIS to transform the data and using stored procs, and the procs were 3 times faster than SSIS. When you process over 2k files per day, that is important. (A minimal sketch of a proc-based transform step is below.)
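
For illustration only, here is a transform step written as a stored procedure rather than as an SSIS transformation, so it can be audited, versioned, and rerun. All object and column names here are hypothetical, not anything from this thread.

    -- Hypothetical transform step as a stored procedure: copy rows from a raw
    -- staging table into the target table, trimming text and defaulting bad dates.
    CREATE PROCEDURE dbo.usp_TransformCustomers
    AS
    BEGIN
        SET NOCOUNT ON;

        DELETE FROM dbo.Customers_New;   -- reset so the step can be rerun

        INSERT INTO dbo.Customers_New (CustomerId, CustomerName, CreatedDate)
        SELECT CustomerId,
               LTRIM(RTRIM(CustomerName)),
               CASE WHEN ISDATE(CreatedDateText) = 1
                    THEN CONVERT(datetime, CreatedDateText)
                    ELSE '19000101' END
        FROM dbo.Customers_Staging;
    END;
    GO

    EXEC dbo.usp_TransformCustomers;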

Never underestimate the power of human stupidity RAH

treefirmy
#11

I know I'm asking a lot and you should be paid for this kind of info, so thanks again. Don't get me wrong: if it were up to me, I would happily follow your guidance. The only issue here is that I'm going to be a mouthpiece for your ideas and I don't have much standing in the company - I'm a new hire. Given that, what concrete examples/reasons can I give my PM to convince him of the folly of the current approach and the correctness of the 'move data to SQL Server and then do the translation into the new schema' strategy?

Jorgen Andersson
#12

You've received some excellent info from Mycroft, and I agree with him fully. I'd just like to add that memory considerations are a major reason to do the job on the database using SQL scripts: a database is optimised for handling large sets of data.

"When did ignorance become a point of view" - Dilbert
