Dynamically generated insert command from DataTable
-
So, I found myself having to migrate data from a DB in Access to a DB in MySql. What I did was first fill a DataTable with the data from the Access DB, then use a little bit of Linq to generate a MySql insert command, and then use a DataAdapter to insert the records of the DataTable. The DataRows' state had to be manually set to Added (one way to do that is sketched after the loop). Here, "Tables" is an array of strings with the names of the tables I wanted to migrate.
foreach (string table in Tables)
{
    // Select the columns' names present in the DataTable, separated by ","
    string columns = string.Join(",",
        (from col in ds.Tables[table].Columns.Cast<DataColumn>()
         select string.Format("`{0}`", col.ColumnName)).ToArray());

    // The same, but with an "@" at the beginning for the parameters' names
    string values = string.Join(", ",
        (from col in ds.Tables[table].Columns.Cast<DataColumn>()
         select "@" + col.ColumnName.Replace(' ', '_')).ToArray());

    // The parameters used in the DataAdapter
    var param = from col in ds.Tables[table].Columns.Cast<DataColumn>()
                select new MySqlParameter("@" + col.ColumnName.Replace(' ', '_'),
                                          ToMySqlDbType(col.DataType),
                                          col.MaxLength,
                                          col.ColumnName);

    StringBuilder command = new StringBuilder();
    command.Append("insert into ")
           .Append(table)
           .Append(" (")
           .Append(columns)
           .Append(") values (")
           .Append(values)
           .Append(") ");

    MySqlDataAdapter insertAdapter = new MySqlDataAdapter();
    insertAdapter.InsertCommand = new MySqlCommand(command.ToString(), connection);
    insertAdapter.InsertCommand.Parameters.AddRange(param.ToArray());
    insertAdapter.Update(ds, table);
}
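The "manually set to Added" step isn't shown inside the loop. One way to do it (a sketch, not necessarily the exact code used), placed just before insertAdapter.Update and assuming the rows come out of the original Fill in the Unchanged state:

    // SetAdded() only works on Unchanged rows; it flips them to Added so the
    // adapter's InsertCommand fires for every row during Update().
    foreach (DataRow row in ds.Tables[table].Rows)
    {
        row.SetAdded();
    }

Another option is to set AcceptChangesDuringFill = false on the adapter that fills the DataSet, so the rows stay in the Added state to begin with.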
And the ToMySqlDbType function, taken from someplace else:
public static MySqlDbType ToMySqlDbType(Type type)
{
    MySqlParameter p1 = new MySqlParameter();
    System.ComponentModel.TypeConverter tc =
        System.ComponentModel.TypeDescriptor.GetConverter(p1.DbType);
    if (tc.CanConvertFrom(type))
        p1.DbType = (DbType)tc.ConvertFrom(type.Name);
    else
    {
        try
        {
            // The snippet in the post breaks off here; the usual fallback in this
            // well-known helper (an assumption, not the original text) is to try
            // the conversion anyway and ignore a failure, keeping the default DbType.
            p1.DbType = (DbType)tc.ConvertFrom(type.Name);
        }
        catch { }
    }
    return p1.MySqlDbType; // MySqlParameter maps the DbType it was given to a MySqlDbType
}
-
First, ditch the DataTable. Then ditch the DataAdapters. And then ditch the Linq. I do the same thing, but with a DataReader. I don't know about Access and MySql, but I had a little trouble copying from Ingres to SQL Server -- different ranges for Date and byte-sized columns.
-
PIEBALDconsult wrote:
ditch the DataTable. Then ditch the DataAdapters. And then ditch the Linq
:thumbsup:
Real men don't use instructions. They are only the manufacturer's opinion on how to put the thing together.
-
PIEBALDconsult wrote:
First, ditch the DataTable. Then ditch the DataAdapters. And then ditch the Linq.
Really? Are they that bad? Why are they bad? Thanks!
-
The DataTable technique requires that you read all the data into memory at once -- using a DataReader allows you to have only one row in memory at a time. DataAdapters are very limited in what they can do, and they try to do too much -- all you need is inserts, right? Linq is just plain silly -- in the long run you're much better off rolling your own.
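As a rough sketch of that shape (not PIEBALDconsult's actual code; the table, column, and connection-string names here are made up, and it assumes System.Data.OleDb on the Access side and MySql.Data.MySqlClient on the MySQL side):

// Stream rows out of Access and insert them into MySQL one at a time;
// only the current row is ever held in memory.
static void CopyCustomers(string accessConnectionString, string mySqlConnectionString)
{
    using (var source = new OleDbConnection(accessConnectionString))
    using (var target = new MySqlConnection(mySqlConnectionString))
    {
        source.Open();
        target.Open();

        using (var readCmd = new OleDbCommand("select id, name from Customers", source))
        using (OleDbDataReader reader = readCmd.ExecuteReader())
        using (var insert = new MySqlCommand(
            "insert into Customers (`id`, `name`) values (@id, @name)", target))
        {
            insert.Parameters.Add("@id", MySqlDbType.Int32);
            insert.Parameters.Add("@name", MySqlDbType.VarChar);

            while (reader.Read())
            {
                insert.Parameters["@id"].Value = reader["id"];
                insert.Parameters["@name"].Value = reader["name"];
                insert.ExecuteNonQuery(); // all you need is inserts
            }
        }
    }
}

Wrapping the inserts in a transaction and committing in batches would typically speed this up, but the memory profile stays flat either way.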
-
Simple math. What if your DB table has 100,000,000 rows in it, each weighing in at 5KB? Since you're using a DataTable, it has to hold all of those records all at once. That's 100,000,000 x 5KB, or roughly 500GB, so you'd need something like 512GB of memory in the machine to hold all that, plus overhead.
A guide to posting questions on CodeProject[^]
Dave Kreskowiak
-
It was just an example of why the technique he was using is so bad. It doesn't matter which database he was using. It's a lot easier to find an SQL Server database with that many rows in a table, so... But, for an example, yes, I've had my hands on Access databases with tables containing over 500,000 rows. It wasn't pretty. I don't use Access for anything.
A guide to posting questions on CodeProject[^]
Dave Kreskowiak
-
I did put a smiley face on my note! I was laughing at the juxtaposition of your example and the DB that started this thread. Your experience with 500K records highlights the problems that come from joining the wrong DB engine with the technical requirements that need to be satisfied. I wouldn't rule out the use of Access, it just has to fit the demands expected from it. Your example is an example of a process that would make his technique bad. The technique isn't bad in and of itself. There are situations where the technique of storing records in memory comes in very handy and it doesn't tie up your communication with the DB. I tend to shudder at the technique implied by the title of this thread. I also tend to shudder at the idea of reading 100 meg rows with a data reader that is spending time calculating things for each row it reads.
-
KP Lee wrote:
I wouldn't rule out the use of Access, it just has to fit the demands expected from it.
I haven't used Access for anything, other than testing other people's code, since about 1999, and I haven't been disappointed. I've used various editions of SQL Server for everything.
KP Lee wrote:
The technique isn't bad in and of itself.
The only thing that can justify that technique is if you can guarantee that the resulting table isn't very big. If you have no idea what the upper bound of the record count in a resulting table is, this is a very dangerous technique to use, as everything will work fine until one day, maybe a few years down the road, it fails because of memory limitations.
KP Lee wrote:
I also tend to shudder at the idea of reading 100 meg rows with a data reader that is spending time calculating things for each row it reads.
That's what the database is used for - the heavy lifting and calculating. You don't depend on the DataReader to do any calculations at all. It'll outperform every other db access method out there.
A guide to posting questions on CodeProject[^]
Dave Kreskowiak
-
Dave Kreskowiak wrote:
That's what the database is used for - the heavy lifting and calculating. You don't depend on the DataReader to do any calculations at all.
Sorry about the slow response. I tend to get buried in e-mails and sometimes miss them. Found this while cleaning up e-mails. I totally disagree. If you have an OLAP solution, then yeah, use the database for the heavy lifting and calculating. Unfortunately, too many people agree with your view and put the heavy lifting in the DB. On a lightly loaded DB, that's fine, it works and you can get away with it, so go ahead and do it. If you have an OLTP solution that is handling 1000's of requests a second, you don't have the luxury of putting business logic in the DB without bogging down the communication with the DB as well. This, too often, also introduces blocking and deadlocks, which means either retries, or lost processes.

I was shuddering about that one situation because with megs of rows being read, you can't load the data into a data table and close the connection. You open the connection and leave it open while you are running the BL in a loop. This produces a long-running transaction which can cause blocking and deadlocks. The proper way of handling this is either to handle a single request, download the data, close the connection, run the BL, update the DB, close the connection or execute a queuing process.

In the queuing process, you have a queuing table, add each request as it comes in. Your service asks for 50 transactions from the queue and closes the transaction. The DB has enough logic to lock up to 50 transactions in the table at a time and record when the transaction is locked and then download the 50 transactions. You schedule a cleanup job. Any transactions that have been locked for over 5 minutes, you free up because something is wrong. Your service app then loops through those 50 transactions and handles each request and updates the BL tables and then removes the transaction record from the queue. At least that's the way it works in theory.

I came into a group and was told they were having some kind of problem with their queuing logic. I had a main task I was assigned to do, so I worked on that task. I got it to the point where it was ready for testing and asked to get access to their lab servers. If the queuing was being handled properly, the queue table should have 0 to 10 rows. The lab had millions of rows in the table. I ran profiler and found out that it
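A rough sketch of the claim/process/dequeue cycle described above (made-up table, column, and helper names, SQL Server flavored, using System.Data.SqlClient; the real systems involved certainly differed in the details):

// Claim a batch of up to 50 requests in one short statement, work on them with
// no connection or transaction held open, then delete each finished row.
const string ClaimBatchSql = @"
    update top (50) RequestQueue
    set    LockedBy = @worker, LockedAtUtc = sysutcdatetime()
    output inserted.RequestId, inserted.Payload
    where  LockedBy is null;";

// Scheduled cleanup: anything locked for more than 5 minutes is assumed stuck.
const string ReleaseStaleSql = @"
    update RequestQueue
    set    LockedBy = null, LockedAtUtc = null
    where  LockedBy is not null
      and  LockedAtUtc < dateadd(minute, -5, sysutcdatetime());";

const string DeleteSql = "delete from RequestQueue where RequestId = @id;";

static void ProcessBatch(string connectionString)
{
    // Claim a batch on a short-lived connection, then let it close.
    var batch = new List<(int Id, string Payload)>();
    using (var conn = new SqlConnection(connectionString))
    using (var claim = new SqlCommand(ClaimBatchSql, conn))
    {
        claim.Parameters.AddWithValue("@worker", Environment.MachineName);
        conn.Open();
        using (SqlDataReader reader = claim.ExecuteReader())
        {
            while (reader.Read())
                batch.Add((reader.GetInt32(0), reader.GetString(1)));
        }
    }

    // Run the business logic with no DB connection open, then dequeue each item.
    foreach (var request in batch)
    {
        RunBusinessLogic(request.Payload); // hypothetical BL entry point

        using (var conn = new SqlConnection(connectionString))
        using (var delete = new SqlCommand(DeleteSql, conn))
        {
            delete.Parameters.AddWithValue("@id", request.Id);
            conn.Open();
            delete.ExecuteNonQuery();
        }
    }
}

A real implementation would probably also add with (updlock, readpast, rowlock) hints to the claim statement so concurrent workers skip rows another worker has already locked, but the shape stays the same.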
-
KP Lee wrote:
Unfortunately, too many people agree with your view and put the heavy lifting in the DB. On a lightly loaded DB, that's fine, it works and you can get away with it, so go ahead and do it. If you have an OLTP solution that is handling 1000's of requests a second, you don't have the luxury of putting business logic in the DB without bogging down the communication with the DB as well. This, too often, also introduces blocking and deadlocks, which means either retries, or lost processes.
You design the solution to fit the problem and its scale. If you have a DB that's handling thousands of requests a second, of course you're not going to put this kind of load on the box that does the lifting. If the problem scales that large, the solution has to scale along with it. You put the workload on dedicated boxes that don't handle the client requests themselves. Back the truck up. Go back to the OP's posts and look at the scale of his problem. Are you going to drop your enterprise solution on him? Does he have the skill to understand it? Implement it? Debug it? Support it?
KP Lee wrote:
The proper way of handling this is either to handle a single request, download the data, close the connection, run the BL, update the DB, close the connection or execute a queuing process. In the queuing process, you have a queuing table, add each request as it comes in. Your service asks for 50 transactions from the queue and closes the transaction. The DB has enough logic to lock up to 50 transactions in the table at a time and record when the transaction is locked and then download the 50 transactions. You schedule a cleanup job. Any transactions that have been locked for over 5 minutes, you free up because something is wrong.
As you've demonstrated in your own post, the person designing and writing the code is part of the scale of the problem directly affecting the scale of the solution, not just the problem itself.
A guide to posting questions on CodeProject[^]
Dave Kreskowiak
-
Dave Kreskowiak wrote:
You design the solution to fit the problem and its scale.
That part I totally agree with. I did say "using datareader to read megs of rows." That's a much bigger scale than the original poster indicated. Hopefully, when you get to that scale, you handle your problems piecemeal.