How to improve the performance of a database application
-
I have a problem with the performance of my application when it handles huge volumes of data. I have two tables. Each of them has 30+lakhs of records. I have to extract data from one table and store those records in a temporary table. Next I have to extract data from the second table and update the temporary table for each and every record. For this purpose I have to run an update command in a loop over roughly 3 lakh records, meaning the loop executes 3 lakh times. As a result the performance is affected very badly. Can someone let me know the points to be taken care of when working with huge volumes of data?
-
Uma Kameswari wrote:
Can someone let me know the points to be taken care of when working with huge volumes of data
Your question isn't very specific, so some general notes:
- correct design flaws
- use efficient indexing (see the example below)
- use the features of the SQL language
- try to get rid of middle steps
- optimize the database
- optimize the hardware
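On the indexing point, for illustration only: an index that matches the columns a frequent lookup filters on can make a big difference. The table and column names below are made up, not from any schema mentioned here, and the INCLUDE clause assumes SQL Server 2005 or later:
-- Hypothetical example: supports lookups that filter on DocNumber and AccountIndex
-- and read Status, so such a query can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_MyTable_DocNo_Account
ON dbo.MyTable (DocNumber, AccountIndex)
INCLUDE (Status);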
Uma Kameswari wrote:
Each of them has 30+lakhs of records
I don't know what 30+lakhs means. It could be millions of rows, but if it's only, for example, 30,000, that's not much.
The need to optimize rises from a bad design. My articles[^]
-
Mika Wendelius wrote:
I don't know what 30+lakhs means.
30+lakhs is more than 3 million; that is a lot of rows... :)
Luc Pattyn [Forum Guidelines] [My Articles]
Fixturized forever. :confused:
-
Luc Pattyn wrote:
30+lakhs is more than 3 million
Thanks, good to know :)
Luc Pattyn wrote:
that is a lot of rows...
It depends. If the logic the OP described is executed continuously, then it's a lot, but then again I would say it's a design problem in that case. If the operation is executed, for example, once a day and the desired throughput time is, let's say, less than a minute, it should be doable (depending, of course, on the overall logic). :)
The need to optimize rises from a bad design. My articles[^]
-
Thanks for the suggestion. I am working with accounting tables: subledger and general ledger. I developed an application that should show me the difference between the subledger and the general ledger. The design is as follows:
1. Retrieve the data from the subledger table into a dataset and copy this dataset into a new table.
2. Retrieve data from the general ledger table into a dataset and run a loop over it.
3. Within the loop, update the rows of the new table created in step 1; if no row is updated, add a new row.
4. A stored procedure then uses a cursor on the new table to set a status based on a comparison of data from 3 fields of the new table.
The time taken for performing the above steps is 1 hour when 28,556 records are fetched. Hope this is clear. When all 3 million records are retrieved it took roughly 3 days.
-
Uma Kameswari wrote:
a stored procedure that uses cursor
There's your biggest design problem that I can see. I have no idea what updates you need to make, but if performance is an issue, try to eliminate the need for opening a CURSOR. They are VERY expensive to use and their performance sucks.
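For illustration only (the table, columns, and status rule below are made up, not the actual logic), a cursor that walks a table row by row can usually be replaced with a single set-based statement:
-- Row-by-row version: one UPDATE per fetched row (slow).
DECLARE @id int;
DECLARE cur CURSOR FOR SELECT id FROM dbo.SomeTable;
OPEN cur;
FETCH NEXT FROM cur INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.SomeTable SET status = 1 WHERE id = @id AND debitamt = crdtamnt;
    FETCH NEXT FROM cur INTO @id;
END
CLOSE cur;
DEALLOCATE cur;

-- Set-based version: one statement, same effect, no cursor.
UPDATE dbo.SomeTable SET status = 1 WHERE debitamt = crdtamnt;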
A guide to posting questions on CodeProject[^]
Dave Kreskowiak Microsoft MVP Visual Developer - Visual Basic
2006, 2007, 2008
-
Can you suggest an alternative?
-
Uma Kameswari wrote:
Can you suggest an alternative?
Like I said, I know nothing of the updates you're trying to do, so no, I can't make any suggestions. The lack of detail in your question and the absence of any code snippets make it pretty much impossible to give you any alternatives, if any are even possible.
A guide to posting questions on CodeProject[^]
Dave Kreskowiak Microsoft MVP Visual Developer - Visual Basic
2006, 2007, 2008
-
Here is the code snippet from the application; this is my first step:
"select rm1.docnumbr DOCNUMBR,rm1.custnmbr CUSTNMBR,rm1.trxsorce TRXSORCE,rm2.docdate DOCDATE,rm2.postdate POSTDATE,rm2.glpostdt GLPOSTDT,PSTUSRID,ORTRXAMT,rm2.DEX_ROW_ID H_DEX_ROW_ID,rm1.DEX_ROW_ID D_DEX_ROW_ID, rm1.dstindx,rm1.debitamt,rm1.crdtamnt,(select rtrim(actnumbr_1) + '-'+rtrim(actnumbr_2) + '-'+rtrim(actnumbr_3) + '-' + rtrim(actnumbr_4) + '-' + rtrim(actnumbr_5) + '-' + rtrim(actnumbr_6) + '-' + rtrim(actnumbr_7) from gl00100 where actindx = rm1.dstindx) 'Account',seqnumbr from rm10101 rm1,rm20101 rm2 where rm2.docdate between convert(datetime,'" + dtFrom.ToShortDateString() + "',103) and convert(datetime,'" + dtTo.ToShortDateString() + "',103 and rm1.docnumbr = rm2.docnumbr and rm1.custnmbr = rm2.custnmbr and rm1.rmdtypal = rm2.rmdtypal and rm1.dstindx in(" + strAcc + ")"
In the above query, strAcc, dtFrom and dtTo are user inputs. The results of the query are read into a dataset, and a loop over that result set stores the data into a new dataset, which is then bulk copied into a new table as follows:
DataTable DT = new DataTable();
DT = ds.Tables["SRC"];

#region bulkcopy
string strDB = conn.Database.ToString();
SqlBulkCopy bk = new SqlBulkCopy(conn);
bk.DestinationTableName = strDB + ".dbo.gl_sl_cmpr";
bk.WriteToServer(DT);
#endregion
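(For comparison, and in the spirit of cutting out middle steps: this staging step could also be done entirely on the server with an insert ... select, so the rows never travel to the client at all. A sketch only; the gl_sl_cmpr column names below are assumptions rather than the real table definition, and @dtFrom/@dtTo stand in for the user-supplied dates:)
declare @dtFrom datetime, @dtTo datetime;
select @dtFrom = '20070401', @dtTo = '20080331';   -- example date range

insert into dbo.gl_sl_cmpr (docnumbr, custnmbr, trxsorce, docdate, sldr, slcr)   -- hypothetical column list
select rm1.docnumbr, rm1.custnmbr, rm1.trxsorce, rm2.docdate, rm1.debitamt, rm1.crdtamnt
from rm10101 rm1
join rm20101 rm2
  on  rm1.docnumbr = rm2.docnumbr
  and rm1.custnmbr = rm2.custnmbr
  and rm1.rmdtypal = rm2.rmdtypal
where rm2.docdate between @dtFrom and @dtTo
  and rm1.dstindx in (4818, 25, 3189)              -- example account list in place of strAcc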
This code executes with no issue. Now I have to get records from the general ledger and update the above table gl_sl_cmpr; if no row is updated, then I have to insert a new row. For this I am writing the code below:
SELECT JRNENTRY, trxdate, trxsorce, USWHPSTD, ortrxsrc, orgntsrc, orctrnum, debitamt, crdtamnt, Dex_row_id, actindx,
       (select rtrim(actnumbr_1) + '-' + rtrim(actnumbr_2) + '-' + rtrim(actnumbr_3) + '-' + rtrim(actnumbr_4) + '-' +
               rtrim(actnumbr_5) + '-' + rtrim(actnumbr_6) + '-' + rtrim(actnumbr_7)
        from gl00100 g1 where g1.actindx = g.actindx) 'account',
       origseqnum
FROM GL20000 g
WHERE trxdate between convert(datetime,'" + dtMinGLPostDt.ToShortDateString() + "',103)
  and convert(datetime,'" + dtMaxGLPostDt.ToShortDateString() + "',103)
  and actindx in (" + strAcc + ") and series <> 2 and correspondingunit <> 1
The result of this query is read into a dataset and the following loop is executed:
for (int i = 0; i < ds.Tables["InGL"].Rows.Count; i++)
{
    iJrnEntry = Convert.ToInt32(ds.Table
-
There are several issues. Let's start with the procedure. If I interpreted the logic correctly, something like the following statements should do the same updates as your procedure:
update gl_sl_cmpr
set status = 1
WHERE status not in (2,4)
AND (   ( OnAccount = (SELECT sum(SLDr) FROM gl_sl_cmpr sub1 where sub1.sopno = gl_sl_cmpr.sopno)
       OR OnAccount = (SELECT sum(SLCr) FROM gl_sl_cmpr sub1 where sub1.sopno = gl_sl_cmpr.sopno))
     OR (    (SELECT sum(SLDr) FROM gl_sl_cmpr sub1 where sub1.sopno = gl_sl_cmpr.sopno)
           = (SELECT sum(GLDr) FROM gl_sl_cmpr sub2 where sub2.sopno = gl_sl_cmpr.sopno)
         AND (SELECT sum(SLCr) FROM gl_sl_cmpr sub3 where sub3.sopno = gl_sl_cmpr.sopno)
           = (SELECT sum(GLCr) FROM gl_sl_cmpr sub4 where sub4.sopno = gl_sl_cmpr.sopno)))

update gl_sl_cmpr
set status = 6
WHERE status not in (2,4)
AND (   OnAccount != (SELECT sum(SLDr) FROM gl_sl_cmpr sub1 where sub1.sopno = gl_sl_cmpr.sopno)
    AND OnAccount != (SELECT sum(SLCr) FROM gl_sl_cmpr sub1 where sub1.sopno = gl_sl_cmpr.sopno))
AND (    (SELECT sum(SLDr) FROM gl_sl_cmpr sub1 where sub1.sopno = gl_sl_cmpr.sopno)
      != (SELECT sum(GLDr) FROM gl_sl_cmpr sub2 where sub2.sopno = gl_sl_cmpr.sopno)
      OR (SELECT sum(SLCr) FROM gl_sl_cmpr sub3 where sub3.sopno = gl_sl_cmpr.sopno)
      != (SELECT sum(GLCr) FROM gl_sl_cmpr sub4 where sub4.sopno = gl_sl_cmpr.sopno))

That should speed things up a lot. Also, you currently have the logic partially on the client side and partially in the database; try to use the database and do all the updates on the server side. I also think you have design issues if this logic is needed often, since it relies, for example, on summed fields and seems to have some redundancy. Redesigning these areas should have a major impact on performance.
The need to optimize rises from a bad design. My articles[^]
-
Thank you for the suggestion and the update query. I need to work it out and check; the query seems to be similar to what I am looking for.
-
Hi Mika Wendelius, the alternate update statements you gave helped me a lot in speeding up the process. I have a few more areas where I need to redesign the logic. I am giving the code below; please suggest an alternative for this. Roughly, the loop is going to run for 100,000 to 300,000 rows.
for (int i = 0; i < ds.Tables["InGL"].Rows.Count; i++)
{
    iJrnEntry = Convert.ToInt32(ds.Tables["InGL"].Rows[i]["JRNENTRY"]);
    dtTrxDate = Convert.ToDateTime(ds.Tables["InGL"].Rows[i]["trxdate"]);
    strTrxSorce = ds.Tables["InGL"].Rows[i]["trxsorce"].ToString();
    strActIndex = ds.Tables["InGL"].Rows[i]["account"].ToString();
    strGLUser = ds.Tables["InGL"].Rows[i]["USWHPSTD"].ToString();
    strOrgnTrxSorce = ds.Tables["InGL"].Rows[i]["orgntsrc"].ToString();
    strOrgnDocNo = ds.Tables["InGL"].Rows[i]["orctrnum"].ToString();
    strOrTrxSrc = ds.Tables["InGL"].Rows[i]["ortrxsrc"].ToString();
    dclGLDr = Convert.ToDecimal(ds.Tables["InGL"].Rows[i]["debitamt"]);
    dclGLCr = Convert.ToDecimal(ds.Tables["InGL"].Rows[i]["crdtamnt"]);
    dexrowid = Convert.ToInt32(ds.Tables["InGL"].Rows[i]["Dex_row_id"]);
    iGLACT = Convert.ToInt32(ds.Tables["InGL"].Rows[i]["actindx"]);
    iGlSeqNo = Convert.ToInt32(ds.Tables["InGL"].Rows[i]["origseqnum"]);
    strOrgnDocNo = strOrgnDocNo.Replace("'", "''");
    strSource = strOrgnTrxSorce.Trim();
    if (strOrgnTrxSorce.Trim() == "")
        strSource = strOrTrxSrc;
    cmd.CommandText = "select count(*) from gl_sl_cmpr where sopno='" + strOrgnDocNo + "' and actindex ='" + strActIndex + "' and trxsorce='" + strSource + "'";
    cmd.ExecuteNonQuery();
    if (ds.Tables.Contains("matchcount"))
        ds.Tables.Remove("matchcount");
    da.Fill(ds, "matchcount");
    reccnt = Convert.ToInt32(ds.Tables["matchcount"].Rows[0][0]);
    if (reccnt == 0)
    {
        cmd.CommandText = " insert into gl_sl_cmpr(jrnentry,gltrxdate,SOURCE,GLACTINDEX,
-
Since this is a big loop, I think you have two options, both of which can be used:
- Move the logic to the server side; if necessary, create the loop inside a stored procedure.
- Try to eliminate the loops. Use the same technique as I showed you in the previous post and express the logic as SQL statements (see the sketch below).
You will gain a boost in your program when you start writing operations using sets. SQL is a set-based language and is at its most efficient that way. Currently you're creating the logic with a record-based approach; that's never going to be as efficient as pure SQL.
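For illustration, a sketch of what "one UPDATE plus one INSERT" could look like for this loop. The gl_sl_cmpr column names (sopno, actindex, trxsorce, jrnentry, gltrxdate, gldr, glcr, gldexrowid) are guessed from your snippets, and the remaining filters from your SELECT still need to be added, so treat this as a pattern rather than working code:
declare @dtFrom datetime, @dtTo datetime;
select @dtFrom = '20070401', @dtTo = '20080331';   -- the user-supplied date range

-- 1) GL rows that already have a match in gl_sl_cmpr: update them (the reccnt > 0 branch of the loop)
update c
set    c.jrnentry   = g.jrnentry,
       c.gltrxdate  = g.trxdate,
       c.gldr       = g.debitamt,
       c.glcr       = g.crdtamnt,
       c.gldexrowid = g.dex_row_id
from   gl_sl_cmpr c
join   gl20000 g
  on   c.sopno    = g.orctrnum
 and   c.actindex = g.actindx
 and   c.trxsorce = case when rtrim(g.orgntsrc) = '' then g.ortrxsrc else g.orgntsrc end
where  g.trxdate between @dtFrom and @dtTo
  and  g.series <> 2;                              -- plus your other filters (account list, correspondingunit, ...)

-- 2) GL rows with no match: insert them (the reccnt = 0 branch)
insert into gl_sl_cmpr (jrnentry, gltrxdate, sopno, actindex, trxsorce, gldr, glcr, gldexrowid)
select g.jrnentry, g.trxdate, g.orctrnum, g.actindx,
       case when rtrim(g.orgntsrc) = '' then g.ortrxsrc else g.orgntsrc end,
       g.debitamt, g.crdtamnt, g.dex_row_id
from   gl20000 g
where  g.trxdate between @dtFrom and @dtTo
  and  g.series <> 2                               -- plus your other filters
  and  not exists (select 1 from gl_sl_cmpr c
                   where c.sopno    = g.orctrnum
                     and c.actindex = g.actindx
                     and c.trxsorce = case when rtrim(g.orgntsrc) = '' then g.ortrxsrc else g.orgntsrc end);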
The need to optimize rises from a bad design. My articles[^]
-
I tried moving the entire logic to the server side but still did not see any improvement. The problem is the huge amount of data being handled in a loop; I need a way to eliminate the loop. If I have to redo the loop with a select statement, as we did in the previous case, I have a problem: I am taking the value of one of 2 fields based on a field's value. I am giving the code below; let me know if this can be handled in a select query directly. I think we can use a switch/case in the select statement, but I am not sure.
SELECT JRNENTRY,trxdate,trxsorce,USWHPSTD,ortrxsrc,orgntsrc,orctrnum,debitamt,
crdtamnt,Dex_row_id,actindx,
(select rtrim(actnumbr_1) + '-'+rtrim(actnumbr_2) + '-'+
rtrim(actnumbr_3) + '-' + rtrim(actnumbr_4) + '-' +
rtrim(actnumbr_5) + '-' +
rtrim(actnumbr_6) + '-' + rtrim(actnumbr_7) from gl00100 g1
where g1.actindx = g.actindx) 'account',
origseqnum FROM GL20000 g WHERE trxdate between convert(datetime,'01/04/2007',103) and
convert(datetime,'31/03/2008',103) and
actindx in (4818,25,3189) and series <> 2
and correspondingunit <> 1
The above query gets me a record set. In the loop over that record set I take the values of a few fields for insertion or updating, based on the value of a field. The logic is as below:
for (int i = 0; i < ds.Tables["InGL"].Rows.Count; i++)
{
strOrgnTrxSorce = ds.Tables["InGL"].Rows[i]["orgntsrc"].ToString();
strOrgnDocNo = ds.Tables["InGL"].Rows[i]["orctrnum"].ToString();
strOrTrxSrc = ds.Tables["InGL"].Rows[i]["ortrxsrc"].ToString();
strSource = strOrgnTrxSorce.Trim();
if (strOrgnTrxSorce.Trim() == "")
strSource = strOrTrxSrc;
    cmd.CommandText = "select count(*) from gl_sl_cmpr where sopno='" + strOrgnDocNo + "'" +
                      " and actindex ='" + strActIndex + "' and trxsorce='" + strSource + "'";
}
Based on the count from the "select count(*) ..." query I will either insert into or update a table. Now my problem is: is there any way of including the above logic in a single SQL statement? Also, you suggested using loops inside a stored procedure if necessary. I tried using a cursor in a stored procedure, moving the logic out of .NET, but it still takes the same time.
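(On the question of taking the value of one of two fields directly in the select: yes, a CASE expression can do that inline. A minimal sketch, using the two source columns from the query above:)
select g.jrnentry,
       g.orctrnum,
       case when rtrim(g.orgntsrc) = '' then g.ortrxsrc
            else g.orgntsrc
       end as source          -- same choice as the if/else on strOrgnTrxSorce in the loop
from   gl20000 g;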
-
I modified the loop in the application as follows, but I have a problem here: I cannot put an insert/update command inside a CASE. Is there any way to solve this? An alternative solution would help a lot. The individual queries work correctly; I tested them. The following is the code:
select
case (select count(*) from test a
where a.trxsorce = case rtrim(b.orgntsrc)
when '' then
b.ortrxsrc
else b.orgntsrc
end
and a.act=b.actindx and a.sopno = b.orctrnum)
when 0 then
insert into test
(gltrxdate,jrnentry,source,glact,glactindex,uswhpstd,ortrxsrc,
orgctrnum,gldr,glcr,gldexrowid,status)
select trxdate,jrnentry,trxsorce,account,actindx,uswhpstd,orgntsrc,
orctrnum,debitamt,crdtamnt,dex_row_id,2
from gl20000 where dex_row_id = b.dex_row_id
when 1 then
update test set
gltrxdate = b.trxdate,jrnentry = b.jrnentry,
source = b.trxsorce,glact = b.account,
glactindex = b.actindx,uswhpstd = b.uswhpstd,
ortrxsrc = b.orgntsrc,orgctrnum = b.orctrnum,
gldr = b.debitamt,
glcr = b.crdtamnt,gldexrowid = b.dex_row_id,status =7
where sopno = b.orctrnum and actindx = b.actindx and
trxsorce = b.trxsorce
else
insert into test
(gltrxdate,jrnentry,source,glact,glactindex,uswhpstd,ortrxsrc,
orgctrnum,gldr,glcr,gldexrowid,status)
select trxdate,jrnentry,trxsorce,account,actindx,uswhpstd,orgntsrc,
orctrnum,debitamt,crdtamnt,dex_row_id,7
from gl20000 where dex_row_id = b.dex_row_id
end
from gl20000 b where
trxdate between convert(datetime,'01/04/2007',103) and
convert(datetime,'30/04/2007',103) and actindx in
(select actindx from gl00100 where actnumbr_6=1300) and
series <> 2 and correspondingunit <> 1
-
Looking at it quickly, you just have to split this into two separate statements: one update and one insert.
The need to optimize rises from a bad design. My articles[^]
How do I split them? An insert or update statement is not allowed inside a case expression.
-
Uma Kameswari wrote:
How do I split them? An insert or update statement is not allowed inside a case expression
Don't use a single statement. Create a separate insert statement and a separate update statement, and execute them separately. Instead of using a case in the select, use where conditions, like:
insert into table (columns)
select columns
from anothertable
where ConditionsToBeMet
and
update table
set columns
where ConditionsToBeMet
Just make the conditions so that they define the logic you had in the case statement.
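Applied to the statement from your earlier post, a sketch could look roughly like this. It reuses your column names but is untested against your schema, it leaves out the derived glact/account column (add it back as a subselect on gl00100 the same way as before), and it covers only the "0" and "1" branches, so your count-greater-than-one branch would need a third, similar insert:
-- Rows that already exist in test: update them (your "when 1" branch).
update a
set    a.gltrxdate  = b.trxdate,
       a.jrnentry   = b.jrnentry,
       a.source     = b.trxsorce,
       a.glactindex = b.actindx,
       a.uswhpstd   = b.uswhpstd,
       a.gldr       = b.debitamt,
       a.glcr       = b.crdtamnt,
       a.gldexrowid = b.dex_row_id,
       a.status     = 7
from   test a
join   gl20000 b
  on   a.sopno = b.orctrnum
 and   a.act   = b.actindx
 and   a.trxsorce = case when rtrim(b.orgntsrc) = '' then b.ortrxsrc else b.orgntsrc end
where  b.trxdate between convert(datetime,'01/04/2007',103) and convert(datetime,'30/04/2007',103)
  and  b.actindx in (select actindx from gl00100 where actnumbr_6 = 1300)
  and  b.series <> 2 and b.correspondingunit <> 1;

-- Rows with no match in test yet: insert them (your "when 0" branch).
insert into test (gltrxdate, jrnentry, source, glactindex, uswhpstd, ortrxsrc,
                  orgctrnum, gldr, glcr, gldexrowid, status)
select b.trxdate, b.jrnentry, b.trxsorce, b.actindx, b.uswhpstd, b.orgntsrc,
       b.orctrnum, b.debitamt, b.crdtamnt, b.dex_row_id, 2
from   gl20000 b
where  b.trxdate between convert(datetime,'01/04/2007',103) and convert(datetime,'30/04/2007',103)
  and  b.actindx in (select actindx from gl00100 where actnumbr_6 = 1300)
  and  b.series <> 2 and b.correspondingunit <> 1
  and  not exists (select 1 from test a
                   where a.sopno = b.orctrnum
                     and a.act   = b.actindx
                     and a.trxsorce = case when rtrim(b.orgntsrc) = '' then b.ortrxsrc else b.orgntsrc end);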
The need to optimize rises from a bad design. My articles[^]