SQL query to read all the records one by one

Yvon Bourgouin

In fact, I had something like SELECT [IJK],[XC],[YC],[ZC] FROM [MSDP]; and it was taking 10 to 15 minutes to get the records. I tried to change it to SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] ORDER BY [IJK]. It's fast for the first 200 records. To get the following records, I use SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] WHERE [IJK] NOT IN (SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] ORDER BY [IJK]) ORDER BY [IJK]; the second 200 changes in a loop to 400,600 and so on. At the beginning, it's quite fast but the performances are degrading quite fast. Any help?

R Giskard Reventlov

Does it need to be ordered? If it does at least put an index on [IJK] that should help. Consider indexing the other columns. Have you created an execution plan? How does that look?

me, me, me "The dinosaurs became extinct because they didn't have a space program. And if we become extinct because we don't have a space program, it'll serve us right!" Larry Niven nils illegitimus carborundum

Scubapro

This way your selecting all records (however in parts though). Where's your filter (WHERE CLAUSE), since you need only some records?

J4amieC

Yvon Bourgouin wrote:

million of records

Yvon Bourgouin wrote:

I would like to read the records one by one

Can you spot the problem here. Trying to iterate over millions of rows will, of course, be slow. Why would you not limit the returned results to the rows you actually want?

Yvon Bourgouin

I didn't put the WHERE clause because it's very complex but it's working.

supercat9

That is going to be really slow, since retrieving e.g. records 4801-5000 will use 5,000 records' worth of processing to retrieve 200 records; retrieving records 49,801-50,000 will use 50,000 records' worth of processing to retrieve 200 records. To get through 1,000,000 records would require processing about 2,500,000,000. To add insult to injury, deleting records from the table may cause some records (not involved in the deletion) to be skipped. Adding records may cause some records to be duplicated, but that's a far smaller problem. If the value in column IJK uniquely identifies records, use "SELECT TOP 200 ... WHERE [IJK] > @lastijk" where parameter @lastijk is the last value you retrieved. I'd guess that IJK is probably unique, since if it isn't your code won't work properly at any speed, but if it is not unique, use "SELECT TOP 200 ... WHERE [IJK] > @lastijk" followed by "SELECT ... WHERE [IJK] = @lastijk" (no TOP 200 qualifier there, and use the last record from the first query for @lastijk); all of the records where IJK equals @lastijk will appear in the second query, so you should ignore any items from the first query where IJK equals @lastijk. Incidentally, to get reasonable performance, you'll need to have an index--preferably clustered--on @lastijk. I'd guess you probably do have one, or else your performance would be really slow for even the first 200 records.

David Skelly

Yvon Bourgouin wrote:

At the beginning, it's quite fast but the performances are degrading quite fast

I'm not surprised. Think about what you are asking the database to do here: SELECT TOP 200... SELECT TOP 200 WHERE NOT IN (SELECT TOP 200...) SELECT TOP 200 WHERE NOT IN (SELECT TOP 400...) SELECT TOP 200 WHERE NOT IN (SELECT TOP 600...) ... repeat lots of times ... SELECT TOP 200 WHERE NOT IN (SELECT TOP 1,000,000...) If I understand you correctly, you are running the same query over and over, asking the database to do more work every time. You say that you are only processing a small number of the rows. So, why fetch them all? Why not put something into the WHERE clause to cut down the number of rows you select? Once you have fetched these rows, how do you decide which ones you will process and which ones you will ignore and what is stopping you from moving that selection logic back to the database? Sorry if that's obvious, I'm sure you have considered it, but I don't quite understand why you don't do that.

i j russell

Returning millions of records takes time. If you can't limit the results via a WHERE clause, you might like to try the following approach;

CREATE PROCEDURE GetData
(
@start INT
)
AS
WITH Query
AS
(
SELECT
ROW_NUMBER() OVER
(
ORDER BY [IJK] ASC
) as position,
[IJK],
[XC],
[YC],
[ZC]
FROM [MSDP]
)
SELECT [IJK],
[XC],
[YC],
[ZC]
FROM Query
WHERE position BETWEEN @start AND @start + 200

To run the sp, you use

EXEC GetData 200000

[Replace the 200000 with the start value]. You could also try returning more than 200 records by changing the sp.

Yvon Bourgouin

Thanks, that might solve my problem...

Yvon Bourgouin

Thanks, looks good to me, the performance was degrading quickly, this would probably fix it...

Johan Hakkesteegt

Your logic must be really really really complex, if it can not be translated to a WHERE clause. However assuming that you are right, you can always at least cut out some part of the records with the WHERE clause. Look at the search logic you have written in your application, find the single largest search parameter, and translate only that to sql.

My advice is free, and you may get what you paid for.