SQL query to read all the records one by one
-
Hi everyone, I'm using a huge database of geostatistical blocks (million of records). I would like to read the records one by one and process only some. I tried the datareader but it's slowing terribly. Is there a read next or something like that in SQL? I'm programming in C#. Thanks in advance, Yvon
If you are only processing a subset of records why not cut down the size of the result set by using a where clause to filter out those records you will not need? (Sorry if you already know that but your question isn't clear). Like:
select * from [table] where [column] = 'foo'
It's best to bring back only those records you'll actually need: perhaps if you were to explain your need a little more clearly: e.g. are you intending to do this in Sql or c#?me, me, me "The dinosaurs became extinct because they didn't have a space program. And if we become extinct because we don't have a space program, it'll serve us right!" Larry Niven nils illegitimus carborundum
-
If you are only processing a subset of records why not cut down the size of the result set by using a where clause to filter out those records you will not need? (Sorry if you already know that but your question isn't clear). Like:
select * from [table] where [column] = 'foo'
It's best to bring back only those records you'll actually need: perhaps if you were to explain your need a little more clearly: e.g. are you intending to do this in Sql or c#?me, me, me "The dinosaurs became extinct because they didn't have a space program. And if we become extinct because we don't have a space program, it'll serve us right!" Larry Niven nils illegitimus carborundum
In fact, I had something like SELECT [IJK],[XC],[YC],[ZC] FROM [MSDP]; and it was taking 10 to 15 minutes to get the records. I tried to change it to SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] ORDER BY [IJK]. It's fast for the first 200 records. To get the following records, I use SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] WHERE [IJK] NOT IN (SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] ORDER BY [IJK]) ORDER BY [IJK]; the second 200 changes in a loop to 400,600 and so on. At the beginning, it's quite fast but the performances are degrading quite fast. Any help?
-
In fact, I had something like SELECT [IJK],[XC],[YC],[ZC] FROM [MSDP]; and it was taking 10 to 15 minutes to get the records. I tried to change it to SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] ORDER BY [IJK]. It's fast for the first 200 records. To get the following records, I use SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] WHERE [IJK] NOT IN (SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] ORDER BY [IJK]) ORDER BY [IJK]; the second 200 changes in a loop to 400,600 and so on. At the beginning, it's quite fast but the performances are degrading quite fast. Any help?
Does it need to be ordered? If it does at least put an index on [IJK] that should help. Consider indexing the other columns. Have you created an execution plan? How does that look?
me, me, me "The dinosaurs became extinct because they didn't have a space program. And if we become extinct because we don't have a space program, it'll serve us right!" Larry Niven nils illegitimus carborundum
-
In fact, I had something like SELECT [IJK],[XC],[YC],[ZC] FROM [MSDP]; and it was taking 10 to 15 minutes to get the records. I tried to change it to SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] ORDER BY [IJK]. It's fast for the first 200 records. To get the following records, I use SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] WHERE [IJK] NOT IN (SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] ORDER BY [IJK]) ORDER BY [IJK]; the second 200 changes in a loop to 400,600 and so on. At the beginning, it's quite fast but the performances are degrading quite fast. Any help?
-
Hi everyone, I'm using a huge database of geostatistical blocks (million of records). I would like to read the records one by one and process only some. I tried the datareader but it's slowing terribly. Is there a read next or something like that in SQL? I'm programming in C#. Thanks in advance, Yvon
Yvon Bourgouin wrote:
million of records
Yvon Bourgouin wrote:
I would like to read the records one by one
Can you spot the problem here. Trying to iterate over millions of rows will, of course, be slow. Why would you not limit the returned results to the rows you actually want?
-
This way your selecting all records (however in parts though). Where's your filter (WHERE CLAUSE), since you need only some records?
I didn't put the WHERE clause because it's very complex but it's working.
-
In fact, I had something like SELECT [IJK],[XC],[YC],[ZC] FROM [MSDP]; and it was taking 10 to 15 minutes to get the records. I tried to change it to SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] ORDER BY [IJK]. It's fast for the first 200 records. To get the following records, I use SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] WHERE [IJK] NOT IN (SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] ORDER BY [IJK]) ORDER BY [IJK]; the second 200 changes in a loop to 400,600 and so on. At the beginning, it's quite fast but the performances are degrading quite fast. Any help?
That is going to be really slow, since retrieving e.g. records 4801-5000 will use 5,000 records' worth of processing to retrieve 200 records; retrieving records 49,801-50,000 will use 50,000 records' worth of processing to retrieve 200 records. To get through 1,000,000 records would require processing about 2,500,000,000. To add insult to injury, deleting records from the table may cause some records (not involved in the deletion) to be skipped. Adding records may cause some records to be duplicated, but that's a far smaller problem. If the value in column IJK uniquely identifies records, use "SELECT TOP 200 ... WHERE [IJK] > @lastijk" where parameter @lastijk is the last value you retrieved. I'd guess that IJK is probably unique, since if it isn't your code won't work properly at any speed, but if it is not unique, use "SELECT TOP 200 ... WHERE [IJK] > @lastijk" followed by "SELECT ... WHERE [IJK] = @lastijk" (no TOP 200 qualifier there, and use the last record from the first query for @lastijk); all of the records where IJK equals @lastijk will appear in the second query, so you should ignore any items from the first query where IJK equals @lastijk. Incidentally, to get reasonable performance, you'll need to have an index--preferably clustered--on @lastijk. I'd guess you probably do have one, or else your performance would be really slow for even the first 200 records.
-
In fact, I had something like SELECT [IJK],[XC],[YC],[ZC] FROM [MSDP]; and it was taking 10 to 15 minutes to get the records. I tried to change it to SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] ORDER BY [IJK]. It's fast for the first 200 records. To get the following records, I use SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] WHERE [IJK] NOT IN (SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] ORDER BY [IJK]) ORDER BY [IJK]; the second 200 changes in a loop to 400,600 and so on. At the beginning, it's quite fast but the performances are degrading quite fast. Any help?
Yvon Bourgouin wrote:
At the beginning, it's quite fast but the performances are degrading quite fast
I'm not surprised. Think about what you are asking the database to do here: SELECT TOP 200... SELECT TOP 200 WHERE NOT IN (SELECT TOP 200...) SELECT TOP 200 WHERE NOT IN (SELECT TOP 400...) SELECT TOP 200 WHERE NOT IN (SELECT TOP 600...) ... repeat lots of times ... SELECT TOP 200 WHERE NOT IN (SELECT TOP 1,000,000...) If I understand you correctly, you are running the same query over and over, asking the database to do more work every time. You say that you are only processing a small number of the rows. So, why fetch them all? Why not put something into the WHERE clause to cut down the number of rows you select? Once you have fetched these rows, how do you decide which ones you will process and which ones you will ignore and what is stopping you from moving that selection logic back to the database? Sorry if that's obvious, I'm sure you have considered it, but I don't quite understand why you don't do that.
-
In fact, I had something like SELECT [IJK],[XC],[YC],[ZC] FROM [MSDP]; and it was taking 10 to 15 minutes to get the records. I tried to change it to SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] ORDER BY [IJK]. It's fast for the first 200 records. To get the following records, I use SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] WHERE [IJK] NOT IN (SELECT TOP 200 [IJK],[XC],[YC],[ZC] FROM [MSDP] ORDER BY [IJK]) ORDER BY [IJK]; the second 200 changes in a loop to 400,600 and so on. At the beginning, it's quite fast but the performances are degrading quite fast. Any help?
Returning millions of records takes time. If you can't limit the results via a WHERE clause, you might like to try the following approach;
CREATE PROCEDURE GetData
(
@start INT
)
AS
WITH Query
AS
(
SELECT
ROW_NUMBER() OVER
(
ORDER BY [IJK] ASC
) as position,
[IJK],
[XC],
[YC],
[ZC]
FROM [MSDP]
)
SELECT [IJK],
[XC],
[YC],
[ZC]
FROM Query
WHERE position BETWEEN @start AND @start + 200To run the sp, you use
EXEC GetData 200000
[Replace the 200000 with the start value]. You could also try returning more than 200 records by changing the sp.
-
Returning millions of records takes time. If you can't limit the results via a WHERE clause, you might like to try the following approach;
CREATE PROCEDURE GetData
(
@start INT
)
AS
WITH Query
AS
(
SELECT
ROW_NUMBER() OVER
(
ORDER BY [IJK] ASC
) as position,
[IJK],
[XC],
[YC],
[ZC]
FROM [MSDP]
)
SELECT [IJK],
[XC],
[YC],
[ZC]
FROM Query
WHERE position BETWEEN @start AND @start + 200To run the sp, you use
EXEC GetData 200000
[Replace the 200000 with the start value]. You could also try returning more than 200 records by changing the sp.
Thanks, that might solve my problem...
-
That is going to be really slow, since retrieving e.g. records 4801-5000 will use 5,000 records' worth of processing to retrieve 200 records; retrieving records 49,801-50,000 will use 50,000 records' worth of processing to retrieve 200 records. To get through 1,000,000 records would require processing about 2,500,000,000. To add insult to injury, deleting records from the table may cause some records (not involved in the deletion) to be skipped. Adding records may cause some records to be duplicated, but that's a far smaller problem. If the value in column IJK uniquely identifies records, use "SELECT TOP 200 ... WHERE [IJK] > @lastijk" where parameter @lastijk is the last value you retrieved. I'd guess that IJK is probably unique, since if it isn't your code won't work properly at any speed, but if it is not unique, use "SELECT TOP 200 ... WHERE [IJK] > @lastijk" followed by "SELECT ... WHERE [IJK] = @lastijk" (no TOP 200 qualifier there, and use the last record from the first query for @lastijk); all of the records where IJK equals @lastijk will appear in the second query, so you should ignore any items from the first query where IJK equals @lastijk. Incidentally, to get reasonable performance, you'll need to have an index--preferably clustered--on @lastijk. I'd guess you probably do have one, or else your performance would be really slow for even the first 200 records.
Thanks, looks good to me, the performance was degrading quickly, this would probably fix it...
-
I didn't put the WHERE clause because it's very complex but it's working.
Your logic must be really really really complex, if it can not be translated to a WHERE clause. However assuming that you are right, you can always at least cut out some part of the records with the WHERE clause. Look at the search logic you have written in your application, find the single largest search parameter, and translate only that to sql.
My advice is free, and you may get what you paid for.