Stored Procedure to Handle CSV
-
I have done some searching, and I do understand how to do some of what I want to do. But I can't just do a bulk insert into the database I'm already using.
Sample Data:
Real Estate Number,Owner,Street Number,Street,Street Type,Street Direction,Unit,City,Zip Code
004306-5000,MANWARREN JAMES P,10933,BRIDGES,RD,,,Jacksonville,32225-
013092-0164,WILSON DANTE J,8438,MC GIRTS VILLAGE,LN,,,Jacksonville,32225
013100-1035,LEWIS GARNET A,4435,JACKSON HOLE,CT,,,Jacksonville,32225
048496-0040,ALLEN JON R,2830,WICKWIRE,ST,,2,Jacksonville,32225-
112862-1000,CITY OF JACKSONVILLE,0,FT CAROLINE,RD,,,Jacksonville,32225
112863-0000,ONAS CORPORATION,0,FT CAROLINE,RD,,,Jacksonville,32225
Database structure:
SELECT [PropertyID]
,[RealEstateNumber]
,[Name]
,[StreetNumber]
,[StreetName]
,[Type]
,[Direction]
,[Unit]
,[City]
,[ZipCode]
,[LastReviewed]
FROM [DuvalFL].[dbo].[PropertyLookupInformation]
PropertyID is a uniqueidentifier that defaults to newid(), and LastReviewed is a datetime. I need to get the CSV file (the file name changes each time) into a temp table. I don't have a problem with an actual table being created for the temporary insert and then dropped at the end of the procedure. The Zip Code field needs to be truncated to 5 characters (some of the values have a "-" after the first 5 digits). If the Real Estate Number already exists in the database, the record should be updated; otherwise a new record should be inserted. I could do all this from my program, but the DB is faster than using for or foreach statements to process each record.
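For reference, the destination table implied by the SELECT above presumably looks something like the sketch below. Only the PropertyID default and the LastReviewed datetime are stated; the column sizes and nullability here are guesses.
CREATE TABLE [dbo].[PropertyLookupInformation](
    [PropertyID] uniqueidentifier NOT NULL DEFAULT NEWID(), -- stated: uniqueidentifier, defaults to newid()
    [RealEstateNumber] nvarchar(11) NOT NULL,               -- sizes below are assumptions
    [Name] nvarchar(100) NULL,
    [StreetNumber] nvarchar(10) NULL,
    [StreetName] nvarchar(50) NULL,
    [Type] nvarchar(10) NULL,
    [Direction] nvarchar(10) NULL,
    [Unit] nvarchar(20) NULL,
    [City] nvarchar(20) NULL,
    [ZipCode] nvarchar(10) NULL,
    [LastReviewed] datetime NULL                            -- stated: datetime
)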
-
I'm pretty sure you will not be able to use the table with PropertyID as the target for bulk copy; the column count, and I think the format, must match exactly for bulk copy to work. I load everything into a temp table designed specifically to receive the data (all fields are varchar), and after loading I run a stored proc to do the transforms into the final table. This article may help: CSV to Table to CSV [^]
Never underestimate the power of human stupidity RAH
-
I know the bulk insert won't work on the CSV itself, and I know that the code below is incorrect for much of what I want. The problem is I don't know how to make it all work together. The first part should work as is; I just need to pass the file name to the procedure. Edited: the first part here now actually works, plus I've added in all the normalization updates I know need to be there.
CREATE TABLE [dbo].[TempCsvImport](
[RealEstateNumber] [nvarchar](11) NOT NULL,
[Name] [nvarchar](100) NOT NULL,
[StreetNumber] [nvarchar](10) NOT NULL,
[StreetName] [nvarchar](50) NOT NULL,
[Type] [nvarchar](10) NOT NULL,
[Direction] [nvarchar](10) NULL,
[Unit] [nvarchar](20) NULL,
[City] [nvarchar](20) NOT NULL,
[ZipCode] [nvarchar](10) NOT NULL)
GO
SET QUOTED_IDENTIFIER OFF
DECLARE @fn varchar(8000)
SELECT @fn = 'C:\Users\owner\Documents\Visual Studio 2010\Projects\UPA Data Miner\UPA Data Miner\bin\Debug\634556001928942124.csv'
DECLARE @cmd varchar(8000)
SET @cmd = "BULK INSERT dbo.[TempCsvImport] FROM '" + @fn + "' WITH(CODEPAGE='RAW',FIELDTERMINATOR = ',', ROWTERMINATOR='\n')"EXEC(@cmd)
GO
UPDATE [dbo].[TempCsvImport]
SET [ZipCode] = SUBSTRING(ZipCode,1,5)
WHERE LEN([ZipCode]) != 5
GO
UPDATE [dbo].TempCsvImport
SET RealEstateNumber = REPLACE(RealEstateNumber,"-","")
WHERE CHARINDEX("-", RealEstateNumber) != 0
The next part is where I have no clue how to make it all work. The '@' parameters all hold information from a record in TempCsvImport.
/*For each record in TempCsvImport*/
SELECT * FROM [dbo].[PropertyLookupInformation]
WHERE [dbo].[PropertyLookupInformation].[RealEstateNumber] = @RealEstateNumber
/*If a record is found*/
UPDATE [dbo].[PropertyLookupInformation]
SET [Name] = @Name
,[StreetNumber] = @StreetNumber
,[StreetName] = @StreetName
,[Type] = @Type
,[Direction] = @Direction
,[Unit] = @Unit
,[City] = @City
,[ZipCode] = @ZipCode
,[LastReviewed] = CURRENT_TIMESTAMP
WHERE RealEstateNumber = @RealEstateNumber
/*Otherwise insert*/
INSERT INTO [dbo].[PropertyLookupInformation]
([PropertyID]
,[RealEstateNumber]
,[Name]
,[StreetNumber]
,[StreetName]
,[Type]
,[Direction]
,[Unit]
,[City]
,[ZipCode]
,[LastReviewed])
VALUES
( NEWID()
,@RealEstateNumber
,@Name
,@StreetNumber
,@StreetName
,@Type
,@Direction
,@Unit
,@City
,@ZipCode
,CURRENT_TIMESTAMP)
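One way to pass the file name in would be a small wrapper procedure along these lines. This is only a sketch: the procedure name and parameter are made up, it uses ordinary single-quoted strings so SET QUOTED_IDENTIFIER OFF isn't needed, and FIRSTROW = 2 assumes the file still carries the header row shown in the sample data (drop it if it doesn't).
CREATE PROCEDURE dbo.LoadTempCsvImport   -- name is a placeholder
    @FilePath nvarchar(260)              -- full path to the CSV, passed in by the caller
AS
BEGIN
    SET NOCOUNT ON;

    -- Build the BULK INSERT dynamically because the file name is only known at runtime
    DECLARE @cmd nvarchar(max);
    SET @cmd = N'BULK INSERT dbo.[TempCsvImport] FROM ''' + @FilePath +
               N''' WITH (CODEPAGE = ''RAW'', FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'', FIRSTROW = 2)';
    EXEC (@cmd);
END
GO
-- Example call (the path here is just an illustration):
EXEC dbo.LoadTempCsvImport @FilePath = N'C:\Import\SomeFile.csv';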
-
I never do this type of operation from within the database; it is too fragile with permissions and with operations that are not part of a database operation, like checking for the source file, moving it to an archive, and general IO work. Break your process into two operations. Does the bulk load work? Now that you have the data in a table (change it to a temp table later): your logic of using variables implies a cursor, something to avoid if possible. What is wrong with a normal update and insert query?
Update D Set D.Field = S.Field
--Select *
From DestTable D inner join SourceTable S on D.Key = S.Key

Insert DestTable
Select Field1....
From SourceTable
Where KeyField not in (select KeyField from DestTable)
Never underestimate the power of human stupidity RAH
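Spelled out against the tables in this thread, that pattern might look like the sketch below. This is not from the original posts; NOT EXISTS is used instead of NOT IN to avoid surprises if any key is NULL.
-- Update rows whose RealEstateNumber already exists in the destination
UPDATE D
SET D.Name         = S.Name,
    D.StreetNumber = S.StreetNumber,
    D.StreetName   = S.StreetName,
    D.Type         = S.Type,
    D.Direction    = S.Direction,
    D.Unit         = S.Unit,
    D.City         = S.City,
    D.ZipCode      = S.ZipCode,
    D.LastReviewed = CURRENT_TIMESTAMP
FROM dbo.PropertyLookupInformation D
INNER JOIN dbo.TempCsvImport S
    ON D.RealEstateNumber = S.RealEstateNumber

-- Insert rows that are not in the destination yet
INSERT INTO dbo.PropertyLookupInformation
    (PropertyID, RealEstateNumber, Name, StreetNumber, StreetName,
     Type, Direction, Unit, City, ZipCode, LastReviewed)
SELECT NEWID(), S.RealEstateNumber, S.Name, S.StreetNumber, S.StreetName,
       S.Type, S.Direction, S.Unit, S.City, S.ZipCode, CURRENT_TIMESTAMP
FROM dbo.TempCsvImport S
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.PropertyLookupInformation D
                  WHERE D.RealEstateNumber = S.RealEstateNumber)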
-
All the code where I create the temp table and load it works like a charm. The update/insert phase is where I have a problem; it comes down to my limited knowledge of SQL. Of course, during all this I found out the data was being inserted wrong, so I have to redo what I've already inserted, which gives me a chance to fix the problem. Thank god it isn't live data. But I finally decided to go with:
SET ANSI_WARNINGS OFF
MERGE PropertyLookupInformation as targetDB
USING TempCsvImport as sourceDB
ON targetDB.RealEstateNumber = sourceDB.RealEstateNumber
WHEN MATCHED
THEN
UPDATE
SET Name = sourceDB.Name
,StreetNumber = sourceDB.StreetNumber
,StreetName = sourceDB.StreetName
,Type = sourceDB.Type
,Direction = sourceDB.Direction
,Unit = sourceDB.Unit
,City = sourceDB.City
,ZipCode = sourceDB.ZipCode
,LastReviewed = CURRENT_TIMESTAMP
WHEN NOT MATCHED BY TARGET
THEN
INSERT
(PropertyID
,RealEstateNumber
,Name
,StreetNumber
,StreetName
,Type
,Direction
,Unit
,City
,ZipCode
,LastReviewed)
VALUES
( NEWID()
,sourceDB.RealEstateNumber
,sourceDB.Name
,sourceDB.StreetNumber
,sourceDB.StreetName
,sourceDB.Type
,sourceDB.Direction
,sourceDB.Unit
,sourceDB.City
,sourceDB.ZipCode
,CURRENT_TIMESTAMP)
;
The only thing I'm not sure about with doing it this way is performance. Now to get it put into a stored proc, and we are set and ready to go.
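A possible shape for that procedure, tying the earlier pieces together, is sketched below. The name and parameter are placeholders, a #temp table is used so it cleans itself up when the proc ends, and the load step is the same dynamic BULK INSERT idea shown earlier.
CREATE PROCEDURE dbo.ImportPropertyCsv   -- name is a placeholder
    @FilePath nvarchar(260)
AS
BEGIN
    SET NOCOUNT ON;

    -- Staging table; a #temp table is dropped automatically when the proc ends
    CREATE TABLE #CsvImport(
        RealEstateNumber nvarchar(11)  NOT NULL,
        Name             nvarchar(100) NOT NULL,
        StreetNumber     nvarchar(10)  NOT NULL,
        StreetName       nvarchar(50)  NOT NULL,
        Type             nvarchar(10)  NOT NULL,
        Direction        nvarchar(10)  NULL,
        Unit             nvarchar(20)  NULL,
        City             nvarchar(20)  NOT NULL,
        ZipCode          nvarchar(10)  NOT NULL);

    -- Load the file; dynamic SQL because the name changes each run
    -- (FIRSTROW = 2 assumes the file includes the header row from the sample)
    DECLARE @cmd nvarchar(max) =
        N'BULK INSERT #CsvImport FROM ''' + @FilePath +
        N''' WITH (CODEPAGE = ''RAW'', FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'', FIRSTROW = 2)';
    EXEC (@cmd);

    -- Normalize: 5-character zip, strip the dash from the real estate number
    UPDATE #CsvImport SET ZipCode = LEFT(ZipCode, 5) WHERE LEN(ZipCode) > 5;
    UPDATE #CsvImport SET RealEstateNumber = REPLACE(RealEstateNumber, '-', '')
    WHERE CHARINDEX('-', RealEstateNumber) > 0;

    -- Upsert into the real table (the same MERGE as above, pointed at #CsvImport)
    MERGE dbo.PropertyLookupInformation AS targetDB
    USING #CsvImport AS sourceDB
        ON targetDB.RealEstateNumber = sourceDB.RealEstateNumber
    WHEN MATCHED THEN
        UPDATE SET Name = sourceDB.Name,
                   StreetNumber = sourceDB.StreetNumber,
                   StreetName = sourceDB.StreetName,
                   Type = sourceDB.Type,
                   Direction = sourceDB.Direction,
                   Unit = sourceDB.Unit,
                   City = sourceDB.City,
                   ZipCode = sourceDB.ZipCode,
                   LastReviewed = CURRENT_TIMESTAMP
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (PropertyID, RealEstateNumber, Name, StreetNumber, StreetName,
                Type, Direction, Unit, City, ZipCode, LastReviewed)
        VALUES (NEWID(), sourceDB.RealEstateNumber, sourceDB.Name, sourceDB.StreetNumber,
                sourceDB.StreetName, sourceDB.Type, sourceDB.Direction, sourceDB.Unit,
                sourceDB.City, sourceDB.ZipCode, CURRENT_TIMESTAMP);
END
GO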
-
Brownie points for using MERGE; I dislike it only because I consider separate update/insert queries simpler to support. You may be right to suspect the performance, but I have no helpful info on that.
Never underestimate the power of human stupidity RAH
-
When I've used BCP in the past, I've created a special table to hold the raw data (all as varchar) and used a trigger to move (and convert) the data to the real destination. This technique also allows for adding foreign keys as necessary. On the other hand, I only use BCP for occasional imports; if I have data I need to import frequently, I write a console app or Windows Service to do it.
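A minimal sketch of that trigger idea, using the staging table from this thread as the raw-data table, might look like this. The trigger name is made up, and only the insert of new keys is shown; matched keys would be handled by an UPDATE (or the MERGE shown earlier) in the same trigger. Note that BULK INSERT only fires insert triggers when the FIRE_TRIGGERS option is specified.
CREATE TRIGGER trg_TempCsvImport_Move   -- hypothetical name
ON dbo.TempCsvImport
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Convert and move the rows that just arrived in the raw table
    INSERT INTO dbo.PropertyLookupInformation
        (PropertyID, RealEstateNumber, Name, StreetNumber, StreetName,
         Type, Direction, Unit, City, ZipCode, LastReviewed)
    SELECT NEWID(),
           REPLACE(i.RealEstateNumber, '-', ''),   -- strip the dash
           i.Name, i.StreetNumber, i.StreetName,
           i.Type, i.Direction, i.Unit, i.City,
           LEFT(i.ZipCode, 5),                     -- 5-character zip
           CURRENT_TIMESTAMP
    FROM inserted AS i
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.PropertyLookupInformation p
                      WHERE p.RealEstateNumber = REPLACE(i.RealEstateNumber, '-', ''));
END
GO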