General Architecture query - SSIS / XML / BizTalk [modified]
-
Hi, I am working on a project to process three files: two CSVs and one XML. The files are moved from a file share to a SQL Server database table using BizTalk. The XML file is transformed into the same flat-file format as the two CSVs by a C# component in SSIS, and then all three flat files are processed by SSIS packages. There is a lot of business logic in the SSIS transformations, and the packages also do several lookups through linked servers. All lookups and transforms are done on a row-by-row basis, which is slow. Any errors that occur are put into a separate database table depending on the business object that caused the error (i.e. BusObj1_error, BusObj2_error, BusObj3_error).
Basically, I was hoping someone could suggest a better architecture that would improve performance, allow scalability and flexibility, and let many developers work as a team on the same pieces of functionality. For example:
- Put validation rules in a database rather than hard-coding them in SSIS.
- Instead of using different error tables, use a single error table with an errorTypeId FK to an ErrorType table (sketched below).
- Migrate all transformations from SSIS to C# so that multiple developers can work on different business-logic classes at the same time.
Thanks
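Roughly what I have in mind for the single error table - a minimal sketch only, with every object name illustrative rather than final:

-- Hypothetical schema for the single-error-table idea; all names are placeholders.
CREATE TABLE dbo.ErrorType
(
    ErrorTypeId    int           IDENTITY(1,1) PRIMARY KEY,
    BusinessObject varchar(50)   NOT NULL,  -- e.g. 'BusObj1', 'BusObj2', 'BusObj3'
    Description    varchar(255)  NOT NULL
);

CREATE TABLE dbo.ImportError
(
    ImportErrorId  bigint        IDENTITY(1,1) PRIMARY KEY,
    ErrorTypeId    int           NOT NULL
        CONSTRAINT FK_ImportError_ErrorType
        REFERENCES dbo.ErrorType (ErrorTypeId),
    SourceFile     varchar(260)  NULL,      -- which CSV/XML file the row came from
    RawRow         varchar(4000) NULL,      -- the offending row, kept verbatim for diagnosis
    LoggedAt       datetime      NOT NULL DEFAULT GETDATE()
);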
modified on Wednesday, August 3, 2011 5:28 AM
-
First step when something is "slow": determine what the goal is to make it not "slow" (are you looking for 10% better or 20000% better?). Second step: determine why it is "slow" - specifically, which parts make it slow. Myself, I would suspect SSIS and the linked servers. Putting the raw data directly into the database, in work tables, and then processing it there to produce the final result would probably be much faster.
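As a first measurement, something like the following can show where the time actually goes - a sketch only, and the table and linked-server names here are made up:

-- Run each suspect step in isolation and compare elapsed time / logical reads.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- e.g. a lookup that currently goes through a linked server
-- (dbo.WorkTable and LinkedSrv.RemoteDb.dbo.LookupTable are hypothetical):
SELECT t.*
FROM dbo.WorkTable AS t
JOIN LinkedSrv.RemoteDb.dbo.LookupTable AS l
    ON l.Code = t.Code;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;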
-
Thanks, jschell. What would you say is the ideal architecture for what I have described?
-
Fucking ETL - whoever put the transform in ETL should be taken out and shot. Load the data into a staging table that exactly reflects the source data: NO transforms, no data types; make all the fields varchar(###), big enough to service your data fields. Now you have the data in a platform that is designed to manipulate data.
Write a stored procedure that transforms the data from the staging table to your target table. This gives you a fast, repeatable process that you can manage in TSQL, not in some obscure data/row/field/column transform object in your ETL tool (see the sketch below). If you don't have TSQL skills, hire someone who does; they are not uncommon.
BizTalk, and every other transform tool I have ever had to use, is garbage; I can write a bulk-copy and stored-procedure process that will outperform them every time. The only tool that has come close in performance is Informatica, and that will cost you $100k+ to implement.
Never underestimate the power of human stupidity RAH
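A minimal sketch of the staging-plus-stored-procedure pattern, assuming SQL Server; every object name, the file path, and the two-field column list are placeholders:

-- Staging table mirrors the file: everything varchar, no conversions at load time.
CREATE TABLE dbo.Staging_BusObj1
(
    Field1 varchar(100) NULL,
    Field2 varchar(100) NULL
);
GO

-- Bulk-load the flat file exactly as-is (path and options are placeholders):
BULK INSERT dbo.Staging_BusObj1
FROM '\\fileshare\inbound\busobj1.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);
GO

-- Transform set-based, not row-by-row: one INSERT...SELECT does the type
-- conversions, the lookups (as JOINs), and the load in a single pass.
CREATE PROCEDURE dbo.usp_LoadBusObj1
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.BusObj1 (Field1, Amount, LookupId)
    SELECT  s.Field1,
            CONVERT(decimal(18,2), s.Field2),
            l.LookupId
    FROM    dbo.Staging_BusObj1 AS s
    JOIN    dbo.LookupTable     AS l   -- a local copy of the lookup data,
            ON l.Code = s.Field1;      -- replacing per-row linked-server calls
END
GO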
-
Thanks for the response, Mycroft. I would tend to agree with you on using SQL tables and procs.
-
Mycroft Holmes wrote:
f***ing ETL, whoever put the transform in ETL should be taken out and shot.
LOL, tell us how you really feel. :) Not that I disagree; the tools I've seen in action make Access look like a rockstar (TIBCO and something from Oracle).
Common sense is admitting there is cause and effect and that you can exert some control over what you understand.