Remove all double quotes not directly preceded or directly followed by a semicolon.
-
Hi, I have 3000 large csv files which give an error when i bulk insert them into a sql server table. This is caused by the fact that some text fields, which are surrounded by double quotes sometimes have quotes in them: 1;200;345;"Apotheker "Blue tongue"";"Apeldoorn";12;"ABCD12" 2;232;312;"Café "Blue Oyster";"Rotterdam";33;"DCBA21" Sometimes 1 and sometimes 2 double quotes too many. They need to be removed or replaced by single quotes. Like this: 1;200;345;"Apotheker Blue tongue";"Apeldoorn";12;"ABCD12" 2;232;312;"Café Blue Oyster";"Rotterdam";33;"DCBA21" In short the solution is this: Remove all double quotes not directly preceded or directly followed by a semicolon. I bought RegexBuddy and RegexMagic to help me on my quest but no solution is forthcomming. I want to use powershell to scan all the files and replace where necessary. I hope you can help me. Thanks for your time
-
Hi, I have 3000 large csv files which give an error when i bulk insert them into a sql server table. This is caused by the fact that some text fields, which are surrounded by double quotes sometimes have quotes in them: 1;200;345;"Apotheker "Blue tongue"";"Apeldoorn";12;"ABCD12" 2;232;312;"Café "Blue Oyster";"Rotterdam";33;"DCBA21" Sometimes 1 and sometimes 2 double quotes too many. They need to be removed or replaced by single quotes. Like this: 1;200;345;"Apotheker Blue tongue";"Apeldoorn";12;"ABCD12" 2;232;312;"Café Blue Oyster";"Rotterdam";33;"DCBA21" In short the solution is this: Remove all double quotes not directly preceded or directly followed by a semicolon. I bought RegexBuddy and RegexMagic to help me on my quest but no solution is forthcomming. I want to use powershell to scan all the files and replace where necessary. I hope you can help me. Thanks for your time
-
-
Thanks for your quick response! It seems i have oversimplified my test string. "aul";1;200;"aap"noot";"cafe "'t hoekje"";piet The string can start with a double quote, which is ok. Can you fix this easily?
-
Hi, I have 3000 large csv files which give an error when i bulk insert them into a sql server table. This is caused by the fact that some text fields, which are surrounded by double quotes sometimes have quotes in them: 1;200;345;"Apotheker "Blue tongue"";"Apeldoorn";12;"ABCD12" 2;232;312;"Café "Blue Oyster";"Rotterdam";33;"DCBA21" Sometimes 1 and sometimes 2 double quotes too many. They need to be removed or replaced by single quotes. Like this: 1;200;345;"Apotheker Blue tongue";"Apeldoorn";12;"ABCD12" 2;232;312;"Café Blue Oyster";"Rotterdam";33;"DCBA21" In short the solution is this: Remove all double quotes not directly preceded or directly followed by a semicolon. I bought RegexBuddy and RegexMagic to help me on my quest but no solution is forthcomming. I want to use powershell to scan all the files and replace where necessary. I hope you can help me. Thanks for your time
This is an example of why I always insist on tabs as a separator (not commas nor semi-colons.)
Guus2005 wrote:
Remove all double quotes not directly preceded or directly followed by a semicolon
You are trying to solve this incorrectly.
Guus2005 wrote:
1;200;345;"Apotheker "Blue tongue"";"Apeldoorn";12;"ABCD12"
You do not want to "remove" the double quotes because they are part of the value. The following is the correct value from the above.
Apotheker "Blue tongue"
The pattern for the CSV is as follows 1. Semi-colon separates values. 2. Some values are quoted (double quotes.) For processing for the second case the following applies for the value (not the line but just a value from the line.) 1. The double quotes MUST be at both the end and start of the value. It is ignored if both are not true. 2. The double quotes in that case are removed. Double quotes internal are not impacted. Additionally you need to deal with the potential that there is a semi-colon in the middle of a value. If there is a semi-colon in a value then I doubt you should be using a regex to parse lines. Certainly if I was doing it I would not use a regex. Rather I would build a parser/tokenizer since the rules would be easier to see (and debug). Additionally it would probably be faster also. The tokenizer makes the case with the semi-colon much easier to deal with. The tokenizer rule would be in general 1. Find a semi-colon (start at semi-colon.) 2. If the next character is a double quote, flag a rule that it must look for quote then semi-colon as next break. 3. If the next character is not a double quote, flag a rule that it must look for a semi-colon as next break.