Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. General Programming
  3. Regular Expressions
  4. Remove all double quotes not directly preceded or directly followed by a semicolon.

Remove all double quotes not directly preceded or directly followed by a semicolon.

Scheduled Pinned Locked Moved Regular Expressions
helpdatabasesql-serversysadminwindows-admin
5 Posts 3 Posters 4 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • G Offline
    G Offline
    Guus2005
    wrote on last edited by
    #1

    Hi, I have 3000 large csv files which give an error when i bulk insert them into a sql server table. This is caused by the fact that some text fields, which are surrounded by double quotes sometimes have quotes in them: 1;200;345;"Apotheker "Blue tongue"";"Apeldoorn";12;"ABCD12" 2;232;312;"Café "Blue Oyster";"Rotterdam";33;"DCBA21" Sometimes 1 and sometimes 2 double quotes too many. They need to be removed or replaced by single quotes. Like this: 1;200;345;"Apotheker Blue tongue";"Apeldoorn";12;"ABCD12" 2;232;312;"Café Blue Oyster";"Rotterdam";33;"DCBA21" In short the solution is this: Remove all double quotes not directly preceded or directly followed by a semicolon. I bought RegexBuddy and RegexMagic to help me on my quest but no solution is forthcomming. I want to use powershell to scan all the files and replace where necessary. I hope you can help me. Thanks for your time

    Richard DeemingR J 2 Replies Last reply
    0
    • G Guus2005

      Hi, I have 3000 large csv files which give an error when i bulk insert them into a sql server table. This is caused by the fact that some text fields, which are surrounded by double quotes sometimes have quotes in them: 1;200;345;"Apotheker "Blue tongue"";"Apeldoorn";12;"ABCD12" 2;232;312;"Café "Blue Oyster";"Rotterdam";33;"DCBA21" Sometimes 1 and sometimes 2 double quotes too many. They need to be removed or replaced by single quotes. Like this: 1;200;345;"Apotheker Blue tongue";"Apeldoorn";12;"ABCD12" 2;232;312;"Café Blue Oyster";"Rotterdam";33;"DCBA21" In short the solution is this: Remove all double quotes not directly preceded or directly followed by a semicolon. I bought RegexBuddy and RegexMagic to help me on my quest but no solution is forthcomming. I want to use powershell to scan all the files and replace where necessary. I hope you can help me. Thanks for your time

      Richard DeemingR Offline
      Richard DeemingR Offline
      Richard Deeming
      wrote on last edited by
      #2

      How about using zero-width negative look-ahead/behind assertions?

      (? Demo[^]


      "These people looked deep within my soul and assigned me a number based on the order in which I joined."
      ** - Homer**

      "These people looked deep within my soul and assigned me a number based on the order in which I joined" - Homer

      G 1 Reply Last reply
      0
      • Richard DeemingR Richard Deeming

        How about using zero-width negative look-ahead/behind assertions?

        (? Demo[^]


        "These people looked deep within my soul and assigned me a number based on the order in which I joined."
        ** - Homer**

        G Offline
        G Offline
        Guus2005
        wrote on last edited by
        #3

        Thanks for your quick response! It seems i have oversimplified my test string. "aul";1;200;"aap"noot";"cafe "'t hoekje"";piet The string can start with a double quote, which is ok. Can you fix this easily?

        Richard DeemingR 1 Reply Last reply
        0
        • G Guus2005

          Thanks for your quick response! It seems i have oversimplified my test string. "aul";1;200;"aap"noot";"cafe "'t hoekje"";piet The string can start with a double quote, which is ok. Can you fix this easily?

          Richard DeemingR Offline
          Richard DeemingR Offline
          Richard Deeming
          wrote on last edited by
          #4

          (?Demo[^]


          "These people looked deep within my soul and assigned me a number based on the order in which I joined."
          ** - Homer**

          "These people looked deep within my soul and assigned me a number based on the order in which I joined" - Homer

          1 Reply Last reply
          0
          • G Guus2005

            Hi, I have 3000 large csv files which give an error when i bulk insert them into a sql server table. This is caused by the fact that some text fields, which are surrounded by double quotes sometimes have quotes in them: 1;200;345;"Apotheker "Blue tongue"";"Apeldoorn";12;"ABCD12" 2;232;312;"Café "Blue Oyster";"Rotterdam";33;"DCBA21" Sometimes 1 and sometimes 2 double quotes too many. They need to be removed or replaced by single quotes. Like this: 1;200;345;"Apotheker Blue tongue";"Apeldoorn";12;"ABCD12" 2;232;312;"Café Blue Oyster";"Rotterdam";33;"DCBA21" In short the solution is this: Remove all double quotes not directly preceded or directly followed by a semicolon. I bought RegexBuddy and RegexMagic to help me on my quest but no solution is forthcomming. I want to use powershell to scan all the files and replace where necessary. I hope you can help me. Thanks for your time

            J Offline
            J Offline
            jschell
            wrote on last edited by
            #5

            This is an example of why I always insist on tabs as a separator (not commas nor semi-colons.)

            Guus2005 wrote:

            Remove all double quotes not directly preceded or directly followed by a semicolon

            You are trying to solve this incorrectly.

            Guus2005 wrote:

            1;200;345;"Apotheker "Blue tongue"";"Apeldoorn";12;"ABCD12"

            You do not want to "remove" the double quotes because they are part of the value. The following is the correct value from the above.

            Apotheker "Blue tongue"

            The pattern for the CSV is as follows 1. Semi-colon separates values. 2. Some values are quoted (double quotes.) For processing for the second case the following applies for the value (not the line but just a value from the line.) 1. The double quotes MUST be at both the end and start of the value. It is ignored if both are not true. 2. The double quotes in that case are removed. Double quotes internal are not impacted. Additionally you need to deal with the potential that there is a semi-colon in the middle of a value. If there is a semi-colon in a value then I doubt you should be using a regex to parse lines. Certainly if I was doing it I would not use a regex. Rather I would build a parser/tokenizer since the rules would be easier to see (and debug). Additionally it would probably be faster also. The tokenizer makes the case with the semi-colon much easier to deal with. The tokenizer rule would be in general 1. Find a semi-colon (start at semi-colon.) 2. If the next character is a double quote, flag a rule that it must look for quote then semi-colon as next break. 3. If the next character is not a double quote, flag a rule that it must look for a semi-colon as next break.

            1 Reply Last reply
            0
            Reply
            • Reply as topic
            Log in to reply
            • Oldest to Newest
            • Newest to Oldest
            • Most Votes


            • Login

            • Don't have an account? Register

            • Login or register to search.
            • First post
              Last post
            0
            • Categories
            • Recent
            • Tags
            • Popular
            • World
            • Users
            • Groups