RegEx remove duplicate need help

    manoo88
    wrote:
    #1

    Hi, I'm not good at regex and I'm trying to remove some duplicate values. The data is in a CSV file; part of it looks like this:

    "/content/7/66345/images/590009.jpg , "/content/7/66345/images/590009.jpg , "/attachments/fe519c1e91c5e4983a70a2512fd5788b.jpg , "/content/7/66345/images/590009.jpg , "/content/7/66345/images/590009.jpg , "/attachments/4956e4fe56b59135c086605c9gyye.png
    "/content/1/3968663/images/856609.jpg , "/attachments/086605c7c6e4fe56b59135c11b.jpg , "/content/1/3968663/images/856609.jpg , "/attachments/086605c7c6e4fe56b59135c11b.jpg
    "/content/1/1458767/images/856657.jpg
    "/content/1/1448511/images/856373.jpg

    I am trying the following in Notepad++:

    \w+\.+jpg|\w+\.+png(?:^|\G)(\b\w+\b),?(?=.*\1)

    It selects one image at a time when I click Find, but I am not sure how to delete the duplicates. When I replace with an empty string, it removes everything. I want to keep the first image and delete the duplicates. Can anyone help me with the code, please? I don't want to remove the line, because that could mess up the CSV file; removing just the extension of the duplicate image is OK. Thanks for your help.

      Member 10601191
      wrote:
      #2

      Hi, I'd like some clarification. Given the sample input, please post the expected output, so that I don't misunderstand. I could guess, but I'd prefer not to. Also, must this be done in Notepad++ (and if so, why?), and on which OS? Thanks.

        Andre Oosthuizen
        wrote:
        #3

        The regex you provided is close, but there are a few modifications needed to achieve your desired result. Here's the correct regex and how you can use it in Notepad++ -

        ("\/.*?\.(?:jpg|png))\s*,\s*(?=.*\1)

        I have tested this regex in Notepad++ using the following steps:

        1) Open your CSV file in Notepad++.
        2) Press Ctrl + H to open the "Replace" dialog.
        3) In the "Find what" field, enter the regex: ("\/.*?\.(?:jpg|png))\s*,\s*(?=.*\1)
        4) Leave the "Replace with" field empty.
        5) In the "Search Mode" section, select "Regular expression".
        6) Click "Replace All".

        Make sure you have a backup of your data before performing any find-and-replace operation, just in case.
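To see what the pattern above actually removes, here is a minimal Python sketch that applies the same expression with `re.sub` to the first sample row. Python is an assumption here (the thread itself uses Notepad++), and the names `PATTERN`, `dedupe_with_regex`, `a`, `b` and `c` are made up for illustration. Because the lookahead only succeeds when a later copy exists, the pattern appears to delete the earlier copies and keep the last occurrence of each value, which is not quite the "keep the first" behaviour asked for.

```python
import re

# Same expression as in the post above, unchanged.
PATTERN = re.compile(r'("\/.*?\.(?:jpg|png))\s*,\s*(?=.*\1)')


def dedupe_with_regex(row: str) -> str:
    """Delete every value (plus its trailing comma) that reappears later in the row."""
    return PATTERN.sub("", row)


# The three distinct values from the first sample row.
a = '"/content/7/66345/images/590009.jpg'
b = '"/attachments/fe519c1e91c5e4983a70a2512fd5788b.jpg'
c = '"/attachments/4956e4fe56b59135c086605c9gyye.png'

# First sample row: a, a, b, a, a, c
row = " , ".join([a, a, b, a, a, c])

result = dedupe_with_regex(row)
print(result)
# The surviving order is b, a, c: the last copy of `a` is the one that remains.
```

Since `.` does not cross line ends by default in Python, the comparison stays within a single row when the function is called once per line.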

          jschell
          wrote:
          #4

          Presumably your expectation is the following:

          1. The entire row is duplicated.
          2. The duplicated row immediately follows the first row.

          Otherwise I doubt regex is the way to go.

            trønderen
            wrote:
            #5

            If the ordering of the rows is insignificant, you can simply sort the lines to bring the duplicate rows together. (And, if necessary, sort again on a key field once you are done.) But if you want to remove entire duplicated rows (lines), are you serious about using a regex to compare entire text lines for being identical? That can't be! From the OP's first post, though, I cannot see what he intends to compare and what he intends to remove.
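For the whole-row case discussed here, a short Python sketch (an assumption; nothing in the thread prescribes a language) shows both variants: sorting to bring identical rows together, and an order-preserving pass for when row order matters. The names `unique_lines` and `rows` are hypothetical.

```python
def unique_lines(lines):
    """Drop repeated lines, keeping the first occurrence and the original order."""
    # dict preserves insertion order, so the keys are the first-seen lines.
    return list(dict.fromkeys(lines))


# Placeholder rows standing in for lines of a file.
rows = ["b", "a", "b", "c", "a"]

# Order-preserving: first occurrences only.
in_order = unique_lines(rows)

# Order-insensitive: sort so duplicates become adjacent, then collapse them.
sorted_unique = sorted(set(rows))

print(in_order)
print(sorted_unique)
```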

              jschell
              wrote:
              #6

              trønderen wrote:

              are you serious about using a regex to compare entire text lines for being identical

              Myself? No, I would not have attempted it with regex at all. I probably would have written a one-shot Perl script, not for its regex capabilities but because reading files is easier to set up, and re-running it while iterating on a fix is easier too. I would note that the editor I use does have fairly decent regex support, so a lack of that would not have influenced my decision.
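A one-shot script along those lines might look like the following. The post mentions Perl; Python is used here only as an assumption for illustration, and the function names, the in-memory `sample` input, and the ` , ` separator (copied from the sample data) are likewise hypothetical. The sketch keeps the first occurrence of each value within a row and drops later repeats, without removing any rows. It assumes no value contains a comma.

```python
import io


def dedupe_fields(row: str, sep: str = ",") -> str:
    """Keep the first occurrence of each value in one row, in original order."""
    seen = set()
    kept = []
    for field in row.split(sep):
        value = field.strip()
        if value not in seen:
            seen.add(value)
            kept.append(value)
    return " , ".join(kept)


def dedupe_stream(src, dst) -> None:
    """Read rows from src and write the de-duplicated rows to dst, one per line."""
    for line in src:
        dst.write(dedupe_fields(line.rstrip("\n")) + "\n")


# In-memory stand-in for the CSV file, using two rows from the original post.
sample = io.StringIO(
    '"/content/1/3968663/images/856609.jpg , "/attachments/086605c7c6e4fe56b59135c11b.jpg , '
    '"/content/1/3968663/images/856609.jpg , "/attachments/086605c7c6e4fe56b59135c11b.jpg\n'
    '"/content/1/1458767/images/856657.jpg\n'
)
out = io.StringIO()
dedupe_stream(sample, out)
print(out.getvalue(), end="")
```

For a real file, `sample` and `out` would be replaced by file objects opened for reading and writing; working on a copy of the data is advisable.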
