Repairing broken XML

    Grimolfr wrote (#1):

    I have a 3rd-party app that generates some "almost XML" files that I need to parse. It has elements similar to the following:

    <color
    name = "Black"
    colorspace = "CMYK"
    cyan = 0.000000
    magenta = 0.000000
    yellow = 0.000000
    black = 100.000000
    />

    Notice that the attributes with numeric values aren't quoted as they should be. There are also a few empty elements that appear as <data   > (the element name and three spaces), even though the element is empty and should be written <data />.

    These two deviations from true XML make it impossible for me to simply load the file into an XMLDocument so that I can easily access the elements I need. I don't normally work with XML a whole lot. Does anyone know of a "simple" method or an existing library that can correct these errors as the XML is read from the file?

    I think I can deal with the empty element problem pretty easily with a simple search/replace, as there seems to be only one element in the file that's ever munged this way. The missing quotes problem is much bigger, as 99% of the numeric attributes are broken, in all elements. TIA for any help with this.


    Grim

    (aka Toby)

    MCDBA, MCSD, MCP+SB

    Need a Second Life?

    SELECT * FROM user WHERE clue IS NOT NULL GO

    (0 row(s) affected)

      DavidNohejl wrote (#2):

      Hi. The library you need will (hopefully) be my school work :) Meanwhile, you can check HTML Tidy (http://www.w3.org/People/Raggett/tidy/). It has some XML support.

      Best regards, David 'DNH' Nohejl

      Never forget: "Stay kul and happy" (I.A.)

        Grimolfr wrote (#3):

        Thanks, David. I took a look at Tidy, but since it's specific to HTML and won't process a file with unknown tags, it won't work for me in its existing incarnation. The source code, however, will give me some good insight into how to parse the XML and correct it myself on the fly.


          Philip Fitzsimons wrote (#4):

          I would think that a regular expression would be the best way to fix this.
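
          For anyone finding this thread later, here is a rough sketch of what the regular-expression approach could look like. It is written in Python rather than .NET purely for illustration, and it is not taken from any post above. The function name `repair_almost_xml` and the `empty_elements` parameter are made up for this example. It assumes that no attribute value contains `<` or `>`, and that only the element names you list explicitly should be turned into empty elements.

```python
import re

# A tag is "<", anything except "<" or ">", then ">". Comments, doctype
# declarations, and processing instructions ("<!" / "<?") are skipped.
_TAG = re.compile(r"<(?![!?])[^<>]+>")

# After an "=", match either a value that is already quoted or a bare one.
# Consuming quoted values whole keeps an "=" inside quotes from being touched.
_ATTR = re.compile(r'''(=\s*)("[^"]*"|'[^']*'|[^\s"'<>/]+)''')


def _quote_value(match):
    prefix, value = match.group(1), match.group(2)
    if value[0] in "\"'":
        return match.group(0)            # already quoted: leave as-is
    return prefix + '"' + value + '"'    # bare value: wrap in double quotes


def _quote_attrs(tag_match):
    return _ATTR.sub(_quote_value, tag_match.group(0))


def repair_almost_xml(text, empty_elements=("data",)):
    """Return text with bare attribute values quoted and the named
    elements rewritten from "<name   >" to "<name />"."""
    # Step 1: only the explicitly named elements are self-closed, because
    # "<foo >" is also a legal start tag for an element that has content.
    if empty_elements:
        names = "|".join(re.escape(name) for name in empty_elements)
        text = re.sub(rf"<({names})\s+>", r"<\1 />", text)
    # Step 2: quote bare attribute values, one tag at a time, so that
    # "=" characters in ordinary text content are never modified.
    return _TAG.sub(_quote_attrs, text)
```

          The repaired string can then be handed to a normal XML parser. Working tag by tag, rather than running one pattern over the whole file, is what keeps the text between elements untouched.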


          "When the only tool you have is a hammer, a sore thumb you will have."
