xml regex (for php)

    fdsfsa76f7sa6
    wrote on last edited by
    #1

    I'd like to extract data from a Blogspot feed. I've only used regex for Rainmeter, so this is what I came up with:

    (?siU)</id><published>(.*)</published><updated>.*</updated><title type='text'>(.*)</title>.*<link rel='alternate' type='text/html' href='(.*)'

    I assume the "(?siU)" part is wrong. What would be the correct format?

    I've also heard about PHP's xml_parser, but I think regex is faster. Still, how would I extract the same data as in the above (broken) regex with xml_parser in PHP?

    Thanks in advance!

      AspDotNetDev
      wrote on last edited by
      #2

      You should be a little clearer about exactly what you are trying to do with that question mark, but here are some things to keep in mind.

      If you have well-formed XML, an XML parser is almost certainly the way to go. It might actually be faster than a regular expression. Unfortunately, I'm not familiar with PHP's XML parser, but you should take the time to familiarize yourself with it.

      On its own, the question mark means "the preceding item is optional". Right after an opening paren there is nothing preceding it, so there it introduces a special group instead, and I'm not exactly sure what you're after there. Depending on the regular expression engine you use, that syntax gives you positive and negative lookaheads and lookbehinds, and named groups. Or if you put a backslash to the left of the question mark, you'll escape it so it matches a literal question mark.

      For example, if you were trying to get the query string value out of a URL, you could use a named group to grab it:

      http://www\.google\.com\?(?<QUERY_STRING>.*)

      Notice I use the question mark twice. The first time as a literal question mark and the second time as part of a named group. Here is another example:

      http://www\.google\.com(?=\?)

      That is a positive lookahead that ensures the character following the "m" is a question mark. But it doesn't actually grab the question mark as part of the pattern, it only ensures that the URL will match if that question mark exists in the right location. And of course, there is this use of the question mark:

      http://www\.google\.com\??

      That means the last question mark is optional. And then there is one more use of question marks (lazy matching rather than greedy matching) that goes like this:

      \<img\>.*?\</img\>

      I'll leave it up to you to figure out what that does if you are interested. One more thing: the less-than and greater-than signs have a special meaning inside group syntax such as named groups and lookbehinds. In PCRE, which PHP uses, they match literally everywhere else, so the backslashes to the left of them above are harmless but not required.
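For PHP specifically, the patterns above can be tried with the preg_* functions, which use PCRE. The following is a minimal sketch under some assumptions: the sample feed string, the example.com URL, and the variable names are made up for illustration, and the `^`/`$` anchors in the optional-question-mark check are added so the whole string has to match. In PCRE, a group like `(?siU)` at the start of a pattern sets inline options (dot matches newlines, case-insensitive, ungreedy quantifiers), so the original pattern mainly needs delimiters around it.

```php
<?php
// Hypothetical one-entry sample; a real feed is longer and may be laid out differently.
$feed = "<entry><id>1</id><published>2010-05-01</published><updated>2010-05-02</updated>"
      . "<title type='text'>First post</title><content>Hello</content>"
      . "<link rel='alternate' type='text/html' href='http://example.com/first.html'/></entry>";

// preg_* patterns need delimiters; "~" is used here because the pattern contains "/".
// (?siU) = inline options: s (dot matches newline), i (caseless), U (ungreedy).
$entryPattern = "~(?siU)</id><published>(.*)</published><updated>.*</updated>"
              . "<title type='text'>(.*)</title>.*"
              . "<link rel='alternate' type='text/html' href='(.*)'~";
preg_match($entryPattern, $feed, $entry);
// $entry[1] = published date, $entry[2] = title, $entry[3] = link href

$url = 'http://www.google.com?q=regex';

// Escaped (literal) question mark followed by a named group.
preg_match('~http://www\.google\.com\?(?<QUERY_STRING>.*)~', $url, $named);

// Positive lookahead: requires a "?" after "com" without including it in the match.
preg_match('~http://www\.google\.com(?=\?)~', $url, $ahead);

// Optional literal question mark (anchors added for this sketch).
$optionalPattern = '~^http://www\.google\.com\??$~';
$withMark    = preg_match($optionalPattern, 'http://www.google.com?');
$withoutMark = preg_match($optionalPattern, 'http://www.google.com');

// Lazy versus greedy matching.
$html = '<img>a</img><img>b</img>';
preg_match('~<img>.*?</img>~', $html, $lazy);   // stops at the first </img>
preg_match('~<img>.*</img>~', $html, $greedy);  // runs to the last </img>

echo $entry[1], ' | ', $entry[2], ' | ', $entry[3], PHP_EOL;
echo $named['QUERY_STRING'], PHP_EOL;
echo $ahead[0], PHP_EOL;
echo $withMark, ' ', $withoutMark, PHP_EOL;
echo $lazy[0], PHP_EOL;
echo $greedy[0], PHP_EOL;
```

Because the `U` option already makes every quantifier lazy, adding `?` after `.*` in the first pattern would flip it back to greedy, which is probably not what is wanted there.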

        fdsfsa76f7sa6
        wrote on last edited by
        #3

        Thanks for the detailed explanation. After some more extensive searching, I found out how to use xml_parser for the Blogspot feed. It certainly seems easier than regex.
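The thread does not show the code that was eventually used. As a rough sketch only, the same three fields (published date, title, alternate link) could be read with PHP's DOM extension (DOMDocument), which is swapped in here for the event-based xml_parser functions mentioned above. The sample feed, the function name extractEntries, and the example.com URLs are hypothetical, and a real Blogspot feed contains many more elements.

```php
<?php
// Hypothetical, heavily trimmed Atom feed; real Blogspot output has more elements.
$sampleFeed = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:example.com,2010:post-1</id>
    <published>2010-05-01T10:00:00Z</published>
    <updated>2010-05-02T12:00:00Z</updated>
    <title type="text">First post</title>
    <link rel="self" type="application/atom+xml" href="http://example.com/feeds/1"/>
    <link rel="alternate" type="text/html" href="http://example.com/2010/05/first-post.html"/>
  </entry>
</feed>
XML;

// Returns one array per <entry> with the published date, title, and alternate link.
function extractEntries(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return []; // not well-formed XML
    }

    $entries = [];
    foreach ($doc->getElementsByTagName('entry') as $entry) {
        $row = ['published' => null, 'title' => null, 'link' => null];
        foreach ($entry->childNodes as $child) {
            if (!$child instanceof DOMElement) {
                continue; // skip whitespace text nodes between elements
            }
            switch ($child->localName) {
                case 'published':
                case 'title':
                    $row[$child->localName] = $child->textContent;
                    break;
                case 'link':
                    // Keep only the human-readable page link, as in the regex.
                    if ($child->getAttribute('rel') === 'alternate') {
                        $row['link'] = $child->getAttribute('href');
                    }
                    break;
            }
        }
        $entries[] = $row;
    }
    return $entries;
}

$entries = extractEntries($sampleFeed);
print_r($entries);
```

Unlike the regex, this does not depend on the order of the elements or on which quote character the attributes use.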
