Code Project › Web Development › ASP.NET

How could i analyze and do something to a webpage content?

    wnfk wrote (#1):

    Hi all, thanks in advance. I'd like to build something like Google Translate, but much simpler; it works somewhat like an online web proxy. It should do the following:

    1. Get all text nodes from a web page.
    2. Change the text nodes obtained in step 1.
    3. Put the changed text back into the original page.
    4. Display the changed page.

    For example, to change www.bing.com I would:

    1. Get the content of www.bing.com with WebRequest. Let's assume the result is:

    <html>
    ...
    <body>
    <span>WebPage</span>
    <span>Pictures</span>
    .....
    <img src="/aa.gif" />
    <input type="text" />
    <input type="submit" value="Search" />
    </body>
    </html>

    2. Change it so that the page content becomes:

    <html>
    ...
    <body>
    <span>MyTranslatedWebPage</span>
    <span>MyTranslatedPictures</span>
    .....
    <img src="http://www.bing.com/aa.gif" />
    <input type="text" />
    <input type="submit" value="Search" />
    </body>
    </html>

    Could anybody give me some pointers on how to do this? Thanks in advance!
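A minimal sketch of steps 1 to 3, written in Python rather than C# and using only the standard library's `html.parser`: it streams through the page and passes each visible text node through a `translate` function, while tags, attributes, whitespace and `<script>`/`<style>` bodies are copied through unchanged. The names `TextNodeRewriter` and `rewrite_text_nodes` are hypothetical; fetching the page (the WebRequest part) and serving the result (step 4) are left out.

```python
from html import escape
from html.parser import HTMLParser


class TextNodeRewriter(HTMLParser):
    """Re-emits an HTML document, passing every visible text node through `translate`."""

    RAW_TEXT_TAGS = {"script", "style"}

    def __init__(self, translate):
        super().__init__(convert_charrefs=True)
        self.translate = translate
        self.out = []
        self.raw_tag = None  # set while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # original tag text, attributes untouched
        if tag in self.RAW_TEXT_TAGS:
            self.raw_tag = tag

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <img src="/aa.gif" />

    def handle_endtag(self, tag):
        if tag == self.raw_tag:
            self.raw_tag = None
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.raw_tag or not data.strip():
            self.out.append(data)  # script/style bodies and pure whitespace pass through
            return
        # Keep the surrounding whitespace and translate only the text itself.
        lead = data[: len(data) - len(data.lstrip())]
        trail = data[len(data.rstrip()):]
        new_text = self.translate(data.strip())
        self.out.append(lead + escape(new_text, quote=False) + trail)

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite_text_nodes(html_text, translate):
    parser = TextNodeRewriter(translate)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For the Bing sample, `rewrite_text_nodes(page, lambda t: "MyTranslated" + t)` turns `<span>WebPage</span>` into `<span>MyTranslatedWebPage</span>` and leaves the `<input>` tags as they were. Streaming like this avoids building a tree, but it only sees one node at a time.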

      Jens Meyer wrote (#2):

      Hi xianlong, maybe you should take a look at HttpModules; there are many articles on that topic here on CodeProject. In an HttpModule you can alter the response stream (your HTML, for example) in whatever way you want, just as you would in a proxy. Regards, Jens

      When in trouble, when in doubt, run in circles, scream and shout
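HttpModules are specific to ASP.NET, where a module typically installs its own stream on `Response.Filter` to see the outgoing HTML. As a language-neutral illustration of the same intercept-and-rewrite idea (not ASP.NET code), here is a sketch of Python WSGI middleware that buffers an HTML response, passes it through a `rewrite` function and fixes up `Content-Length`. The name `rewriting_middleware` is hypothetical, and the UTF-8 body encoding is an assumption.

```python
from typing import Callable


def rewriting_middleware(app, rewrite: Callable[[str], str]):
    """Wrap a WSGI app so every text/html response body passes through `rewrite`.

    Assumes UTF-8 bodies; a real filter would honour the charset in Content-Type.
    """

    def wrapped(environ, start_response):
        captured = {}
        chunks = []

        def capture_start_response(status, headers, exc_info=None):
            # Hold back the real start_response: the rewritten body may change length.
            captured.update(status=status, headers=list(headers), exc_info=exc_info)
            return chunks.append  # legacy write() calls are buffered too

        result = app(environ, capture_start_response)
        try:
            chunks.extend(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        body = b"".join(chunks)
        headers = captured["headers"]
        content_type = next((v for k, v in headers if k.lower() == "content-type"), "")
        if content_type.startswith("text/html"):
            body = rewrite(body.decode("utf-8")).encode("utf-8")
            headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
            headers.append(("Content-Length", str(len(body))))

        start_response(captured["status"], headers, captured["exc_info"])
        return [body]

    return wrapped
```

Buffering the whole body is the simple choice here; it is what lets the wrapper rewrite text that might otherwise be split across chunks.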

        wnfk wrote (#3):

        Thanks Jens. My problem now is how to alter the response stream/string, i.e. how to change the site's relative paths to absolute paths. For example, when the code is

        <img src="./a.gif" />, <a href="./default.aspx">Home</a>
        good

        how do I change it to

        <img src="http://www.example.com/a.gif" />, <a href="http://www.example.com/default.aspx">Default</a>
        better

        And how do I find the words "Home" and "good" and change them to "Default" and "better"? Do you know of any example code? Thanks again, Jens!
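For the path part, a plain "replace the dot" rule breaks as soon as the page lives in a subdirectory or uses `../`. A small sketch, assuming Python's `urllib.parse.urljoin`, which resolves a relative reference against the URL of the page it came from; `to_absolute` is a hypothetical helper name and the page URLs are illustrative.

```python
from urllib.parse import urljoin


def to_absolute(url: str, page_url: str) -> str:
    """Resolve a src/href value against the URL of the page it appeared on."""
    return urljoin(page_url, url)


print(to_absolute("./a.gif", "http://www.example.com/"))                 # http://www.example.com/a.gif
print(to_absolute("./default.aspx", "http://www.example.com/"))          # http://www.example.com/default.aspx
print(to_absolute("./a.gif", "http://www.example.com/news/index.aspx"))  # http://www.example.com/news/a.gif
print(to_absolute("../a.gif", "http://www.example.com/news/index.aspx")) # http://www.example.com/a.gif
print(to_absolute("/aa.gif", "http://www.bing.com/"))                    # http://www.bing.com/aa.gif
print(to_absolute("http://cdn.example.org/x.png", "http://www.example.com/"))  # unchanged
```

The base has to be the URL of the fetched page itself (or its `<base href>`, if it has one), not just the site root.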

            Jens Meyer wrote (#5):

            Dear xianlong, I assume you have a set of rules, for example for how to translate a link or a specific string. You then replace the links and words in the stream according to those rules. For example, you could translate all relative addresses in the output stream by replacing the leading '.' with http://www.example.com. To do so I would suggest using regular expressions: with Regex you get the set of matches and can apply the same translation to each of them. Since you have two different tasks (converting relative paths to absolute paths, and translating specific words), I would also suggest using a separate HttpModule for each. I'm afraid I can't give you a real example, since I don't know all the details of your task, but replacements like these are quite easy to do with regular expressions. Regards, Jens
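A sketch of the regular-expression approach described above, using Python's `re` module instead of .NET `Regex`, with each task as its own function to mirror the one-module-per-task suggestion. The function names and patterns are assumptions tuned to the sample markup in this thread: they only see quoted `src`/`href` attributes and text that sits between a `>` and the next `<`, and because they work on raw text they can also match look-alikes inside scripts or comments.

```python
import re
from urllib.parse import urljoin

# Quoted src/href attributes; the lookbehind keeps e.g. data-src from matching.
URL_ATTR = re.compile(r"""(?<![\w-])((?:src|href)\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)
# A run of text between the end of one tag and the start of the next.
TEXT_RUN = re.compile(r">([^<>]+)<")


def absolutize(html_text, base_url):
    """Task 1: make every quoted src/href value absolute relative to base_url."""
    def repl(m):
        quote = m.group(2)
        return f"{m.group(1)}{quote}{urljoin(base_url, m.group(3))}{quote}"
    return URL_ATTR.sub(repl, html_text)


def translate_words(html_text, rules):
    """Task 2: replace whole words, but only inside text runs, never inside tags."""
    if not rules:
        return html_text
    # Longest words first so "New York" wins over "New".
    alternation = "|".join(map(re.escape, sorted(rules, key=len, reverse=True)))
    word = re.compile(r"\b(" + alternation + r")\b")

    def in_text(m):
        return ">" + word.sub(lambda w: rules[w.group(1)], m.group(1)) + "<"

    return TEXT_RUN.sub(in_text, html_text)


page = '<p><img src="./a.gif" />, <a href="./default.aspx">Home</a> good</p>'
step1 = absolutize(page, "http://www.example.com/")
step2 = translate_words(step1, {"Home": "Default", "good": "better"})
print(step2)
```

Running it prints `<p><img src="http://www.example.com/a.gif" />, <a href="http://www.example.com/default.aspx">Default</a> better</p>`, which matches the target markup from the question.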

              wnfk wrote (#6):

              Thanks Jens! I've solved the problem. My solution was to write an HTML analyzer that parses the HTML document into a DOM tree; once I have the tree, altering it is much easier.
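The tree-based approach can be sketched with Python's standard library as well. This is not the poster's analyzer: `Element`, `TreeBuilder`, `walk` and `serialize` are hypothetical names for a minimal tree built on `html.parser`, a pass that applies both rules from this thread (absolute URLs via `urljoin`, whole-text-node word replacement), and a serializer. It drops comments and doctypes, recovers only loosely from badly nested markup, and writes void elements such as `<img>` without the trailing slash; a full HTML5 parser handles these cases properly.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
RAW_TEXT_TAGS = {"script", "style"}


class Element:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)  # duplicate attributes collapse to the last one
        self.children = []        # Element instances or str (text nodes)


class TreeBuilder(HTMLParser):
    """Builds a small DOM-like tree rooted at a synthetic '#document' element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Element("#document", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Element(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        self.stack[-1].children.append(Element(tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def walk(node, base_url, words):
    """Make src/href absolute and replace text nodes whose whole text is in `words`."""
    for key in ("src", "href"):
        if node.attrs.get(key) is not None:
            node.attrs[key] = urljoin(base_url, node.attrs[key])
    for i, child in enumerate(node.children):
        if isinstance(child, str):
            stripped = child.strip()
            if stripped in words:
                node.children[i] = child.replace(stripped, words[stripped])
        else:
            walk(child, base_url, words)


def serialize(node):
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(child if node.tag in RAW_TEXT_TAGS else escape(child, quote=False))
            continue
        attrs = "".join(f" {k}" if v is None else f' {k}="{escape(v)}"'
                        for k, v in child.attrs.items())
        parts.append(f"<{child.tag}{attrs}>")
        if child.tag not in VOID_TAGS:
            parts.append(serialize(child) + f"</{child.tag}>")
    return "".join(parts)
```

Usage: feed the page into a `TreeBuilder`, call `walk(builder.root, page_url, rules)`, then `serialize(builder.root)`. Unlike the regex version, the rules here can never touch text inside a tag, because attributes and text nodes are separate objects in the tree.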
