Extract certain text from a html page.

    Buckleyindahouse
    wrote on last edited by
    #1

    I'm trying to make a username grabber for a game's highscores page: http://hiscore.runescape.com/hiscores.ws The name I want to grab is in the center of the page, where it says "KingDuffy 1". How would I go about doing this? Thanks, Buckley.

      Jan Sommer
      wrote on last edited by
      #2

      This is how you extract the source: http://www.experts-exchange.com/Programming/Languages/C_Sharp/Q_20739698.html I don't know the best way to grab the usernames, but I would probably put the whole source into a string, take the substring between the points where the highscore list starts and ends, and then maybe split the table rows apart. But that's a lot of code, and I think you should look into regular expressions, which might solve your problem in a better way.
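
A minimal sketch of the regular-expression route mentioned above. It assumes, purely for illustration, that each player name is the text of a link inside a table cell; the real hiscores markup was not shown in the thread and may differ, so the pattern, the class name RegexNameGrabber and the sample input in the note below are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class RegexNameGrabber
{
    // Assumed row layout: <td><a href="...">Player Name</a></td>.
    // Inspect the actual page source and adjust the pattern to match it.
    private static readonly Regex NameLink = new Regex(
        @"<td[^>]*>\s*<a[^>]*>(?<name>[^<]+)</a>\s*</td>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the text of every link that matches the assumed layout.
    public static List<string> ExtractNames(string html)
    {
        var names = new List<string>();
        foreach (Match m in NameLink.Matches(html))
        {
            // HtmlDecode turns entities such as &amp; back into plain characters.
            names.Add(WebUtility.HtmlDecode(m.Groups["name"].Value).Trim());
        }
        return names;
    }
}
```

Calling RegexNameGrabber.ExtractNames on a made-up row such as `<tr><td>1</td><td><a href="p">KingDuffy 1</a></td></tr>` should return a list holding the single name. A regex like this is fragile against markup changes, so it is best treated as a quick tool for one known page rather than a general HTML parser.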

        Buckleyindahouse
        wrote on last edited by
        #3

        OK, thanks, I'm going to try that. By the way, that link goes to a site where they only help you if you pay, and I don't like that. Does anyone have any more insight on this? Thanks, Buckley.

          Jan Sommer
          wrote on last edited by
          #4

          Scroll to the bottom of the page that I linked to :) EDIT: Weird, if you come from Google you can see the answer. Paste the link into google.com and visit it from there, then scroll to the bottom.

            Buckleyindahouse
            wrote on last edited by
            #5

            OK, I looked at it. I already know how to get the HTML source, but I need to retrieve only the usernames. This is the HTML source: http://paste-it.net/public/dfe778b/ Line 339 contains one of the usernames, "Kingduffy 1", but it's not always on line 339, so that's why I need to know how to strip the markup and retrieve all the usernames on that page.
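
One way to avoid depending on a line number is to anchor on the markup around each name instead of on its position in the file. The sketch below assumes that every name sits between a fixed opening marker and a fixed closing marker; the class name MarkerNameGrabber and the markers in the note below are hypothetical and would have to be read off the real page source.

```csharp
using System;
using System.Collections.Generic;

public static class MarkerNameGrabber
{
    // Returns the text between the next 'open' marker at or after 'start' and the
    // following 'close' marker, advancing 'start' past it. Returns null when no
    // further pair of markers is found.
    private static string Between(string text, string open, string close, ref int start)
    {
        int from = text.IndexOf(open, start, StringComparison.OrdinalIgnoreCase);
        if (from < 0) return null;
        from += open.Length;
        int to = text.IndexOf(close, from, StringComparison.OrdinalIgnoreCase);
        if (to < 0) return null;
        start = to + close.Length;
        return text.Substring(from, to - from);
    }

    // Collects every marker-delimited value in document order.
    public static List<string> ExtractNames(string html, string open, string close)
    {
        if (string.IsNullOrEmpty(open) || string.IsNullOrEmpty(close))
            throw new ArgumentException("Markers must be non-empty.");

        var names = new List<string>();
        int pos = 0;
        string name;
        while ((name = Between(html, open, close, ref pos)) != null)
        {
            names.Add(name.Trim());
        }
        return names;
    }
}
```

With made-up markup such as `<td class="name">KingDuffy 1</td>`, the call would be ExtractNames(html, "class=\"name\">", "</td>"). The result does not depend on which line a name appears on, only on the markers staying the same.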

              Bliedtke
              wrote on last edited by
              #6

              What I've done in the past to grab information from a web page is to take the page returned as a string by StreamReader.ReadToEnd() (on the StreamReader used to fetch the page) and break it into an array of HTML tokens. It is pretty straightforward to scan the array to find the data you want. The tokenizer I created to do this is as follows:

              using System.Collections.Generic;
              using System.Web; // HttpUtility

              /// <summary>
              /// Tokenize the passed string, which contains an HTML page, into HTML elements.
              /// </summary>
              /// <param name="InStr">The HTML page to parse.</param>
              /// <returns>An array of strings that contains the separate elements of the passed HTML page.</returns>
              private string[] Tokenize(string InStr)
              {
                  List<string> buf = new List<string>();
                  int begin = 0, end = 0;
                  bool in_tag = false;
                  while (end != -1) // IndexOf returns -1 when no further match is found
                  {
                      if (!in_tag)
                      {
                          end = InStr.IndexOf("<", begin); // find index of start of next HTML tag
                          int stop = (end == -1) ? InStr.Length : end; // keep any text after the last tag
                          if (begin < stop) // if there is length to the token
                              buf.Add(HttpUtility.HtmlDecode(InStr.Substring(begin, stop - begin))); // add token to list
                          begin = end;
                          in_tag = true;
                      }
                      else
                      {
                          end = InStr.IndexOf(">", begin); // find index of end of HTML tag
                          if (end == -1)
                              break; // unterminated tag: stop instead of passing a negative length to Substring
                          buf.Add(InStr.Substring(begin, end - begin + 1)); // add HTML tag to list
                          begin = end + 1;
                          in_tag = false;
                      }
                  }
                  return buf.ToArray();
              }
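
As a rough illustration of the "scan the array" step, the sketch below looks for text tokens that sit directly between an opening `<a ...>` tag and `</a>`. That the names appear as link text is an assumption, not something confirmed in the thread. To stay self-contained it uses a compact regex-based stand-in for the tokenizer rather than the Tokenize method from the post, and the names TokenScanner and TextInsideLinks are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenScanner
{
    // Compact stand-in tokenizer: each match is either a whole tag or a run of text.
    // Whitespace-only text runs are dropped.
    public static List<string> Tokenize(string html)
    {
        var tokens = new List<string>();
        foreach (Match m in Regex.Matches(html, @"<[^>]*>|[^<]+"))
        {
            if (m.Value.Trim().Length > 0) tokens.Add(m.Value);
        }
        return tokens;
    }

    // Collects every text token whose neighbours are an <a ...> tag and </a>.
    public static List<string> TextInsideLinks(IList<string> tokens)
    {
        var found = new List<string>();
        for (int i = 1; i + 1 < tokens.Count; i++)
        {
            bool afterOpen = tokens[i - 1].StartsWith("<a ", StringComparison.OrdinalIgnoreCase);
            bool beforeClose = tokens[i + 1].Equals("</a>", StringComparison.OrdinalIgnoreCase);
            bool isText = !tokens[i].StartsWith("<", StringComparison.Ordinal);
            if (afterOpen && beforeClose && isText)
            {
                found.Add(tokens[i].Trim());
            }
        }
        return found;
    }
}
```

On a real page this would also pick up navigation links, so in practice the scan would probably be limited to the tokens between the start and end of the highscore table.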

                Bliedtke
                wrote on last edited by
                #7

                Whoops. The posting converted the '<' and '>' characters to their HTML equivalents '&lt;' and '&gt;' respectively, making this hard to read. Instead of cluttering this up by posting a new snippet, email me at liedtke@frii.com if you want the code. Brian

                  Bliedtke
                  wrote on last edited by
                  #8

                  Joshua, your email address is bouncing. It is the gmail.com account. Re-email me with a valid address. Brian
