Extract certain text from an HTML page
-
I'm trying to make a username grabber for a game's highscores page: http://hiscore.runescape.com/hiscores.ws The name I want to grab is in the center of the page, where it says "KingDuffy 1". How would I go about doing this? Thanks, Buckley.
-
This is how you extract the source: http://www.experts-exchange.com/Programming/Languages/C_Sharp/Q_20739698.html I don't know the best way to grab the usernames, but I would probably read the whole source into a string, take the substring between the points where the highscore list starts and ends, and then split the table rows apart. That's a lot of code, though, so I think you should look into regular expressions, which might solve your problem in a better way.
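As a rough illustration of the regular-expression route, here is a minimal sketch. It assumes each player name is the text of a link whose href contains "user1="; that markup, the sample HTML, and the names `UsernameGrabber` and `ExtractUsernames` are all assumptions made for illustration, so check them against the real page source before relying on the pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class UsernameGrabber
{
    // ASSUMPTION: names appear as <a href="...user1=...">Name</a>.
    // Adjust the pattern to whatever the real page source uses.
    private static readonly Regex NameLink = new Regex(
        @"<a\s[^>]*href=""[^""]*user1=[^""]*""[^>]*>(?<name>[^<]+)</a>",
        RegexOptions.IgnoreCase);

    // Returns the link text of every matching anchor, in page order.
    public static List<string> ExtractUsernames(string html)
    {
        var names = new List<string>();
        foreach (Match m in NameLink.Matches(html))
        {
            // Decode entities such as &amp; and trim surrounding whitespace.
            names.Add(WebUtility.HtmlDecode(m.Groups["name"].Value).Trim());
        }
        return names;
    }

    public static void Main()
    {
        // Hypothetical sample rows; the real table will look different.
        string sample =
            "<tr><td>1</td><td><a href=\"hiscorepersonal.ws?user1=KingDuffy+1\">KingDuffy 1</a></td></tr>" +
            "<tr><td>2</td><td><a href=\"hiscorepersonal.ws?user1=Example\">Example Player</a></td></tr>" +
            "<tr><td><a href=\"help.ws\">Help</a></td></tr>";

        foreach (string name in ExtractUsernames(sample))
        {
            Console.WriteLine(name);
        }
    }
}
```

Because the pattern keys on the link rather than on a line number, it keeps working when the name moves to a different line of the source. It will still break if the site changes its markup.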
-
OK, thanks, I'm going to try that. By the way, that link goes to a site where they only help you if you pay, and I don't like that. Does anyone have any more insight on this? Thanks, Buckley.
-
Scroll to the bottom of the page that I linked to :) EDIT: Weird, you can only see the answer if you come from Google. Paste the link into google.com, visit the page from the search results, and then scroll to the bottom.
-
OK, I looked at it. I already know how to get the HTML source, but I need to retrieve only the usernames. This is the HTML source: http://paste-it.net/public/dfe778b/ Line 339 contains one of the usernames, "Kingduffy 1", but it's not always on line 339, so that's why I need to know how to strip out the markup and retrieve all the usernames on that page.
-
What I've done in the past to grab information from a web page is to read the page into a string with the ReadToEnd() method of the StreamReader used to fetch it, and then break that string into an array of HTML tokens. It is pretty straightforward to scan the array to find the data you want. The tokenizer I created to do this is as follows:

```csharp
using System.Collections;
using System.Web;

/// <summary>
/// Tokenize the passed string, which contains an HTML page, into HTML elements.
/// </summary>
/// <param name="InStr">The HTML page to parse.</param>
/// <returns>An array of strings that contains the separate elements of the passed HTML page.</returns>
private string[] Tokenize(string InStr)
{
    ArrayList buf = new ArrayList();
    int begin = 0, end = 0;
    bool in_tag = false;

    while (end != -1) // IndexOf returns -1 when no further match is found
    {
        if (!in_tag)
        {
            end = InStr.IndexOf("<", begin); // find index of start of next HTML tag
            int stop = (end == -1) ? InStr.Length : end; // no more tags: keep the trailing text
            if (begin < stop) // if there is length to the token
                buf.Add(HttpUtility.HtmlDecode(InStr.Substring(begin, stop - begin))); // add text token to list
            begin = end;
            in_tag = true;
        }
        else
        {
            end = InStr.IndexOf(">", begin); // find index of end of HTML tag
            if (end == -1)
            {
                buf.Add(InStr.Substring(begin)); // unterminated tag: keep the remainder
                break;
            }
            buf.Add(InStr.Substring(begin, end - begin + 1)); // add HTML tag to list
            begin = end + 1;
            in_tag = false;
        }
    }

    return (string[])buf.ToArray(typeof(string));
}
```
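To show what scanning the token array might look like, here is a self-contained sketch. It uses its own compact tokenizer built on `List<string>` (not the method above) and assumes that a username is the text token immediately following a link tag whose href contains "user1=". That markup and the names `TokenScan` and `FindUsernames` are hypothetical, so adapt the test to the real page source.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class TokenScan
{
    // Split HTML into tag tokens ("<...>") and decoded text tokens.
    public static List<string> Tokenize(string html)
    {
        var tokens = new List<string>();
        int pos = 0;
        while (pos < html.Length)
        {
            int lt = html.IndexOf('<', pos);
            if (lt < 0) lt = html.Length;            // no more tags: the rest is text
            if (lt > pos)
                tokens.Add(WebUtility.HtmlDecode(html.Substring(pos, lt - pos)));
            if (lt == html.Length) break;

            int gt = html.IndexOf('>', lt);
            if (gt < 0) gt = html.Length - 1;        // unterminated tag: take the rest
            tokens.Add(html.Substring(lt, gt - lt + 1));
            pos = gt + 1;
        }
        return tokens;
    }

    // ASSUMPTION: a name is the text right after <a href="...user1=...">.
    public static List<string> FindUsernames(IList<string> tokens)
    {
        var names = new List<string>();
        for (int i = 0; i + 1 < tokens.Count; i++)
        {
            string tag = tokens[i];
            bool isNameLink =
                tag.StartsWith("<a ", StringComparison.OrdinalIgnoreCase) &&
                tag.IndexOf("user1=", StringComparison.OrdinalIgnoreCase) >= 0;

            // The following token must be text, not another tag.
            if (isNameLink && !tokens[i + 1].StartsWith("<", StringComparison.Ordinal))
                names.Add(tokens[i + 1].Trim());
        }
        return names;
    }

    public static void Main()
    {
        // Hypothetical sample row; the real table will look different.
        string sample =
            "<tr><td>1</td><td><a href=\"hiscorepersonal.ws?user1=KingDuffy+1\">KingDuffy 1</a></td></tr>";

        foreach (string name in FindUsernames(Tokenize(sample)))
            Console.WriteLine(name);
    }
}
```

The scan only looks at a tag and the token after it, so it does not matter which line of the source the name is on.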
-
Whoops. The posting converted the '<' and '>' characters to the HTML entities '&lt;' and '&gt;' respectively, which makes the code hard to read. Instead of cluttering this thread with a new snippet, email me at liedtke@frii.com if you want the code. Brian