Parsing "javascript-driven" webpage
-
Hi, I'm building a parser for betting websites as a hobby project, mainly for educational reasons. I have finished the underlying design and database structure and can now parse a couple of websites that are easy to parse (such as Pinnaclesports, which uses XML, and Expekt, where you can get all the odds on a single page). For a page like Expekt I load the page with the WebAii testing framework, which makes life a lot easier when it comes to getting the data from the webpage.

But when it comes to websites that have nested tree structures, require a lot of clicks, and offer no apparent way of showing lots of odds on a single page, I am stuck, or really, I don't know where to begin. Examples of such sites are Ladbrokes.com and Nordicbet.com (this one should be a lot easier).

Because of the structure of Ladbrokes, at any given time there will be maybe 2000+ different pages, so going to every page individually and parsing it does not seem like the best approach. It would also be incredibly slow, and I know that there are odds comparison sites out there that parse 80+ bookmakers several times every minute. So there must be a better way; I just don't know what it is, and I haven't found any information about how to do it.

Is C# even a good language for this? Any pointers on where I can begin to look for a solution to my problem, or tips on good resources, would be greatly appreciated.