Site scraper alert: new article layout
-
To all those who have written site scrapers because I'm too snowed under to get you your API, a warning: beta.codeproject.com has the latest build and has a new article layout, so if you wish to update your code to work with the new markup, now's your chance. Upload as soon as I've slept.
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
-
http://beta.codeproject.com[^] For the lazy amongst us. :)
Regards, Nish
My technology blog: voidnish.wordpress.com
-
Thanks for the advance notice. No problem whatsoever, I'm not scraping article pages at all. FWIW: I think it again looks better, with a more structured title block. I like the stars. Minor comments:
1. there is a bug where sometimes (had it twice in 5 visits), not sure when/how/why, a tooltip (I think it was "Discuss this article") is already showing when the page opens up, dislocating the "print", "bookmark", "discuss", "report" icons.
2. The pink background seems like a step backward, the white one was more modern IMO. I would opt for a white title box, with an orange border, and containing the new content as is (so the previous style and the latest layout).
3. the spacing between "Edit", "Delete", "Get HTML" could be improved (e.g. by adding a space and a nbsp between edit and delete)
4. The stars should support partial fill, i.e. 4.31 and 4.49 would show differently (that would take them to be rendered separately and chosen from a collection of some 6 to 11 different star symbols). :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, and update CP Vanity to V2.0 if you haven't already.
-
Please don't mention the WsW ever again. :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, and update CP Vanity to V2.0 if you haven't already.
-
Hey Luc, I've been working on a who's who related app. I was originally going to do it with Silverlight, but there were cross-domain issues that I didn't want to have to deal with, so I switched over to WPF. It's not that tough (as long as Chris doesn't change the element IDs). Of course, I had to code around the fact that we're scraping web pages instead of just hitting a web service...
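For anyone wondering what scraping by element ID looks like in practice, here is a minimal sketch in Python using only the standard library. The IDs in the usage example (`memberName`, `articleCount`, `reputation`) are made up for illustration and are not taken from the actual site markup. Returning `None` for an ID that was not found is one possible way to notice that the IDs changed, instead of silently producing empty data.

```python
from html.parser import HTMLParser
from typing import Dict, Iterable, Optional

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class IdTextExtractor(HTMLParser):
    """Collects the text inside every element whose id is in wanted_ids.

    Limitation: depth tracking assumes non-void tags are explicitly closed.
    """

    def __init__(self, wanted_ids: Iterable[str]) -> None:
        super().__init__()
        self.wanted = set(wanted_ids)
        self.found: Dict[str, str] = {}
        self._current: Optional[str] = None  # id of the element being captured
        self._depth = 0                      # nesting depth inside that element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1
            return
        element_id = dict(attrs).get("id")
        if element_id in self.wanted:
            self._current = element_id
            self._depth = 1
            self.found[element_id] = ""

    def handle_endtag(self, tag):
        if self._current is None or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:
            self.found[self._current] = self.found[self._current].strip()
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.found[self._current] += data


def extract_by_id(html: str, wanted_ids: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each wanted id to its text, or to None if the id is not on the page."""
    ids = list(wanted_ids)
    parser = IdTextExtractor(ids)
    parser.feed(html)
    parser.close()
    return {i: parser.found.get(i) for i in ids}


# Hypothetical markup and IDs, for illustration only.
page = ('<div><span id="memberName">John</span>'
        '<div id="articleCount"><b>12</b> articles</div></div>')
result = extract_by_id(page, ["memberName", "articleCount", "reputation"])
print(result)
```

A `None` value (here for `reputation`) is the cue to go and look at the new markup.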
.45 ACP - because shooting twice is just silly
-----
"Why don't you tie a kerosene-soaked rag around your ankles so the ants won't climb up and eat your candy ass..." - Dale Earnhardt, 1997
-----
"The staggering layers of obscenity in your statement make it a work of art on so many levels." - J. Jystad, 2001
-
I would think he could just draw the partially filled stars dynamically, and just have two pre-drawn stars - a filled one, and an empty one.
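As a rough sketch of the two-image idea (not how the site actually does it): work out how much of each star is filled, then overlay the filled star clipped to that width on top of the empty one. The function names, the 16-pixel star width and the five-star scale are all assumptions made for the example.

```python
from typing import List


def star_fills(rating: float, stars: int = 5) -> List[float]:
    """Return the filled fraction (0.0 to 1.0) of each star, left to right."""
    clamped = max(0.0, min(float(stars), rating))
    return [max(0.0, min(1.0, clamped - i)) for i in range(stars)]


def filled_widths(rating: float, star_px: int = 16, stars: int = 5) -> List[int]:
    """Pixel width of the filled-star image to overlay on each empty star.

    star_px is an assumed image width, not a value from the site.
    """
    return [round(fraction * star_px) for fraction in star_fills(rating, stars)]


print(filled_widths(4.31))  # [16, 16, 16, 16, 5]
print(filled_widths(4.49))  # [16, 16, 16, 16, 8]
```

With this, 4.31 and 4.49 render differently (5 versus 8 filled pixels on the last star) without needing a collection of pre-drawn partial stars.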
.45 ACP - because shooting twice is just silly
-----
"Why don't you tie a kerosene-soaked rag around your ankles so the ants won't climb up and eat your candy ass..." - Dale Earnhardt, 1997
-----
"The staggering layers of obscenity in your statement make it a work of art on so many levels." - J. Jystad, 2001
-
I only scrape the actual article page for the original submission date (which should by all rights also be found in the article list page - hint, hint).
.45 ACP - because shooting twice is just silly
-----
"Why don't you tie a kerosene-soaked rag around your ankles so the ants won't climb up and eat your candy ass..." - Dale Earnhardt, 1997
-----
"The staggering layers of obscenity in your statement make it a work of art on so many levels." - J. Jystad, 2001
-
John, I don't mind people scraping the WsW as long as there is no Web Service; however, I'd prefer the WsW not be mentioned in the suggestions forum any more than strictly necessary until then. I trust there is a sleeping dog proverb in English too. :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, and update CP Vanity to V2.0 if you haven't already.
-
There are often many solutions to a single problem; mentioning one is often what it takes to get it done one way or another. :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, and update CP Vanity to V2.0 if you haven't already.
-
It looks really nice. :)
-
Luc Pattyn wrote:
I trust there is a sleeping dog proverb in English too.
'When Dogs sleep, it is an excellent opportunity to lick your own balls'
------------------------------------ I will never again mention that I was the poster of the One Millionth Lounge Post, nor that it was complete drivel. Dalek Dave CCC League Table Link CCC Link[^]
-
I thought it was: when the dog sleeps is the best time to get your clothes on and get the hell out of the house.
Every man can tell how many goats or sheep he possesses, but not how many friends.
-
Dalek Dave wrote:
When Dogs sleep, it is an excellent opportunity to lick your own balls
I fail to see the logic in that one, however I'll take your word for it, this once. :laugh:
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, and update CP Vanity to V2.0 if you haven't already.
-
It actually feels a little more 'zippy' than the main site. I'm not sure if that's just because it is on a separate web farm or if it is just faster. :) Btw, what happened to the http://alpha.codeproject.com/[^] version?
-
I noticed that when I vote for an article the stars at the top do not update unless I manually refresh the page. I also noticed that when I then bookmarked the article the stars disappeared completely and only reappeared once I refreshed the page. Windows XP SP2 w/IE 7
-
Andrew Rissing wrote:
what happened to the http://alpha.codeproject.com/[^] version?
After over 10 years of development, Code Project Forever has finally come out of alpha. :-D