Posts made by Jason Manfield

Jason Manfield

I have the following URL:

    http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=/netahtml/search-adv.htm&r=2&p=1&f=G&l=50&d=ptxt&S1=((teeth+OR+member)+OR+provide)&OS=+(teeth+OR+member)+OR+provide&RS=((teeth+OR+member)+OR+provide)

Accessing it with the following code snippet (which works for other sites) throws a WebException (timeout):

    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

Accessing the same URL with IE works fine. I compared the request headers sent by the code with those sent by IE, and they are identical. The user agent has been set to "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50215)". I even increased the request timeout to a very large value, and it made no difference.

Jason Manfield

Given a table name (e.g. CUSTOMERS), how do I get column names of the table and the data types of the columns the table has? I am using SQLServer and would like to get that info from my C# code.
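
One common way to get this is to query the standard INFORMATION_SCHEMA.COLUMNS view, which SQL Server exposes. The sketch below is only an illustration under assumptions: the class name SchemaInfo, the method GetColumns, and the connection string are hypothetical, and it assumes the System.Data.SqlClient provider is available to the project.

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

static class SchemaInfo
{
    // INFORMATION_SCHEMA.COLUMNS is a standard metadata view in SQL Server.
    // The table name is passed as a parameter rather than concatenated.
    public const string ColumnQuery =
        "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS " +
        "WHERE TABLE_NAME = @table ORDER BY ORDINAL_POSITION";

    // Hypothetical helper: returns (column name, SQL data type) pairs
    // in the order the columns are defined in the table.
    public static List<KeyValuePair<string, string>> GetColumns(
        string connectionString, string tableName)
    {
        var columns = new List<KeyValuePair<string, string>>();
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(ColumnQuery, conn))
        {
            cmd.Parameters.AddWithValue("@table", tableName);
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    columns.Add(new KeyValuePair<string, string>(
                        reader.GetString(0), reader.GetString(1)));
                }
            }
        }
        return columns;
    }
}
```

Calling SchemaInfo.GetColumns(connectionString, "CUSTOMERS") would then yield one pair per column. If the same table name exists in several schemas, the query would also need a TABLE_SCHEMA filter.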

Jason Manfield

I have a URL that I can open in the browser, but System.Uri removes a slash when it encounters the second http:// embedded in the path. Here is the URL:

    http://citeseer.ist.psu.edu/rd/55811103,124,1,0.25,Download/http://citeseer.ist.psu.edu/cache/papers/cs/1473/http:zSzzSzmaui.theoinf.tu-ilmenau.dezSzforschungzSzlinkszSz..zSzdoczSzhybrid_automata.pdf/alur92hybrid.pdf

System.Uri changes it to the following (notice that http:// after Download/ becomes http:/):

    http://citeseer.ist.psu.edu/rd/55811103,124,1,0.25,Download/http:/citeseer.ist.psu.edu/cache/papers/cs/1473/http:zSzzSzmaui.theoinf.tu-ilmenau.dezSzforschungzSzlinkszSz..zSzdoczSzhybrid_automata.pdf/alur92hybrid.pdf

As a result, I get a NameResolutionFailure exception when calling HttpWebRequest.GetResponse().

Jason Manfield

Using HttpUtility.UrlEncode on the whole URL results in System.Uri throwing an exception, since it can't parse the encoded string:

    Uri uri = new Uri(HttpUtility.UrlEncode(url));
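
The exception is expected: encoding the entire URL also escapes the "://" after the scheme, so the result no longer looks like an absolute URI. A minimal sketch of the difference, using Uri.EscapeDataString in place of HttpUtility.UrlEncode so that no System.Web reference is needed; the class and method names and the example.com addresses are made up for illustration:

```csharp
using System;

static class UrlEncodingDemo
{
    // True when the string can be parsed as an absolute URI.
    public static bool IsAbsoluteUri(string candidate)
    {
        Uri ignored;
        return Uri.TryCreate(candidate, UriKind.Absolute, out ignored);
    }

    // Escapes only the value that is embedded in the larger URL,
    // leaving the outer scheme, host, and path untouched.
    public static Uri BuildQueryUri(string baseUrl, string paramName, string rawValue)
    {
        string escaped = Uri.EscapeDataString(rawValue);
        return new Uri(baseUrl + "?" + paramName + "=" + escaped);
    }
}
```

For example, UrlEncodingDemo.IsAbsoluteUri(Uri.EscapeDataString("http://example.com/")) is false because the scheme separator has been escaped, while BuildQueryUri("http://example.com/search", "q", "a b&c") parses fine.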

Jason Manfield

I have the following URL:

    http://citeseer.ist.psu.edu/rd/55811103,653,1,0.25,Download/http://citeseer.ist.psu.edu/cache/papers/cs/7145/http:zSzzSzwww.stanford.eduzSzclasszSzcs343zSzpszSzpathprof.pdf/ball96efficient.pdf

Creating a System.Uri from it:

    Uri uri = new Uri(aboveURL)

removes the extra slash after "Download/http://". The debugger shows it as Download/http:/citeseer.ist.... The Uri in the HttpWebRequest created from the same URL also loses the extra slash. As a result, I get a NameResolutionFailure exception. The Uri is shown as:

    http://citeseer.ist.psu.edu/rd/55811103,653,1,0.25,Download/http:/citeseer.ist.psu.edu/cache/papers/cs/7145/http:zSzzSzwww.stanford.eduzSzclasszSzcs343zSzpszSzpathprof.pdf/ball96efficient.pdf

Jason Manfield

I am trying to crawl using the following code snippet:

    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    req.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Win32)";
    ...

By default, req.AllowAutoRedirect is true and MaximumAutomaticRedirections is 50. When I try to crawl the following URL:

    http://citeseer.ist.psu.edu/rd/55811103,653,1,0.25,Download/http://citeseer.ist.psu.edu/cache/papers/cs/7145/http:zSzzSzwww.stanford.eduzSzclasszSzcs343zSzpszSzpathprof.pdf/ball96efficient.pdf

I get a NameResolutionFailure exception. However, I am able to open this URL in the browser, where it gets redirected to:

    http://citeseer.ist.psu.edu/cache/papers/cs/7145/http:zSzzSzwww.stanford.eduzSzclasszSzcs343zSzpszSzpathprof.pdf/ball96efficient.pdf

How do I force my C# code to go to the redirected URL?
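
In the example above, the redirect target is exactly the text that follows "Download/" in the original URL. Assuming that pattern holds for these links (it is only observed here for this one URL), a possible workaround is to cut the target out of the string before a Uri is ever built from the full link. The helper name below is hypothetical:

```csharp
using System;

static class CiteSeerLinks
{
    // Hypothetical helper: if the URL contains "Download/" followed by an
    // embedded http:// address, return that embedded address; otherwise
    // return the input unchanged.
    public static string ExtractDownloadTarget(string url)
    {
        const string marker = "Download/";
        int index = url.IndexOf(marker, StringComparison.Ordinal);
        if (index < 0)
        {
            return url;
        }

        string rest = url.Substring(index + marker.Length);
        return rest.StartsWith("http://", StringComparison.OrdinalIgnoreCase)
            ? rest
            : url;
    }
}
```

The returned string can then be passed to WebRequest.Create directly. This sidesteps the slash handling rather than changing it, so it only helps when the server-side redirect really is a plain hop to the embedded address.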

Jason Manfield

What is the difference (pros and cons) between retrieving data from the web using System.Net.WebClient and using HttpWebRequest and HttpWebResponse? The WebClient download methods seem to neatly encapsulate the multiple steps (WebRequest.Create, request.GetResponse, response.GetResponseStream, ...) required with the traditional HttpWebRequest/HttpWebResponse approach. I am trying to crawl URLs and download data.
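
For comparison, here is a rough side-by-side sketch of the two approaches. The class and method names are invented for the example, and the user agent and timeout are placeholder parameters. WebClient is shorter; the request/response form takes more steps but lets each request be tuned (user agent, timeout, redirects), which tends to matter for a crawler.

```csharp
using System;
using System.IO;
using System.Net;

static class DownloadSketch
{
    // Short form: WebClient hides the request, response, and stream steps.
    public static string WithWebClient(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    // Long form: more steps, but each request can be configured individually.
    public static string WithWebRequest(string url, string userAgent, int timeoutMs)
    {
        WebRequest request = WebRequest.Create(url);

        // HTTP-specific settings apply only when the URL is an http(s) one.
        var http = request as HttpWebRequest;
        if (http != null)
        {
            http.UserAgent = userAgent;
            http.Timeout = timeoutMs;
            http.AllowAutoRedirect = true;
        }

        using (WebResponse response = request.GetResponse())
        using (Stream body = response.GetResponseStream())
        using (var reader = new StreamReader(body))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Both methods return the body as a string. For binary files such as PDFs, WebClient.DownloadData or copying the response stream to a file would be the closer fit.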

Jason Manfield

Thanks. Setting UserAgent helped.

Jason Manfield

For some URLs (e.g. http://v3.espacenet.com/origdoc?DB=EPODOC&IDX=WO2005028634&F=0&QPN=WO2005028634), the content length of the HttpWebResponse I get from request.GetResponse() is empty. The stream returned by response.GetResponseStream() is also empty. Here is the code snippet:

    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(pageAddress);
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    StreamReader sr = new StreamReader(resp.GetResponseStream());
    string pageData = sr.ReadToEnd();

The Content-Type of the response is "text/html; charset=iso-8859-1" and the HttpStatusCode was OK. What am I missing?
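
One thing worth ruling out while debugging this is the decoding step. The sketch below reads a response body using the charset named in the Content-Type header instead of the StreamReader default; the helper name is hypothetical, and this is a diagnostic aid, not an explanation of the empty body.

```csharp
using System;
using System.IO;
using System.Text;

static class ResponseReading
{
    // Reads the whole stream as text using the given charset name,
    // e.g. "iso-8859-1" as reported by the response's Content-Type.
    public static string ReadAll(Stream body, string charset)
    {
        Encoding encoding = Encoding.GetEncoding(charset);
        using (var reader = new StreamReader(body, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With a real response this would be called as ResponseReading.ReadAll(resp.GetResponseStream(), "iso-8859-1"). Logging resp.ContentLength next to the length of the returned string can also help: ContentLength may be -1 when the server sends no Content-Length header, which by itself does not mean the body is empty.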
