Filter HTML Content Before Displaying in WebBrowser
-
I need to write a .NET MDI web browser that supports content filtering, but I don't quite know where to start. I would like to filter content before it is displayed in the WebBrowser control, but I don't know the best practice for doing this. I want to change the HTML to remove links to sounds, Flash animations, images and whatever else is defined by the user. It doesn't look as though any of the WebBrowser events will work, and the same goes for the MSHTML DOM events. Should I be doing this even further upstream? Perhaps using a proxy server object and filtering the content there? - John
-
Two options:
- Change the IE settings programmatically (only while your app is running, of course) so that animations and/or images are not downloaded or played. Search CodeProject for that; someone has shown how to do it in the past. (It is not a matter of hacking the registry, because that would require the user to relaunch the browser.)
- Host the web browser control and subscribe to the navigation-complete events (NavigateComplete2 or the like). From there you can access the HTML tree with the DOM API. You've got a starting point here[^], and hooking techniques here[^] and here[^].
MS quote (http://www.microsoft.com/ddk) : As of September 30, 2002, the Microsoft® Windows® 2000 DDK, the Microsoft Windows 98 DDK, and the Microsoft Windows NT® 4.0 DDK will no longer be available for purchase or download on this site.
-
Thanks a lot, Stephane. The two articles you posted will do the trick, I think. I don't like the first option because I believe it's inappropriate to modify IE settings globally. What if my app crashes and the changes don't get "undone"? I wasn't aware of the NavigateComplete2 bug in .NET, but that would definitely explain why my modifications were not being reflected in the HTML in the browser. If you were to write a browser application that allows users to create customized filtering rules, would you use the hooking techniques in your articles, or is there a different approach? - John
-
Quite honestly, if I had the time I would not use IE at all. Better to use a recompiled Mozilla or another programmable browser with source code, and put web page filtering on top of it.

What has stopped me many times already when programming IE is that there are many cases where the IE API doesn't work as expected. For instance, when the web page has frames, several events are not triggered at all, or you have to subscribe to special things. None of this is documented, so it takes time, and there is no guarantee at all that you will succeed. Besides that, IE uses multiple threads (one thread for each picture being downloaded, for instance), which means it is hard to stop the process as a whole.

If you do start programming IE, you may consider this scenario:
- I assume you've got the target URL.
- Use MSHTML as a COM object to download the HTML code from that URL (you could also use the WinINet library). Store it as a file. To do this, look for the MSDN sample called "Walkall" (C++ code).
- Filter the HTML code.
- Add the <base href="URL"> tag inside the HTML code, in order to mimic the right base domain.
- Ask IE to load this HTML code in the browser: document.load(...)
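The "filter" and "add the base tag" steps above can be sketched independently of IE. What follows is only an illustration in Python using the standard html.parser module, not the MSHTML/Walkall code the scenario refers to; the download and load steps are left out. The names BLOCKED_TAGS, VOID_TAGS, FilteringParser and filter_html are hypothetical, and the rule set is an assumed stand-in for whatever the user would define.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical user-defined rule set: tags to strip from the page.
BLOCKED_TAGS = {"img", "embed", "object", "bgsound"}
# Tags that never have a closing tag, so there is no content to skip.
VOID_TAGS = {"img", "embed", "bgsound", "br", "hr", "input", "link", "meta"}


class FilteringParser(HTMLParser):
    """Re-emits HTML, dropping blocked tags and adding <base href> after <head>."""

    def __init__(self, base_url, blocked_tags):
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.blocked = blocked_tags
        self.out = []
        self.skip_tag = None   # name of the blocked container being skipped
        self.skip_depth = 0    # nesting depth of that same tag name

    def handle_starttag(self, tag, attrs):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.skip_depth += 1
            return
        if tag in self.blocked:
            if tag not in VOID_TAGS:
                self.skip_tag, self.skip_depth = tag, 1
            return
        self.out.append(self.get_starttag_text())
        if tag == "head":
            # Relative links keep resolving against the original site.
            self.out.append('<base href="%s">' % escape(self.base_url, quote=True))

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <img ... />.
        if not self.skip_tag and tag not in self.blocked:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.skip_depth -= 1
                if self.skip_depth == 0:
                    self.skip_tag = None
            return
        if tag not in self.blocked:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_tag:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_tag:
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if not self.skip_tag:
            self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        if not self.skip_tag:
            self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def filter_html(html, base_url, blocked_tags=BLOCKED_TAGS):
    """Return the page with blocked tags removed and a <base href> added."""
    parser = FilteringParser(base_url, blocked_tags)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = '<html><head></head><body><img src="a.gif"><a href="b.html">b</a></body></html>'
    print(filter_html(page, "http://example.com/dir/"))
```

The sketch only inserts the base tag when the page actually has a <head> element, and it drops a blocked container such as <object> together with everything inside it. A real filter would also have to deal with frames and scripts that write content at run time, which this does not attempt.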