Regex can be horribly unreliable for this, and a complete pain to maintain when unforeseen formats creep up. I recommend using SgmlReader, written by a fellow Microsoftie. In case you don't know, HTML is an SGML application, and XML is a simplified subset of SGML. XHTML is actually an XML vocabulary that only looks like HTML: it uses the XHTML namespace as the default namespace, so namespace prefixes aren't required.
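To illustrate why a real parser holds up better than a pattern, here is a minimal sketch. It does not use SgmlReader (which is a .NET library); it swaps in Python's standard-library `html.parser` purely as a stand-in for "some tolerant markup parser". The names `LinkCollector`, `extract_links`, `NAIVE_HREF`, and `SAMPLE` are made up for the example.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling us,
        # and strips whichever quoting style the markup used.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return every href found on an <a> tag, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# A naive pattern that assumes lowercase tags, double quotes,
# and href as the first attribute.
NAIVE_HREF = re.compile(r'<a href="([^"]*)"')

# Three links, each written in a different but common style.
SAMPLE = """<p>
<a href="one.html">one</a>
<A HREF='two.html'>two</A>
<a class="x"
   href=three.html>three</a>
</p>"""

if __name__ == "__main__":
    print("regex: ", NAIVE_HREF.findall(SAMPLE))
    print("parser:", extract_links(SAMPLE))
```

The regex only matches the first link, because the other two differ in case, quoting, and attribute order. You could keep patching the pattern for each variation, but that is exactly the maintenance pain a parser spares you.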