Parsing ASP pages with Regex object
-
Can anyone tell me what I am doing wrong with the code below? It comes from a function I wrote that tries to replace hard-coded URLs with a constant so that the pages can be reused for other sites. At some point, though, it seems to have a hard time detecting one section of code where <% starts on one line and %> ends on another. I can only wonder whether I'm using the wrong pattern to detect it, but test cases where I put a bunch of \r\n in the middle show that it works. So, somehow, the pattern is not finding the VBScript section of the file correctly. If anyone can help, please answer. All suggestions and advice are welcome.
Regex reDB = new Regex("<%[^(%>)]*%>");
Match match = reDB.Match(FileContent);
int initialText = 0;
StringBuilder builder = new StringBuilder();

while (match.Success)
{
    // Now, scan inside ASP code to see if http is present.
    string ActivePart = match.Value;

    // Now, in the active part, try to find http or https. We know that
    // it's under quotations at this point
    Regex reURL = new Regex("(http\\:\\/\\/)*(https\\:\\/\\/)*www.company.com");
    MatchCollection URLs = reURL.Matches(ActivePart);

    for (int j = 0; j < URLs.Count; j++)
    {
        builder.Append(FileContent.Substring(initialText, match.Index - initialText));

        if (URLs[j].Value.IndexOf("https") == 0)
        {
            builder.Append("\" + BASESECUREURL + \"");
        }
        else
        {
            builder.Append("\" + BASEURL + \"");
        }

        initialText = URLs[j].Index + URLs[j].Length;
    }

    match = match.NextMatch();
}

builder.Append(FileContent.Substring(initialText));
return builder.ToString();
NOTE: if you see a wink smiley, replace it with a close-paren character. I can't seem to turn that smiley off to make the code look right.

Frank
http://www.frankliao.com
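
One thing worth checking against the symptom described above: inside square brackets, `(%>)` is not treated as a sequence. `[^(%>)]` is a character class that excludes each of the four characters `(`, `%`, `>` and `)` individually. Newlines are not excluded, which would explain why the \r\n test cases pass, while any block containing a parenthesis or a `>` cannot match. The following is only a minimal sketch of that behaviour; the class name, the sample strings, and the lazy alternative pattern are assumptions made for illustration and are not part of the original function.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AspBlockDemo
{
    // Pattern from the post. [^(%>)] is a character class: it rejects any single
    // '(', '%', '>' or ')' character, not the two-character sequence "%>".
    public static readonly Regex CharClassPattern = new Regex("<%[^(%>)]*%>");

    // Hypothetical alternative (an assumption, not the poster's code): a lazy match
    // up to the first "%>". RegexOptions.Singleline lets '.' match newlines as well.
    public static readonly Regex LazyPattern =
        new Regex("<%.*?%>", RegexOptions.Singleline);

    public static void Main()
    {
        // A block with only newlines and plain text between the delimiters.
        string plain = "<%\r\n\r\nDim x\r\n%>";

        // A block that spans lines and also contains parentheses.
        string withParen = "<%\r\nResponse.Write(\"http://www.company.com\")\r\n%>";

        Console.WriteLine(CharClassPattern.IsMatch(plain));      // True
        Console.WriteLine(CharClassPattern.IsMatch(withParen));  // False
        Console.WriteLine(LazyPattern.IsMatch(withParen));       // True
    }
}
```

The lazy pattern is only one possible choice. It stops at the first `%>` it sees, so a `%>` appearing inside a VBScript string literal would still end the match early.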