regular expressions

xkrja

I'm creating a code editor with syntax highlighting and need help with the regular expressions. I've read a couple of pages about them but can't figure out how to do exactly what I want. For now, I only need support for comments and keywords such as 'if', 'else', and so on. For the comments I created the following pattern:

string commentPattern = @"%.*";

It matches the '%' character and everything to its right, up to the end of the line. This text should be colored green, and that works fine. The next pattern is for the keywords and looks like this:

It matches any of these words, and these keywords are colored blue. That works fine too. BUT, when I run these two patterns in parallel, the keywords are colored blue even when they are inside a comment. How can I create a pattern that doesn't color a keyword blue if there is a '%' character to the left of it? Thanks for any help!
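For reference, .NET's `Regex` supports variable-length lookbehind, so this particular case can be handled in a single pattern: a negative lookbehind rejects a keyword whenever a `%` appears earlier on the same line. A minimal sketch, assuming a hypothetical keyword list (the original keyword pattern isn't shown above) and assuming `%` only ever starts a comment, i.e. never appears inside a string literal:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordMatcher
{
    // Hypothetical keyword list; substitute the editor's real keywords.
    static readonly string[] Keywords = { "if", "else", "for", "end" };

    // (?<!%[^\n]*)  .NET allows variable-length lookbehind, so this rejects a
    //               position if a '%' occurs anywhere earlier on the same line.
    // \b ... \b     keeps "if" from matching inside identifiers such as "diff".
    public static readonly Regex KeywordOutsideComment = new Regex(
        @"(?<!%[^\n]*)\b(?:" + string.Join("|", Keywords.Select(Regex.Escape)) + @")\b");

    // Returns the keywords that are NOT inside a %-comment, in order of appearance.
    public static string[] FindKeywords(string code) =>
        KeywordOutsideComment.Matches(code).Cast<Match>().Select(m => m.Value).ToArray();
}
```

For example, `FindKeywords("if x\n  y = 1 % else if\nelse z")` should return `if` and `else` (the two keywords after `%` on the second line are skipped). The lookbehind is re-evaluated at every candidate position, so on very long lines a single left-to-right scan tends to scale better.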

Ravadre

It can probably be done with regular expressions alone, but if I were you, I'd consider making your highlighter a bit more intelligent. Scan your code chunk by chunk, word by word: once you find a comment, you just move on to the next line; when you find a keyword, you move on to the next word, and so on. This approach comes in handy for more complicated cases. Another option is, of course, to fully parse the text, but that would probably be too complicated for your needs.
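A minimal C# sketch of that word-by-word scan, assuming a hypothetical keyword set and the `%`-to-end-of-line comment syntax from the question. The `WordScanner`, `HighlightSpan` and `SpanKind` names are illustrative only:

```csharp
using System.Collections.Generic;

public enum SpanKind { Comment, Keyword }

public readonly struct HighlightSpan
{
    public HighlightSpan(SpanKind kind, int start, int length) { Kind = kind; Start = start; Length = length; }
    public SpanKind Kind { get; }
    public int Start { get; }
    public int Length { get; }
}

public static class WordScanner
{
    // Hypothetical keyword set for illustration.
    static readonly HashSet<string> Keywords = new HashSet<string> { "if", "else", "for", "end" };

    public static List<HighlightSpan> Scan(string text)
    {
        var spans = new List<HighlightSpan>();
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (c == '%')
            {
                // Comment: from '%' up to (not including) the end of the line.
                int end = text.IndexOf('\n', i);
                if (end < 0) end = text.Length;
                spans.Add(new HighlightSpan(SpanKind.Comment, i, end - i));
                i = end; // jump straight to the next line; nothing inside is checked for keywords
            }
            else if (char.IsLetterOrDigit(c) || c == '_')
            {
                // Read a whole word, then decide whether it is a keyword.
                int start = i;
                while (i < text.Length && (char.IsLetterOrDigit(text[i]) || text[i] == '_')) i++;
                string word = text.Substring(start, i - start);
                if (Keywords.Contains(word)) spans.Add(new HighlightSpan(SpanKind.Keyword, start, word.Length));
            }
            else
            {
                i++; // whitespace and operators are not highlighted here
            }
        }
        return spans;
    }
}
```

Because the comment branch skips to the end of the line, a keyword inside a comment is never even looked at, which is exactly the behavior the question asks for.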

xkrja

Thanks for your reply. Can you perhaps give a little more detail on this approach?

Ravadre

Let's say you have:

int x; % foo % bar
x = 5;
%x = 5

Your rules:
keyword = 'int'
ident = anything that starts with _ or a letter, followed by letters, digits or _
comment = starts with % and runs to the end of the line

Now you write a simple lexer that scans letter by letter, keeping track of which rules the text read so far could still match. When no possibilities are left, you go back one letter and take the first rule that fits. So for our example it would be:

buffer    what it can be
'i'       int or ident
'in'      int or ident
'int'     int or ident
'int '    nothing; go back 1 letter
'int'     first rule that fits is keyword int, so it's int
' '       fits nothing from the beginning, so ignore it
'x'       fits ident
'x '      fits nothing; go back
...
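For simple rules like these, the letter-by-letter procedure amounts to the usual "longest match" rule: at each position take the rule that matches the most characters, and on a tie take the rule listed first (that is why 'int' becomes a keyword while 'integer' stays an ident). A minimal C# sketch of that formulation, assuming .NET's `\G` anchor to pin each rule to the current position; rather than literally backing up one letter, it asks every rule for its longest match. `RuleLexer`, `Token` and `TokenKind` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public enum TokenKind { Keyword, Ident, Comment }

public sealed class Token
{
    public Token(TokenKind kind, int start, int length, string text) { Kind = kind; Start = start; Length = length; Text = text; }
    public TokenKind Kind { get; }
    public int Start { get; }
    public int Length { get; }
    public string Text { get; }
}

public static class RuleLexer
{
    // Rules in priority order. \G anchors each pattern at the position passed
    // to Match(text, pos). Keywords are sorted longest-first so an alternation
    // such as "in|int" cannot stop early at the shorter keyword.
    static readonly (TokenKind Kind, Regex Pattern)[] Rules =
    {
        (TokenKind.Keyword, new Regex(@"\G(?:" + string.Join("|",
            new[] { "int", "if", "else" }.OrderByDescending(k => k.Length).Select(Regex.Escape)) + ")")),
        (TokenKind.Ident,   new Regex(@"\G[A-Za-z_][A-Za-z0-9_]*")),
        (TokenKind.Comment, new Regex(@"\G%[^\r\n]*")),
    };

    public static List<Token> Tokenize(string text)
    {
        var tokens = new List<Token>();
        int pos = 0;
        while (pos < text.Length)
        {
            TokenKind bestKind = default;
            int bestLength = 0;
            foreach (var (kind, pattern) in Rules)
            {
                Match m = pattern.Match(text, pos);
                // Only a strictly longer match replaces the current best,
                // so on a tie the earlier rule wins.
                if (m.Success && m.Length > bestLength) { bestKind = kind; bestLength = m.Length; }
            }
            if (bestLength == 0) { pos++; continue; } // fits nothing, so ignore it
            tokens.Add(new Token(bestKind, pos, bestLength, text.Substring(pos, bestLength)));
            pos += bestLength;
        }
        return tokens;
    }
}
```

Run on the example above, this should yield `int` (keyword), `x` (ident), `% foo % bar` (comment), `x` (ident) and `%x = 5` (comment); the `x = 5` inside the last comment never becomes separate tokens.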

In the end you get a list of tokens with their start and end positions, so you just color them :).
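Once the tokens exist, coloring is mostly bookkeeping: every gap between tokens keeps the default color. A small sketch that turns (start, length, kind) tuples into colored text segments; the `Colorizer` class and the color names are hypothetical. In a WinForms `RichTextBox`, each token could instead be applied by calling `Select(start, length)` and then setting `SelectionColor`.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Colorizer
{
    // Hypothetical color scheme matching the thread: comments green, keywords blue.
    static readonly Dictionary<string, string> ColorOf = new Dictionary<string, string>
    {
        ["comment"] = "green",
        ["keyword"] = "blue",
    };

    // Splits text into (Text, Color) segments that together cover the whole string.
    // Tokens are assumed not to overlap; gaps between them get defaultColor.
    public static List<(string Text, string Color)> Colorize(
        string text, IEnumerable<(int Start, int Length, string Kind)> tokens, string defaultColor = "black")
    {
        var segments = new List<(string Text, string Color)>();
        int pos = 0;
        foreach (var tok in tokens.OrderBy(x => x.Start))
        {
            if (tok.Start > pos) segments.Add((text.Substring(pos, tok.Start - pos), defaultColor));
            string color = ColorOf.TryGetValue(tok.Kind, out var c) ? c : defaultColor;
            segments.Add((text.Substring(tok.Start, tok.Length), color));
            pos = tok.Start + tok.Length;
        }
        if (pos < text.Length) segments.Add((text.Substring(pos), defaultColor));
        return segments;
    }
}
```

For `"int x; % c"` with a keyword token at (0, 3) and a comment token at (7, 3), this gives `int` in blue, ` x; ` in black and `% c` in green.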

xkrja

Thanks for your help. I'll take a look at it!