Which type of regex is best to learn for programming in C?
-
I like C, but I feel its Achilles' heel is string processing. Lately I've been doing a lot of parsing of text databases in arbitrary, undocumented formats, and I need to adapt. What I need is to define patterns (the expected format of the data) and store the values only if the whole string matches that known pattern. In other words, input validation: I'd rather not run the rest of my code without verifying that the input conforms. I think regular expressions are the best way to augment my existing skills without learning a new language, but regexes seem to be something of a mixed breed. Perl (5?) seems to have a standardized regex syntax that is supported by many search and text-editing programs. There's also PCRE, which I can compile on Windows or download as a precompiled lib/DLL. Should I learn Perl regexes and use PCRE, or am I overlooking something?
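For the whole-string validation described above, one option that needs no extra download on Linux, BSD or macOS is the POSIX `<regex.h>` API (`regcomp`, `regexec`, `regfree`, `regerror`). It is not part of ISO C and is not shipped with MSVC, where PCRE would fill the same role. The sketch below assumes a hypothetical record format of letters, a colon, and one to three digits (e.g. `alice:42`); the function name `parse_record` is illustrative only. The `^` and `$` anchors make partial matches fail, and the captured groups are copied out only after the whole line has matched.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record format for illustration: "<letters>:<1-3 digits>",
   e.g. "alice:42". Returns 1 and fills name/age only if the WHOLE line
   matches, 0 if it does not, and -1 if the pattern fails to compile. */
int parse_record(const char *line, char *name, size_t name_size, int *age)
{
    /* ^ and $ anchor the pattern so partial matches are rejected. */
    static const char *pattern = "^([A-Za-z]+):([0-9]{1,3})$";
    regex_t re;
    regmatch_t m[3];              /* m[0] = whole match, m[1..2] = groups */

    int rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0) {
        char msg[128];
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "regcomp failed: %s\n", msg);
        return -1;
    }
    rc = regexec(&re, line, 3, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;                 /* REG_NOMATCH: store nothing */

    size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (len >= name_size)
        return 0;                 /* field does not fit caller's buffer */
    memcpy(name, line + m[1].rm_so, len);
    name[len] = '\0';

    /* Group 2 is 1-3 digits, so this manual conversion cannot overflow. */
    int value = 0;
    for (regoff_t i = m[2].rm_so; i < m[2].rm_eo; i++)
        value = value * 10 + (line[i] - '0');
    *age = value;
    return 1;
}
```

Compiling the pattern on every call keeps the sketch short; code that validates many lines would call `regcomp` once and reuse the `regex_t`.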
-
There are many websites that help you learn regexes; Expresso Regular Expression Tool is a popular one. But you will also need a support library, as standard C does not have native support for them.
As far as I can tell, there are at least three main types: POSIX basic, POSIX extended, and Perl-compatible. There's a list of engines here: Comparison of regular expression engines - Wikipedia. And apparently there are some differences between Perl and PCRE: Perl Compatible Regular Expressions - Wikipedia. I don't know whether, say, 80 or 97 percent of the regex syntax is the same from one flavor to another, or whether they are distinct subtypes with significant differences. I also don't know whether they all support ASCII, Unicode, and UTF encodings, or whether they can all return the matched substrings or some only provide a match/no-match result.
-
PCRE is a good option. It is based on Perl regexes, although there are some minor differences under the hood (which I do not remember now). Some years ago I wrote my own C++ template regex library, which gives me full control of its behavior in my personal projects. In my test bed I used other libraries, like PCRE, for comparison of speed and accuracy. That is why I know there are some minor differences in what each one considers valid and invalid syntax (implementation differences or the programmer's mindset, who knows?).

As for C's ability to process strings or any other data type: it is very efficient. I used to be able to look at C code and translate it, in my head, directly to the equivalent assembly code. What you are talking about is the standard C library, which was designed to provide only the simple, low-level functionality that programmers need to develop more complex algorithms (how many ways are there to write a strcmp function?). It was left to others to provide libraries whose functions need more than a simple for or while loop.

That being said, when I find myself doing contract work on old C code where I am not allowed to upgrade or use external libraries, I recreate some simple algorithms for parsing (hey, it's their money, so who am I to argue with a brick wall?). Basically, I create equivalent functions for parsing substrings like the regex "\d*" or "[abd]" and wrap them in a function call, depending on what I am looking for. What little testing I have done has actually shown them to be more efficient than the MS implementation of regex (not a surprise).

Conclusion: C is the most efficient language I have ever worked with. There is a reason that all of the modern operating systems I have worked with were written in C. (I have not checked lately, so it is possible that C++ snuck in there somewhere.)
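The hand-rolled approach described in the answer above might look something like the following. This is a minimal sketch, not the answerer's actual code: the function names `match_digits`, `match_set` and `is_tag` are hypothetical. Each helper returns how many characters it consumed at the current position, so they can be chained to check a whole string against a fixed pattern, here the equivalent of `^[abd]\d*$`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers; each returns the number of characters consumed
   at the start of s (0 means "matched nothing"). */

/* Equivalent of the regex \d* : consume zero or more decimal digits. */
size_t match_digits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Equivalent of a character class such as [abd]: consume exactly one
   character if it appears in set, otherwise consume nothing. The '\0'
   check matters because strchr would otherwise find set's terminator. */
size_t match_set(const char *s, const char *set)
{
    if (s[0] != '\0' && strchr(set, s[0]) != NULL)
        return 1;
    return 0;
}

/* Composite check: does the WHOLE string match ^[abd]\d*$ ? */
int is_tag(const char *s)
{
    size_t pos = match_set(s, "abd");
    if (pos == 0)
        return 0;                 /* first character not in [abd] */
    pos += match_digits(s + pos);
    return s[pos] == '\0';        /* anything left over means no match */
}
```

Because each helper only advances a position, there is no pattern compilation and no backtracking, which is one plausible reason such code can outperform a general regex engine on simple fixed formats. The trade-off is that the pattern lives in code rather than in a string, so changing the format means changing and recompiling the program.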