I love regular expressions
-
Being a compact way to describe something is very much the problem. That is exactly why people hate them. You can't just read one and know what it does, nor expect that of your colleagues. Every time I add a RegEx to the code, I have to add an explanation of what it does. Just like in code: if it needs comments to be readable, something's very wrong.
Jeroen_R wrote:
if it needs comments to be readable, something's very wrong.
That's true in the general case, but the exceptions are legion. I'd argue regex is one of those exceptions. There is simply not a more compact and efficient way to tokenize or match flat text than DFA regular expressions. Sure you could try writing it out, but I have. Try your hand at creating a more verbose language than regex and see what it looks like. I wrote one that allows you to describe regular expressions using a BNF notation variant. The result required too much typing, let's put it that way, even though BNF is fairly compact as well. If there's a better mousetrap, nobody seems to have found it yet.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
That's why you keep your tool to yourself. :laugh:
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
At least the non-backtracking subset. DFA regular expressions. - they are a compact way to describe a simple syntax - they are plain text and brief, easily communicatable and transferable - they are cross platform (at least DFA), running in most any engine - they are incredibly efficient (again, DFA) - they are versatile, able to do validation, tokenization, and matching as well That's probably why they will always be with us. They are maybe the perfect canonical execution of a Chomsky type 3 language. Sure, they can be really terse, but this is as much a strength as it is a weakness, because it facilitates some of the above. I know some people hate them, and I can understand that. But show me a better way.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
This is the most runaway thread on CP in a long while. Way to poke the bear.
-
At least the non-backtracking subset. DFA regular expressions. - they are a compact way to describe a simple syntax - they are plain text and brief, easily communicatable and transferable - they are cross platform (at least DFA), running in most any engine - they are incredibly efficient (again, DFA) - they are versatile, able to do validation, tokenization, and matching as well That's probably why they will always be with us. They are maybe the perfect canonical execution of a Chomsky type 3 language. Sure, they can be really terse, but this is as much a strength as it is a weakness, because it facilitates some of the above. I know some people hate them, and I can understand that. But show me a better way.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
I blank out on regular expressions. Too much like learning a new language. The other day, I needed to get 0 or more leading characters from a string (as an int). I Googled (regex), I went, I left. Coding challenge: get leading digits (I settled for LINQ)
var text = "123rd NY 2nd Battalion".
//
var count = text.TakeWhile( c => Char.IsDigit( c ) ).Count();
int i = count > 0 ? ConvertToInt32( text.SubString( 0, count) ) : 0;Answer: 123
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
-
It's because I slept. :) But yeah, I get bit overwhelmed when I get a lot of responses, so I kind of respond as I'm able.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
I didn’t mean to imply you didn’t participate in the conversation, rather that the topic usually starts up a lively discussion. :cool:
Time is the differentiation of eternity devised by man to measure the passage of human events. - Manly P. Hall Mark Just another cog in the wheel
-
I blank out on regular expressions. Too much like learning a new language. The other day, I needed to get 0 or more leading characters from a string (as an int). I Googled (regex), I went, I left. Coding challenge: get leading digits (I settled for LINQ)
var text = "123rd NY 2nd Battalion".
//
var count = text.TakeWhile( c => Char.IsDigit( c ) ).Count();
int i = count > 0 ? ConvertToInt32( text.SubString( 0, count) ) : 0;Answer: 123
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
^[0-9]+
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
At least the non-backtracking subset. DFA regular expressions. - they are a compact way to describe a simple syntax - they are plain text and brief, easily communicatable and transferable - they are cross platform (at least DFA), running in most any engine - they are incredibly efficient (again, DFA) - they are versatile, able to do validation, tokenization, and matching as well That's probably why they will always be with us. They are maybe the perfect canonical execution of a Chomsky type 3 language. Sure, they can be really terse, but this is as much a strength as it is a weakness, because it facilitates some of the above. I know some people hate them, and I can understand that. But show me a better way.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
i enjoy the challenge of learning / writing regular expressions for my text editing of source code . as in all things mastering it is the same as being invited to perform at Carnegie Hall id est "practice practice practice" . my purpose in this post though is to express my impression from these many and expert posts utilizing fancy schmancy terms of which i do not know which seem to indicate regular expressions are more powerful / advanced / sophisticated than i know as i understand them to be nothing more than a text editing convenience . may i please inquire am i wrong in this regard . thank you kindly .
-
i enjoy the challenge of learning / writing regular expressions for my text editing of source code . as in all things mastering it is the same as being invited to perform at Carnegie Hall id est "practice practice practice" . my purpose in this post though is to express my impression from these many and expert posts utilizing fancy schmancy terms of which i do not know which seem to indicate regular expressions are more powerful / advanced / sophisticated than i know as i understand them to be nothing more than a text editing convenience . may i please inquire am i wrong in this regard . thank you kindly .
They're for text processing, but for more than text editing. The C# compiler for example, almost certainly uses a tokenizer built up of regular expressions. That said, you're basically not wrong. I mean, tokenization is almost like text matching, but with a twist.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
Joking aside, more trivially I believe the term "regular" refers to the third level of Chomsky's hierarchy, which, precisely, is defined as Type3-Regular. DFA (Deterministic Finite Automaton) are FSA (Finite State Automaton).
Can confirm.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
A modest proposal: Learn the DFA subset. Commit it to memory, and forget the rest. DFA is the non-backtracking subset of regular expressions () - capture and group [] - match char ranges * - match zero or more + - match one or more ? - match zero or one . - match any single character | - match a or b (a|b)
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
I used ChatGPT precisely for that and it returned a decent regex with an explanation. I needed to word my question in a manner that was generic but the result was actually helpful.
“That which can be asserted without evidence, can be dismissed without evidence.”
― Christopher Hitchens
-
Not keeping your tool to yourself is one of the leading causes of dismissal, too.
"A little song, a little dance, a little seltzer down your pants" Chuckles the clown
-
How the hell would someone know that "[0-9]{1,3}" is enough to find sequential numbers in Microsoft Word? How the hell did I find that magic? Every regular expression seems to require a convoluted google search. Crazy world...
Our Forgotten Astronomy | Object Oriented Programming with C++ | Wordle solver
-
A modest proposal: Learn the DFA subset. Commit it to memory, and forget the rest. DFA is the non-backtracking subset of regular expressions () - capture and group [] - match char ranges * - match zero or more + - match one or more ? - match zero or one . - match any single character | - match a or b (a|b)
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
cool. forgot chatgpt is hanging out there
"A little time, a little trouble, your better day" Badfinger
I had exactly the dilemna you had - I was looking for a reverse regular expression parser i.e. create a regular expression from a desired result and an initial string and I thought I would give ChatGPT a go.
“That which can be asserted without evidence, can be dismissed without evidence.”
― Christopher Hitchens
-
would not ? match zero or one not be part of that? or is that a posix extension?
"A little song, a little dance, a little seltzer down your pants" Chuckles the clown
You're right. Totally spaced that I edited.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
jschell wrote:
Not sure what you mean by that.
What I mean is that regardless of the platform you choose, there is a way to run a DFA regular expression on it. And yeah, that encompasses many different engines, which themselves are what run on a particular platform, unless you're doing code generation, which I do sometimes for them so I don't have to include the regex engine in my firmware. That code is easy to make cross platform. You'd almost have to put in extra effort to make it otherwise. :) I was maybe trying to be too brief by half. I assumed the meaning would come through, but I guess not.
jschell wrote:
Perhaps you were referring to that the simplest syntax works in different engines though
In part yes, but also, virtually every platform has a DFA regex engine for it, or alternatively you can generate DFA code for that platform, with something such as my rxcg project. I was intending to imply that as well.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
honey the codewitch wrote:
What I mean is that regardless of the platform you choose, there is a way to run a DFA regular expression on it.
If you find a programming system that doesn't allow that then you might prepare for the universe to end. Far as I can recall it would be mathematically impossible for that not to be true.
-
I recall this too. Qwerty I heard was in part laid out to slow typists down so the mechanical typewriter could keep up. I heard it from the Beagle Bros back in the 1980s so I don't know how true it is. Either way, presumably eventually that wasn't an issue anymore. And Devorak was a common alternative, or at least common as qwerty alternatives go. It was touted as better, but despite the hype I remember reading that it didn't actually improve people's WPM.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
honey the codewitch wrote:
out to slow typists down so the mechanical typewriter could keep up
Not exactly. Rather the layout exists to allow typing to be faster. QWERTY - Wikipedia[^] "but rather to speed up typing. Indeed, there is evidence that, aside from the issue of jamming, placing often-used keys farther apart increases typing speed, because it encourages alternation between the hands."
honey the codewitch wrote:
but despite the hype
I believe that was the one where the data was faked.
-
honey the codewitch wrote:
What I mean is that regardless of the platform you choose, there is a way to run a DFA regular expression on it.
If you find a programming system that doesn't allow that then you might prepare for the universe to end. Far as I can recall it would be mathematically impossible for that not to be true.
jschell wrote:
If you find a programming system that doesn't allow that then you might prepare for the universe to end
Maybe I misunderstand you, but if you're speaking in the general sense, you aren't going to run a garbage collected system for example, on an 8-bit platform with 4KB of RAM, hopefully. Even if you could, it wouldn't be practical for anything. A DFA on the other hand will run handily there.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
jschell wrote:
If you find a programming system that doesn't allow that then you might prepare for the universe to end
Maybe I misunderstand you, but if you're speaking in the general sense, you aren't going to run a garbage collected system for example, on an 8-bit platform with 4KB of RAM, hopefully. Even if you could, it wouldn't be practical for anything. A DFA on the other hand will run handily there.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix