Pattern matching in trees, lists and strings alike
-
There are many programming languages that do string pattern matching, either using regular expressions or using more elegant and powerful mechanisms, like SNOBOL[^]. There are also many programming languages that do tree pattern matching, like Haskell, but you cannot easily write a pattern that attempts to match an element in a list (a right descending tree), unless it is the first element. Many tasks involve complex, irregular and not very large structured data. The most appropriate programming languages for tackling such tasks should have this kind of pattern matching in their tool box. Examples of application domains for such a language:
- HTML/XML validation and correction
- C++ source code analysis
- Analysis of email chains
- Automatic chaining of tools in a workflow
- Transformation from syntax to semantics in natural language processing
- Computer algebra
Starting in the eighties, I have made and maintained such a language, Bracmat (GitHub[^],Rosetta Code[^]), and I have used it for all purposes listed above. Problem is, it is not an established programming language and therefore not the right language for production software, so I'm looking for a replacement. I hope that I am too pessimistic and that there are in fact programming languages with seizable user communities that support the pattern matching facilities that I am looking for. The programming language that I am looking for should have a pattern matching mechanism that:
- allows wild cards not only in the tail part of a pattern, but anywhere in a pattern,
- in the case of matching against a list, is capable of matching any number of consecutive elements anywhere in the list, and not just the 'head' of the list,
- is visually clean, attractive and readable,
- allows 'normal' instructions to be embedded in patterns (evaluation of variables, writing to files, function calls, other pattern match operations, etc.),
- can assign a matched part of the subject to a variable,
- c
-
There are many programming languages that do string pattern matching, either using regular expressions or using more elegant and powerful mechanisms, like SNOBOL[^]. There are also many programming languages that do tree pattern matching, like Haskell, but you cannot easily write a pattern that attempts to match an element in a list (a right descending tree), unless it is the first element. Many tasks involve complex, irregular and not very large structured data. The most appropriate programming languages for tackling such tasks should have this kind of pattern matching in their tool box. Examples of application domains for such a language:
- HTML/XML validation and correction
- C++ source code analysis
- Analysis of email chains
- Automatic chaining of tools in a workflow
- Transformation from syntax to semantics in natural language processing
- Computer algebra
Starting in the eighties, I have made and maintained such a language, Bracmat (GitHub[^],Rosetta Code[^]), and I have used it for all purposes listed above. Problem is, it is not an established programming language and therefore not the right language for production software, so I'm looking for a replacement. I hope that I am too pessimistic and that there are in fact programming languages with seizable user communities that support the pattern matching facilities that I am looking for. The programming language that I am looking for should have a pattern matching mechanism that:
- allows wild cards not only in the tail part of a pattern, but anywhere in a pattern,
- in the case of matching against a list, is capable of matching any number of consecutive elements anywhere in the list, and not just the 'head' of the list,
- is visually clean, attractive and readable,
- allows 'normal' instructions to be embedded in patterns (evaluation of variables, writing to files, function calls, other pattern match operations, etc.),
- can assign a matched part of the subject to a variable,
- c
Perhaps I'm being facetious, but shouldn't you at least study parsers/compilers? flex/yacc/bison compilerdesigndetails.blogspot.co.uk/2012/02/compiler-design.html
"It's true that hard work never killed anyone. But I figure, why take the chance." - Ronald Reagan That's what machines are for. Got a problem? Sleep on it.
-
Perhaps I'm being facetious, but shouldn't you at least study parsers/compilers? flex/yacc/bison compilerdesigndetails.blogspot.co.uk/2012/02/compiler-design.html
"It's true that hard work never killed anyone. But I figure, why take the chance." - Ronald Reagan That's what machines are for. Got a problem? Sleep on it.
Indeed. Bracmat would have been a better programming language if I had been a computer scientist. For software that is going to be maintained by others, Bracmat is currently not a solution. That is why I am looking for a community driven programming language that has pattern matching (on strings, lists, trees, see the list above) in its tool box. Clarify your answer if I misunderstood you.
Bart Jongejan
-
Indeed. Bracmat would have been a better programming language if I had been a computer scientist. For software that is going to be maintained by others, Bracmat is currently not a solution. That is why I am looking for a community driven programming language that has pattern matching (on strings, lists, trees, see the list above) in its tool box. Clarify your answer if I misunderstood you.
Bart Jongejan
If you posted in Algorithms then it gave the wrong impression that you are looking for help coding a solution of your own. Try Perl ..www.perl.org Also have a look at Scala which has a very small syntax to learn, and is suitable for creating a Domain Specific Language of your own. scala-lang.org
"It's true that hard work never killed anyone. But I figure, why take the chance." - Ronald Reagan That's what machines are for. Got a problem? Sleep on it.
-
If you posted in Algorithms then it gave the wrong impression that you are looking for help coding a solution of your own. Try Perl ..www.perl.org Also have a look at Scala which has a very small syntax to learn, and is suitable for creating a Domain Specific Language of your own. scala-lang.org
"It's true that hard work never killed anyone. But I figure, why take the chance." - Ronald Reagan That's what machines are for. Got a problem? Sleep on it.
Neither Perl or Scala can solve this simple problem with a simple pattern: Given a list of pairs, each pair consisting of a first name and a family name, give me two names of different persons that have the same family name. I think [Tom](<a href=)[^] (a cousin of Scala, I think) can do it, but the syntax of patterns in that language is very verbose. In Bracmat, you would do:
(Abel.Hirst) (Benjamin.Foster) (Letty.Johnson) (George.Hanson) (Chris.Johnson) (Priscilla.Stein):?list;
!list:? (?name1.?familyName) ? (?name2.!familyName) ? & out$(!name1 and !name2 "have the same family name (" !familyName ")") | out$"No two persons have the same family name";
which outputsLetty and Chris have the same family name ( Johnson )
The pattern is just? (?name1.?familyName) ? (?name2.!familyName) ?
. Scala only allows a wild card in the tail of a pattern. Perl 6 seems to have modernized its pattern matching, but the subject is still a string, never a list. Correct me if I'm wrong. For a perhaps more convincing, but longer, example, see Dinesman's multiple-dwelling problem[^]. -
Neither Perl or Scala can solve this simple problem with a simple pattern: Given a list of pairs, each pair consisting of a first name and a family name, give me two names of different persons that have the same family name. I think [Tom](<a href=)[^] (a cousin of Scala, I think) can do it, but the syntax of patterns in that language is very verbose. In Bracmat, you would do:
(Abel.Hirst) (Benjamin.Foster) (Letty.Johnson) (George.Hanson) (Chris.Johnson) (Priscilla.Stein):?list;
!list:? (?name1.?familyName) ? (?name2.!familyName) ? & out$(!name1 and !name2 "have the same family name (" !familyName ")") | out$"No two persons have the same family name";
which outputsLetty and Chris have the same family name ( Johnson )
The pattern is just? (?name1.?familyName) ? (?name2.!familyName) ?
. Scala only allows a wild card in the tail of a pattern. Perl 6 seems to have modernized its pattern matching, but the subject is still a string, never a list. Correct me if I'm wrong. For a perhaps more convincing, but longer, example, see Dinesman's multiple-dwelling problem[^].You can do anything in C++. your example can be done as this:
std::string mystdarr[][2] = {{"Abel","Hirst"},{"Benjamin","Foster"},{"Letty","Johnson"},{"George","Hanson"},{"Chris","Johnson"},{"Priscilla","Stein"}, {"",""}};
int arrsize = sizeof(mystdarr)/sizeof(mystdarr[0]);
for (auto it=0; it < arrsize - 1; ++it) {
auto itFound = std::find_if(&mystdarr[it+1], &mystdarr[arrsize-1], [&](const std::string (&cmp)[2]) { return !cmp[1].compare(mystdarr[it][1]); });
if (itFound != (std::string (*)[2])(mystdarr[arrsize-1]))
std::cout << mystdarr[it][0].c_str() << " and " << (*itFound)[0].c_str() << " have the same family name (" << (*itFound)[1].c_str() << ")\n";
}Or, if you'd like less cryptic code (it's cryptic because built-in arrays in C/C++ are by no means flexible), you can do this:
struct SS
{
std::string firstName;
std::string lastName;
SS(const std::string& fname, const std::string& lname) : firstName(fname), lastName(lname) {};
};
std::vector myarr;
myarr.push_back(SS("Abel", "Hirst"));
myarr.push_back(SS("Benjamin", "Foster"));
myarr.push_back(SS("Letty", "Johnson"));
myarr.push_back(SS("George", "Hanson"));
myarr.push_back(SS("Chris", "Johnson"));
myarr.push_back(SS("Priscilla", "Stein"));
for (auto it=myarr.begin(); it!=myarr.end(); ++it)
{
auto itFound = std::find_if(it + 1, myarr.end(), [&](const SS& cmp) {return !cmp.lastName.compare(it->lastName);});
if (itFound != myarr.end())
std::cout << it->firstName.c_str() << " and " << itFound->firstName.c_str() << " have the same family name (" << it->lastName.c_str() << ")\n";
}