Finite State Transducers?

Mircea Neacsu

In semi-formal terms, to the quintuple that defines a FSA (alphabet, states, initial states, final states, transitions), you add one more, the output alphabet. A hypothetical example: you are a librarian searching for a book in a labyrinthine library (think Umberto Eco, "The Name of the Rose"). As you go, you read fragments from the books you find and those direct you to other books in other rooms, until you find the book you want. So far you have a standard FSA that transitions you from state to state until you end in the final state. Now you want to be a clever librarian that will not search again for the same book and you take a piece of paper on which you write down the room numbers. At the end of your search the notes on your paper are the word in the output alphabet that describe the route to your book and you have a FST. Chances are that you've made many FSTs without calling them as such. At least in electronics, it is rare to have a FSA evolve from state to state without producing any output. The set of signals that you triggered during transitions are the output tape of a FST.

Mircea

jmaida

Here is my version of finite state transducer as I interpret that expression ( I hope to not confuse ) This is an actual code snippet for a command line processor I wrote many moons ago. Alphabet: 1 is code for number 2 is code for string Initial state: User enters a ROT command (input) ROT PART_NAME x y z "Transducer" (in this case) converts string in a finite state associated with the name of a function to execute the command. Final states => transitions "ROT_2_1_1_1", DOJO_ROTATE_2_1_1_1,"ROTATE part x y z - apply euler angles(deg) to part", "ROT_2_X_1", DOJO_ROTATE_2_X_1, "ROTATE part axis deg - rotate part on X Y or Z", "ROT_2_Y_1", DOJO_ROTATE_2_Y_1, "ROTATE part axis deg - rotate part on X Y or Z", "ROT_2_Z_1", DOJO_ROTATE_2_Z_1, "ROTATE part axis deg - rotate part on X Y or Z",

"A little time, a little trouble, your better day" Badfinger

honey the codewitch

Thanks!

Chances are that you've made many FSTs without calling them as such. At least in electronics, it is rare to have a FSA evolve from state to state without producing any output. The set of signals that you triggered during transitions are the output tape of a FST.

Typically with my FSMs I do run a sort of output tape. However, the output only gets emitted on Accept. That usually means rather than outputting each state transition, I am only outputting on specially marked "accepting" states. It's at that point that I output everything I've gathered since q0. So in effect I am logging my state transitions (it's where I capture input) but I am holding off relaying them to the consumer until I can't move/find a valid transition anymore. At that point I yield the input I've received so far, which is not exactly (forensically speaking) a map of my transitions, but rather what the transitions gathered from the input tape, so close, but not quite the same thing. (I could modify it to record state transitions easily but I don't understand why I'd do that.) Can you explain to me how this is different than an FST? For example, you talk about how "the set of signals that you triggered during transitions are the output tape of a FST" - are you saying that I write out each transition? Currently I'm only using transitions as a recording device over an input tape.

Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

Mircea Neacsu

Starting from the bottom:

Quote:

the set of signals that you triggered during transitions are the output tape of a FST" - are you saying that I write out each transition?

In some cases, yes. Here is a simple example: a FST that decodes a quadrature encoder[^]. The input alphabet consists of the states of the 2 inputs: 00, 01, 10, 11. The output alphabet is composed of the numbers 0 to 2N-1, where N is the number of bits in the counter. The states space is 0 to 2N-1 plus E, the error state. With the input 00 01 11, the final state is "2" and the output tape shows 0, 1, 2. From the same initial state, if input is 00 01 10, the output tape showa 0, 1 but the final state is E, the only non-accepting state so we scrap the output tape (or blow a whistle, call the fire brigade, etc.).

Quote:

However, the output only gets emitted on Accept.

This still qualifies as a FST. The transitions space of an FST is defined over (input alphabet and the empty string) X (output alphabet and the empty string) X (state space). In other words you are allowed to have transitions that don't produce any output and your model produces output only on the transition to an accepting state.

Quote:

At that point I yield the input I've received so far, which is not exactly (forensically speaking) a map of my transitions, but rather what the transitions gathered from the input tape, so close, but not quite the same thing. (I could modify it to record state transitions easily but I don't understand why I'd do that.)

You are doing right. There is no need to output every transition. Your output represents a word in the output alphabet which doesn't need to be the same as the states set. One of the references you provided in the first message, sent me to a pretty good description of FSTs: Index 1,600,000,000 Keys with Automata and Rust - Andrew Gallant's Blog[^] Take a look.

Mircea

honey the codewitch

Thanks so much! This is very helpful.

Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

Mircea Neacsu

One is glad to be of service :)

Mircea

honey the codewitch

It's interesting because you basically suggested to me (if I understand you), that a lexer/tokenizer is essentially an FST.

FA wspace = FA.Parse("[ \\t\\r\\n]+");
FA ident = FA.Parse("[A-Za-z_][A-Za-z0-9_]*");
FA num= FA.Parse(@"(0|-?([1-9][0-9]*))((\.[0-9]+[Ee]\-?[1-9][0-9]*)?|\.[0-9]+)");

FA lexer = FA.ToLexer(new FA[] { wspace, ident, num });
foreach(FAMatch match in lexer.Run("foo 100 bar baz )#($")) {
Console.Write("{0} ",match.SymbolId);
}

// produces
// 1 0 2 0 1 0 1 0 -1

Ergo, that qualifies as an FST, no?

Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

Mircea Neacsu

Quote:

It's interesting because you basically suggested to me (if I understand you), that a lexer/tokenizer is essentially an FST.

It's not only my opinion. According to Finite-state transducer - Wikipedia[^]:

Quote:

FSTs are used in the lexical analysis phase of compilers to associate semantic value with the discovered tokens.

As I was saying in a previous message, you were probably doing FSTs without knowing it :laugh:. In the immortal words of Moliere:

Quote:

My faith! For more than forty years I have been speaking prose while knowing nothing of it, and I am the most obliged person in the world to you for telling me so.

(translation from Wikipedia) ;P

Mircea

honey the codewitch

Ha that quote is perfect! Thanks again.

Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

Sander Rossel

I recently learned that you say something like "inf, innit?", but "fine, aight!" These are almost the same words, but they contradict each other. It's a pair, so to say. Why are they pronounced so differently? :~ Another weird one like that, with very different meanings though, are tome and epitome. Only learned that relatively recently too :~ English is weird.

Best, Sander Azure DevOps Succinctly (free eBook) Azure Serverless Succinctly (free eBook) Migrating Apps to the Cloud with Azure arrgh.js - Bringing LINQ to JavaScript