Where to put the responsibility + Scanner or Parser

    Leblanc Meneses 0
    #1

I'm using a scanner to create my tokens and a parser to turn the tokens into a meaningful AST. After a good start on my project, I noticed that if I made my scanner create generalized tokens, my parser logic needed more work, but if I created more specific tokens, my parser logic was greatly reduced. Does anyone have a rule of thumb for when the responsibility should be put on the scanner and when it should be placed on the parser? -lm
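
    To make the tradeoff concrete, here is a minimal C# sketch of the two token designs being described. The class names are illustrative (they match the D-code names used later in the thread, but are not from any particular library): a generalized token carries a numeric payload the parser still has to inspect, while a more specific token type lets the parser decide with a type test alone.

    // Illustrative sketch only: token class names are assumptions for this example.
    public abstract class Token { }

    // Generalized token: the scanner emits one class for every D-code and
    // stores the number, so the parser must look at the value.
    public sealed class DCodeToken : Token
    {
        public int DCode { get; }
        public DCodeToken(int dCode) { DCode = dCode; }
    }

    // Specific tokens: the scanner classifies further, one class per code,
    // so the parser can decide with a plain type test.
    public sealed class DCodeD02Token : Token { }
    public sealed class DCodeD03Token : Token { }

    public static class ParserChecks
    {
        // Generalized design: type test plus value test in the parser.
        public static bool IsD02General(Token t) =>
            t is DCodeToken d && d.DCode == 2;

        // Specific design: the scanner already did the classification.
        public static bool IsD02Specific(Token t) => t is DCodeD02Token;
    }

    The question in the thread is essentially which of these two checks the parser should be written against.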

      Robert C Cartaino
      #2

I can only speak from a compiler-creation point of view, but scanners (lexical analyzers) are generally pretty simple... much simpler than the parser portion of the process. My "rule of thumb" is that the scanner/lexical analyzer (lexer) scans the input file, breaks it into tokens, and identifies the type of each token. Done. It knows nothing about syntax. That is where the parser takes over.

Maybe this is just a matter of semantics, but if your parser is taking on too much responsibility, rather than leaning on the scanner to provide more information, maybe you can break the parser down into more parts and divide the responsibility that way (again, speaking from a compiler point of view):

1. Scanner (Lexical Analysis) - Break your source code down into small tokens.
2. Parsing (Syntax Analysis) - Check for correct syntax and build your abstract syntax tree (AST). The parser checks strictly for syntactic correctness and stops there.
3. Tree Analysis (I don't know the "real" name for this subtask) - Analyze and add information to your syntax tree for semantic correctness (i.e., variables are declared, initialized to default values, etc.).
4. Optimization / Generation - What this involves depends greatly on what your specific task is.

Enjoy,
Robert C. Cartaino
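
    As a rough sketch of that four-stage split in C# (the interface names and signatures are assumptions, not taken from any specific compiler framework), the point is that each stage has one narrow job and hands a well-defined artifact to the next:

    // Illustrative sketch only: stage interfaces are invented for this example.
    using System.Collections.Generic;

    public abstract class Token { }
    public abstract class AstNode { }

    // 1. Scanner (lexical analysis): characters in, tokens out; no syntax knowledge.
    public interface IScanner
    {
        IEnumerable<Token> Scan(string source);
    }

    // 2. Parser (syntax analysis): tokens in, AST out; syntax only.
    public interface IParser
    {
        AstNode Parse(IEnumerable<Token> tokens);
    }

    // 3. Tree analysis (semantic checks): validates/annotates the AST in place.
    public interface ITreeAnalyzer
    {
        void Analyze(AstNode root);
    }

    // 4. Optimization / generation: consumes the analyzed tree.
    public interface IGenerator
    {
        string Emit(AstNode root);
    }

    public static class Pipeline
    {
        // Wiring the stages together keeps each one's responsibility narrow.
        public static string Compile(string source, IScanner scanner,
            IParser parser, ITreeAnalyzer analyzer, IGenerator generator)
        {
            IEnumerable<Token> tokens = scanner.Scan(source); // lexical analysis
            AstNode ast = parser.Parse(tokens);               // syntax analysis
            analyzer.Analyze(ast);                            // semantic checks
            return generator.Emit(ast);                       // generation
        }
    }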

        Leblanc Meneses 0
        #3

So how small should the scanner make the tokens? For example, I can create greedy tokens like this:

    DCode := 'D'[0-9][0-9][0-9]
    GCode := 'G'[0-9][0-9]
    MCode := 'M'0[0-2]

or specific tokens like this:

    GCodeG36 := 'G36'
    GCodeG37 := 'G37'

After working some more on my parser this morning, I like the more specific expressions better. They simplify my parsing. My parser can just check "is a type":

    matches &= scanner.Next() is DCodeD02Token;

versus "is a type with a value of this":

    matches &= scanner.Next() is DCodeToken && ((DCodeToken)scanner.Current).DCode == 2;

I would like to see an example of tree analysis. It might help me with my current problem: http://www.codeplex.com/irony/Thread/View.aspx?ThreadId=35310
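
    As a rough illustration of the "tree analysis" step described in the previous reply, here is a deliberately small C# sketch. The node types below are hypothetical, invented for the example, and not tied to Irony or any other framework: a pass that walks an already-built AST and flags variables that are used before they are declared, the kind of semantic rule a parser alone would not enforce.

    // Illustrative sketch only: AST node types are made up for this example.
    using System.Collections.Generic;

    public abstract class Node { }

    public sealed class Block : Node
    {
        public List<Node> Statements { get; } = new List<Node>();
    }

    public sealed class Declaration : Node
    {
        public string Name { get; }
        public Declaration(string name) { Name = name; }
    }

    public sealed class VariableUse : Node
    {
        public string Name { get; }
        public VariableUse(string name) { Name = name; }
    }

    // A "tree analysis" pass: runs after parsing, walks the finished AST, and
    // enforces a semantic rule (every variable must be declared before use).
    public sealed class DeclarationChecker
    {
        private readonly HashSet<string> _declared = new HashSet<string>();
        public List<string> Errors { get; } = new List<string>();

        public void Visit(Node node)
        {
            switch (node)
            {
                case Block b:
                    foreach (Node child in b.Statements) Visit(child);
                    break;
                case Declaration d:
                    _declared.Add(d.Name);
                    break;
                case VariableUse u when !_declared.Contains(u.Name):
                    Errors.Add("'" + u.Name + "' is used before it is declared");
                    break;
            }
        }
    }

    Running the checker over a Block that declares x but then uses y would leave one entry in Errors; the input was syntactically fine, so only this later pass over the tree can catch it.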
