Code Project

parsing interactive text

C / C++ / MFC
  • Lost User
    #1

    REMOVED

      • Mircea Neacsu
        #2

      Member 14968771 wrote:

      what to ask Mrs Google to help me write a C++ code to

      Try: "How to learn C++". It gives me about 196,000,000 results.

      Mircea

        • trønderen
          #3

        Member 14968771 wrote:

        is "tokenization" a good search word ??

        Fair enough, but that is only the very beginning, a precursor to parsing. Tokenization is the chopping of the input text into atomic pieces, with no concern for how they are put together. All the tokenizer knows is how to delimit a symbol (token): that a word symbol starts with an alphabetic character and continues through alphanumerics but ends at the first non-alphanumeric - the tokenizer doesn't know or care whether the word is a variable name, a reserved word or something else. If it finds a digit, it devours digits. If the first non-digit is a math operator or a space, it has found an integer token. If it is a decimal point or an E (and the language permits exponents in literals), the token is a (yet incomplete) float value, and so on. The only language-specific thing that the tokenizer needs to know is how to identify the end of a token. Once it has chopped the source code into pieces, its job is done.

        Parsing is identifying the structures formed by the tokens: blocks, loops, conditional statements etc.

        The borderline isn't necessarily razor sharp. Some would say that when the tokenizer finds an integer literal token, it might as well take on the task of converting it to a binary numeric token value, to be handed to the parser. That might be unsuitable in untyped languages where a numeric literal may be treated as a string. After identifying a word symbol, the tokenizer might search a table of reserved words, possibly delivering the word to the parser as a reserved word token. Again, in some languages this is unsuitable (and lots of people would say it goes far beyond a tokenizer's responsibility).

        If you want to analyze some input, doing an initial tokenization before starting the actual parsing is a good idea. Most compilers do that.

        Curious memory: One of my fellow students was, in his first job after graduation, set to identify bacteria in microscope photos. That was done by parsing: They had BNF grammars for different kinds of bacteria, and the image information was parsed according to the various grammars. If the number of parsing errors was too high, the verdict was 'Nope - it surely isn't that kind of bacteria, let me try another one!' Those grammars with a low error count were handed over to a human expert for confirmation, or possibly for making a choice between viable alternatives, if two or more grammars gave a low error count. This mechanism took a lot of trivial work off t
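
To make the tokenization step described in this reply concrete, here is a minimal C++ sketch. It is only an illustration under stated assumptions: the names `TokenKind`, `Token` and `tokenize` are made up for this example, whitespace is simply skipped, every other character becomes a one-character symbol token, and exponents in float literals are left out to keep it short.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories for this sketch; a real language needs more.
enum class TokenKind { Word, Integer, Float, Symbol };

struct Token {
    TokenKind kind;
    std::string text;  // raw characters; no conversion to a numeric value
};

// The <cctype> functions need a value representable as unsigned char.
static bool isAlpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
static bool isAlnum(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }
static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

// Chops the input into tokens without looking at how they fit together.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        if (isSpace(input[i])) { ++i; continue; }
        const std::size_t start = i;
        if (isAlpha(input[i])) {
            // Word: starts with a letter, continues through alphanumerics.
            while (i < input.size() && isAlnum(input[i])) ++i;
            tokens.push_back({TokenKind::Word, input.substr(start, i - start)});
        } else if (isDigit(input[i])) {
            // Number: devour digits first.
            while (i < input.size() && isDigit(input[i])) ++i;
            TokenKind kind = TokenKind::Integer;
            // A decimal point followed by a digit makes it a float token.
            if (i + 1 < input.size() && input[i] == '.' && isDigit(input[i + 1])) {
                kind = TokenKind::Float;
                ++i;
                while (i < input.size() && isDigit(input[i])) ++i;
            }
            tokens.push_back({kind, input.substr(start, i - start)});
        } else {
            // Anything else is delivered as a one-character symbol token.
            ++i;
            tokens.push_back({TokenKind::Symbol, input.substr(start, 1)});
        }
    }
    return tokens;
}
```

Calling `tokenize("x1 = 42 + 3.5")` yields five tokens: the word `x1`, the symbol `=`, the integer `42`, the symbol `+` and the float `3.5`. Note that the sketch keeps the raw text of each number rather than converting it, which sidesteps the borderline question raised above; a parser built on top would decide what the tokens mean.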

          • Peter_in_2780
            #4

          trønderen wrote:

          compiling bacteria!

          Is that making bugs from bacteria, or vice versa?

          Software rusts. Simon Stephenson, ca 1994. So does this signature. me, 2012

            • Lost User
              #5

            Thank you very much for such an extensive reply. Very unexpected, considering the other "clown contributions". I hope the other replies are not an indicator of this site turning into social media... I have started my coding, and it looks as if I have to parse out non-ASCII alphanumeric characters first.
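
One possible reading of that first step, sketched in C++: drop every byte that is not an ASCII letter, digit or space before tokenizing. This is an assumption about what "parse out" means here, and the function name `keepAsciiAlnum` is invented for the example.

```cpp
#include <string>

// Assumption: keep only ASCII letters, digits and spaces; drop everything else,
// including the bytes of multi-byte (e.g. UTF-8) characters.
std::string keepAsciiAlnum(const std::string& input) {
    std::string out;
    out.reserve(input.size());
    for (unsigned char c : input) {
        const bool isLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        const bool isDigit = (c >= '0' && c <= '9');
        if (isLetter || isDigit || c == ' ') {
            out.push_back(static_cast<char>(c));
        }
    }
    return out;
}
```

The explicit range checks avoid any dependence on the current locale, so bytes above 127 are always discarded. If the non-ASCII characters should instead be kept or reported, the same loop could collect them into a separate string.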

              • Lost User
                #6

              First hit from "C++ string split - Google Search": "parsing - Parse (split) a string in C++ using string delimiter (standard C++) - Stack Overflow"
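
For reference, splitting on a string delimiter with only the standard library can be sketched as below. This is not the code from the linked Stack Overflow page, just a minimal illustration of the general idea using `std::string::find` and `substr`; the function name `split` and the choice to return empty pieces are assumptions of this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits text at every occurrence of delimiter; empty pieces are kept.
std::vector<std::string> split(const std::string& text, const std::string& delimiter) {
    std::vector<std::string> parts;
    if (delimiter.empty()) {
        // An empty delimiter would never advance; return the text unsplit.
        parts.push_back(text);
        return parts;
    }
    std::size_t start = 0;
    std::size_t pos = 0;
    while ((pos = text.find(delimiter, start)) != std::string::npos) {
        parts.push_back(text.substr(start, pos - start));
        start = pos + delimiter.size();
    }
    parts.push_back(text.substr(start));  // remainder after the last delimiter
    return parts;
}
```

For example, `split("a>=b>=c", ">=")` gives the three pieces `a`, `b` and `c`. Splitting like this only cuts at fixed delimiters; it does not classify the pieces the way a tokenizer does.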
