
AI (Natural Language Processing) - How Advanced Is It Really?

Posted in The Lounge
Tags: csharp, tools, business, json
17 posts, 7 posters
Member 10415611 (original post)

Hi All, First post here. Hope this is a suitable subject. You've no doubt all noticed that AI is the flavour of the month on numerous web sites and media outlets, but I'm wondering (based on my current personal experience discussed below) if it truly is as far advanced as the hype would have us believe.

Recently I've been working on a personal project to attempt to summarize, into a very simple form, the remarks of stock analysts on specific stocks on a Canadian TV show. Amongst other things, I want to identify the stock & figure out if the remarks were +ve, neutral or -ve (i.e. their "sentiment"). I've been using NLP tools from Stanford & Microsoft Cognitive Services (MSC) as part of a C# .NET program.

Using the Stanford sentiment API, I have trained a model on a set of 900-plus sentences & about 3,000 business words rated as + or - by others. After about 4 iterations I have about 90% accuracy on the training set & near 80% on a reserved set of another 150 sentences. Seems OK but, for example, "Stock's Up 180% This Year" & "The Stocks Are Doing Well" are still rated as neutral, and "I'm Very Cautious On The Utility" is rated as positive! I haven't spent as much effort on the MSC sentiment tool, but it did not do too well on a limited test. For example, "It's Underperforming The Market" gave a score of 0.82, which is very positive in the Stanford rating system.

Overall, there does not seem to be that much "intelligence" at work in these classifiers, despite words such as "deep learning" being mentioned. Sorry about the length of post required to provide context. Any thoughts on this topic appreciated. Am I expecting too much, or am I missing something vital that takes things to another level? Regards, RB
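For illustration, the word-rating approach described above can be sketched in a few lines of plain Python (the word lists here are invented examples, not the actual Stanford or MSC models). It reproduces the same failure mode: a sentence containing no rated words falls through as neutral, however strongly a human would read it.

```python
# Toy lexicon-based sentiment scorer: count rated words, like the
# +/- business word list described above.  Word lists are invented.
POSITIVE = {"well", "strong", "outperforming", "buy", "growth"}
NEGATIVE = {"cautious", "weak", "underperforming", "sell", "decline"}

def score(sentence: str) -> str:
    # Crude normalization: lowercase and expand "'s" so "it's" matches "it is".
    words = sentence.lower().replace("'s", " is").split()
    s = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

# "Up 180%" contains no rated word, so the scorer calls it neutral --
# exactly the failure described in the post.
print(score("Stock's Up 180% This Year"))        # neutral
print(score("The Stocks Are Doing Well"))        # positive ("well" is rated)
print(score("It's Underperforming The Market"))  # negative
```

The real models are far more sophisticated than this, but the sketch shows why purely word-level evidence cannot capture sentences whose sentiment lives in numbers, context, or phrasing.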

Lost User (#3)

    80% is about expected, you can get a bit more with more data. There is no other level (yet).

Lost User (#4)

      Member 10415611 wrote:

      Amongst other things, I want to identify the stock & figure out if the remarks were +ve, neutral or -ve (i.e. their "sentiment").

If the remarks are made by a human, you'll need a human to interpret them. A computer will not recognize sarcasm, hypothetical situations, or personal biases. You could just as well count all the times the word "buy" appears. Also, does the algorithm "know" if the article says "updated: 1900h, confirmed hoax"?

      Member 10415611 wrote:

      on a reserved set of another 150 sentences

      Yes, but humans will generate unexpected sentences with weird opinions, and words that are "just invented" and "cool".

      Member 10415611 wrote:

      "The Stocks Are Doing Well" are still rated as neutral

Which in my head is neutral, as it is merely a statement about the current situation. In itself, the statement cannot be said to be positive or negative, even from a trader's perspective. It would be great if you could expand the sentences to train toward today's headlines. The more variations the AI sees, the better it becomes. In theory at least; I'm not giving any guarantees.

Bastard Programmer from Hell :suss: If you can't read my code, try converting it here. (X-Clacks-Overhead: GNU Terry Pratchett)

PIEBALDconsult (#5)

        AI can't understand elephant.

Mark_Wallace (#6)

          Are you using backprop or RNT?

          I wanna be a eunuchs developer! Pass me a bread knife!

Member 10415611 (#7)

Eddy, Thanks for your comments. What you say is definitely true regarding sarcasm, hypothetical situations, biases, etc. Similarly, there are many unexpected sentences, weird opinions and invented words.

My data set is from about 10 different speakers, with sentences randomly selected from the segments where they are specifically discussing a particular stock. Even though this is a fairly narrow domain, it's amazing how many ways people come up with of saying the same thing. Also, they very seldom say "buy", "sell" or "hold", even if directly asked for their recommendation. They will usually say about 5 or 6 sentences which I, as a human, can interpret as a veiled buy/sell/hold recommendation.

I am not sure if training using a more general source would help. The original "model" provided with the Stanford NLP API is based on film reviews & it gave only about 52% accuracy on my original training set. Its training data contains a lot more general phrases as well as film-related ones. I might try combining my set & theirs for an experimental training run & see if that helps.

Ultimately, however, I think that what's needed is another "layer" of intelligence that actually puts things together to "understand" the sentences, rather than just applying a kind of matrix of +ve/-ve scores to words and phrases in the sentence. The "sentiment" analysis is only a part of my code; another part is "rule based", looking at keywords and the structure of questions/answers. I hope that by combining these two things I can get a bit further.
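The combination described in the last paragraph, a rule-based layer sitting on top of the statistical sentiment score, might be sketched like this. The specific keywords and the override policy are invented placeholders; the 0/1/2 labels follow the negative/neutral/positive convention used in the thread.

```python
def rule_layer(sentence: str):
    """Keyword rules that override the statistical score when they fire.
    These rules are invented examples, not a real rule set."""
    s = sentence.lower()
    if "cautious" in s or "avoid" in s:
        return "negative"
    if "top pick" in s or "would add" in s:
        return "positive"
    return None  # no rule fired; fall back to the statistical model

def combined(sentence: str, statistical_score: int) -> str:
    """statistical_score: 0 = negative, 1 = neutral, 2 = positive."""
    verdict = rule_layer(sentence)
    if verdict is not None:  # rules win when they fire
        return verdict
    return {0: "negative", 1: "neutral", 2: "positive"}[statistical_score]

# The statistical model rated this sentence positive (2);
# the "cautious" rule overrides it.
print(combined("I'm Very Cautious On The Utility", 2))  # negative
```

Whether rules should always outrank the model, or only when the model is near neutral, is a design choice worth experimenting with; the sketch takes the simplest option.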

Lost User (#8)

              Member 10415611 wrote:

              They will usually say about 5 or 6 sentences which I, as a human, can interpret as a veiled buy/sell/hold recommendation.

              That's your basic human, not willing to commit to anything and giving vague descriptions instead of a simple "42" with complete specs.

              Member 10415611 wrote:

              I think that what's needed is another "layer" of intelligence that actually puts things together to "understand" the sentences

"Understanding" would be a holy-grail-like achievement. Perhaps you don't need complete understanding of the language: if you can identify the sentiment more correctly than simple statistics can, then you'd have an advantage over those who cannot. And perhaps it would be helpful to combine those ideas, since even sarcasm follows a pattern* that humans must be able to recognize. *) in a single language the syntax should be predictable

PIEBALDconsult wrote:

AI can't understand elephant.
Lost User (#9)

                :laugh: ..and a good example too.

Mark_Wallace wrote:

Are you using backprop or RNT?

Member 10415611 (#10)

Mark, The Stanford system uses a "Recursive Neural Tensor Network" to train its sentiment model. The sentences are first parsed and processed into a set of "binary trees", with sentiment scores (0, 1, 2 for -ve, neutral & +ve in my case) attached to each word and phrase. I'm afraid I'm not an expert in the theory of NLP, so I'm not sure how that fits into "backprop or RNT". I've learnt a bit about the overall field of NLP & a fair bit about the Stanford approach, in the hope of creating my application without making some dumb error, but that's as far as it goes.
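The data structure described here, a binary parse tree with a sentiment label at every node rather than just at the root, can be sketched in a few lines. The tree and its labels are an invented example; the real RNTN additionally learns a tensor-based composition function that computes each internal node's label from its children.

```python
# A binary sentiment tree as nested tuples: leaves are (label, word),
# internal nodes are (label, left, right).  Labels: 0=neg, 1=neutral, 2=pos.
tree = (0,                       # whole sentence rated negative
        (1, "stocks"),
        (0, (1, "are"), (0, "underperforming")))

def leaves(node):
    """Recover the word sequence by in-order traversal."""
    if len(node) == 2:           # (label, word) leaf
        return [node[1]]
    return leaves(node[1]) + leaves(node[2])

def node_labels(node):
    """Collect (phrase, label) pairs.  This is what training consumes:
    every node carries a rating, not just the whole sentence."""
    pairs = [(" ".join(leaves(node)), node[0])]
    if len(node) == 3:
        pairs += node_labels(node[1]) + node_labels(node[2])
    return pairs

print(node_labels(tree)[0])  # ('stocks are underperforming', 0)
```

Labeling every phrase, not just whole sentences, is what makes annotating training data for this style of model so labor-intensive.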

Mark_Wallace (#11)

Hmm. If they want predictions, they should go with backpropagation rather than RNT(N). RNT isn't quite as "intelligent" (which, partly, means that you can understand its decisions, because it's more instance-tree population, ergo a percentage player, than a genuinely "intelligent" solution). Backprop seems ideally suited to the problem you're working on, but I suppose it's not someone's "pet concept" at the moment. That's universities for you.

Joe Woodbury (#12)

A friend of mine is a computational linguist. He says that at best it's gotten to about 85%, but that's with a well-structured source. What surprises me is that even with highly specific data, getting it better gets very complicated (though even in the highly specific stuff he recently worked on, 50% accuracy saves so much time that even that level of accuracy is worth it).


Member 10415611 (#13)

                        Interesting. I'll have to look into backprop.

Member 10415611 (#14)

Joe, Interesting. What do you mean when you say:

Joe Woodbury wrote:

50% accuracy saves so much time that even that level of accuracy is worth it

Joe Woodbury (#15)

                            If you're using natural language processing to assist in some task which is completed by humans, getting 50% completely right could save those humans a tremendous amount of time. Further, getting 85% accuracy with, say, even a 10% error rate may actually cause the humans to take even more time than if they hadn't used the computer program in the first place.
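That trade-off can be made concrete with some invented timings (a back-of-the-envelope sketch, not real measurements): the expected human time per item depends on how long it takes to accept a correct machine answer versus catch and undo a wrong one.

```python
# Invented timings: 1.0 min to handle an item manually from scratch,
# 0.1 min to accept a correct machine answer, 3.0 min to catch and
# undo a wrong one.
def minutes_per_item(p_right, p_wrong, t_manual=1.0, t_accept=0.1, t_fix=3.0):
    p_untouched = 1.0 - p_right - p_wrong   # items the machine left alone
    return p_right * t_accept + p_wrong * t_fix + p_untouched * t_manual

print(minutes_per_item(0.50, 0.00))             # 50% right, never wrong: 0.55 min
print(minutes_per_item(0.85, 0.10))             # 85% right, 10% wrong: 0.435 min
print(minutes_per_item(0.70, 0.30, t_fix=5.0))  # costly errors: worse than manual
```

With these numbers, a 50%-right-never-wrong assistant roughly halves the work, while a higher-accuracy assistant whose errors are expensive to detect can cost more time than doing everything by hand, which is exactly the point above.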

dandy72 (#16)

                              It's pretty hard to convince the general public that AI's making any sort of headway considering how easy it still is for something that should be as simple as Windows Update to completely mess up their computer. (I realize this is apples and oranges...but you go ahead and explain it to those not in the field...)

PIEBALDconsult (#17)

in a single language the syntax should be predictable
the syntax in a single language should be predictable
the syntax should be predictable in a single language
predictable the syntax in a single language should be
predictable in a single language the syntax should be
predictable in a single language should be the syntax
predictable in a single language should the syntax be

I'm sure there are more, but now I'm bored.
