
AI (Natural Language Processing) - How Advanced Is It Really?

Posted in The Lounge
Tags: csharp, tools, business, json
17 posts, 7 posters
Member 10415611 (original post)

Hi All, First post here. Hope this is a suitable subject. You've no doubt all noticed that AI is the flavour of the month on numerous web sites and media outlets, but I'm wondering (based on my current personal experience discussed below) if it truly is as far advanced as the hype would have us believe.

Recently I've been working on a personal project to attempt to summarize, into a very simple form, the remarks of stock analysts on specific stocks on a Canadian TV show. Amongst other things, I want to identify the stock & figure out if the remarks were +ve, neutral or -ve (i.e. their "sentiment"). I've been using NLP tools from Stanford & Microsoft Cognitive Services (MSC) as part of a C# .NET program.

Using the Stanford sentiment API, I have trained a model on a set of 900-plus sentences & about 3,000 business words rated as + or - by others. After about 4 iterations I have about 90% accuracy on the training set & near 80% on a reserved set of another 150 sentences. Seems OK but, for example, "Stock's Up 180% This Year" & "The Stocks Are Doing Well" are still rated as neutral, and "I'm Very Cautious On The Utility" is rated as positive! I haven't spent as much effort on the MSC sentiment tool, but it did not do too well on a limited test. For example, "It's Underperforming The Market" gave a score of 0.82, which is very positive in the Stanford rating system.

Overall, there does not seem to be that much "intelligence" at work in these classifiers, despite words such as "deep learning" being mentioned. Sorry about the length of post required to provide context. Any thoughts on this topic appreciated. Am I expecting too much, or am I missing something vital that takes things to another level? Regards, RB
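For illustration, the word-rating approach described above can be sketched in a few lines of plain Python (the word lists here are invented examples, not the actual Stanford or MSC models). It reproduces the same failure mode: a sentence containing no rated words falls through as neutral, however strongly a human would read it.

```python
# Toy lexicon-based sentiment scorer: count rated words, like the
# +/- business word list described above.  Word lists are invented.
POSITIVE = {"well", "strong", "outperforming", "buy", "growth"}
NEGATIVE = {"cautious", "weak", "underperforming", "sell", "decline"}

def score(sentence: str) -> str:
    # Crude normalization: lowercase and expand "'s" so "it's" matches "it is".
    words = sentence.lower().replace("'s", " is").split()
    s = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

# "Up 180%" contains no rated word, so the scorer calls it neutral --
# exactly the failure described in the post.
print(score("Stock's Up 180% This Year"))        # neutral
print(score("The Stocks Are Doing Well"))        # positive ("well" is rated)
print(score("It's Underperforming The Market"))  # negative
```

The real models are far more sophisticated than this, but the sketch shows why purely word-level evidence cannot capture sentences whose sentiment lives in numbers, context, or phrasing.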

Lost User (#3)

    80% is about expected, you can get a bit more with more data. There is no other level (yet).

Lost User (#4)

      Member 10415611 wrote:

      Amongst other things, I want to identify the stock & figure out if the remarks were +ve, neutral or -ve (i.e. their "sentiment").

If the remarks are made by a human, you'll need a human to interpret them. A computer will not recognize sarcasm, hypothetical situations, or personal biases. You could just as well count all the times the word "buy" appears. Also, does the algorithm "know" if the article says "updated: 1900h, confirmed hoax"?

      Member 10415611 wrote:

      on a reserved set of another 150 sentences

      Yes, but humans will generate unexpected sentences with weird opinions, and words that are "just invented" and "cool".

      Member 10415611 wrote:

      "The Stocks Are Doing Well" are still rated as neutral

Which in my head is neutral, as it is merely a statement about the current situation. In itself, the statement cannot be said to be positive or negative, even from a trader's perspective. It would be great if you could expand the sentences to train toward today's headlines. The more variations the AI sees, the better it becomes. In theory at least; I'm not giving any guarantees.

Bastard Programmer from Hell :suss: If you can't read my code, try converting it here. (X-Clacks-Overhead: GNU Terry Pratchett)

PIEBALDconsult (#5)

        AI can't understand elephant.

Mark_Wallace (#6)

          Are you using backprop or RNT?

          I wanna be a eunuchs developer! Pass me a bread knife!

Member 10415611 (#7)

Eddy, Thanks for your comments. What you say is definitely true regarding sarcasm, hypothetical situations, biases, etc. Similarly, there are many unexpected sentences, weird opinions and invented words.

My data set is from about 10 different speakers, with sentences randomly selected from the segments where they are specifically discussing a particular stock. Even though this is a fairly narrow domain, it's amazing how many ways people come up with of saying the same thing. Also, they very seldom say "buy", "sell" or "hold", even if directly asked for their recommendation. They will usually say about 5 or 6 sentences which I, as a human, can interpret as a veiled buy/sell/hold recommendation.

I am not sure if training using a more general source would help. The original "model" provided with the Stanford NLP API is based on film reviews & it gave only about 52% accuracy on my original training set. Its training data contains a lot more general phrases as well as film-related ones. I might try combining my set & theirs for an experimental training run & see if that helps.

Ultimately, however, I think that what's needed is another "layer" of intelligence that actually puts things together to "understand" the sentences, rather than just applying a kind of matrix of +ve/-ve scores to words and phrases in the sentence. The "sentiment" analysis is only a part of my code; another part is "rule based", looking at keywords and the structure of questions/answers. I hope that by combining these two things I can get a bit further.
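The combination described in the last paragraph, a rule-based layer sitting on top of the statistical sentiment score, might be sketched like this. The specific keywords and the override policy are invented placeholders; the 0/1/2 labels follow the negative/neutral/positive convention used in the thread.

```python
def rule_layer(sentence: str):
    """Keyword rules that override the statistical score when they fire.
    These rules are invented examples, not a real rule set."""
    s = sentence.lower()
    if "cautious" in s or "avoid" in s:
        return "negative"
    if "top pick" in s or "would add" in s:
        return "positive"
    return None  # no rule fired; fall back to the statistical model

def combined(sentence: str, statistical_score: int) -> str:
    """statistical_score: 0 = negative, 1 = neutral, 2 = positive."""
    verdict = rule_layer(sentence)
    if verdict is not None:  # rules win when they fire
        return verdict
    return {0: "negative", 1: "neutral", 2: "positive"}[statistical_score]

# The statistical model rated this sentence positive (2);
# the "cautious" rule overrides it.
print(combined("I'm Very Cautious On The Utility", 2))  # negative
```

Whether rules should always outrank the model, or only when the model is near neutral, is a design choice worth experimenting with; the sketch takes the simplest option.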

Lost User (#8)

              Member 10415611 wrote:

              They will usually say about 5 or 6 sentences which I, as a human, can interpret as a veiled buy/sell/hold recommendation.

              That's your basic human, not willing to commit to anything and giving vague descriptions instead of a simple "42" with complete specs.

              Member 10415611 wrote:

              I think that what's needed is another "layer" of intelligence that actually puts things together to "understand" the sentences

"Understanding" would be a holy-grail-like achievement. Perhaps you don't need complete understanding of the language: if you can identify the sentiment more correctly than simple statistics can, then you'd have an advantage over those who cannot. And perhaps it would be helpful to combine those ideas, since even sarcasm follows a pattern* that humans must be able to recognize. *) in a single language the syntax should be predictable

PIEBALDconsult wrote:

AI can't understand elephant.
Lost User (#9)

                :laugh: ..and a good example too.

Mark_Wallace wrote:

Are you using backprop or RNT?

Member 10415611 (#10)

Mark, The Stanford system uses a "Recursive Neural Tensor Network" to train its sentiment model. The sentences are first parsed and processed into a set of "binary trees", with sentiment scores (0, 1, 2 for -ve, neutral & +ve in my case) attached to each word and phrase. I'm afraid I'm not an expert in the theory of NLP, so I'm not sure how that fits into "backprop or RNT". I've learnt a bit about the overall field of NLP & a fair bit about the Stanford approach, in the hope of creating my application without making some dumb error, but that's as far as it goes.
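The data structure described here, a binary parse tree with a sentiment label at every node rather than just at the root, can be sketched in a few lines. The tree and its labels are an invented example; the real RNTN additionally learns a tensor-based composition function that computes each internal node's label from its children.

```python
# A binary sentiment tree as nested tuples: leaves are (label, word),
# internal nodes are (label, left, right).  Labels: 0=neg, 1=neutral, 2=pos.
tree = (0,                       # whole sentence rated negative
        (1, "stocks"),
        (0, (1, "are"), (0, "underperforming")))

def leaves(node):
    """Recover the word sequence by in-order traversal."""
    if len(node) == 2:           # (label, word) leaf
        return [node[1]]
    return leaves(node[1]) + leaves(node[2])

def node_labels(node):
    """Collect (phrase, label) pairs.  This is what training consumes:
    every node carries a rating, not just the whole sentence."""
    pairs = [(" ".join(leaves(node)), node[0])]
    if len(node) == 3:
        pairs += node_labels(node[1]) + node_labels(node[2])
    return pairs

print(node_labels(tree)[0])  # ('stocks are underperforming', 0)
```

Labeling every phrase, not just whole sentences, is what makes annotating training data for this style of model so labor-intensive.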

Mark_Wallace (#11)

Hmm. If they want predictions, they should go with backpropagation rather than RNT(N). RNT isn't quite as "intelligent" (which, partly, means that you can understand its decisions, because it's more instance-tree population, ergo a percentage player, than a genuinely "intelligent" solution). Backprop seems ideally suited to the problem you're working on, but I suppose it's not someone's "pet concept" at the moment. That's universities for you.

Joe Woodbury (#12)

A friend of mine is a computational linguist. He says that at best it's gotten to about 85%, but that's with a well-structured source. What surprises me is that even with highly specific data, getting it better gets very complicated (though even in the highly specific stuff he recently worked on, 50% accuracy saves so much time that even that level of accuracy is worth it).


Member 10415611 (#13)

                        Interesting. I'll have to look into backprop.

Member 10415611 (#14)

Joe, Interesting. What do you mean when you say:

Joe Woodbury wrote:

50% accuracy saves so much time that even that level of accuracy is worth it

Joe Woodbury (#15)

                            If you're using natural language processing to assist in some task which is completed by humans, getting 50% completely right could save those humans a tremendous amount of time. Further, getting 85% accuracy with, say, even a 10% error rate may actually cause the humans to take even more time than if they hadn't used the computer program in the first place.
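That trade-off can be made concrete with some invented timings (a back-of-the-envelope sketch, not real measurements): the expected human time per item depends on how long it takes to accept a correct machine answer versus catch and undo a wrong one.

```python
# Invented timings: 1.0 min to handle an item manually from scratch,
# 0.1 min to accept a correct machine answer, 3.0 min to catch and
# undo a wrong one.
def minutes_per_item(p_right, p_wrong, t_manual=1.0, t_accept=0.1, t_fix=3.0):
    p_untouched = 1.0 - p_right - p_wrong   # items the machine left alone
    return p_right * t_accept + p_wrong * t_fix + p_untouched * t_manual

print(minutes_per_item(0.50, 0.00))             # 50% right, never wrong: 0.55 min
print(minutes_per_item(0.85, 0.10))             # 85% right, 10% wrong: 0.435 min
print(minutes_per_item(0.70, 0.30, t_fix=5.0))  # costly errors: worse than manual
```

With these numbers, a 50%-right-never-wrong assistant roughly halves the work, while a higher-accuracy assistant whose errors are expensive to detect can cost more time than doing everything by hand, which is exactly the point above.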

dandy72 (#16)

                              It's pretty hard to convince the general public that AI's making any sort of headway considering how easy it still is for something that should be as simple as Windows Update to completely mess up their computer. (I realize this is apples and oranges...but you go ahead and explain it to those not in the field...)

PIEBALDconsult (#17)

in a single language the syntax should be predictable
the syntax in a single language should be predictable
the syntax should be predictable in a single language
predictable the syntax in a single language should be
predictable in a single language the syntax should be
predictable in a single language should be the syntax
predictable in a single language should the syntax be

I'm sure there are more, but now I'm bored.
