Parsing user input
-
Maximilien wrote:
If you start doing that, there will always be outliers that you will miss.
Software development is a constant war with the universe... Developers trying to do better idiot-proof software and the universe trying to do even dumber users... So far the universe is winning
M.D.V. ;) If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about? Help me to understand what I'm saying, and I'll explain it better to you Rating helpful answers is nice, but saying thanks can be even nicer.
Nelek wrote:
Software development is a constant war with the universe... Developers trying to do better idiot-proof software and the universe trying to do even dumber users...
You made my day with this phrase!
-
Maximilien wrote:
If you start doing that, there will always be outliers that you will miss.
Software development is a constant war with the universe... Developers trying to do better idiot-proof software and the universe trying to do even dumber users... So far the universe is winning
The universe will always win.
-
Examples (#'s have been removed):
P O BOX
P.O. BOX
PMB
PO B0X
PO BO X
PO BOK
PO BOS
BOX
:sigh: The one with the 'K' is interesting. 'K' is on the opposite side of the keyboard -- I can understand the 'S'. The hardest part about parsing crap like this (there are 166,333 records) is determining what other variants I did not parse correctly (for example, considered as a street address, not a PO Box), not which ones I successfully accounted for.
Marc
Latest Article - Create a Dockerized Python Fiddle Web App Learning to code with python is like learning to swim with those little arm floaties. It gives you undeserved confidence and will eventually drown you. - DangerBunny Artificial intelligence is the only remedy for natural stupidity. - CDP1802
I did a mailing list cleanup like this in the Jurassic era using dBase ][. I ended up trimming excess blanks, normalizing upper/lower case, and using a translation table lookup to translate common variants. I don't remember how I identified exceptions back then, but now I'd use a dialog with options to manually correct, ignore (add to the lookup as an IGNORE string), or add a translation record. Then there is the problem of dealing with addresses foreign to your country ... whew! Yup, this is a problem to be managed, not solved, if unfiltered inputs are continuously added.
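For illustration, the trim / case-normalize / translation-table approach described above might look roughly like this in Python. This is only a sketch: the table contents, the `IGNORE` marker value, and the `normalize` and `translate` names are all assumptions, not anything from the original dBase code.

```python
import re
from typing import List, Optional

IGNORE = "IGNORE"

# Hypothetical translation table: normalized variant -> canonical form.
# In practice this would be a lookup table that grows as exceptions are reviewed.
TRANSLATIONS = {
    "PO BOX": "PO BOX",
    "P O BOX": "PO BOX",
    "PO B0X": "PO BOX",
    "PO BO X": "PO BOX",
    "PO BOK": "PO BOX",
    "PO BOS": "PO BOX",
    "BOX": "PO BOX",
    "N/A": IGNORE,  # hypothetical junk value a reviewer chose to ignore
}


def normalize(text: str) -> str:
    """Uppercase, drop periods, and collapse runs of whitespace."""
    text = text.upper().replace(".", "")
    return re.sub(r"\s+", " ", text).strip()


def translate(prefix: str, exceptions: List[str]) -> Optional[str]:
    """Return the canonical form, or None if ignored or unknown.

    Unknown variants are appended to `exceptions` so a person can later
    correct them, ignore them, or add a translation record.
    """
    canonical = TRANSLATIONS.get(normalize(prefix))
    if canonical is None:
        exceptions.append(prefix)
        return None
    if canonical == IGNORE:
        return None
    return canonical
```

Calling `translate("PO BOK", pending)` returns the canonical `"PO BOX"`, while an unseen variant such as `"P0 BOX"` returns `None` and lands in the `pending` list for manual review.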
-
You really need to parse addresses? If you start doing that, there will always be outliers that you will miss. :confused:
I'd rather be phishing!
Maximilien wrote:
You really need to parse addresses? If you start doing that, there will always be outliers that you will miss.
Sadly yes. And outliers are acceptable as we're trying to fill in some form fields that break out address, PO Box, and Rural Routes, and if everything fails, the address just gets put into the Address1 field. We're aiming for improvement rather than perfection. :) Marc
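A minimal sketch of that "best guess, else Address1" idea in Python. The regular expressions, the `classify` name, and the `po_box` / `rural_route` / `address1` field names are assumptions for illustration; the patterns only cover the variants listed in this thread plus a plain RR form.

```python
import re
from typing import Dict

# Tolerates the variants seen in the thread: P O BOX, P.O. BOX, PO B0X,
# PO BO X, PO BOK, PO BOS, and a bare BOX, followed by a box number.
PO_BOX_RE = re.compile(
    r"^(?:P\.?\s*O\.?\s*)?B[O0]\s?[XKS]\s*#?\s*(\d+)$", re.IGNORECASE
)
# Rural route forms such as RR 1, RR#1, or R.R. 1.
RURAL_ROUTE_RE = re.compile(r"^R\.?\s*R\.?\s*#?\s*(\d+)$", re.IGNORECASE)


def classify(line: str) -> Dict[str, str]:
    """Fill exactly one of the three fields; unrecognized input goes to address1."""
    fields = {"po_box": "", "rural_route": "", "address1": ""}
    text = line.strip()

    match = PO_BOX_RE.match(text)
    if match:
        fields["po_box"] = match.group(1)
        return fields

    match = RURAL_ROUTE_RE.match(text)
    if match:
        fields["rural_route"] = match.group(1)
        return fields

    # Nothing matched: keep the original text rather than guessing.
    fields["address1"] = text
    return fields
```

Because the fallback keeps the untouched text, a missed variant costs nothing worse than an unparsed Address1 value, which matches the improvement-over-perfection goal.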
-
When we put our mail on vacation hold, it validates and 'normalizes' the address, so I do understand what you're working with. Where I grew up, our address was RR#1; it wasn't until I was in my teens that we had an address with a number and street name. So.. consider this.. are you only dealing with P.O. and its variants or do you have R.R. addresses as well?
-
My guess on the "K" is that some robot filled it in based on a record created via OCR. The United States Postal Service (USPS) has a service you can use to "normalize" addresses. I suspect that each country has something similar. There is probably a service provider that aggregates all of these normalization services into one spot. (Amazon?)
-
Marc Clifton wrote:
The one with the 'K' is interesting. 'K' is on the opposite side of the keyboard -- I can understand the 'S'.
Optically Corrupted Recognition?
Did you ever see history portrayed as an old man with a wise brow and pulseless heart, waging all things in the balance of reason? Is not rather the genius of history like an eternal, imploring maiden, full of fire, with a burning heart and flaming soul, humanly warm and humanly beautiful? --Zachris Topelius Training a telescope on one’s own belly button will only reveal lint. You like that? You go right on staring at it. I prefer looking at galaxies. -- Sarah Hoyt
-
RR, CR, HC, etc., as well as regular street addresses (as best as those are). Perfect accuracy is not necessary, just best guess. :) Marc
-
I smell OCR in the mix - hence the BOK, BOS, B0X, etc.
Software Zen:
delete this;
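If OCR-style substitutions like BOK, BOS, and B0X are in the mix, one way to surface variants the parser missed is to score each unparsed line against the canonical prefix and queue the near misses for review. The sketch below uses Python's standard `difflib.SequenceMatcher`; the 0.7 threshold, the function names, and the `parsed_ok` callback (standing in for whatever parser is already in place) are assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, Iterable, List

CANONICAL = "POBOX"


def po_box_similarity(line: str) -> float:
    """Similarity (0.0 to 1.0) between the line's prefix and 'POBOX'.

    Trailing all-digit tokens (the box number) are dropped first, then
    spaces and punctuation are removed before comparing.
    """
    tokens = line.upper().split()
    while tokens and tokens[-1].isdigit():
        tokens.pop()
    prefix = re.sub(r"[^A-Z0-9]", "", "".join(tokens))
    return SequenceMatcher(None, prefix, CANONICAL).ratio()


def review_candidates(
    lines: Iterable[str],
    parsed_ok: Callable[[str], bool],
    threshold: float = 0.7,
) -> List[str]:
    """Lines the parser rejected that still look a lot like a PO Box."""
    return [
        line
        for line in lines
        if not parsed_ok(line) and po_box_similarity(line) >= threshold
    ]
```

With a deliberately strict parser that only accepts lines starting with "PO BOX", lines such as "PO BOK 17" and "PO B0X 3" come back as review candidates, while "123 Main St" does not. The threshold would need tuning against real records.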
-
When we put our mail on vacation hold, it validates and 'normalizes' the address, so I do understand what you're working with. Where I grew up, our address was RR#1; it wasn't until I was in my teens that we had an address with a number and street name. So.. consider this.. are you only dealing with P.O. and its variants or do you have R.R. addresses as well?
Excellent point. Are there services that allow you to force user input validation of addresses against the USPS databases?
-
You could fashion the UI to eliminate the need to parse P.O. Box, etc. Have a drop-down that contains these options: Street #, P.O. Box, RR#, CR, HC, etc. To the right of it, place a text box that accepts the actual number. Just a thought off the top of my head.
Cheers, Mike Fidler "I intend to live forever - so far, so good." Steven Wright "I almost had a psychic girlfriend but she left me before we met." Also Steven Wright "I'm addicted to placebos. I could quit, but it wouldn't matter." Steven Wright yet again.
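The drop-down idea amounts to storing the address kind and the number as separate fields, so nothing needs to be parsed later. A rough Python sketch of the data behind such a form follows; the `AddressKind` and `AddressEntry` names and the non-empty check are assumptions, and the option labels are the ones suggested above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class AddressKind(Enum):
    """Options offered in the drop-down."""
    STREET = "Street #"
    PO_BOX = "P.O. Box"
    RR = "RR#"
    CR = "CR"
    HC = "HC"


@dataclass(frozen=True)
class AddressEntry:
    """One validated (kind, number) pair as captured by the form."""
    kind: AddressKind
    number: str

    def __post_init__(self) -> None:
        # Reject an empty text box instead of storing a kind with no number.
        if not self.number.strip():
            raise ValueError("number must not be empty")


def dropdown_options() -> List[str]:
    """Labels to show in the drop-down, in declaration order."""
    return [kind.value for kind in AddressKind]
```

This only helps for input collected going forward; records that already exist as free text would still need some parsing.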
-
welcome to my life
Woah... haven't seen you in a long time Chris. How's it going these days?
Cheers, विक्रम "We have already been through this, I am not going to repeat myself." - fat_boy, in a global warming thread :doh:
-
Woah... haven't seen you in a long time Chris. How's it going these days?
i'm here occasionally. not constantly, as previously. it goes... on and on and on and on. :)
-
Nelek wrote:
Software development is a constant war with the universe... Developers trying to do better idiot-proof software and the universe trying to do even dumber users...
You made my day with this phrase!
You are welcome :) :-D
-
i'm here occasionally. not constantly, as previously. it goes... on and on and on and on. :)
I still remember your old profile pic - with hand on your thoughtful face. Got it somewhere? :)
-
I still remember your old profile pic - with hand on your thoughtful face. Got it somewhere? :)
Vikram A Punathambekar wrote:
Got it somewhere
He probably has his face at the same body place you have yours. :rolleyes:
-
Maybe whoever entered it used Hungarian autocorrect in, let's say, Word. "box" autocorrects to "boksz"; [s]he tried to correct that to something that sounded right, but deleted the wrong letter. Or gave up fighting autocorrect :)
-
Vikram A Punathambekar wrote:
Got it somewhere
He probably has his face at the same body place you have yours. :rolleyes:
Smarty pants ;P