Does such software exist?
-
I have an MP3 file: an hour-long lecture about weather. I would like to find ('seek') the exact time(s) at which the word 'rainy' is spoken in this MP3 file, without listening to the entire lecture. It is similar to finding a substring in a long string, except that I am looking for an audio clip within an audio file. Does such software functionality exist? My Google search didn't yield much.
-
I doubt it: it's fairly specialist stuff. If you think about it, it needs speech recognition plus MP3 processing to get the timestamp. There are open source speech recognition (SR) packages though, such as CMUSphinx Open Source Speech Recognition, which you may be able to modify to suit your needs. I've never used it, so I have no idea how complex the whole idea would be (guess: "lots"). Good luck!
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony "Common sense is so rare these days, it should be classified as a super power" - Random T-shirt AntiTwitter: @DalekDave is now a follower!
-
Thanks. Will take a look.
-
Never thought it would involve neural nets. Seems complicated enough.
-
It's complex stuff: you saying "rainy" in a lecture won't "look the same" as Herself screaming it at the TV!
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony "Common sense is so rare these days, it should be classified as a super power" - Random T-shirt AntiTwitter: @DalekDave is now a follower!
-
Surely there should be something invariant between these two utterances of 'rainy'.
-
Broadly there are similarities, but most males speak at a lower pitch than most females and have thicker vocal cords, so that will affect the MP3 data. And think about accents for a moment. I don't know about regional speech differences in India, but I'd suspect there would be a significant difference in accent between someone from Delhi and a resident of Tamil Nadu (most of my Indian friends are [ex]Malaysian Tamils, so I don't hear different regions enough to identify an accent). Certainly the way a Glaswegian pronounces a word would give a very different waveform from that of a Londoner, a Brummie, or a Liverpudlian. And then you get to Welsh or Irish natives ... :D I don't think there is anything you could rely upon to identify a generic word in an MP3 file without some form of speech recognition.
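To make the pitch point concrete, here is a minimal sketch of why a naive "substring search" on raw audio samples tends to fail. It is only an illustration: the two pure tones are stand-ins for a lower-pitched and a higher-pitched voice (real speech is far more complex), and the names `tone` and `correlation`, the 120 Hz and 220 Hz values, and the 8000 Hz sample rate are all assumptions chosen for the example.

```python
import math

def tone(freq_hz, sample_rate=8000, n_samples=800):
    """Return a pure sine tone as a list of samples (a crude stand-in for a voice)."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(n_samples)]

def correlation(a, b):
    """Normalized correlation of two equal-length signals (1.0 means identical shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm

low_voice = tone(120)   # hypothetical lower-pitched speaker
high_voice = tone(220)  # hypothetical higher-pitched speaker

same = correlation(low_voice, low_voice)
different = correlation(low_voice, high_voice)
print(f"same pitch: {same:.3f}, different pitch: {different:.3f}")
```

The signal matches itself perfectly, while the same "sound" at a different pitch correlates close to zero, so comparing waveforms sample by sample cannot serve as the invariant. Speech recognizers work on more abstract features instead.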
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony "Common sense is so rare these days, it should be classified as a super power" - Random T-shirt AntiTwitter: @DalekDave is now a follower!
-
Yes, I understand. So many variations. Finding an invariant is hopelessly difficult: what they call a hard problem.
-
You could have listened to an hour of it by now :-D
Life should not be a journey to the grave with the intention of arriving safely in a pretty and well-preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming “Wow! What a Ride!" - Hunter S Thompson - RIP
-
Maybe you can use one of these speech-recognition-libraries[^]
-
Use a speech-to-text tool that generates .VTT closed-captioning files. Each snippet of a few words has a timestamp indicating when it's displayed. Azure has an API for it, and I think AWS has recently upgraded their video-to-text processing tools to offer the same.
Did you ever see history portrayed as an old man with a wise brow and pulseless heart, weighing all things in the balance of reason? Is not rather the genius of history like an eternal, imploring maiden, full of fire, with a burning heart and flaming soul, humanly warm and humanly beautiful? --Zachris Topelius
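Once a .VTT file exists, finding the word becomes an ordinary text search. The following is a minimal sketch, assuming the captions follow the usual WebVTT layout (a timing line such as `00:12:30.500 --> 00:12:34.000` followed by the cue text). The function names `to_seconds` and `find_keyword` and the sample transcript are made up for illustration; producing the .VTT from the MP3 is left to whichever speech-to-text service is used.

```python
import re

# Timing line of a WebVTT cue; the hours part is optional.
TIMING = re.compile(r"((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})")

def to_seconds(stamp):
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' to seconds."""
    parts = stamp.split(":")
    seconds = float(parts[-1])
    minutes = int(parts[-2])
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + minutes * 60 + seconds

def find_keyword(vtt_text, keyword):
    """Return (cue_start_seconds, cue_text) for every cue containing the keyword as a whole word."""
    word = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                text = " ".join(lines[i + 1:]).strip()
                if word.search(text):
                    hits.append((to_seconds(match.group(1)), text))
                break  # at most one timing line per cue block
    return hits

# Made-up sample transcript, not real service output.
SAMPLE_VTT = """WEBVTT

00:00:01.000 --> 00:00:04.000
Today we look at weather patterns.

00:12:30.500 --> 00:12:34.000
A rainy season usually follows.

01:02:03.250 --> 01:02:06.000
Rainy days are less common inland.
"""

hits = find_keyword(SAMPLE_VTT, "rainy")
for start, text in hits:
    print(f"{start:8.2f}s  {text}")
```

Note that this gives the start time of the caption snippet that contains the word, not the exact instant the word is spoken, so the result is accurate to within a few seconds at best.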
-
If I had one file, I wouldn't have asked. I have at least 100 files, have listened to each of them at least once, and would like to find out whether certain keywords occur in them. Something like batch processing is what I am looking for. That's why I asked.
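For the batch case, one possible approach is to transcribe every file once and then search the transcripts as often as needed. The sketch below assumes each recording already has a text transcript saved as a `.vtt` file in one folder; the function name `keywords_by_file`, the file names, and the transcript contents are hypothetical.

```python
import re
import tempfile
from pathlib import Path

def keywords_by_file(folder, keywords):
    """Map each transcript file name to the set of keywords found in it (whole words, any case)."""
    patterns = {k: re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE) for k in keywords}
    found = {}
    for path in sorted(Path(folder).glob("*.vtt")):
        text = path.read_text(encoding="utf-8")
        matched = {k for k, pattern in patterns.items() if pattern.search(text)}
        if matched:
            found[path.name] = matched
    return found

# Demo with two tiny made-up transcripts in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "lecture01.vtt").write_text(
        "WEBVTT\n\n00:00:01.000 --> 00:00:03.000\nA rainy season follows.\n", encoding="utf-8")
    Path(folder, "lecture02.vtt").write_text(
        "WEBVTT\n\n00:00:01.000 --> 00:00:03.000\nMonsoon winds shift north.\n", encoding="utf-8")
    report = keywords_by_file(folder, ["rainy", "monsoon", "drought"])

for name, words in report.items():
    print(name, sorted(words))
```

The expensive step (speech recognition) then runs once per file, and any later keyword query is a cheap text scan over the saved transcripts.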
-
Fur enuff :-D
Life should not be a journey to the grave with the intention of arriving safely in a pretty and well-preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming “Wow! What a Ride!" - Hunter S Thompson - RIP
-
To get a feeling for how reliable such systems are, get yourself a YouTube account so that you can upload a few files there, and play them back with 'Auto generated' subtitles selected. A few years ago, that function tried to autogenerate subtitles even from instrumental music, which could lead to some really funny results. Today, it just says [music] if it cannot detect any vocal part. For vocal music, it quite often misinterprets. For speech, it is surprisingly good, as long as the background noise level is low, the speech is distinct and only one person at a time is speaking. For English only, of course. (I assume that those who really put resources into this kind of stuff also have high-performance versions for Russian speech, but don't expect that to be released to civilian society :-).)
-
I'm well aware of the limitations of computerized speech-to-text. The versions offered as services by the big tech companies are the least bad ones available, though. I've never looked into generating non-English transcripts/captions.
Did you ever see history portrayed as an old man with a wise brow and pulseless heart, weighing all things in the balance of reason? Is not rather the genius of history like an eternal, imploring maiden, full of fire, with a burning heart and flaming soul, humanly warm and humanly beautiful? --Zachris Topelius
-
Assuming you don't have a life ;) what a great research project. Me? I just want to learn how to hang interior doors square, but we all have our goals.
Charlie Gilley “They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.” BF, 1759 Has never been more appropriate.