Code Project

Does such a software exist?

The Lounge
question
26 Posts 14 Posters 0 Views 1 Watching
  • A Amarnath S

    Never thought it would involve neural nets. Seems complicated enough.

    OriginalGriff
    wrote on last edited by
    #6

    It's complex stuff: you saying "rainy" in a lecture won't "look the same" as Herself screaming it at the TV!

    "I have no idea what I did, but I'm taking full credit for it." - ThisOldTony "Common sense is so rare these days, it should be classified as a super power" - Random T-shirt AntiTwitter: @DalekDave is now a follower!

      Amarnath S
      wrote on last edited by
      #7

      Surely there should be something invariant between these two utterances of 'rainy'.

        OriginalGriff
        wrote on last edited by
        #8

        Broadly there are similarities, but ... most males speak at a lower pitch than most females, and have thicker vocal cords, so that will affect the MP3 data. And think about accents for a moment. I don't know about regional speech differences in India, but I'd suspect that there would be a significant difference in accent between someone from Delhi and a resident of Tamil Nadu (most of my Indian friends are [ex]Malaysian Tamils, so I don't hear different regions enough to identify an accent). Certainly the way a Glaswegian would pronounce a word would give a very different waveform to that of a Londoner, a Brummie, or a Liverpudlian. And then you get to Welsh or Irish natives ... :D I don't think there is anything you could rely upon to identify a generic word as part of an MP3 file without some form of speech recognition.

          Amarnath S
          wrote on last edited by
          #9

          Yes. I understand. So many variations. Finding an invariant is hopelessly difficult, what they call a hard problem.

            pkfox
            wrote on last edited by
            #10

            You could have listened to an hour of it by now :-D

            Life should not be a journey to the grave with the intention of arriving safely in a pretty and well-preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming “Wow! What a Ride!" - Hunter S Thompson - RIP

            • A Amarnath S

              I have an MP3 file. It is an hour-long lecture on a topic about weather. I would like to find out ('seek') the exact time(s) when the word 'rainy' gets played in this MP3 file, without listening to the entire hour-long lecture. Similar to finding a substring in a long string, but finding an audio clip within an audio file. Does such a software functionality exist? My Google search didn't yield much.

              RickZeeland
              wrote on last edited by
              #11

              Maybe you can use one of these speech-recognition-libraries[^]

                Dan Neely
                wrote on last edited by
                #12

                Use a speech-to-text tool that generates .VTT closed-captioning files. Each snippet of a few words will have a timestamp to indicate when it's displayed. Azure has an API for it, and I think AWS has recently upgraded its video-to-text processing tools to offer the same.
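Once you have a .VTT file, the search itself is only a few lines of script. A minimal sketch, assuming the captions follow the usual WebVTT layout (a timing line such as `00:12:30.250 --> 00:12:34.000` followed by the cue text). The function name `find_word_in_vtt` and the sample captions are made up for illustration:

```python
import re

# Matches a WebVTT cue timing line such as "00:01:02.500 --> 00:01:05.000".
# The hours field is optional in WebVTT, so "01:02.500" is accepted too.
CUE_TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)


def find_word_in_vtt(vtt_text, word):
    """Return (start, end, cue_text) for every cue whose text contains `word`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n")):
        lines = block.strip().split("\n")
        for i, line in enumerate(lines):
            m = CUE_TIMING.match(line)
            if m:
                # Everything after the timing line is the caption text.
                text = " ".join(lines[i + 1:])
                if pattern.search(text):
                    hits.append((m.group(1), m.group(2), text))
                break
    return hits


# Made-up captions standing in for the output of a captioning service.
SAMPLE_VTT = """WEBVTT

1
00:00:01.000 --> 00:00:04.000
Welcome to today's lecture on weather.

2
00:12:30.250 --> 00:12:34.000
A rainy season usually follows
the summer heat.

3
00:47:10.000 --> 00:47:13.500
Brainy students will notice the pattern.
"""

for start, end, text in find_word_in_vtt(SAMPLE_VTT, "rainy"):
    print(start, end, text)
```

The word-boundary match is there so that "brainy" does not count as a hit for "rainy". The timestamp you get is the start of the caption snippet, not of the word itself, so expect to land a second or two early.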

                Did you ever see history portrayed as an old man with a wise brow and pulseless heart, weighing all things in the balance of reason? Is not rather the genius of history like an eternal, imploring maiden, full of fire, with a burning heart and flaming soul, humanly warm and humanly beautiful? --Zachris Topelius

                • pkfoxP pkfox

                  You could have listened to an hour of it by now :-D

                  Amarnath S
                  wrote on last edited by
                  #13

                  If I had one file, I wouldn't have asked. I have at least 100 files, have listened to each of them at least once, and would like to find whether certain keywords occur in them. Something like batch-processing is what I am looking for. That's why I asked.
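The batch side of this is easy once each recording has a transcript. A rough sketch, assuming one plain-text transcript per lecture has already been produced by some speech-to-text tool and saved as `*.txt` in one folder. The function name `keywords_by_file`, the file names and the sample sentences are hypothetical:

```python
from pathlib import Path
import re
import tempfile


def keywords_by_file(transcript_dir, keywords):
    """Map each transcript file name to the sorted keywords it contains."""
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    report = {}
    for path in sorted(Path(transcript_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        found = sorted(kw for kw, pat in patterns.items() if pat.search(text))
        if found:
            report[path.name] = found
    return report


# Demo with two made-up transcripts written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "lecture01.txt").write_text(
        "The rainy season brings monsoon winds.", encoding="utf-8"
    )
    Path(tmp, "lecture02.txt").write_text(
        "High pressure means clear skies.", encoding="utf-8"
    )
    demo_report = keywords_by_file(tmp, ["rainy", "monsoon", "cyclone"])

print(demo_report)
```

Files with no hits are left out of the report, so with 100 lectures you only see the ones worth opening.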

                    pkfox
                    wrote on last edited by
                    #14

                    Fur enuff :-D

                    • D Dan Neely

                      Use a speech-to-text tool that generates .VTT closed-captioning files. Each snippet of a few words will have a timestamp to indicate when it's displayed. Azure has an API for it, and I think AWS has recently upgraded its video-to-text processing tools to offer the same.

                      trønderen
                      wrote on last edited by
                      #15

                      To get a feeling for how reliable such systems are, get yourself a YouTube account so that you can upload a few files there, and play them back selecting 'Auto generated' subtitles. A few years ago, that function tried to autogenerate subtitles even from instrumental music, which could lead to some really funny results. Today, it just says [music] if it cannot detect any vocal part. For vocal music, it quite often misinterprets. For speech, it is surprisingly good, as long as the background noise level is low, the speech is distinct and only one person at a time is speaking. For English only, of course. (I assume that those who really put resources into this kind of stuff also have high performance versions for Russian speech, but don't expect that to be released to civilian society :-).)

                        Dan Neely
                        wrote on last edited by
                        #16

                        I'm well aware of the limitations of computerized speech-to-text. The versions offered as services by the big tech companies are the least bad ones available though. I've never looked into generating non-English transcripts/captions.

                          jmaida
                          wrote on last edited by
                          #17

                          MP3 to text conversion would be step 1?

                          "A little time, a little trouble, your better day" Badfinger

                            megaadam
                            wrote on last edited by
                            #18

                            I suggest you google "speech to text"; that is the industry terminology.

                            "If we don't change direction, we'll end up where we're going"

                              charlieg
                              wrote on last edited by
                              #19

                              assuming you don't have a life ;) what a great research project. Me? I just want to learn how to hang interior doors square, but we all have our goals.

                              Charlie Gilley “They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.” BF, 1759 Has never been more appropriate.

                                dandy72
                                wrote on last edited by
                                #20

                                Would it be enough to use a program that can do transcripts, and then you can search through the resulting text? No timestamps, but whether that's sufficient for you depends on the use case...

                                • A Amarnath S

                                  Surely there should be something invariant between these two utterances of 'rainy'.

                                  CodeWomble
                                  wrote on last edited by
                                  #21

                                  Yes and no. The similarities can be washed out by the noise in the sentence, particularly with all of the different accents/varieties of English that are spoken around the world. Also, just identifying word endings can be tricky. Picking out words from "rainy days" can give "rainy days", "ray knee days" or "ray need as'" without changing the sound.
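The "rainy days" ambiguity can be shown with a toy example. This is only a sketch: the phoneme spellings below are rough ARPAbet-style approximations picked for illustration (not taken from a real pronunciation dictionary), and "A's" stands in for the loosely written "as'". Given one sound sequence and a small lexicon, it lists every way to cut the sounds into words:

```python
# Toy pronunciation lexicon; the phoneme spellings are rough approximations
# chosen for illustration, not entries from a real dictionary.
LEXICON = {
    "rainy": ("R", "EY", "N", "IY"),
    "ray": ("R", "EY"),
    "knee": ("N", "IY"),
    "need": ("N", "IY", "D"),
    "days": ("D", "EY", "Z"),
    "A's": ("EY", "Z"),
}


def segmentations(phonemes):
    """Return every way to split `phonemes` into words from LEXICON."""
    if not phonemes:
        return [[]]  # one way to segment nothing: the empty word list
    results = []
    for word, pron in LEXICON.items():
        if phonemes[:len(pron)] == pron:
            # This word fits at the front; segment whatever is left over.
            for rest in segmentations(phonemes[len(pron):]):
                results.append([word] + rest)
    return results


# The same stream of sounds, with no word boundaries marked.
SOUNDS = ("R", "EY", "N", "IY", "D", "EY", "Z")

for words in segmentations(SOUNDS):
    print(" ".join(words))
```

All three readings come out of the same sounds, which is why recognizers lean on a language model to decide which word sequence is the plausible one.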

                                  • A Amarnath S

                                    If I had one file, I wouldn't have asked. I have at least 100 files, have listened to each of them at least once, and would like to find whether certain keywords occur in them. Something like batch-processing is what I am looking for. That's why I asked.

                                    Lost User
                                    wrote on last edited by
                                    #22

                                    Run the files through a (batch) speech recognition program and dump the whole thing to a text file. You can (re)search the text without having to go back to the mp3. Looking at a file in its entirety will help in deciding how to "train" the (ML) process. [Transcribe your recordings](https://support.microsoft.com/en-us/office/transcribe-your-recordings-7fc2efec-245e-45f0-b053-2a97531ecf57)

                                    "Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I

                                    • R Rage

                                      Here[^], but it is an API, not finished software. Don't you have a trainee, or a teenager lying somewhere doing teenaging stuff? :-D

                                      Do not escape reality : improve reality !

                                      r_hyde
                                      wrote on last edited by
                                      #23

                                      The answer in that link leads to the VOSK project[^], which in turn has a page that lists several projects that integrate the VOSK toolkit. One of those projects is pretty much exactly what the OP is asking for: mp4grep[^]. Looks like it's Linux-only, and using it is a two-step process unless you happen to be starting with a 16 kHz WAV file, but it appears to automate the process of using speech recognition to generate a timestamped transcript, and then grep'ing the result to find the word or expression of interest. I didn't actually try to use it, so I can't vouch for its quality, but it looks pretty neat if it works!
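If you would rather script it yourself than use mp4grep, the same idea can be sketched directly against the VOSK Python bindings. Treat this as an untested outline and not as how mp4grep works internally: `recognize_words` assumes the `vosk` package is installed, that `model_path` points at a downloaded model folder, and that the audio has already been converted to a mono 16-bit PCM WAV. The sample word entries are hand-written in the shape VOSK reports when word timings are switched on.

```python
import json
import wave


def recognize_words(wav_path, model_path):
    """Run VOSK over a mono 16-bit PCM WAV file and return its word entries.

    Not called in the demo below: it needs the vosk package and a
    downloaded model directory (model_path is a placeholder).
    """
    from vosk import Model, KaldiRecognizer  # imported lazily on purpose

    words = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_path), wf.getframerate())
        rec.SetWords(True)  # ask for per-word start/end times
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            if rec.AcceptWaveform(data):
                words.extend(json.loads(rec.Result()).get("result", []))
        words.extend(json.loads(rec.FinalResult()).get("result", []))
    return words


def find_word(words, target):
    """Return the start times (in seconds) of every occurrence of `target`."""
    target = target.lower()
    return [w["start"] for w in words if w["word"].lower() == target]


def as_clock(seconds):
    """Format seconds as H:MM:SS for seeking in a media player."""
    total = int(seconds)
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"


# Hand-written entries in the shape VOSK reports; the values are made up.
SAMPLE_WORDS = [
    {"word": "the", "start": 750.10, "end": 750.25, "conf": 1.0},
    {"word": "rainy", "start": 750.25, "end": 750.70, "conf": 0.93},
    {"word": "season", "start": 750.70, "end": 751.20, "conf": 0.98},
    {"word": "rainy", "start": 2830.00, "end": 2830.40, "conf": 0.88},
]

for t in find_word(SAMPLE_WORDS, "rainy"):
    print(as_clock(t))
```

Unlike caption files, this gives a start time per word, which is closer to the "exact time" the OP asked about.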

                                        Lost User
                                        wrote on last edited by
                                        #24

                                        Hmmm, did you find a solution for this? [Whisper](https://openai.com/blog/whisper/) was released just yesterday. [GitHub - OpenAI/whisper](https://github.com/openai/whisper) Looks really easy to use.
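For the keyword search, something along these lines might do. It is a sketch, not something I have run against real lectures: `transcribe_segments` assumes the `whisper` package from that repository is installed (see its README for setup, including ffmpeg) and relies on `transcribe` returning a list of timed `segments`. The helper names and the sample segments are made up.

```python
import re


def transcribe_segments(audio_path, model_name="base"):
    """Transcribe with Whisper and return its list of timed segments.

    Not called in the demo below: it needs the whisper package installed
    and downloads model weights on first use.
    """
    import whisper  # imported lazily on purpose

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["segments"]


def segments_with(segments, word):
    """Return (start, end, text) for each segment that mentions `word`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [
        (seg["start"], seg["end"], seg["text"].strip())
        for seg in segments
        if pattern.search(seg["text"])
    ]


# Made-up segments in the shape Whisper returns (times are in seconds).
SAMPLE_SEGMENTS = [
    {"start": 0.0, "end": 4.2, "text": " Welcome to the lecture on weather."},
    {"start": 750.0, "end": 755.5, "text": " The rainy season follows the heat."},
    {"start": 2829.0, "end": 2834.0, "text": " Rainy days are common in July."},
]

for start, end, text in segments_with(SAMPLE_SEGMENTS, "rainy"):
    print(f"{start:.1f}s to {end:.1f}s: {text}")
```

For a real file you would pass the result of `transcribe_segments("lecture.mp3")` (a placeholder name) to `segments_with`, and loop over the 100 files.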

                                          Amarnath S
                                          wrote on last edited by
                                          #25

                                          Thanks a lot. Will download and take a look.
