Code Project
How to know whether a string contains a URL?

    Waleed Eissa
    wrote:
    #1

    How can I know whether a string contains a URL? It's very easy if it starts with http://, but I'm talking about URLs that don't start with http://. I don't want to extract the URL; I just want to know whether a string contains one or not. Any ideas?

    Waleed Eissa Software Developer Sydney

      Manas Bhardwaj
      wrote:
      #2

      Use regular expressions

      Please remember to rate helpful or unhelpful answers, it lets us and people reading the forums know if our answers are any good.

        Christian Graus
        wrote:
        #3

        The answer you were given was good. The only other thing you could do, to see if it's a VALID URL, is to do an HTTP POST to it and see what you get back.

        Christian Graus No longer a Microsoft MVP, but still happy to answer your questions.

          Waleed Eissa
          wrote:
          #4

          Thanks for your reply. The problem I find with using a regular expression is that it can become really hard to distinguish normal text from a URL. As I mentioned in my post, I want to find URLs even if they don't start with http://, and this is what makes it really challenging.

            Waleed Eissa
            wrote:
            #5

            Thanks for your answer, but I'm afraid this is not possible. I'm just trying to write a spam filter for my website, so I can't keep users waiting that long. I thought about searching for all TLDs, but I don't think it's a good idea performance-wise. Do you know of any good spam filter that I can call from an ASP.NET application? That is, I send it a string and get back something like a boolean indicating whether it's spam or not. A percentage (how likely the post is to be spam) would be even better than a boolean. Thanks.

              Christian Graus
              wrote:
              #6

              I think I just answered this in the ASP.NET forum. There is no way of knowing if a string is a *valid* URL without posting to it. Telling if a string is formatted like a URL is easy with regex, though.
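
To make the difference between "formatted like a URL" and "actually reachable" concrete, here is a minimal C# sketch of a format-only check. Note that it swaps the regex mentioned above for the framework's `Uri.TryCreate`, and that the class and method names are made up for illustration. It never contacts the server, so it says nothing about whether the URL really exists, and it only accepts strings that carry a scheme.

```csharp
using System;

public static class UrlSyntax
{
    // True when the text parses as an absolute http or https URL.
    // This is a format check only; no network request is made.
    public static bool IsWellFormedHttpUrl(string text)
    {
        return Uri.TryCreate(text, UriKind.Absolute, out Uri uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

A call such as `UrlSyntax.IsWellFormedHttpUrl("http://example.com/page")` passes the check, while a bare `example.com` does not, so this alone does not cover URLs written without http://.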

                Waleed Eissa
                wrote:
                #7

                Ok, now I get your point. Actually, I don't care whether they are valid or not; as I mentioned before, it's just for spam filtering, so it's not important to check whether they are valid.

                Let me explain from the beginning (hopefully you have the time to read all this :)). On my website, users are expected to add a lot of posts in a short time, and I want the site to be as fast and responsive as possible when they do this. So basically I'm looking for a spam filter that will run on my machine (as opposed to spam filters that call a web service on another website, like Akismet, which can be good for blogs and sites that don't receive many posts). Unfortunately I haven't been able to find such a thing so far, which is why I'm trying to write it myself, and it seems more complicated than I thought.

                I thought of two approaches that I can use to detect spam:

                - Using naive Bayes (there's an article here on Code Project that talks about that, see http://www.codeproject.com/KB/recipes/BayesianCS.aspx)
                - Using some rules that usually apply to spam, and this is what I'm trying to do.

                Naive Bayes is actually very effective in most cases, but I'm going with rules because of something related to my app. Read on:

                Due to the nature of my website, users wouldn't normally post any text that contains links (and I don't change links that start with http:// to anchor tags). So it's reasonable to assume that posts that contain links will most likely be spam. Spammers can spam your site for two reasons. First, to get a higher page rank for some website, or more accurately for some web page (which doesn't apply in my case as I don't change links into anchor tags, and even if I did I could use rel="nofollow" as most people do); anyway, the point is that the spam contains a URL. Second, to advertise something, and in this case they have to leave a URL, email or phone number (if you can't reach the advertiser then the ad is useless, right?).
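
The rules described above (a post that carries a URL, an email address or a phone number is suspicious) could be sketched in C# roughly like this. Everything here is an assumption made for illustration: the class name, the three deliberately loose patterns, and the weights, which are arbitrary and would need tuning against real posts. It returns a 0 to 100 score rather than a boolean.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RuleBasedSpamScore
{
    // Loose, illustrative patterns; none of them is a complete grammar.
    private static readonly Regex Link =
        new Regex(@"https?://|\bwww\.", RegexOptions.IgnoreCase);
    private static readonly Regex Email =
        new Regex(@"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b");
    private static readonly Regex Phone =
        new Regex(@"\+?\d[\d\s().-]{7,}\d");

    // Returns a score from 0 (no rule fired) to 100.
    // The weights are hypothetical placeholders, not measured values.
    public static int Score(string text)
    {
        int score = 0;
        if (Link.IsMatch(text)) score += 60;   // explicit link marker
        if (Email.IsMatch(text)) score += 25;  // contact email address
        if (Phone.IsMatch(text)) score += 25;  // long run of digits and separators
        return Math.Min(score, 100);
    }
}
```

Because each rule is a single pass over the text with no network call, this kind of check runs locally, which is the constraint described above. The `Link` rule only catches http://, https:// and www. prefixes, so URLs written without any of those would need an extra rule.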
                Probably you're thinking that if I don't change the links into anchor tags they won't spam my site. I can assure you they are dumb enough to do this: I have seen many other websites that don't change links into anchors and are still heavily spammed (but maybe not because they are dumb; it might be because it's rumored that Google detects any links that start with http:// when crawling your site even if they are not in an

                  Paul Conrad
                  wrote:
                  #8

                  Use regular expressions, as you have already been told. Instead of checking for something like http://, why not just check for things like .com, .net, .edu, etc.?
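
As a rough illustration of this suggestion, a C# regex can look for a dotted host name that ends in one of a few known suffixes, without requiring http://. This is only a sketch: the class name is made up and the four-entry suffix list is a placeholder, not a real TLD list.

```csharp
using System.Text.RegularExpressions;

public static class UrlDetector
{
    // One or more host labels followed by a suffix from a short,
    // hypothetical list (com, net, org, edu). No scheme is required.
    private static readonly Regex UrlLike = new Regex(
        @"\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:com|net|org|edu)\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // True when the text contains at least one URL-like token.
    public static bool ContainsUrl(string text)
    {
        return UrlLike.IsMatch(text);
    }
}
```

Text such as "visit www.example.com now" matches, but so does a product name like "ASP.NET", which shows how hard it is to separate normal text from URLs with a pattern like this.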

                  "The clue train passed his station without stopping." - John Simmons / outlaw programmer "Real programmers just throw a bunch of 1s and 0s at the computer to see what sticks" - Pete O'Hanlon

                    Waleed Eissa
                    wrote:
                    #9

                    Hi Paul, thanks for your answer. The problem with checking for domain names like .com, .net, etc. is that there are too many TLDs to check for (because you have to check for ccTLDs, which are very commonly used by spammers). There are some other problems too; please refer to my last post. Regards
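
One way to keep the number of TLDs from mattering much is to match any dotted token once and then look its last label up in a hash set, since a `HashSet<string>` lookup costs about the same whether the set holds ten entries or a few thousand. The C# sketch below assumes a tiny sample set and made-up class and method names; a real filter would load the full TLD list (IANA publishes one) at startup.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TldScanner
{
    // Hypothetical sample; replace with the complete list of TLDs and ccTLDs.
    private static readonly HashSet<string> KnownTlds = new HashSet<string>(
        new[] { "com", "net", "org", "edu", "ru", "cn", "de", "uk", "au" },
        StringComparer.OrdinalIgnoreCase);

    // A dotted, host-like token; its final label is captured as "tld".
    private static readonly Regex HostLike = new Regex(
        @"\b(?:[a-z0-9-]+\.)+(?<tld>[a-z]{2,})\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // True when any host-like token ends in a label from KnownTlds.
    public static bool ContainsKnownTld(string text)
    {
        foreach (Match m in HostLike.Matches(text))
        {
            if (KnownTlds.Contains(m.Groups["tld"].Value))
                return true;
        }
        return false;
    }
}
```

This keeps the regex independent of the TLD list, so growing the list does not grow the pattern. It does not solve the other problem raised in this thread: ordinary words joined by a dot can still look like a host name.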

                      Manas Bhardwaj
                      wrote:
                      #10

                      Waleed Eissa wrote:

                      Using naive bayesian

                      But again, the Naive Bayes algorithm doesn't have intelligence of its own. It has to be trained properly to produce results. The more you train it, the better the results it will yield.
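
To show what "training" means here, the following is a minimal C# sketch of a naive Bayes text classifier with add-one (Laplace) smoothing. It is not the code from the Code Project article mentioned earlier in the thread; the class name, the tokenizer and the API shape are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public class NaiveBayesSpamFilter
{
    private readonly Dictionary<string, int> spamCounts = new Dictionary<string, int>();
    private readonly Dictionary<string, int> hamCounts = new Dictionary<string, int>();
    private readonly HashSet<string> vocabulary = new HashSet<string>();
    private int spamTokens, hamTokens, spamDocs, hamDocs;

    // Very simple tokenizer: lower-case, split on whitespace and basic punctuation.
    private static string[] Tokenize(string text)
    {
        return text.ToLowerInvariant().Split(
            new[] { ' ', '\t', '\r', '\n', ',', '.', '!', '?', ';', ':' },
            StringSplitOptions.RemoveEmptyEntries);
    }

    // Records the word counts of one labelled example post.
    public void Train(string text, bool isSpam)
    {
        Dictionary<string, int> counts = isSpam ? spamCounts : hamCounts;
        foreach (string token in Tokenize(text))
        {
            counts.TryGetValue(token, out int n); // n is 0 when the token is new
            counts[token] = n + 1;
            vocabulary.Add(token);
            if (isSpam) spamTokens++; else hamTokens++;
        }
        if (isSpam) spamDocs++; else hamDocs++;
    }

    // Returns the estimated probability (0..1) that the text is spam.
    public double SpamProbability(string text)
    {
        if (spamDocs == 0 || hamDocs == 0)
            throw new InvalidOperationException(
                "Train with at least one spam and one non-spam post first.");

        double total = spamDocs + hamDocs;
        double logSpam = Math.Log(spamDocs / total); // class priors
        double logHam = Math.Log(hamDocs / total);
        int v = vocabulary.Count;

        foreach (string token in Tokenize(text))
        {
            spamCounts.TryGetValue(token, out int s);
            hamCounts.TryGetValue(token, out int h);
            // Add-one smoothing keeps unseen words from zeroing the product.
            logSpam += Math.Log((s + 1.0) / (spamTokens + v));
            logHam += Math.Log((h + 1.0) / (hamTokens + v));
        }
        // Turn the two log scores into a normalized probability.
        return 1.0 / (1.0 + Math.Exp(logHam - logSpam));
    }
}
```

The filter knows nothing until `Train` has been called with labelled examples of both kinds, and its estimates only become useful as the number of examples grows. Multiplying `SpamProbability` by 100 gives a percentage.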

                        Waleed Eissa
                        wrote:
                        #11

                        Actually, I'm not especially interested in the Naive Bayes algorithm or any other algorithm; I'm just trying to filter out the spam. Can you suggest a better way of doing this? And if you know of a good spam filter that I can use in my application, that would be even better. Regards
