Detecting similar URLs

    Meysam Mahfouzi wrote (#1):

    I've got a table in a database in which I store URLs that users bookmark. Before inserting a URL into the database, I want to make sure it hasn't already been bookmarked by another user. To do so, I have to search for similar forms of the URL, e.g. if someone has already inserted www.yahoo.com, I want to avoid inserting http://yahoo.com again, to prevent duplicate entries. The first thing that came to my mind as a solution was to make URLs canonical before inserting them into the database, i.e. remove www from the beginning of the URL (if any) and add http:// to it. This seems a good workaround. The problem is, I don't like to manipulate the initial URLs. I mean, if a user wants to bookmark www.yahoo.com, I don't want to insert http://yahoo.com into the database, because some URLs will not open if you remove www from the beginning of them. Any ideas, dudes?
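    The canonical-form idea does not have to change what gets stored: the URL can be kept exactly as the user typed it, and a separate comparison key can be computed only for the duplicate check. Below is a minimal C# sketch, assuming "similar" means the URLs differ only in a missing http:// scheme, the letter case of the host, a leading www., or a trailing slash. The UrlKey class and CanonicalKey method are hypothetical names, not part of any library.

```csharp
using System;

static class UrlKey
{
    // Hypothetical helper: builds a comparison key for duplicate detection.
    // Only the key is normalized; the URL the user typed is stored untouched.
    public static string CanonicalKey(string rawUrl)
    {
        string candidate = rawUrl.Trim();

        // Assumption: use http:// when no scheme was typed (e.g. "www.yahoo.com").
        if (!candidate.Contains("://"))
            candidate = "http://" + candidate;

        // Throws UriFormatException if the input is not a valid absolute URL.
        var uri = new Uri(candidate);

        // Host names are case-insensitive; drop a leading "www." for the key only.
        string host = uri.Host.ToLowerInvariant();
        if (host.StartsWith("www.", StringComparison.Ordinal))
            host = host.Substring(4);

        // Keep a non-default port and the scheme; ignore a trailing slash.
        // The path keeps its case, since paths can be case-sensitive.
        string port = uri.IsDefaultPort ? "" : ":" + uri.Port;
        string path = uri.AbsolutePath.TrimEnd('/');

        return uri.Scheme + "://" + host + port + path + uri.Query;
    }
}
```

    With this sketch, www.yahoo.com, http://yahoo.com and http://Yahoo.com/ all map to the key http://yahoo.com, so the later bookmarks can be detected as duplicates while the database still holds whatever each user originally typed (for instance, with the key in an extra indexed column next to the original URL). The trade-off of ignoring www is that two genuinely different sites at www.example.com and example.com would share a key.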

      Wendelius wrote (#2):

      Meysam Mahfouzi wrote:

      The problem is, I don't like to manipulate the initial URLs

      If you don't want to manipulate the URL, doesn't that mean you store each URL exactly as entered and only check that the exact same URL isn't already stored?

      Meysam Mahfouzi wrote:

      The first thing that came to my mind as a solution was to make URLs canonical before inserting them into the database

      One option is to compute the canonical version first, store it in a parent table, and then store the unmodified URL in a child table that references it. Something like:

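      CREATE TABLE CanonicalUrl (
        CanonicalUrlId int PRIMARY KEY,
        Url varchar(500) NOT NULL UNIQUE   -- canonical form
      )

      CREATE TABLE Url (
        UrlId int PRIMARY KEY,
        CanonicalUrlId int NOT NULL REFERENCES CanonicalUrl (CanonicalUrlId),
        Url varchar(500) NOT NULL          -- URL exactly as entered
      )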
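      The parent/child layout above can be sketched in memory to show how a new bookmark gets linked: look up the canonical form, create a CanonicalUrl entry only if it is missing, and always add a Url entry holding the URL as typed. This is a minimal C# sketch, assuming a dictionary and a list stand in for the two tables; BookmarkStore, Add and the canonicalize delegate are hypothetical names, and the canonicalization rule itself is supplied by the caller.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the CanonicalUrl (parent) and Url (child) tables.
class BookmarkStore
{
    private readonly Func<string, string> canonicalize;

    // Parent "table": canonical form -> CanonicalUrlId.
    private readonly Dictionary<string, int> canonicalIds = new Dictionary<string, int>();

    // Child "table": one row per bookmark, keeping the original URL.
    private readonly List<(int UrlId, int CanonicalUrlId, string Url)> urls =
        new List<(int UrlId, int CanonicalUrlId, string Url)>();

    public BookmarkStore(Func<string, string> canonicalize)
    {
        this.canonicalize = canonicalize;
    }

    // Returns true if no equivalent URL had been bookmarked before.
    public bool Add(string originalUrl, out int canonicalUrlId)
    {
        string key = canonicalize(originalUrl);
        bool isNew = !canonicalIds.TryGetValue(key, out canonicalUrlId);
        if (isNew)
        {
            // Create the parent row only for a previously unseen canonical form.
            canonicalUrlId = canonicalIds.Count + 1;
            canonicalIds.Add(key, canonicalUrlId);
        }

        // The child row stores the URL exactly as the user entered it.
        urls.Add((urls.Count + 1, canonicalUrlId, originalUrl));
        return isNew;
    }

    public IReadOnlyList<(int UrlId, int CanonicalUrlId, string Url)> Urls => urls;
}
```

      In a real database the same logic maps to a lookup on CanonicalUrl.Url (ideally backed by a unique index), an insert into CanonicalUrl when no row is found, and an insert into Url that references the parent's CanonicalUrlId.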

      If you want, you could also add a calculated (computed) column to the Url table to represent the canonical form. However, all of these add extra logic to the data handling, so I'm wondering why you want to prevent storing similar URLs at all. Of course, storing exactly the same URL twice may not be wise, but that's easily prevented, for example with a unique constraint on the column.

      The need to optimize rises from a bad design.
