Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. Database & SysAdmin
  3. Database
  4. Searching with optional parameters [modified]

Searching with optional parameters [modified]

Scheduled Pinned Locked Moved Database
databasealgorithmsperformancetutorialquestion
14 Posts 5 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • P Pascal Ganaye

    In most application I have seen code that searches records using a number of optional fields. I have to build such a screen with a table that is expected to contains several million records. Here a short example that exposes my questions, I would like to here your comments on it:

    CREATE PROCEDURE SearchContacts
    (
    @Reference varchar(8) = NULL,
    @DateOfBirth datetime = NULL,
    @ContactName varchar(50) = NULL,
    @ContactType varchar(10) = NULL
    )
    AS
    BEGIN
    SELECT TOP 100 * FROM Contacts WHERE
    (@Reference IS NULL OR Contacts.Reference = @Reference)
    AND (@DateOfBirth IS NULL OR Contacts.DateOfBirth = @DateOfBirth)
    AND (@ContactName IS NULL OR Contacts.ContactName = @ContactName)
    AND (@ContactType IS NULL OR Contacts.ContactType = @ContactType)
    END

    This query is very simple however it is probably not going to take advantage of the indexes that are available. As I have to deal with a table that will contain a lot of record, this won't be acceptable. An alternative would be to hard code many if :

    IF (@ContactName IS NULL AND @ContactType IS NULL AND @Reference IS NOT NULL)
    BEGIN
    SELECT * FROM Contacts WHERE
    (@Reference IS NULL OR Contacts.Reference = @Reference)
    AND (@DateOfBirth IS NULL OR Contacts.DateOfBirth = @DateOfBirth)
    END
    ELSE IF (@Reference IS NULL AND @DateOfBirth IS NULL)
    BEGIN
    SELECT * FROM Contacts WHERE
    (@ContactName IS NULL OR Contacts.ContactName = @ContactName)
    AND (@ContactType IS NULL OR Contacts.ContactType = @ContactType)
    END
    ...

    This can probably use an index but it is very tedious to write in this example the 16 combination of parameters that can happen. The next step is to try with a dynamic query

    SET @Sql = 'SELECT TOP 100 * FROM Contacts WHERE TRUE '
    IF @Reference IS NOT NULL
    BEGIN
    SET @Sql = @Sql + ' AND Contacts.Reference = ''' + QUOTENAME(@Reference,'') + ''''
    END
    IF @DateOfBirth IS NOT NULL
    BEGIN
    SET @Sql = @Sql + ' AND Contacts.DateOfBirth = ''' + QUOTENAME(@DateOfBirth,'') + ''''
    END
    ...

    On this last sample I assume this will allow the optimizer to use the right index. But will the optimization would have to be done on every request? Is there a more elegant way to write this sort of queries while keeping maximum efficiency ?

    modified on Tuesday, August 16, 2011 11:34 AM

    P Offline
    P Offline
    PIEBALDconsult
    wrote on last edited by
    #2

    It looks as if using a stored procedure is the limiting factor. I'd just cobble up the SQL in code and execute it. Slick as snot. Something along the lines of (not tested, and I'm not awake yet either):

    List<string:gt; parts = new List<string>() ;
    List<IDbDataParameter> parms = new List<IDbDataParameter>() ;

    cmd.CommandText = "SELECT TOP 100 * FROM Contacts " ;

    if ( !System.String.IsNullOrEmpty ( Reference ) )
    {
    parts.Add ( "Contacts.Reference = @Reference " ) ;

    IDbDataParameter prm = cmd.CreateParameter() ;
    prm.Name = "@Reference" ;
    prm.Value = Reference ;
    parms.Add ( prm ) ;
    

    }

    ...

    if ( parts.Count > 0 )
    {
    cmd.CommandText += "WHERE " ;

    cmd.CommandText += parts \[ 0 \] ;
    cmd.Parameters.Add ( parms \[ 0 \] ) ;
    
    for ( int i = 1 ; i < parts.Count ; i++ )
    {
        cmd.CommandText += "AND " ;
    
        cmd.CommandText += parts \[ i \] ;
        cmd.Parameters.Add ( parms \[ i \] ) ;
    }
    

    }

    cmd.ExecuteReader() ;

    P 1 Reply Last reply
    0
    • P PIEBALDconsult

      It looks as if using a stored procedure is the limiting factor. I'd just cobble up the SQL in code and execute it. Slick as snot. Something along the lines of (not tested, and I'm not awake yet either):

      List<string:gt; parts = new List<string>() ;
      List<IDbDataParameter> parms = new List<IDbDataParameter>() ;

      cmd.CommandText = "SELECT TOP 100 * FROM Contacts " ;

      if ( !System.String.IsNullOrEmpty ( Reference ) )
      {
      parts.Add ( "Contacts.Reference = @Reference " ) ;

      IDbDataParameter prm = cmd.CreateParameter() ;
      prm.Name = "@Reference" ;
      prm.Value = Reference ;
      parms.Add ( prm ) ;
      

      }

      ...

      if ( parts.Count > 0 )
      {
      cmd.CommandText += "WHERE " ;

      cmd.CommandText += parts \[ 0 \] ;
      cmd.Parameters.Add ( parms \[ 0 \] ) ;
      
      for ( int i = 1 ; i < parts.Count ; i++ )
      {
          cmd.CommandText += "AND " ;
      
          cmd.CommandText += parts \[ i \] ;
          cmd.Parameters.Add ( parms \[ i \] ) ;
      }
      

      }

      cmd.ExecuteReader() ;

      P Offline
      P Offline
      Pascal Ganaye
      wrote on last edited by
      #3

      In C# you're right it is easier, you won't have the sql injection problem. Nice piece of code you got my 5. A collegue gave me a good solution in T-SQL. I'll be posting it too.

      1 Reply Last reply
      0
      • P Pascal Ganaye

        In most application I have seen code that searches records using a number of optional fields. I have to build such a screen with a table that is expected to contains several million records. Here a short example that exposes my questions, I would like to here your comments on it:

        CREATE PROCEDURE SearchContacts
        (
        @Reference varchar(8) = NULL,
        @DateOfBirth datetime = NULL,
        @ContactName varchar(50) = NULL,
        @ContactType varchar(10) = NULL
        )
        AS
        BEGIN
        SELECT TOP 100 * FROM Contacts WHERE
        (@Reference IS NULL OR Contacts.Reference = @Reference)
        AND (@DateOfBirth IS NULL OR Contacts.DateOfBirth = @DateOfBirth)
        AND (@ContactName IS NULL OR Contacts.ContactName = @ContactName)
        AND (@ContactType IS NULL OR Contacts.ContactType = @ContactType)
        END

        This query is very simple however it is probably not going to take advantage of the indexes that are available. As I have to deal with a table that will contain a lot of record, this won't be acceptable. An alternative would be to hard code many if :

        IF (@ContactName IS NULL AND @ContactType IS NULL AND @Reference IS NOT NULL)
        BEGIN
        SELECT * FROM Contacts WHERE
        (@Reference IS NULL OR Contacts.Reference = @Reference)
        AND (@DateOfBirth IS NULL OR Contacts.DateOfBirth = @DateOfBirth)
        END
        ELSE IF (@Reference IS NULL AND @DateOfBirth IS NULL)
        BEGIN
        SELECT * FROM Contacts WHERE
        (@ContactName IS NULL OR Contacts.ContactName = @ContactName)
        AND (@ContactType IS NULL OR Contacts.ContactType = @ContactType)
        END
        ...

        This can probably use an index but it is very tedious to write in this example the 16 combination of parameters that can happen. The next step is to try with a dynamic query

        SET @Sql = 'SELECT TOP 100 * FROM Contacts WHERE TRUE '
        IF @Reference IS NOT NULL
        BEGIN
        SET @Sql = @Sql + ' AND Contacts.Reference = ''' + QUOTENAME(@Reference,'') + ''''
        END
        IF @DateOfBirth IS NOT NULL
        BEGIN
        SET @Sql = @Sql + ' AND Contacts.DateOfBirth = ''' + QUOTENAME(@DateOfBirth,'') + ''''
        END
        ...

        On this last sample I assume this will allow the optimizer to use the right index. But will the optimization would have to be done on every request? Is there a more elegant way to write this sort of queries while keeping maximum efficiency ?

        modified on Tuesday, August 16, 2011 11:34 AM

        P Offline
        P Offline
        Pascal Ganaye
        wrote on last edited by
        #4

        The answer above is valid in C#. My good collegue and friend James H gave me some hints on how to fix it in T-SQL.

        DECLARE @ContactName NVARCHAR(255)
        DECLARE @Reference NVARCHAR(255)
        DECLARE @sql NVARCHAR(255)

        SET @sql = 'SELECT TOP 100 FamilyName,Reference FROM Contact C WHERE 1=1'

        -- SET @ContactName = 'Smith'
        -- SET @Reference = 'A%'

        IF NOT @ContactName IS NULL
        SET @sql = @sql + ' AND C.ContactName = @ContactName'
        IF NOT @Reference IS NULL
        SET @sql = @sql + ' AND C.Reference LIKE @Reference'

        EXECUTE sp_executesql
        @sql,
        N'@FamilyName NVARCHAR(255), @CoName NVARCHAR(255)',
        @ContactName, @Reference

        This approach is much better for several reasons: - no quoted parameters, the parameters are passed by name so there is no possible SQL injection - because the query do not change on each request we can assume it will be optimized and cached

        P 1 Reply Last reply
        0
        • P Pascal Ganaye

          In most application I have seen code that searches records using a number of optional fields. I have to build such a screen with a table that is expected to contains several million records. Here a short example that exposes my questions, I would like to here your comments on it:

          CREATE PROCEDURE SearchContacts
          (
          @Reference varchar(8) = NULL,
          @DateOfBirth datetime = NULL,
          @ContactName varchar(50) = NULL,
          @ContactType varchar(10) = NULL
          )
          AS
          BEGIN
          SELECT TOP 100 * FROM Contacts WHERE
          (@Reference IS NULL OR Contacts.Reference = @Reference)
          AND (@DateOfBirth IS NULL OR Contacts.DateOfBirth = @DateOfBirth)
          AND (@ContactName IS NULL OR Contacts.ContactName = @ContactName)
          AND (@ContactType IS NULL OR Contacts.ContactType = @ContactType)
          END

          This query is very simple however it is probably not going to take advantage of the indexes that are available. As I have to deal with a table that will contain a lot of record, this won't be acceptable. An alternative would be to hard code many if :

          IF (@ContactName IS NULL AND @ContactType IS NULL AND @Reference IS NOT NULL)
          BEGIN
          SELECT * FROM Contacts WHERE
          (@Reference IS NULL OR Contacts.Reference = @Reference)
          AND (@DateOfBirth IS NULL OR Contacts.DateOfBirth = @DateOfBirth)
          END
          ELSE IF (@Reference IS NULL AND @DateOfBirth IS NULL)
          BEGIN
          SELECT * FROM Contacts WHERE
          (@ContactName IS NULL OR Contacts.ContactName = @ContactName)
          AND (@ContactType IS NULL OR Contacts.ContactType = @ContactType)
          END
          ...

          This can probably use an index but it is very tedious to write in this example the 16 combination of parameters that can happen. The next step is to try with a dynamic query

          SET @Sql = 'SELECT TOP 100 * FROM Contacts WHERE TRUE '
          IF @Reference IS NOT NULL
          BEGIN
          SET @Sql = @Sql + ' AND Contacts.Reference = ''' + QUOTENAME(@Reference,'') + ''''
          END
          IF @DateOfBirth IS NOT NULL
          BEGIN
          SET @Sql = @Sql + ' AND Contacts.DateOfBirth = ''' + QUOTENAME(@DateOfBirth,'') + ''''
          END
          ...

          On this last sample I assume this will allow the optimizer to use the right index. But will the optimization would have to be done on every request? Is there a more elegant way to write this sort of queries while keeping maximum efficiency ?

          modified on Tuesday, August 16, 2011 11:34 AM

          L Offline
          L Offline
          Lost User
          wrote on last edited by
          #5

          Just curious to know, what makes you think that SQL Server will not use indexes for this query?

          SELECT TOP 100 * FROM Contacts WHERE
          (@Reference IS NULL OR Contacts.Reference = @Reference)
          AND (@DateOfBith IS NULL OR Contacts.DateOfBith = @DateOfBith)
          AND (@ContactName IS NULL OR Contacts.ContactName = @ContactName)
          AND (@ContactType IS NULL OR Contacts.ContactType = @ContactType)

          P 1 Reply Last reply
          0
          • L Lost User

            Just curious to know, what makes you think that SQL Server will not use indexes for this query?

            SELECT TOP 100 * FROM Contacts WHERE
            (@Reference IS NULL OR Contacts.Reference = @Reference)
            AND (@DateOfBith IS NULL OR Contacts.DateOfBith = @DateOfBith)
            AND (@ContactName IS NULL OR Contacts.ContactName = @ContactName)
            AND (@ContactType IS NULL OR Contacts.ContactType = @ContactType)

            P Offline
            P Offline
            Pascal Ganaye
            wrote on last edited by
            #6

            I am trying to stay database agnostic on this, I might be wrong on some databases. When you write the query below in a stored procedure, the database engine will typically build an execution plan for it.

            SELECT TOP 100 * FROM Contacts WHERE
            (@Reference IS NULL OR Contacts.Reference = @Reference)
            AND (@DateOfBith IS NULL OR Contacts.DateOfBith = @DateOfBith)
            AND (@ContactName IS NULL OR Contacts.ContactName = @ContactName)
            AND (@ContactType IS NULL OR Contacts.ContactType = @ContactType)

            At that time it will still have to decide which index to use. Whatever choice it makes at compile time is doomed to be wrong. The optimizer can't choose the right index, because there is not right index. It would have to choose the index that is right with a set of given parameters. Even if it were to delay the choice till the first use of the procedure, then the second use might be done with different parameters and another index should be used. From what I can see in MS SQL2008, the execution plan is invariably a Clustered Index Scan. Index Scan meaning that it will read the entire index, from A to Z. This can be terribly slow on a very large table. If the query is dynamic however then the query becomes a lot simpler.

            SELECT TOP 100 * FROM Contacts WHERE
            (Reference = @Reference)
            AND (DateOfBith = @DateOfBith)

            The 'OR' disappear. The execution plan then is a lot better. Typically it will use an Index Seek which is only a few disk access and not a full read. If you use SQL Server Management Studio you can check that using the 'Include Actual Execution Plan' button.

            P 1 Reply Last reply
            0
            • P Pascal Ganaye

              The answer above is valid in C#. My good collegue and friend James H gave me some hints on how to fix it in T-SQL.

              DECLARE @ContactName NVARCHAR(255)
              DECLARE @Reference NVARCHAR(255)
              DECLARE @sql NVARCHAR(255)

              SET @sql = 'SELECT TOP 100 FamilyName,Reference FROM Contact C WHERE 1=1'

              -- SET @ContactName = 'Smith'
              -- SET @Reference = 'A%'

              IF NOT @ContactName IS NULL
              SET @sql = @sql + ' AND C.ContactName = @ContactName'
              IF NOT @Reference IS NULL
              SET @sql = @sql + ' AND C.Reference LIKE @Reference'

              EXECUTE sp_executesql
              @sql,
              N'@FamilyName NVARCHAR(255), @CoName NVARCHAR(255)',
              @ContactName, @Reference

              This approach is much better for several reasons: - no quoted parameters, the parameters are passed by name so there is no possible SQL injection - because the query do not change on each request we can assume it will be optimized and cached

              P Offline
              P Offline
              PIEBALDconsult
              wrote on last edited by
              #7

              Pascal Ganaye wrote:

              we can assume it will be optimized and cached

              As I understand it, in SQL Server every statement is cached, and may get pushed out of the cache. I assume that this way will also cause the dynamic SQL to be cached separately from the main SQL. So I don't think there's any appreciable savings over the non-stored-procedure way.

              M 1 Reply Last reply
              0
              • P Pascal Ganaye

                I am trying to stay database agnostic on this, I might be wrong on some databases. When you write the query below in a stored procedure, the database engine will typically build an execution plan for it.

                SELECT TOP 100 * FROM Contacts WHERE
                (@Reference IS NULL OR Contacts.Reference = @Reference)
                AND (@DateOfBith IS NULL OR Contacts.DateOfBith = @DateOfBith)
                AND (@ContactName IS NULL OR Contacts.ContactName = @ContactName)
                AND (@ContactType IS NULL OR Contacts.ContactType = @ContactType)

                At that time it will still have to decide which index to use. Whatever choice it makes at compile time is doomed to be wrong. The optimizer can't choose the right index, because there is not right index. It would have to choose the index that is right with a set of given parameters. Even if it were to delay the choice till the first use of the procedure, then the second use might be done with different parameters and another index should be used. From what I can see in MS SQL2008, the execution plan is invariably a Clustered Index Scan. Index Scan meaning that it will read the entire index, from A to Z. This can be terribly slow on a very large table. If the query is dynamic however then the query becomes a lot simpler.

                SELECT TOP 100 * FROM Contacts WHERE
                (Reference = @Reference)
                AND (DateOfBith = @DateOfBith)

                The 'OR' disappear. The execution plan then is a lot better. Typically it will use an Index Seek which is only a few disk access and not a full read. If you use SQL Server Management Studio you can check that using the 'Include Actual Execution Plan' button.

                P Offline
                P Offline
                PIEBALDconsult
                wrote on last edited by
                #8

                Pascal Ganaye wrote:

                I am trying to stay database agnostic

                Then don't use stored procedures; not all databases support them.

                (@ContactType IS NULL OR Contacts.ContactType = @ContactType)

                How about

                Contacts.ContactType = COALESCE(@ContactType,Contacts.ContactType)

                :-D (Not that that is necessarily agnostic either. :sigh: )

                1 Reply Last reply
                0
                • P PIEBALDconsult

                  Pascal Ganaye wrote:

                  we can assume it will be optimized and cached

                  As I understand it, in SQL Server every statement is cached, and may get pushed out of the cache. I assume that this way will also cause the dynamic SQL to be cached separately from the main SQL. So I don't think there's any appreciable savings over the non-stored-procedure way.

                  M Offline
                  M Offline
                  Mycroft Holmes
                  wrote on last edited by
                  #9

                  PIEBALDconsult wrote:

                  So I don't think there's any appreciable savings over the non-stored-procedure way

                  This really hurts but that is my understanding as well. This one of the areas where sql strings from the business layer are actually easier to do!

                  Never underestimate the power of human stupidity RAH

                  P 1 Reply Last reply
                  0
                  • M Mycroft Holmes

                    PIEBALDconsult wrote:

                    So I don't think there's any appreciable savings over the non-stored-procedure way

                    This really hurts but that is my understanding as well. This one of the areas where sql strings from the business layer are actually easier to do!

                    Never underestimate the power of human stupidity RAH

                    P Offline
                    P Offline
                    PIEBALDconsult
                    wrote on last edited by
                    #10

                    Mycroft Holmes wrote:

                    sql strings from the business layer are actually easier to do

                    I think that's true all the time. Using stored procedures requires that you write the stored procedure and still you have to write some SQL (an EXEC statement) in the code and add the appropriate parameters, and error handling, etc. Keeping it all in code simplifies things -- everything in one place, the SQL won't change or disappear mysteriously.

                    M S 2 Replies Last reply
                    0
                    • P PIEBALDconsult

                      Mycroft Holmes wrote:

                      sql strings from the business layer are actually easier to do

                      I think that's true all the time. Using stored procedures requires that you write the stored procedure and still you have to write some SQL (an EXEC statement) in the code and add the appropriate parameters, and error handling, etc. Keeping it all in code simplifies things -- everything in one place, the SQL won't change or disappear mysteriously.

                      M Offline
                      M Offline
                      Mycroft Holmes
                      wrote on last edited by
                      #11

                      We've already had this conversation!

                      Never underestimate the power of human stupidity RAH

                      P 1 Reply Last reply
                      0
                      • M Mycroft Holmes

                        We've already had this conversation!

                        Never underestimate the power of human stupidity RAH

                        P Offline
                        P Offline
                        PIEBALDconsult
                        wrote on last edited by
                        #12

                        Others may have missed it. :-D

                        1 Reply Last reply
                        0
                        • P PIEBALDconsult

                          Mycroft Holmes wrote:

                          sql strings from the business layer are actually easier to do

                          I think that's true all the time. Using stored procedures requires that you write the stored procedure and still you have to write some SQL (an EXEC statement) in the code and add the appropriate parameters, and error handling, etc. Keeping it all in code simplifies things -- everything in one place, the SQL won't change or disappear mysteriously.

                          S Offline
                          S Offline
                          S Douglas
                          wrote on last edited by
                          #13

                          PIEBALDconsult wrote: everything in one place, the SQL won't change or disappear mysteriously. You have some odd database issues if your procs are disappearing :)


                          Common sense is admitting there is cause and effect and that you can exert some control over what you understand.

                          P 1 Reply Last reply
                          0
                          • S S Douglas

                            PIEBALDconsult wrote: everything in one place, the SQL won't change or disappear mysteriously. You have some odd database issues if your procs are disappearing :)


                            Common sense is admitting there is cause and effect and that you can exert some control over what you understand.

                            P Offline
                            P Offline
                            PIEBALDconsult
                            wrote on last edited by
                            #14

                            Just one, on SQL Server 6 on WinNT, about ten years ago -- it simply disappeared occasionally. I see no reason to trust them when there are better solutions now.

                            1 Reply Last reply
                            0
                            Reply
                            • Reply as topic
                            Log in to reply
                            • Oldest to Newest
                            • Newest to Oldest
                            • Most Votes


                            • Login

                            • Don't have an account? Register

                            • Login or register to search.
                            • First post
                              Last post
                            0
                            • Categories
                            • Recent
                            • Tags
                            • Popular
                            • World
                            • Users
                            • Groups