Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. Database & SysAdmin
  3. Database
  4. Newbie GROUP BY question

Newbie GROUP BY question

Scheduled Pinned Locked Moved Database
questiondatabasesql-serversysadmin
6 Posts 2 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • M Offline
    M Offline
    moredip
    wrote on last edited by
    #1

    Hi, I'm currently working on some SQL stuff as part of a project I've been assigned to, and I have a newbie question regarding the GROUP BY clause in a SELECT statement. I have a table, Activity_Session_Summary, that contains information about the results of the online quizzes that our product administers. Each tuple represents a summary of a user's results for a specific attempt at a specific quiz. Its columns include 'ActivityID', 'PercentCorrect' and 'WhenCompleted'. What I need is a SELECT statement that returns the PercentCorrect value for the most receently completed quiz attempt, for each activity. So far I have

    SELECT ActivityID, MAX(WhenCompleted) FROM Activity_Session_Summary
    WHERE StudentID = '%s' AND ActivityID IN ( %s )
    GROUP BY ActivityID

    where the %s's are format placeholders. This statement almost does what I need, but obviously doesn't give me the PercentCorrect. However I can't just add PercentCorrect to the SELECT's column list, because SQL Server politely tells me "'PercentCorrect' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.' Can anyone point out what I need to do to get this to work as required?

    C 1 Reply Last reply
    0
    • M moredip

      Hi, I'm currently working on some SQL stuff as part of a project I've been assigned to, and I have a newbie question regarding the GROUP BY clause in a SELECT statement. I have a table, Activity_Session_Summary, that contains information about the results of the online quizzes that our product administers. Each tuple represents a summary of a user's results for a specific attempt at a specific quiz. Its columns include 'ActivityID', 'PercentCorrect' and 'WhenCompleted'. What I need is a SELECT statement that returns the PercentCorrect value for the most receently completed quiz attempt, for each activity. So far I have

      SELECT ActivityID, MAX(WhenCompleted) FROM Activity_Session_Summary
      WHERE StudentID = '%s' AND ActivityID IN ( %s )
      GROUP BY ActivityID

      where the %s's are format placeholders. This statement almost does what I need, but obviously doesn't give me the PercentCorrect. However I can't just add PercentCorrect to the SELECT's column list, because SQL Server politely tells me "'PercentCorrect' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.' Can anyone point out what I need to do to get this to work as required?

      C Offline
      C Offline
      Colin Angus Mackay
      wrote on last edited by
      #2

      Okay - Before I answer your question I am going to do my usual pointer to this article on SQL Injection Attacks and How to Prevent Them[^] whenever I see code that could potentially be attacked. On to your question. What you need to do is take the statement you have already and use that as a subquery and have the table join onto itself via the subquery. It sounds more difficult than it is.

      SELECT main.ActivityID, main.PercentComplete, main.WhenComplete
      FROM Activity_Session_Summary AS main
      INNER JOIN (SELECT ActivityID, MAX(WhenCompleted) AS WhenComplete
      FROM Activity_Session_Summary
      WHERE StudentID = %s AND ActivityID IN ( %s )
      GROUP BY ActivityID) AS sub
      ON main.ActivityID = sub.ActivityID AND main.WhenComplete = sub.WhenComplete
      WHERE StudentID = '%s' AND ActivityID IN ( %s )

      I've left your format placeholders even though they represent a potential security risk. Your original query has now been made the subquery and as you can see, nothing has changed there except that I named the aggregate column so I could refer to it later. The main query selects the data that you want and joins onto the subquery. You will notice from the join that it is joined on the result of the aggregate as well as the ActivityID (I'm guessing that is a foreign key onto another table). Finally, your where clause is repeated on the main query, although it may not be needed - I don't know enough about your data model to make that determination. If it isn't needed it won't do any harm. Does this help?


      My: Blog | Photos | Next SQL Presentation WDevs.com - Open Source Code Hosting, Blogs, FTP, Mail and More

      M 2 Replies Last reply
      0
      • C Colin Angus Mackay

        Okay - Before I answer your question I am going to do my usual pointer to this article on SQL Injection Attacks and How to Prevent Them[^] whenever I see code that could potentially be attacked. On to your question. What you need to do is take the statement you have already and use that as a subquery and have the table join onto itself via the subquery. It sounds more difficult than it is.

        SELECT main.ActivityID, main.PercentComplete, main.WhenComplete
        FROM Activity_Session_Summary AS main
        INNER JOIN (SELECT ActivityID, MAX(WhenCompleted) AS WhenComplete
        FROM Activity_Session_Summary
        WHERE StudentID = %s AND ActivityID IN ( %s )
        GROUP BY ActivityID) AS sub
        ON main.ActivityID = sub.ActivityID AND main.WhenComplete = sub.WhenComplete
        WHERE StudentID = '%s' AND ActivityID IN ( %s )

        I've left your format placeholders even though they represent a potential security risk. Your original query has now been made the subquery and as you can see, nothing has changed there except that I named the aggregate column so I could refer to it later. The main query selects the data that you want and joins onto the subquery. You will notice from the join that it is joined on the result of the aggregate as well as the ActivityID (I'm guessing that is a foreign key onto another table). Finally, your where clause is repeated on the main query, although it may not be needed - I don't know enough about your data model to make that determination. If it isn't needed it won't do any harm. Does this help?


        My: Blog | Photos | Next SQL Presentation WDevs.com - Open Source Code Hosting, Blogs, FTP, Mail and More

        M Offline
        M Offline
        moredip
        wrote on last edited by
        #3

        Hi Colin, Wow! Thanks for such a rapid and comprehensive reply, I really appreciate it. I also appreciate your warning re: injection attacks. However, this project is for internal use with hardcoded input values, and will not be accessible in any way outside the machine it's run on. Famous last words? Well, if our security is so screwed that someone can access the script and modify the SQL statement then I can think of many easier ways for them to accomplish their evil ends. But I appreciate your concern! I think I understand the SQL you gave. I was hoping that there was a way to do this without subqueries (they make my head hurt a little at times), but suspected that they would be necessary. I think that the StudentID /is/ necessary in the main query. It's theoretically possible that two rows in the result set don't differ in ActivityID or WhenCompleted, but only by StudentID. Not likely, as it would involve two students completing the same activity within a second of each other (or whatever the resolution of DateTime is in SQL Server), but possible none the less. Again, thanks a lot Colin, you've really helped me out here, and hopefully helped me learn a little so I don't ask the same question in a different form in 2 months time! Cheers, Pete

        C 1 Reply Last reply
        0
        • M moredip

          Hi Colin, Wow! Thanks for such a rapid and comprehensive reply, I really appreciate it. I also appreciate your warning re: injection attacks. However, this project is for internal use with hardcoded input values, and will not be accessible in any way outside the machine it's run on. Famous last words? Well, if our security is so screwed that someone can access the script and modify the SQL statement then I can think of many easier ways for them to accomplish their evil ends. But I appreciate your concern! I think I understand the SQL you gave. I was hoping that there was a way to do this without subqueries (they make my head hurt a little at times), but suspected that they would be necessary. I think that the StudentID /is/ necessary in the main query. It's theoretically possible that two rows in the result set don't differ in ActivityID or WhenCompleted, but only by StudentID. Not likely, as it would involve two students completing the same activity within a second of each other (or whatever the resolution of DateTime is in SQL Server), but possible none the less. Again, thanks a lot Colin, you've really helped me out here, and hopefully helped me learn a little so I don't ask the same question in a different form in 2 months time! Cheers, Pete

          C Offline
          C Offline
          Colin Angus Mackay
          wrote on last edited by
          #4

          moredip wrote: I think that the StudentID /is/ necessary in the main query. It's theoretically possible that two rows in the result set don't differ in ActivityID or WhenCompleted, but only by StudentID. Not likely, as it would involve two students completing the same activity within a second of each other (or whatever the resolution of DateTime is in SQL Server), but possible none the less. Best to leave it in then. If it is taken out then it can be a nasty time-consuming bug to track down if that condition does happen. I know, I spent two days tracking something similar, which was why I was cautious about removing it without knowing more about your data. moredip wrote: thanks a lot Colin, you've really helped me out Your most welcome.


          My: Blog | Photos | Next SQL Presentation WDevs.com - Open Source Code Hosting, Blogs, FTP, Mail and More

          1 Reply Last reply
          0
          • C Colin Angus Mackay

            Okay - Before I answer your question I am going to do my usual pointer to this article on SQL Injection Attacks and How to Prevent Them[^] whenever I see code that could potentially be attacked. On to your question. What you need to do is take the statement you have already and use that as a subquery and have the table join onto itself via the subquery. It sounds more difficult than it is.

            SELECT main.ActivityID, main.PercentComplete, main.WhenComplete
            FROM Activity_Session_Summary AS main
            INNER JOIN (SELECT ActivityID, MAX(WhenCompleted) AS WhenComplete
            FROM Activity_Session_Summary
            WHERE StudentID = %s AND ActivityID IN ( %s )
            GROUP BY ActivityID) AS sub
            ON main.ActivityID = sub.ActivityID AND main.WhenComplete = sub.WhenComplete
            WHERE StudentID = '%s' AND ActivityID IN ( %s )

            I've left your format placeholders even though they represent a potential security risk. Your original query has now been made the subquery and as you can see, nothing has changed there except that I named the aggregate column so I could refer to it later. The main query selects the data that you want and joins onto the subquery. You will notice from the join that it is joined on the result of the aggregate as well as the ActivityID (I'm guessing that is a foreign key onto another table). Finally, your where clause is repeated on the main query, although it may not be needed - I don't know enough about your data model to make that determination. If it isn't needed it won't do any harm. Does this help?


            My: Blog | Photos | Next SQL Presentation WDevs.com - Open Source Code Hosting, Blogs, FTP, Mail and More

            M Offline
            M Offline
            moredip
            wrote on last edited by
            #5

            Hi again Colin, I modified the SQL you provided slightly from this: Colin Angus Mackay wrote:

            SELECT main.ActivityID, main.PercentComplete, main.WhenComplete
            FROM Activity_Session_Summary AS main
            INNER JOIN (SELECT ActivityID, MAX(WhenCompleted) AS WhenComplete
            FROM Activity_Session_Summary
            WHERE StudentID = %s AND ActivityID IN ( %s )
            GROUP BY ActivityID) AS sub
            ON main.ActivityID = sub.ActivityID AND main.WhenComplete = sub.WhenComplete
            WHERE StudentID = '%s' AND ActivityID IN ( %s )

            to this:

            SELECT main.ActivityID, main.PercentCorrect
            FROM Activity_Session_Summary AS main
            INNER JOIN (SELECT ActivityID, MAX(WhenCompleted) AS WhenCompleted
            FROM Activity_Session_Summary
            WHERE StudentID = '%s' AND ActivityID IN ( %s )
            GROUP BY ActivityID) AS sub
            ON main.ActivityID = sub.ActivityID AND main.WhenCompleted = sub.WhenCompleted
            WHERE main.StudentID = '%s'

            I just wanted to confirm with you that removing the final AND ActivityID IN ( %s ) makes sense. It seems to me that it's redundant, since the main.ActivityID = sub.ActivityID join will ensure that only ActivityID's in the supplied ( %s ) list will be considered. Is that correct? Cheers, Pete

            C 1 Reply Last reply
            0
            • M moredip

              Hi again Colin, I modified the SQL you provided slightly from this: Colin Angus Mackay wrote:

              SELECT main.ActivityID, main.PercentComplete, main.WhenComplete
              FROM Activity_Session_Summary AS main
              INNER JOIN (SELECT ActivityID, MAX(WhenCompleted) AS WhenComplete
              FROM Activity_Session_Summary
              WHERE StudentID = %s AND ActivityID IN ( %s )
              GROUP BY ActivityID) AS sub
              ON main.ActivityID = sub.ActivityID AND main.WhenComplete = sub.WhenComplete
              WHERE StudentID = '%s' AND ActivityID IN ( %s )

              to this:

              SELECT main.ActivityID, main.PercentCorrect
              FROM Activity_Session_Summary AS main
              INNER JOIN (SELECT ActivityID, MAX(WhenCompleted) AS WhenCompleted
              FROM Activity_Session_Summary
              WHERE StudentID = '%s' AND ActivityID IN ( %s )
              GROUP BY ActivityID) AS sub
              ON main.ActivityID = sub.ActivityID AND main.WhenCompleted = sub.WhenCompleted
              WHERE main.StudentID = '%s'

              I just wanted to confirm with you that removing the final AND ActivityID IN ( %s ) makes sense. It seems to me that it's redundant, since the main.ActivityID = sub.ActivityID join will ensure that only ActivityID's in the supplied ( %s ) list will be considered. Is that correct? Cheers, Pete

              C Offline
              C Offline
              Colin Angus Mackay
              wrote on last edited by
              #6

              moredip wrote: Is that correct? Yes, you are quite right. Since it is on the join already, it is redundant in the WHERE clause.


              My: Blog | Photos | Next SQL Presentation WDevs.com - Open Source Code Hosting, Blogs, FTP, Mail and More

              1 Reply Last reply
              0
              Reply
              • Reply as topic
              Log in to reply
              • Oldest to Newest
              • Newest to Oldest
              • Most Votes


              • Login

              • Don't have an account? Register

              • Login or register to search.
              • First post
                Last post
              0
              • Categories
              • Recent
              • Tags
              • Popular
              • World
              • Users
              • Groups