Table join performance - the saga continues...
-
So I avoided getting into an argument with the CTO about the performance of table joins. The nice thing about communication channels like Slack is that you simply don't need to reply. His contention is that table joins are "hugely expensive", particularly for reporting. My contention is of course that a well designed database will not have this issue. I came across this[^] interesting discussion. But the bottom line is, I wonder why he has this attitude, particularly when his reports are dog slow, and the page request to the server times out if you ask for a report that spans more than a day (maybe a week.) Looking at the code, no wonder, because the reports seem to be generated almost entirely in Python, without leveraging table joins on the server itself at all! Granted, there are certainly times when a very specialized report might benefit from denormalization, but for Pete's sake, these are basic transaction detail and transaction total queries. There are so many things I disagree with the CTO about. I must have reached some level of emotional maturity before taking this job because I'm able to not get into an argument, I just say "whatever, dude" and move along. Marc
Imperative to Functional Programming Succinctly Contributors Wanted for Higher Order Programming Project! Learning to code with python is like learning to swim with those little arm floaties. It gives you undeserved confidence and will eventually drown you. - DangerBunny
I'll take that $20 and raise it. My guess is that the CTO had "a bad experience" early in his career with databases and joins, so much so that his view is prejudiced with obsolete knowledge. Run into it time and again with other technical items.
Charlie Gilley Stuck in a dysfunctional matrix from which I must escape... "Where liberty dwells, there is my country." B. Franklin, 1783 “They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.” BF, 1759
-
I had a similiar discussion once with our Oracle DBA. I was asked to run queries on a views he had over tables rather then the table directly as the view will not impact on the tables performance...:confused:
If it was MS SQL Server, it wouldn't surprise me if his views all had the
WITH (NOLOCK)
hint on every table. But AFAIK, Oracle rather sensibly doesn't have that option.
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
-
So I avoided getting into an argument with the CTO about the performance of table joins. The nice thing about communication channels like Slack is that you simply don't need to reply. His contention is that table joins are "hugely expensive", particularly for reporting. My contention is of course that a well designed database will not have this issue. I came across this[^] interesting discussion. But the bottom line is, I wonder why he has this attitude, particularly when his reports are dog slow, and the page request to the server times out if you ask for a report that spans more than a day (maybe a week.) Looking at the code, no wonder, because the reports seem to be generated almost entirely in Python, without leveraging table joins on the server itself at all! Granted, there are certainly times when a very specialized report might benefit from denormalization, but for Pete's sake, these are basic transaction detail and transaction total queries. There are so many things I disagree with the CTO about. I must have reached some level of emotional maturity before taking this job because I'm able to not get into an argument, I just say "whatever, dude" and move along. Marc
Imperative to Functional Programming Succinctly Contributors Wanted for Higher Order Programming Project! Learning to code with python is like learning to swim with those little arm floaties. It gives you undeserved confidence and will eventually drown you. - DangerBunny
Don't argue about it - measure it! Measure the performance of his solution, loading the data from all tables into memory and then manipulating it with imperative code. And then measure the performance of a query that joins the tables in the database. If his version is faster, then there's something seriously wrong with the query, or with the database. :)
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
-
Don't argue about it - measure it! Measure the performance of his solution, loading the data from all tables into memory and then manipulating it with imperative code. And then measure the performance of a query that joins the tables in the database. If his version is faster, then there's something seriously wrong with the query, or with the database. :)
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
This was my thought too. Seems like a no-brainer. However if there are ugly politics at play or people's egos getting bruised one must weight his own's benefit it interfering. It sounds like Marc may be doing this by choosing to let this guy wallow in his own shit.
-
So I avoided getting into an argument with the CTO about the performance of table joins. The nice thing about communication channels like Slack is that you simply don't need to reply. His contention is that table joins are "hugely expensive", particularly for reporting. My contention is of course that a well designed database will not have this issue. I came across this[^] interesting discussion. But the bottom line is, I wonder why he has this attitude, particularly when his reports are dog slow, and the page request to the server times out if you ask for a report that spans more than a day (maybe a week.) Looking at the code, no wonder, because the reports seem to be generated almost entirely in Python, without leveraging table joins on the server itself at all! Granted, there are certainly times when a very specialized report might benefit from denormalization, but for Pete's sake, these are basic transaction detail and transaction total queries. There are so many things I disagree with the CTO about. I must have reached some level of emotional maturity before taking this job because I'm able to not get into an argument, I just say "whatever, dude" and move along. Marc
Imperative to Functional Programming Succinctly Contributors Wanted for Higher Order Programming Project! Learning to code with python is like learning to swim with those little arm floaties. It gives you undeserved confidence and will eventually drown you. - DangerBunny
Well, of course joining two tables will take some time. It doesn't matter where you join them -- in the DB or in the Python code --- it's going to take some time. However, it is always better to let the DB do what it is good it --- working with data. I'm always amazed the number of developers who don't recognize that simple divide: - EVERYTHING related to data (querying, filtering, selecting) should be done in the Database. - NOTHING else should be.
Truth, James
-
So I avoided getting into an argument with the CTO about the performance of table joins. The nice thing about communication channels like Slack is that you simply don't need to reply. His contention is that table joins are "hugely expensive", particularly for reporting. My contention is of course that a well designed database will not have this issue. I came across this[^] interesting discussion. But the bottom line is, I wonder why he has this attitude, particularly when his reports are dog slow, and the page request to the server times out if you ask for a report that spans more than a day (maybe a week.) Looking at the code, no wonder, because the reports seem to be generated almost entirely in Python, without leveraging table joins on the server itself at all! Granted, there are certainly times when a very specialized report might benefit from denormalization, but for Pete's sake, these are basic transaction detail and transaction total queries. There are so many things I disagree with the CTO about. I must have reached some level of emotional maturity before taking this job because I'm able to not get into an argument, I just say "whatever, dude" and move along. Marc
Imperative to Functional Programming Succinctly Contributors Wanted for Higher Order Programming Project! Learning to code with python is like learning to swim with those little arm floaties. It gives you undeserved confidence and will eventually drown you. - DangerBunny
Sounds like you have a combination of emotional maturity and political savvy. I mean, you could wrote a few queries that prove him wrong. And given 10 minutes, you could explain the data structures underlying a relational database in sufficient detail to show when joins will be expensive, and when they won't. Often enough, though, being right (and proving it) hurts you. Definitely better to know when to drop it and move on. :)
-
So I avoided getting into an argument with the CTO about the performance of table joins. The nice thing about communication channels like Slack is that you simply don't need to reply. His contention is that table joins are "hugely expensive", particularly for reporting. My contention is of course that a well designed database will not have this issue. I came across this[^] interesting discussion. But the bottom line is, I wonder why he has this attitude, particularly when his reports are dog slow, and the page request to the server times out if you ask for a report that spans more than a day (maybe a week.) Looking at the code, no wonder, because the reports seem to be generated almost entirely in Python, without leveraging table joins on the server itself at all! Granted, there are certainly times when a very specialized report might benefit from denormalization, but for Pete's sake, these are basic transaction detail and transaction total queries. There are so many things I disagree with the CTO about. I must have reached some level of emotional maturity before taking this job because I'm able to not get into an argument, I just say "whatever, dude" and move along. Marc
Imperative to Functional Programming Succinctly Contributors Wanted for Higher Order Programming Project! Learning to code with python is like learning to swim with those little arm floaties. It gives you undeserved confidence and will eventually drown you. - DangerBunny
Marc, Column counts are a huge issue. I can't stand "Select *" for anything more than debugging! I was called in to help fix an Oracle Speed Issue. The result was a query that brought back 10,000 times less data! They were doing "Select *" on a 200 column table, joined with 2 other tables. Where they needed 12 columns. And they were looking at the "full table scans". (imagine 12 columns that fit on a 80 character wide screen. Imagine retrieving at least 4 varchar(4000) fields with each row. Fields that NOBODY wanted, NOBODY needed. But it was easier to write * instead of determine the 12 fields up front. OMG.) Of course you are going to do full table scans. You ASKED for every column of a LOT of rows. In fact, forcing an index scan, caused the query to run slower, as I showed them. But reducing the column count of the data coming back, and measuring the data size that came back to the program, they SAW the bottleneck. I am not all BCNF or 3NF. Denormalization makes sense at times. The "It Depends" answer is simply this: You should create the DB and the queries you are going to use to get to ALL of the data and reports before you build the system. If the queries you run 100,000 times a day require 15 joins, and the queries that you run once a month are trivial, you may have optimized the wrong end of the equation. Finally, creating materialized views, or reporting snapshots are QUITE reasonable things to do. The correct format for data while processing MAY NOT be the correct format for data while reporting. In an OLTP type system, constantly joining across many tables that are constantly being updated will produce HORRIBLE performance if writes block reads, or UNDO/REDO contention exists. Thanks for that link. A great article. I saved it!
-
So I avoided getting into an argument with the CTO about the performance of table joins. The nice thing about communication channels like Slack is that you simply don't need to reply. His contention is that table joins are "hugely expensive", particularly for reporting. My contention is of course that a well designed database will not have this issue. I came across this[^] interesting discussion. But the bottom line is, I wonder why he has this attitude, particularly when his reports are dog slow, and the page request to the server times out if you ask for a report that spans more than a day (maybe a week.) Looking at the code, no wonder, because the reports seem to be generated almost entirely in Python, without leveraging table joins on the server itself at all! Granted, there are certainly times when a very specialized report might benefit from denormalization, but for Pete's sake, these are basic transaction detail and transaction total queries. There are so many things I disagree with the CTO about. I must have reached some level of emotional maturity before taking this job because I'm able to not get into an argument, I just say "whatever, dude" and move along. Marc
Imperative to Functional Programming Succinctly Contributors Wanted for Higher Order Programming Project! Learning to code with python is like learning to swim with those little arm floaties. It gives you undeserved confidence and will eventually drown you. - DangerBunny
My fearless leader was/is a DBA. We were talking about indices and I got some interesting lessons I'd never have picked up on my own. When a compound index makes sense, how you write your select statement can use or waste all the value-added speed of the index. It's his job to know the nuances of that; it's my job to create the connections and interfaces for the users in whatever language or language mix makes sense. As I noted in posts at other times, we'd hash and thrash trough possible solutions, give the whys and wherefores, and then pick the best solution we can come up with, sans egoism. When possible, with the long view. Golden days.
"The difference between genius and stupidity is that genius has its limits." - Albert Einstein
"If you are searching for perfection in others, then you seek disappointment. If you are seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010
-
So I avoided getting into an argument with the CTO about the performance of table joins. The nice thing about communication channels like Slack is that you simply don't need to reply. His contention is that table joins are "hugely expensive", particularly for reporting. My contention is of course that a well designed database will not have this issue. I came across this[^] interesting discussion. But the bottom line is, I wonder why he has this attitude, particularly when his reports are dog slow, and the page request to the server times out if you ask for a report that spans more than a day (maybe a week.) Looking at the code, no wonder, because the reports seem to be generated almost entirely in Python, without leveraging table joins on the server itself at all! Granted, there are certainly times when a very specialized report might benefit from denormalization, but for Pete's sake, these are basic transaction detail and transaction total queries. There are so many things I disagree with the CTO about. I must have reached some level of emotional maturity before taking this job because I'm able to not get into an argument, I just say "whatever, dude" and move along. Marc
Imperative to Functional Programming Succinctly Contributors Wanted for Higher Order Programming Project! Learning to code with python is like learning to swim with those little arm floaties. It gives you undeserved confidence and will eventually drown you. - DangerBunny
I run into that kind of nonsense quite more often than I'd like. My answer is now "Show me the query plan or no deal". 99.9% of the time an index is missing somewhere.
-
So I avoided getting into an argument with the CTO about the performance of table joins. The nice thing about communication channels like Slack is that you simply don't need to reply. His contention is that table joins are "hugely expensive", particularly for reporting. My contention is of course that a well designed database will not have this issue. I came across this[^] interesting discussion. But the bottom line is, I wonder why he has this attitude, particularly when his reports are dog slow, and the page request to the server times out if you ask for a report that spans more than a day (maybe a week.) Looking at the code, no wonder, because the reports seem to be generated almost entirely in Python, without leveraging table joins on the server itself at all! Granted, there are certainly times when a very specialized report might benefit from denormalization, but for Pete's sake, these are basic transaction detail and transaction total queries. There are so many things I disagree with the CTO about. I must have reached some level of emotional maturity before taking this job because I'm able to not get into an argument, I just say "whatever, dude" and move along. Marc
Imperative to Functional Programming Succinctly Contributors Wanted for Higher Order Programming Project! Learning to code with python is like learning to swim with those little arm floaties. It gives you undeserved confidence and will eventually drown you. - DangerBunny
I had to deal with this same problem until I was asked to rewrite an application in which all of the joins were done in Visual Basic code, rather than in the database. I think the problem was that my boss didn't understand SQL that well and did not know how to write the queries with joins. So, I used JOINs in views and greatly simplified the VB code, even though I had to explain the SQL many times. I had one problem in that our views were in a local database and the tables were in a remote database, requiring that full tables had to be moved across the wire to be joined in the local database. My boss for a long time insisted that I should join the data in the VB code and that somehow that would be faster. I fixed that by joining sub-queries that used column specifications and WHERE clauses to greatly reduce that amount of data that needed to be transferred across the wire before the joins were made. The difference in speed was huge. Of course, that made the SQL more complex, but my boss has finally accepted it after realizing that the VB code would be much more complex and time consuming both to write and execute.
-
So I avoided getting into an argument with the CTO about the performance of table joins. The nice thing about communication channels like Slack is that you simply don't need to reply. His contention is that table joins are "hugely expensive", particularly for reporting. My contention is of course that a well designed database will not have this issue. I came across this[^] interesting discussion. But the bottom line is, I wonder why he has this attitude, particularly when his reports are dog slow, and the page request to the server times out if you ask for a report that spans more than a day (maybe a week.) Looking at the code, no wonder, because the reports seem to be generated almost entirely in Python, without leveraging table joins on the server itself at all! Granted, there are certainly times when a very specialized report might benefit from denormalization, but for Pete's sake, these are basic transaction detail and transaction total queries. There are so many things I disagree with the CTO about. I must have reached some level of emotional maturity before taking this job because I'm able to not get into an argument, I just say "whatever, dude" and move along. Marc
Imperative to Functional Programming Succinctly Contributors Wanted for Higher Order Programming Project! Learning to code with python is like learning to swim with those little arm floaties. It gives you undeserved confidence and will eventually drown you. - DangerBunny
It depends on your 'relationship' with the data. If you are creating reports for mostly non-changing data, denormalized tables makes sense to you. If you are a developer charged with maintaining data integrity, flexibility, and performance (in that order) then you would certainly favor normalized structures. At any rate, given the speed of things nowadays, the difference between join and no-join would be practically imperceptible...at least on a small scale. :)
"Go forth into the source" - Neal Morse
-
So I avoided getting into an argument with the CTO about the performance of table joins. The nice thing about communication channels like Slack is that you simply don't need to reply. His contention is that table joins are "hugely expensive", particularly for reporting. My contention is of course that a well designed database will not have this issue. I came across this[^] interesting discussion. But the bottom line is, I wonder why he has this attitude, particularly when his reports are dog slow, and the page request to the server times out if you ask for a report that spans more than a day (maybe a week.) Looking at the code, no wonder, because the reports seem to be generated almost entirely in Python, without leveraging table joins on the server itself at all! Granted, there are certainly times when a very specialized report might benefit from denormalization, but for Pete's sake, these are basic transaction detail and transaction total queries. There are so many things I disagree with the CTO about. I must have reached some level of emotional maturity before taking this job because I'm able to not get into an argument, I just say "whatever, dude" and move along. Marc
Imperative to Functional Programming Succinctly Contributors Wanted for Higher Order Programming Project! Learning to code with python is like learning to swim with those little arm floaties. It gives you undeserved confidence and will eventually drown you. - DangerBunny
-
So I avoided getting into an argument with the CTO about the performance of table joins. The nice thing about communication channels like Slack is that you simply don't need to reply. His contention is that table joins are "hugely expensive", particularly for reporting. My contention is of course that a well designed database will not have this issue. I came across this[^] interesting discussion. But the bottom line is, I wonder why he has this attitude, particularly when his reports are dog slow, and the page request to the server times out if you ask for a report that spans more than a day (maybe a week.) Looking at the code, no wonder, because the reports seem to be generated almost entirely in Python, without leveraging table joins on the server itself at all! Granted, there are certainly times when a very specialized report might benefit from denormalization, but for Pete's sake, these are basic transaction detail and transaction total queries. There are so many things I disagree with the CTO about. I must have reached some level of emotional maturity before taking this job because I'm able to not get into an argument, I just say "whatever, dude" and move along. Marc
Imperative to Functional Programming Succinctly Contributors Wanted for Higher Order Programming Project! Learning to code with python is like learning to swim with those little arm floaties. It gives you undeserved confidence and will eventually drown you. - DangerBunny
No worries. A former "CTO" of mine blew up when I suggested changing the font instead of redesigning a report for a new column. (He didn't grasp (initially) that one could "change" fonts on laser output ... I think he still had 1403 impact printers on the brain. I thought of him as "VP of the mainframe".) Someone once told YOUR "CTO" that a particular job is running long because of "joins"... so therefore, "all joins are bad".
-
To play devil's advocate, learning how the database engine and query optimizer work can be daunting to those who have never coded or haven't had to code in years/decades such as your run of the mill CTO. Like a lot of developers that I have known, they do not realize the huge performance impact of improper table design, like making the primary key a string or Guid.
if (Object.DividedByZero == true) { Universe.Implode(); } Meus ratio ex fortis machina. Simplicitatis de formae ac munus. -Foothill, 2016