Best fit polynomial curve to non-continuous data points [Solved]
-
I'm sure there's a relatively-straightforward way to do this, but my research so far has been a bit overwhelming... Heavy statistics and calculus have never been one of my strengths... Basically, I need to calculate a best-fit curve on a scatter graph... Points are not evenly spaced on either axis, and there's no cost function to work with (These are derivations of stock/bond price histories)... Just raw (X,Y) values. I've been trying to work out an algorithm that'll start with a second-degree polynomial function, iterate through and minimize the difference, and then jump to third-degree if it's not within tolerance (And so on, up to a maximum degree). Juggling the data around is something I can do blindfolded, but there's one thing I can't figure out... Given the polynomial coefficients ((a,b,c,d) -> y = a + bx + cx^2 + dx^3) and a function to determine the total difference across the curve (Or the total square difference, or whatever's needed), how do I adjust the coefficients for the next pass? I mean, how do I choose new values? I was originally going to try to fix the highest-degree coefficient first, then drill down to the zeroth-degree one (I have a root-solving algorithm that I use for yield calculations - Was going to adapt that), but then I looked at it again with the proper amount of caffeine, and realized that solving one coefficient at a time wasn't going to work :) SOLVED: See my reply below... Linear/Polynomial Regression
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels)modified on Tuesday, May 24, 2011 9:41 AM
-
I'm sure there's a relatively-straightforward way to do this, but my research so far has been a bit overwhelming... Heavy statistics and calculus have never been one of my strengths... Basically, I need to calculate a best-fit curve on a scatter graph... Points are not evenly spaced on either axis, and there's no cost function to work with (These are derivations of stock/bond price histories)... Just raw (X,Y) values. I've been trying to work out an algorithm that'll start with a second-degree polynomial function, iterate through and minimize the difference, and then jump to third-degree if it's not within tolerance (And so on, up to a maximum degree). Juggling the data around is something I can do blindfolded, but there's one thing I can't figure out... Given the polynomial coefficients ((a,b,c,d) -> y = a + bx + cx^2 + dx^3) and a function to determine the total difference across the curve (Or the total square difference, or whatever's needed), how do I adjust the coefficients for the next pass? I mean, how do I choose new values? I was originally going to try to fix the highest-degree coefficient first, then drill down to the zeroth-degree one (I have a root-solving algorithm that I use for yield calculations - Was going to adapt that), but then I looked at it again with the proper amount of caffeine, and realized that solving one coefficient at a time wasn't going to work :) SOLVED: See my reply below... Linear/Polynomial Regression
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels)modified on Tuesday, May 24, 2011 9:41 AM
-
I'm sure there's a relatively-straightforward way to do this, but my research so far has been a bit overwhelming... Heavy statistics and calculus have never been one of my strengths... Basically, I need to calculate a best-fit curve on a scatter graph... Points are not evenly spaced on either axis, and there's no cost function to work with (These are derivations of stock/bond price histories)... Just raw (X,Y) values. I've been trying to work out an algorithm that'll start with a second-degree polynomial function, iterate through and minimize the difference, and then jump to third-degree if it's not within tolerance (And so on, up to a maximum degree). Juggling the data around is something I can do blindfolded, but there's one thing I can't figure out... Given the polynomial coefficients ((a,b,c,d) -> y = a + bx + cx^2 + dx^3) and a function to determine the total difference across the curve (Or the total square difference, or whatever's needed), how do I adjust the coefficients for the next pass? I mean, how do I choose new values? I was originally going to try to fix the highest-degree coefficient first, then drill down to the zeroth-degree one (I have a root-solving algorithm that I use for yield calculations - Was going to adapt that), but then I looked at it again with the proper amount of caffeine, and realized that solving one coefficient at a time wasn't going to work :) SOLVED: See my reply below... Linear/Polynomial Regression
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels)modified on Tuesday, May 24, 2011 9:41 AM
Hi Ian, I don't think there are any closed formula's for what you are trying. What I would recommend is this, inspired by Newton-Raphson: 1. have some values a1,b1,c1,d1 and the corresponding value of your error function e1. 2. slightly change a1 (say by 1%) and recalculate e; find the "a-sensitivity" as (change in e)/(change in a) 3. same for b1, then c1, then d1 4. now choose new values a2,b2,c2,d2 based on the four sensitivities, maybe like so: a2=a1+coef*e/"a-sensitivity" where maybe coef is one. then iterate until convergence is reached (i.e. sufficiently accurate); warning: this might diverge right away, if so try smaller values of coef (you could halve it in each consecutive try). FWIW: with more-or-less random data, you'll never find a good fit with a single polynomial. What one normally does is a local approximation, which applies in a small region only. That is similar to a font being drawn as a sequence of Bezier curves; it pays of to have several simple Bezier curves, rather than trying to have a single very complex one (which in general will not be good enough). :)
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
-
Kind of the opposite of where I'm going here... The goal isn't to pass through every point, but to give more of a trend-line. Generally going to be working with a few hundred data points, but only looking for maybe a second or third-degree function in most cases. Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels) -
Kind of the opposite of where I'm going here... The goal isn't to pass through every point, but to give more of a trend-line. Generally going to be working with a few hundred data points, but only looking for maybe a second or third-degree function in most cases. Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels) -
Hi Ian, I don't think there are any closed formula's for what you are trying. What I would recommend is this, inspired by Newton-Raphson: 1. have some values a1,b1,c1,d1 and the corresponding value of your error function e1. 2. slightly change a1 (say by 1%) and recalculate e; find the "a-sensitivity" as (change in e)/(change in a) 3. same for b1, then c1, then d1 4. now choose new values a2,b2,c2,d2 based on the four sensitivities, maybe like so: a2=a1+coef*e/"a-sensitivity" where maybe coef is one. then iterate until convergence is reached (i.e. sufficiently accurate); warning: this might diverge right away, if so try smaller values of coef (you could halve it in each consecutive try). FWIW: with more-or-less random data, you'll never find a good fit with a single polynomial. What one normally does is a local approximation, which applies in a small region only. That is similar to a font being drawn as a sequence of Bezier curves; it pays of to have several simple Bezier curves, rather than trying to have a single very complex one (which in general will not be good enough). :)
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
Hmm, didn't think of the sensitivity thing... Giving that a shot now.
FWIW: with more-or-less random data, you'll never find a good fit with a single polynomial. What one normally does is a local approximation, which applies in a small region only. That is similar to a font being drawn as a sequence of Bezier curves; it pays of to have several simple Bezier curves, rather than trying to have a single very complex one (which in general will not be good enough).
It's not actually random data... It's graphing related values, so it should roughly cluster around y=x unless there's another factor (Which is what I'm trying to illustrate). Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels) -
Hmm, didn't think of the sensitivity thing... Giving that a shot now.
FWIW: with more-or-less random data, you'll never find a good fit with a single polynomial. What one normally does is a local approximation, which applies in a small region only. That is similar to a font being drawn as a sequence of Bezier curves; it pays of to have several simple Bezier curves, rather than trying to have a single very complex one (which in general will not be good enough).
It's not actually random data... It's graphing related values, so it should roughly cluster around y=x unless there's another factor (Which is what I'm trying to illustrate). Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels)If you suspect some periodicity, I'd suggest you first subtract x, so according to what you said the average becomes zero, and the first-order approximation would be the x-axis itself. Then consider a Fourier analysis. :)
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
-
If you suspect some periodicity, I'd suggest you first subtract x, so according to what you said the average becomes zero, and the first-order approximation would be the x-axis itself. Then consider a Fourier analysis. :)
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
Not a cyclic function, really... Bond valuations are a bit odd, so hard to explain... But dropping it to the X-axis does kinda help the initial approximation. Still needs a lot more work (Got it to approximate the first degree, and working on expanding from there), but the sensitivity thing is working like a charm. Thanks :) Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels) -
Not a cyclic function, really... Bond valuations are a bit odd, so hard to explain... But dropping it to the X-axis does kinda help the initial approximation. Still needs a lot more work (Got it to approximate the first degree, and working on expanding from there), but the sensitivity thing is working like a charm. Thanks :) Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels)you're welcome. :)
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
-
I'm sure there's a relatively-straightforward way to do this, but my research so far has been a bit overwhelming... Heavy statistics and calculus have never been one of my strengths... Basically, I need to calculate a best-fit curve on a scatter graph... Points are not evenly spaced on either axis, and there's no cost function to work with (These are derivations of stock/bond price histories)... Just raw (X,Y) values. I've been trying to work out an algorithm that'll start with a second-degree polynomial function, iterate through and minimize the difference, and then jump to third-degree if it's not within tolerance (And so on, up to a maximum degree). Juggling the data around is something I can do blindfolded, but there's one thing I can't figure out... Given the polynomial coefficients ((a,b,c,d) -> y = a + bx + cx^2 + dx^3) and a function to determine the total difference across the curve (Or the total square difference, or whatever's needed), how do I adjust the coefficients for the next pass? I mean, how do I choose new values? I was originally going to try to fix the highest-degree coefficient first, then drill down to the zeroth-degree one (I have a root-solving algorithm that I use for yield calculations - Was going to adapt that), but then I looked at it again with the proper amount of caffeine, and realized that solving one coefficient at a time wasn't going to work :) SOLVED: See my reply below... Linear/Polynomial Regression
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels)modified on Tuesday, May 24, 2011 9:41 AM
Assuming that standard numerical methods such as simplex aren't going to work here, and given that you do alreaduy have a fitness function available and a few coefficients to tweak, I'm wondering if perhaps a genetic algorithm or something like simulated annealing would work here?
-
I'm sure there's a relatively-straightforward way to do this, but my research so far has been a bit overwhelming... Heavy statistics and calculus have never been one of my strengths... Basically, I need to calculate a best-fit curve on a scatter graph... Points are not evenly spaced on either axis, and there's no cost function to work with (These are derivations of stock/bond price histories)... Just raw (X,Y) values. I've been trying to work out an algorithm that'll start with a second-degree polynomial function, iterate through and minimize the difference, and then jump to third-degree if it's not within tolerance (And so on, up to a maximum degree). Juggling the data around is something I can do blindfolded, but there's one thing I can't figure out... Given the polynomial coefficients ((a,b,c,d) -> y = a + bx + cx^2 + dx^3) and a function to determine the total difference across the curve (Or the total square difference, or whatever's needed), how do I adjust the coefficients for the next pass? I mean, how do I choose new values? I was originally going to try to fix the highest-degree coefficient first, then drill down to the zeroth-degree one (I have a root-solving algorithm that I use for yield calculations - Was going to adapt that), but then I looked at it again with the proper amount of caffeine, and realized that solving one coefficient at a time wasn't going to work :) SOLVED: See my reply below... Linear/Polynomial Regression
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels)modified on Tuesday, May 24, 2011 9:41 AM
Thanks for the input, guys... Got me thinking along the right lines, and that got me to the right google search... I was searching for things like "curve fitting" and "best fit algorithm"... Turned out the 'proper' term was "Linear Regression"... And that got me here: An Algorithm for Weighted Linear Regression[^] Figures, the solution would be right here on CP :)
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels) -
I'm sure there's a relatively-straightforward way to do this, but my research so far has been a bit overwhelming... Heavy statistics and calculus have never been one of my strengths... Basically, I need to calculate a best-fit curve on a scatter graph... Points are not evenly spaced on either axis, and there's no cost function to work with (These are derivations of stock/bond price histories)... Just raw (X,Y) values. I've been trying to work out an algorithm that'll start with a second-degree polynomial function, iterate through and minimize the difference, and then jump to third-degree if it's not within tolerance (And so on, up to a maximum degree). Juggling the data around is something I can do blindfolded, but there's one thing I can't figure out... Given the polynomial coefficients ((a,b,c,d) -> y = a + bx + cx^2 + dx^3) and a function to determine the total difference across the curve (Or the total square difference, or whatever's needed), how do I adjust the coefficients for the next pass? I mean, how do I choose new values? I was originally going to try to fix the highest-degree coefficient first, then drill down to the zeroth-degree one (I have a root-solving algorithm that I use for yield calculations - Was going to adapt that), but then I looked at it again with the proper amount of caffeine, and realized that solving one coefficient at a time wasn't going to work :) SOLVED: See my reply below... Linear/Polynomial Regression
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels)modified on Tuesday, May 24, 2011 9:41 AM
I'm surprised with all those answers, no one gave you the right approach: Polynomial Regression: http://en.wikipedia.org/wiki/Polynomial_regression[^] It will give you a best-fit quadratic or cubic equation (or even higher orders if you want). The Wikipedia link defines it, but the explanation there isn't very intuitive. The basic idea is you get an equation for the error terms, take the derivative, and set it to zero. Solving that gives you the equation with the minimum error.
-
I'm surprised with all those answers, no one gave you the right approach: Polynomial Regression: http://en.wikipedia.org/wiki/Polynomial_regression[^] It will give you a best-fit quadratic or cubic equation (or even higher orders if you want). The Wikipedia link defines it, but the explanation there isn't very intuitive. The basic idea is you get an equation for the error terms, take the derivative, and set it to zero. Solving that gives you the equation with the minimum error.
Well, to be fair, the answers got me thinking along the right lines, and led me to the right terms, which brought me right back one of Walt's articles on CP... See my above post :) Marked the original as solved
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels) -
Well, to be fair, the answers got me thinking along the right lines, and led me to the right terms, which brought me right back one of Walt's articles on CP... See my above post :) Marked the original as solved
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels)But if your data set naturally has a curve, a linear model will be inaccurate.
-
But if your data set naturally has a curve, a linear model will be inaccurate.
Well it's multiple linear regression... It lets me specify the degree. Using Walt's library, it's looking perfect... Using third-degree (x^3, so a possible S-shape in the curve), and releasing the update next week.
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels)