Assigning random lives: an algorithm

DQNOK

I need to take an existing populations of items (say for example, light bulbs, each bulb having a different number of existing hours already on it) and randomly assign an "expected life" to each item. This assumes I already have a valid PDF (Probability Density Function) from which to assign expected lives for NEW items. The problem is that the items in my population are not new. If I randomly generate a life from the PDF expecting to assign that life to an existing item, and that life happens to be less than the current ACTUAL life of the item, I'm hosed! I can't just ignore that life because then I'm skewing the PDF used to assign the lives. I'm sure this is probably a no-brainer to a skilled statistician, but I don't recall anything like this in school, and haven't been able to find anything in books or in the limited Google'ing I've done. Following is an algorithm I came up with while pondering the problem. Any thoughts? ----------------------------------------- Here is a procedure for assigning "expected lives" to an existing population, given the PDF parameters for lives of "new" parts. Here's the scenario. We have a (assumed or tested) PDF for how long NEW parts last. We have an existing population of parts that have accrued a given number of hours. We want to use the PDF parameters to randomly assign expected lives to all the items in the existing population. Given the existing population's age, we should be able to "overlay" this with the "known" PDF and determine how many of the initial population have already failed. Here's an example: Suppose we have a light bulb that has made it to the 99'th percentile of life. Since only 1% of bulbs make it to this point, and we've got that bulb in our hands, clearly 99 other bulbs have already failed, or, there must have been an initial population of 100 light bulbs. Thus, by having one light bulb at the 99'th percentile, we can infer that our initial population must have been 100. Now suppose that we have two bulbs, both at the 99'th percentile of life. Obviously our initial population was 200 bulbs. 200 * (1-0.99) = 2. Or, 200 = 2/(1.0 - 0.99) Or, 200 = 1/(1.0 - 0.99) + 1/(1.0 - 0.99) Basically, each non-failed part infers its own initial population of parts that must have existed in order to produce this single non-failed part. Tn = Total of all initial populations = sum over all parts 'i'( 1/(1 - probability of past failure(age of part 'i'))) Based on the population PDF parameters, we

Alan Balkany

"I need to take an existing populations of items (say for example, light bulbs, each bulb having a different number of existing hours already on it) and randomly assign an "expected life" to each item." For an existing item that has lived n time units, its remaining life is described by a SUBSET of the original PDF. I.e., by living n time units, it's eliminated all the cases from the original PDF that are shorter than n time units. So, to get an expected life, you take the average of this subset of the original PDF. I.e. you average all the lifespans of the PDF that are at least n time units. (This is a calculated expectation -- not a randomly-assigned expected life.)

DQNOK

Thanks! I still need to put a random spin on it though, since I'm doing a Monte Carlo simulation. That is, using the "calculated expectation" life from the algorithm you describe above would in-fact give the "expected" life (just like using the mean life is the best expectation for the life of a light-bulb), but it doesn't capture the "spread"ness of what is actually going to happen. If I was dealing exclusively with new parts, I still wouldn't merely assign the mean life to the anticipated part-life. I would use the PDF and randomly assign the lives (remember, this is a Monte Carlo simulation). It's clear enough how to do it for NEW parts, I'm just not clear on how to do it for USED parts; although I THINK my algorithm may capture it. By the way; would your answer be covered in an "Elements of Statistics" book? I have a "Statistics for Engineering and the Sciences" book, and either it's not in there, or I just don't know what I'm looking for and am simply overlooking it. Thanks again for the help. David

David --------- Empirical studies indicate that 20% of the people drink 80% of the beer. With C++ developers, the rule is that 80% of the developers understand at most 20% of the language. It is not the same 20% for different people, so don't count on them to understand each other's code. http://yosefk.com/c++fqa/picture.html#fqa-6.6 ---------

DQNOK

OK, you've got me thinking... The PDF appropriate for a given used part is the subset of the original PDF that is (thinking graphically) "to the right" of that part's age. BUT, this part of the graph doesn't have unit area (the area under this part of the curve is no longer unity). But even that doesn't seem such a large handicap; just normalize it by the area that IS there. If I could somehow mathematically model this "left-truncated" PDF, I could -- as you alluded to in your response, use it to assign a random life. As you pointed out, taking the mean of the remainder of the original PDF gives the best EXPECTED life; I just need to figure out how to use that remainder to assign a random life. Hmmmmm... Maybe it's not so hard. I'll have to think about it a while.

David --------- Empirical studies indicate that 20% of the people drink 80% of the beer. With C++ developers, the rule is that 80% of the developers understand at most 20% of the language. It is not the same 20% for different people, so don't count on them to understand each other's code. http://yosefk.com/c++fqa/picture.html#fqa-6.6 ---------

Alan Balkany

For a used part you just do the same thing with a subset of the PDF. (Just like you use the whole PDF for a new part.) So, for example, if you have bell-shaped PDF with mean n, and your item has existed for n time units, you would make a random selection from the upper half of the bell-shaped curve. Although this has elements of both probability and statistics, I would consider this slightly more probability than statistics.

DQNOK

Right! Now I just have to figure how to do that.

David --------- Empirical studies indicate that 20% of the people drink 80% of the beer. With C++ developers, the rule is that 80% of the developers understand at most 20% of the language. It is not the same 20% for different people, so don't count on them to understand each other's code. http://yosefk.com/c++fqa/picture.html#fqa-6.6 ---------

DQNOK

Do you know the algorithm for using the rand() function (which generates a uniform PDF) to generate values within a specified PDF? I've thought thru it before, but didn't write it down. Also, without having a parameterized function for the PDF (I doubt it's even possible to come up with a formula for a left-truncated PDF like we're talking about -- plus, it would change for each item) this seems hard. UNLESS! I don't model the PDF with a parameterized function, and instead model it via a table of values obtained from the original PDF formula. That should be easy enough to normalize by just numerically integrating the remainder of the curve. Well, maybe this won't be so hard. (Famous last words...)

David --------- Empirical studies indicate that 20% of the people drink 80% of the beer. With C++ developers, the rule is that 80% of the developers understand at most 20% of the language. It is not the same 20% for different people, so don't count on them to understand each other's code. http://yosefk.com/c++fqa/picture.html#fqa-6.6 ---------

DQNOK

DQNOK wrote:

Do you know the algorithm for using the rand() function (which generates a uniform PDF) to generate values within a specified PDF?

I answered my own question. For a PDF called 'y(x)' (where x is the random varaible), and a computer library function rand() that returns a random value in the range [0, 1) from a uniform PDF, we seek to map rand's return value (call it rx) to the correct x. Effectively, we are looking for x such that the area under the curve y(x) LEFT OF x is equal to the area under the uniform PDF left of rx. But since rand produces values within the unit interval [0,1), rx IS the area under the curve left of rx. So, we seek x such that: rx = integral from -infinity to x of y(x) Exactly HOW we evaluate the integral (i.e. solve for x) is an implementation detail that depends on how y(x) is defined. If the integral is already given via a formula, we just use a standard non-linear scalar solver, like a Newton-Raphson. It not, then perhaps I would just use a stepping approach, something like:ndx = getBestStartingIndex(rx); x = X_STARTS[ndx]; // X_STARTS[ndx] <= rx < X_STARTS[ndx+1] area = AREAS[ndx]; // precomputed areas based on x. old_y = y(x); while( area < rx ) { x += dx; new_y = y(x); area += dx * (old_y + new_y)/2 ; //trapezoid rule old_y = new_y; } return x;
Which can be polished for better precision, time-performance etc.

David --------- Empirical studies indicate that 20% of the people drink 80% of the beer. With C++ developers, the rule is that 80% of the developers understand at most 20% of the language. It is not the same 20% for different people, so don't count on them to understand each other's code. http://yosefk.com/c++fqa/picture.html#fqa-6.6 ---------

modified on Thursday, April 17, 2008 9:28 AM