#### ADMIN: Re: Re: Freight car distribution

laramielarry <ostresh@...>

Hi Folks

A week or so ago I posted a message about a "random train" Excel
spreadsheet (message #75066). Briefly, the spreadsheet generates a list
of 40 boxcars chosen at random from a universe of cars which approximates
the U.S. boxcar fleet ownership in 1949. Experienced Excel users can
use whatever universe they would like.

In message #75229 I described how I automated the spreadsheet by
running it 100,000 times. That is, it created 100,000 randomly
generated car lists, with 40 cars per list. The main purpose of this
simulation was to test whether the random train spreadsheet was
operating correctly; if it was, then over the long run, the average
proportions of the randomly generated cars should tend to the
proportions of the universe (they did). The simulation also recorded
the maximum number of cars generated during any of the 100,000
iterations (for each road). For example, the number of NYC cars in
the three lists in message #75066 was 4, 1 and 6. During the
simulation, there was at least one car list with 14 NYC cars. (In a
list of 40 cars that is proportional to 1949 national averages, one
should expect 4 from the NYC.)

Subsequently, I have refined the simulation so as to record the
entire distribution of cars for each road during a simulation of
100,000 car lists. For example, the national proportion of New Haven
boxcars was less than 1% in 1949; most random trains of 40 cars would
not have any NH cars, but sometimes there will be one or more. The
next list shows the frequency distribution of 0, 1, 2, … NH cars
generated by the simulation of 100,000 car lists (71,508 car lists
had no NH cars, 24,068 had exactly one, and so on):

0___71,508
1___24,068
2___3,963
3___421
4___37
5___3

These numbers can be converted to probabilities by dividing by
100,000. Thus the probability of a car list with 40 cars and none
from the NH is .715, 1 car = .241, 2 cars = .040, etc.
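
For readers who would rather not automate Excel, here is a minimal Python sketch of the same kind of simulation. It is not the spreadsheet's actual logic: it assumes each car is an independent draw, tracks only "NH" versus "not NH", and uses the 6,012 / 719,349 ratio quoted later in this message as the NH share. The function name and the seed are made up for illustration.

```python
import random
from collections import Counter

# Assumption: NH share of the 1949 national boxcar fleet, taken from the
# 6,012 / 719,349 figures quoted elsewhere in this message.
P_NH = 6012 / 719349


def simulate_nh_counts(num_lists, cars_per_list=40, p=P_NH, seed=1949):
    """Tally how many simulated car lists contain 0, 1, 2, ... NH cars."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    tally = Counter()
    for _ in range(num_lists):
        # Each car is an independent draw (sampling with replacement).
        nh_cars = sum(rng.random() < p for _ in range(cars_per_list))
        tally[nh_cars] += 1
    return tally


if __name__ == "__main__":
    num_lists = 100_000
    tally = simulate_nh_counts(num_lists)
    for nh_cars in sorted(tally):
        count = tally[nh_cars]
        print(f"{nh_cars}___{count:,}  (probability {count / num_lists:.3f})")
```

The counts will differ a little from run to run and from the list above, since they depend on the random numbers drawn, but the proportions should land close to .715, .241, .040 and so on.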

After examining the results of this simulation, it seemed to me that
the process of random car selection was much like the ball and urn
problems from my statistics classes of many years ago: An urn has
some red and white balls of a known proportion.
Reach in and grab a ball; if red, then record it as a "success", and
if white as a "failure"; replace the ball then repeat the process for
a certain number of times, say 40. What is the probability of 0
successes? Exactly 1 success? Exactly 2, 3, … ? These
probabilities are given by the binomial distribution. The next list
shows the binomial distribution for 0, 1, 2, … (multiplied by
100,000) for 40 trials and a "probability of success on each
trial" of .0084 = .84% (this is the national proportion of NH boxcars in
1949 that I used for my simulation).

0___71,483
1___24,098
2___3,960
3___423
4___33
5___2
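
The binomial list can be reproduced in a few lines of Python. This is only a sketch: `binomial_pmf` is a hypothetical helper name, and the probability is taken as the unrounded ratio 6,012 / 719,349 (about .0084), because that value appears to give the figures listed above, whereas a flat .0084 gives roughly 71,360 for the first row. In Excel the equivalent call would be along the lines of `BINOMDIST(k, 40, p, FALSE)`.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


N_CARS = 40             # boxcars per car list
P_NH = 6012 / 719349    # NH share of the 1949 fleet, about .0084

for k in range(6):
    expected_lists = 100_000 * binomial_pmf(k, N_CARS, P_NH)
    print(f"{k}___{expected_lists:,.0f}")
```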

Note the close correspondence of the simulation and the binomial
distributions in the two lists. This and the examination of other
simulation results convinced me that my process of random car
selection could be effectively modeled by the binomial distribution
(I also compared the Poisson distribution). If anyone would like a
copy of my simulation results, contact me off list.

To use the binomial distribution, all you need to specify is the
number of trials (read boxcars in a train) and the probability of
success on each trial (read proportion of cars of a particular
ownership or type). The proportions of cars can be national,
regional, or any other proportion you wish to use. You can make the
calculations with the aid of tables, programs such as Excel, or any
of several on-line calculators.

I should point out a key difference between my simulation model and
the real world: Just like cards, a train "has memory". This means
that once a car is removed from the population and placed in the
train, it cannot be placed again in the same train. Once the first
NH car is chosen with a probability of success on each trial of
6,012 / 719,349 (NH boxcars divided by national boxcars, 1949) the
probability of success on each trial for the next one changes to
6,011 / 719,348. This is the difference between sampling with
replacement (my simulation) and sampling without replacement (real
world). The binomial distribution also assumes sampling with
replacement.
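
To get a feel for how much the "memory" matters, the two sampling schemes can be compared directly. The sketch below sets the binomial (with replacement) beside the hypergeometric distribution, which is the standard model for sampling without replacement; the hypergeometric is brought in here for comparison and is not something the simulation itself used. Function names are made up for illustration.

```python
from math import comb

NH_CARS = 6012    # NH boxcars, 1949
FLEET = 719349    # national boxcars, 1949
TRAIN = 40        # cars drawn per list


def with_replacement(k, n=TRAIN, p=NH_CARS / FLEET):
    """Binomial: every draw has the same chance of being an NH car."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def without_replacement(k, n=TRAIN, successes=NH_CARS, population=FLEET):
    """Hypergeometric: a car already drawn cannot be drawn again."""
    ways = comb(successes, k) * comb(population - successes, n - k)
    return ways / comb(population, n)


for k in range(4):
    print(k, f"{with_replacement(k):.6f}", f"{without_replacement(k):.6f}")
```

With a fleet of over 700,000 cars and only 40 draws, the two columns should agree to several decimal places, which suggests the binomial is a reasonable stand-in at this scale.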

One use for the binomial distribution is to test real world examples
for randomness. Again reaching back many years to my statistics
classes, I am reminded of the "null hypothesis": A researcher
discovers something interesting and suspects it is not merely
random. The null hypothesis is that it IS random, while the
alternative hypothesis is that it is not. The null hypothesis is
assumed to be true unless the researcher is 95% or 99% confident that
it is false (these are typical confidence levels).

The UP train with the large number of SP boxcars is an example. My
understanding is that this train had some 90 boxcars, 36 of which
were SP. In order to calculate the binomial distribution, we need to
know the number of "trials" (i.e., cars in the train, say 90) and
the "probability of success on each trial" (i.e., the proportion of
SP cars in the national fleet, say 4% = .04). From this you can find
the probability of a train with exactly 0, 1, 2, …, 36, … SP cars.

Or maybe not: It turns out that the probability of 36 or more cars
is so low that Excel cannot calculate it. For example, a 90 boxcar
train with a "mere" 21 or more SP boxcars would occur only once in
every 19.5 billion trains. Conclusion: This train could not have
occurred by chance alone. (A friend of mine who has lived in Laramie
all his life – in particular the 1940s and 50s – describes these cars
as a "transfer run".)
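
A tail probability this small can still be computed by adding up the individual binomial terms rather than subtracting a cumulative probability from 1. The sketch below does that in Python; `upper_tail` is a hypothetical helper, and the 1% significance level is just one of the typical choices mentioned earlier. The once-in-19.5-billion figure corresponds to a train with more than 20 SP cars, that is, 21 or more.

```python
from math import comb


def upper_tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))


N_CARS = 90    # boxcars in the UP train
P_SP = 0.04    # assumed SP share of the national fleet
ALPHA = 0.01   # significance level (99% confidence)

more_than_20 = upper_tail(21, N_CARS, P_SP)
print(f"more than 20 SP cars: once in {1 / more_than_20:,.0f} trains")

p_value = upper_tail(36, N_CARS, P_SP)   # the observed train: 36 SP cars
print(f"36 or more SP cars: probability {p_value:.3e}")
print("reject null hypothesis" if p_value < ALPHA else "cannot reject")
```

Summing from the tail upward avoids the rounding problem that arises when a number this tiny is obtained as 1 minus something very close to 1, which may be why the spreadsheet gave up.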

Suppose that the 4% number is wrong; Tim Gilbert's data lists 4.9% SP-
Pac ownership in 1956. Let's be generous and make it 5%. Then a 90
car train would have more than 20 SP boxcars once in every 356 million
trains. (Tim's data are at "4060totalboxcarsUSownership.xls" in the
files section of this list.)

Rather than using the proportion of the national fleet, how about
giving more "weight" to SP cars on the UP because of the "connection"
between the two railroads, or because of nearness or whatever? Let's
say we "weight" the SP cars by a factor of two (Mike Brock suggests a
weight of 1.5). To apply the desired weight, multiply it by the
national proportion: e.g., 2 * 5% = 10%. Using a "probability of
success on each trial" of 10% and a 90 boxcar train we find that
Excel still cannot calculate it because the probability is too low (a
train with "only" 31 or more SP boxcars would occur once every 3
billion trains). Conclusion: No reasonable weighting will reproduce
the train actually observed – we must reject the null hypothesis.
That is, the observed train composition is not the result of chance
alone.
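
The effect of different weights can be tabulated in one pass. This sketch assumes the 5% base share and the 90-car train used above, and tries weights of 1, 1.5 and 2; the helper name is made up for illustration. For a weight of 2 (10%), the once-in-3-billion figure corresponds to more than 30 SP cars.

```python
from math import comb


def upper_tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))


N_CARS = 90         # boxcars in the train
OBSERVED_SP = 36    # SP boxcars reported in the train
BASE_SHARE = 0.05   # assumed national SP share before weighting

for weight in (1.0, 1.5, 2.0):
    p = weight * BASE_SHARE
    p_value = upper_tail(OBSERVED_SP, N_CARS, p)
    print(f"weight {weight}: p = {p:.3f}, "
          f"P({OBSERVED_SP} or more SP cars) = {p_value:.3e}")

more_than_30 = upper_tail(31, N_CARS, 2.0 * BASE_SHARE)
print(f"weight 2.0, more than 30 SP cars: once in {1 / more_than_30:,.0f} trains")
```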

I suspect that if we begin applying the binomial distribution to real
world data we will find many cases in which we should reject the null
hypothesis of random car assignment. This does not imply that the
random assignment model should be ignored, of course; it simply means
that other factors (real world consists, photos, personal choice,
etc.) should also be considered. For example, we may find cases such
as transfer runs or large shippers where it makes sense to treat
blocks of cars as a unit and to assign these blocks, rather than the
individual cars, to trains.

Best wishes,
Larry Ostresh
Laramie, Wyoming
