#### A Purpose For Frt Car Distribution Studies

Anthony Thompson <thompson@...>

Malcolm Laughlin wrote:
I'm OK with the average as the core datum, but recognizing that the average in the car distribution case may be at one side of a skewed distribution or otherwise not a true average. Any average is better than zero or infinity as the starting point.
An average always means the same thing, Malcolm, though in a skewed distribution it will not lie at the same location as the most frequent data, which lay persons usually assume the average to be (and which it is in a symmetrical distribution). But I have no idea what a "true average" is.

Tony Thompson Editor, Signature Press, Berkeley, CA
2906 Forest Ave., Berkeley, CA 94705 www.signaturepress.com
(510) 540-6538; fax, (510) 540-1937; e-mail, thompson@...
Publishers of books on railroad history

Gene Green <bierglaeser@...>

--- In STMFC@..., Anthony Thompson <thompson@...> wrote:

An average always means the same thing, Malcolm, though in a
skewed distribution it will not lie at the same location as the most
frequent data, which lay persons usually assume the average to be
(and which it is in a symmetrical distribution). But I have no idea
what a "true average" is.

-Does- average always mean the same thing? I learned (correctly, I
hope) that "average" could be one of three things; mean, median or
mode. Usually when folks - those unencumbered by formal education -
say "average" they actually mean "arithmetic mean."

Gene Green
OitwTtoEP

Tim O'Connor

-Does- average always mean the same thing? I learned (correctly, I
hope) that "average" could be one of three things; mean, median or
mode. Usually when folks - those unencumbered by formal education -
say "average" they actually mean "arithmetic mean."
Gene Green

Larry posted his comparison spreadsheet showing a straight line
fleet-percentage vs observed-percentage for each railroad. I then
said that one really needs to compute a standard deviation (using
multiple data sets as Larry added) to know how closely observations
match Tim's theory of random distribution.

A simple way is to consider a sample of 50 box cars. For a railroad
that owns 2% of the U.S. fleet, the "average" number of its box cars
in the sample would be 1. But for a SINGLE sample, what's the chance
that NONE of its cars are present? Easy .98**50 ~= .36. That is, 36%
chance. For a railroad with 10% of the U.S. fleet, .90**50 ~= .005.
That is, there's only a 1 in 200 probability that NONE of those cars
are in the sample.
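
The arithmetic above can be checked with a short sketch. It assumes each car in the sample is an independent draw from the national fleet (a binomial model, which is one reading of the random-distribution theory, not something the thread establishes). The function names are hypothetical. The same model also gives the standard deviation of the count, which is the spread mentioned earlier in this post.

```python
import math

def prob_none(fleet_share: float, sample_size: int) -> float:
    """Chance that a random sample holds no cars from a road with this fleet share."""
    return (1.0 - fleet_share) ** sample_size

def expected_and_sd(fleet_share: float, sample_size: int) -> tuple[float, float]:
    """Binomial mean and standard deviation of the road's car count in the sample."""
    mean = sample_size * fleet_share
    sd = math.sqrt(sample_size * fleet_share * (1.0 - fleet_share))
    return mean, sd

# The two cases from the post: 2% and 10% of the fleet, 50-car sample.
for share in (0.02, 0.10):
    mean, sd = expected_and_sd(share, 50)
    print(f"share {share:.0%}: expect {mean:.1f} cars, sd {sd:.2f}, "
          f"P(none) = {prob_none(share, 50):.3f}")
```

For the 2% road the standard deviation is about as large as the expected count itself, which is why a single 50-car sample says little about a small road.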

If you have a layout that has 100 box cars, you probably should own
300 to 400 box cars proportionately distributed among railroads except
for the home road, which should be over-represented. (You need so many
cars because this gives you a "precision" of 0.25% so you can have
examples from very small as well as large railroads.)

Then allow random assignments to take their course -- over time you'll
see different trains but the 'average' train (after hundreds of samples)
will match the national fleet percentages (after discounting for the

Of course, you can then add to this random mix, local facts that skew
the mix. Perhaps you have an auto parts or assembly plant. Or maybe a
grain elevator that receives only corn, or only wheat. Or it may be a
certain time of year -- e.g. grain rush. Etc etc. Then you can 'skew'
some car assignments to reflect those traffic patterns.
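
A rough simulation of the idea, as a sketch only. The fleet shares and road names below are made up for illustration and are not historical data. The roster is built in proportion to those shares, trains are drawn at random from it, and an optional `reserved` list stands in for local traffic that always brings in cars from a particular road (a simplification: those cars are treated as extra to the random draw).

```python
import random
from collections import Counter

# Hypothetical fleet shares, for illustration only.
FLEET_SHARES = {"Road A": 0.50, "Road B": 0.30, "Road C": 0.18, "Road D": 0.02}

def build_roster(shares, roster_size):
    """Build a model roster with cars allotted in proportion to fleet share."""
    roster = []
    for road, share in shares.items():
        roster.extend([road] * round(share * roster_size))
    return roster

def simulate(roster, train_length, n_trains, rng, reserved=()):
    """Return each road's share of all cars seen over n_trains random trains.

    `reserved` lists roads whose cars are placed in every train before the
    remaining slots are filled at random (the local-traffic skew).
    """
    seen = Counter()
    free_slots = train_length - len(reserved)
    for _ in range(n_trains):
        train = list(reserved) + rng.sample(roster, free_slots)
        seen.update(train)
    total = train_length * n_trains
    return {road: seen[road] / total for road in set(roster) | set(reserved)}

rng = random.Random(1)  # fixed seed so repeated runs match
roster = build_roster(FLEET_SHARES, 400)

plain = simulate(roster, 50, 2000, rng)
skewed = simulate(roster, 50, 2000, rng, reserved=["Road D"] * 5)
for road in FLEET_SHARES:
    print(f"{road}: fleet {FLEET_SHARES[road]:.2f}, "
          f"random {plain[road]:.3f}, with local skew {skewed[road]:.3f}")
```

Over many trains the unskewed shares settle near the roster proportions, while any single train can differ a good deal.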

Tim O'Connor

Anthony Thompson <thompson@...>

Gene Green wrote:
-Does- average always mean the same thing? I learned (correctly, I hope) that "average" could be one of three things; mean, median or mode. Usually when folks - those unencumbered by formal education - say "average" they actually mean "arithmetic mean."
Yes, Gene, I agree that the person in the street thinks of the arithmetic mean as the "average," and that definition is fixed, which is why I expressed doubt about the concept of a "true average." For a unimodal symmetrical distribution, the mean, median and mode lie at the same place, so for many situations you can lump them all together as the "average." But the minute the distribution is skewed, and many real-world distributions are skewed, the three definitions give different answers, and the "middle" of the distribution no longer lies at the most frequent value (the mode).
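
A small illustration of that point, using Python's standard `statistics` module. Both data sets are invented for the example: they share the same arithmetic mean, but only the symmetrical one has its mean, median and mode in the same place.

```python
from statistics import mean, median, mode

# Invented data sets, nine observations each, both summing to 27.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
skewed = [0, 0, 0, 1, 1, 2, 3, 8, 12]

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    print(f"{name}: mean={mean(data)}, median={median(data)}, mode={mode(data)}")
```

In the skewed set the mean is pulled toward the long tail, the median sits below it, and the mode sits at the low end.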

Tony Thompson Editor, Signature Press, Berkeley, CA
2906 Forest Ave., Berkeley, CA 94705 www.signaturepress.com
(510) 540-6538; fax, (510) 540-1937; e-mail, thompson@...
Publishers of books on railroad history

Rich <SUVCWORR@...>

--- In STMFC@..., Anthony Thompson <thompson@...> wrote:

Malcolm Laughlin wrote:
I'm OK with the average as the core datum, but recognizing that the average in the car distribution case may be at one side of a skewed distribution or otherwise not a true average. Any average is better than zero or infinity as the starting point.
An average always means the same thing, Malcolm, though in a skewed distribution it will not lie at the same location as the most frequent data, which lay persons usually assume the average to be (and which it is in a symmetrical distribution). But I have no idea what a "true average" is.
It would appear that we are headed down the road of confusing average
and mean. Which are totally different. Given 10,000 freight cars
past a given point, the average will most likely mimic the national
fleet based on Tim and Dave's data. However, the mean may be skewed
in one direction or the other significantly.

Rich Orr


Malcolm Laughlin <mlaughlinnyc@...>

Posted by: "Gene Green" bierglaeser@... Sun Aug 17, 2008 8:07 am (PDT)
--- In STMFC@yahoogroups.com, Anthony Thompson <thompson@...> wrote:

An average always means the same thing, Malcolm, though in a skewed distribution it will not lie at the same location as the most frequent data, which lay persons usually assume the average to be (and which it is in a symmetrical distribution). But I have no idea what a "true average" is.
I'm well aware of the mean-mode-median thing, but to me "average" has always meant the sum of the observations divided by the count. But in the case of the distribution of freight cars 50 years ago, it really doesn't matter; we can't get enough data to make distinctions of that sort. I'd be amazed if anyone could find a good number for the percentage of cars on line by ownership for any day, week, month or year in the 50's or 60's. In the 50's the data couldn't have existed. If certain computer tapes on the Southern or NYC (and a very few others) from the mid-60's were still around we could do a count. There really was no reason for a railroad to do the count because it had no value for car management.

I'll repeat what I said yesterday about the value of an average number in this environment. Someone said an average was a good starting point. I agree in the sense that it's better than having no number and beginning with an estimate of zero or infinity.

As for the percentage of cars by ownership on any one railroad, I think it's absurd to suggest that it might be the national average ownership.

I'm willing to wager that there was never a day when the distribution of cars across a railroad was ever within 20 percent of that average for more than 50 percent of the ownerships on line, excluding roads and ownerships too small for significance. Tim's calculation of that statistic, or just the fact that it can be calculated, does not make the result meaningful in the real world.
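
The wager could be scored against any one day's count along these lines. This is a sketch under stated assumptions: "within 20 percent" is read as a relative difference from the national share, and the function name, road names and numbers are all hypothetical.

```python
def fraction_within(fleet_shares, observed_counts, tolerance=0.20):
    """Fraction of ownerships whose observed share is within `tolerance`
    (relative) of their national fleet share."""
    total = sum(observed_counts.values())
    hits = 0
    for road, share in fleet_shares.items():
        observed_share = observed_counts.get(road, 0) / total
        if abs(observed_share - share) <= tolerance * share:
            hits += 1
    return hits / len(fleet_shares)

# Hypothetical national shares and one hypothetical day's count of 100 cars.
fleet = {"Road A": 0.40, "Road B": 0.30, "Road C": 0.20, "Road D": 0.10}
counted = {"Road A": 45, "Road B": 33, "Road C": 12, "Road D": 10}

result = fraction_within(fleet, counted)
print(f"{result:.0%} of ownerships within 20% of the national share")
print("wager lost" if result > 0.50 else "wager won")
```

Very small ownerships would need to be dropped from `fleet_shares` before scoring, as the wager excludes them.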

Taking a few consists (1000 is few in this context) and finding that they come close to the estimate is very shaky statistically. In many fields of research, using selected historical data to verify statistics is frowned upon because it is very subject to selection bias. That is when the researcher finds a correlation and announces it. But what really should be done is to make a good number of independent estimates using other data sets of the same kind to see if the results are reproducible. In this case that would require taking some number, like 10, of other sets of unrelated consists on other railroads in different sections of the country and comparing the results.

Malcolm Laughlin, Editor 617-489-4383
New England Rail Shipper Directories
19 Holden Road, Belmont, MA 02478

Rich <SUVCWORR@...>

-Does- average always mean the same thing? I learned (correctly, I
hope) that "average" could be one of three things; mean, median or
mode. Usually when folks - those unencumbered by formal education -
say "average" they actually mean "arithmetic mean."

Gene Green
OitwTtoEP
Average and median are the same thing: total events/number of
observations, e.g. you observe 100 trains and those trains have 273 ATSF
cars in them. The average number of ATSF cars in a train is 2.73.

Mean is that value where 50% of the observations are above or below it

Mode is the value that occurs most frequently.

e.g. you observe 10 trains with the following number of PRR cars:

3, 0, 3, 11, 0, 0, 1, 8, 1, 13

total number of cars = 40
average = 4.0 cars per train
mean = 2
mode = 0
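
The same ten counts can be run through Python's standard `statistics` module, which uses the conventional names: the sum divided by the count is the (arithmetic) mean, and the value with half the observations on either side is the median. The fourth count is read here as 11, the only value consistent with the stated total of 40.

```python
from statistics import mean, median, mode

# PRR cars seen in each of the 10 trains from the example above.
prr_cars = [3, 0, 3, 11, 0, 0, 1, 8, 1, 13]

print("total cars:", sum(prr_cars))
print("mean (sum / count):", mean(prr_cars))
print("median (middle value):", median(prr_cars))
print("mode (most frequent):", mode(prr_cars))
```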

Rich orr

Malcolm Laughlin <mlaughlinnyc@...>

It would appear that we are headed down the road of confusing average
and mean. Which are totally different. Given 10,000 freight cars past a given point, the average will most likely mimic the national fleet based on Tim and Dave's data. However, the mean may be skewed in one direction or the other significantly.
====================

Tim's data doesn't say that. He has the percentages of system and foreign ownerships on line and off line. It's a big leap of faith to extrapolate that distribution of individual ownerships.

Malcolm Laughlin, Editor 617-489-4383
New England Rail Shipper Directories
19 Holden Road, Belmont, MA 02478

Bruce Smith

On Sun, August 17, 2008 9:06 pm, Rich wrote:
It would appear that we are headed down the road of confusing average
and mean. Which are totally different. Given 10,000 freight cars
past a given point, the average will most likely mimic the national
fleet based on Tim and Dave's data. However, the mean may be skewed
in one direction or the other significantly.

Rich Orr
Um, Rich,

The mean is the average. Are you thinking about the median? (the point at
which half the samples are above that value and half are below).

Regards
Bruce

Bruce Smith
Auburn, AL
