An average always means the same thing, Malcolm, though in a skewed distribution it will not lie at the same location as the most frequent data, which lay persons usually assume the average to be (and which it is in a symmetrical distribution). But I have no idea what a "true average" is.
I'm well aware of the mean-mode-median thing, but to me "average" has always meant the sum of the observations divided by the count. In the case of the distribution of freight cars 50 years ago, though, it really doesn't matter: we can't get enough data to make distinctions of that sort. I'd be amazed if anyone could find a good number for the percentage of cars on-line by ownership for any day, week, month or year in the 50's or 60's. In the 50's the data couldn't have existed. If certain computer tapes on the Southern or NYC (and a very few others) from the mid-60's were still around we could do a count. There really was no reason for a railroad to do the count because it had no value for car management.
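
As a small illustration of the mean-mode-median distinction, here is a sketch in Python. The sample values are invented purely to give a skewed distribution; they are not freight car data of any kind.

```python
from statistics import median, mode

# Hypothetical right-skewed sample (made-up numbers, not historical data).
observations = [1, 1, 1, 2, 2, 3, 4, 6, 10, 20]

# "Average" in the sense used above: sum of the observations over the count.
average = sum(observations) / len(observations)

# The other two measures of central tendency.
middle_value = median(observations)   # midpoint of the sorted values
most_frequent = mode(observations)    # the most common value

print(average, middle_value, most_frequent)
```

With this sample the sum-over-count average sits well above both the median and the most frequent value, which is the skewed-distribution behavior described in the quoted message.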

I'll repeat what I said yesterday about the value of an average number in this environment. Someone said an average was a good starting point. I agree in the sense that it's better than having no number, or than beginning with an estimate of zero or infinity.

As for the percentage of cars by ownership on any one railroad, I think it's absurd to suggest that it might be the national average ownership.

I'm willing to wager that there was never a day when the distribution of cars across a railroad was within 20 percent of that average for more than 50 percent of the ownerships on line, excluding roads and ownerships too small for significance. Tim's calculation of that statistic, or just the fact that it can be calculated, does not make the result meaningful in the real world.
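
The wager above can be stated as a calculation. The sketch below is only one reading of it: it assumes "within 20 percent" means a relative difference from the national share, the function name is hypothetical, and the ownerships and counts are invented placeholders, not real data.

```python
def fraction_within_tolerance(observed_counts, national_shares, tolerance=0.20):
    """Fraction of ownerships whose on-line share is within `tolerance`
    (relative) of their national share. Assumption: relative, not absolute."""
    total = sum(observed_counts.values())
    hits = 0
    for owner, expected in national_shares.items():
        observed = observed_counts.get(owner, 0) / total
        if abs(observed - expected) <= tolerance * expected:
            hits += 1
    return hits / len(national_shares)

# Invented placeholder ownerships and numbers, for illustration only.
national_shares = {"A": 0.40, "B": 0.30, "C": 0.20, "D": 0.10}
cars_on_line_one_day = {"A": 50, "B": 28, "C": 12, "D": 10}

result = fraction_within_tolerance(cars_on_line_one_day, national_shares)
print(result)
```

The wager is that, with real data, this fraction would never exceed 0.5 on any day. Testing that would require exactly the daily counts by ownership that apparently do not survive.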

Taking a few consists (1000 is few in this context) and finding that they come close to the estimate is very shaky statistically. In many fields of research, using selected historical data to verify statistics is frowned upon because it is very subject to selection bias. That is when the researcher finds a correlation and announces it. But what really should be done is to make a good number of independent estimates using other data sets of the same kind to see if the results are reproducible. In this case that would require taking some number, like 10, of other sets of unrelated consists on other railroads in different sections of the country and comparing the results.
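
A rough sketch of that kind of reproducibility check follows. It assumes each independent data set is simply a list of ownership labels, one per car; the function names and the three tiny samples are hypothetical and exist only to show the shape of the comparison.

```python
from statistics import mean, stdev

def owner_share(sample, owner):
    """Share of cars in one sample that belong to `owner`."""
    return sample.count(owner) / len(sample)

def reproducibility_summary(samples, owner):
    """Compute the same share independently on every sample and report
    how much the estimates disagree with one another."""
    shares = [owner_share(s, owner) for s in samples]
    return {"shares": shares, "mean": mean(shares), "spread": stdev(shares)}

# Invented placeholder samples; "X" and "Y" are not real ownerships.
samples = [
    ["X"] * 3 + ["Y"] * 7,
    ["X"] * 1 + ["Y"] * 9,
    ["X"] * 5 + ["Y"] * 5,
]

summary = reproducibility_summary(samples, "X")
print(summary)
```

If the spread across independent samples is large compared with the mean, agreement between any single sample and the national figure says little.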

