Tim O'Connor writes:
"Larry, does 2,267 box cars on the UP mainline even represent the traffic of a single typical day? So here we are looking at freight trains spread over four months... or on any given day, less than 1% of the box cars on the main line are sampled."
Yes Tim, I think 2,267 box cars represent three or four days of UP freight traffic across Wyoming in the late 1930s. (Data from 1949 provided by Mark Amfahr show 30 to 33 trains a day between Laramie and Cheyenne.) The information in the three train books covers 120 trains and about 7,000 cars. Ferguson's book is from May 13 to June 21, 40 days, 41 trains, over 2,000 cars, about a 3% sample. Fraley's and Fitz's books span September 12 to October 24, 43 days, 89 trains, nearly 5,000 cars, about a 6.6% sample. Pooling them gives about a 4.5% sample.
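The percentages above can be roughly reproduced with a little arithmetic. This is only a sketch: it assumes the percentages are fractions of trains (not cars), and that the 1949 figure of 30 to 33 trains a day also holds for 1938, using the midpoint of 31.5. The function name and the midpoint are my own choices for illustration.

```python
def sampling_fraction(trains_recorded, days, trains_per_day):
    """Share of all trains in the period that appear in a train book."""
    return trains_recorded / (days * trains_per_day)

# Assumption: midpoint of the 30-33 trains/day reported for 1949,
# applied to 1938 because no 1938 figure is given.
TRAINS_PER_DAY = 31.5

# (trains recorded, days spanned), taken from the figures quoted above
books = {
    "Ferguson": (41, 40),
    "Fraley and Fitz": (89, 43),
}

for name, (trains, days) in books.items():
    fraction = sampling_fraction(trains, days, TRAINS_PER_DAY)
    print(f"{name}: {fraction:.1%}")
```

A different assumed traffic level changes the percentages somewhat, but not their order of magnitude.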
I'm not sure that the percentage of the sample matters as much as its size, however. The Gallup Poll routinely uses a sample size of about 1,000 adults when conducting its survey research. If there are 200 million adults in the country, the sample percentage is about 0.0005%. Also, what are the sample sizes and percentages for the data on which the G-N model is based? Does anyone know?
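To illustrate why size matters more than percentage, here is a minimal sketch of the textbook margin-of-error formula for a proportion, with the finite population correction included. It assumes simple random sampling, a 95% confidence level (z = 1.96), and the worst case p = 0.5; the function name and the two population sizes are hypothetical choices for the example.

```python
import math

def margin_of_error(n, population, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple
    random sample of size n drawn from a finite population."""
    standard_error = math.sqrt(p * (1 - p) / n)
    # The finite population correction shrinks the error only when the
    # sample is a sizable share of the population.
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc

# Same sample size, very different sampling percentages.
for population in (100_000, 200_000_000):
    moe = margin_of_error(1_000, population)
    share = 1_000 / population
    print(f"N={population:>11,}  sample={share:.4%}  margin={moe:.2%}")
```

Both cases give a margin of about 3 percentage points, even though the sample is 1% of one population and 0.0005% of the other.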
A far more serious potential problem than the sample's size or percentage is its lack of randomness. Random sampling is generally required for statistically accurate results, and it would be hard to argue that these are random samples. There is nothing we can do about it, however, other than exercise caution when interpreting the results. And there is no need to over-emphasize this problem either. I think the train books are far preferable to the other methods we have of reconstructing the past (photos, videos, ICC reports, etc.).
"What can be learned from this? Nothing, I believe."
I, on the other hand, learned a lot. So can others if they look at it carefully (and critically). How can one have data from 120 trains and 7,000 cars and learn nothing?!?
"Look at it another way -- suppose you had conductors books for the
same time period from three other conductors. Do you think that the
tallies would be the same? I don't -- not one chance in a thousand.
So then, which would be the representative sample? Answer: neither."
No, they wouldn't be the same, but they are likely to be similar. For example, I would expect them to show the same dominance of UP and SP cars, and I think it is likely that the Central Western ICC region will have more than its fair share of cars, even with UP and SP removed. The New England and Southern regions may also be underrepresented. In any event, that is a question that may have an answer in a few months, because I do in fact have three more conductors' reports from September/October 1938 that I intend to transcribe.
And if it were me I wouldn't worry about which is representative. I would simply pool them.
"If you had ALL of the conductors books for every day for a full week, now that would be interesting!"
Yes, and if a Big Boy were to pull a load of 1950-era cars through Laramie today, that would be interesting too! :) (And how would we know if it is a typical week?)