One of the main problems with all of this data analysis is that you are trying to take a very small sample (really insignificant) and extrapolate to the whole fleet. This also assumes that any sample you analyze is representative of the whole, which is not very likely.
As has been pointed out, such an approach is also limited by the year (the rosters will change with time), season and location.
Also, as pointed out in this post, most modelers do not have a large fleet of cars (how long would it take to build 500 detailed resin cars?).
So this is what I think would be a reasonable approach for the analysis (for what it is worth, I am a research scientist at The University of Texas at Austin):
1. Use data that is likely to reflect trains that would be hauling a representative sample. This would have to be through freights, not local freights.
2. Then compare each train data set to the roster data set for that year in a very easy to follow format (I suggest an Excel spreadsheet). Post the results in a database for the group.
3. Grouping the small railroads together would be necessary to simplify the process.
4. Then ask the question - does the train data, compared to the roster data, support the idea that the cars are distributed in proportion to the national roster?
5. It would be important to do this for box cars, gondolas and flat cars in separate groupings. One could do this for hopper cars and stock cars as well. They are likely to be impacted more by location, but this would become evident in such an analysis.
If Tim has already done this, then present the data in such a format. What I have seen does not do this. Please do not tell us that is what it shows; show us the data. Then each person can determine if it is applicable to their situation and decide how to use the information.
6. Also include the time of year and the location for the train data set as there may be seasonal and regional impacts that one may wish to consider.
7. Then the user can apply their knowledge of differences in traffic patterns, etc for their own railroad on a case by case basis. It is not likely that the same approach will work for all model railroads.
8. Forget the statistical analysis - the train data sets are far too small to reasonably apply it. If you want to do a statistical analysis, you would need to apply some sort of bootstrap process to resample the data.
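As a rough sketch of the comparison in steps 2 and 4, here is how one train data set could be set against the roster for that year. The road names, roster sizes and car counts below are made-up placeholders, not real data, and the function name `compare_to_roster` is only for illustration.

```python
# Placeholder figures for illustration only; NOT real roster or train data.
national_roster = {"PRR": 60000, "NYC": 50000, "ATSF": 30000,
                   "SP": 25000, "Other": 335000}
train_sample = {"PRR": 9, "NYC": 6, "ATSF": 5, "SP": 12, "Other": 48}


def compare_to_roster(sample, roster):
    """Return one row per road:
    (road, observed cars, expected cars, train share, roster share).

    "Expected" is the count the train would show if its cars were drawn
    in proportion to roster size.
    """
    total_cars = sum(sample.values())
    total_roster = sum(roster.values())
    rows = []
    for road, owned in roster.items():
        observed = sample.get(road, 0)
        roster_share = owned / total_roster
        expected = roster_share * total_cars
        rows.append((road, observed, expected,
                     observed / total_cars, roster_share))
    return rows


rows = compare_to_roster(train_sample, national_roster)
for road, obs, exp, train_share, roster_share in rows:
    print(f"{road:6s} observed {obs:3d}  expected {exp:6.1f}  "
          f"train {train_share:6.1%}  roster {roster_share:6.1%}")
```

Laid out this way, each person can look at observed against expected for their own road and era and judge for themselves, which is the point of showing the data rather than only the conclusion.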
Once the data is presented anyone can then take that and apply it as they see fit for their model railroad.
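For anyone who does want to try the bootstrap resampling mentioned in step 8, a minimal sketch follows. The car list is a made-up placeholder, the function name `bootstrap_share` is hypothetical, and the sketch assumes every car in the train is an independent draw, which real trains with blocks of cars may well violate.

```python
import random


def bootstrap_share(cars, road, n_resamples=2000, seed=1):
    """Percentile interval (2.5% to 97.5%) for the share of `road`
    among the observed cars, by resampling the cars with replacement."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(cars)
    shares = []
    for _ in range(n_resamples):
        resample = [rng.choice(cars) for _ in range(n)]
        shares.append(resample.count(road) / n)
    shares.sort()
    low = shares[int(0.025 * n_resamples)]
    high = shares[int(0.975 * n_resamples) - 1]
    return low, high


# Placeholder train: one entry per car, labeled by owning road.
cars = (["SP"] * 12 + ["PRR"] * 9 + ["NYC"] * 6
        + ["ATSF"] * 5 + ["Other"] * 48)

low, high = bootstrap_share(cars, "PRR")
print(f"PRR share observed {cars.count('PRR') / len(cars):.1%}, "
      f"bootstrap interval {low:.1%} to {high:.1%}")
```

With a train of this size the interval comes out wide, which is another way of saying the sample is too small to pin down any one road's share.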
I will tell you how I have gone about it for my model railroad of the Houston East & West Texas circa 1910.
1. Use the data from the railroad commission reports on commodities shipped on the railroad to determine the split of cars by type - box car, refrigerator car, flat car, gondola car, stock car, etc. Other data in the reports give the train lengths and the split between loads and empties.
2. I would obtain the ICC Blue Book data for 1910 to determine the ratio of home road cars to other cars. In my case for the HE&WT these data would likely be applicable only to box cars, since other information shows that most flat car and gondola loads were generated locally and moved in home road (SP lines) cars. Other more robust data sources, discussed below, are used for tank cars, stock cars and refrigerator cars.
3. If we can show that in 1910 the non home box cars are distributed by national roster data, use this to determine the number of box cars for each railroad.
4. Realize that in 1910 (it is not necessarily the same for later years), most tank cars and refrigerator cars used on the HE&WT were privately owned, so they call for another type of analysis. Use the railroad commission reports of mileage for private owner cars (since there was significant variation from year to year, average over a span of 3 years - 1909, 1910 and 1911) to determine the companies likely to show up on the HE&WT. The number of cars for each company on the model railroad would then be weighed against the industries served on the model railroad, the availability of information on the company's cars (not as much of a problem for the 1950's), the availability of suitable decals to letter the cars and the availability of kits (or the willingness to scratch build or kit bash the cars needed) to determine the tank car and refrigerator car fleets for the model railroad. The data for the private owner cars during this time period was nowhere near the split based on number of cars nationally, with some large fleets of cars never showing up at all. A case in point would be the Union Tank Line - a very small percentage, but there was a large representation from Merchants and Planters Oil Company and Higgins Oil & Fuel Company, both with rather small fleets of cars. The same sort of thing happened with refrigerator cars. PFE was not well represented as they had recently started operations. Armour owned companies had the largest representation. Santa Fe cars did not have much mileage on the HE&WT. Another problem is that refrigerator cars from the Houston Packing Company had a good deal of mileage, but I cannot locate any information on what they looked like or how they were lettered. The same thing happens with many of the tank cars, many of which were not listed in the ORER.
5. Flat cars and gondolas would be mainly SP lines cars.
6. Stock cars would be a mix of SP lines cars, private owner cars (much of the livestock was shipped in private owner cars in 1910) and other cars. However, there was not much livestock shipped on the HE&WT.
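The three-year averaging of private owner mileage in step 4 can be sketched as below. The owner names and car-mile figures are placeholders, not numbers from the railroad commission reports, and `average_mileage_share` is a hypothetical helper name.

```python
# Placeholder car-miles by owner and year; NOT figures from any report.
mileage = {
    "Owner A": {1909: 12000, 1910: 18000, 1911: 15000},
    "Owner B": {1909: 6000, 1910: 12000, 1911: 9000},
    "Owner C": {1909: 0, 1910: 3000, 1911: 0},
}


def average_mileage_share(mileage_by_owner):
    """Average each owner's mileage over the years given, then express
    the averages as shares of the combined total."""
    averages = {owner: sum(by_year.values()) / len(by_year)
                for owner, by_year in mileage_by_owner.items()}
    total = sum(averages.values())
    return {owner: avg / total for owner, avg in averages.items()}


shares = average_mileage_share(mileage)
for owner, share in shares.items():
    print(f"{owner}: {share:.0%} of private owner mileage")
```

The shares are only a starting point; which cars actually get built still depends on the industries modeled and on what information, decals and kits can be found.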
I have done all of the above with the following exceptions: I have not had the ICC Blue Book data (I just found out about it), nor have I worked out the percentage of box cars based on the national fleet. I intend to do that if I can locate the Blue Book data for the HE&WT for 1910, and then I will compare the results of this approach to the one I used earlier, namely 50% of the cars being local (SP lines in this case), 25% from lines that connected with the HE&WT and the other 25% from the national fleet, based roughly on each road's percentage of the car fleet. The split of model SP Lines box, flat and gondola cars was based on the percentage of each type of car on the roster at that time and the road names. As mentioned earlier, this presents some problems as many of these cars would have to be scratch built.
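To make the 50% / 25% / 25% rule concrete, here is a small sketch that sizes the three groups for a model fleet and spreads the national group over roads in proportion to roster size. The fleet size, road names and roster counts are placeholders, and the largest-remainder rounding is my own assumption for turning fractions into whole cars.

```python
def apportion(total, weights):
    """Split `total` whole cars in proportion to `weights`,
    using largest-remainder rounding so the counts add up exactly."""
    weight_sum = sum(weights.values())
    quotas = {k: total * w / weight_sum for k, w in weights.items()}
    counts = {k: int(q) for k, q in quotas.items()}
    leftover = total - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k],
                          reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


def plan_box_cars(fleet_size, national_roster):
    """50% home road, 25% connecting lines, the rest from the national
    fleet spread by roster size."""
    home = round(fleet_size * 0.50)
    connecting = round(fleet_size * 0.25)
    national = fleet_size - home - connecting
    return {"home": home,
            "connecting": connecting,
            "national": apportion(national, national_roster)}


# Placeholder roster counts; NOT real 1910 figures.
roster = {"Road A": 50000, "Road B": 30000, "Road C": 20000}
plan = plan_box_cars(40, roster)
print(plan)
```

For a 40 car fleet this gives 20 home road cars, 10 from connecting lines and 10 from the national fleet, which shows how few cars are left to represent any single distant railroad.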
John Stokes <ggstokes@...> wrote:
Finally got some response other than the repetition of the studies. No, not meant to be insulting at all, but I think it is a fair correction to the notion that this "data" applies absolutely. I don't think Tim is saying that the equal distribution accurately depicts the status on every railroad in the country on every day of every month of every year. I do not recall his making the claim that this was the case, or that every freight train or group of freight trains on any given railroad would always have the predicted percentages. What I think flies in the face of reality is the notion that this formula works every time and that it directly and specifically applies to modeling situations on most people's layouts.
I believe what some people are saying is that the idea of a perfect mathematical distribution of box cars to every railroad in the land in proportion to ownership is hard to accept in light of the almost infinite variables that would affect such a distribution, and of photographic and personal observation evidence to the contrary. Who made this happen, or how did it happen? Some unseen guiding hand? What did Mark Twain say about statistics?
Actually, this all has little direct effect on most modelers, since none of us, except the virtual guys and really large clubs, even begin to approach the traffic volume that would allow anyone to follow these theories.
And I have looked at the facts, and they are more than statistics based on small samples, and they say that this mathematical precision did not occur in real life. But I would like to hear why that is not correct. Tell those who disbelieve, in a good concise paragraph again, the meat of the theory and the facts that back it up. I am willing to try to learn.
To: STMFC@...
From: thompson@...
Date: Mon, 18 Aug 2008 10:40:08 -0700
Subject: Re: [STMFC] Re: freight car distribution - rejecting the equal distribution hypothesis.
John Stokes wrote:
> I agree with Malcolm. The very notion that box cars were distributed
> on all railroads across the country in proportion to ownership is
> patently absurd on its face.
Gosh, John, this sure saves you from doing any data analysis or for that matter, even looking at any data. To call Tim Gilbert's work "lahlah land" is insulting and ignorant. But hey, opinion trumps facts, right?
Tony Thompson
Editor, Signature Press, Berkeley, CA
2906 Forest Ave., Berkeley, CA 94705
www.signaturepress.com
(510) 540-6538; fax, (510) 540-1937; e-mail, thompson@... of books on railroad history