Re: Numbers/percentages of important box car types

Dave Nelson <muskoka@...>

Most conductor's logs will contain more than 25 trains. Tim and I have
looked at several thousand boxcar entries from hundreds of trains. They
give a clear and unambiguous body of data that shows those railroads with the
most boxcars (e.g., PRR, NYC) have their cars recorded most often and those
with the least, least often, with everybody else in between, generally
falling in line by the size of their boxcar fleet. Yes, there is some
variation from the expected -- we'll see road A in 17th place where expected
but road B in 18th instead of 19th. But on the whole the correlation between
sightings and fleet size is very high... IIRC the correlation I calculated
exceeded 0.98.
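
For anyone who wants to run the same kind of check on their own log tallies, here is a minimal sketch in Python of a Pearson correlation between fleet size and sightings. The road names and every number in it are made-up placeholders, not real fleet or conductor-log data, and the function name pearson is just my own label. It only illustrates the arithmetic; it is not necessarily how the figure above was calculated.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# PLACEHOLDER numbers for illustration only -- NOT real fleet or log data.
fleet_size = {"Road A": 60000, "Road B": 45000, "Road C": 20000,
              "Road D": 8000, "Road E": 2000}
sightings = {"Road A": 310, "Road B": 240, "Road C": 95,
             "Road D": 50, "Road E": 8}

roads = sorted(fleet_size)  # fixed order so the two lists line up road by road
r = pearson([fleet_size[k] for k in roads], [sightings[k] for k in roads])
print(f"correlation between fleet size and sightings: {r:.3f}")
```

A value near 1.0 means sightings rise almost in step with fleet size. Swap in your own tallies for the two dictionaries to test a particular set of logs.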

What comes next is then the challenge of trying to describe what has been
seen. I think we've pretty much settled on the qualifiers: Post WWII, Class
1 mainline routes, US boxcars. There are some questions about North:South
routes, corner cases (e.g., Seattle, San Diego, Miami, and Bangor), Canadian
cars, rural vs urban locations, and the amount of influence from connections
(as in if you expected to see 1.25% and see 1.81%, does that mean
connections are usually 1.5X expected OR does it mean thinking in terms of 2
cars/100 modeled instead of 1 -- an insignificant variance?). Some of that
can be quite important and there just isn't enough data yet to close the
questions. But given what we can agree on so far -- Post WWII, Class 1
mainline, US boxcars -- AND, perhaps most important, an absence of other more
complete data for the specific location and time of interest, a decent rule
of thumb is to build your boxcar roster according to the road ratios found
in the US fleet and then compose your trains as desired. Swap cars to/from
storage to add variety.
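
The rule of thumb can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name allocate_roster is hypothetical, the fleet numbers are placeholders rather than real data, and the largest-remainder rounding is simply one reasonable way to turn fractional shares into whole cars.

```python
def allocate_roster(fleet_sizes, roster_size):
    """Split roster_size model cars among roads in proportion to fleet size.

    Uses largest-remainder rounding so the whole-car counts always add up
    to roster_size. (The rounding method is an assumption of this sketch.)
    """
    total = sum(fleet_sizes.values())
    if total <= 0:
        raise ValueError("fleet sizes must sum to a positive number")
    # Exact (fractional) number of cars each road would get
    exact = {road: roster_size * n / total for road, n in fleet_sizes.items()}
    # Start with the whole-number part of each share
    alloc = {road: int(q) for road, q in exact.items()}
    leftover = roster_size - sum(alloc.values())
    # Hand the remaining cars to the roads with the largest fractional parts
    by_remainder = sorted(exact, key=lambda road: exact[road] - alloc[road],
                          reverse=True)
    for road in by_remainder[:leftover]:
        alloc[road] += 1
    return alloc

# PLACEHOLDER fleet sizes for illustration only -- NOT real data.
example_fleet = {"Road A": 60000, "Road B": 45000, "Road C": 20000,
                 "Road D": 8000, "Road E": 2000}
for road, cars in allocate_roster(example_fleet, 40).items():
    print(road, cars)
```

On the connections question above: at 100 cars modeled, a 1.25% share rounds to 1 car and a 1.81% share rounds to 2, so a difference of that size amounts to a single car on a roster of that scale.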

Dave Nelson

A person intimately involved in statistical research would say that we
need way more data to be able to make real use of predictive models,
but we don't have all that consist data. A person might argue that we
need at least 25 trains worth of data to develop a model with a certain
confidence of being statistically useful.

