As I noted in my message of August 18, we have just about drained the useful
information out of the frt car distribution thread. I also asked in that
thread for opinions regarding the acceptable "error" in the predicting
capability of what I refer to as the Nelson/Gilbert theory. The point has
always been, to me, that, while the theory is logically interesting, how
does it perform?

"Tim's and Dave's analysis was only on a few trains on a couple of
lines (UP and Southern, if I read everything correctly), but the
analysis yields results that are more accurate the more
trains/lines/months/years you check."

Well, I don't have the data from the Southern train but I do have Tim's results from his 1947 UP data and, of course, I have the 1949 UP data...which I copied to Tim and he analyzed. The theory worked fairly well with the 1947 data...777 box cars...but the 1949 data...almost twice the size...1325 box cars...in Tim's words "blew it all to hell".

"What I (and I think Tim and
Dave) am saying is that once you get a large enough sample size (and
the point I was trying to make using the poker chips, surveys and
birthdays is that the necessary sample size is a lot smaller and need
not be nearly as wide-spread as most people think) you can make quite
accurate projections for a much larger universe."

That's fine. I would have no problems with the theory IF it were "proven". IOW, suppose we had data from say 30 RRs, each of about 35 trains. If the analyzer studied the data and noticed that the % of foreign box cars on a given RR matched the national % of the foreign box cars for these 1050 trains, I would think we had a good theory. I would think it would predict the number of foreign box cars on a given RR for a much longer period of time...say a yr. If we only had the data from 2 RRs to work from I would conclude that we MIGHT have a good theory. If, however, we had the data from 3 RRs to work from and the data from the largest did NOT support the theory, I would try to alter the theory to match the data or accept the theory at great risk. Unfortunately, this is the case when one applies the theory to the 1949 UP data. IMO. If what I am saying is incorrect, please let me know.

The thread will remain open for replies. I would also remind the members that this is a discussion within the scope of the group. It should also be conducted within the rules of the group...meaning in a civil manner.

Well, I don't have the data from the Southern train but I do have Tim's
results from his 1947 UP data and, of course, I have the 1949 UP
data...which I copied to Tim and he analyzed. The theory worked fairly well
with the 1947 data...777 box cars...but the 1949 data...almost twice the
size...1325 box cars...in Tim's words "blew it all to hell".

If, however, we had the data from
3 RRs to work from and the data from the largest did NOT support the theory,
I would try to alter the theory to match the data or accept the theory at
great risk. Unfortunately, this is the case when one applies the theory to
the 1949 UP data. IMO. If what I am saying is incorrect, please let me know.
Mike,

I think Tim may have overstated the divergence of the 1949 data from the theory <G>. The difference in 1949 is those annoying extra SP cars... with the exception of those, the national fleet model is still pretty accurate, correct?

If you did not have any knowledge of the 1949 wheel reports, how would you set up your fleet?

I have looked high and low for wheel reports, or for that matter even tower sheets from my chosen period of June 1944 for Columbia PA. I have found nothing. Nada. Photos are rare due to wartime concerns. Films do exist, but mostly on other parts of the PRR (they tend to confirm the Gilbert fleet model and the Brock NP model). So, what should my starting point be?

As I see it, there are two major competing models being espoused (for boxcars)

1) The loco-regional interchange model. This model says that by virtue of proximity, connecting road percentages will be higher than roads that are further away. FWIW, that does not mean that if distant road X has 10% of the national fleet and close road Y has 2% that there should be more Y cars than X cars, but only that the % of Y should be above 2% and by default, the % of X must be below 10%. No data sets have been offered to support this model.

2) The national fleet model. This model says that because boxcars were freely interchanged that the % of a given car seen on any given railroad over time should approximate the % of the car in the national fleet. Model 2 is supported by several small data sets.

Additionally, there are data sets such as the 1949 Fraley and the Potomac yard set that do not appear to match EITHER model exactly. In both these cases, one to several roads appear to be "out of kilter" compared to the rest.

Railroad historians can argue the whys forever, but I have a layout to populate, so, what does this mean to us as modelers?

For you (Mike), is 1953 like either 1949 or 1947 or is it different still? Does the fact that there's a war on make a difference? Do we just throw up our hands and say it is unknowable, there is no perfect model and I'll just put anything I damn well care to on my layout? (we'll call that model 3 <G>) I've always held that the national fleet model is a STARTING place and that arriving at a realistic fleet is an iterative process. It's not that the model is wrong and useless, it is that 1949 on this line is a case where the model needs to be adjusted. Both 1947 and 1949 follow the national fleet numbers for most of the fleet, so why not start there and then perhaps increase the SP numbers slightly, maybe 50% over expected numbers? To model this line using the loco-regional model might result in the correct % of SP cars, and the incorrect % of just about every other road. I don't view that as a logical solution ;^)

My point is that while the national fleet model may not predict with 100% accuracy, it is a STARTING place, and in the absence of any other data, provides you with a reasonable representation of the steam era (based on the data sets). If someone is lucky enough to develop additional data sources, then those can be used to modify the national fleet model to represent some of those local deviations. Situations that might be included would be some that have been named, such as grain rush season and areas dense with automobile manufacturers (and hence assigned service cars).

Bruce, I've decided to use model #4, which is I'll select from the
universe of box cars built before my modeling date of November 1941,
and painted box car red (running for shelter).

"I think Tim may have overstated the divergence of the 1949 data from
the theory <G>. The difference in 1949 is those annoying extra SP
cars... with the exception of those, the national fleet model is
still pretty accurate, correct?"

Not really. There is a similar problem with Milw, CB&Q and C&NW cars.

"If you did not have any knowledge of the 1949 wheel reports, how
would you set up your fleet?"

Hmmm. Good question. The trouble is, I have the Big Boy Collection video which shows 4 complete UP trains. One has those 36 SP box cars...

"I have looked high and low for wheel reports, or for that matter even
tower sheets from my chosen period of June 1944 for Columbia PA. I
have found nothing. Nada. Photos are rare due to wartime concerns.
Films do exist, but mostly on other parts of the PRR (they tend to
confirm the Gilbert fleet model and the Brock NP model). So, what
should my starting point be?"

"As I see it, there are two major competing models being espoused (for
boxcars)

1) The loco-regional interchange model. This model says that by
virtue of proximity, connecting road percentages will be higher than
roads that are further away. FWIW, that does not mean that if
distant road X has 10% of the national fleet and close road Y has 2%
that there should be more Y cars than X cars, but only that the % of
Y should be above 2% and by default, the % of X must be below 10%.
No data sets have been offered to support this model."

The model that I prefer is a modified Nelson/Gilbert model which states that RRs with "significant interchange" should have from 2 to 2.5 times the national %. The 1949 Fraley supports this scenario. "Significant interchange" would be one in which one RR terminates and a very high % of its traffic continues on another. Examples are UP/SP at Ogden, UT, UP/Milw and UP/CB&Q and UP/C&NW at Council Bluffs/Omaha. Add GN/CB&Q and NP/CB&Q in Minneapolis/St. Paul and SP/RI and SP/SSW. I would not include UP/Mopac at Omaha or the other Omaha RRs. There might be others as well. I would not include RRs that simply connect as in the case of Southern and SP and Mopac at New Orleans, PRR and Mopac at St. Louis, SR and B&O at St. Louis, SR with N&W at Cincinnati. FEC and SR, SAL and RF&P would need study.
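
A minimal sketch of how such a weighting could be turned into car counts for a layout fleet. The function name `weighted_fleet`, the fleet size of 200, and every share except the "say 4%" for SP are placeholders for illustration, not figures from any wheel report. It also assumes the boosted shares are renormalised so the total stays at 100%, which pushes every unweighted road slightly below its national %.

```python
def weighted_fleet(national_share, weights, fleet_size):
    """Return expected car counts per road for a model fleet."""
    # Multiply each road's national share by its weight (1.0 if none given).
    raw = {road: share * weights.get(road, 1.0)
           for road, share in national_share.items()}
    total = sum(raw.values())
    # Renormalise so the adjusted shares again add up to the whole fleet.
    return {road: fleet_size * value / total for road, value in raw.items()}


# Placeholder shares for illustration only (they sum to 1.0).
national_share = {"SP": 0.04, "MILW": 0.03, "CB&Q": 0.03,
                  "C&NW": 0.03, "all others": 0.87}
# Hypothetical weights: 2x for roads with "significant interchange".
weights = {"SP": 2.0, "MILW": 2.0, "CB&Q": 2.0, "C&NW": 2.0}

fleet = weighted_fleet(national_share, weights, fleet_size=200)
for road, cars in fleet.items():
    print(f"{road:10s} {cars:6.1f}")
```

Changing a weight to 2.5, or dropping a road from `weights`, shows how sensitive the rest of the fleet is to the choice.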

"2) The national fleet model. This model says that because boxcars
were freely interchanged that the % of a given car seen on any given
railroad over time should approximate the % of the car in the
national fleet. Model 2 is supported by several small data sets.

Additionally, there are data sets such as the 1949 Fraley and the
Potomac yard set that do not appear to match EITHER model exactly.
In both these cases, one to several roads appear to be "out of
kilter" compared to the rest.

Railroad historians can argue the whys forever, but I have a layout
to populate, so, what does this mean to us as modelers?

For you (Mike), is 1953 like either 1949 or 1947 or is it different
still? Does the fact that there's a war on make a difference? Do we
just throw up our hands and say it is unknowable, there is no perfect
model and I'll just put anything I damn well care to on my layout?
(we'll call that model 3 <G>)"

No. Like I say, I prefer a modified Nelson/Gilbert...at this time.

"I've always held that the national
fleet model is a STARTING place and that arriving at a realistic
fleet is an iterative process. It's not that the model is wrong and
useless, it is that 1949 on this line is a case where the model needs
to be adjusted."

My point.

"Both 1947 and 1949 follow the national fleet numbers
for most of the fleet, so why not start there and then perhaps
increase the SP numbers slightly, maybe 50% over expected numbers?
To model this line using the loco-regional model might result in the
correct % of SP cars, and the incorrect % of just about every other
road. I don't view that as a logical solution ;^)"

Neither do I. There are a lot of RRs that went into Omaha. I would choose to raise the number of cars of only 3...Milw, C&NW and CB&Q.

"My point is that while the national fleet model may not predict with
100% accuracy, it is a STARTING place, and in the absence of any
other data, provides you with a reasonable representation of the
steam era (based on the data sets). If someone is lucky enough to
develop additional data sources, then those can be used to modify the
national fleet model to represent some of those local deviations.
Situations that might be included would be some that have been named,
such as grain rush season and areas dense with automobile
manufacturers (and hence assigned service cars)."

We agree. Unfortunately, I'm beginning to analyze a Fraley 1956 book. It will probably show Maine Central cars in great numbers. <G>. It DOES include one very surprising train. Only UP could...

"Bruce, I've decided to use model #4, which is I'll select from the
universe of box cars built before my modeling date of November 1941,
and painted box car red (running for shelter)."

I like that one, Walter. Uh...which box car red?<G>

I will try to keep my response to this afternoon’s postings in the spirit of the following from Mike Brock.

> The thread will remain open for replies. I would also remind the members that this is a discussion within the scope of the group. It should also be conducted within the rules of the group...meaning in a civil manner.

Tony said
> “Malcolm has decided to oppose the idea, no matter what he's told. For his sake, and for the others in his camp, I've stopped trying to explain further.”

Trying to observe the last four words of the above quote from Mike, I can sympathize with Tony’s frustration in having a theory apparently dear to his heart logically questioned after its being uncontested for several years. But ……. no point going further absent a summary of information buried in the archives about what Tim G actually did calculate.

One other comment that should have a response

> 1) The loco-regional interchange model. This model says that by
virtue of proximity, connecting road percentages will be higher than
roads that are further away. ………..No data sets have been offered to support this model.

There are no data sets that truly support either model. Because railroads didn’t keep counts of foreign cars on line by ownership, the necessary data sets probably never existed. So the choice is between

a) a model that begins with a distribution in proportion to ownership, supported by a minuscule set of observed data not representative of any whole railroad.

b) a region/distance based model based on purely qualitative factors that are known to have influenced cars to move towards their home railroads.

My basic contention is that there is no reason to believe that some unknown factors negated the five factors that I mentioned in a way that caused cars to distribute themselves in proportion to ownership.

Walter Clark writes:

"Bruce, I've decided to use model #4, which is I'll select from the
universe of box cars built before my modeling date of November 1941,
and painted box car red (running for shelter)."

I like that one, Walter. Uh...which box car red?<G>

Walter Clark's voice echoes from the bomb shelter "Mike, it's got to
be the correct box car red. You know that."

I might just allow one box car in something other than box car red.
How about one of the MKT yellow cars to carry bananas?

I haven't been following this thread closely since I model a point-to-point
short line railroad which had interchanges with both the SP and ATSF; as
such, there wasn't any through freight and all foreign cars had to be handed
over from/to one of these two interchanges. From a number of different
factors, I have concluded that the YV was more closely affiliated with the
SP rather than the ATSF (Pullman interchange with the SP and not the ATSF in
later years, leasing of SP engines when needed, leasing an SP diner each
summer, etc.) From a VERY limited number of resources, it is still obvious
to me that there were more SP foreign cars on the YV than any other
railroad, with ATSF cars second. All of the box cars tended to be from
western railroads...SP, ATSF, WP, NP, etc. Tank cars were, from my limited
information, all UTLX tank cars. Refrigerator cars were limited to PFE and
some ATSF. But what prompted me to contribute to the confusion radiating
from this thread is a statement from Malcolm:

Because railroads didn't keep counts of foreign cars on line by
ownership, the necessary data sets probably never existed. So
the choice is between...
I have photocopies of some ledger sheets prepared by the YV for the
Association of American Railroads for a report entitled "Empty Cars on Hand
as of Specific Date" which lists the empty cars on the YV as of the 15th of
the month and the end of the month by OWNER and CAR TYPE. For example, SP
box car, ATSF box car, UP auto car, etc. The information I have is very
limited....only from December 31, 1936 to March 31, 1937. But apparently
railroads did track and report this stuff, at least at one time....

Trying to observe the last four words of the above quote from Mike, I
can sympathize with Tony’s frustration in having a theory apparently
dear to his heart logically questioned after its being uncontested for
several years. But ……. no point going further absent a summary of
information buried in the archives about what Tim G actually did
calculate.
Having failed to make a cogent argument, Malcolm is resorting to
insult. I've replied to him in more detail off-list and will have
nothing further to say to him ON this list.

I have photocopies of some ledger sheets prepared by the YV for the Association of American Railroads for a report entitled "Empty Cars on Hand as of Specific Date" which lists the empty cars on the YV as of the 15th of the month and the end of the month by OWNER and CAR TYPE. .... The information I have is very limited....only from December 31, 1936 to March 31, 1937. But apparently railroads did track and report this stuff, at least at one time....

No question of the feasibility of this kind of report for a small railroad. What was the total number of cars on line Jack ? In the case of the YV, it would have been easy to keep such a record just from the single daily interchange report that had all cars coming on the railroad for that day.

In contrast, consider a railroad such as the PRR or ATSF with thousands of cars coming through hundreds of interchanges every day. Consider the fact that all of the paper with that information flowed into the system car accountant's office with time lags of days to weeks. Then imagine the massive clerical task to do the count. No computers around to help. Not even photocopiers !

It would have been a huge expense to get information that would have been too old to use for any kind of management decision making.

No question of the feasibility of this kind of report for a
small railroad. What was the total number of cars on line Jack ?
In the case of the YV, it would have been easy to keep such a
record just from the single daily interchange report that had all
cars coming on the railroad for that day.

In contrast, consider a railroad such as the PRR or ATSF with
thousands of cars coming through hundreds of interchanges every
day. Consider the fact that all of the paper with that
information flowed into the system car accountant's office with
time lags of days to weeks. Then imagine the massive clerical
task to do the count. No computers around to help. Not even
photocopiers !

It would have been a huge expense to get information that would
have been too old to use for any kind of management decision making.
But wouldn't that information be needed every day to pay for per diem
charges?

But wouldn't that information be needed every day to pay for per diem
charges? Jack Burgess

Yes, on the PRR at least, each interchange point had a record (I have some)
of cars set-out and received, and the time, and that data was assembled for
forwarding to the finance offices for payment of per diem charges. Units
train blocks were provided in sum on that same sheet.

I had the opportunity to go through numerous indexes at the PRR archives
earlier this year, and was, as always, unable to find any of this
information.

I have been told many times that the Business Management folks in the PRR
kept many of these records, for use in business planning, which seems
obvious. But, they also destroyed the raw data (at some location), from what
I was told, since that data would be a valuable tool to competitors, or if in
the wrong hands, could be used to influence stock prices, if assembled
correctly. That was why train consists were destroyed, and why us PRR guys
have only a few examples.

If the PRR had a policy of destroying lists of who got what, how much, and
when, we may never be able to answer some of these questions.

Oh, I also wrote a multi-piece article on what I did for my timeframe and
locale, in TKM, and there was only one person even vaguely interested. I
think I could have better spent my time (hundreds of hours) building more
models!

There are no data sets that truly support either model. Because railroads
didn’t keep counts of foreign cars on line by ownership, the necessary data
sets probably never existed. So the choice is between

I would suggest that this data was collected to some extent by the
accounting departments. How else would they determine the per diem payments to the
various roads? Yes, I recognize that per diems were frequently offset by Road
A with what Road B owed Road A and only the balance actually paid. But
nevertheless the number of cars on property each day needed to be known.

Rich Orr

Yes, Jack you are correct and those records for all railroads were forwarded to the car accounting offices. Most of the per diem was a paper exchange, but not all of it. Then of course there were the privately owned cars that were due their share as well.

At 6 AM each morning the entire yard was checked and recorded on a
special form. Two copies were sent to the accounting department and
two copies to the car department and one filed in the files where
they were recorded. This was done on the whole RR and was done on The
Un Pac and John Santa Fe. It is my opinion that this was done on
every RR in the good old US of A. So at 6 AM every freight car in the
good old US of A was on record. So they not only needed to be known,
they were known.
Which is well and good to know, but where are all those mountains of reports now when we need them?

Which is well and good to know, but where are all those mountains of reports now when we need them?
They're up at high altitude, helping warm the planet <g>.

Hi Folks

A week or so ago I posted a message about a "random train" Excel
spreadsheet (message #75066). Briefly, the spreadsheet generates a list
of 40 boxcars chosen at random from a universe of cars which approximates
the U.S. boxcar fleet ownership in 1949. Experienced Excel users can
use whatever universe they would like.

In message #75229 I described how I automated the spreadsheet by
running it 100,000 times. That is, it created 100,000 randomly
generated car lists, with 40 cars per list. The main purpose of this
simulation was to test whether the random train spreadsheet was
operating correctly; if it was, then over the long run, the average
proportions of the randomly generated cars should tend to the
proportions of the universe (they did). The simulation also recorded
the maximum number of cars generated during any of the 100,000
iterations (for each road). For example, the number of NYC cars in
the three lists in message #75066 was 4, 1 and 6. During the
simulation, there was at least one car list with 14 NYC cars. (In a
list of 40 cars that is proportional to 1949 national averages, one
should expect 4 from the NYC.)

Subsequently, I have refined the simulation so as to record the
entire distribution of cars for each road during a simulation of
100,000 car lists. For example, the national proportion of New Haven
boxcars was less than 1% in 1949; most random trains of 40 cars would
not have any NH cars, but sometimes there will be one or more. The
next list shows the frequency distribution of 0, 1, 2, … NH cars
generated by the simulation of 100,000 car lists (71,508 car lists
had no NH cars, and so on):

0___71,508
1___24,068
2___3,963
3___421
4___37
5___3
These numbers can be converted to probabilities by dividing by
100,000. Thus the probability of a car list with 40 cars and none
from the NH is .715, 1 car = .241, 2 cars = .040, etc.
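
For anyone without the spreadsheet, a rough Python sketch of the same kind of simulation follows. It is not the Excel workbook itself: it tracks only NH versus non-NH cars, uses the NH share of 6,012 / 719,349 given elsewhere in this message, and runs 20,000 lists rather than 100,000 to keep it quick. The function name and the seed are arbitrary choices.

```python
import random
from collections import Counter


def simulate_nh_counts(n_lists, cars_per_list=40, p_nh=6012 / 719349, seed=1):
    """Count how many simulated car lists contain 0, 1, 2, ... NH cars."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(n_lists):
        # Each car is drawn independently (sampling with replacement).
        nh = sum(1 for _ in range(cars_per_list) if rng.random() < p_nh)
        counts[nh] += 1
    return counts


n_lists = 20_000
counts = simulate_nh_counts(n_lists)
for k in sorted(counts):
    # Dividing a frequency by the number of lists gives a probability.
    print(f"{k} NH cars: {counts[k]:6d} lists, probability {counts[k] / n_lists:.3f}")
```

With this many lists the share with no NH cars should land near the .715 quoted above, though the exact counts depend on the seed.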

After examining the results of this simulation, it seemed to me that
the process of random car selection was much like the ball and urn
problem from statistics classes of years ago: An urn has some red and
white balls of a known proportion.
Reach in and grab a ball; if red, then record it as a "success", and
if white as a "failure"; replace the ball then repeat the process for
a certain number of times, say 40. What is the probability of 0
successes? Exactly 1 success? Exactly 2, 3, … ? These
probabilities are given by the binomial distribution. The next list
shows the binomial distribution for 0, 1, 2, … (multiplied by
100,000) for 40 trials and a "probability of success on each
trial" of .0084 = .84% (this is the national proportion of NH boxcars in
1949 that I used for my simulation).
0___71,483
1___24,098
2___3,960
3___423
4___33
5___2

Note the close correspondence of the simulation and the binomial
distributions in the two lists. This and the examination of other
simulation results convinced me that my process of random car
selection could be effectively modeled by the binomial distribution
(I also compared the Poisson distribution). If anyone would like a
copy of my simulation results, contact me off list.

To use the binomial distribution, all you need to specify is the
number of trials (read boxcars in a train) and the probability of
success on each trial (read proportion of cars of a particular
ownership or type). The proportions of cars can be national,
regional, or any other proportion you wish to use. You can make the
calculations with the aid of tables, programs such as Excel, or any
of several on-line calculators.
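
As one more option besides tables and Excel, here is a short Python sketch of the binomial calculation. It assumes the unrounded NH share of 6,012 / 719,349 rather than .0084. With that value the results should come out very close to the binomial list for the NH above. The function name `binomial_pmf` is just a label chosen here.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 40               # boxcars in the train (number of trials)
p = 6012 / 719349    # NH share of the 1949 national boxcar fleet

for k in range(6):
    # Scale by 100,000 to compare with the simulation frequencies.
    print(f"{k}___{round(100_000 * binomial_pmf(k, n, p)):,}")
```

Swap in any other train length or proportion (national, regional, or weighted) to get the distribution for a different road.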

I should point out a key difference between my simulation model and
the real world: Just like cards, a train "has memory". This means
that once a car is removed from the population and placed in the
train, it cannot be placed again in the same train. Once the first
NH car is chosen with a probability of success on each trial of
6,012 / 719,349 (NH boxcars divided by national boxcars, 1949) the
probability of success on each trial for the next one changes to
6,011 / 719,348. This is the difference between sampling with
replacement (my simulation) and sampling without replacement (real
world). The binomial distribution also assumes sampling with
replacement.
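
To get a feel for how much the "memory" matters, the sketch below compares the binomial (with replacement) against the hypergeometric distribution, which is the standard model for sampling without replacement. It assumes the 1949 figures quoted above (6,012 NH boxcars out of 719,349) and a 40 car train. The function names are illustrative only.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Exactly k successes in n draws WITH replacement."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeom_pmf(k, n, K, N):
    """Exactly k successes in n draws WITHOUT replacement,
    from a population of N cars of which K are successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


N, K, n = 719_349, 6_012, 40  # national boxcars, NH boxcars, train length

for k in range(6):
    with_repl = binomial_pmf(k, n, K / N)
    without_repl = hypergeom_pmf(k, n, K, N)
    print(f"{k} NH cars: with replacement {with_repl:.6f}, "
          f"without {without_repl:.6f}")
```

Because a 40 car train is tiny next to the national fleet, the two columns should agree to several decimal places. The difference would matter only when the train is a sizable fraction of the population being sampled.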

One use for the binomial distribution is to test real world examples
for randomness. Again reaching back many years to my statistics
classes, I am reminded of the "null hypothesis": A researcher
discovers something interesting and suspects it is not merely
random. The null hypothesis is that it IS random, while the
alternative hypothesis is that it is not. The null hypothesis is
assumed to be true unless the researcher is 95% or 99% confident that
it is false (these are typical confidence levels).

The UP train with the large number of SP boxcars is an example. My
understanding is that this train had some 90 boxcars, 36 of which
were SP. In order to calculate the binomial distribution, we need to
know the number of "trials" (i.e., cars in the train, say 90) and
the "probability of success on each trial" (i.e., the proportion of
SP cars in the national fleet, say 4% = .04). From this you can find
the probability of a train with exactly 0, 1, 2, …, 36, … SP cars.

Or maybe not: It turns out that the probability of 36 or more cars
is so low that Excel cannot calculate it. For example a 90 boxcar
train with a "mere" 21 or more SP boxcars would occur only once in
every 19.5 billion trains. Conclusion: This train could not have
occurred by chance alone. (A friend of mine who has lived in Laramie
all his life – in particular the 1940s and 50s – describes these cars
as a "transfer run".)

Suppose that the 4% number is wrong; Tim Gilbert's data lists 4.9%
SP-Pac ownership in 1956. Let's be generous and make it 5%. Then a 90
car train would have 21 or more SP boxcars once in every 356 million
trains. (Tim's data are at "4060totalboxcarsUSownership.xls" in the
files section of this list.)

Rather than using the proportion of the national fleet, how about
giving more "weight" to SP cars on the UP because of the "connection"
between the two railroads, or because of nearness or whatever? Let's
say we "weight" the SP cars by a factor of two (Mike Brock suggests a
weight of 1.5). To apply the desired weight, multiply it by the
national proportion: e.g., 2 * 5% = 10%. Using a "probability of
success on each trial" of 10% and a 90 boxcar train we find that
Excel still cannot calculate it because the probability is too low (a
train with "only" 31 or more SP boxcars would occur once every 3
billion trains). Conclusion: No reasonable weighting will reproduce
the train actually observed – we must reject the null hypothesis.
That is, the observed train composition is not the result of chance
alone.
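
The tail probabilities that Excel gave up on can be summed directly in Python. The sketch below assumes a 90 boxcar train and the three proportions discussed above (4%, 5%, and 5% with a weight of 2). The function name `prob_at_least` is a label chosen here. In this function the once-in-19.5-billion and once-in-356-million figures correspond to k = 21, that is, strictly more than 20 SP cars.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(k, n + 1))


n = 90          # boxcars in the train
observed = 36   # SP boxcars seen in the train

scenarios = [("4% national share", 0.04),
             ("5% national share", 0.05),
             ("5% share, weight 2", 2 * 0.05)]

for label, p in scenarios:
    tail = prob_at_least(observed, n, p)
    print(f"{label}: P({observed} or more SP cars) = {tail:.3e}")
    # Reject the null hypothesis of chance alone at the 99% level
    # when the tail probability falls below 0.01.
    print("  reject chance alone at 99%:", tail < 0.01)
```

The same function can be pointed at any other train and road, for example `prob_at_least(14, 40, 0.10)` for a 40 car list with 14 or more NYC cars at a share of roughly 10%.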

I suspect that if we begin applying the binomial distribution to real
world data we will find many cases in which we should reject the null
hypothesis of random car assignment. This does not imply that the
random assignment model should be ignored, of course; it simply means
that other factors (real world consists, photos, personal choice,
etc.) should also be considered. For example, we may find cases such
as transfer runs or large shippers where it makes sense to treat
blocks of cars as a unit and to assign these blocks, rather than the
individual cars, to trains.

Subsequently, I have refined the simulation so as to record the
entire distribution of cars for each road during a simulation of
100,000 car lists. [ snip snip ]
Larry, I think there is a certain population on this mailing list
that finds your analysis worthwhile and interesting, and there
also is a population that hates it or dismisses it or... whatever.

Myself, I think there are TWO issues for model railroads (1)
the assignment (waybilling) of cars and (2) the makeup of
trains.

For a 1-train a day model RR like Jack's YV, train makeup isn't
an issue. Each train is a perfect reflection of the distribution of
car assignments.

For a 35-train a day model RR like Mike's, individual trains can
be very different from one another, reflecting different origins,
destinations, connections, schedules, etc.

