Thursday, July 29, 2021

Polling Paradox: the Crowded Bus Paradox

I've found some mention of a "Crowded Bus Paradox" online, but it's always been poorly explained. Additionally, discussion of its impact on polling data is seemingly non-existent. I want to outline an example to help explain why I feel this is important.

Let's imagine a fictional city. In this city is a train line; perhaps a subway, perhaps some form of LRT, or perhaps commuter rail. Regardless, on this train line is a station, the one we are using in our little example. At this station is a bus line. On this bus line, a bus departs every 5 minutes, right on schedule.

People walk to the station to catch the bus: one every minute, to be exact. The train also uses this station. It arrives and departs at 10-minute intervals, and on each train are 5 people who use the bus line we are looking at.

Now, what will happen? Let's assume it's 2:00pm.

At 1:56pm, a passenger walks up. Another at 1:57, 1:58, and 1:59. At 2:00pm a 5th passenger walks up. At the same time, a train arrives, and 5 people get off the train. Also at the same time, the bus is ready to depart, and so 10 people board the bus. 

What happens next?

Well, we continue to get 5 people walking in, so the next bus leaves with 5 people on board. Remember, no train arrives at 2:05pm; the next one arrives at 2:10pm.

Next we have another bus with 10 passengers departing, followed by one with 5, and one with 10, and one with 5, and so on to infinity. 
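If you want to check the schedule yourself, here is a minimal Python sketch of the setup described above. The function name, its parameters, and the choice to count minutes from 1:55pm (so that minute 5 is 2:00pm) are my own assumptions for illustration; the numbers are the ones from the example.

```python
def bus_loads(minutes, bus_every=5, train_every=10, first_train=5,
              walkers_per_minute=1, riders_per_train=5):
    """Return the passenger count of each departing bus.

    Assumption: minutes are counted from 1:55pm, so minute 5 is 2:00pm.
    """
    loads = []
    waiting = 0
    for minute in range(1, minutes + 1):
        waiting += walkers_per_minute      # one person walks up each minute
        if minute >= first_train and (minute - first_train) % train_every == 0:
            waiting += riders_per_train    # a train drops off its bus riders
        if minute % bus_every == 0:
            loads.append(waiting)          # the bus leaves with everyone waiting
            waiting = 0
    return loads


print(bus_loads(30))  # [10, 5, 10, 5, 10, 5]
```

Run it for as many minutes as you like and the pattern keeps alternating between 10 and 5.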


So, a statistics agency hires a polling firm to get some data. They ask the bus drivers, "How many people are on your bus?" Averaging the answers, they get 7.5.

Now, they ask the passengers, "How many people are on your bus?" Averaging the answers, they get 8.3.

But wait! How? Where did these additional people come from?


Up to now you've probably been looking at this from a driver's point of view. 10, then 5, then 10, then 5. But look at this from a passenger's point of view. 

You can split this 10-5-10-5-etc thing into two groups. One group with 5 passengers, and one group with 10. This is a total of 15 people. Of those 15, 10 of them get on buses with 10 people, and 5 on buses with 5 people. This means that 10 people see 10 people on the bus, and 5 see 5 people on the bus.

 10+10+10+10+10+10+10+10+10+10+5+5+5+5+5=125

125/15 = 8.3333333
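The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the two ways of averaging; the variable names are mine, and it assumes every driver and every passenger answers the poll with the true passenger count.

```python
loads = [10, 5]  # one full cycle of buses from the example

# Drivers: every bus counts once.
driver_average = sum(loads) / len(loads)

# Passengers: a bus with n people on it gets reported n times.
passenger_answers = [n for n in loads for _ in range(n)]
passenger_average = sum(passenger_answers) / len(passenger_answers)

print(driver_average)               # 7.5
print(round(passenger_average, 1))  # 8.3
```

The only difference between the two averages is who gets counted: the drivers weight each bus equally, while the passengers weight each bus by how many people are on it.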

Let's assume for the sake of argument that 8 people is the most that can sit on our tiny little bus.

If you were to look at the totals provided by the drivers, you'd find that only half the buses are overcrowded. However, if you ask the passengers, two thirds of them say they are on an overcrowded bus.

Now we have 50.0% vs 66.7%. 
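Here is a small Python sketch of where those two percentages come from. The names are my own, and it assumes a bus counts as overcrowded whenever it carries more than the 8 people who can sit.

```python
loads = [10, 5]  # one full cycle of buses from the example
seats = 8        # assumed seating limit from the example

crowded = [n for n in loads if n > seats]

# Share of buses that are overcrowded (the drivers' view).
bus_share = len(crowded) / len(loads)

# Share of passengers riding an overcrowded bus (the passengers' view).
passenger_share = sum(crowded) / sum(loads)

print(f"{bus_share:.1%}")        # 50.0%
print(f"{passenger_share:.1%}")  # 66.7%
```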


So we reach radically different conclusions from the exact same data, depending only on how you look at it.



So, why am I posting this? It is a bit unusual, given what I usually post about. There are a few reasons. One is that from time to time I see polls asking people to estimate things about others, and the results of those polls are always far off base. Part of the reason is related to this paradox. It also explains why people can experience things (like an overcrowded bus) that the data (how many buses are overcrowded) suggests are uncommon, at rates that seem to imply the data is faulty. Lastly, I have a few posts I hope to make in the future that may require referencing back to this post and this paradox. As such, I felt it was good to 'get it out of the way' now.

I'm still going to post a regular weekly update on Monday, but I also hope to make additional posts based on some stuff I've been working on that is not directly related to the blog; in particular, a look at past elections, especially in Germany.

