Precinct summability of IRV

last19digitsofpi

I think the argument is that if you had 20 candidates, then a malicious agent "Mallory" could tell "Alice" the following:

I want you to vote [Adolf Hitler, Josef Stalin, Benito Mussolini, Joe Smith, John Doe, Jane Brown, Jack Random, Jeff Specific, Jen Order, Jake Candidates, Winston Churchill, Franklin Roosevelt]. If I see that precise order in your precinct, then I will {...}, else I will {...}.

It's also been argued that this is difficult to verify by hand in the case of a recount. However, if you're just trying to verify that the IRV (or seqPAV or RRV) winner is correct, you can do it in (# rounds) serial counts (meaning each precinct has to do a separate count for each round, but they can be done in any order) if the supposed order of elimination is known. (If there happens to be a disparity then you have a problem, but that's Somebody Else's Problem.)
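A minimal sketch of that round-by-round check, assuming each precinct's ballots are available as tuples of candidate names in preference order. The function names `round_count` and `verify_elimination_order` are made up for illustration, and ties for last place are simply accepted rather than resolved:

```python
from collections import Counter

def round_count(ballots, eliminated):
    """One pass over a precinct's ballots: credit each ballot to its
    highest-ranked candidate who has not been eliminated."""
    tally = Counter()
    for ranking in ballots:
        for cand in ranking:
            if cand not in eliminated:
                tally[cand] += 1
                break  # a ballot with no surviving candidate counts for no one
    return tally

def verify_elimination_order(precincts, candidates, claimed_order):
    """Check a claimed elimination order, one summed count per round.
    Returns the winner if every round agrees, otherwise None."""
    eliminated = set()
    for loser in claimed_order:
        total = Counter()
        for ballots in precincts:  # precinct counts can be done in any order
            total += round_count(ballots, eliminated)
        remaining = [c for c in candidates if c not in eliminated]
        # The claimed loser must have no more votes than anyone still standing.
        if any(total[c] < total[loser] for c in remaining):
            return None  # disparity: the claimed order does not hold
        eliminated.add(loser)
    remaining = [c for c in candidates if c not in eliminated]
    return remaining[0] if len(remaining) == 1 else None

precincts = [
    [("A", "B", "C"), ("A", "B", "C"), ("B", "C", "A")],
    [("C", "B", "A"), ("C", "B", "A"), ("C", "A", "B"),
     ("B", "C", "A"), ("A", "C", "B")],
]
winner = verify_elimination_order(precincts, ["A", "B", "C"], ["B", "A"])
```

Each precinct only ever reports one small tally per round, which is the "(# rounds) serial counts" idea. A `None` result is the "Somebody Else's Problem" case.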

This is a topic I want to follow because it will make or break the... morality of my much-desired multiwinner Approval research.

rob

@last19digitsofpi said in Precinct summability of IRV:

I think the argument is that if you had 20 candidates, then a malicious agent "Mallory" could tell "Alice" the following:

They could do that today with any RCV election; they just have to wait until the final ballots are published (which they are in most elections, such as the one I showed).

Note that there is no reason they must publish the ballot data by precinct, whether we are talking about publishing it at some later date, or as results come in.

Finally, malicious agent Mallory could just have them vote by mail, where it is easy to monitor how someone is voting.

@last19digitsofpi said:

It's also been argued that this is difficult to verify by hand in the case of a recount.

Not sure I understand how that applies. The recount is not affected by the precinct submitting results as they come in. They still have all the ballots they can use for the recount.

last19digitsofpi

@rob said in Precinct summability of IRV:

Not sure I understand how that applies. The recount is not affected by the precinct submitting results as they come in. They still have all the ballots they can use for the recount.

Let's say there are C1V, AV, and IRV elections, all of which are disputed.
The C1V election has a precinct that reported

A 571
B 482
C 6144
D 16
E 3
F 8

This can be easily checked by hand: sort all the ballots into piles for each candidate.

Now, consider an Approval Voting precinct:

G 883
H 476
I 340
J 181
K 1105

To verify this by hand, you'd need to check all ballots 5 times, since any individual ballot could approve multiple candidates.

Finally, consider an IRV precinct:

L,M,N,O 366
L,M,O,N 68
L,N,M,O 15
L,N,O,M 70
L,O,M,N 4
L,O,N,M 53
etc.

That would involve sorting the ballots into 24 piles to check the totals, or 4 passes through the ballots to check the winner, but if the latter check fails you have a crisis. (For example, suppose the "election night" results are L=50k, M=76k, N=60k, O=L+6. L is eliminated which ends up giving, say, O the victory after N is eliminated. But a recount finds an extra dozen votes for L, so O should have been eliminated first!)

(Actually... verifying the totals may not be as hard as I think. After the L-top ballots are separated, you can verify they add to 366+68+15+70+4+53. Then within the L-top ballots you can verify that 366+68 rank M second, 15+70 rank N second, and 4+53 rank O second.)
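The nested check in that last paragraph can be sketched as prefix sums over the reported counts. This is only an illustration using the six L-first rows from the example precinct; `prefix_totals` is a made-up helper name:

```python
from collections import Counter

# The six L-first rows reported by the example IRV precinct.
reported = {
    ("L", "M", "N", "O"): 366,
    ("L", "M", "O", "N"): 68,
    ("L", "N", "M", "O"): 15,
    ("L", "N", "O", "M"): 70,
    ("L", "O", "M", "N"): 4,
    ("L", "O", "N", "M"): 53,
}

def prefix_totals(type_counts, depth):
    """Expected size of each pile after sorting ballots by their
    first `depth` rankings."""
    totals = Counter()
    for ranking, n in type_counts.items():
        totals[ranking[:depth]] += n
    return totals

first_choice_piles = prefix_totals(reported, 1)   # the single L pile
second_choice_piles = prefix_totals(reported, 2)  # L-pile split by second choice
```

Here the L pile should hold 576 ballots, and splitting it by second choice should give 434 for M, 85 for N, and 57 for O. A hand count can stop at whatever depth first shows a mismatch.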

Jack Waugh

This would seem to counter a contention @Sass has been emphasizing: that IRV would feed suspicions because the public wouldn't be able to check whether the official tally was proper.

rob

@last19digitsofpi Hey 8451058495548583234,

I'm not entirely following you.... I think you are speaking of a different thing. You are comparing it to Approval voting, which I will admit allows for much more straightforward precinct summing as well as hand tabulation.

I'm not sure what you mean by "sort all ballots into piles". This might be a hand tabulation step, but then again it seems to me that you shouldn't have to actually physically sort them, since now they have to be stored separately, etc. Doesn't work so well when there are hundreds of thousands of ballots at a hand recount.

I'm mostly talking about how the precincts can, during and immediately following an election, send all necessary data so that we can know who is winning, and how close the other candidates are, without a long delay as is typical currently with IRV.

Notice that some ranked ballot elections could have precincts submit pairwise matrices (numCandidates * (numCandidates-1) numerical values), and that counts as "all necessary data."
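A rough sketch of what a precinct would compute and how the central count would combine it. The function names are hypothetical, and treating every ranked candidate as beating every unranked one is an assumption about truncated ballots that real rules may handle differently:

```python
def pairwise_matrix(type_counts, candidates):
    """m[i][j] = number of ballots ranking candidates[i] above candidates[j]."""
    idx = {c: i for i, c in enumerate(candidates)}
    n = len(candidates)
    m = [[0] * n for _ in range(n)]
    for ranking, count in type_counts.items():
        ranked = set(ranking)
        for pos, winner in enumerate(ranking):
            # The winner beats everyone ranked below them on this ballot...
            for loser in ranking[pos + 1:]:
                m[idx[winner]][idx[loser]] += count
            # ...and (assumption) everyone left off the ballot entirely.
            for loser in candidates:
                if loser not in ranked:
                    m[idx[winner]][idx[loser]] += count
    return m

def add_matrices(a, b):
    """Precinct summing: element-wise addition of two matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

candidates = ["a", "b", "c"]
precinct_1 = {("a", "b", "c"): 3, ("b", "a"): 2}
precinct_2 = {("c", "a", "b"): 4}
combined = add_matrices(pairwise_matrix(precinct_1, candidates),
                        pairwise_matrix(precinct_2, candidates))
```

Adding the two precinct matrices gives the same result as building one matrix from all the ballots together, which is what makes the matrix precinct summable.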

IRV requires more data than that, but not THAT much more. Even with 10s of millions of ballots, the data is small enough to paste as text into an email or forum post in a readable format.

Note that Bottom-2-IRV still requires all the ballots to fully tabulate, but because it is Condorcet compliant, the pairwise matrix data goes a lot further. Other methods (Min-Max for instance) can resolve it fully with a matrix.

andy-dienes

@rob said in Precinct summability of IRV:

IRV requires more data than that, but not THAT much more. Even with 10s of millions of ballots, the data is small enough to paste as text into an email or forum post in a readable format.

While I am certainly in agreement with you in spirit, in all fairness to the concern, there are some rare cases where the raw ballot data is legitimately huge. Try checking out some of the statewide AU Legislative Council elections (conducted using STV). I believe it was multiple GB.

rob

@andy-dienes Multiple gigs? Holy crap.

I was specifically talking about single winner elections with basic ranked ballots. The election I show above is only 9000 ballots, but the data doesn't grow proportionately to the number of ballots, since it just increments a number when a ballot is added (if a ballot with an identical ranking has already been added).
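As a sketch of that "just increment a number" idea, assuming ballots arrive as lists of candidate abbreviations (the names `tally_precinct` and `merge` are made up for illustration):

```python
from collections import Counter

def tally_precinct(ballots):
    """Collapse a precinct's ballots into a count per distinct ranking."""
    return Counter(tuple(ballot) for ballot in ballots)

def merge(precinct_tallies):
    """Sum the per-precinct tallies into one election-wide tally."""
    total = Counter()
    for tally in precinct_tallies:
        total.update(tally)  # adds counts; existing rankings just grow
    return total

p1 = tally_precinct([["a", "b"], ["a", "b"], ["b", "a"]])
p2 = tally_precinct([["a", "b"], ["c"]])
total = merge([p1, p2])
```

Five ballots went in, but only three entries are stored, because a repeated ranking adds to an existing count instead of adding a new row.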

It varies mostly based on the number of candidates. So if there are hundreds of candidates, sure, that's a lot of data. Still I can't imagine it in the gigabytes, unless something else entirely is going on.

If you've got such data, I would love to see it. You are speaking of literally a million times as much data as the Burlington election, so I'm at a bit of a loss as to how that could happen.

andy-dienes

@rob 2019 NSW Legislative Council STV Election with something like 350 distinct candidates. Granted, zipped it is "only" 100mb.

rob

@andy-dienes It appears they went out of their way to make that about as inefficiently stored as they could.

If you give each candidate a very short abbreviation ("a", "b", and "c" work great if you have 26 or fewer candidates), show a whole rank ordering per line (e.g. "b>c>g>f>a"), and then combine all identical rank orderings into a single line preceded by a count (e.g. "4238: b>c>g>f>a"), you should be able to reduce it massively.
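A minimal sketch of writing and reading that "count: ranking" format. The helper names `dump_counts` and `load_counts` are hypothetical, and sorting by descending count is just a readability choice:

```python
def dump_counts(type_counts):
    """Write one line per distinct ranking, e.g. '4238: b>c>g>f>a'."""
    lines = []
    # Most common rankings first; ties broken by the ranking itself.
    for ranking, n in sorted(type_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{n}: {'>'.join(ranking)}")
    return "\n".join(lines)

def load_counts(text):
    """Parse the format back into a dict of ranking tuple -> count."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        n, ranking = line.split(":", 1)
        counts[tuple(ranking.strip().split(">"))] = int(n)
    return counts

counts = {("b", "c", "g", "f", "a"): 4238, ("a", "b"): 17}
text = dump_counts(counts)
```

The output has as many lines as there are distinct rankings, however many ballots were cast.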

Have you parsed any of these giant files? I'd like to see how small we could get them.

andy-dienes

@rob I think the main blowup of memory is their use of string dtypes everywhere; parsed in Python, a string is something on the order of 60 bytes. The information it contains which is relevant to the actual election (if you are more clever about symbology) can usually fit in something more like... 2 bytes. I bet if we picked a more efficient format we could get the whole thing uncompressed to about 20 or 30 mb

rob

@andy-dienes said in Precinct summability of IRV:

I bet if we picked a more efficient format we could get the whole thing uncompressed to about 20 or 30 mb

Unless it represents something entirely different than what I'm imagining, I think we could get it down to 20 or 30 kilobytes.

andy-dienes

@rob Well, I don't think you can bucket purely based on preference order, since that csv also contains metadata about the ballot. But probably you are still right and I am overestimating. I suppose we'll have to just try and find out.

rob

@andy-dienes yeah I don't want any of that metadata since I am talking about a replacement for the precinct sums that come from, for instance, a choose-one election.

In those, all we need is a count for each candidate.

Here, we need a count of each "ballot configuration" that has at least one ballot, but we don't need anything else. (and I guess the number of possible configurations is factorial of the number of candidates, so it can get large as the number of candidates gets larger)
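To put rough numbers on that, here is a small sketch. Whether voters may stop ranking partway down is an assumption that varies by election, so both counts are shown, and the function names are made up:

```python
import math

def full_rankings(n):
    """Distinct ballots if every voter must rank all n candidates."""
    return math.factorial(n)

def rankings_with_truncation(n):
    """Distinct ballots if a voter may rank any 1..n candidates
    (assumption: truncated ballots are allowed)."""
    return sum(math.perm(n, k) for k in range(1, n + 1))

def max_distinct_types(n, ballots_cast):
    """A precinct can never report more distinct configurations
    than it has ballots."""
    return min(rankings_with_truncation(n), ballots_cast)

four_way = full_rankings(4)
four_way_truncated = rankings_with_truncation(4)
```

For 4 candidates that is 24 full rankings (the "24 piles" mentioned earlier in the thread) or 64 if truncated ballots count separately. With many candidates the factorial stops mattering, since the number of ballots cast becomes the limit.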

There is a place for all that extra data, but I just want each precinct to be able to send enough data that we can do the full tabulation, and no more. In IRV elections, they often say "if there is no candidate that has more than 50% of the first place votes, you have to wait for a week or more for us to tell you who the winner is."

Which I might be able to understand if they were sending in the results attached to the leg of a pigeon.

Jack Waugh

A pair of concepts that might have use when we think about cast ballots is "ballot token" and "ballot type". A ballot token then would be an individual ballot and a ballot type would be the equivalence class of all the ballots that say the same thing.