GPT and I invented a new voting system metric?
-
Cooper 2001 makes a good point about SUE vs CE:
while the concept of Condorcet efficiency permits comparison of rules, the dichotomous nature of its evaluation renders it less than perfect. To see why, consider the following example:
A population arrayed along an axis running from −10 to +10, and with a median of 0, chooses between three candidates, located at positions 1, 2, and 8. The candidate at position 1—nearest the median—is of course the Condorcet winner. Decision rule A selects the candidate at position 1 fifty percent of the time and the candidate at position 2 fifty percent of the time. Decision rule B also selects the candidate at position 1 fifty percent of the time, but the remaining fifty percent it chooses the candidate at position 8.
[Condorcet Efficiency] rates decision rules A and B as equivalent, since each selects the Condorcet candidate 50 percent of the time. Yet surely these rules differ in an important way, since rule A selects a candidate near the median—though not that nearest the median—the remaining 50 percent of the time, whereas rule B selects the most extreme candidate the remaining 50 percent of the time.
She then introduces a new metric which "resembles in spirit the concept of social-utility efficiency":
I therefore use a different measure of a decision rule's efficiency. Specifically, I adapt the concept of the Mean Squared Error (MSE) …
Although social utility involves a different calculation (the ratio between the utility of the selected candidate and that of the utility-maximizing candidate), it, like the MSE, involves not merely a ranking of candidates, but a means of capturing the intensity of voters' preferences.
But her "MSE" measurement seems pretty different in concept to me, and is still based on utility, rather than pairwise rankings/defeats like CE is.
Green-Armytage 2015 defines Utilitarian Efficiency to be analogous to CE:
In this paper we define utilitarian efficiency operationally as the likelihood that sincere voting will result in the election of the candidate who maximizes the sum of the voters’ utilities. To the extent that we view elections as a way to select the ‘best’ candidate, utilitarian efficiency is a logical criterion to consider, because maximizing the sum of utilities is a plausible operational definition of being ‘best’.
So, ignoring MSE, the other three metrics can be placed in a table:
|  | Utility-based | Ranking-based |
| --- | --- | --- |
| Likelihood of electing best | Utilitarian Efficiency | Condorcet Efficiency |
| Goodness of winner | Social Utility Efficiency | ??? |

So I've been meaning to see if there is an actually analogous metric that fills in the blank. I asked some machines (ChatGPT 3.5, Bing Chat, Character.ai) a few weeks ago if they knew of any, and they did not, so I asked them to complete the analogy, and with some prodding they stumbled around and came up with definitions and some OK names for it.
Then when ChatGPT4 was released, I asked, and it immediately understood what I was asking, invented a metric and name, and wrote out the whole derivation:
Apologies for any confusion. I'm glad to help define this new concept. Here's a possible formula for Pairwise Ranking Efficiency (PRE):
Let's denote the following variables:

- `n` as the number of candidates
- `W` as the elected candidate in a given election
- `C` as the Condorcet winner (assuming one exists)
- `p(x, y)` as the fraction of pairwise comparisons that candidate `x` wins against candidate `y` (0 <= p(x, y) <= 1)
Now, let's define the average pairwise winning fraction for a candidate `x`:

APWF(x) = (Σ [p(x, y)] for all y ≠ x) / (n - 1)
The Pairwise Ranking Efficiency (PRE) can then be calculated as:
PRE = (APWF(W) - APWF_avg) / (APWF(C) - APWF_avg)
where `APWF_avg` is the average APWF for all candidates:

APWF_avg = (Σ [APWF(x)] for all x) / n
This formula computes the Pairwise Ranking Efficiency by comparing the performance of the elected candidate (W) with the average candidate performance and normalizing it with respect to the performance of the Condorcet winner (C). The resulting value will range from 0 to 1, where a higher value indicates better performance in electing candidates with strong pairwise ranking properties.
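Taken at face value, the formula can be sketched in Python. The pairwise-fraction matrix below is made up for illustration (A is the Condorcet winner in it):

```python
# Sketch of GPT's proposed Pairwise Ranking Efficiency (PRE), as stated.
# p[x][y] = fraction of pairwise comparisons x wins against y (made-up values).

def apwf(x, p, cands):
    """Average pairwise winning fraction of candidate x."""
    others = [y for y in cands if y != x]
    return sum(p[x][y] for y in others) / len(others)

def pre(winner, condorcet_winner, p, cands):
    """PRE = (APWF(W) - APWF_avg) / (APWF(C) - APWF_avg)."""
    avg = sum(apwf(x, p, cands) for x in cands) / len(cands)
    return (apwf(winner, p, cands) - avg) / (apwf(condorcet_winner, p, cands) - avg)

# Illustrative matrix in which A is the Condorcet winner.
p = {"A": {"B": 0.6, "C": 0.7},
     "B": {"A": 0.4, "C": 0.55},
     "C": {"A": 0.3, "B": 0.45}}
cands = ["A", "B", "C"]

print(pre("A", "A", p, cands))  # electing the Condorcet winner gives 1.0
print(pre("B", "A", p, cands))  # a below-average pick comes out negative
```

Note that, as defined, the score is 1 when the Condorcet winner is elected and can go negative for a worse-than-average winner, just like SUE.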
It's not quite right, because:
- "Fraction of pairwise comparisons" (0–1) is not the same units as `n - 1`. Probably the `(n - 1)` should just be dropped, because it's already a fraction? Or just use the count directly, since every term is going to be divided by the same thing anyway?
- There may not always be a Condorcet winner (to be conceptually equivalent, it should compare to the "best" candidate, even if there is no CW)
People who are more familiar with the subtleties of Condorcet systems than me: Does this make sense as a conceptual parallel?
Social Utility Efficiency (=Voter Satisfaction Efficiency) is defined as:
- (Utility(selected) - Utility(average))/(Utility(best) - Utility(average))
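For concreteness, that formula can be computed directly; the candidate utility totals here are hypothetical:

```python
# Social Utility Efficiency for a single election (hypothetical utility sums).
utilities = {"A": 90, "B": 70, "C": 20}

def sue(selected, utilities):
    best = max(utilities.values())
    avg = sum(utilities.values()) / len(utilities)
    return (utilities[selected] - avg) / (best - avg)

print(sue("A", utilities))  # utility-best candidate scores 1.0
print(sue("C", utilities))  # a below-average candidate scores negative
```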
So for a ranking-based equivalent, we need to keep the (S-A)/(B-A) concept, but base it on the number of pairwise defeats instead. So if CBB(x) = "number of candidates beaten by x", I think this is a simpler equivalent of GPT's idea?:
- (CBB(selected) - CBB(average))/(CBB(best) - CBB(average))
Or do we have to do "number of candidates beaten by x minus number of candidates that beat x" to account for ties? I don't think those are equivalent?
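They aren't equivalent once pairwise ties are possible. A tiny sketch (the pairwise results dictionary is invented: A beats B, and both of A's and B's contests with C are tied) shows the two normalized scores disagreeing:

```python
# Do "candidates beaten" and "beaten minus beat-by" give the same
# (S - A)/(B - A) score?  Not when there are pairwise ties.
# result values: 1 if x beats y, -1 if y beats x, 0 for a tie (made-up data).
results = {("A", "B"): 1, ("A", "C"): 0, ("B", "C"): 0}

def pairwise(x, y):
    if (x, y) in results:
        return results[(x, y)]
    return -results[(y, x)]

cands = ["A", "B", "C"]

def cbb(x):        # number of candidates beaten by x
    return sum(1 for y in cands if y != x and pairwise(x, y) > 0)

def copeland(x):   # candidates beaten minus candidates beaten by
    return sum(pairwise(x, y) for y in cands if y != x)

def efficiency(score, selected):
    scores = [score(x) for x in cands]
    best, avg = max(scores), sum(scores) / len(scores)
    return (score(selected) - avg) / (best - avg)

print(efficiency(cbb, "C"), efficiency(copeland, "C"))  # -0.5 vs 0.0
```

With ties, C is average under wins-minus-losses but below average under wins alone, so the two versions really do differ.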
-
I meant to reply to this ages ago, but basically we're looking at Condorcetness of a voting method, and how close to the Condorcet winner the winner of a method generally is.
I think it's arguably of less practical use than utility, because if Condorcetness is what you want, then you just adopt a Condorcet method. Whereas if utility is what you want, there isn't such an obvious path to it.
Having said that, it might be possible that strategic voting can cause the Condorcet winner not to be elected in some situations with a Condorcet method, so you might want to find the method that best maximises this thing you're measuring under certain strategic assumptions. In fact, Warren Smith would argue that score and approval voting might even be better at electing the Condorcet winner than Condorcet methods. See e.g. this.
Also in terms of Condorcetness, there is the "Game Theory" method, which is arguably the ultimate in Condorcet as I said in this post on the old CES Google group. That method also got discussed in this thread.
-
@psephomancy Just to add a bit more to this then - It's quite difficult to come up with the Condorcetness of a winner. Different Condorcet methods have different ways of determining a winner, so when there isn't a Condorcet winner, they can pick different winners. But by their own measure you'd say that they are picking the "most Condorcet" winner.
For example, you talk about the number of pairwise defeats. That's basically Copeland's Method, so would consider the Copeland winner to be the most Condorcet, but it fails independence of clones, and is generally not considered to be great from a theoretical point of view.
Similarly there's Minimax, which elects the candidate that has the smallest pairwise defeat (if there isn't a Condorcet winner). So your measure would be based on the size of the winner's worst single pairwise result. But this also fails independence of clones, among other criteria.
Then you have methods like Ranked Pairs and Schulze, which are known for their criterion compliance. However, candidates don't end up with a score that can be compared with the score of the best winner.
But we can perhaps use the Game Theory method, which I previously described as the ultimate in Condorcet. When there isn't a Condorcet winner, it is non-deterministic and picks between certain candidates with certain probabilities. But no strategy can beat it in the long term by the measure they are using (which I think is average pairwise win/loss).
So a method picks a winner, and you compare that winner against the Condorcet winner to see the pairwise result (which is just a draw if they are the same candidate). If there isn't a Condorcet winner, you compare the method winner against the Game Theory strategy overall. So if under the Game Theory method candidate A wins 50% of the time, B 40% and C 10%, you just look at the weighted average result against these candidates.
So you now have this pairwise result. Say it's the margin of defeat or just 0 if it is the Condorcet winner. To turn it into the Pairwise Ranking Efficiency measure in the same way as the utility version, take the difference between that and the average defeat (use the positive value if better, negative if worse) and just divide by the average margin of defeat against the Condorcet winner (or lottery profile).
For example, candidate A wins by some method. But A is not the Condorcet winner and is beaten by a margin of 10 votes by the Condorcet winner. The average margin of defeat against the Condorcet winner for a candidate is 30 votes. So the PRE is (30-10)/30 = 2/3.
An alternative would be to use median margin of defeat rather than mean.
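As a sketch of this margin-based PRE (the list of margins is invented so that the average comes out at 30, matching the worked example above):

```python
# Margin-of-defeat version of PRE:
# (average margin - selected winner's margin) / average margin.
def pre_from_margins(selected_margin, all_margins):
    avg = sum(all_margins) / len(all_margins)
    return (avg - selected_margin) / avg

# A loses to the Condorcet winner by 10; the average margin of defeat
# against the Condorcet winner is 30.
print(pre_from_margins(10, [10, 30, 50]))  # (30 - 10) / 30 = 2/3
```

Electing the Condorcet winner itself (margin 0) gives exactly 1 under this formula, matching the utility version's behaviour for the utility-best candidate.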
-
@toby-pereira Criteria and strategy aren't relevant though; it's just a measurement of the "goodness" of the candidate.
-
@psephomancy said in GPT and I invented a new voting system metric?:
@toby-pereira Criteria and strategy aren't relevant though; it's just a measurement of the "goodness" of the candidate.
I was thinking this measure would be used to measure the quality of a method in general (average score of the winner), as is done with utility measures, not just individual winners on a one-off basis. It was all there as context, with my potential answer in the second post.
Edit - But to be crystal clear about the criteria - I discussed the criterion compliance of Copeland and Minimax to discuss whether they were the best measure of Condorcetness. We have to pick a measure of Condorcetness, and by doing that you are inevitably picking a Condorcet method. And so I was saying that I wouldn't want to pick a measure that wasn't cloneproof. On the other hand, the Game Theory method is arguably the ultimate Condorcet method, so it might make sense to use the measure from that method.
-
@toby-pereira https://jamesgreenarmytage.com/dodgson.pdf seems like an interesting improvement
-
@multi_system_fan It's quite interesting, although allowing more than one round of voting changes things quite a lot, and there are probably also other alternatives that might be as good or better.
I don't think you could apply it to the metric being considered in this thread though.
-
@toby-pereira Sorry I wrote my previous short comment in line at the grocery store and forgot about this thread.
Yes, we need a measure of "Condorcetness" or "pairwise bestness". But, like the raw sum-of-utility measure, it doesn't need to be resistant to strategy or motivated by similar concerns that would apply to an actual voting system. It is only motivated by the philosophical "goodness" (representativeness) of the candidate, but I don't know Condorcet systems enough to know what that would be.
-
@psephomancy Yes, I agree about the resistance to strategy. But I see a failure of independence of clones as separate from strategy. I think if a measure isn't cloneproof it's probably not a good measure. Also, Copeland is very low resolution anyway in that it just looks at number of defeats rather than the size of any of them.
-
@toby-pereira said in GPT and I invented a new voting system metric?:
I think if a measure isn't cloneproof it's probably not a good measure.
Why would that matter for a measure?
Also, Copeland is very low resolution anyway in that it just looks at number of defeats rather than the size of any of them.
That makes sense.
-
@psephomancy said in GPT and I invented a new voting system metric?:
@toby-pereira said in GPT and I invented a new voting system metric?:
I think if a measure isn't cloneproof it's probably not a good measure.
Why would that matter for a measure?
Because you can have a candidate that is the closest to being the Condorcet winner but not the Copeland winner. E.g.
14: A>B>C
4: B>C>A
12: C>A>B

A>B - 26:4
B>C - 18:12
C>A - 16:14

A has the biggest winning margin and smallest defeat, and is the nearest to a Condorcet winner by any reasonable measure. But then you can clone C and have a C1 and C2 but where C1 is always ranked above C2.
In this case A now has two defeats (against C1 and C2) so loses to both B and C1 in Copeland. But A is still the nearest to a Condorcet winner in terms of defeat sizes, so I would say they are still the "most Condorcet" winner.
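This clone example can be checked mechanically. A small sketch (writing C1 and C2 as "1" and "2" so that ballots can be plain strings):

```python
from itertools import combinations

def pairwise_margins(ballots, cands):
    """ballots: list of (count, ranking), ranking written best-to-worst."""
    margins = {}
    for x, y in combinations(cands, 2):
        x_over_y = sum(n for n, r in ballots if r.index(x) < r.index(y))
        y_over_x = sum(n for n, r in ballots if r.index(y) < r.index(x))
        margins[(x, y)] = x_over_y - y_over_x
    return margins

def copeland_wins(margins):
    wins = {}
    for (x, y), diff in margins.items():
        wins.setdefault(x, 0)
        wins.setdefault(y, 0)
        if diff > 0:
            wins[x] += 1
        elif diff < 0:
            wins[y] += 1
    return wins

original = [(14, "ABC"), (4, "BCA"), (12, "CAB")]
# Clone C into C1 > C2 on every ballot; C1 is "1" and C2 is "2".
cloned = [(14, "AB12"), (4, "B12A"), (12, "12AB")]

print(copeland_wins(pairwise_margins(original, "ABC")))  # A, B, C: one win each
print(copeland_wins(pairwise_margins(cloned, "AB12")))   # A: 1 win; B and C1: 2 wins
```

Before cloning, every candidate has exactly one pairwise win; after cloning, A picks up a second defeat (to C2) while B and C1 each have two wins, so A falls behind in Copeland even though its defeat margins are unchanged.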