MOV, Large Base ship half points and SOS

By Osoroshii, in X-Wing

I've taken some time to let the decision on half points sink in before getting things off my chest. I'm disappointed in the direction organized play is heading for X-Wing. I was never really a fan of the MOV (Margin of Victory) system to begin with. I could easily see the value of bulking up on ships to bleed fewer points in organized play. Up until MOV, small-ship squads performed well in tournaments. The large-base models had to fight their way into the top seats, yes, even the turrets.

Strength of Schedule (SOS), even with its flaws, is a much better system than MOV. In SOS you are trying to win every match, while in MOV you're trying not to lose. I understand the point of wanting to move away from an SOS system. When a player misses a cut because his first- or second-round opponent drops from the tournament, it feels unfair. I don't disagree that missing cuts because of someone else's action feels wrong.

FFG has moved in a good direction by setting a definitive round time of 75 minutes. I personally feel this is way more than enough time to reasonably finish a game of X-Wing. If you are routinely getting called on time, you're not really playing a squad that can kill off another squad. And that is what MOV brought to tournament play.

There has been a game system running events and tournaments for over 25 years. In hobby gaming, it is the 2,000 lb gorilla in the room. Yup, Magic: The Gathering. They use Swiss rounds with SOS. They also don't have modified wins; you either win, lose, or draw. I don't think it would be that bad to shift towards that kind of system.

With 75-minute rounds, almost all games finish; my guess would be 90% or better. Normally the matches that don't finish don't have enough offense in their lists, as the players are playing the MOV game instead of X-Wing. When time is called, if both players have ships left on the table, they both failed to win the match, regardless of points per ship. When both players fail to win, it's a draw. Wins are 5 match points, draws are 1, and losses are 0.
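The scoring proposed above could be sketched roughly like this. This is only an illustration: the function name `match_points` and its arguments are made up for the example, and treating mutual destruction as a draw is an assumption, since the post doesn't cover that case.

```python
def match_points(my_ships_left: int, opp_ships_left: int) -> int:
    """Match points under the proposal: 5 for a win, 1 for a draw, 0 for a loss.

    Called once the game is over, either because a squad was wiped out
    or because time was called.
    """
    if my_ships_left > 0 and opp_ships_left > 0:
        return 1  # time called with both squads still on the table: draw
    if my_ships_left > 0:
        return 5  # opponent's squad destroyed: win
    if opp_ships_left > 0:
        return 0  # my squad destroyed: loss
    return 1      # ASSUMPTION: mutual destruction also counts as a draw
```

Note that points per ship never enter into it, which is the whole point of the proposal: a game that goes to time is a draw no matter who is ahead on points.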

So when MOV was introduced over a year ago, most applauded the change. It wasn't long before the large defensive ships started to rule the scene. Now, after a year, we make the adjustment to fix MOV by making large-base ships score differently than every other ship. Now there are situations (although rare) where you can lose a match without ever losing a ship. So, I guess what I'm asking is to abandon MOV and half points and move to a system proven to work: SOS.

No

The only thing with SoS is that there are some occasions where conceding will increase your SoS. Now let's get this straight: no one forfeits their way to victory (unless you are the Average Joes). However, if you are about to lose your first game and you're currently 3-0, conceding will give you a higher strength of schedule than playing it out.

As for the large-ship MOV change, it makes sense, because many of them cost as many squadron points as two small ships, and most of them have at least the hit points of two Z-95s, or more. They are often two ships' worth of squadron points and hit points.

Half points for large ships isn't the only MOV fix they're trying. They've also changed the pairing of rounds to help break up the large ships from facing off.

SoS is random. Your primary tiebreaker should not be a random element if you have any other reasonable metric available to you. Strength of Schedule is never a good option, it just happens to be the only option in most cases.

Well, they did add a random tiebreaker after MOV instead of SoS, to keep contestants from trying to game the system by, say, flying ships off the table to face an easier list. Technically you could still do that, but killing your MOV is a bad move unless you win every single game, and not with a modified win but a full win.

Hells to the no.


I will forever be against going back to SoS simply due to the fact that it allows drop outs to take other people with them. That is entirely unacceptable and as long as it is possible, any other system is superior.


I also strongly object to the fact that your system would give additional incentive to running away with a last surviving ship, particularly fast ones. With MoV you are often encouraged to stay and try for that last kill to soften the loss, but with that out of the picture people will have no reason not to take their Squint or A-wing and run for the hills. Why risk walking away with nothing when you can take a guaranteed draw?


We left SoS behind for good reasons. Let's leave it behind. MoV isn't perfect, and even now is still a work in progress, but abandoning it because it still has problems and going back to a proven failure is like abandoning your car and going back to a horse-drawn carriage because you're having engine trouble.

Edited by DR4CO

I have been playing competitive Magic for years and still don't get why people hate SOS just because a drop may cause somebody to fall from the top. The number of times I've been in that position was maybe one, and I recognized that I had played against bad players.

I don't like MOV, right now at least. I don't think SOS is bad.

It happened three times to me in one year. Two times it only mattered for bracket prizes, but the first time it cost me the top-8 cut.

I am a fan of MOV over SOS, as it forces you to play the game out until the end most times. I am not as big a fan of the new big-ship rule. I completely understand why they did it, but it just is not great. The point cost is worked into the big ship. Point for point, small and big ships do not match up. Fat Han, as an example, is basically the same point cost as half a swarm, not two of the ships. As it stands now, you are almost penalized for taking big ships. I actually lost the other day without losing a ship: two Brobots at 0 shields, with 3 and 2 hull respectively, giving up 50 points under the new scoring, while I was able to kill 3 ships worth 44 points. The last 2 ships were on their last hull, but they ran away and defended for the last 8-10 minutes because they did not have to fight to win the game. This tends to happen a lot more now than it did over the past few months. I think this new rule hurts the game more than it fixes it.
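To make the arithmetic in that game concrete, here is a rough sketch of half-point scoring. Several things are assumptions for illustration, not the official wording: the `Ship` class and `points_given_up` function are made-up names, the threshold is taken to be "at or below half of printed hull plus shields", half points are rounded down, and each Brobot is taken to be 50 points with 4 hull and 4 shields (which is what the 50 points given up implies).

```python
from dataclasses import dataclass


@dataclass
class Ship:
    cost: int             # squad points, upgrades included
    max_health: int       # printed hull + shields
    health_left: int      # remaining hull + shields (0 means destroyed)
    large_base: bool = False


def points_given_up(ship: Ship) -> int:
    """Points the opponent scores for this ship when the game ends."""
    if ship.health_left <= 0:
        return ship.cost  # destroyed ships are worth full points
    # ASSUMPTION: a surviving large-base ship at or below half health
    # gives up half its cost, rounded down.
    if ship.large_base and 2 * ship.health_left <= ship.max_health:
        return ship.cost // 2
    return 0


# The two Brobots from the post: no shields, 3 and 2 hull remaining.
brobots = [
    Ship(cost=50, max_health=8, health_left=3, large_base=True),
    Ship(cost=50, max_health=8, health_left=2, large_base=True),
]
given_up = sum(points_given_up(s) for s in brobots)  # 25 + 25 = 50
scored = 44                                          # three small ships killed
lost_without_losing_a_ship = given_up > scored       # True
```

Under these assumptions the Brobot player gives up 50 and scores 44, so the game is lost on points with both ships still on the table.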

I think a fix could be that you only get half points if the game goes to time. The issue with large tanks is that they would get a win and run. While you cannot stop this from happening all the time, it does help to limit it.

Also, from all the games I have seen, the change to 75 minutes really has put a major dent in the number of games that go to time.

I'm all for the change to half points as the old system was untenable and incredibly restrictive regarding competitive builds.

As others have mentioned I do find losing without losing a ship a little odd though if you have destroyed some of theirs.

Just spitballing, how workable would it be to have the win-loss judged on ships left on the table but have the half points then kick in for MOV of the loser?

Win with two beat up brobots still standing, you win with a MOV based on the ships left on the table. The opponent loses but gets a MOV of two half point Brobots.

Would losing with a positive MoV be too weird? Other issues I'm missing as a fairly new competitive player?

SoS is random. Your primary tiebreaker should not be a random element if you have any other reasonable metric available to you. Strength of Schedule is never a good option, it just happens to be the only option in most cases.

It is just as random as MOV. Because you can't control who or what you play. And certain matchups will give you different results in MOV.

While I prefer SOS, I don't exactly hate MOV. I like that SOS shows the quality of players that you played against, not merely how much you beat up on someone.

I still like MOV over SOS. I like it being in your hands, rather than a previous opponent's. It should also be noted that SOS for MTG works in MANY different formats, like Draft, Sealed, Standard, Modern, Vintage, Legacy, Commander, etc. Also, MTG players play best 2 of 3 in almost all formats. If you're going to recommend the 2,000 lb gorilla, you should recommend him in full. Play 2 out of 3? Sure, tournaments just got MUCH longer. The only thing that comes close to this in Magic is Two-Headed Giant. You play one game, but it's a 2v2 game with slightly different rules that cater to one game taking all. Every time I've played 2HG MTG, we end up playing best 2 out of 3 anyway, and I would say that in that format, whoever wins the first one doesn't ever lose. It's very well done from a sealed perspective.

MOV as it currently sits allows you to control your destiny in these tournaments, rather than leaving it to an opponent who left. I really like that. I think the change to large-ship MOV will de-incentivize them, which is nice. You can't "game the clock" anymore, or maybe you can, but it's definitely a lot harder to. It brings it down to how well you pummel your opponent, which in a 1v1 where board positioning arguably matters more than dice rolls makes far more sense. I only wish there was some combination of the two.

Edited by jonnyd

One aspect in favor of Osoroshii's view: SoS in the FFG software is now normalized to how many games your opponent played, so drops don't matter. That is a VERY good thing.
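As a rough illustration of what "normalized to how many games your opponent played" could mean (this is a guess at the idea, not FFG's actual formula), each opponent can contribute match points per game played instead of a raw total. The name `normalized_sos` and the tuple layout are made up for the example.

```python
def normalized_sos(opponents):
    """Average of each opponent's match points per game played.

    opponents: list of (match_points, games_played) tuples, one per
    opponent faced. ASSUMPTION: opponents with zero games are skipped.
    """
    rates = [points / games for points, games in opponents if games > 0]
    return sum(rates) / len(rates) if rates else 0.0


# One opponent went 2-0 and dropped (10 points in 2 games); another
# finished 3-2 (15 points in 5 games), using 5 match points per win.
example = normalized_sos([(10, 2), (15, 5)])  # (5.0 + 3.0) / 2 = 4.0
```

With raw totals the dropped 2-0 opponent would count for less than the 3-2 opponent (10 vs 15). Per game, the dropped player counts for more (5.0 vs 3.0), so the drop no longer drags your tiebreaker down.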

One aspect against Osoroshii's view: MoV still has to be used to determine the victory condition if the game goes to time. This inherently distorts the optimal tactics (and encourages slow play abuse) relative to how untimed games would be played. So even if you use SoS as the tiebreaker, the bigger problem is still to get the scoring "correct" for timed games. And even with the new change we are absolutely not there yet.

Edited by MajorJuggler

My only issue with getting a draw for an unfinished game is that it isn't always your fault.

1. The opponent might run his ships away, a draw is better than a loss.

2. Dice can affect how long a game lasts. If there are a large number of double or triple evade results, especially early in the match, then games can take a long time, even if red dice are hot, and optimal play has been used.

3. I had a third, but I forgot it. It was pretty good!

Based on what MJ said, though, moving to a normalized SoS using FFG's software wouldn't be a bad idea. I never particularly liked MoV, especially because dice could control it.

My understanding, with regard to Magic, is that they currently run enough rounds so that all X-1 players make the top cut. That doesn't happen in X-Wing. I'm not sure SoS would be as bad if that were the case, but since it isn't, I think that players need a method that they have some control over.

It is just as random as MOV. Because you can't control who or what you play. And certain matchups will give you different results in MOV.

This was always my argument for partial points of some kind. Your opponent was in control of your opportunities for the division of your MoV. At least now, almost every list is going to split it at least 3 ways.

Strength of Schedule (SOS), even with its flaws, is a much better system than MOV.

Strength of schedule is a bad idea in a game like X-wing, where tournaments feature a small number of games, and where quantitative information about a player's performance is limited to that player's performance in the tournament.

There is a really detailed mathematical basis for this, but here's the short version. Each player's record has some uncertainty associated with it: a player's tournament record is the output from a system, but player skill is just one of the inputs. Performance depends on a lot of factors including dice, matchup, fatigue, etc.

For reasons I won't go into, using strength of schedule concentrates that uncertainty in the center of the distribution--for instance, on players who are 3-2 or 2-3 in five rounds of qualifying Swiss play. Which in turn means strength of schedule provides the weakest, least valid answers for exactly those players who are nearest to the cut.

Strength of schedule does not work in X-wing. It's not much better--in a technical, statistical sense--than selecting players randomly.

FFG has moved in a good direction by setting a definitive round time of 75 minutes. I personally feel this is way more than enough time to reasonably finish a game of X-Wing. If you are routinely getting called on time, you're not really playing a squad that can kill off another squad. And that is what MOV brought to tournament play.

That's completely inaccurate. Go back and look at the first edition of the tournament rules (where strength of schedule was the most important tiebreaker): they tell you how to score games that are called at time. Point fortresses have become well-defined and frequently used in X-wing over the past year, but they certainly existed in Wave 2.

There has been a game system running events and tournaments for over 25 years. In hobby gaming, it is the 2,000 lb gorilla in the room. Yup, Magic: The Gathering. They use Swiss rounds with SOS. They also don't have modified wins; you either win, lose, or draw. I don't think it would be that bad to shift towards that kind of system.

I've already explained why it's a bad idea, but I'll tack this on, too: neither "everyone else does it this way" nor "people like us have always done it this way" are compelling arguments.

With 75-minute rounds, almost all games finish; my guess would be 90% or better. Normally the matches that don't finish don't have enough offense in their lists, as the players are playing the MOV game instead of X-Wing.

That's a huge assumption. "Normally" a match that doesn't finish in 75 minutes is due to an offense problem in... one player's list? Both lists? And on that basis, you're okay with a system that penalizes both players?

So when MOV was introduced over a year ago, most applauded the change. It wasn't long before the large defensive ships started to rule the scene.

Post hoc, ergo propter hoc.

Now, after a year, we make the adjustment to fix MOV by making large-base ships score differently than every other ship. Now there are situations (although rare) where you can lose a match without ever losing a ship. So, I guess what I'm asking is to abandon MOV and half points and move to a system proven to work: SOS.

FFG moved away from SOS because it was proven (through a great deal of experience) not to work for X-wing.

I have been playing competitive Magic for years and still don't get why people hate SOS just because a drop may cause somebody to fall from the top. The number of times I've been in that position was maybe one, and I recognized that I had played against bad players.

I don't like MOV, right now at least. I don't think SOS is bad.

It's not terribly good in competitive Magic, either, but Magic has a couple of advantages--namely, it's a shorter game. That means a match between two players is actually best-of-three, which eliminates a lot of noise from the system.

It is just as random as MOV.

It isn't. SOS concentrates uncertainty in the middle of the pack, while MOV spreads it out. That's a bad thing if you're trying to compare multiple undefeated players, but in typical circumstances, X-wing's Swiss -> single-elimination tournament system makes substantially more valid, more defensible decisions at the cut under MOV than it did under SOS.

One aspect in favor of Osoroshii's view: SoS in the FFG software is now normalized to how many games your opponent played, so drops don't matter. That is a VERY good thing.

Drops can be figured out or worked around, but the problems with strength of schedule are baked in. It's a sampling problem, actually--I'd be happy to talk about it by PM, if you're interested.

Basically, if you're talking about a baseball team that's played 162 games, strength of schedule is pretty meaningful. It's questionable in football (or hand-egg, for members of the non-US X-wing community), where teams play just 16 games, and it's absurd in X-wing where the Swiss qualifiers might consist of just four games.

One aspect against Osoroshii's view: MoV still has to be used to determine the victory condition if the game goes to time. This inherently distorts the optimal tactics (and encourages slow play abuse) relative to how untimed games would be played. So even if you use SoS as the tiebreaker, the bigger problem is still to get the scoring "correct" for timed games. And even with the new change we are absolutely not there yet.

Yeah. Timed play inevitably distorts the game as compared to untimed play, and using strength of schedule compounds that problem rather than fixing it.

Edited by Vorpal Sword

...uh, I should say I'm not saying X-wing's Swiss -> single-elimination format with MOV as a tiebreaker is perfect. In fact, it's pretty bad if you view it as a system that's supposed to reliably find the best group of 2^n players and rank that group in order of skill.

But if the only options are MOV or SOS, MOV wins every time.

Qualitatively that makes a lot of sense. Out of curiosity, do you have any links for papers on the subject? I haven't done a literature survey, but I'm curious.

I think Osoroshii's system has games that go to time as draws, so no MoV worries in that circumstance. Although I don't like SoS, the system proposed upthread does encourage very offensive play, and as you have noted a couple times this week, the point of X-wing is to destroy the other squad, not destroy ships.

I'd be happy to talk about it by PM, if you're interested.

Please don't; this is getting good.

If games that go to time are automatically draws, then you give the losing player a HUGE amount of incentive for the game to go to time.

When time is called, if both players have ships left on the table, they both failed to win the match, regardless of points per ship. When both players fail to win, it's a draw. Wins are 5 match points, draws are 1, and losses are 0.

So, reading this again, I think that is a really really bad idea. If you are losing, now you can stall to at least get one point. That one point now puts you above everyone else who instead had a loss. So if you play 6 rounds and some 5-1 players will not make the cut, then now to be sure you make the cut you need to force your loss to go to time. Except that everyone else is going to have to do the same thing as well. Since you don't know if you will eventually be a 5-1 player at the end of the tournament, this means that every player will have to do this every game. Which means that in every game, one player is going to want the game to go to time. So, instead of 90% of games [edit: not] going to time....

Edited by MajorJuggler

Qualitatively that makes a lot of sense. Out of curiosity, do you have any links for papers on the subject? I haven't done a literature survey, but I'm curious.

I looked at it a while back (looking for ammunition for an e-mail to Organized Play, actually, before they moved to MOV). Basically, there's a lot of information out there about the general design of tournaments [1], but not a lot about strength of schedule specifically. I ended up having to work a lot of it out based on other resources, and I still have notes that I'd eventually like to write up in my copious spare time.

[1] See, for example, Scarf et al., "A numerical study of designs for sporting contests," European Journal of Operational Research 198 (2009), 190-198. doi:10.1016/j.ejor.2008.07.029

I'd be happy to talk about it by PM, if you're interested.

Please don't; this is getting good.

Okay, you asked for it. ;)

One way to look at tournaments is as a mechanism for picking some number (often 1) of "players" from a set of an arbitrarily large size. The players could be actual individual players like in X-wing, or teams like in baseball, or more abstractly they could be people competing for a job. We'll just call them all players, and assume there's an operation that compares any pair and determines which is better.

A psychometrician named Thurstone developed something called the law of comparative judgment in the 1920s. Thurstone was interested in determining things like the smallest difference people could perceive between two stimuli--say, brightness of a light--and the law of comparative judgment was how he approached the problem.

It's not really a law, but actually a mathematical model that ranks a set of objects in exactly the way we need to perform our tournament: it compares every possible pairwise combination of objects, and uses those comparisons to not only rank the objects but actually determine their positions with respect to some hypothetical underlying variable that causes their performance.

That's a direct analogy to a round-robin tournament that matches every player against every other player. If I put 128 X-wing players in a round-robin tournament, I could use the results to determine (say) a z-score for each of them. (With today's mathematical tools at hand, I'd probably use a Rasch model, which is a pretty linear descendant of the law of comparative judgment.) But for obvious reasons, that's not feasible unless you have three weeks for your tournament. But round-robin is the best way to do it, if your goal is actually to figure out how good every player in the field is with respect to every other player.
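For readers who want the pairwise model above in symbols, here is a hedged sketch; it is not taken from the post itself. The symbols are introduced just for illustration: $\mu_i$ is player $i$'s latent skill, $\sigma$ is the standard deviation of the per-game noise (assumed equal and independent for every player, which is Thurstone's Case V), and $\Phi$ is the standard normal CDF. The second line is the logistic form used by Rasch-type and Bradley-Terry models, with $\theta_i$ playing the same role as $\mu_i$.

```latex
P(i \text{ beats } j) = \Phi\!\left(\frac{\mu_i - \mu_j}{\sqrt{2}\,\sigma}\right)
\qquad \text{(Thurstone, Case V)}

P(i \text{ beats } j) = \frac{e^{\theta_i - \theta_j}}{1 + e^{\theta_i - \theta_j}}
\qquad \text{(logistic form)}
```

Either way, a round-robin gives you an observed win/loss for every pair $(i, j)$, and fitting the model means finding the skill values that best explain all of those results at once.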

Notably, though, even in a round-robin tournament there's some error associated with that set of z-scores because there's error associated with the outcome of an X-wing game. If I play you and you win, it could be because you're better than I am. But it could also be that your list had a particularly good matchup against mine, or that you had excellent results on the dice in the first few rounds, or it could be that twenty minutes into the game I was elbowed in the kidneys by a celebrating player who was standing too close to me. In this very simple model of player performance, everything that affects the outcome of an X-wing game that isn't skill is represented as error, and it turns out that X-wing is a noisy (that is, error-prone) process.

So now picture those players laid out on a number line (a graph of the underlying variable of "skill"). Each person who participated in this enormous round-robin tournament has both an estimated location on the line, represented by a dot, and an interval (represented by a shaded area) that indicates potential error in the estimate. (A poll of registered voters might put a candidate's support at 27%, plus or minus 3.5%. That 3.5 percent is the potential error.)

Some of those error regions might overlap, and if they do, that's the mathematical model we're using admitting that the ranking might be wrong--player #38 and player #39 might really belong in the opposite order, if the model was a little bit wrong about both of their skill levels.

But critically, this method (round-robin analyzed using a particular set of mathematical tools, if you're losing track) is the most accurate possible way to determine players' positions on that line. Since it's clearly not feasible to do it, we invented alternatives. Single-elimination is an old solution, and it has a huge payoff in terms of time--it's literally an exponential decrease in the number of rounds I have to run. But you pay for it in terms of accuracy, because now you're testing each player against a limited sample of other players. And that blows up those error regions by a lot. Now there are a lot of players with overlapping error bars (again, meaning you can't really resolve which of them is better), and the error is uniformly distributed across all players. You lost resolution, and you lost it everywhere at once.

Swiss tournaments are a newer solution, and they work--in mathematical terms--not by reducing that error, but by moving it. The Swiss system says "hey, we care most about the players at the top, right? Let's focus on getting good estimates for them!" And that's just what it does. When you apply mathematical tools to the Swiss system (usually using large simulations), what happens is that because you're always comparing the best players to the best players, the uncertainty around those players decreases rapidly. Unfortunately, for the middle ranks anyway, you get that advantage by blowing up the error bars for players in the middle of the distribution even more than you did under single-elimination.

The best qualitative explanation I can come up with for how this happens is that in order for someone who's "really" a 3-2 player to end up at 5-0, all of the error in each of her games has to point in the same direction--she would have to be not just consistently lucky but lucky in a set of increasingly discriminating games. Because that's unlikely, the error bars are pretty small. But in order for someone who's "really" a 3-2 player to end up at 2-3, bad luck only has to strike once. So the error bars are typically going to be so large that we can't tell players in adjacent score brackets apart.

So how does strength of schedule tie into all of this? It's conceptually just the sum of your opponents' skill estimates, which means the error in your strength of schedule is the sum of the errors in each of those estimates. Players in the middle of a Swiss ranking tend to have played other players in the middle; players on the outside tend to have played mostly players on the outside. In each round, we're "pushing" error away from the players on the outside, at the cost of our ability to tell the difference between players on the inside. And that means with each round, strength of schedule gets less and less meaningful for those players on the inside.

And as we figured out just now, we can't even reliably tell the difference between a 3-2 player and a 2-3 player by the end of 5 Swiss rounds. So when we add up the error in everyone two 3-2 players played against, we get those players' strength of schedule--but there's so much error wrapped up in strength of schedule that it's almost literally meaningless.

(I should say that all of this has been presented more-or-less qualitatively, but it can be demonstrated quantitatively. I haven't done it, and to my knowledge no one else has. It would be a straightforward but very time-consuming task, for someone with the requisite knowledge base. You could also take a shortcut and do it by simulation, which would be less time consuming but requires a set of software tools I don't have and would still be at least moderately time-consuming...)
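For anyone who wants to try the simulation shortcut, here is a minimal sketch using only the Python standard library. It is a toy under stated assumptions, not a validated model: skill is drawn from a standard normal, each game adds independent normal noise to both players, pairing is "sort by wins, pair neighbors" with no rematch avoidance, and SoS is the plain sum of opponents' final wins. The names `simulate_swiss` and `correlation` are made up for the example.

```python
import random


def simulate_swiss(n_players=64, rounds=5, noise=1.0, seed=0):
    """Run one toy Swiss tournament; returns (skill, wins, sos) lists."""
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n_players)]
    wins = [0] * n_players
    opponents = [[] for _ in range(n_players)]
    for _ in range(rounds):
        # Shuffle, then stable-sort by wins: random pairing inside each
        # score group. ASSUMPTION: rematches are not prevented.
        order = list(range(n_players))
        rng.shuffle(order)
        order.sort(key=lambda p: -wins[p])
        for a, b in zip(order[0::2], order[1::2]):
            # Everything that isn't skill (dice, matchup, elbows) is noise.
            perf_a = skill[a] + rng.gauss(0, noise)
            perf_b = skill[b] + rng.gauss(0, noise)
            wins[a if perf_a > perf_b else b] += 1
            opponents[a].append(b)
            opponents[b].append(a)
    # Simplified SoS: total final wins of everyone you played.
    sos = [sum(wins[o] for o in opponents[p]) for p in range(n_players)]
    return skill, wins, sos


def correlation(xs, ys):
    """Pearson correlation; returns NaN if either list has no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else float("nan")
```

One way to use it: run `simulate_swiss` over many seeds, pick out the players who finished 3-2, and compute `correlation` between their true `skill` and their `sos`. If the argument above holds, that number should hover near zero for the middle of the pack, but the sketch only lets you check that; it doesn't prove it.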

Very well explained Vorpal Sword, thank you.

RoV

One aspect in favor of Osoroshii's view: SoS in the FFG software is now normalized to how many games your opponent played, so drops don't matter. That is a VERY good thing.

Drops can be figured out or worked around, but the problems with strength of schedule are baked in. It's a sampling problem, actually--I'd be happy to talk about it by PM, if you're interested.Basically, if you're talking about a baseball team that's played 162 games, strength of schedule is pretty meaningful. It's questionable in football (or hand-egg, for members of the non-US X-wing community ), where teams play just 16 games, and it's absurd in X-wing where the Swiss qualifiers might consist of just four games.

Qualitatively that makes a lot of sense. Out of curiosity, do you have any links for papers on the subject? I haven't done a literature survey, but I'm curious.

I looked at it a while back (looking for ammunition for an e-mail to Organized Play, actually, before they moved to MOV). Basically, there's a lot of information about there about the general design of tournaments1, but not a lot about strength of schedule specifically. I ended up having to work a lot of it out based on other resources, and I still have notes that I'd eventually like to write up in my copious spare time. 1 See, for example, Scarf et al. (2008), "A numerical study of designs for sporting contests." European Journal of Operational Research, 198 (2009) 190-198. doi:10.1016/j.ejor.2008.07.029

I'd be happy to talk about it by PM, if you're interested.

Please don't; this is getting good.

Okay, you asked for it. ;)

One way to look at tournaments is as a mechanism for picking some number (often 1) of "players" from a set of an arbitrarily large size. The players could be actual individual players like in X-wing, or teams like in baseball, or more abstractly they could be people competing for a job. We'll just call them all players, and assume there's an operation that compares any pair and determines which is better.

A psychometrician named Thurstone developed something called the law of comparative judgment in the 1920s. Thurstone was interested in determining things like the smallest difference people could perceive between two stimuli--say, brightness of a light--and the law of comparative judgment was how he approached the problem. It's not really a law, but actually a mathematical model that ranks a set of objects in exactly the way we need to perform our tournament: it compares every possible pairwise combination of objects, and uses those comparisons to not only rank the objects but actually determine their positions with respect to some hypothetical underlying variable that causes their performance.

That's a direct analogy to a round-robin tournament that matches every player against every other player. If I put 128 X-wing players in a round-robin tournament, I could use the results to determine (say) a z-score for each of them. (With today's mathematical tools at hand, I'd probably use a Rasch model, which is a pretty linear descendant of the law of comparative judgment.) For obvious reasons, that's not feasible unless you have three weeks for your tournament. But round-robin is the best way to do it, if your goal is actually to figure out how good every player in the field is with respect to every other player.

Notably, though, even in a round-robin tournament there's some error associated with that set of z-scores because there's error associated with the outcome of an X-wing game. If I play you and you win, it could be because you're better than I am. But it could also be that your list had a particularly good matchup against mine, or that you had excellent results on the dice in the first few rounds, or it could be that twenty minutes into the game I was elbowed in the kidneys by a celebrating player who was standing too close to me. In this very simple model of player performance, everything that affects the outcome of an X-wing game that isn't skill is represented as error, and it turns out that X-wing is a noisy (that is, error-prone) process.

So now picture those players laid out on a number line (a graph of the underlying variable of "skill"). Each person who participated in this enormous round-robin tournament has both an estimated location on the line, represented by a dot, and an interval (represented by a shaded area) that indicates potential error in the estimate. (A poll of registered voters might put a candidate's support at 27%, plus or minus 3.5%. That 3.5 percent is the potential error.) Some of those error regions might overlap, and if they do, that's the mathematical model we're using admitting that the ranking might be wrong--player #38 and player #39 might really belong in the opposite order, if the model was a little bit wrong about both of their skill levels. But critically, this method (round-robin analyzed using a particular set of mathematical tools, if you're losing track) is the most accurate possible way to determine players' positions on that line.

Since it's clearly not feasible to do it, we invented alternatives. Single-elimination is an old solution, and it has a huge payoff in terms of time--it's literally an exponential decrease in the number of rounds I have to run. But you pay for it in terms of accuracy, because now you're testing each player against a limited sample of other players. And that blows up those error regions by a lot. Now there are a lot of players with overlapping error bars (again, meaning you can't really resolve which of them is better), and the error is uniformly distributed across all players. You lost resolution, and you lost it everywhere at once.

Swiss tournaments are a newer solution, and they work--in mathematical terms--not by reducing that error, but by moving it. The Swiss system says "hey, we care most about the players at the top, right? Let's focus on getting good estimates for them!" And that's just what it does. When you apply mathematical tools to the Swiss system (usually using large simulations), what happens is that because you're always comparing the best players to the best players, the uncertainty around those players decreases rapidly. Unfortunately, for the middle ranks anyway, you get that advantage by blowing up the error bars for players in the middle of the distribution even more than you did under single-elimination.

The best qualitative explanation I can come up with for how this happens is that in order for someone who's "really" a 3-2 player to end up at 5-0, all of the error in each of her games has to point in the same direction--she would have to be not just consistently lucky but lucky in a set of increasingly discriminating games. Because that's unlikely, the error bars at the top are pretty small. But in order for someone who's "really" a 3-2 player to end up at 2-3, bad luck only has to strike once. So the error bars in the middle are typically going to be so large that we can't tell players in adjacent score brackets apart.

So how does strength of schedule tie into all of this? It's conceptually just the sum of your opponents' skill estimates, which means the error in your strength of schedule is the sum of the errors in each of those estimates. Players in the middle of a Swiss ranking tend to have played other players in the middle; players on the outside tend to have played mostly players on the outside. In each round, we're "pushing" error away from the players on the outside, at the cost of our ability to tell the difference between players on the inside. And that means with each round, strength of schedule gets less and less meaningful for those players on the inside. And as we figured out just now, we can't even reliably tell the difference between a 3-2 player and a 2-3 player by the end of 5 Swiss rounds. So when we add up the error in everyone that two 3-2 players played against, we get those players' strength of schedule--but there's so much error wrapped up in strength of schedule that it's almost literally meaningless.

(I should say that all of this has been presented more-or-less qualitatively, but it can be demonstrated quantitatively. I haven't done it, and to my knowledge no one else has. It would be a straightforward but very time-consuming task, for someone with the requisite knowledge base. You could also take a shortcut and do it by simulation, which would be less time consuming but requires a set of software tools I don't have and would still be at least moderately time-consuming...)
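For anyone who wants to try the simulation shortcut, here is a rough Python sketch of what it might look like. Everything in it is an assumption made for illustration, not the analysis described above: skill is a single normally distributed number, a game is decided by a logistic (Rasch-style) win probability, Swiss pairing is simplified to "sort by wins and pair neighbours" with rematches allowed and an even player count, and strength of schedule is the average win fraction of your opponents. All function names are hypothetical.

```python
import math
import random


def win_probability(skill_a, skill_b):
    """Logistic (Rasch-style) chance that player A beats player B."""
    return 1.0 / (1.0 + math.exp(skill_b - skill_a))


def run_swiss(skills, rounds, rng):
    """Play a simplified Swiss event; return (wins, opponents) per player."""
    n = len(skills)  # assumed even, so nobody needs a bye
    wins = [0] * n
    opponents = [[] for _ in range(n)]
    for _ in range(rounds):
        # Rank by current wins (ties broken randomly), then pair neighbours.
        order = sorted(range(n), key=lambda p: (-wins[p], rng.random()))
        for a, b in zip(order[0::2], order[1::2]):
            opponents[a].append(b)
            opponents[b].append(a)
            if rng.random() < win_probability(skills[a], skills[b]):
                wins[a] += 1
            else:
                wins[b] += 1
    return wins, opponents


def strength_of_schedule(wins, opponents, rounds):
    """Observed SoS: average win fraction of each player's opponents."""
    return [sum(wins[o] for o in opps) / (rounds * len(opps)) for opps in opponents]


def true_schedule(skills, opponents):
    """What SoS is trying to estimate: average true skill of opponents."""
    return [sum(skills[o] for o in opps) / len(opps) for opps in opponents]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rng = random.Random(2015)  # fixed seed so a run can be repeated
    rounds = 5
    skills = [rng.gauss(0.0, 1.0) for _ in range(64)]
    wins, opponents = run_swiss(skills, rounds, rng)
    sos = strength_of_schedule(wins, opponents, rounds)
    truth = true_schedule(skills, opponents)
    middle = [p for p in range(len(skills)) if wins[p] in (2, 3)]
    print("SoS vs. true schedule, all players:", round(pearson(sos, truth), 2))
    print("SoS vs. true schedule, 2-3 and 3-2 only:",
          round(pearson([sos[p] for p in middle], [truth[p] for p in middle]), 2))
```

A single event is itself a noisy sample, so to say anything about the argument you would want to repeat this over many simulated events and look at how the correlation for the middle brackets behaves on average, ideally with a more faithful pairing rule.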

I wish I could like this more than once!

Edited by Criwi Romed