Proton Rockets, really

By Jobu, in X-Wing

Is anyone else concerned that, with all of the math talk, we are losing sight of the fact that this is a game? We will all use the ships and upgrades we feel like and try to make them work. Who cares if some computer says taking a TL at X range, when you are rolling Y dice against Z defense dice, is W% more efficient than taking a focus? If my gut tells me to take a focus, I will take a focus, as would most players. If we put too much emphasis on the statistics behind this game, we might as well insert lists into a computer at the start of a tournament, have it spit out a projected winner, and save ourselves the five hours it would take to play all of the rounds.

[Dismounts Soap Box Now]

P.S. I kind of wish there was a program like that, an "odds" calculator of sorts, like when you watch poker games on TV and they show the % chance of each player's hand winning. I think it would be fun to see who "beats the odds," either through superior pilot skill and maneuvering or just plain lucky dice. Heck, for some of the larger tournaments like Worlds we could do a March Madness bracket, although it would be tricky if there are any Swiss rounds at the beginning.

It doesn't matter what you're looking at data for; insufficient sample size is a serious problem.

Treat this as a poll. Even for the non-crunchy data points you are looking for, we still have far too few participants to draw any firm conclusions.

The issue is, you're only seeing the "cream of the crop" lists and calling it a small sample size. I would wager the vast, vast majority of people placing in the top positions of the regionals events aren't bringing brand new untested lists to that event. They have play tested and tweaked and experimented in games leading up to the event and figured out what works for them. They then took a list that works for them and competed against a bunch of other people that took lists that worked for them. And the best players with the best lists and the best luck rose to the top of those events.

You're looking at the capstone and calling the whole pyramid small. Somewhere along the line, things like APTs got filtered out. Yes, they can be great in certain circumstances, but those circumstances don't present themselves often enough to justify the cost.

Is anyone else concerned that, with all of the math talk, we are losing sight of the fact that this is a game? We will all use the ships and upgrades we feel like and try to make them work. Who cares if some computer says taking a TL at X range, when you are rolling Y dice against Z defense dice, is W% more efficient than taking a focus? If my gut tells me to take a focus, I will take a focus, as would most players. If we put too much emphasis on the statistics behind this game, we might as well insert lists into a computer at the start of a tournament, have it spit out a projected winner, and save ourselves the five hours it would take to play all of the rounds.

I do agree with this to an extent. Understanding the odds is helpful to people coming into the game, though. I don't memorize the odds in any given situation, but understanding WHY certain actions are better at certain times than others is important to developing as a player.

For instance, an unmodified attack die has a 50% chance to hit. A focused attack die hits 75% of the time, and a TLed attack die also hits 75% of the time. However, a TL doesn't offer any defensive benefit. Why, then, would anyone ever choose a TL instead of a focus?! Well, with the TL you have a higher chance of delivering a crit: the 25% increased chance of hitting with a focus will NEVER produce an additional crit, whereas a TL reroll might. Therefore, a TL makes sense in certain circumstances (if you are desperately praying for a crit, aren't going to be receiving any return fire, or are perhaps setting up for a future round).
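(For anyone who wants to check that math, here's a quick sketch that just enumerates the eight faces of an X-Wing attack die: 3 hits, 1 crit, 2 focus "eyeballs", 2 blanks. The target-lock lines assume the simple strategy of rerolling every non-damage face once.)

```python
from fractions import Fraction

# X-Wing attack die: 8 faces = 3 hits, 1 crit, 2 focus "eyeballs", 2 blanks.
FACES = ["hit"] * 3 + ["crit"] + ["focus"] * 2 + ["blank"] * 2

def p(event):
    """Probability that a single die face satisfies `event`."""
    return Fraction(sum(1 for f in FACES if event(f)), len(FACES))

p_plain = p(lambda f: f in ("hit", "crit"))            # 1/2
p_focus = p(lambda f: f in ("hit", "crit", "focus"))   # 3/4: eyeballs flip to hits
p_tl    = p_plain + (1 - p_plain) * p_plain            # 3/4: reroll misses once

crit_plain = p(lambda f: f == "crit")                  # 1/8
crit_focus = crit_plain                                # focus never adds crits
crit_tl    = crit_plain + (1 - p_plain) * crit_plain   # 3/16: the reroll can crit

print(p_plain, p_focus, p_tl)    # 1/2 3/4 3/4
print(crit_focus, crit_tl)       # 1/8 3/16
```

Same 75% to hit either way, but the TL bumps your crit chance from 1/8 to 3/16, which is the whole point of the post above.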

It's rudimentary, I know, but I explained that to a friend just getting into the game recently and things started clicking. We may take some of the math for granted, but having a general understanding of the what and the why helps us decide which upgrades complement each ship and pilot, and what actions to take when.

That said, white-knuckling spreadsheets so hard that you're unwilling to experiment with what works for you only does you a disservice. But I'm sure you've either seen threads or spoken to down-on-their-luck players who say they don't know what they're doing wrong but they keep losing. If they don't understand the odds on their choices, they will have a hard time learning to improve.

gamers perceive their games far more accurately than people perceive most things statistically studied.

LOL.

:rolleyes:

Is anyone else concerned that, with all of the math talk, we are losing sight of the fact that this is a game? We will all use the ships and upgrades we feel like and try to make them work. Who cares if some computer says taking a TL at X range, when you are rolling Y dice against Z defense dice, is W% more efficient than taking a focus? If my gut tells me to take a focus, I will take a focus, as would most players.

Nope, not concerned at all. It's a dice game, you win by understanding how dice work. And if you understand how dice work and play enough your "gut feeling" decision is going to be the correct one. Going deliberately against the odds because you think you're "feeling lucky" or whatever is just stupid.

The issue is, you're only seeing the "cream of the crop" lists and calling it a small sample size. I would wager the vast, vast majority of people placing in the top positions of the regionals events aren't bringing brand new untested lists to that event. They have play tested and tweaked and experimented in games leading up to the event and figured out what works for them. They then took a list that works for them and competed against a bunch of other people that took lists that worked for them. And the best players with the best lists and the best luck rose to the top of those events.

But there is a different small-sample-size issue: the data only covers a very narrow slice of the game, competitive tournaments played under a specific set of tournament rules. And those rules impose two very important limits on list choices:

1) They have to play quickly, without requiring too much time to think. The time limit rules out options, whether because the player can't fly a swarm fast enough, the list requires too many turns of fancy maneuvering and can't score a full win, or it's too easy to make fatal dial mistakes when you're trying to play as fast as possible to avoid a penalty for stalling. And remember that you're doing all of this under the mental fatigue of a long day of gaming. In the case of APTs, it's a weapon that depends on making good maneuvering choices and has little margin for error. Why deal with that when a YT-1300 is so much easier to fly?

2) Avoiding bad luck is more important than maximizing your chances of winning. Since you have to win all of your games (with full wins!) to win the tournament, you have no margin for error. This isn't MTG, where you have best of 3/5/7/etc. games and an occasional game loss to bad luck isn't fatal. For example, a TIE interceptor has about a 5% chance of rolling all blanks on its evade dice, which means that over an entire tournament you've got a pretty good chance of seeing it happen. And when it happens, especially early in the game, you've got a pretty good chance of losing the game and therefore the tournament. So even if a TIE interceptor list is great for a one-off pickup game, it's a weak choice for a tournament. APTs are probably in a similar situation: they fail to work just often enough that you can't count on them in a multi-game tournament.
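(The ~5% figure is just (3/8)^3, all three green dice showing a literal blank; with a focus token in hand the eyeballs convert, so only true blanks contribute nothing. Here's a quick sketch of how that compounds over a tournament; the number of rounds and defense rolls per game are made-up illustrations.)

```python
# A green defense die has 8 faces: 3 evades, 2 focus "eyeballs", 3 blanks.
# Holding a focus token, only the 3 literal blanks are wasted: P = 3/8 per die.
P_BLANK = 3 / 8

# A 3-agility TIE interceptor whiffs completely when all three dice blank out.
p_whiff = P_BLANK ** 3            # 27/512, about 5.3%

# Chance of at least one total whiff across n defense rolls, e.g. a 5-round
# tournament with roughly 8 meaningful defense rolls per game (assumed).
n_rolls = 5 * 8
p_at_least_once = 1 - (1 - p_whiff) ** n_rolls

print(f"single-roll whiff: {p_whiff:.1%}")          # 5.3%
print(f"over {n_rolls} rolls: {p_at_least_once:.1%}")
```

Under those (invented) per-game numbers, the "rare" 5% whiff becomes close to a sure thing at least once over the day, which is the post's point about variance in single-elimination formats.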

In short, the tournament rules push players to safe, predictable lists like swarm/XXBB/HSF/etc instead of fancy tricks. If you want to fix the sample size issue you need to consider the entire game, not just the special tournament version, and play a large pool of single games with players that are building lists to maximize their chances of winning that single game.

In short, the tournament rules push players to safe, predictable lists like swarm/XXBB/HSF/etc instead of fancy tricks. If you want to fix the sample size issue you need to consider the entire game, not just the special tournament version, and play a large pool of single games with players that are building lists to maximize their chances of winning that single game.

No insult intended but I think you are clutching at straws here.

Fancy stuff does not win wars (war games), consistent (solid) performers do.

Edited by Elkerlyc

1. How does a TIE swarm fit into the "must play quickly" category, but APTs don't? Do they take longer to use than assigning dials, maneuvering, taking actions, and shooting with a seven-ship list?

2. In a competitive environment, nothing is more important than maximizing your chances of winning. Reducing how much you rely on luck is part of that. The very restrictive conditions that make APTs useful, balanced against their obscene point cost, mean you are relying on quite a bit of luck to make them worthwhile. They just aren't useful often enough.

According to MajorJuggler's regionals statistics thread, there have been 0 Advanced Proton Torpedoes so far finishing in the top 1/3 of regionals lists. That would imply that Advanced Proton Torpedoes are not worthwhile, and not a good benchmark for game balance.

I've said this before and will again... that list is a small sample size statistically.

That is 100% not the point. It's an excellent baseline of what players see as competitive, the best one we can possibly get. When you look at a regionals event, most players are going to bring their A-game, their best list. They won't bring something experimental. They won't bring something they're trying to make work. So we start to see how people perceive the game, and gamers perceive their games far more accurately than people perceive most things statistically studied.

Sample size doesn't play into it. At all.

Focus on the "see" part. Players copy each other. XXBB exploded after Paul Heaver won with it. One of the things I like about the new meta is the rock-paper-scissors dynamic it creates, which will hurt netdecking really, really badly. After that, we may see a fair bit more variety.

In short, the tournament rules push players to safe, predictable lists like swarm/XXBB/HSF/etc instead of fancy tricks. If you want to fix the sample size issue you need to consider the entire game, not just the special tournament version, and play a large pool of single games with players that are building lists to maximize their chances of winning that single game.

This is why I'd like to see some 'campaign weekend' events - something like 40k campaign weekends with 3-4 unlimited(ish)-time games and scenarios rather than the default 50 minute kill-fest.

Scenarios will push you to look at other things.

As iPeregrine pointed out, the tournament rules do favor some lists over others.

Playing XXBB or HSF is a safe choice. You rely only on attack dice, and the typical modifiers ensure predictable results. Predictable results are the key. As has been said, you win these tournaments by moving as far away from luck as you can while adding redundancy.

But many of the missiles and torpedoes we've seen already are simply not predictable weapons, and that's intentional. APTs can one-shot most ships in the game. Heck, concussion missiles can one-shot a good number of ships, and at range 3 at that. But they don't always. That's why tournament players don't like them: they're not gamblers. That's why they stay away from high-PS Interceptors and the like.

The problem here is that people seem to be suggesting that, through various means, missiles and torpedoes should have their reliability increased, which isn't something they're designed for. If ordnance could land its damage reliably, you would be paying 6 (or so) points for an upgrade guaranteed to remove 12+ points of your opponent's ships when fired, which is quite a steal. If you didn't have to work to get a TL + focus (possibly with Deadeye involved) for APTs to unload 5 damage, it would be quite unbalanced and result in a game where most ships don't survive the first round of shooting.

But now we have one missile that is above average in reliability when equipped to one of two ships that are almost unanimously considered underpowered, and I fail to see where the problem is.

The problem here is that people seem to be suggesting that, through various means, missiles and torpedoes should have their reliability increased

Not really. I do not mind unreliability in my tournament list.

Provided I can have several to 'stack the odds'.

And this is the problem. It is a one-shot, low-probability weapon (too unlikely to be worthwhile in a short game) at a steep price.

Example: I hardly (if ever) see proton torpedoes in any list. 4 points is a LOT in a 100-point game.

Now if you add in the unreliability, you'd need several 'to be sure'. Say 2 or 3.

That means 8-12 points. Simply never worth it in a mere 100 points. (IMO)
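(A quick sketch of that redundancy math. The 50% chance that any single torpedo gets fired usefully in a game is a made-up number; only the 1-(1-p)^k shape of the argument matters.)

```python
def p_any_connects(p_single: float, copies: int) -> float:
    """Chance that at least one of `copies` one-shot weapons gets used well."""
    return 1 - (1 - p_single) ** copies

COST_PER_TORP = 4  # proton torpedoes, per the post above

for copies in (1, 2, 3):
    # p_single = 0.5 is purely illustrative
    p = p_any_connects(0.5, copies)
    print(f"{copies} torp(s): {COST_PER_TORP * copies} pts, "
          f"{p:.0%} chance at least one pays off")
```

Each extra copy buys a shrinking reliability gain at a constant 4-point cost, which is exactly the "8-12 points, never worth it" complaint.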

[edit; that is why I like flechette torpedoes. 2 pts for a stress AND one less green die at range 3? Sounds fair to me]

Edited by Elkerlyc

gamers perceive their games far more accurately than people perceive most things statistically studied.

LOL.

:rolleyes:

Look, I have issues with how Major Juggler is analyzing the top squads at Regionals. But that doesn't mean the sample size is bad. Because of the competitive nature and size of the Regionals, it is a great way to see what is the top tier.

[edit; that is why I like flechette torpedoes. 2 pts for a stress AND one less green die at range 3? Sounds fair to me]

Well, essentially 2 points for a stress token. A nice, logical, predictable purchase. The one-die-less thing is fine, but since you're not getting the benefit of your target lock on the shot, it's pretty much a wash.

According to MajorJuggler's regionals statistics thread, there have been 0 Advanced Proton Torpedoes so far finishing in the top 1/3 of regionals lists. That would imply that Advanced Proton Torpedoes are not worthwhile, and not a good benchmark for game balance.

I've said this before and will again... that list is a small sample size statistically.

As others have stated already, whether or not it is a small sample size is a philosophical point about what the statistical population is. For example, we could use the following as definitions of the population, in ascending order of competitive level of play:

  1. A uniform distribution of all possible pilots and skills, taken by an infinite number of truly randomly generated squads. I.e. monkeys banging on a typewriter.
  2. The total number of X-wing games that have been played worldwide in the current meta.
  3. The total number of competitive games that have been played in the current meta, as defined by some definition of "competitive".
  4. The sum total of successful squads played at competitive games, played in the current meta, as defined by some definition of "competitive", and some definition of "successful".

You could come up with many other definitions to define the population, but these are the most obvious. If the goal is to analyze the effectiveness of squads/upgrades/etc at the competitive level, then the population must be determined by method #4, and we are simply debating what is defined by "success" and "competitive".

So let's go down that rabbit trail. I did some quick googling on statistical sample sizes and confidence levels and found this blog. It has a couple of typos (wrong labels on the example Z-scores, and there should be a multiplication in the equation, not a subtraction), but it is very readable.

http://www.qualtrics.com/blog/determining-sample-size/

The key takeaway here is:

Sample Size = (Z-score)^2 * StdDev * (1 - StdDev) / (margin of error)^2

Note that the statistical significance of the sample size is independent of the size of the parent population, as long as the sample size isn't more than a few percent of the parent population. We will assume, for the sake of Kelvan's argument, that the Final Cut / Top Third squads in the 2014 Regionals represent a sufficiently small sample compared to the overall X-wing population that we can use the above metric. If we can recast the Regionals data as a yes/no question, then we can use the above metrics. In this instance we can formulate a question like this:

"Will you take Advanced Proton Torpedoes into a competitive setting and end up making the Final Cut or Top Third of all squads?"

In our case we want to solve for the Z-score based on the sample size, desired standard deviation and confidence interval:

Z-score = (margin of error) * ( (Sample Size) / (StdDev * (1 - StdDev)) )^0.5

Now let's fill in some numbers:

  • margin of error (confidence interval): represents how much higher or lower than the population mean you are willing to let your sample fall. For example, in the statement "75% of voters said yes to XYZ, with a margin of error of +/-5%", the margin of error is 0.05. We will use a 2% margin of error = 0.02.
  • For sample size we will use the total number of X-wing players attending the 2014 Regionals. Most recently updated attendance numbers are 1153 total, so that's our sample size.
  • Standard deviation determines the variance in the response, so to be conservative we will use a value of 0.5.

Plugging that all in, we get a Z-score of 1.36. This corresponds to a confidence level of about 91%.
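(For anyone who wants to verify the arithmetic, here's a quick sketch; `normal_cdf` converts the Z-score into the one-sided confidence level via the standard normal CDF.)

```python
import math

def z_score(margin: float, n: int, p: float = 0.5) -> float:
    """Z-score supported by a sample of size n at the given margin of error."""
    return margin * math.sqrt(n / (p * (1 - p)))

def normal_cdf(z: float) -> float:
    """One-sided confidence level for a Z-score (standard normal CDF)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = z_score(margin=0.02, n=1153)   # the numbers used in the post
print(round(z, 2))                 # 1.36
print(round(normal_cdf(z), 2))     # 0.91
```

So the 1.36 and ~91% figures check out under the stated assumptions (worst-case p = 0.5, one-sided interval).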

TL;DR: Granting Kelvan his assumption that the 2014 Regionals is a statistically small sample size (but proportionately representative of the competitive meta), there is 91% confidence that players will take APTs and make the Final Cut / Top Third 2% of the time or less.

That is statistically significant, and validates kraedin's point. Sorry Kelvan. :)

(edit: changed the sample size definition to fit the question)

Edited by MajorJuggler

TL;DR: Granting Kelvan his assumption that the 2014 Regionals is a statistically small sample size (but proportionately representative of the competitive meta), there is 91% confidence that players will take APTs and make the Final Cut / Top Third 2% of the time or less.

That is statistically significant, and validates kraedin's point. Sorry Kelvan. :)

.........Yeah! I think.... :wacko:

Look, I have issues with how Major Juggler is analyzing the top squads at Regionals. But that doesn't mean the sample size is bad. Because of the competitive nature and size of the Regionals, it is a great way to see what is the top tier.

If you have suggestions on how it can be improved, please let me know. I am planning on updating the Excel file so you can run the numbers with different weightings and meta.

A sample size of a given population only needs to be about 100 people as long as it's comprehensively representative of the differentiations within that population to a reasonable degree. That's what I learned in my 400-level political science course on public opinion, anyway.

Considering we have about 1,150 squads reported in that thread, our sample size is more than enough to gauge what people are and are not playing. I'd say the only criticism to be levied is that we don't have squads from all corners of the world yet — but that'll be rectified by the end of regional season once all tournaments have been held.
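(As a sanity check on the "about 100 people" rule of thumb, here's the worst-case margin of error at a 95% confidence level, Z = 1.96, for both sample sizes. It's the same formula from MajorJuggler's post, rearranged to solve for the margin.)

```python
import math

def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Worst-case (p = 0.5) margin of error for a sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# ~100 respondents gets you within about +/-10%; ~1,150 within about +/-3%.
print(f"n=100:  +/-{margin_of_error(100):.1%}")
print(f"n=1150: +/-{margin_of_error(1150):.1%}")
```

A ±10% margin is the usual "good enough for a representative poll" ballpark, and ~1,150 squads tightens that considerably, which supports the point about the Regionals data.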

Look, I have issues with how Major Juggler is analyzing the top squads at Regionals. But that doesn't mean the sample size is bad. Because of the competitive nature and size of the Regionals, it is a great way to see what is the top tier.

Eh? His analysis is informative. I've liked it. What's the problem with it?

I have no useful suggestion. How you are looking at the data is fundamentally flawed in determining the balance of the game. It is interesting information, no doubt. I'm not saying it isn't. But it isn't the way to look at the true balance of the game, or of a ship. Breaking squads down to just their components is a bad way to analyze how viable things are. SQUADS are what are important. And merely making an appearance at the top tables of Regionals is a huge argument for viability, no matter how often or not it appears.

I think Darth Vader appearing once and Howlrunner appearing dozens of times is a strong viability argument. Especially when some locations will have better competition or more competition than others, making it harder or easier for fringe ships like the TIE Advanced to make a top table.

I have no useful suggestion. How you are looking at the data is fundamentally flawed in determining the balance of the game.

How so?

Breaking down squads to just their components is a bad way to analyze how viable things are. SQUADS are what are important.

I guess this is your answer to my question above? In any event, squad archetypes are now included. This was a recent update, within the last week, so you probably were not aware.

And merely making an appearance in the top tables of Regionals is a huge argument for viability, no matter how often or not it appears.

[edit]: Occurring once is an entirely different thing from occurring 100 times. If we were to play a million games, then everything would eventually make it into the Final Cut, statistically. The total proportion is extremely relevant.

I don't care if people bring 4 TIE Advanced, all I care about are the most competitive results. Final Cut / Top third is actually a very wide net, so it is very inclusive. The weighted average inherently takes into account the proportion of times that something makes the Final Cut, how high up it gets, and how many players it had to beat out to get there. It is reasonable to conclude that the higher the %, the more viable it is. You are free to interpret it any other way, including arguing that the TIE Advanced is viable because it represents 0.03% of weighted average points spent, which is greater than zero.

Also: I made all the data publicly available for download in an Excel file. You can download it and mess with it as much as you want. You could also write a script to compile squad lists based on text parsing of the results. The data is all there, so rather than simply complaining about it, offer suggestions, or better yet improvements. :) Feedback is always welcome; I think you and some other folks pointed out in the Store Championship thread that squad data would be useful, so I added squad archetypes.

Edited by MajorJuggler

I wish events would start logging all of the lists people brought.

Then we could look at the top table lists/ships vs percentage of each list/ship played at events. This would put some of the debates on here to rest, as it would be all inclusive. We could see things like, *chance of a Tie Swarm making it to the top*, which will be based on total number of swarms played at all events and the total that made it to the top. This would remove another variable from the data set, which is total # of each list played by all players, not just the good ones :P Can you say, goal for next Regional Season??

I 100% like what you're providing for us, Major Juggler, but now I want more! Sometimes "answers" just lead to more questions! Maybe I just like that I got to see my name and squad on a thread that everyone reads :P
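(If TOs ever did log every list brought, the computation described above would be trivial: divide each archetype's top-cut count by how many were fielded. All the numbers below are made up for illustration.)

```python
from collections import Counter

# Hypothetical tallies of the kind the post wishes events would log:
# every archetype brought to an event, and those that made the top cut.
brought  = Counter({"TIE Swarm": 120, "XXBB": 95, "APT list": 12})
made_cut = Counter({"TIE Swarm": 18,  "XXBB": 14, "APT list": 1})

def cut_rate(archetype: str) -> float:
    """P(making the top cut | bringing this archetype)."""
    return made_cut[archetype] / brought[archetype]

for a in brought:
    print(f"{a}: {cut_rate(a):.0%} of the ones brought made the cut")
```

That conditional rate removes the popularity variable the post mentions: a list that appears rarely in top cuts might simply be rarely played, not weak.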

Conditional probabilities would also be interesting, although less informative than Final Cut Probability.

I would certainly add it all if we were given the data, but getting the data in the first place is hard. TOs rarely provide the entire set of lists; half the time we only get results from the community.

I wish events would start logging all of the lists people brought.

Then we could look at the top table lists/ships vs percentage of each list/ship played at events. This would put some of the debates on here to rest, as it would be all inclusive. We could see things like, *chance of a Tie Swarm making it to the top*, which will be based on total number of swarms played at all events and the total that made it to the top. This would remove another variable from the data set, which is total # of each list played by all players, not just the good ones :P Can you say, goal for next Regional Season??

I 100% like what you're providing for us, Major Juggler, but now I want more! Sometimes "answers" just lead to more questions! Maybe I just like that I got to see my name and squad on a thread that everyone reads :P

While %chance of a list making a top cut would be interesting, I don't think it would be conclusive about the strength of the list either, since a lot of players in the bottom aren't necessarily there because of their lists.

Eight pages!? Can somebody who's been keeping up give me a summary so I can jump in?

Eight pages!? Can somebody who's been keeping up give me a summary so I can jump in?

Because missiles suck, but these work so differently they don't invalidate anything.

Rargh these invalidate all the other missiles.

Repeat for a while.

STATISTICS!

Edited by Aminar