Using Lanchester's Square Law to predict ships' jousting values and fair point values (work in progress)

By MajorJuggler, in X-Wing

Well if I could sidestep the argument on the model itself for a moment ... the model indicates that a good portion of the ships (six of them) are overcosted by 1 point, with nearly all the others (save the Z-95, and the B-Wing and TIE Bomber if you round up) overcosted by more. That says to me that the baseline ship, the TIE Fighter, may be undercosted by 1 point. Academy Pilots perhaps should cost 13. If we consider that to be true, all the other ships look a bit better cost-wise.

It also makes sense that FFG would undercost the TIE Fighter a bit if it wanted to encourage swarms to fit with the fluff.

Now addressing MJ directly:

But it's also worth considering that every attempt I can easily find to validate Lanchester's Law using actual combat data has demonstrated substantial mis-fit with the model.

I actually don't find that too surprising, given the increasingly asymmetrical nature of warfare in recent centuries. I'm not sure there are any real-world examples that directly compare to X-wing though, where you have perfect knowledge of the enemy's position and capabilities.

But, as you've said, playtesting is a superior method to theoretical estimates--which means that if you develop a model which disagrees substantially with FFG's costs, the burden is on you to explain why your model is a better reflection of the underlying reality than the playtesting that's already been conducted.

Correction: if the model disagrees substantially with the community's consensus on balanced cost, the burden is on me to explain the model. As you said, FFG has on occasion clearly gotten it wrong...

With respect to "jousting value", it's hard to disagree with "non trivial correlation" without a definition of what qualifies as trivial for you, but the degree of correlation is actually something we can actually assess; I'll get to that in a later post.

Ships represented in Worlds 2013 Top 16, ranked by percentage points...

My impression, and this could be wrong, is that the reliance on high jousting ships has actually increased since then.

Similarly, I think we'd agree that the Lambda has a bottom-tier dial and an upper-tier set of upgrade slots. Where we part ways is the point where you assign values to those tiers. And the values you assign are arbitrary in a technical sense--they "feel" right, but there's no particular constraint or mathematical reasoning behind many of them (particularly the dials and upgrades).

The dial and upgrades, as you pointed out, are more difficult to quantify, although the goal is still to quantify how damage output (generally by getting into closer range or having less stress) or durability (i.e. evading arcs) is numerically related to each "tier" of dial. Technically there are as many "tiers" of dials as there are ships with different dials, and you're right, I haven't taken all of the categories to the same level of mathematical rigor as the attack and durability numbers.

It's a work in progress. :)

The specific values almost certainly have room for improvement. But I think the general approach is solid. As evidence, I point to the predicted "fair" points costs, which, for all of the Wave 1-3 small base ships (the HWK-290 excepted), fall within a point or less of the community's consensus. One full point is probably conservative; it might actually be as low as half a point. That's pretty good for a "dumb" formula that has no special fudge factors for any of the 8 small base ships, all of which use a common pool of values.

I have read MajorJuggler's entire post and have to agree with his mathematical conclusions. The tournament I played in Sunday had two very similar lists that took first and second place. They each had (2) Interceptors and (1) Firespray. One list used Fel and Turr; the other used Saber and an additional TIE (Night Beast). These lists DID NOT JOUST, because the math is against that style of play. The Firesprays blocked while the Interceptors moved in and out to take optimal shots.

I played the three ship Fel, Turr, Firespray list and was able to eliminate all opponent ships by game's end in all three games. The Firespray did not survive in any of the three games. The ships I used were played to their strengths and did very well. If I had lined up in the gutter and jousted I would not have done well at all. The Firespray, by blocking, ended up in a pseudo-joust situation and, as the math shows, it did not fare well; however, its mission was to block, which it did do well.

In conclusion, MJ's math did predict the outcome of jousting my three ships. The Interceptors did not joust and performed wonderfully.

I'm not sure if you're being ironic or not, so apologies if you are.

I think most people are smart enough to figure out that jousting with high point, low "hit point" ships when they are outnumbered is a bad idea.

You actually point out the issue: there is far more to a ship's point value than simply those three statistics, which is what MajorJuggler is trying to calculate. As I read it, his calculation of a ship's jousting value is also meant as a gauge of its tournament and competitive value, which your examples pretty clearly refute.

In conclusion, MJ's math did predict the outcome of jousting my three ships. The Interceptors did not joust and performed wonderfully.

Just to make sure I'm reading this right, the jousting model successfully predicted the performance of your ships that didn't joust? I'm not going to consider that the strongest statement of support, personally...

AlexW, you are correct in that I also believe the issue is that there is far more to a ship's point value than simple statistics. MJ's calculations do not take into account pilot special abilities or how PS affects the ship's performance. I find MJ's math useful for some baseline comparisons, but it does not describe the full capability of any ship.

My above post's intent is to show that "Math" alone cannot fully predict the performance of any of the ships in X-Wing. It can however reveal some basic design criteria to illustrate the different strengths and weaknesses of individual ships. Fly the ships to their strengths and you should do OK.

[EDIT: Forgot to include this, and now I can't remember where I wanted to put the link--but here is where I show my work.]

[Much later EDIT: Every time I think I've fixed this post, I put it back up only to find that the same link and list are broken again. I'm not sure if it's user error or the editor being blissfully stupid, but HOPEFULLY I've managed to nail it down this time.

AND EVEN MOAR EDIT: I keep finding typos and copy-paste errors from trying to fix link/list issues. Close to giving up here; hopefully the post is at least comprehensible, if not perfect.]

As promised...

How do we go about evaluating model fit?

This is actually one of the hardest questions to answer, here. There are lots of tools, most of which won't work here because the modeling technique is idiosyncratic--MajorJuggler made the model up, so there's no commonly accepted metric. So I chose the simplest tool available, which has the advantages of being easy to calculate and easy to understand: correlation.

Basically, I looked at the degree of association between MajorJuggler's values and a handful of standards that seem relevant (more about each in a moment):

(1) The community consensus

(2) The list of ships in the top 16 at Worlds 2013

(3) The regression formula worked out for Wave 1 by FFG user ShadowJak

What Exactly Is The Community Consensus?

That's actually not an easy question, either. For every viewpoint with reasonably wide acceptance there is at least a handful of vocal opponents, because this is the Internet and that's how the Internet works. And it's not really clear what we mean by "community", either: the set of people who post at BGG is different from the set of people who post here, and both are different sets from the people at AFewManeuvers or TeamCovenant. Moreover, the set of people who post on the internet about this game is a subset (probably a relatively small one) of all the people who play the game, which means our view of the game probably differs at least somewhat from that of the player base as a whole.

So I'll do basically what I believe MajorJuggler did with his model, and approach the problem from the perspective of what we can work with rather than what would be perfect. Toward that end, BGG user RecklessFable ran a poll in November asking people to rank the ships as they appear in the current competitive metagame, from 1st place to 12th. It's not a scientific poll (since the properties of the sample of respondents are unknown), and it has some measurement problems. It didn't distinguish, as MJ does, between the Outer Rim Smuggler and the named YT-1300 pilots. It was also conducted in November, around the time of Worlds but arguably before Wave 3 had really been sorted out.

But it also has the massive advantages of being publicly accessible, having at least a reasonably large number of respondents, and asking roughly the right kind of question. One of MajorJuggler's metrics for both value schemes (both versions of the model) is efficiency--that is, the ratio of his estimated cost to the actual cost of the ship. The higher the efficiency, the more competitive the ship ought to be, and the more highly people ought to rank it; the lower the efficiency, the less competitive the ship, and its rankings in RecklessFable's poll should fall.

A technical note: in order to derive a "true" rank from RF's poll, I took the mode of votes for each ship--that is, the place assigned to each ship by the greatest number of respondents. Accordingly, it was possible that any particular rank would contain any number of ships; fortunately there were only two ties, and both were in places where we can make sense of the result.

An alternative methodology would be taking the most-voted ship for each place. I didn't do that because it resulted in the TIE Bomber appearing twice and the TIE Interceptor never appearing at all, which is obviously less than ideal for our purposes. It also says some interesting things about the "community consensus" on the TIE Interceptor, but that's a topic for another time.
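To make the first method concrete, here's a minimal sketch in Python. The vote tallies are made up for illustration; the real ones are in RecklessFable's poll.

```python
from collections import Counter

# Hypothetical vote tallies: ship -> the places respondents assigned it.
# (Illustrative numbers only; the real data is RecklessFable's BGG poll.)
votes = {
    "TIE Fighter": [1, 1, 1, 2, 3],
    "B-wing":      [2, 2, 3, 1, 4],
    "X-wing":      [2, 3, 2, 4, 1],
}

# A ship's "true" rank is the mode of its votes: the place assigned to it
# by the greatest number of respondents.
for ship, places in votes.items():
    modal_rank = Counter(places).most_common(1)[0][0]
    print(f"{ship}: modal rank {modal_rank}")
```

Note that nothing in this method prevents two ships from sharing a modal rank, which is exactly how the ties below arise.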

The following list illustrates the overall results from the poll, following that first method:

1st place	TIE Fighter
2nd (tie)	B-wing
2nd (tie)	X-wing
4th		YT-1300
5th		Firespray
6th		TIE Interceptor
7th		TIE Bomber
8th		Y-wing
9th		A-wing
11th (tie)	HWK-290
11th (tie)	TIE Advanced
12th		Lambda

Why the Top 16 at Worlds 2013?

Mostly, it's because MJ uses it as a way to talk about the competitive Wave 3 metagame. I think the sample size is too small for a good comparison, and I think it leaves out a lot of good ships, but it's hard to argue that it's not a representation of which ships we ought to perceive as most competitive. As long as we exclude major outliers (which a correlational approach can do, and which I'll discuss in a moment), it's at least worth looking at.

The other major limitation (in addition to small sample size) is that there are a lot of ties at 0% used, which will artificially inflate the rank-order correlation (again, more about this in a moment).

What's This Regression Thing?

Roughly speaking, regression attempts to use one set of data to predict another set of data. In this particular case, back when Wave 2 was still the subject of intense speculation, ShadowJak used each ship's Attack, Agility, Hull, and Shields to predict its cost--and he did it so accurately, in fact, that there's a good argument that it probably reflects something important about FFG's process for costing ships in Wave 1. It fared more poorly with the Wave 2 ships, making the A-wing look overcosted by 1 point and the Large ships look undercosted by about 10 points; of the Wave 3 ships, the only one that's not completely off the grid is the TIE Bomber.

So this formula is increasingly less relevant with each new wave; FFG's costing process is probably getting further and further away from a deterministic model, presumably with an increasing reliance on playtest results to adjust costs up or down.

MajorJuggler and I both agree(!) that it's no longer a strong overall approach. I'm using it as one of my benchmarks because it used to do a pretty good job, and the ships for which it worked best are those that are most like the TIE Fighter: the ships that existed before we started getting new actions and crazy dials and fantastically expensive Large ships. If MajorJuggler's models are good, they should fit at least as well as, and preferably better than, ShadowJak's model. In particular, I'd like to see the "fair value" model work better than ShadowJak's costing model for Waves 2-3.

I adjusted ShadowJak's formula slightly, so that it predicts the cost of each ship at PS1, and so that it subtracts a fixed factor for the Large ships. The relevant equation is:

PS1 Cost = 3 + 8*(Attack - 2) + 8*(Agility - 2) + 4.25*(Hull - 3) + 4.5*(Shields) - 10*(IsLarge)
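For anyone who wants to check the table below, here's that equation as a short Python sketch. The statlines are straight off the ships' cards, and dividing the result by the actual cost gives the efficiency column:

```python
# ShadowJak's Wave 1 regression, adjusted to PS1 with a flat Large-base discount.
def ps1_cost(attack, agility, hull, shields, is_large=False):
    return (3 + 8 * (attack - 2) + 8 * (agility - 2)
            + 4.25 * (hull - 3) + 4.5 * shields - 10 * is_large)

# A few checks against the table: (attack, agility, hull, shields, is_large)
print(ps1_cost(2, 3, 3, 0))        # TIE Fighter -> 11.0
print(ps1_cost(3, 2, 3, 2))        # X-wing      -> 20.0
print(ps1_cost(3, 1, 3, 5))        # B-Wing      -> 25.5
print(ps1_cost(3, 1, 8, 5, True))  # YT-1300     -> 36.75
```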

And here are the estimated costs for each ship in Waves 1-3 using that equation; the center column is the model's estimated cost, and the right-hand column is the efficiency computed as (estimated cost)/(actual cost)--just like MJ's efficiency numbers, a value greater than 100% means the ship is underpriced according to the model, and a value under 100% means it's overpriced.

A-Wing		15.75	92.6%
B-Wing		25.5	121.4%
Firespray	31.75	102.4%
HWK-290		3.75	25.0%
Lambda		24	120.0%
TIE Advanced	20	100.0%
TIE Bomber	15.75	105.0%
TIE Fighter	11	91.7%*
TIE Interceptor	19	105.6%
X-wing		20	100%
Y-Wing		17	100%
YT-1300		36.75	99.3%

It should be evident from just a cursory look at these numbers that the model is doing okay through Waves 1-2, but the worst outliers are (in order) the HWK, the B-wing, and the Lambda. Basically, the B-wing is getting too much credit for its high hp and not losing enough for its low Agility; the Y-wing, which shares that low-Agility, high-hp profile, would presumably show the same problem if it weren't part of the Wave 1 data the formula was originally fit to.

How Did I Look At Association?

One of the simplest statistical measures of association between two sets of data is correlation, and the most commonly used measure of correlation is called Pearson's product-moment correlation coefficient. Pearson's coefficient, written as r, appears all over the place in introductory statistics courses, and it should ring a bell for anybody who had a strong high school math program or took a stats class in college.

I used Spearman's rank correlation coefficient (link here), which is closely related to Pearson's r, but has some advantages for this particular set of data. In particular, Spearman's rho does a few good things:

(1) It can be used much more easily with rank data, like the "community consensus" polling data.

(2) It's nonparametric, which means it doesn't make any assumptions about the underlying distributions of the two data sets.

(3) It doesn't assume that the relationship between the two data sets is a linear relationship.

(4) It's less sensitive than Pearson's r to outliers. Basically, this means there's a cap on how "wrong" Spearman's rho will perceive things to be; this is important to me because there are places where MajorJuggler perceives his model to have what might be technically called weak local fit (as opposed to strong global fit). Spearman's is nicer about that than Pearson's would be.

The major disadvantage is that Spearman's can be less sensitive than Pearson's overall. I'll talk about this more later, but it is essentially the major limitation of this examination of MJ's data.

Spearman's rho varies from -1 to 1. A value close to 0 means there's no association between the two sets of data, and a value close to 1 means there's a very strong association between the two sets. (A value close to -1 means there's a strong relationship running the wrong direction.)
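If you want to reproduce these numbers, both coefficients are one call away in scipy. A minimal sketch with hypothetical inputs (the real ones are in the linked spreadsheet):

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: one model-efficiency figure and one community rank per
# ship. (Placeholder values; the real numbers live in the linked spreadsheet.)
efficiency = [1.05, 1.00, 0.98, 0.92, 0.80]  # higher = better value per point
poll_rank = [1, 3, 2, 4, 5]                  # 1 = best

# Since rank 1 is "best", a well-fitting model shows a strongly *negative*
# correlation against raw ranks; reverse the ranks (or flip the sign) to
# report it as a positive number, as in the figures below.
rho, _ = spearmanr(efficiency, poll_rank)
r, _ = pearsonr(efficiency, poll_rank)
print(f"Spearman's rho = {rho:.3f}, Pearson's r = {r:.3f}")
```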

Method 1: The Community Poll

Here, I compare both of MJ's models (jousting value and fair value) to the community consensus, as measured by RF's poll. The math bits can be found in a Google Drive spreadsheet here, but here are the bottom-line numbers.

Correlation of community consensus and jousting efficiency: 0.274

It's fairly clear from that value where things stand: the jousting value isn't a very good match for the community's overall valuation of the ships. The low correlation is driven by a few points of very bad fit:

  • The community strongly likes the YT-1300, but it has very weak jousting efficiency (4th place compared to 11th place).
  • The community moderately likes the Firespray, but it has fairly weak jousting efficiency (5th place compared to 9th place).
  • The community moderately dislikes the TIE Bomber, but it has fairly strong jousting efficiency (7th place compared to 4th place).
  • The community strongly dislikes the Lambda, but it has very strong jousting efficiency (12th place compared to 1st place).

Correlation of community consensus and fair efficiency: 0.849

This is much better, although still not as high as I'd like. Adjusting MajorJuggler's E parameter to reflect actions, dials, firing arc, and upgrades brings it much more in line with the community's overall judgment of the Wave 1-3 ships.

The Lambda remains a problem here, although the mismatch is smaller: the fair value model still likes the shuttle better than people do (9th place in the model, and still in 12th place for the community)--but I'd actually adjudicate that in favor of the model, both because this version of the "community consensus" is a bit out of date and because I think people don't like the Lambda as well as it deserves.

The big surprise here is that people like the X-wing (tied for 2nd place with the B-wing) much better than the model does (7th place, between the Interceptor and A-wing and well behind the B-wing).

But we can make sense of that mis-fit in context: the B-wing's baseline performance looks a lot like the X-wing's, except that it has slightly better durability (it survives 1 more attack under typical conditions). MJ's model weighs that defensive efficiency, plus the addition of the barrel roll, against a very small cost increase and a very small decrease in the value the model assigns the dial, and comes up with a much more favorable "fair value".

So How Good Is MJ Under Method 1?

The jousting model doesn't like Large ships, which isn't really a surprise to me and probably isn't a surprise to him. If you simply omitted the Large ships, the jousting model's correlation to the community consensus would grow substantially. The fair value model is fairly close as well, with some inconvenient swaps and reorderings, but only a couple of places where the two strongly disagree.

But there are a lot of small disagreements that look to me like tremors in the model's foundations. One way to talk about this is to set aside correlation for a moment and just look at where ships are actually ranked by MJ's models: the fair value model has Rebels under-represented at the top of its ranking of efficiency (only the B-wing and YT make the top 6) and over-represented at the bottom (just the Lambda and TIE Advanced are in the bottom half). This doesn't agree with the community consensus as measured by the poll, and it also doesn't agree with the results from Worlds.

Method 2: Worlds 2013

Here, I compare both of MJ's models to the list of ships in the top 16 at Worlds by percentage of all points. (I cribbed these numbers from the OP.)

Correlation of Worlds 2013 and jousting efficiency: 0.290

This doesn't look good, mostly for the same reasons it disagrees with the community consensus: the model hates YTs, which were all over the place, and likes Lambdas, which were entirely absent. The only new note is that the X-wing appeared more frequently at Worlds than any ship except the TIE Fighter, but has only middling jousting efficiency. I've discussed all three areas, and MJ acknowledges that jousting efficiency doesn't say everything about why a ship might be valuable, so I'll move on to the more interesting model.

Correlation of Worlds 2013 and fair efficiency: 0.674

This is much more problematic. The X-wing still sticks out (liked by players, but not by the model), and now so does the Y-wing (appeared about as often as the Firespray and TIE Bomber, but of all the ships that actually appeared at Worlds, it's valued least highly by the model at just 10th place). Meanwhile the list of ships that didn't appear in the top 16 at Worlds includes the Interceptor, which the fair value model places among the mid-tier ships (and which has better efficiency than the X-wing).

Common wisdom explains why the Interceptor was unpopular and/or eliminated before the top 16: 20% of all points in the top 16 were spent on Falcons, and the Falcon's turret trumps the maneuverability that keeps Interceptors alive. And I have no idea how a model could or should include this kind of idiosyncratically predatory relationship between two ships--but it's definitely not modeled here, and it's having a definite impact on fit.

Incidentally, since we've moved away from the rank-only polling data and are now comparing frequency to efficiency, there's a moderately good case for looking at Pearson's r instead of, or at least in addition to, Spearman's rho. But the fair-value model fares even worse under that lens, with a correlation of just 0.504.

So How Good Is MJ Under Method 2?

The fair-value model still clearly says something about the ships that did well at Worlds, but there are ships the model likes that didn't make it into the top 16, and ships the model doesn't particularly like that did. Overall, even the better (that is, more comprehensive) of MJ's models does a poor job of explaining the showings at Worlds.

Method 3: The Regression Model

As one more quick methodological note, I'm not directly comparing MJ's models to the regression model here--I'm comparing the regression model to the poll and Worlds results, so that I can compare the fit of all three models.

Correlation of community consensus and regression efficiency: -0.014

Correlation of Worlds 2013 and regression efficiency: -0.176

Hopefully it's clear that this is what a really bad model looks like. The relationship in either case is very weak, and for Worlds it's actually running the wrong direction (the TIE Fighter is one of the most overcosted ships but leads the pack; the Lambda is one of the most undercosted ships but never shows up at all). MJ's models are absolutely doing better than the regression model.

The Bottom Line

MJ's models do a moderate job of matching the community's consensus, and they do a moderate to poor job of matching the showings at Worlds. But those are both, in a fundamental way, a reflection of what people like rather than what's good. For instance, if people get the Lambda wrong (which I think they do) and/or avoid it based on its reputation rather than play experience, then it's plausible that MJ's model would fit poorly there not because it's wrong but because it's right.

In fact I looked at the rank-order correlation between the poll and the Worlds results. Spearman's rho in that case is 0.868--meaning that community opinion is more closely related to the Worlds frequencies than either is related to MJ's models. There are (at least) two ways to read that: it could be that the poll is actually a better reflection of the competitive strength of Waves 1-3 than MJ's fair-value model, or it could be that community opinion is driving the Worlds results, and MJ's model knows something the community doesn't.

Unfortunately, there's really no way to evaluate either of those hypotheses without playtesting. In that sense, Wave 4 will be a critical test: the jousting model thinks the Z-95 is going to smash up the metagame and the other Wave 4 ships are likely to be non-starters; the fair-value model can't tell us anything sure about Wave 4 due to remaining uncertainty about things like cloak and the Defender's teased dial, but if the model were accurate, it could tell us how good those things would have to be in order to overcome the ships' statistical weaknesses.

For now, I'm willing to say that although the jousting model is weak, the fair-value model performs far better than I expected.

What About More Tuning?

MJ implies that he will continue to improve the fair-value model, but as both Bilisknir and I discussed, that's a potential problem at least from the perspective of the model's value for the existing ships (Waves 1-3). I think the HWK's dial should have a value equal to or even slightly lower than the Lambda's, for instance--it loses the 4-straight and turns the 2-turns red, but gains the unique and very useful stop maneuver. Likewise I don't see why the Y-wing's dial has a substantially lower value than the B-wing, since it has less red and a more useful K-turn, and only loses the B-wing's red 1-turn maneuver.

Of course both of those changes would likely reduce the overall model fit, which isn't a trivial point, but isn't what I want to talk about. The more closely the model resembles the existing data we have, the better it should be at predicting the value of future releases--but the less meaningful it is when talking about the current ships, since it's already narrowly tailored to explain how we value them as a community. (That is, it's not really predicting anything about the ships already included in the model, it's just agreeing with what we already know.)

So Does This Mean I've Changed My Mind?

No, it doesn't really mean that. It does put me in an uncomfortable position, however: the fair-value model fits well enough that I can't dismiss it as coincidence, but I think my methodological concerns are still valid. Primary among them is the inherent difficulty, if not impossibility, of constructing values for each ship's dial, actions, and upgrades in a way that's reliable, accurate, and generalizes properly across ships.

But Since It's Going To Happen Anyway: Some Concrete Suggestions

(1) Consider interactions between upgrades.

I recognize that this is opening up Pandora's box of ad-hoc tweaks, but upgrades certainly display synergy with one another. The second crew upgrade is arguably better than the first, since it allows players to build combos--Navigator + Intelligence Agent, Recon Specialist + Gunner, Chewie + C-3PO. The astromech slot is going to work better with system upgrades than without, due to combos like Advanced Sensors + R2 Astromech. System upgrades also arguably make ordnance slots (meaning missiles and torpedoes) better, because FCS potentially sets up an action stack. There are more examples, but hopefully this is enough to demonstrate that a flat value for each upgrade slot is potentially a source of error.

(2) Build out all ships that can carry turret upgrades separately.

The Y-wing without a turret is just not the same ship as the Y-wing with an ion cannon, and (at least where generic value is concerned) there really is no such thing as a HWK with no turret. Modeling these as separate ships, and then accounting for them as ships with a 360 arc (and possibly a unique Attack coefficient reflecting the damage cap of the Ion Cannon and action handicap of the Blaster) rather than ships with a traditional arc and a turret upgrade slot, should help improve model fit for both ships.

(3) Limit the resolution of your tiered estimates.

Humans can very reliably divide things into three ordinal categories (mild/medium/hot; none/some/many; below expectations/meets expectations/exceeds expectations; etc.). Broadly speaking, the more categories you use, the more disagreement you'll get between raters, and the less reliable your scale is. (In fact there's some research that says people lose the ability to meaningfully distinguish between adjacent ratings if you have more than about 7 categories in total.)

A good analogy might be using the "enhance" button to fix a picture. You can manually touch it up here and there if you're so inclined, and really good software might be able to meaningfully interpolate pixels for you, but exporting a low-resolution picture in a high-resolution format won't make it look any sharper--and people are really low-resolution cameras where most phenomena are concerned.

So your current approach to valuing dials isn't good measurement, even setting aside my disagreement with your attempt to quantify them at all. You're building a continuous scale, which gives you an essentially infinite number of categories--which weakens your model and leaves you open to endless debate about adding that extra .05 to the shuttle or whatever. Build three tiers, value them at (e.g.) 0.8, 0.9, and 1 (or whatever set of coefficients puts the fit where you want it), and cut off the discussion there. You can consider adding more categories if necessary to reflect a totally unique dial that's truly better than the Interceptor/A-wing tier, but I don't see any reason that you can't pretty easily fit all the existing ships into a good/better/best sort of scheme.

And ditto with (known) actions. Let attack actions (TL, Focus) be worth X, defensive actions (Evade) Y, and repositioning actions (Boost, Barrel Roll) worth Z.
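In code terms, this whole suggestion collapses to a couple of small lookup tables. A sketch, where every coefficient is a placeholder to be tuned for fit rather than a recommendation:

```python
# Three dial tiers instead of a continuous scale; values are placeholders.
DIAL_TIER = {"good": 0.8, "better": 0.9, "best": 1.0}

# One shared value per action class: attack (X), defensive (Y), repositioning (Z).
ACTION_VALUE = {
    "target lock": 0.05, "focus": 0.05,   # X
    "evade": 0.04,                        # Y
    "boost": 0.06, "barrel roll": 0.06,   # Z
}

def ship_coefficient(dial_tier, actions):
    """Combine the dial tier with a flat value per known action."""
    return DIAL_TIER[dial_tier] + sum(ACTION_VALUE[a] for a in actions)

# Illustrative only: a top-tier dial with focus, boost, and barrel roll.
print(ship_coefficient("best", ["focus", "boost", "barrel roll"]))  # 1.17
```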

(4) Unit-weight the unknowns.

You currently have both cloak and the Defender's dial valued at 1.1. All this really does is obscure the fact that your model isn't actually modeling those things. Just call them 1, and acknowledge that there's too much uncertainty around those things to make really accurate guesses. This will let you tell more easily just how good those ships' unknowns might need to be to make up the difference, and it will free up martinets like me to categorize our objections as belonging to the model in general or to a ship in particular--rather than going down the self-defeating rabbit hole of arguing over how to quantitatively value a dial that hasn't yet been spoiled.

Edited by Vorpal Sword

Vorpal Sword: if that post wasn't scarily close to what I do on a daily basis, it is pretty much what I would have said. This is getting way too close to work for me!

TIE Droid

9 points at PS1

2 attack

3 defense dice

2 hull

Actions: Target Lock, Evade

Dial: TIE Fighter (tier 4)

Cannot be affected by friendly pilot abilities, Elite Pilot Talents, etc (exact wording TBD).

I've done some research on TIE/d Automated Droids (here on the forum and at BGG) and think an 8pt cost and 4 hull fit better. Admittedly, I was going for theme, hence PS0, no Focus, immune to pilot crits, and more manoeuvrable than A-wings, but I still think drones shouldn't be player-run per se. However, an 8pt cost based on the BGG spreadsheet (lack of Focus has to be considered in weighting other abilities, but this would be unique to drones) would allow 12 ships at 96pts, so this would maybe balance out a human brain being better than the A.I., given that it has to coordinate a dozen ships.

I readily admit that I'm not a statistician, so this may be a dumb question:

If most ships are roughly 1 point overcosted, and no ship is undercosted, is the TIE Fighter really the best baseline? If you used another ship as the baseline, say for example the X-Wing, which you list as roughly 1 point overcosted, how much would that change things? Obviously, the TIE would then be listed as undercosted, but would it also change the weighting and values of things like actions and upgrades? If it would have an effect on weighting and values, I would think that identifying a more balanced ship as baseline is worth considering.

Even if changing the baseline wouldn't affect the results, it seems to me based on your list that the real answer is that the TIE and B are undercosted rather than every other ship being overcosted.

Again though, as I'm not the statistician, I could be very wrong.

Thoughts?

I suggested that the TIE Fighter is undercosted earlier in this thread but no one responded to me, heh. So yes, I agree with you. I think FFG intentionally undercosted the TIE Fighter for thematic purposes, to encourage swarms. Unfortunately, that encouragement may have been a little too effective among the player base.

Even if this isn't a perfect analysis (it isn't), it is a useful thing to know. I don't know that we'll ever work out the cost analysis for ships, but as it stands this data feels right for the most part. It tells me how clever I have to be to make something work. The worse the dial, the more clever. The lower the jousting value, the more clever.

Oh my lord....this is almost as bad as MathHammer....
Fly Casual, please! I feel like I'm on Bolter and Chainsword :S

Not everyone's version of fun looks exactly like yours. MajorJuggler and I clearly disagree on quite a few things, but it's equally clear that we both enjoy playing with the game this way (or else we wouldn't waste our valuable time with it). :wub:

One of the things that frustrated me most about an earlier contretemps with him is that I suspect we have more in common than not--and that we'd get along just fine at a table together, at least in part because most X-wing players do Fly Casual.

Sorry all for not getting back to you much earlier. I still don't have time to update the model yet, but I wanted to mention a few things that I would like to do going forward, and to update this list with the new -2 point cost A-wings. First I'll reply to some comments.

I don't think it's possible to scale actions by proportion of use, as different ships don't use them in the same proportions. My Interceptors live and die by boost; my A-wings don't. Likewise, barrel roll is dependent on asteroid setup and enemy formation. I'm more willing to skip a barrel roll with 1 of 7 TIEs for an asteroid than I am with 1 of 2. Target lock doesn't even have the same weight on different ships; I'm less likely to need it on a HWK than a Phantom.

Action values are not constant in time, or equivalent across ships, or even on the same ship. Then we complicate things by tossing second actions in.....

Short answer: yes accurately modeling the action economy is difficult, even getting coefficients that are intended to be statistical "averages" of how efficient they are. I'm always open to suggestions.

I have a very minor critique on the tiers for dials. You have the B-Wing in the 2.5 category and the Y-Wing in the 2 category. I'm unclear as to why you would do that. The Y-Wing's dial has 4 red moves where the B-Wing has 6 red moves. The B-Wing has 2 more greens than the Y-Wing. I would group the B-Wing and the Y-Wing in the same group. Like I said, this is a very minor critique.

Thanks MrFroggies. I have it in mind to start using a more regimented dial system rating that should clean this up. (See below).

Just on top of what has already been said about the use of Lanchester's Laws for a discrete-time game:

You are using observed use of ships in top-level gaming (albeit modified) to fit your attack/defence dice distribution model. This means you are falling into classic model calibration issues, and unfortunately we have no in-sample/out-of-sample split. The closest you could come would be to look at the distribution prior to a wave release and then use your model to predict for the new wave of ships. But given the paucity of data and the lack of stable equilibrium in the metagame between waves, this will be quite hard.

Indeed. You cannot fit a model to a set of observed facts and then say the model proves your views about those observed facts. It would be interesting to see from MajorJuggler what the sensitivity of his outputs is to changes in the different inputs, i.e. how stable the model is...

The relative damage numbers are fairly immune to changes in ship composition and ranges. As long as you get a reasonable cross-section, the numbers don't seem to change by more than a few percent. By far, the biggest variable seems to be the assumption of how often the defender has focus available to spend. Next update, I want to include these calculations so people can run the numbers themselves and see what difference it makes. (See below.)

Rakky is also collecting data on Store Championships, which is the best data set yet, I think.

I readily admit that I'm not a statistician, so this may be a dumb question:

If most ships are roughly 1 point overcosted, and no ship is undercosted, is the TIE Fighter really the best baseline? If you used another ship as the baseline, say for example the X-Wing, which you list as roughly 1 point overcosted, how much would that change things? Obviously, the TIE would then be listed as undercosted, but would it also change the weighting and values of things like actions and upgrades? If it would have an effect on weighting and values, I would think that identifying a more balanced ship as baseline is worth considering.

Even if changing the baseline wouldn't affect the results, it seems to me based on your list that the real answer is that the TIE and B are undercosted rather than every other ship being overcosted.

Again though, as I'm not the statistician, I could be very wrong.

Thoughts?

6 of one, a half dozen of the other... you have to use SOMETHING as a baseline, so I chose the TIE Fighter. "Fair" point costs are all relative to each other.

What I want to do going forward:

  1. I want to include the probability density function calculators in the Excel file so that 100% of the math is self-contained. This would let folks change the ratios of ranges, action economy, ship types, etc., and see how that affects the normalized damage numbers, so you don't just have to take my word for it. (A minimal sketch of the idea appears below this list.)
  2. I want to include a more regimented dial coefficient system, that "scores" each dial in several areas:
    • minimum turn radius
    • slow-speed maneuvers
    • stress clearing
    • K-turns
    • special moves

I'm still thinking about how to implement that, so more comments on that as I get further along.
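As a down payment on item 1, here's a minimal sketch of that kind of self-contained probability density calculation, using the standard dice faces (attack: 3 hits, 1 crit, 2 focus, 2 blanks; defense: 3 evades, 2 focus, 3 blanks):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Standard 8-sided X-wing dice.
ATTACK_DIE = ["hit"] * 3 + ["crit"] + ["focus"] * 2 + ["blank"] * 2
DEFENSE_DIE = ["evade"] * 3 + ["focus"] * 2 + ["blank"] * 3

def pdf(num_dice, die, counted):
    """Exact probability density of the number of counted faces rolled."""
    out = Counter()
    for roll in product(die, repeat=num_dice):
        out[sum(face in counted for face in roll)] += Fraction(1, len(die) ** num_dice)
    return dict(sorted(out.items()))

# Unmodified 3-die attack; spending a focus just adds "focus" to the counted
# set, which is where the action-economy assumptions enter.
for hits, p in pdf(3, ATTACK_DIE, {"hit", "crit"}).items():
    print(f"{hits} hits: {float(p):.3f}")  # 0.125, 0.375, 0.375, 0.125
```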

And lastly, Rebel Aces gives us a -2 point option for the A-wing. I added a new entry for a missile-less A-wing that costs 2 less. It scores:

Fair jousting cost: 14.8

Actual cost: 15

Jousting efficiency: 98.8%

Predicted total fair cost: 15.5

Total cost efficiency: 103.4%

So the 15 point A-wing is going to make great filler, with an absolutely spectacular jousting efficiency. It just went from being in the worst third of ships to the best third of ships.

I also wanted to reply to Vorpal specifically, since he put in a ton of work evaluating model fit!

I think those are interesting results, but we should still take them with a grain of salt (even if they say that the model is working!) for a couple of reasons:

  1. We don't really know what the relationship is between point efficiency and a ranked list. Ideally it is linear with good separation between each rank, but that might not be the case.
  2. A ranked list has some information loss; i.e. you don't know what the separation is between different ranks. For example, what if ships ranked 1-4 are all really good, and then 5-8 suddenly fall off a cliff? You will tend to get noise in the measurement when you have multiple ships with similar point efficiency, but they have to get shoe-horned into discrete rank bins.

I understand and respect why you still disagree with the methodology though; it makes sense. Everything past the jousting values is pseudo-science, and it isn't even a fair comparison for any ship that has unique functionality. I also like your idea of just putting placeholder values of "1" in for the Wave 4 dial coefficients and the Cloak action. Unfortunately I'm too busy/lazy right now to go back and update it, but hopefully FFG will leak the dials for us soon anyway.

That being said, the X-wing is the obvious outlier in this process. Maybe if I am more consistent with the dial ratings then the formulas will better reflect reality. Or maybe X-wings are still in the process of getting phased out, it will be interesting to see the results after wave 4 meta has stabilized. Or maybe X-wings just have a halo around them.... the game is called "X-wing" after all. ;)

Edited by MajorJuggler

Interestingly, I've heard from pretty reliable sources that the Academy Pilot is actually overpriced-- that the FFG design formulas put it at 11 points, but the designers decided to bump it up 1 point to prevent 9 ship lists from being valid at 100 points.

I wouldn't be surprised if that were true of the initial development of the game, but they are underpriced in terms of the balance that has emerged.

Wave 4 will be an interesting test of these calculations relative to standard theory.

I think most standard theory indicates that the Bandit Squadron Z-95 is clearly inferior to the Academy Pilot, for instance-- equivalent firepower with an inferior dial and roughly comparable resilience, in a faction that has less support for the playstyle.

Similarly, people are predicting that the Defender will be very interesting at least, potentially a game-changer, and certainly worth testing, whereas this analysis indicates that it is totally uncompetitive.

I suspect the standard theory is better than the equations here-- but we'll see.

Neither of those issues is open and shut. The Z-95 will be much harder to one-shot than a TIE, while the Defender is very arguably overcosted. I am firmly of the opinion that the Z-95 will revolutionize Rebel squad-building. Meanwhile, the Defender is pretty strictly inferior to the Bounty Hunter, and will have trouble being a force in the metagame due to taking up a lot of space while being vulnerable to focus fire in a way that Bounty Hunters just aren't (minimum 3 attacks, far more likely 4-5, versus a minimum of 2 with 3-4 being likely). Not that the Defender doesn't have advantages; I just don't think they'll be worth the cost.

Depends on what's attacking the Z-95. A Heavy Laser Cannon and the Phantom do have a chance at killing a Z-95.

The ease of one-shotting a ship seems less than relevant-- in practice TIEs get one-shot so infrequently under normal circumstances that this seems unlikely to be a serious concern.

A TIE requires 3 unblocked hits and no evades. That happens in tandem on about 3% of unmodified rolls. If the attacking ship modifies its roll, it's 1 in 10. (All of these at range 2.)
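Those figures check out; a quick sketch of the arithmetic, assuming the standard dice faces:

```python
# One-shotting a 3-hull TIE at range 2: 3 hits/crits on 3 attack dice,
# and 0 evades on 3 defense dice. (Dice faces: 4/8 attack faces are hits
# or crits, 6/8 with a focus spent; 5/8 defense faces are not evades.)
p_three_hits = (4 / 8) ** 3
p_three_hits_focused = (6 / 8) ** 3
p_no_evades = (5 / 8) ** 3

print(f"Unmodified:     {p_three_hits * p_no_evades:.1%}")          # ~3.1%
print(f"Attacker focus: {p_three_hits_focused * p_no_evades:.1%}")  # ~10.3%
```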

In practice the defending ship often also modifies its rolls, is engaged at range 3 first, etc.

Do one-shots happen? Certainly.

Are they a huge factor in balance? Not at this time.
