The secret of data crunching

By Yearfire, in X-Wing

Now that we know listfortress.com will replace the old List Juggler, is anyone willing to share some of their secrets on how to use the available data in any sort of systematic way?

Anything from useful software to relevant introductory texts in quantitative methods is appreciated.

Step 1: get a significant sample size.

You'll have a few years waiting for step 1 to bone up on any further steps.

6 minutes ago, SOTL said:

Step 1: get a significant sample size.

You'll have a few years waiting for step 1 to bone up on any further steps.

So are you saying that any data we collect is worthless until then? If that's the case, I don't believe you're right, but knowing the scope and limitations of the data we have is a good thing, and something I would like a discussion of.

For systematic/quantitative analysis, yes, pretty much.

For other analysis, maaaaybe it gets big enough to be useful but it could easily be more misleading than helpful.

I'd argue that based on the data we collect, we can say, with some confidence, what is not bad, but we really can't say what is best against the field or in specific matchups.

I'm generally not willing to believe any conclusion more extreme than "this shows up a lot, it's generally good enough to do well at a major event."

3 minutes ago, Biophysical said:

I'd argue that based on the data we collect, we can say, with some confidence, what is not bad, but we really can't say what is best against the field or in specific matchups.

I'm generally not willing to believe any conclusion more extreme than "this shows up a lot, it's generally good enough to do well at a major event."

With the addendum that it might not have been good enough a month ago, may not be good enough in a month's time, and may not even be good enough right now depending on what people in your local area are playing.

Quantitative analysis is not my field, but I find it hard to believe that the data is useless if you know how to use it.

Was Metawing in 1st edition useless? The sample sizes weren't that big because of the changes from new waves. But I still used it to great effect in list building.

I have no illusion that we can use the data to find out once and for all what is good, but I would expect it to have some value in getting an idea of what kind of archetypes and pilots you might expect to meet in a tournament.

The problem is that player skill, dice, and matchups have a ton of influence, certainly more than lists do when taken in aggregate, and arguably more than lists do for some of those elements individually. The sample size isn't really enough to say that the rest of those factors even out over time. That's why I say lists can reasonably be considered "good enough" based on data, but probably not more.

20 minutes ago, Yearfire said:

Quantitative analysis is not my field, but I find it hard to believe that the data is useless if you know how to use it.

Was Metawing in 1st edition useless? The sample sizes weren't that big because of the changes from new waves. But I still used it to great effect in list building.

I have no illusion that we can use the data to find out once and for all what is good, but I would expect it to have some value in getting an idea of what kind of archetypes and pilots you might expect to meet in a tournament.

As SOTL and Bio have mentioned, not specifically "useless", but at best you're getting loose trends, from which you must extrapolate using your own inference.

Which makes conclusions pretty "fuzzy".

Because metas will shift based on the introduction of new ships (and now, different formats and points adjustments as well), the sample size for any particular "window" or "season" will be small, and smaller yet if you plan to try and capitalize on such data mid-season, thus necessitating even more extrapolation.

So... back to your question:

56 minutes ago, Yearfire said:

So are you saying that any data we collect is worthless until then? If that's the case, I don't believe you're right, but knowing the scope and limitations of the data we have is a good thing, and something I would like a discussion of.

Maybe™

I'm kind of hoping the constant point, upgrade, and format changes keep the data more or less useless.

I was a leading proponent of using Metawing.

I stopped bothering a long time ago and won't be bothering going forward. You gain very little extra other than a bunch of pretty graphs, and it only occasionally tells you something you can use in a forward-looking way.

Listjuggler was a pox on the game and I hope listfortress dies a death.

I find that the following lazy approach works really well:

  • Take a quick glance at the Top 16 results at recent Tournaments
  • Take a mental note of the Winning Lists
  • Fast-forward through a couple of podcasts and take note of lists they ramble on about, describe as good, or highlight as cool
  • Identify the lists that look "difficult" to the average player

It's usually possible to form a fairly accurate list of lists/archetypes to expect during Swiss at the next tournament this way. Good lists perceived as "difficult" will always appear in lower numbers during Swiss, but still show up in small numbers in the cut. Lists that seem cool or easy are more commonly used, with easy being the more important factor. Over time you can look out for "Dash Poe Data": lists or ships that become popular but are actually terrible.

I should qualify that this level of analysis doesn't help identify what's actually good (Top 16 results are often a lie). It just makes it clear what to expect at the next tournament, ignoring new waves, erratas, etc.

1 hour ago, Biophysical said:

I'd argue that based on the data we collect, we can say, with some confidence, what is not bad, but we really can't say what is best against the field or in specific matchups.

I'm generally not willing to believe any conclusion more extreme than "this shows up a lot, it's generally good enough to do well at a major event."

This. Plus, players' knowledge of the data alters the data. When folks see that list X shows up a lot, some players are liable to bring said list to an event.

My hope is that 2.0 is balanced enough that we see much more variety.

I agree with what was being said: data crunching is not predictive, and the insights are largely simple confirmations of hunches/instincts some better players had.

But if you want to do it anyway, as I will: first you need to ask a question. Some questions are easier to answer than others. Next you can think about how best to answer that question. Some methods are simple, others not at all. And finally you have to think about representation.

A simple example for all points is faction distribution:
1) What fraction of the swiss/cut does each faction represent?
2) The method is simple: Count lists of a faction, divide by total lists, and you have your percentage.
3) Representing it as raw numbers is already OK, but a pie chart is probably better (a minimal sketch follows below).
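
To make that concrete, here is a minimal sketch of the count-and-divide step in R (the tool mentioned later in the thread). The `lists` data frame and its `faction` column are made-up stand-ins for a listfortress export:

```r
# Toy stand-in for an exported results table: one row per squad list.
lists <- data.frame(
  faction = c("Rebel", "Rebel", "Imperial", "Scum", "Imperial", "Rebel")
)

# Count lists per faction and divide by the total to get percentages.
counts  <- table(lists$faction)
percent <- 100 * counts / sum(counts)
print(round(percent, 1))

# Representation: a pie chart, as suggested above.
pie(counts, main = "Faction distribution")
```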

This question can be made slightly more complex.
1) How does the faction distribution change during a cut?
2) Still simple: filter for lists in Top128, Top64, Top32, Top16, Top8, Top4, Final.
3) Different ways work. I was lazy and just did four pie charts, but there are better options (e.g. stacked columns; see the sketch below).
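
A hedged sketch of that filtering step, again in R with invented columns. Note that a real cut filter is cumulative (everyone who reached at least that stage); for brevity this toy version buckets each list by the deepest stage it reached:

```r
# Invented columns: 'faction' plus 'cut', the deepest stage a list reached.
lists <- data.frame(
  faction = rep(c("Rebel", "Imperial", "Scum"), times = c(5, 4, 3)),
  cut     = c("Top32", "Top16", "Top8", "Top32", "Top4",
              "Top32", "Top32", "Top16", "Top8",
              "Top16", "Top32", "Top32")
)
lists$cut <- factor(lists$cut, levels = c("Top32", "Top16", "Top8", "Top4"))

# Cross-tabulate faction by stage, then convert to per-stage shares.
tab   <- table(lists$faction, lists$cut)
share <- prop.table(tab, margin = 2)

# Stacked columns: one bar per stage, one segment per faction.
barplot(share, legend.text = TRUE, main = "Faction share by cut stage")
```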

I was often interested in some key factors, all of which can be split by faction (a sketch for win percentages follows the list):

  • list size, pilot skill (now initiative), total HP in a list, total attack dice, total defense dice,
  • list point cost, bid,
  • arcs,
  • most frequently played upgrades,
  • most frequently played ships/pilots,
  • win percentages.
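
Most of those factors reduce to a group-and-summarize. As an illustration (all column names invented), win percentage split by faction from a per-game table:

```r
# Hypothetical per-game table: both factions plus the result from
# player 1's perspective (1 = win, 0 = loss).
games <- data.frame(
  faction1 = c("Rebel", "Scum", "Imperial", "Rebel", "Scum"),
  faction2 = c("Imperial", "Rebel", "Scum", "Scum", "Imperial"),
  p1_win   = c(1, 0, 1, 1, 0)
)

# Stack both sides so every (faction, result) pair appears once...
long <- rbind(
  data.frame(faction = games$faction1, win = games$p1_win),
  data.frame(faction = games$faction2, win = 1 - games$p1_win)
)

# ...then aggregate to a win percentage per faction.
aggregate(win ~ faction, data = long,
          FUN = function(x) round(100 * mean(x), 1))
```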

The more complex questions (one of which is sketched after the list) were:

  • which ships/pilots were combined, and does that correlate with win percentages?
  • is there a correlation between extended, non-normal arcs and winrates?
  • is there a correlation between the number of ships/pilots played and how often they advance to the cut? Do some ships/pilots advance disproportionately rarely/often?
  • are certain upgrades better on specific pilots? (This was especially interesting for Harpoons)
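
For the cut-advancement question, one possible approach (definitely not the only one): compare each pilot's cut rate against the overall rate with a binomial test. All counts below are invented:

```r
# Invented counts: how often each pilot was played and made the cut.
pilots <- data.frame(
  pilot  = c("Poe", "Fenn", "Whisper"),
  played = c(100, 44, 60),
  in_cut = c(32, 20, 18)
)
overall_rate <- sum(pilots$in_cut) / sum(pilots$played)

# Two-sided binomial test per pilot: does its cut rate deviate from
# the field's? Small p-values flag disproportionate pilots.
pilots$p_value <- mapply(
  function(k, n) binom.test(k, n, p = overall_rate)$p.value,
  pilots$in_cut, pilots$played
)
print(pilots)
```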

But on to the pitfalls (a splitting sketch follows the list):

1. Sample size! The smaller your sample size (number of lists, generally), the less interesting your findings. My final regional analysis (which I never posted because 2.0 was announced) consisted of 10,388 games with 1,825 individual lists, using 5,073 ship builds. The Vassal league is also a great source.
2. Sample differences! This was not much of a problem in 1.0 because listjuggler already allowed you to sort out unofficial tournaments. But in 2.0 we have different formats, and we absolutely cannot compare extended, second edition, and hyperspace to each other. Of course you can try going from second edition to extended, but chances are that extended has answers that pose much larger problems.
3. Incomplete samples! Some tournaments have to be excluded (e.g. if they are incomplete, or cuts are incomplete)
4. Component changes! The regional season had a very clear break on January 22nd, when NymMiranda was nerfed. This point will be far more important in 2.0, as changes are supposedly more frequent.
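
Pitfalls 2 and 4 both come down to splitting the sample before aggregating anything. A sketch with invented `format` and `date` columns; the January 22nd split mirrors the nerf mentioned above, though the year here is made up:

```r
# Invented metadata: format and tournament date per recorded list.
lists <- data.frame(
  format = c("hyperspace", "extended", "hyperspace", "extended"),
  date   = as.Date(c("2019-01-10", "2019-01-15", "2019-02-01", "2019-02-05"))
)

# Pitfall 2: never pool formats; analyze each subset on its own.
hyperspace_only <- subset(lists, format == "hyperspace")

# Pitfall 4: split around a points/errata change before comparing.
nerf_date <- as.Date("2019-01-22")
before <- subset(lists, date <  nerf_date)
after  <- subset(lists, date >= nerf_date)
```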

That's all on the technical side. But what can the data actually tell you? As others have said already, it's purely descriptive. That was always enough for me. But most people want to be able to predict the meta and efficiently build for the future.

We can go into specifics if you want, but I'll stop the wall of text here.

What tool do you use for the counting, @GreenDragoon?

5 hours ago, Yearfire said:

Now that we know listfortress.com will replace the old List Juggler, is anyone willing to share some of their secrets on how to use the available data in any sort of systematic way?

Anything from useful software to relevant introductory texts in quantitative methods is appreciated.

"Hey, this list looks fun/broken/interesting/goofy, I think I'll fly it!"

For competitive, I take the top trends, the top 10 pilots and upgrades, as an idea of what I should seriously consider when building a squad, and what I'll be flying against.

That's as deep as I go.

9 minutes ago, Yearfire said:

What tool do you use for the counting, @GreenDragoon?

Most people use Excel, and that's usually enough. I do my stuff in RStudio, using the statistics software R. It's very convenient for juggling large data sets; going through those 10,000 games in Excel is probably not as convenient.
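
For anyone wanting to copy that setup, a minimal starting point. The file name and its columns are placeholders; exports vary, so adjust to whatever you actually download:

```r
# Read an exported results file (placeholder name) into a data frame.
results <- read.csv("tournament_results.csv", stringsAsFactors = FALSE)

# Quick sanity checks before any analysis.
str(results)      # what columns are there, and what types?
summary(results)  # ranges, obvious gaps, suspicious values

# From here, subset() filters, table() counts, aggregate() summarizes;
# ten thousand games stay perfectly responsive in R.
```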

To give some examples, here are six charts from that analysis (images not reproduced here).

Those are some examples of what is possible and, for me, fun to do. None of it is possible without listjuggler.

5 hours ago, SOTL said:

Listjuggler was a pox on the game and I hope listfortress dies a death.

It most likely won't, though. There is demand for this kind of data in the community, so sooner or later somebody awesome like @sozin or @Antigrapist will step up and provide it :)

2 minutes ago, LordBlades said:

It most likely won't, though. There is demand for this kind of data in the community, so sooner or later somebody awesome like @sozin or @Antigrapist will step up and provide it :)

Listjuggler and Meta Wing put the info out there for everybody, not just the few willing to dig through the data.
If there's a powerful pilot/upgrade/combo, I'd rather know about it than have it be a surprise at some event like Regionals.

Metawing was tragically misinterpreted and misused in 1st edition.

The main culprit was indeed the tiny sample size for most of the squadrons.

Especially matchups between less popular lists (meaning anything outside of the most popular 6-8) had laughably small samples. Literally, there were often like 3 matches recorded between some of those lists. Three.

And that was used to calculate rankings for the lists, and far too many people took those rankings seriously. Some of the comments on the forums were unbearable.

Therefore anything like Metawing, which provides fast, easy, and wrong answers, is dangerous and should never be replicated.

We would even be better off entirely without list juggler/fortress, with more home-brewing and creativity and less copying.
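
(To put a number on those three-game samples: a quick check in R, using a hypothetical 2-1 matchup record, shows how little they constrain the true win rate.)

```r
# A 2-1 matchup record: which true win rates are consistent with it?
binom.test(2, 3)$conf.int
# The 95% interval runs from roughly 0.09 to 0.99,
# i.e. three games tell you almost nothing about the matchup.
```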

1 minute ago, baranidlo said:

Metawing was tragically misinterpreted and misused in 1st edition.

I agree

1 minute ago, baranidlo said:

And that was used to calculate rankings for the lists

I disagree; I don't think specific matchups were used in any way for the ranking?

2 minutes ago, baranidlo said:

We would even be better off entirely without list juggler/fortress, with more home-brewing and creativity and less copying.

I strongly disagree. In my opinion, it only levels the playing field with people who have an active team where matchups are trained and new ideas are tested.

9 minutes ago, GreenDragoon said:

I disagree; I don't think specific matchups were used in any way for the ranking?

Looking at the algorithm... yeah, you're right.

Still, the algorithm seems very arbitrary and dubious, and biased toward showing the most popular stuff at the top (which is even explicitly mentioned in the author's comments).

And people refused to acknowledge that most popular does not equal the "best".

They only saw the simple number and took it for the Truth.

The lesson here is: NEVER show a simple, easy number to the masses. Even if you add ten asterisks below it, most people will never read them.

3 minutes ago, baranidlo said:

And people refused to acknowledge that most popular does not equal the "best".

Which was, by the way, part of the motivation behind the two middle graphs further up. Mirrors were filtered out there, if I recall correctly.

There is a strong correlation between popularity and success, and that is of course to be expected. If 100 players attend a tournament and the cut is to 32 people, then you expect more Poes in the cut if more players brought Poe. If 50 players bring Poe and he's perfectly average, then you'd expect 16 in the cut. Interestingly, he underperformed. That doesn't mean he had no potential, because there is of course a confound: good ships become popular, and popularity drags the success rate back down! If all 100 players bring Poe, then 68 of them won't make the cut, and half of all Poe lists lost.

This also makes the position of Fenn in those two graphs so horrifying: the Sheathipede was the most frequent ship (44% of Rebel lists had one), and Fenn still performed incredibly well, despite being dragged down by all those who played him and lost.
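
The Poe arithmetic above is just proportional representation, and it's worth seeing how wide "perfectly average" is at that sample size. A quick check in R, using the numbers from the post (and treating cut slots as independent draws, which is a rough simplification):

```r
players  <- 100  # attendance
cut_size <- 32   # players who make the cut
poe      <- 50   # players who brought Poe

# A perfectly average Poe matches his field share in the cut.
expected <- cut_size * poe / players  # = 16

# 95% range for the number of Poes in the cut under pure chance.
p <- cut_size / players
qbinom(c(0.025, 0.975), size = poe, prob = p)  # about 10 to 22
```

So anywhere from roughly 10 to 22 Poes in the cut would still be consistent with "average", which is exactly why single-event counts are so noisy.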

…there's a system phase?