Measuring Success

By juicebox, in The Lord of the Rings: The Card Game

so using the old values from before... 5 games, 3 wins, 2 losses... so

(sum of all scores) * (g * g) / (w * w * w)

(0 + 0 + 75 + 100 + 125) * (5* 5) / ( 3 * 3 * 3) = 300 * 25 / 27 = 277.7777777777778

?
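As a sanity check, the arithmetic above can be reproduced with a short script. This is a minimal sketch of the proposed formula; the function name is mine, not from the thread, and the two losses are assumed to score 0 as in the example:

```python
# A minimal sketch of the formula discussed above:
#   weighted = (sum of all scores) * g^2 / w^3
# where g is the number of games played and w the number of wins.
def weighted_score(scores, wins):
    games = len(scores)
    return sum(scores) * games * games / (wins ** 3)

# The example above: 5 games, 3 wins, with the two losses scored as 0.
result = weighted_score([0, 0, 75, 100, 125], wins=3)
print(round(result, 2))  # 277.78
```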

booored said:

Pete... go back to your physics thesis!

hehe :-)

Yes, rounded I also get 278 in that case

Interesting analysis. One thing to consider is how much influence win % should have on the final score versus the actual quest scores. There is no right answer, it's more a matter of preference, but if you increase the win % rating factor too much, then the win % essentially becomes what determines the overall winning score, and the average quest score becomes more of a tie breaker for players with the same win %. When I came up with the original formula I was looking for something that would be simple and intuitive to the average player (i.e. multiply average score by win %) while still factoring win % into the score calculation in a meaningful way. Your analysis raises an interesting point: at some stage it becomes better to concede than to use the final score, which I intuitively don't like, yet I wonder how often someone's scores will vary so significantly that this becomes an issue. Still, I think it warrants some more thought on whether there is a tweak that fixes this issue without giving win % too much weight.

Hmmmmm... very interesting indeed.

Personally, my bias is to keep things as "simple" as possible.

However, if I understand this right, the tension is between two possibilities: first, that there is a certain threshold at which a player would decide it is more advantageous to forfeit a game (and take a loss) than to see it through to its epic 20+ round conclusion; and second, that a system aimed at holding that possibility in check would weight win ratio too heavily relative to score when factoring the final weighted result. Am I following that right?

Yes, correct. I'm not sure it is a huge issue if someone is doing so badly on a specific run-through, compared to their other rounds, that they just decide to pack it in rather than finish and count it as a loss. With the current simple approach the win ratio is still fairly heavily weighted, so someone with several losses will probably not get a top score anyway. As well, you have to be doing quite a bit worse than your other scores to even make it worthwhile, so I don't think it is something that would happen a lot.

RGun said:

There is no right answer, it's more a matter of preference, but if you increase the win % rating factor too much, then the win % essentially becomes what determines the overall winning score, and the average quest score becomes more of a tie breaker for players with the same win %.

...

yet I wonder how often someone's scores will vary so significantly that this becomes an issue. Still, I think it warrants some more thought on whether there is a tweak that fixes this issue without giving win % too much weight.

That's absolutely right. If we overdo it (e.g. if we divided by w four times), the actual scores really would be only a tie-breaker. We can use the "put up or shut up Rhosgobel" thread and Zjb12's current 48-hour competition to test whether our formulas "feel right" in determining the winner, once there are enough games posted. Say a week or so. And after that, we could do the same with the Dead Marshes. I think in all the other quests, you usually don't score 225 instead of 100 (or 381 instead of 100 for my formula). But in the Marshes, this could happen. (We should ban Denethor and Shadows of the Past because players would include them specifically to manipulate the encounter deck. And we have to find players who go to the bitter end if Gollum escapes and search the encounter deck for him...)

For a quick fix: good idea, why not count losses and conceded games as a fixed 50 pts for solo, 100 pts for 2-player games and so on, in addition to the formula by you, RGun, and leptokurt? I calculated it: if A is the average score of the normal games (100 in my previous examples) and B is the score of the bad game, you can take advantage if B > 2.25 * A + 78.125 (that's for 5 games again).

Initial example:

(75 + 100 + 125 + 50 + 50)/3² * 5 = 222.22... (I assumed a solo game and set the 2 lost games to 50 each.)

An example so you can see that the B > 2.25A + 78.125 formula works: the limit at which it becomes advantageous to concede a game, if the avg score of the "normal" games is 100, would be 225 + 78.125 = 303.125:

Without conceding: (80 + 120 + 80 + 120 + 303)/5² * 5 = 140.6. With conceding: (80 + 120 + 80 + 120 + 50)/4² * 5 = 140.625. So for someone to gain an advantage by willingly losing a game and not scoring it, that game has to be "worth" a won game of 303 points. Quite ok, I guess. (If he concedes 2 games, the number is even bigger.)
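The break-even claim can be checked numerically. This sketch assumes the 50-point concede rule and the (sum of scores)/w² * g scoring described above; the function names are mine:

```python
# Five solo games: four "normal" wins of score A and one bad game of score B.
# Scoring as above: (sum of scores) / wins^2 * games.
def score_all_played(a, b):
    # Play out the bad game and win it: 5 wins out of 5 games.
    return (4 * a + b) / 5 ** 2 * 5

def score_with_concede(a):
    # Concede the bad game: it counts as a fixed 50 pts, leaving 4 wins.
    return (4 * a + 50) / 4 ** 2 * 5

A = 100
B = 2.25 * A + 78.125            # 303.125, the claimed break-even point
print(score_all_played(A, B))    # 140.625
print(score_with_concede(A))     # 140.625 -- identical, so B is the threshold
```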

EDIT: I just noticed the Living Tournament. So we have 3 competitions going on for getting data, great :-D

Hmmm,

here is how I would do it: Each player gets scored based on the number of wins he can pull off (out of a fixed number of games for everybody) and then use the sum of scores of all won games as a tiebreaker.

There are a couple of benefits there:

1. It is simple!

2. Nobody would lose on purpose to game the scoring system.

3. It puts a bigger weight on the win ratio than on the scores, which is the way to go imho.

The Cons

1. it might re-encourage stalling strategies. But I don't find that too bad, because

A) Stalling doesn't auto-win you 5/5.

B) having different strategies available is healthy for the game.

C) Each "regular" deck beats a "stall" deck every time they are tied for wins, which is also the way to go imho.

Just my 2 cents.
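The wins-first ranking proposed above can be sketched as a simple sort. The player records here are invented for illustration; lower scores are better, as in the Quest Log:

```python
# Rank by number of wins (more is better); among equal wins, use the
# sum of the scores of all won games as the tiebreaker (lower is better).
players = [
    ("Alice", 5, 1750),  # (name, wins out of 5, sum of scores of won games)
    ("Bob",   4, 200),
    ("Carol", 5, 500),
]

ranking = sorted(players, key=lambda p: (-p[1], p[2]))
print([name for name, _, _ in ranking])  # ['Carol', 'Alice', 'Bob']
```

Note that Bob ranks last despite the lowest total: under this rule, wins completely trump score, which is exactly the point debated below.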

plueschi said:

Hmmm,

here is how I would do it: Each player gets scored based on the number of wins he can pull off (out of a fixed number of games for everybody) and then use the sum of scores of all won games as a tiebreaker.

It keeps the emphasis on winning as well as on score. Several people might go 5/5; then the overall score applies. Makes easier sense. Losses then penalize you simply by meaning you weren't 100% proficient. If you went 4/5, you will be listed accordingly, with score as the second decider. This would still discourage the rabbit decks some people get concerned about, which score low but win infrequently. There could be some stalling, but wanting your score low would still encourage you not to stall too much.

Interesting, but I'm not sure I like the idea of # wins completely trumping score. If I understand what you're proposing, a deck that wins 5/5 but scores an average of 350 per game would rank higher than a deck that wins 4/5 with an average of 50 per game. Am I interpreting what you proposed correctly? While I agree win % should have more of a weighting than score, I don't know that it should completely overshadow score with score just used as a tie breaker.

Yeah, it seems like a little of this, a little of that, depending on what you're measuring. Do you want consistency or a low score? Ideally you want both. Now, in my tourney this weekend, I only won 2/5 times, so my weighting factor is going to be high, even though my scores are not bad. However, if someone else goes 4/5 with scores both above and below mine, it seems they played this quest better than I did. But my weighted average will send my basic average score through the roof, from 144 to 360! Another way would be to post your average score, 144, but also show your win percentage, .400. Seems to me that if someone goes .800 and averages 156 they should beat me (and would with a weighted average, but with more than one loss, the weighted average really kills you).

I am also fighting a head cold, so I may just be speaking gibberish!

RGun said:

Interesting, but I'm not sure I like the idea of # wins completely trumping score. If I understand what you're proposing, a deck that wins 5/5 but scores an average of 350 per game would rank higher than a deck that wins 4/5 with an average of 50 per game. Am I interpreting what you proposed correctly? While I agree win % should have more of a weighting than score, I don't know that it should completely overshadow score with score just used as a tie breaker.

Well, your scenario is entirely within my proposal. In my opinion, winning 5/5 with an average score of 350 (or whatever) is a stronger showing than stomping the encounter deck with "recklessness" and winning 4/5 with an average score of 50. But the most important part is this:

If the deck that went 5/5 was a "stall" deck and the deck that went 4/5 was a "regular" deck, then can you "improve" your regular deck so it wins 5/5 with an average score of 100? If yes, then you have beaten the "stall" deck. And I think allowing your "strong regular" deck more room for a higher score, while winning more consistently, is pretty reasonable and a worthwhile achievement.

Hope this clarifies.

I think you either need to agree that win % trumps average score, with score only used as a tie breaker when win % is the same, or you need to combine them into an aggregate score (like was done this tournament). Just posting both is interesting, but then it is open to debate who ranks higher than whom. Maybe in a forum like this that is a good thing and encourages some interesting debate, but in a tournament where you want a clear winner defined, you need a systematic way to rank the players.

EDIT: cross posted with last response - this was in response to zjb's last post.

Maybe I can add a little something more to explain why I value # of wins so highly. Let's say you win 4/5, but your loss was due to "bad luck": flipping two Attercops on turn 1, which your deck could not handle. Now, given the number of games played, every player is likely to have some bad starts, and if you manage to recover from those, no matter how high your score ends up, then I truly think you should place higher than the deck that simply cannot handle those instances and crumbles, even if it means that your other wins have higher scores, too.

Of course, one deck could have a total of 5 easy games, but there is no way around that, and it shouldn't matter too much for the enjoyment of the game.

In the ongoing attempt to find a fun, reasonable, and balanced approach to "measuring success" in this game, I wanted to summarize a few thoughts from other forum threads...

Personally, I'm considering rolling out a system of Living Tournaments on this forum that would have no set time frame and would allow players to post game-play results for comparison beyond what is currently offered in FFG's Quest Log.

Basically, my hope is to frame the way success is measured in those tournaments in a way that values both low score and win ratio for a given deck/player's performance at a given scenario.

At this point, I would like to use RGun's approach, ranking results by a player's meta score that measures for both game score and win ratio.

My personal hope and bias is that this can be achieved by measuring a set of 3 games. However, there has been some discussion around whether sets of 4 or 5 games would be better, and I want to test drive these ideas.

My plan is to offer a few juicebox Tournaments during the month of February that ask for game play reports to come packaged in sets of 3-5 games (depending on the tournament), and I'm hoping lots of people will play and offer feedback around how the play experience was and how balanced the ranking results seem.

This will be both a practical experiment and should also be a lot of fun.

If any of you have compelling reasons for measuring success by sets of either 3, 4, or 5 games, feel free to post your thoughts here.

In my mind, this seems like the primary remaining question before the Living Tournaments can... well, be born. ;)

I don't think anyone went so far as to rationalize why 3 or 5 games would be better than any other number of games in measuring success.

I think the 3 and 5 numbers were thrown out arbitrarily, based simply on the argument that the more games you play, the less luck becomes a factor in the outcome. By that logic, we should be considering other numbers (why not 10, or 20, or 100), because it's still possible to get lucky 3 or 5 games in a row.

I think the main discussion around 3 vs 5 had mostly to do with the time involved rather than with the "worth" of such results in a theoretical exercise to quantify success in this game.

In order to really determine how many games are enough to represent a valid statistical pool of results, we'd need to look closely at how the encounter decks are made, how many cards are seen in a game, and other such factors. This is all way beyond my skills.

Your objective is to have comparable results over a long period of time. I think you should evaluate what you believe the impact will be on the number of participants you'll have depending on the number of games chosen as the reference. If you think you'll get more participants with a 3-game system, this is what you should choose. I think it's better for someone to see his results compared to many other people rather than just a few.

In fact, your system should allow for an "infinite" number of games. You said that you want to see results for a given "deck/player". This is what should be evaluated. A single player should be able to submit an infinite number of results for a given deck. If I play 20 games with a deck, your system will give me an averaged score (including factoring for win %). If I play only 5, I should still get a score. You want to set a minimum number of games to reduce the luck factor in the results, but there should be no maximum. After a while, for any scenario, you should be able to see emerge a "best player & deck" combination. If you see most top scores are for similar decks, then we'll know that deck is the best for the scenario. If you see multiple players with each a different deck, we'll see all the viable strategies and "better" players should emerge.

As for the actual implementation of such a system, I'm not sure this forum's interface is the best tool for it. Ideally, you'd want a full database containing all the information. This is starting to look like "organized play", such as what we have with the DCI in Magic, for example.

I tend to agree there doesn't need to be a maximum, since the formula uses the average, which means it is valid to compare scores even if the # of games isn't the same. There should be a minimum of at least 3, though, so someone doesn't just get lucky on their first play and stop; they could get lucky 3 times in a row, but that is a lot less likely.

Were you planning to have people post at least 3 games with exactly the same deck, after which they are free to try again with a different deck and post again (assuming at least 3 more with the new deck), or were you planning to let them change cards as many times as they like and just keep sending in new scores (so just one running set of scores per player)? Restricting to at least 3 with the same deck makes the results more interesting because you can see what the better decks are; otherwise you just see who the better players are. But it will be harder to run it this way and keep track of everything.

juicebox said:

In the ongoing attempt to find a fun, reasonable, and balanced approach to "measuring success" in this game, I wanted to summarize a few thoughts from other forum threads...

Personally, I'm considering rolling out a system of Living Tournaments on this forum that would have no set time frame and would allow players to post game-play results for comparison beyond what is currently offered in FFG's Quest Log.

Basically, my hope is to frame the way success is measured in those tournaments in a way that values both low score and win ratio for a given deck/player's performance at a given scenario.

At this point, I would like to use RGun's approach, ranking results by a player's meta score that measures for both game score and win ratio.

My personal hope and bias is that this can be achieved by measuring a set of 3 games. However, there has been some discussion around whether sets of 4 or 5 games would be better, and I want to test drive these ideas.

My plan is to offer a few juicebox Tournaments during the month of February that ask for game play reports to come packaged in sets of 3-5 games (depending on the tournament), and I'm hoping lots of people will play and offer feedback around how the play experience was and how balanced the ranking results seem.

This will be both a practical experiment and should also be a lot of fun.

If any of you have compelling reasons for measuring success by sets of either 3, 4, or 5 games, feel free to post your thoughts here.

In my mind, this seems like the primary remaining question before the Living Tournaments can... well, be born. ;)

If we do this we should do it properly... maybe make a better version of the Quest Log website... so people can enter and view score lists?

Okay booored, between you and encouragement from TragicTheBlathering, I decided to create an account on CardGameDB to take it for a test drive. No promises, but I'm certainly going to take your recommendations seriously. It does seem well worth exploring that possibility. After all, I have attempted to contact FFG personnel to ask for a sticky thread here on this site, but I have yet to hear anything back.

SiCK_Boy said:

I don't think anyone went so far as to rationalize why 3 or 5 games would be better than any other number of games in measuring success.

I think the 3 and 5 numbers were thrown out arbitrarily, based simply on the argument that the more games you play, the less luck becomes a factor in the outcome. By that logic, we should be considering other numbers (why not 10, or 20, or 100), because it's still possible to get lucky 3 or 5 games in a row.

I think the main discussion around 3 vs 5 had mostly to do with the time involved rather than with the "worth" of such results in a theoretical exercise to quantify success in this game.

In order to really determine how many games are enough to represent a valid statistical pool of results, we'd need to look closely at how the encounter decks are made, how many cards are seen in a game, and other such factors. This is all way beyond my skills.

Your objective is to have comparable results over a long period of time. I think you should evaluate what you believe the impact will be on the number of participants you'll have depending on the number of games chosen as the reference. If you think you'll get more participants with a 3-game system, this is what you should choose. I think it's better for someone to see his results compared to many other people rather than just a few.

In fact, your system should allow for an "infinite" number of games. You said that you want to see results for a given "deck/player". This is what should be evaluated. A single player should be able to submit an infinite number of results for a given deck. If I play 20 games with a deck, your system will give me an averaged score (including factoring for win %). If I play only 5, I should still get a score. You want to set a minimum number of games to reduce the luck factor in the results, but there should be no maximum. After a while, for any scenario, you should be able to see emerge a "best player & deck" combination. If you see most top scores are for similar decks, then we'll know that deck is the best for the scenario. If you see multiple players with each a different deck, we'll see all the viable strategies and "better" players should emerge.

As for the actual implementation of such a system, I'm not sure this forum's interface is the best tool for it. Ideally, you'd want a full database containing all the information. This is starting to look like "organized play", such as what we have with the DCI in Magic, for example.

But if someone plays and wins his first three games, he would have to be batshit crazy to play another game, because he would be considered the winner of this tournament no matter what everyone else did. A proper database is needed to judge players fairly.

Some simple maths ...... based on statistical confidence.

If we play N games, we can be 95% confident that we are within 100/(square root N) percent of the real score. The real score is the score you would end up with if you played an infinite number of games.

For 3 games this gives 57%, for 5 games 45%, 100 games 10%.

In plain English, if we play 3 games and post a score of 100, we can be 95% sure that the "real" score is between 43 and 157. If we play 5 games, then we can be 95% sure that the "real" score is between 55 and 145.
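The rough 100/√N rule above is easy to tabulate. This is a back-of-the-envelope margin as stated in the post, not a rigorous confidence interval for this game's score distribution:

```python
import math

def margin_pct(n_games):
    # Rough 95% margin: within 100/sqrt(N) percent of the "real" score.
    return 100 / math.sqrt(n_games)

for n in (1, 3, 5, 100):
    m = margin_pct(n)
    print(f"{n:>3} games: +/-{m:.1f}% -> a 100-pt score lies in "
          f"[{100 - m:.1f}, {100 + m:.1f}]")
```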

I'd been voting for 5 games rather than 3 because I thought that would iron out some of the luck. It turns out, when you run the numbers, I was wrong: there is still a +/-45% error margin (i.e. 90% of my score) after 5 games. I should have been voting for 100 games!

I don't like these numbers and I'm hoping someone better at maths can point out my errors, but as it stands the conclusion seems to be we will never get a sample size big enough to "trust" the results and be able to post a top ten ordered list of players with any confidence.

The best method may well have been pretty close to the original one: rank results based on the lowest score, not a weighted average.

I think a more accurate measure would be, "play 3 games and post your second best score".

Memetix: these are the kinds of maths I had in mind. I didn't follow that link, but I guess you support my point.

About posting your second best score, I don't think that's advisable. We still want to see "best scores", even if they're total lucky flukes, because they then become potential goals to aim for.

If I make a "hockey" parallel (I'm Canadian, heh!), nobody expects a player to score 92 goals these days, but Wayne Gretzky did it once in the '80s. It took an unbelievable combination of player skill (he was the greatest) and environment (more open play, different goaltender style, etc.). Should we just remove this mark from the record books? No, it's still there, as the mystical objective any scorer should dream of attaining. Even if Gretzky had been a fluke and scored only 10 goals a year for the rest of his career, we would still want to know about that record (although in his case, he did maintain a high average anyway).

Memetix said:

Some simple maths ...... based on statistical confidence.

If we play N games, we can be 95% confident that we are within 100/(square root N) percent of the real score. The real score is the score you would end up with if you played an infinite number of games.

For 3 games this gives 57%, for 5 games 45%, 100 games 10%.

In plain English, if we play 3 games and post a score of 100, we can be 95% sure that the "real" score is between 43 and 157. If we play 5 games, then we can be 95% sure that the "real" score is between 55 and 145.

I'd been voting for 5 games rather than 3 because I thought that would iron out some of the luck. It turns out, when you run the numbers, I was wrong: there is still a +/-45% error margin (i.e. 90% of my score) after 5 games. I should have been voting for 100 games!

I don't like these numbers and I'm hoping someone better at maths can point out my errors, but as it stands the conclusion seems to be we will never get a sample size big enough to "trust" the results and be able to post a top ten ordered list of players with any confidence.

The best method may well have been pretty close to the original one: rank results based on the lowest score, not a weighted average.

I think a more accurate measure would be, "play 3 games and post your second best score".

So, what's the difference between playing 1 game and playing 3 games?

Using the formula, 1 game says the real score is +/- 100% of the one-off score.

i.e. 95% of the time, if you get a score of 100 your real score would be between 0 and 200.

I agree that without playing significantly more games the variance can still be fairly large, but I believe one of the goals when this thread started was to penalize decks that are designed to score very low and finish the quest quickly when a specific draw comes up from the encounter deck/player hand, but are not designed to win consistently. The general consensus from people who have posted seems to be that it is as important, if not more so, for a deck to be able to win consistently as to score low. Without taking win % into account you don't measure that, and there is no incentive for people to make sure their deck can not only score well but also win consistently. Even if the variance is still high, at least people are forced to create consistently winning decks.

I think it would be better to make the penalty for rounds 3-5 points, not ten. E.g., I'm on round ten and there are 7-10 victory points in play. Now, since I know a round costs 10 points, why bother killing those trolls? We usually go with blind chance, quest all the characters, and hope for the best. All your victory points combined rarely equal 10 points. One game you get a troll and it takes 2-3 rounds to kill him; I only get, what, 4 points? Another game I don't draw him, or he gets discarded through shadows, and I win 3 rounds earlier. 4 versus 30 points. Not balanced. 3-5 points per round seems better.