Please, fix the tactics training

@SelfmateMan:

Nope, I explicitly used the words used in the title. This is from an article he wrote 9 years later: en.chessbase.com/post/the-elo-rating-system-correcting-the-expectancy-tables

A quote from the article:

"I want to call attention to two main things here. You can see that in the middle region of the graph (for instance within the red box), the white line is steeper than the black trend. This means that rating favorites are not scoring as well as their ratings say they should. Also, look out at the edges, for instance within the blue box. You can see that due to the "400-point rule", the expected score (for the heavy rating favorites) levels off at 92%, but in reality the heavy favorites are scoring much higher than 92%."

So yes, there are parts of the curve where higher-rated players score worse than the model predicts, but there's also a point where they score higher.

The portion where the higher-rated players score worse is when the difference is 100-400 points; above that there's a flip, and higher-rated players score much higher than the model predicts.

As a reminder, we're talking about a >500 point difference in this lichess case.
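To make the shapes concrete, here's a minimal sketch of the standard Elo logistic expectancy with and without a cap on the rating gap. The cap stands in for the "400-point rule" Sonas describes; note that FIDE's published expectancy tables are based on the normal distribution, so their exact values (e.g. the 92% level-off) differ slightly from this logistic form.

```python
def elo_expected(diff, cap=None):
    """Expected score for the higher-rated player at a rating gap `diff`.
    If `cap` is given, the gap is clamped first (the "400-point rule"),
    so the prediction levels off for heavy favorites."""
    if cap is not None:
        diff = min(diff, cap)
    return 1 / (1 + 10 ** (-diff / 400))

elo_expected(500)           # ~0.947: the uncapped logistic prediction
elo_expected(500, cap=400)  # ~0.909: capped, the prediction stops rising
```

This is exactly the region Sonas highlights: once the gap passes 400 points, the capped prediction stays flat while the empirical score keeps climbing.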

Now, this is all consistent with his earlier paper, of course.

Games where the difference is 100-400 points are much more common than games with the very large rating differences, so as a criticism of Elo, that's a fair point (and you'll notice all his graphs in the original paper cut off before we get to the wonderland of >400 rating point differences, and the big focus was on 0-200).

It's just not applicable here, because we're talking about the portion of the curve where the model is mistaken in the other direction.

Further, my point wasn't about the predicted scores from Elo at all. The predictions could be too high, too low, or spot on, it doesn't matter.

My point was that empirical results from games where the players were 500 points apart showed an expected score very close to that implied by the -15 for a loss/+1 for a win ratio.

In summary, I neither am nor was confused. I appreciate your concern though :)
Ahh, I now get your point. This player's rating is more than 400 points higher than the puzzle's (Edit: and Nakamura's rating is more than 400 points higher than his opponent's), so, according to this slightly inaccurate formula, he actually scores better than he should. But the fact that in one case it has a bias in one direction, and in the other case a bias in the opposite direction, does not make it a correct formula, agreed?
The problem here is that the training ratings for players were not reset, so there was no chance at all of being matched up with equally high-rated puzzles (the puzzles all started at something like 1500). This will have the effect of raising the extremely low ratings towards the middle and lowering the extremely high ratings towards the middle.

In other words, it's an expected outcome of what was decided upon. The issue is that the only other choice was basically resetting everyone else's ratings to 1500 at the same time (or some other mass alteration of users' ratings towards the middle). And if you think you are upset at losing rating points after doing puzzles, imagine the outrage at everyone losing points after not even doing anything.
@SelfmateMan: The formula that is wrong is Elo, in this case specifically because of the 400-point rule, where the expected score stops changing beyond a 400-point gap.

That is not used here.

My point has nothing to do with how Elo calculates it.

My point is that the empirical data cited by Sonas shows that in games between players with more than 500 points difference, the actual observed score is very close to that implied by the rating change the OP saw.

As I said originally, there is some room for wiggling either way.

It's possible that at 500 points the proper ratio should be more like -12 for a loss and +1 for a win.

It's also possible that at 500 points difference it should be more like -17 for a loss and +1 for a win.

We don't have enough data to really tell for sure (and factoring in RD makes all this more complicated still; the above all more or less assumes a very small RD).
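Using the same K-factor reasoning as above (E = loss / (loss + win)), those two hypothetical ratios bracket a fairly narrow band of expected scores:

```python
# Implied expected scores for the two ratios floated above
# (hypothetical values; a simple K-factor approximation,
# ignoring rating deviation).
low = 12 / (12 + 1)   # -12/+1 implies E ~ 0.923
high = 17 / (17 + 1)  # -17/+1 implies E ~ 0.944
```

Either way, the band is close to what the observed -15/+1 ratio (E = 0.9375) implies, which is the point: the plausible range of "correct" penalties is narrow.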

So, to be perfectly clear, my point is that while a "perfect" rating system might penalize a player rated 500 points higher a bit less (or a bit more) for a loss, relative to the reward for a win, it's not going to be substantially different from what the OP saw.

In short, the rating changes the OP thought were absurd are actually quite reasonable, and match the data we have fairly well. It might not be a perfect match (can't ever expect that, of course), but it's certainly close.

That was a fun rabbit hole :)

Switching out to the subject of the puzzles themselves, I agree that with the qualitative changes in the puzzles, we might find pockets that are harder than the old ones and pockets that are easier than the old ones.

I can't say for sure, but I imagine overall they probably are a bit more difficult, just because now there's not this bias from people upvoting puzzles they liked (presumably most often ones they got right).

That also tended to result in a lot of mate puzzles, and having a lot of one type of puzzle is probably a bit easier than having several types.

If all that speculation on my part is true, then yeah, we may well expect puzzle ratings to go down a bit.

I wouldn't worry about it so much...last I checked we can't redeem our puzzle ratings for food or rent :)
Yeah, and my point was that the Elo rating system is imprecise. However, I wrongly assumed it is used here. Mea culpa.
I'm gonna tell you what the real problem is and I'm gonna use the correct technical term to make it as clear as possible: the new system 'sucks'!
haha :-) All rating systems are just made by ego, to have some good food when people become desperate in trying to pimp their rating. Just don't care too much about it :-)
#14 Agreed. Also note that ratings are Glicko-2, not Elo.
Too bad not all puzzles are really tactics puzzles. Some of them are just what it says — "find the best move for white/black". Like in the following puzzle:
en.lichess.org/training/64379
After scanning through the position and finding no forcing win, you have to deal with the threat of Rxg4+ with the very natural Bf5, and you simply have a dominating position with some threats on the 7th rank. But is Bf5 really a tactical move? I very much doubt it.
