To help mitigate the effect of skill, you should use data points from evenly matched opponents. That means either people who play each other a lot and win about as often as they lose, or just the top part of a tournament, such as the top 10-25%.
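For illustration, here is a rough Python sketch of both filters. The data layout (a list of `(winner, loser)` tuples and a best-first standings list), the function names, and the `min_games` and `tolerance` thresholds are all assumptions made for the example, not anything agreed on in this thread.

```python
from collections import defaultdict
from math import ceil


def evenly_matched_pairs(games, min_games=10, tolerance=0.1):
    """Return the set of player pairs whose head-to-head record is close to 50/50.

    games: iterable of (winner, loser) tuples (assumed format).
    min_games: minimum games a pair must have played together (assumed cutoff).
    tolerance: allowed distance of the win rate from 0.5 (assumed cutoff).
    """
    wins = defaultdict(int)  # (winner, loser) -> number of wins
    for winner, loser in games:
        wins[(winner, loser)] += 1

    pairs = set()
    for a, b in list(wins):
        w_ab = wins[(a, b)]
        w_ba = wins.get((b, a), 0)
        total = w_ab + w_ba
        if total >= min_games and abs(w_ab / total - 0.5) <= tolerance:
            pairs.add(frozenset((a, b)))
    return pairs


def top_fraction(standings, fraction=0.25):
    """Return the top `fraction` of a best-first standings list (at least one player)."""
    if not standings:
        return []
    keep = max(1, ceil(len(standings) * fraction))
    return standings[:keep]


def games_between(games, pairs):
    """Keep only the games played between one of the given pairs."""
    return [g for g in games if frozenset(g) in pairs]
```

Usage would look something like `games_between(games, evenly_matched_pairs(games))` for the first approach, or filtering games to players in `top_fraction(standings, 0.25)` for the second. Tightening `tolerance` or raising `min_games` makes the sample cleaner but smaller.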
Is that a good way of collecting data, or does it exclude too much?
How are we determining a player's skill level compared to others?