Sadgasm1 OP t1_j1yfh7h wrote on December 28, 2022 at 9:00 AM

Reply to comment by TheGoatzart in Principal Components Analysis, k-means clustering and actual outcome of the 2022 Fifa world cup matches [OC] by Sadgasm1

Oh yeah, generally you're right, but the data was also ‘doubled’ in this way. Because each row is a match, every variable is present for each team separately. For example, there are attempts from outside the box by team1 and attempts from outside the box by team2. The pattern of relations should be symmetrical because these differences are arbitrary, as you say.
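To illustrate what "arbitrary" means here, below is a rough Python sketch (not the R script from the post) that swaps the team1/team2 columns of a single match row and flips the outcome label. The column names (`attempts_outside_box_team1`, `outcome`, and so on) and the values are made up for illustration and are assumptions, not the actual names in the Kaggle dataset.

```python
def mirror(row: dict) -> dict:
    """Swap every *_team1 / *_team2 pair and flip the outcome label."""
    swapped = {}
    for key, value in row.items():
        if key.endswith("_team1"):
            swapped[key[:-1] + "2"] = value   # team1 stat becomes team2 stat
        elif key.endswith("_team2"):
            swapped[key[:-1] + "1"] = value   # team2 stat becomes team1 stat
        else:
            swapped[key] = value
    flip = {"team1": "team2", "team2": "team1", "draw": "draw"}
    swapped["outcome"] = flip[row["outcome"]]
    return swapped


# Hypothetical match row (values invented for the example).
row = {
    "attempts_outside_box_team1": 5,
    "attempts_outside_box_team2": 2,
    "outcome": "team1",
}
mirrored = mirror(row)
```

The mirrored row describes the same match with the labels exchanged, so any relation found between the team1 variables and a team1 win should show up the same way for team2.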

Sadgasm1 OP t1_j1u9eh8 wrote on December 27, 2022 at 1:08 PM

Reply to comment by mateusb12 in Principal Components Analysis, k-means clustering and actual outcome of the 2022 Fifa world cup matches [OC] by Sadgasm1

lol

Each row in the data is a match, so ‘team1’ represents a win for the “home” team, ‘team2’ represents a win for the “away” team and ‘draw’ represents a draw.
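As a small illustration of that labelling, here is a Python sketch (not the R script from the post). It assumes the label is derived from the goals scored by each side; the tuple layout and the placeholder team names are hypothetical, and knockout matches decided on penalties would need their own rule.

```python
def match_outcome(goals_team1: int, goals_team2: int) -> str:
    """Label one match row with one of the three categories used in the plot."""
    if goals_team1 > goals_team2:
        return "team1"
    if goals_team2 > goals_team1:
        return "team2"
    return "draw"


# Hypothetical rows: (team1, team2, goals by team1, goals by team2)
matches = [
    ("A", "B", 2, 0),
    ("C", "D", 1, 1),
    ("E", "F", 0, 3),
]
labels = [match_outcome(g1, g2) for _, _, g1, g2 in matches]
```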

Sadgasm1 OP t1_j1u8bjs wrote on December 27, 2022 at 12:57 PM

Reply to comment by Scottzurp in Principal Components Analysis, k-means clustering and actual outcome of the 2022 Fifa world cup matches [OC] by Sadgasm1

Hi :)

I tried to extract 2 meaningful PCs from 59 analytics of the matches (variables like ball possession, goal attempts, etc.). Then I wanted to see whether the matches cluster on these predictors according to their outcome.

They aren’t 😕 but I liked the plot
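For anyone who wants to try the same kind of check, here is a rough Python sketch of the pipeline described above (the original was done in R, so this is not the actual script). It runs on random stand-in data, not the real dataset. The choice of k = 3 clusters, the standardisation step, and the hand-written k-means are all assumptions made for the example.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# Stand-in data: 64 matches x 59 match statistics (random, NOT the real dataset).
X = rng.normal(size=(64, 59))
outcomes = rng.choice(["team1", "team2", "draw"], size=64)

# PCA: standardise each column, then project onto the top 2 right singular vectors.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt[:2].T                      # shape (64, 2): PC1 and PC2 per match
explained = (S[:2] ** 2) / np.sum(S ** 2)  # share of variance for PC1 and PC2


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    r = np.random.default_rng(seed)
    centers = points[r.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every centre, then nearest centre.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centres; keep the old one if a cluster ended up empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


clusters = kmeans(scores, k=3)  # k = 3 is an assumption (three possible outcomes)

# Cross-tabulate cluster membership against the actual outcome.
table = Counter(zip(clusters.tolist(), outcomes.tolist()))
for (cluster, outcome), n in sorted(table.items()):
    print(cluster, outcome, n)
```

If the clusters lined up with the results, each cluster would be dominated by a single outcome in the printed table. A roughly even mix in every cluster is the "they aren't" situation.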

Sadgasm1 OP t1_j1u5dug wrote on December 27, 2022 at 12:24 PM

Reply to Principal Components Analysis, k-means clustering and actual outcome of the 2022 Fifa world cup matches [OC] by Sadgasm1

I used IRON486's dataset, which he scraped from fifa.com and posted on Kaggle.

I used R and posted the script as an R notebook here.