A Gaussian mixture clustering model for characterizing football players using the EA Sports' FIFA video game system.

César Soto-Valero



The generation and availability of football data have increased considerably in recent decades, mostly because of the sport's popularity and technological advances. Gaussian mixture clustering models represent a novel approach to exploring and analyzing performance data in sports. In this paper, we use principal component analysis in conjunction with a model-based Gaussian clustering method to characterize professional football players. Our modeling approach is tested using 40 attributes from EA Sports' FIFA video game series, corresponding to 7705 European players. The clustering results reveal a clear distinction among performance indicators, which correspond to four different roles in the team. Players were labeled according to these roles, and a gradient tree boosting model was used to rank the attributes according to their importance. We found that dribbling skill is the most discriminating variable among the clustered player profiles.
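The model-based Gaussian clustering step mentioned in the abstract can be illustrated with a minimal sketch. It fits a two-component, one-dimensional Gaussian mixture with the EM algorithm on synthetic ratings. This is not the authors' implementation: the data, the function names `fit_gmm_1d` and `assign_cluster`, the quantile-based initialization, and the restriction to one dimension are all assumptions made for illustration, and the principal component analysis and gradient tree boosting steps are omitted.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, k, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances). Illustrative only: no
    convergence check, no log-space arithmetic, no restarts.
    """
    n = len(xs)
    ordered = sorted(xs)
    # Assumed initialization: means at evenly spaced quantiles,
    # every component starts with the overall sample variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    grand_var = sum((x - grand_mean) ** 2 for x in xs) / n
    variances = [grand_var] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # guard against a collapsing component
    return weights, means, variances


def assign_cluster(x, weights, means, variances):
    """Index of the component with the highest posterior probability for x."""
    dens = [w * gaussian_pdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    return max(range(len(dens)), key=lambda j: dens[j])


# Synthetic stand-in for a single skill rating (NOT the FIFA data):
# two hypothetical player groups centred at 40 and 75.
rng = random.Random(0)
ratings = [rng.gauss(40, 5) for _ in range(100)] + [rng.gauss(75, 5) for _ in range(100)]

weights, means, variances = fit_gmm_1d(ratings, k=2)
print("weights:", weights)
print("means:", means)
print("variances:", variances)
```

In a multivariate setting such as the one described in the abstract, each component would carry a mean vector and a covariance matrix, and the number of components would be chosen by a model selection criterion instead of being fixed in advance.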






Key words

football; EA Sports' FIFA video game system; machine learning; principal component analysis; Gaussian mixture model-based clustering; classification and regression trees.


RICYDE. Revista Internacional de Ciencias del Deporte
Publisher: Ramón Cantó Alcaraz
ISSN: 1885-3137 - Quarterly
Creative Commons License