Abstract
There is a growing need to address the variability in detecting cognitive deficits with standard tests in cocaine dependence (CD). The aim of the current study was to identify cognitive deficits by means of Machine Learning (ML) algorithms, namely Generalized Linear Model (Glm), Random Forest (Rf) and Elastic Net (GlmNet), to allow more effective categorization of CD and non-dependent controls (NDC), and to address common methodological problems. For validation, we used two independent datasets: the first consisted of 87 participants (53 CD and 34 NDC) and the second of 40 participants (20 CD and 20 NDC). All participants were evaluated with neuropsychological tests that included 40 variables assessing cognitive domains. Using results from the cognitive evaluation, the three ML algorithms were trained on the first dataset and tested on the second to classify participants as CD or NDC. While all three algorithms achieved a receiver operating characteristic (ROC) curve performance above chance (0.50), GlmNet was superior to Rf and Glm in both the training (ROC = 0.71) and testing (ROC = 0.85) datasets. Furthermore, GlmNet was capable of identifying the eight main predictors of group assignment (CD or NDC) from all the cognitive domains assessed. Specific variables from each cognitive test proved to be robust predictors for accurate classification of new cases, such as those from the cognitive flexibility and inhibition domains. These findings provide evidence of the effectiveness of ML as an approach to highlight relevant sections of standard cognitive tests in CD, and for the identification of generalizable cognitive markers.
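To make the ROC figures above easier to interpret, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores, using the pairwise (Mann-Whitney) formulation. It is an illustration under stated assumptions, not the study's own pipeline: the function name `roc_auc`, the coding of CD as the positive class, and the scores are all hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for the positive class (assumed here to be CD), 0 for NDC.
    scores: classifier outputs; a higher score means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case from each class")
    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up scores for illustration only; not data from the study.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.7, 0.3, 0.2, 0.5]
auc = roc_auc(labels, scores)
print(auc)  # 0.8125 (13 of 16 pairs ranked correctly)
```

Under this definition, a value of 0.5 means the scores rank a randomly chosen CD participant above a randomly chosen NDC participant no more often than chance, and a value of 1.0 means every such pair is ranked correctly.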