Machine Learning Methods for Managing Parkinson’s Disease
Table of Contents
- How can AI help with Parkinson’s disease?
- Current methods of Parkinson’s disease diagnosis and treatment modes:
- Concerns and challenges:
- Application of machine learning in solving Parkinson’s disease diagnosis and treatment challenges:
- Implementation: Machine learning for Parkinson’s disease early prediction
Introduction To The Problem And Rationale
Given these numbers, PD diagnosis and treatment have much to achieve and improve. PD care suffers from a lack of early diagnosis, which could control progression and prevent irreversible damage. Patient care is further constrained by insufficient, sporadic symptom monitoring and limited access to specialist care globally.
How can AI help with Parkinson’s disease?
Current methods of Parkinson’s disease diagnosis and treatment modes:
Parkinson’s clinical testing methods:
A second test administers levodopa and checks for a positive response, which is again subjective to the individual patient's response and, of course, carries unwanted side effects.
Another level of confirmation is possible with brain scans: magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and DaTscan.
Early symptoms largely coincide with the effects of aging. In addition, PD symptoms resemble those of other neurodegenerative and non-neurodegenerative conditions, making exact identification difficult.
Parkinson’s disease treatment:
For more than five decades, the symptomatic treatment of PD has aimed to counter the declining levels of the neurotransmitter dopamine with the orally administered dopamine precursor levodopa (brand name Sinemet). MAO-B inhibitors are another category of drugs, used to slow dopamine breakdown. However, the severe adverse effects such medication can have on motor performance defeat its very purpose.
PD patients are prone to developing fluctuations in motor and/or cognitive function as side effects, e.g. levodopa-induced dyskinesia, levodopa-induced psychosis, vomiting, and so on. When medications are no longer effective enough, deep brain stimulation or surgery are options, depending on the patient's disease stage.
Identification of the stage and severity of the disease is carried out using the Movement Disorder Society-Unified Parkinson's Disease Rating Scale (MDS-UPDRS) and the Hoehn and Yahr (HY) scale.
Concerns and challenges:
Challenges in early diagnosis of Parkinson’s disease:
While the levodopa challenge correctly identifies 70-81% of cases, researchers suggest it is redundant and place more trust in physical-neurological tests, not to mention the uncertain risk of subjecting the patient to unnecessary side effects.
Among the scan tests, MRI is a next-level recommendation for cases where the primary symptoms cannot be ascertained. PET requires injecting the patient with a radioactive substance to check for abnormalities in dopamine transmission. CT scans create a 3D picture whose main aim is to rule out other conditions rather than directly diagnose PD. DaTscan, another radioactive IV scanning test, can only confirm a prognosis, not provide a first-hand diagnosis of PD; in addition, it has limitations in distinguishing PD from other dopamine-depleting disorders. Given the cost-to-effectiveness trade-off combined with the administration complexity, these tests are more suitable for later stages than for early diagnosis.
Thus, the need for simple yet effective diagnostic tools that capture early signs, as depicted in the previous figure, is evident.
Concerns and challenges in treatment of Parkinson’s disease
Application of machine learning in solving Parkinson’s disease diagnosis and treatment challenges:
In contrast, TotemX Labs proposes a solution built on small and medium-sized public datasets to establish a proof of concept.
We create easy-to-implement models. There is a vast variety of ML applications in PD; this proposal discusses a few examples to open the discussion to a larger arena of possibilities.
Basic machine learning cycle workflow:
[Figure: Generic machine learning workflow for Parkinson's disease early detection and treatment]
This figure represents the generic process flow of a typical machine learning model, excluding the advanced MLOps steps, which do not apply to the present use case. It starts with raw data acquisition from the user. To make the data understandable to an ML algorithm, we need to extract features from it, so in the next step the data is cleaned and preprocessed. In feature engineering, we remove unnecessary features and create new ones that help with training the ML model. The dataset with the selected features is split into Train and Test datasets, where the Test dataset is reserved for evaluation and prediction. The Train dataset is further split for validation. The ML model is trained on the Train dataset and validated with the validation dataset. Once the model is finalized, predictions are made on the Test dataset and provided to the user. Based on user feedback and the predictions, the model is improved with more data and/or better ML algorithms.
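To make this workflow concrete, here is a minimal end-to-end sketch in Python. The file name pd_data.csv and the status column are hypothetical placeholders for illustration, not the datasets used later in this article.

```python
# A minimal sketch of the workflow above, assuming a hypothetical
# tabular dataset "pd_data.csv" with a binary "status" label
# (1 = PD patient, 0 = healthy control).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 1. Data acquisition
data = pd.read_csv("pd_data.csv")

# 2. Data preprocessing: drop rows with missing values
data = data.dropna()

# 3. Feature engineering: separate features from the target label
X = data.drop(columns=["status"])
y = data["status"]

# 4. Split into Train and Test sets (Test reserved for final evaluation)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 5. Train a model on the Train set
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# 6. Evaluate on the unseen Test set and report to the user
predictions = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, predictions))
```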
Data acquisition
Biomarker dataset selection factors for Parkinson's disease prognosis:
- Ease and cost of data acquisition
- Type and size of data
- Continuous data update
- Amount of clinical attention required
- Requirement of a physical visit to the clinic by the patient
The evaluations are tabulated below.
Data preprocessing
Feature engineering
Feature engineering makes data interpretable for ML algorithms. Creating new features, removing unused features, checking feature correlations, one-hot encoding categorical features, detecting outliers, grouping operations, and transforming features are all part of the feature-engineering process. Feature selection impacts model performance significantly; the selection is made using different feature combinations, feature importance, and model evaluation.
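As an illustration, the sketch below applies a few of these operations with pandas. All column names (subject_id, gender, jitter, shimmer) are hypothetical placeholders, not columns of the actual datasets.

```python
# Illustrative feature-engineering steps on a hypothetical DataFrame;
# column names are placeholders for demonstration purposes only.
import pandas as pd

df = pd.read_csv("pd_data.csv")

# Remove a feature that carries no predictive information (e.g. an ID column)
df = df.drop(columns=["subject_id"])

# Check pairwise correlation to spot redundant features
print(df.corr(numeric_only=True))

# One-hot encode a categorical feature
df = pd.get_dummies(df, columns=["gender"])

# Flag outliers in a numeric feature using the 1.5 * IQR rule
q1, q3 = df["jitter"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["jitter"] < q1 - 1.5 * iqr) | (df["jitter"] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} outlier rows in 'jitter'")

# Create a new feature from existing ones (a simple ratio transform)
df["shimmer_to_jitter"] = df["shimmer"] / df["jitter"]
```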
Train Machine Learning models
The feature-engineered dataset is split into Train and Test sets; the model learns from the Train set and is validated on held-out data before final evaluation on the Test set. The split and cross-validation procedure are detailed in the implementation example below.
Machine learning evaluation and predictions
For the mathematical explanations, please reach out to us to receive access to the entire research chapter.
There is no perfect or 100% accurate ML model. Continuous improvement comes from a recurrent cycle of fetching more data, creating better features, employing more advanced ML algorithms, and training more robust models.
Implementation: Machine learning for Parkinson’s disease early prediction
With two detailed, diverse diagnosis examples on open datasets, the authors provide a practical beginner's guide to ML in PD care for healthcare professionals. A conceptual summary of existing implementations in treatment is also provided to motivate readers in that direction.
Example 1: Voice-data-based early diagnosis of Parkinson's disease
OBJECTIVE
To show how simple it is to develop a predictive binary classification ML model that distinguishes PD patients from healthy controls using only the human voice.
SOLUTION
To build a voice-based ML classification model and explain the general ML workflow, the outcome, and possible future improvements. Two datasets, different ML classifier algorithms, and the corresponding evaluation metrics are discussed.
Data acquisition – voice data
Dataset I has the following structure:
Data preprocessing
Feature Engineering
For dataset (II) there are 750+ features, of which we select only a subset. We start with only the baseline features for initial model development, then include a greater number of features and observe the change in output. Since our aim is to demonstrate the simplest way of building a PD classification model with ML, complex feature-selection techniques are avoided; we also present results with all features, although it is usually recommended to perform feature selection and include only the most important features in the final model. A significant amount of feature engineering is covered in Example 2: Tappy keystroke.
Train Machine Learning models
It is a methodological mistake to learn the parameters of a prediction function and then test the model on the same data: such a model overfits and will not predict correctly on yet unseen data. To avoid this, we first split our feature-engineered data into two datasets, a Train dataset and a Test dataset. The Test dataset is the unseen data on which we verify model performance before deploying it into a system. In our case we have split the data into 80% Train and 20% Test.
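A minimal sketch of this 80/20 split with scikit-learn's train_test_split; the stratify argument is an optional addition on our part that preserves the PD/healthy class ratio in both sets.

```python
# 80/20 Train/Test split; X and y come from the feature-engineering step.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,    # 20% held out as the unseen Test set
    stratify=y,       # keep the PD/healthy class ratio in both splits
    random_state=42,  # reproducibility
)
```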
Cross-validation (CV) is a technique applied in machine learning to find better-performing models and to estimate how a model is expected to perform in general when making predictions on data not used during training. For this, we hold out a further part of the training dataset as a validation dataset. However, this can reduce the available training samples to an amount insufficient for learning. K-fold cross-validation solves this issue by splitting the training data into k folds. The method is particularly useful for datasets with a small number of samples.
The procedure for k-fold CV is as follows:
i. Split the training data into k equal-sized folds.
ii. For each fold, train the model on the remaining k-1 folds and validate it on the held-out fold.
iii. Average the k validation scores to estimate overall model performance.
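A sketch of k-fold CV with scikit-learn's cross_val_score, using k = 5 as an illustrative choice; the function handles the fold loop internally.

```python
# 5-fold cross-validation on the Train set only.
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

clf = SVC()
scores = cross_val_score(clf, X_train, y_train, cv=5)
print("Fold accuracies:", scores)
print("Mean CV accuracy:", scores.mean())
```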
Machine Learning Classification Algorithms
I. Support Vector Machine (SVM): SVM, formally defined by a separating hyperplane, is a supervised ML algorithm that can be used for both classification and regression problems. It is believed to work very well for classifying small datasets. Given labeled training data, SVM outputs an optimal hyperplane that categorizes new examples. In the case of binary classification, this hyperplane is a line dividing the plane into two parts, with each class lying on either side.
II. Logistic Regression (LR): LR is the go-to method for binary classification problems. Logistic regression transforms its output using the sigmoid function to return a probability value, modeling the probability of the default class (e.g. the first class). For example, to predict whether a person has PD, the first class could be 'PD patient', and the logistic regression model could be written as the probability of being a PD patient given the person's age; an older person would then have a higher predicted probability of having PD than a younger one.
III. K-Nearest Neighbors (KNN): KNN is a classification algorithm in which learning is based on the assumption that similar things lie near each other. The standard steps of KNN (sketched in code after this list) are:
i. Load the data
ii. Calculate the (Euclidean) distance from the new data point to the already classified data
iii. Sort by distance
iv. Pick the top k sorted values (the value of k is defined by the user)
v. Count the frequency of each class among those neighbors and select the most frequent class
vi. Return the selected class
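The following is a from-scratch sketch of these steps with NumPy, for illustration only; in the experiments below, the scikit-learn implementation is used.

```python
# A from-scratch sketch of the KNN steps above.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # ii. Euclidean distance from the new point to all classified points
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # iii./iv. sort by distance and pick the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # v. count class frequencies among the neighbors
    votes = Counter(y_train[nearest])
    # vi. return the most frequent class
    return votes.most_common(1)[0][0]
```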
IV. Multi-layer Perceptron (MLP) Classifier: MLP uses a neural network (NN) to perform classification. An NN consists of at least three layers: an input layer, a hidden layer, and an output layer. MLPs mainly involve two passes, forward and backward. In the forward pass, the input is fed from the input layer through the hidden layers to the output layer, and the prediction is measured against the true labels. In the backward pass, backpropagation is used to move the MLP one step closer to the error minimum; this is executed with an algorithm such as stochastic gradient descent. The process is repeated until convergence (the lowest error possible) is achieved.
V. Random Forest (RF): RF is probably the most popular classification algorithm. The underlying concept of RF is the decision tree classifier; in other words, decision trees are the building blocks of RF. Unlike the above-mentioned algorithms, RF is an ensemble learning method: an ensemble of individual decision trees. Usually, a greater number of trees leads to higher accuracy. As a simple example, say we have features [x1, x2, x3, x4] with corresponding targets/labels [y1, y2, y3, y4]. From the input features, RF might generate three decision trees on the subsets [x1, x2, x3], [x1, x2, x4], and [x2, x3, x4], and the prediction is based on the majority vote of the decision trees created.
VI. Extreme Gradient Boosting (XGBoost): Similar to RF, XGBoost is an ensemble learning method. It has recently been dominating applied machine learning due to its speed, performance, and scalability. XGBoost is a gradient-boosted decision tree implementation. The algorithm is parallelizable, so it can harness the power of multi-core computers; it is parallelizable onto graphics processing units (GPUs) too, which enables it to train on very large datasets. XGBoost provides built-in algorithmic advancements such as regularization for avoiding overfitting, efficient handling of missing data, and cross-validation capability. Readers may refer to the XGBoost library documentation for more details and practical implementation.
Except for XGBoost, all the above algorithms are implemented from the scikit-learn library. In machine learning, a hyperparameter is a parameter whose value is set before the training of the model starts. Since hyperparameters govern the training process, all of the mentioned algorithms accomplish better results with hyperparameter tuning; here, we implement all the classifiers with default hyperparameter values only.
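A minimal sketch of how all six classifiers can be instantiated with default hyperparameters (scikit-learn plus the separate xgboost package); X_train, X_test, y_train, y_test are assumed to come from the earlier split.

```python
# All six classifiers with default hyperparameter values.
# Note: with pure defaults, LogisticRegression and MLPClassifier may emit
# convergence warnings on small datasets; they will still run.
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

classifiers = {
    "SVM": SVC(),
    "Logistic Regression": LogisticRegression(),
    "KNN": KNeighborsClassifier(),
    "MLP": MLPClassifier(),
    "Random Forest": RandomForestClassifier(),
    "XGBoost": XGBClassifier(),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(f"{name}: test accuracy = {clf.score(X_test, y_test):.3f}")
```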
Evaluation and Prediction
In this example we have applied all the described ML algorithms to the Test dataset to compare their performance across differences in data size, data quality, number of features, number of samples, etc.
Results for Parkinson’s Speech Dataset I with Multiple Types of Sound Recordings
- More and better data samples, such as sustained vowels like 'a', can help improve classifier performance.
- Sensitivity, specificity, and MCC are good metrics for selecting the right classifier (see the sketch after this list).
- More features certainly help the performance of some ML classifiers, like NN, Random Forest, and XGBoost, although trade-offs such as computation cost and overfitting should be considered.
- For a scalable prediction system, algorithms such as NN and XGBoost are good choices.
- Employing IoT devices such as mobile phones or digital assistants (Google Assistant, Amazon Alexa) can help collect data as well as diagnose PD using an ML/AI-powered app.
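As referenced above, a brief sketch of computing sensitivity, specificity, and MCC from a model's Test-set predictions; y_test and predictions are assumed from the earlier workflow.

```python
# Sensitivity, specificity and MCC from a binary confusion matrix.
from sklearn.metrics import confusion_matrix, matthews_corrcoef

tn, fp, fn, tp = confusion_matrix(y_test, predictions).ravel()

sensitivity = tp / (tp + fn)   # true positive rate (recall)
specificity = tn / (tn + fp)   # true negative rate
mcc = matthews_corrcoef(y_test, predictions)

print(f"Sensitivity: {sensitivity:.3f}")
print(f"Specificity: {specificity:.3f}")
print(f"MCC: {mcc:.3f}")
```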
NOTE: The current version does not include the Tappy keystroke dataset predictions and methodological discussion, which were also conducted as part of this research. The present version is kept concise for the web. For more interpretation of, or discussion on, the results, please contact us.