Each month we bring our Datacamp-Ishango.ai scholarship community together to exchange ideas, build connections, and discover the latest developments in the data science field. At our most recent community event, Rebecca Ndukwe, a data scientist and scholarship recipient, presented on machine learning classification on a reduced high-dimensional dataset. Read on to learn more about her presentation and what we learned.
Rebecca kicked off her presentation by delving into three pivotal machine-learning classification tools, each offering distinct capabilities and applications:
- Logistic Regression
The first tool in the spotlight was logistic regression, a statistical technique well suited to binary classification. It applies when the outcome variable is categorical, such as “yes” or “no”, or spam versus legitimate email. Rebecca explained its underlying ideas and practical applications.
- Bayesian Logistic Regression
The next method she covered was Bayesian logistic regression, an extension of traditional logistic regression. Rebecca elaborated on its advantages, particularly its use of Bayesian inference: by incorporating prior knowledge about the model parameters, it can make predictions more robust. The learners saw how Bayesian logistic regression can be a valuable tool in the data scientist’s toolbox.
- Support Vector Machine (SVM)
Rebecca rounded out this section with a discussion of the Support Vector Machine (SVM). An SVM finds the hyperplane that separates the classes with the largest margin, which makes it effective on complex datasets. Rebecca emphasised its importance, particularly in light of today’s complex data problems.
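To make the first two methods concrete, here is a minimal sketch in Python with NumPy. It is not Rebecca's code: the toy data, the function names, and the `prior_precision` parameter are all assumptions made for illustration. It fits a plain logistic regression by gradient descent, then refits with a zero-mean Gaussian prior on the weights. The second fit is only a MAP (maximum a posteriori) point estimate, a simplified stand-in for full Bayesian inference, which would work with the whole posterior distribution.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_intercept(X):
    # Prepend a column of ones so the first weight acts as the intercept.
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_logistic(X, y, prior_precision=0.0, lr=0.1, n_iter=2000):
    """Gradient descent on the mean negative log-likelihood.

    prior_precision > 0 adds a zero-mean Gaussian prior on the weights
    (intercept excluded), which turns the fit into a MAP estimate.
    """
    Xb = add_intercept(X)
    n = len(y)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / n
        penalty = prior_precision * w / n   # gradient of the log-prior term
        penalty[0] = 0.0                    # leave the intercept unpenalised
        w -= lr * (grad + penalty)
    return w

def predict_proba(X, w):
    return sigmoid(add_intercept(X) @ w)

# Hypothetical toy data: two overlapping Gaussian clouds, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(1.0, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w_mle = fit_logistic(X, y)                        # plain logistic regression
w_map = fit_logistic(X, y, prior_precision=10.0)  # with a Gaussian prior
```

Comparing `w_mle` and `w_map` shows the effect of the prior: it pulls the weights towards zero, which is one way prior knowledge can stabilise predictions when data are scarce or noisy.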
In the second part of her presentation, Rebecca stressed the importance of dimensionality reduction, particularly when working with datasets that have many features. The technique she highlighted was Singular Value Decomposition (SVD), a matrix factorization method.
Rebecca explained how SVD reduces the number of features while retaining most of the important information. She emphasised that SVD can be a useful tool, simplifying complex datasets and improving classification performance.
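As a rough illustration of the idea, the sketch below uses NumPy's `np.linalg.svd` to project a dataset onto its top `k` right singular vectors. The function name `reduce_with_svd`, the synthetic data, and the choice to centre the columns first are assumptions made for this example and are not taken from the presentation.

```python
import numpy as np

def reduce_with_svd(X, k):
    """Project the rows of X onto the top-k right singular vectors."""
    # Assumption: centre each feature first, so the result matches PCA scores.
    X_centered = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    X_reduced = X_centered @ Vt[:k].T               # shape (n_samples, k)
    # Share of the total squared singular values kept by the first k components.
    retained = np.sum(s[:k] ** 2) / np.sum(s ** 2)
    return X_reduced, retained

# Hypothetical data: 50 observed features driven by 3 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

X_reduced, retained = reduce_with_svd(X, k=3)
print(X_reduced.shape, round(float(retained), 4))
```

Here the 50 original features collapse to 3 columns while `retained` stays close to 1, because the synthetic data were built from only three underlying factors. Real datasets rarely compress this cleanly, so `k` is usually chosen by inspecting how quickly the singular values fall off.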
Rebecca concluded her presentation with insights and predictions based on the tools and techniques discussed. She predicted that the Support Vector Machine (SVM) would outperform the other models, particularly when combined with the kernel trick.
Surprisingly, the results across all four datasets told a different story. The Bayesian logistic regression model emerged as the star performer, with exceptional accuracy and a consistent area under the Receiver Operating Characteristic (ROC) curve. The Support Vector Machine with a Laplace kernel followed closely behind with commendable performance.
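For readers less familiar with the two metrics mentioned above, here is a small pure-Python sketch of how they can be computed. The labels and scores are made up for illustration and are not results from the presentation. The AUC function uses the pairwise-ranking definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

print(accuracy(y_true, y_pred))  # 0.75
print(roc_auc(y_true, scores))   # 0.75
```

Accuracy depends on the chosen threshold, while the AUC summarises ranking quality across all thresholds, which is why the two are often reported together.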
In essence, Rebecca’s presentation offered a deep dive into the world of machine learning classification, equipping the audience with a heightened understanding of these tools and techniques.