About
How It Works
The SVM Visualizer is a web application built to make machine learning experimentation and visualization accessible and interactive. It wraps around the scikit-learn library, a widely used Python toolkit for machine learning, to allow users to visualize Support Vector Machines (SVMs) on 2D and 3D datasets.
Users can experiment with various SVM kernel methods (Linear, Polynomial, RBF, etc.), input their custom training and testing datasets, and observe how different methods influence the decision boundaries and classification results.
Intended Usage
- Dataset Preparation: Users can upload or manually input 2D or 3D datasets. Sample datasets are also provided for quick testing. Higher-dimensional data can be used, but only the first three dimensions will be plotted.
- Model Selection: Choose an SVM kernel method (e.g., Linear, Polynomial, RBF) or other classifiers such as KNN, Decision Tree, or Random Forest.
- Training: Submit the prepared dataset for training. The app schedules training tasks asynchronously and provides status updates (e.g., pending or complete).
- Visualization: Once training is complete, visualize the decision boundaries and classifications on the provided dataset. Confidence levels are represented visually in 2D and 3D graphs.
- CSV Export: Export the classification results and other metrics in CSV format for further analysis.
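Because the app wraps scikit-learn, the prepare → train → predict → export workflow above maps directly onto scikit-learn calls. The sketch below is a minimal illustration of that pipeline, not the app's actual backend code; the dataset values are invented for the example.

```python
from sklearn.svm import SVC

# Toy 2D dataset: two well-separated clusters (illustrative values only).
X_train = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0],
           [3.0, 3.0], [4.0, 4.0], [3.0, 4.0], [4.0, 3.0]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

# Train an SVM; probability=True enables per-class confidence estimates,
# which the app visualizes as confidence levels on the 2D/3D plots.
clf = SVC(kernel="rbf", probability=True)
clf.fit(X_train, y_train)

X_test = [[0.5, 0.5], [3.5, 3.5]]
predictions = clf.predict(X_test)        # class labels for each test point
confidence = clf.predict_proba(X_test)   # per-class probabilities
```

In the app itself the `fit` step runs as an asynchronous background task, and the predictions and confidences feed the plots and the CSV export.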
Supported Kernels and Classifiers
Linear Kernel
The Linear Kernel is the simplest kernel method and is used when the data is linearly separable. It is computationally efficient and effective for datasets where a linear decision boundary can separate the classes. This kernel is often used in text classification tasks and other scenarios with high-dimensional data.
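As a concrete illustration, the sketch below trains a linear-kernel SVM on synthetic data separable by a straight line; the data generation is invented for the example. In 2D the learned boundary w·x + b = 0 is exactly the straight line the visualizer would draw.

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data: class 1 lies above the line y = x.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = (X[:, 1] > X[:, 0]).astype(int)

clf = SVC(kernel="linear")
clf.fit(X, y)

# For a linear kernel the boundary is w.x + b = 0, a straight line in 2D.
w, b = clf.coef_[0], clf.intercept_[0]
accuracy = clf.score(X, y)
```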
Polynomial Kernel
The Polynomial Kernel can model more complex relationships by introducing polynomial features of the input data. It is well-suited for datasets where the relationship between features is non-linear and higher-order interactions are important. The degree of the polynomial can be adjusted to control the complexity of the model.
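A hypothetical example of where the degree matters: an XOR-like pattern, where the class depends on the product of the two features, cannot be separated by a line, but a degree-2 polynomial kernel captures the pairwise interaction. The data here is synthetic and chosen only to illustrate that contrast.

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like data: the class depends on the sign of x0 * x1,
# so no single straight line separates the classes.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# degree=2 introduces pairwise feature interactions such as x0 * x1;
# coef0=1 also keeps the lower-order terms in the expansion.
poly = SVC(kernel="poly", degree=2, coef0=1)
poly.fit(X, y)
poly_acc = poly.score(X, y)

# A linear kernel on the same data stays near chance level.
linear = SVC(kernel="linear").fit(X, y)
linear_acc = linear.score(X, y)
```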
RBF (Radial Basis Function) Kernel
The RBF Kernel, also known as the Gaussian Kernel, is widely used for non-linear data. It maps the input features into an infinite-dimensional space, allowing for complex decision boundaries. This kernel works well in scenarios where the relationship between features is not linear or polynomial.
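A classic case where the RBF kernel shines is concentric rings, which neither a line nor a low-degree polynomial separates cleanly. The sketch below uses synthetic ring data invented for the example; `gamma` controls how far each training point's influence reaches, and so how tightly the boundary can curve.

```python
import numpy as np
from sklearn.svm import SVC

# Concentric rings: class 0 is an inner disk, class 1 an outer ring.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
radius = np.concatenate([rng.uniform(0.0, 0.5, 100),   # inner class
                         rng.uniform(1.0, 1.5, 100)])  # outer class
X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
y = np.array([0] * 100 + [1] * 100)

# gamma="scale" adapts the kernel width to the data; larger gamma
# lets the boundary bend more sharply around individual points.
clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X, y)
accuracy = clf.score(X, y)
```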
Logistic Regression
The Logistic Regression option applies logistic regression as the classification method, which is useful for binary classification problems. It provides probabilities for class membership, making it suitable for tasks where confidence levels are important.
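The class-membership probabilities mentioned above come from scikit-learn's `predict_proba`. A minimal sketch on invented 1D data:

```python
from sklearn.linear_model import LogisticRegression

# Two well-separated groups on a single feature (illustrative values).
X = [[0.0], [0.5], [1.0], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]

clf = LogisticRegression()
clf.fit(X, y)

# Each row of predict_proba gives P(class 0) and P(class 1), summing to 1.
proba = clf.predict_proba([[0.2], [3.8]])
```

These per-point probabilities are what make logistic regression a natural fit for the confidence-level shading in the visualizer's plots.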
KNN (K-Nearest Neighbors)
The KNN classifier assigns each point the majority class among its nearest neighbors. It is simple, interpretable, and effective for datasets with well-separated clusters. However, it may struggle with high-dimensional data due to the curse of dimensionality.
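The majority-vote behavior can be seen directly with scikit-learn's `KNeighborsClassifier`; the two clusters below are invented for the example.

```python
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated clusters (illustrative values).
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

# Each query point takes the majority class of its 3 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
pred = knn.predict([[0.5, 0.5], [5.5, 5.5]])
```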
Decision Tree
The Decision Tree classifier uses a tree-like model of decisions to classify data points. It is intuitive, interpretable, and handles both numerical and categorical data. It works well on smaller datasets but can overfit if not properly regularized.
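One common regularization mentioned above is bounding the tree's depth. A minimal sketch on invented data, using `max_depth` as the overfitting guard:

```python
from sklearn.tree import DecisionTreeClassifier

# Two well-separated clusters (illustrative values).
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

# max_depth caps how many successive splits the tree may make,
# a simple guard against overfitting on small datasets.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)
pred = tree.predict([[0.5, 0.5], [5.5, 5.5]])
```

Because each internal node is an axis-aligned threshold test, the resulting decision boundaries appear as rectangular regions in the 2D plots.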
Random Forest
The Random Forest classifier is an ensemble method that combines multiple decision trees to improve classification accuracy and reduce overfitting. It is robust, versatile, and performs well on a wide range of datasets, especially when feature importance needs to be evaluated.
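The feature-importance evaluation mentioned above is exposed in scikit-learn as `feature_importances_`. The sketch below uses synthetic data, invented for the example, in which only the first feature carries signal:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Only the first feature is informative; the second is pure noise.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-1, 1, 200), rng.uniform(-1, 1, 200)])
y = (X[:, 0] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances, normalized to sum to 1; the informative
# feature should dominate.
importances = forest.feature_importances_
```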
Features
- Interactive data manipulation with real-time table editing.
- Support for 2D and 3D datasets with dynamic visualization.
- Multiple SVM kernel methods and classifier options.
- Ability to load sample datasets or import CSV files.
- Asynchronous background training with progress updates.