Image Classification on FashionMNIST Dataset

November 2024

Impact

Successfully developed and compared five different machine learning models for fashion image classification, achieving up to 91% accuracy with CNN while providing a comprehensive analysis of each model's strengths and practical applications.

Problem Statement

With the growing need for automated fashion item classification in e-commerce and retail, there's a crucial need to understand which machine learning approaches work best for image classification tasks. Using the FashionMNIST dataset with 70,000 grayscale images of clothing items, we aimed to compare traditional machine learning methods against modern deep learning approaches to determine their effectiveness in real-world fashion applications.

Approach

We planned to implement and compare five distinct models:

  • Convolutional Neural Network (CNN)
  • K-Nearest Neighbors (KNN)
  • Logistic Regression
  • Support Vector Machine (SVM)
  • Decision Trees/Random Forest

Each model would be evaluated on the same dataset to understand their:

  • Classification accuracy and performance metrics
  • Computational efficiency
  • Practical implementation considerations
  • Real-world business applications

Methodology

  1. Data Processing:
    • Utilized 28x28 pixel grayscale images from FashionMNIST
    • Split data into training (60,000 images) and testing (10,000 images)
    • Applied necessary preprocessing for each model type
  2. Model Implementation:
    • Built CNN using PyTorch with multiple convolutional layers
    • Implemented KNN with optimized K value
    • Developed Logistic Regression with multinomial classification
    • Created SVM with kernel optimization
    • Constructed Decision Trees with appropriate depth
  3. Evaluation:
    • Measured accuracy, precision, recall, and F1-score
    • Created confusion matrices for each model
    • Analyzed computational requirements
    • Assessed practical applications

Result

  • CNN achieved highest accuracy at 91%
  • Logistic Regression performed well at 84%
  • KNN showed strong results at 85%
  • SVM demonstrated robust performance at 89%
  • Decision Trees achieved reasonable accuracy at 82%

The project demonstrated that while CNN provides the best accuracy, simpler models like Logistic Regression and KNN offer practical advantages in terms of implementation simplicity and computational efficiency. This analysis provides valuable insights for businesses choosing between different classification approaches based on their specific needs and resources.