Timely and accurate automatic identification of plant diseases is essential for improving plant production. Widely used state-of-the-art CNN-based models still face limitations on leaf images with complex backgrounds because they lack a global receptive field and a self-attention mechanism. This project proposes an automatic approach to identifying plant leaf diseases based on the Vision Transformer (ViT) architecture, which uses no convolution. The proposed ViT model was trained on the open PlantVillage dataset, which contains 55,448 leaf and background images in 39 classes. The dataset was randomly split into three subsets: training (70%), validation (20%), and testing (10%). Data augmentation, dynamic learning rate reduction, and early stopping were used to avoid overfitting and reduce computation cost. The model achieved a best validation accuracy of 97.05% within 100 training epochs and a test accuracy of 97.11%. The results show that the proposed ViT model reaches state-of-the-art accuracy, with greater robustness and lower computation cost than popular CNN-based architectures.
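The random 70/20/10 split and the plateau-driven training controls described above can be illustrated with a minimal sketch. It assumes the image count is 55,448; the seed, the reduction factor, and both patience values are hypothetical choices that the abstract does not state, and `split_indices` and `PlateauMonitor` are illustrative names rather than the project's actual code.

```python
import random


def split_indices(n_items, train_pct=70, val_pct=20, seed=0):
    """Shuffle item indices and cut them into train/val/test subsets.

    The test subset receives whatever remains after the train and
    validation cuts, so the three subsets always cover every item once.
    The seed is an assumption made only for reproducibility.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = n_items * train_pct // 100
    n_val = n_items * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


class PlateauMonitor:
    """Track validation accuracy to drive LR reduction and early stopping.

    All default hyperparameters here are hypothetical placeholders.
    """

    def __init__(self, lr, factor=0.5, lr_patience=3, stop_patience=8):
        self.lr = lr
        self.factor = factor
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.best = float("-inf")
        self.stale_epochs = 0

    def update(self, val_accuracy):
        """Record one epoch; return True when training should stop."""
        if val_accuracy > self.best:
            # Strict improvement resets the plateau counter.
            self.best = val_accuracy
            self.stale_epochs = 0
            return False
        self.stale_epochs += 1
        # Shrink the learning rate every lr_patience stale epochs.
        if self.stale_epochs % self.lr_patience == 0:
            self.lr *= self.factor
        return self.stale_epochs >= self.stop_patience


train, val, test = split_indices(55448)
print(len(train), len(val), len(test))
```

In a training loop, `update` would be called once per epoch with the validation accuracy, and the loop would exit as soon as it returns True, which is how early stopping can end a run before the 100-epoch budget is spent.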