Animal tracking and feeding monitoring are crucial for automatic measurement of individual cow welfare and are a prerequisite for autonomous livestock farming systems. The deformable body posture and irregular movement of cows in complex farming environments make tracking individual animals in a herd very challenging. To tackle this challenge, a deep learning-based approach, namely YOLOv5s-CA+DeepSORT-ViT, is proposed in this article. In the proposed approach, a coordinate attention (CA)-integrated YOLOv5 was developed to capture spatial location information and improve face detection performance in overlapping regions. Then the vision transformer (ViT) was embedded in the reidentification (reID) network of Deep Simple Online and Real-time Tracking (DeepSORT) to enhance feature matching and tracking accuracy. Comparative results on a complex multicow dataset constructed from a commercial farm show that the ID F1 score (IDF1) and multiple object tracking accuracy (MOTA) of the proposed YOLOv5s-CA+DeepSORT-ViT are 88.5% and 84.4%, respectively. Meanwhile, the number of ID switches (ID Sw.) and the processing time are reduced by 50% and 20%, respectively, compared to the YOLOv5s+DeepSORT model. Experimental results also show that the overall cow tracking performance of the proposed approach exceeds that of other baselines (e.g., SORT, ByteTrack, BoT-SORT, and DeepSORT).
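For readers unfamiliar with the two headline metrics, the following minimal sketch shows how MOTA and IDF1 are commonly computed from aggregate error counts, assuming the standard CLEAR MOT and identity-based definitions. The function names and the example counts are hypothetical and chosen only for illustration; they are not taken from the article's experiments or code.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames.

    num_gt is the total number of ground-truth objects over all frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN).

    IDTP, IDFP and IDFN are identity-level true positives, false positives
    and false negatives under the best global ID-to-track assignment.
    """
    denom = 2 * idtp + idfp + idfn
    if denom == 0:
        raise ValueError("at least one count must be non-zero")
    return 2 * idtp / denom


# Illustrative counts only (assumed values, not results from the article).
example_mota = mota(false_negatives=60, false_positives=30,
                    id_switches=10, num_gt=1000)
example_idf1 = idf1(idtp=900, idfp=100, idfn=100)
print(f"MOTA = {example_mota:.3f}, IDF1 = {example_idf1:.3f}")
```

Because ID switches enter MOTA as a single additive term, halving them moves MOTA only slightly when misses and false positives dominate, whereas IDF1 is more sensitive to how consistently each identity is kept over time. This is one reason the two metrics are usually reported together.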