With the rapidly growing demand for accurate localization in real-world environments, visual simultaneous localization and mapping (SLAM) has received significant attention in recent years. However, existing methods still suffer from degraded localization accuracy in environments that change over the long term. To address this problem, we propose a novel long-term SLAM system with map prediction and dynamics removal. First, a visual point-cloud matching algorithm is designed to efficiently fuse 2D pixel information with 3D voxel information. Second, each map point is classified as static, semistatic, or dynamic using the Bayesian persistence filter (BPF), and the dynamic map points are removed to eliminate their influence on localization. A predicted global map is then obtained by modeling the time series of the semistatic map points. Finally, we incorporate the predicted global map into a state-of-the-art SLAM method, yielding an efficient visual SLAM system for long-term, dynamic environments. Extensive experiments were carried out on a wheelchair robot in an indoor environment over several months. The results demonstrate that our method achieves better map prediction accuracy and more robust localization performance.
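As a rough illustration of the point-classification step, the sketch below implements a minimal discrete-time persistence filter of the kind the BPF generalizes: each map point carries a posterior probability that it still persists, updated from detection/non-detection observations, and is then bucketed as static, semistatic, or dynamic. The per-step survival rate, sensor error rates, and classification thresholds are illustrative assumptions, not values from the paper, which uses a continuous-time Bayesian formulation.

```python
# Minimal discrete-time sketch of persistence-based map-point classification.
# NOTE: the survival rate, sensor error rates, and thresholds below are
# illustrative assumptions; the paper's BPF is a continuous-time Bayesian
# persistence filter with its own parameters.

SURVIVAL = 0.99   # assumed per-step probability that a persisting point survives
P_MISS = 0.10     # assumed probability of missing a point that is present
P_FALSE = 0.05    # assumed probability of detecting a point that is absent


def update_persistence(belief: float, detected: bool) -> float:
    """One predict/update cycle on P(point still persists)."""
    # Predict: a persisting point may vanish between observations.
    belief *= SURVIVAL
    # Update: Bayes' rule with a binary detection measurement model.
    if detected:
        num = belief * (1.0 - P_MISS)
        den = num + (1.0 - belief) * P_FALSE
    else:
        num = belief * P_MISS
        den = num + (1.0 - belief) * (1.0 - P_FALSE)
    return num / den


def classify(belief: float) -> str:
    """Bucket a map point by its persistence posterior (thresholds assumed)."""
    if belief > 0.9:
        return "static"
    if belief > 0.3:
        return "semistatic"
    return "dynamic"


if __name__ == "__main__":
    belief = 0.5  # uninformative prior on persistence
    # A point that stops being re-observed drifts toward "dynamic",
    # at which stage the system would drop it from the map.
    for detected in [True, True, True, False, False, False, False]:
        belief = update_persistence(belief, detected)
        print(f"detected={detected!s:5}  belief={belief:.3f}  -> {classify(belief)}")
```

Under this toy model, repeated detections push the persistence posterior toward 1 (static), while a run of missed observations drives it down through the semistatic band and eventually below the dynamic threshold, mirroring the static/semistatic/dynamic split described in the abstract.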