Under camera and scene motion, the disparity of a pixel varies over time. Such temporal disparity variation (TDV) degrades the performance of spatiotemporal stereo matching. In this paper, we devise a similarity measure that is robust to TDV, together with an optimization technique suited to it. We first design a window-based matching cost that evaluates the similarity between pixels for a given disparity and TDV value. We then present an improved spatiotemporal guided-filter-based aggregation technique that gathers matching costs with temporal weights. The disparity and TDV maps are obtained by global optimization. To handle the large number of labels (disparity levels × TDV levels), we use a dual-layer belief propagation that requires less computation and memory than single-layer belief propagation while producing comparable results. Experimental results show that the proposed method yields consistent and accurate disparity maps in the presence of TDV.
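
To make the idea of a matching cost parameterized by both a disparity and a TDV value concrete, the following is a minimal illustrative sketch, not the cost defined in this paper. It assumes a linear model in which the disparity at frame offset k from the center frame is d + k·v, ignores image-plane pixel motion, and uses a plain sum of absolute differences over a square window. The function name `tdv_matching_cost` and its parameters are hypothetical.

```python
import numpy as np

def tdv_matching_cost(left_seq, right_seq, y, x, d, v, radius=1):
    """Windowed SAD cost for disparity d and TDV value v (illustrative only).

    left_seq, right_seq: arrays of shape (T, H, W); the center frame is T // 2.
    Assumption: disparity at frame offset k is d + k * v (linear change).
    """
    left_seq = np.asarray(left_seq, dtype=np.float64)
    right_seq = np.asarray(right_seq, dtype=np.float64)
    num_frames, height, width = left_seq.shape
    center = num_frames // 2
    total = 0.0
    for t in range(num_frames):
        d_t = d + (t - center) * v   # disparity predicted for frame t
        xr = x - d_t                 # matching column in the right image
        # Reject windows that would leave the image.
        if (y - radius < 0 or y + radius >= height
                or min(x, xr) - radius < 0 or max(x, xr) + radius >= width):
            raise ValueError("window falls outside the image")
        lw = left_seq[t, y - radius:y + radius + 1, x - radius:x + radius + 1]
        rw = right_seq[t, y - radius:y + radius + 1, xr - radius:xr + radius + 1]
        total += np.abs(lw - rw).sum()
    return total / num_frames

# Synthetic check: each left frame is the right frame shifted by its disparity,
# with the disparity growing by one pixel per frame (d = 4, v = 1).
rng = np.random.default_rng(0)
right = rng.random((3, 20, 30))
left = np.stack([np.roll(right[t], 4 + (t - 1), axis=1) for t in range(3)])
cost_with_tdv = tdv_matching_cost(left, right, 10, 15, d=4, v=1)
cost_without_tdv = tdv_matching_cost(left, right, 10, 15, d=4, v=0)
```

In this synthetic setting the cost evaluated with the correct TDV value vanishes, whereas assuming a temporally constant disparity (v = 0) leaves a residual from the non-center frames, which is the kind of degradation the abstract attributes to TDV.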