Single-image deblurring from a motion-blurred image is a challenging computer vision problem because frame-based images lose information during the blurring process. Several approaches have explored event cameras, bio-inspired sensors with high temporal resolution, to compensate for the loss of motion information. However, due to the shortage of datasets in which image and event data are aligned per pixel, recent studies still depend on event datasets generated by simulators. In real scenarios, it is technically difficult to obtain per-pixel aligned event-RGB data since event and frame cameras have different optical axes. For practical applications of event cameras, we propose a stereo setup composed of a single frame-based camera and a single non-coaxial event camera. In this setup, the disparities of events are essential for aligning events per pixel with the image data. To this end, we propose an end-to-end learning framework for image deblurring in this practical stereo setup via knowledge distillation. The target deblurring network (student), which operates on events exhibiting disparity due to the non-coaxial setup, learns pixel-level correspondences with images through knowledge distillation from a source network (teacher) trained on per-pixel aligned events, improving the restoration of sharp images. Once training is done, the source network can be removed, enabling efficient deblurring without extra inference cost. Extensive experiments on large-scale datasets demonstrate that the proposed method significantly outperforms prior works in both accuracy and speed, and can be applied to practical uses of event cameras.
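The teacher-student training objective described above can be sketched in a minimal, illustrative form. This is not the paper's actual loss or architecture: the feature shapes, the `alpha` weighting, and the use of a plain MSE feature-matching term are all assumptions made for illustration only.

```python
import numpy as np

def distillation_loss(student_feat, teacher_feat, student_out, sharp_gt, alpha=0.5):
    """Hypothetical combined loss for the student deblurring network.

    recon   : supervises the student's restored image against the sharp ground truth.
    distill : pulls the student's features (from non-coaxial events) toward the
              teacher's features (from per-pixel aligned events).
    alpha   : assumed weighting between the two terms (illustrative value).
    """
    recon = np.mean((student_out - sharp_gt) ** 2)         # image restoration term
    distill = np.mean((student_feat - teacher_feat) ** 2)  # feature-matching (KD) term
    return recon + alpha * distill

# Toy tensors standing in for network activations and images (hypothetical shapes).
rng = np.random.default_rng(0)
s_feat = rng.standard_normal((8, 16, 16))   # student features
t_feat = rng.standard_normal((8, 16, 16))   # teacher features
out = rng.standard_normal((3, 64, 64))      # student's deblurred output
gt = rng.standard_normal((3, 64, 64))       # sharp ground-truth image
loss = distillation_loss(s_feat, t_feat, out, gt)
```

At test time only `student_out` is needed, so the teacher branch (and the `distill` term) drops away entirely, which is why removing the source network adds no inference cost.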