Checkpointing of parallel applications in a grid environment

Sajadah, Kreeteeraj and Terstyanszky, Gabor and Winter, Stephen and Kacsuk, Peter K. (2008) Checkpointing of parallel applications in a grid environment. In: Distributed and parallel systems: in focus: desktop grid computing. Springer, Boston, MA, pp. 179-187. ISBN 9780387794471

Full text not available from this repository.
Official URL: http://dx.doi.org/10.1007/978-0-387-79448-8

Abstract

Jobs in Grid workflows are exposed to different types of failure. It is important to develop fault-tolerant mechanisms to ensure a good level of reliability during the execution of Grid jobs. While checkpointing is the most common method of achieving fault tolerance, much work remains to be done to improve the efficiency of the mechanism. The paper gives an overview of a solution for checkpointing parallel applications executed on multiple sites in the Grid environment. The checkpointing mechanism is an improvement on the PGRADE checkpointing solution.

Item Type: Book Section
Uncontrolled Keywords: Checkpointing, First Order Approximation, Natural Synchronisation Points, Critical Region
Subjects: University of Westminster > Science and Technology > Electronics and Computer Science, School of (No longer in use)
Depositing User: Miss Nina Watts
Date Deposited: 07 May 2009 15:16
Last Modified: 07 May 2009 15:16
URI: http://westminsterresearch.wmin.ac.uk/id/eprint/6829
