Sajadah, Kreeteeraj, Terstyanszky, Gabor, Winter, Stephen and Kacsuk, Peter K. (2008) Checkpointing of parallel applications in a grid environment. In: Distributed and parallel systems: in focus: desktop grid computing. Springer, Boston, MA, pp. 179-187. ISBN 9780387794471.
Full text not available from this repository.
Jobs in Grid workflows are exposed to different types of failure, so fault-tolerant mechanisms are needed to ensure a good level of reliability during the execution of Grid jobs. Checkpointing is the most common method of achieving fault tolerance, but considerable work remains to improve its efficiency. The paper gives an overview of a solution for checkpointing parallel applications executed on multiple sites in a Grid environment. The mechanism improves on the PGRADE checkpointing solution.
|Item Type:||Book Section|
|Uncontrolled Keywords:||Checkpointing, First Order Approximation, Natural Synchronisation Points, Critical Region|
|Subjects:||University of Westminster > Science and Technology > Electronics and Computer Science, School of (No longer in use)|
|Depositing User:||Miss Nina Watts|
|Date Deposited:||07 May 2009 15:16|
|Last Modified:||07 May 2009 15:16|