Sajadah, Kreeteeraj and Terstyanszky, Gabor and Winter, Stephen and Kacsuk, Peter K. (2008) Checkpointing of parallel applications in a grid environment. In: Kacsuk, Peter K. and Lovas, Robert and Nemeth, Zsolt, (eds.) Distributed and parallel systems: in focus: desktop grid computing. Springer, Boston, MA, pp. 179-187. ISBN 9780387794471
Full text not available from this repository.
Official URL: http://dx.doi.org/10.1007/978-0-387-79448-8
Jobs in Grid workflows are exposed to different types of failure, so fault-tolerant mechanisms are needed to ensure a good level of reliability during the execution of Grid jobs. While checkpointing is the most common method of achieving fault tolerance, much work remains to be done to improve its efficiency. The paper gives an overview of a solution for checkpointing parallel applications executed on multiple sites in the Grid environment. The mechanism is an improvement of the PGRADE checkpointing solution.
|Item Type:||Book Section|
|Uncontrolled Keywords:||Checkpointing, First Order Approximation, Natural Synchronisation Points, Critical Region|
|Research Community:||University of Westminster > Electronics and Computer Science, School of|
|Deposited On:||07 May 2009 16:16|
|Last Modified:||07 May 2009 16:16|