Sajadah, Kreeteeraj and Terstyanszky, Gabor and Winter, Stephen and Kacsuk, Peter K. (2008) Checkpointing of parallel applications in a grid environment. In: Kacsuk, Peter K. and Lovas, Robert and Nemeth, Zsolt (eds.) Distributed and parallel systems: in focus: desktop grid computing. Springer, Boston, MA, pp. 179-187. ISBN 9780387794471
Full text not available from this repository.
Official URL: http://dx.doi.org/10.1007/978-0-387-79448-8
Abstract
Jobs in Grid workflows are exposed to different types of failure, so fault-tolerance mechanisms are needed to ensure reliable execution of Grid jobs. Checkpointing is the most common method of achieving fault tolerance, but considerable work remains to improve its efficiency. The paper gives an overview of a solution for checkpointing parallel applications executed on multiple sites in a Grid environment. The mechanism improves on the PGRADE checkpointing solution.
| Item Type: | Book Section |
|---|---|
| Uncontrolled Keywords: | Checkpointing, First Order Approximation, Natural Synchronisation Points, Critical Region |
| Research Community: | University of Westminster > Electronics and Computer Science, School of |
| ID Code: | 6829 |
| Deposited On: | 07 May 2009 16:16 |
| Last Modified: | 07 May 2009 16:16 |