Activity in 2011

Our work in 2011 addressed all three directions defined in the DataCloud@Work proposal:

  • Direction 1. Using BlobSeer for sharing application data in an IaaS
  • Direction 2. Using BlobSeer as a cost-effective storage service built on top of multiple IaaS'es
  • Direction 3. Using BlobSeer for VM management to build a highly-available IaaS

For the first direction, we aimed to offer advanced data-sharing facilities both to applications running within distinct VMs deployed in Infrastructure-as-a-Service environments and to clients accessing BlobSeer as a Cloud storage service. Tasks 1-3 described below studied three sub-topics within this direction: security aspects, enhancing BlobSeer with self-* properties, and exploiting BlobSeer's features for scientific applications. Task 4 was developed in the context of our second research direction, which deals with efficient data access in cloud federations. Finally, the third research direction evolved through Task 5.

Task 1: Adaptive security management in BlobSeer

Goals. The increasing popularity of Cloud computing results in a need for efficient and secure data management. One of the most relevant security topics in cloud data management is preventing users from damaging the stored data or from breaking security policies and data-access protocols. We aim to further improve the security of a large-scale data management system such as BlobSeer. The first goal is to introduce adequate authentication and authorization mechanisms for BlobSeer users and to preserve their privacy through anonymization. Another goal is to extend the security framework that protects the system against malicious usage with adaptive security policies that take into account the past actions of each user. Furthermore, we aim to provide a secure environment for deploying web services over BlobSeer. Each user must be able to deploy their own web services that rely on BlobSeer as a data management backend. In addition, a user must be able to securely invoke these services, request access to the services of other users, or grant access to the services they have deployed.

Results. We proposed a novel security layer for the BlobSeer data management system, as well as a number of security enhancements to ensure practicable and secure client access management. Our solution offers certificate management, encryption capabilities, credential management and access control lists. Using these mechanisms, we enable authentication, authorization and secure data transfer. The proposed solution was integrated into BlobSeer, which offers high performance for data transfer and efficient data management. Our tests showed that it handles the security tasks efficiently, without adding any significant overhead to the data management system, thus preserving the overall performance of the system. We also focused on Cloud infrastructures and developed mechanisms that allow secure access to data-intensive web services in a Cloud environment, using BlobSeer as a data management backend. The resulting system provides an adequate level of security for web service-based applications, including rights management and secure communication.
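To make the access-control-list idea more concrete, here is a minimal Python sketch of a per-blob ACL check. It is an illustration only and not the BlobSeer security layer: the class name `BlobAcl`, the right names `"read"` and `"write"`, and the rule that only the owner may grant rights are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class BlobAcl:
    """Hypothetical per-blob access control list (not the BlobSeer API)."""

    owner: str
    # Maps a user id to the set of rights granted to that user.
    grants: dict = field(default_factory=dict)

    def grant(self, granting_user: str, user: str, right: str) -> None:
        # Assumption for this sketch: only the owner may grant rights.
        if granting_user != self.owner:
            raise PermissionError(f"{granting_user} cannot grant rights on this blob")
        self.grants.setdefault(user, set()).add(right)

    def is_allowed(self, user: str, right: str) -> bool:
        # The owner holds every right; other users need an explicit grant.
        return user == self.owner or right in self.grants.get(user, set())


acl = BlobAcl(owner="alice")
acl.grant("alice", "bob", "read")
print(acl.is_allowed("bob", "read"))
print(acl.is_allowed("bob", "write"))
```

In a real deployment the user identity passed to such a check would come from the authentication step (for example, a validated certificate) rather than from a plain string.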

Several Romanian students were involved in this task for their BS theses, at PUB:

Improving the Security of BlobSeer

  • Student: Cristian Marinescu (PUB)
  • PUB Advisor: Catalin Leordeanu (PUB)

BlobSeer Client Access Control

  • Student: Raluca Baban (PUB)
  • PUB Advisor: Catalin Leordeanu (PUB)

Secure Access to Cloud Services using BlobSeer

  • Student: Oana Goanta (PUB)
  • PUB Advisor: Catalin Leordeanu (PUB)

Task 2: Autonomic Behavior in BlobSeer based on Introspection

Goals. The autonomic management of a distributed storage system aims to steer it adaptively towards optimized performance and resource consumption, without the need for human intervention. One approach to enhancing BlobSeer with self-* properties is to enable a dynamic allocation scheme for the data providers, which takes into account information provided by an introspection layer (e.g., number of accesses per provider, location awareness, transfer and storage cost ratio). We further aim to introduce mechanisms for deleting data from BlobSeer, in order to support both adaptive replication (by allowing replication factors to decrease) and self-protection against malicious clients that could overload the system by writing large amounts of data to exhaust the available disk space.

Results. We continued our work on enabling BlobSeer with self-adaptive features by dynamically maintaining the replication factors of the data. We enhanced the data replication module with the ability to automatically decrease the data replication factor by means of real-time monitoring with MonALISA. [...] The decision on the needed value for the replication factor is further enforced by consistently updating the metadata to reflect the changes. We also addressed the issue of garbage collection in the BlobSeer data management system. Malicious users can add false data or use the available storage space inefficiently. Given this, and the fact that data is always created and never overwritten in the system, being able to delete unwanted data becomes crucial. As a result, we proposed a BlobSeer data deletion algorithm which can be used to eliminate false data that would otherwise pollute the system.
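As a rough illustration of how a replication factor might be contracted from monitoring data, the Python sketch below lowers the factor by one step when a blob is rarely accessed and then splits the replica list into providers to keep and providers to release. The function names, the access-rate threshold and the "drop the last replicas" rule are hypothetical choices for this example; they do not describe the actual BlobSeer replication module or the MonALISA interface.

```python
def target_replication(current: int, accesses_per_hour: float,
                       min_replicas: int = 1, low_watermark: float = 10.0) -> int:
    """Return the replication factor after one contraction step.

    Assumption: a blob accessed fewer than `low_watermark` times per hour
    can lose one replica, as long as `min_replicas` copies remain.
    """
    if accesses_per_hour < low_watermark and current > min_replicas:
        return current - 1
    return current


def contract(replicas: list, target: int) -> tuple:
    """Split the provider list into (kept, released) replicas."""
    target = max(0, min(target, len(replicas)))
    return replicas[:target], replicas[target:]


providers = ["provider-1", "provider-2", "provider-3"]
new_factor = target_replication(len(providers), accesses_per_hour=2.0)
kept, released = contract(providers, new_factor)
print(kept, released)
```

After such a decision, the metadata would have to be updated so that readers no longer look for the released replicas; that step is deliberately left out of the sketch.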

Two Bachelor theses at PUB focused on this task:

Dynamic Replicas Contraction in BlobSeer

  • Student: Alexandra Firica (PUB)
  • Inria Advisor: Alexandru Costan (KerData)

BlobSeer False Data Deletion

  • Student: Mihaela Badiu (PUB)
  • PUB Advisor: Catalin Leordeanu (PUB)

Task 3: BlobSeer-based data management for scientific applications

Goals. This task aims to enable BlobSeer as a storage service for large datasets generated and processed by scientific applications. In this context, the BlobSeer storage system will offer advanced data-sharing facilities to processing tasks running within distinct VMs in IaaS environments. We also aim to integrate BlobSeer with some well-known open-source IaaS platforms, such as Nimbus and OpenNebula. Furthermore, the goal is to explore ways to take advantage of BlobSeer's scalable architecture, high throughput under heavy concurrency and versioning support to increase the performance of scientific workflows.

Results. We integrated BlobSeer as a backend for Cumulus, the data storage service provided by the Nimbus platform. On the one hand, we used BlobSeer to store VM images and to improve the performance of VM deployments, by taking advantage of the concurrency-optimized data accesses in BlobSeer. On the other hand, we evaluated the performance of using Cloud storage services for application data. We focused on a data-intensive climate modeling application called Cloud Model 1 (CM1). We executed CM1 in a Nimbus Cloud environment and used the BlobSeer-based Cumulus service to store its output. We evaluated our approach through large-scale experiments performed on Grid'5000. Furthermore, we investigated the cost of executing MapReduce applications in Cloud environments, in order to find a proper trade-off between cost and performance for this class of applications. We compared the runtime performance of several MapReduce applications executed within the Hadoop framework in two similar environments: clusters belonging to the Grid'5000 platform and virtual machines deployed on a Nimbus Cloud hosted by Grid'5000 nodes. We are planning to submit a paper on this work to an international conference.

PhD students involved: Alexandra Carpen-Amarie (KerData, INRIA), during a 2-month internship at Argonne National Lab.

Task 4: Federation of Cloud Computing Infrastructures

Goals. This task aims to study the challenges of enabling transparent Cloud federation, so as to easily share resources across multiple clouds. On the one hand, the goal is to study systems for creating MapReduce execution platforms on top of federated clouds. On the other hand, the goal is to optimize the behavior of the storage layer in a federated cloud environment and, more specifically, to explore cost-based optimizations for migrating BlobSeer components.

Results. We implemented Resilin, a service able to federate resources from multiple clouds, which provides functionality similar to that of Amazon Elastic MapReduce. Our system offers more flexibility, as users can choose between different types of virtual machines, operating systems or Hadoop versions.

A Master student focused on the following task during her internship at Inria:

Resilin: Elastic MapReduce execution platforms on federated clouds

  • Student: Ancuța Iordache (West University of Timișoara)
  • Inria Advisor: Pierre Riteau (Myriads)

Task 5: Efficient Virtual Machine management in Clouds

Goals. This research direction was developed during Bogdan Nicolae's postdoc within the INRIA-UIUC Joint Laboratory for Petascale Computing, started in January 2011. The goal was to enhance the BlobSeer-based virtual machine storage system with various management operations such as VM migration (for preventive fault tolerance), by leveraging the global data availability and the efficient versioning support provided by BlobSeer.

Results. We proposed and implemented a complete virtual machine storage solution based on BlobSeer that relies on a lazy VM deployment scheme to fetch VM image content as needed by the application during its runtime, greatly improving deployment time in scenarios where hundreds of VMs are simultaneously instantiated. Furthermore, this storage solution leverages cloning and shadowing as exposed by BlobSeer to provide high-performance and completely transparent snapshotting support. Several optimizations, such as adaptive prefetching and efficient live storage migration, were added later on. We obtained significant improvements over the state of the art, both in terms of performance and generated network traffic, while supporting a series of additional features at no extra cost. These results materialized in a series of associated publications.
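The lazy deployment idea can be sketched in a few lines of Python: instead of copying the whole VM image before boot, only the chunks touched by a read are fetched and then cached locally. The class `LazyImage`, its `fetch_chunk` callback and the chunk size are hypothetical names and values chosen for illustration; the sketch does not reproduce the actual BlobSeer-based implementation, nor its prefetching or snapshotting features.

```python
class LazyImage:
    """Hypothetical read-only view of a remote VM image, fetched on demand."""

    def __init__(self, fetch_chunk, chunk_size: int):
        self._fetch_chunk = fetch_chunk   # callable: chunk index -> bytes
        self._chunk_size = chunk_size
        self._cache = {}                  # locally cached chunks
        self.fetch_count = 0              # number of remote fetches so far

    def _chunk(self, index: int) -> bytes:
        # Fetch a chunk from the remote store only on first access.
        if index not in self._cache:
            self._cache[index] = self._fetch_chunk(index)
            self.fetch_count += 1
        return self._cache[index]

    def read(self, offset: int, length: int) -> bytes:
        if length <= 0:
            return b""
        first = offset // self._chunk_size
        last = (offset + length - 1) // self._chunk_size
        data = b"".join(self._chunk(i) for i in range(first, last + 1))
        start = offset - first * self._chunk_size
        return data[start:start + length]


# Stand-in for the remote image: 16 bytes split into 4-byte chunks.
remote_image = bytes(range(16))
image = LazyImage(lambda i: remote_image[i * 4:(i + 1) * 4], chunk_size=4)
image.read(5, 4)          # touches chunks 1 and 2 only
print(image.fetch_count)
```

Because a booting VM typically reads only part of its image, such a scheme moves far less data than a full copy when many instances start at once.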

Postdoc fellows involved: Bogdan Nicolae (JLPC), Alexandru Costan (KerData).

Long visits of Junior Researchers in 2011

  • Florin Pop (Postdoc at PUB) visited KerData for 1 month. Topic: evaluation of BlobSeer as a backend for scientific data aggregation systems. This visit led to Task 1 in the 2012 work program.
  • Ciprian Dobre (Postdoc at PUB) visited KerData for 1 month. Topic: experiments with BlobSeer used in context-aware applications. This visit led to Task 2 in the 2012 work program.
  • Elena Apostol (PhD student at PUB) visited KerData for 3 months. Topic: deploying MapReduce-based multimedia applications in Clouds, using BlobSeer as a storage backend. This visit led to Task 3 in the 2012 work program.
