Validating CDMI Features – Metadata Search

Here we go again with an announcement of a cloud offering that again validates an existing standardized feature of CDMI. The new Amazon CloudSearch offering lets you store structured metadata in the cloud and perform queries on the metadata. They missed an opportunity, however, to integrate this with their existing cloud object storage offering. After all, if you already have object storage, why not put the metadata with the data object instead of separating it out in a separate cloud?

CDMI lets you put the user metadata directly into the storage object, where it is protected, backed up, archived and retained along with the actual data. CDMI’s rich query functions are then able to find the storage object based on the values of the metadata without talking to a separate cloud offering with a new, proprietary API.

CDMI standardizes a Query Queue that allows the client to create a scope specification (equivalent to a WHERE clause) to find specific objects that match the criteria, and a results specification (equivalent to a SELECT clause) that determines the elements of the object that are returned for each match. Results are placed in a CDMI queue object and can be processed one at a time, or in bulk. This powerful feature allows any storage cloud that has a search feature to expose it in a standard manner for interoperability between clouds.

An example of the metadata associated with a query queue is as follows:

{
     "metadata" : {
          "cdmi_queue_type" : "cdmi_query_queue",
          "cdmi_scope_specification" : [
               {
                    "domainURI" : "== /cdmi_domains/MyDomain/",
                    "parentURI" : "starts /MyMusic",
                    "metadata" : {
                         "artist" : "*Bono*"
                    }
               }
          ],
          "cdmi_results_specification": {
               "objectID" : "",
               "metadata" : {
                    "title" : ""
               }
          }
     }
}

 

When results are stored in a query queue, each enqueued value consists of a JSON object of MIME-type “application/json”. This JSON object contains the specified values requested in the cdmi_results_specification of the query queue metadata.

An example of a query result JSON object is as follows:

{
     "objectID" : "00007E7F0010EB9092B29F6CD6AD6824",
     "metadata" : {
          "title" : "Vertigo"
     }
}

Thus if you are using your storage cloud for storing music files, for example, all of the metadata for each mp3 object can be stored right along with the object, and CDMI’s powerful query mechanisms can be used to find the files you are interested in without invoking a separate search cloud with disassociated metadata,

Validating CDMI features – Object Expiration

Validating yet another feature of the CDMI standard (see previous post for an earlier one), Amazon announced their Object Expiration feature for S3. While not a new concept for storage interfaces, it is the first cloud implementation of this capability that I know of. The idea is simply to have the server side of the cloud do object deletion on your behalf automatically, once the lifecycle of that data has completed.

As part of overall Data Lifecycle Management, object deletion is the most common terminal state for data. CDMI has standardized the interface for this capability in cloud storage with a comprehensive Retention and Hold Management feature (Chapter 17). The granularity of the standard CDMI feature is finer than that of the S3 feature in that it allows for retention and deletion on individual objects (although you could accomplish this in S3 with prefix = object name, it doesn’t scale using the header fields that Amazon uses). The S3 prefix mechanism can be used to scope the expiration policy down to individual “directories” (forward slash terminated parts of object names), and CDMI allows this also for the semantically equivalent CDMI sub-containers.

Complying with Regulations

Although the ability to delete objects when their lifecycle completes is useful, it is insufficient for complying with regulations such as Sarbanes-Oxley, or for eDiscovery needs during litigation. For most enterprises, they need to show that the data has not been modified during its lifecycle. In addition, if a subpoena is issued for the data – you DO NOT want the object deleted, even if it’s retention period has expired – this can cost you millions of dollars in a pending court case…

The CDMI standard anticipates that storage clouds will want to offer a more robust, full featured retention and hold management for corporate data, and that a standard means of achieving it will be needed. Take a quick look at Chapter 17 (it’s quite compact while being comprehensive) and investigate using the standard way to achieve this function. If you are a cloud vendor trying to emulate the S3 interface, good luck to you – Amazon will continue to expand the definition of what “S3” means (like adding this feature), forcing you to constantly modify your cloud’s storage interface to keep up (as well as requiring you to reverse engineer any bugs that exist).

Validating CDMI features – Server Side Encryption

One of the features of many storage systems and even disk drives is the ability to encrypt the data at rest. This protects against a specific threat – the disk drive going out the back door for replacement or repair. So it was only a matter of time before we would see this important feature start to be offered for Cloud Storage as well. Well, today Amazon announced their Server Side Encryption capability for their S3 cloud offering. This feature was anticipated by the CDMI standard interface when it was finalized as a standard back in April 2010.

Standard Server Side Encryption

So, how does CDMI standardize this feature? Well, as usual, it starts with finding out if the cloud actually supports the feature and what choices are available. In CDMI, this is done through the capabilities resource – a kind of catalog or discovery mechanism. By fetching the capabilities resource for objects, containers, domain or queues, you can tell whether server side encryption of data at rest if available from the cloud offering (yes this is granular for a reason). The actual capability name is: cdmi_encryption (see section 12.1.3). This indicates that the cloud can do encryption for the data at rest, but also indicates what algorithms are available to do this encryption. The algorithms are expressed in the form of: ALGORITHM_MODE_KEYLENGTH, where:

“ALGORITHM” is the encryption algorithm (e.g., “AES” or “3DES”).

“MODE” is the mode of operation (e.g.,”XTS”, “CBC”, or “CTR”).

“KEYLENGTH” is the key size (e.g.,”128″,”192″, “256”).

So the cloud can offer the user several different algorithms of different strengths and types, or if it only offers a single algorithm (such as the Amazon offering), the cloud storage client can at least understand what that algorithm is.

So how does the user tell the cloud that she wants her data encrypted? Amazon does this with a proprietary header of course, but CDMI does it with standard Data System Metadata that can be placed on any object, container of objects, queue or domain. This metadata is called cdmi_encryption (see section 16.4), and contains merely a string with a value chosen from the list of available algorithms in the corresponding capability. There is also a cdmi_encryption_provided metadata value to tell the client whether their data is being encrypted or not by the cloud.

Lastly, there is a system-wide capability called cdmi_security_encryption (section 12.1.1) that tells the user whether the cloud does server side encryption at all.

Server side encryption is an important capability for cloud storage offerings to provide, which is why CDMI standardized this in advance of having cloud offerings available. We expect more clouds to offer this in the future, and customers to soon realize that – without CDMI implementations, these offerings are locking them in and causing a high cost of exiting that vendor.

Join the Cloud Storage Movement at SNIA’s Winter Symposium 2011

Every year the Storage Networking Industry Association (SNIA) has a gathering of their members in San Jose to coordinate the work of the various Technical Work Groups, Forums and Initiatives. This year the Symposium will take place January 24th – 27th, 2011 at the Sainte Claire Hotel in San Jose, CA. SNIA opens this Symposium to non-SNIA members who are evaluating membership, so feel free to attend. Please Register for the Symposium if you plan to be there in person.

SNIA Cloud Events

The Cloud Storage Technical Work Group (TWG) kicks off a multi-day face to face session starting at 1:00pm PT on Monday. We will be discussing the submission of CDMI for international standardization and continuing to discuss the scope of the next minor release (1.1) of CDMI. Topics include Federation and NoSQL among others. Bring your own ideas for how to improve CDMI. The full agenda has been posted publicly.

On Wednesday, the Cloud Storage Initiative will give an overview of their activities at a breakfast session starting at 8:30am. Then at noon on Wednesday, be sure and join us for the 2011 Activities Kickoff presentation in the Grande Ballroom. We will be showcasing all of the upcoming activities that you will want to be involved with over the next year. This session will be live streamed if you cannot make it in person. Regardless of whether you will be there in person or remote, please register for this update event (in addition to the Symposium registration above). More information.

Wednesday afternoon is the meeting of the Cloud Storage Initiative from 1-5pm (also in the Grande Ballroom). Be sure and join us and help plan the activities for the upcoming year.

Lastly, on Wednesday night there will be a Birds of Feather (BOF) session on a new group that is forming for the Archive and Preservation in the Cloud.

Whereas with Cloud Backup, the cloud is simply a repository of backup data, with Cloud Archive and Preservation, the Cloud is where the active processes occur that ensure long term retention, preservation and viability of data.
CDMI is uniquely designed to accommodate these needs with the Data System Metadata that it standardizes.
Cloud providers see the ability to offer more than just a best effort storage area with the promise of being the trusted steward of information for the long term.
Additional services such as eDiscovery and automatic format conversion can easily be offloaded to the cloud reducing costs.

Please join us Wednesday evening from 5:30pm – 7:00pm in the Grande Ballroom for a Birds of Feather session to kick off the formation of the CSI Archive/Preservation Special Interest Group (SIG). Light refreshments will be provided. If you would like to participate remotely, please use the following call in information:
Toll Free: 866-244-8528
International:+1-719-457-0816
Passcode: 510843#
Webex: http://snia.webex.com, Meeting Name: Archive and Preservation SIG
Meeting Password: cloud2011

Why not pick one of the “open” APIs instead of CDMI?

There is a post by Jerry Huang , CEO of Gladinet on the problems with trying to be compatible with Amazon’s S3 API. Jerry suggest you look at OpenStack or a common library instead.

Amazon’s API (as with any cloud vendor’s API) is a moving target for sure, but the main issue is that these APIs are under the change control of a single vendor. Doesn’t matter how “open” the API is (in terms of copyright license) because the vendor can change it to disadvantage a competitor. So if you are a competitor, you would be foolish to use that API as the only interface into your cloud. So what happens? Each cloud vendor releases their own “open” API – similar but slightly different (enough to get around copyright), almost always RESTful and pretty much they all do the same thing.

So, you get the situation we have today with rapid proliferation of many different interfaces all pretty much the same. But that doesn’t help the poor clients. They have to code to N different interfaces to work with N different clouds. And since they are rapidly evolving, they have to keep up with all these API changes over time.

The Cloud Storage standard CDMI does not have this problem. CDMI is under the change control of a standards body (SNIA) and accommodates requirements from all the cloud storage players in it’s standardization process. More importantly, it was developed under the SNIA IP policy to help prevent any of the specification author companies from gaming the spec with their Intellectual Property. Thus cloud vendors can pick up the CDMI specification and implement it with confidence. They don’t need to come up with their own API. CDMI also has a standard way to extend the specification for vendor specific functions that still allows for core compatibility with other vendors. Want to do versioning? There is an example vendor extension in CDMI that shows you how.

From a client side point of view, Jerry also mentions common libraries. Jclouds is a good example of this (for Java). There also common libraries for other languages. While that can insulate a client from the many proliferating APIs, it’s a tough task to keep that library up to date with these APIs (just ask Adrian). The sooner the various cloud providers can implement the CDMI standard (even along-side of their existing ones), the sooner common libraries like Jclouds can just maintain a single adapter to a standard API.

SNIA Cloud Activities for 2010

Given that it’s the middle of summer it may be hot where you are, but the SNIA Cloud activities are heating up for the remainder of this year, and you don’t want to be left out.

SNIA Summer Symposium

At the end of July every year SNIA hosts a Symposium in San Jose for all the groups. The Cloud Storage TWG will be meeting from Monday afternoon through Thursday morning. The agenda is posted publicly and non-SNIA members are encouraged to attend.

Also at the Symposium Monday night is a Birds of Feather (BOF) session where we will be doing a demo of CDMI and OCCI working together in a common infrastructure. There will be time for details on the implementation and discussion afterward.

Thursday morning will be a special session to update folks on the SNIA Cloud activities for the remainder of the year. Besides the in person session at the Symposium, the session will also be broadcast as an online Webinar for folks who cannot make it in person. More information and a registration link is available on the SNIA Website.

Storage Developer Conference

#alttext#
In September will be the annual Storage Developer Conference (SDC) and this year Cloud is a big part of the agenda. There will be a CDMI Plugfest throughout the week, a Cloud Hands on Lab for developers, and Cloud Tracks all week including some big cloud related keynotes. But *wait* there’s more. Following SDC at the same hotel on Thursday September 23rd will be the…

SNIA Cloud Burst Event

#alttext# This is an event that is squarely focused on Cloud Storage and brings together end users, cloud providers and storage vendors for a unique experience including demos, a showcase and in depth sessions on this part of the overall cloud industry. More information is available on the Cloud Burst page.

Storage Networking World

For the past two SNWs, there has been a Cloud Pavilion with great traffic and interest from the attendees for those that participate. At this fall’s SNW in Dallas, we will repeat this successful program with a limited number of slots. In addition we will again have a hands on lab for cloud that is always well attended (by end users only). If you are looking for a speaking opportunity, please consider being a sponsor of the cloud summit at SNW where end users come to learn about the cloud and the offerings that are available.

SNW Europe

Last year SNW Europe was a huge success for the SNIA Cloud Participants, with a year over year increase in record attendance. This year will see an increasing set of activities around the cloud, including a new Cloud Pavilion and Hands on Labs. There are a limited number of slots for these and they will sell out early. Included is an opportunity for a speaking engagement as well.

“Membership has it’s privileges”

Many of these opportunities are open only to Cloud Storage Initiative (CSI) member companies. The membership fees help to fund these activities for the members and augment the work of the volunteers with paid resources. If you can help get your company involved, please contact Marty Foltyn (marty@bitsprings.com) for more information.

Gluecon Cloud Conference

As the “cloud” becomes a common platform, web applications still live in a “stovepipe” world. It’s not a question of “should we move to the cloud?” It’s a question of once some, or most, or all of our web applications live in the cloud, how do we handle the problems of scalability, security, identity, storage, integration and interoperability?

What was the problem of “enterprise application integration” in the late 90s, is now the cambrian explosion of web–based applications that will demand similar levels of integration. The problem, put simply, is how to “glue” all of these apps, data, people, work–flows, and networks together.

Glue is the only conference devoted solely to this new problem–set facing architects, developers and integrators. At Glue, we’ll explore the new technologies that are forming around web applications in a post–cloud world.

SNIA’s own Mark Carlson will be presenting on CDMI at Gluecon, and you can check out the rest of the Gluecon agenda here: http://www.gluecon.com/2010/Glue2010_Agenda.htm

If you’re interested in attending Gluecon, use the code, “snia1” to receive 10% off of the registration. If you’re interested in sponsoring Gluecon, please contact Eric Norlin at enorlin@mac.com.

Enabling Cloud Service Brokerage

One of the most interesting use-cases for the pending SNIA CDMI Cloud Storage standard involves Cloud service brokerage.  Cloud Storage services in particular. The Peering capabilities enabled via CDMI will further facilitate the emergence of a new business category in Cloud Computing – The Cloud Broker.

Recently advocated by leading analyst firms, this role would mitigate key risks around the federation of various Cloud services.  Even the US Federal CIO has recently spoken out regarding this key characteristic of upcoming Government Cloud services and related NIST Cloud Computing standards.

Data integration, integrity, portability and security are among the many issues Cloud Brokers are tackling on behalf of Enterprise Cloud customers. See this diagram for some of the useful Cloud Storage relationships enabled by CDMI for configurations which would address these issues.

Members of the SNIA CDMI technical working group and Cloud Storage Initiative will be at SNW Orlando this week. (1. presenting vendor-neutral tutorials, 2. staffing the Cloud Storage pavilion in the Exhibit Area, 3. assisting the service providers in the Cloud Hands-on-Labs and 4. leading a Birds of a Feather (BoF) session on Cloud Storage)  See 2 posts below for full details.

Please drop by and share your thoughts with us on the role of Cloud Brokers – and any other Cloud Storage topic of interest to you!

Cloud Data Management Interface Overview

Lots of people are wondering what CDMI offers, but don’t have time to read the actual specification to understand all it’s features in depth.

Here is a short (20 min) overview of the CDMI features and architecture that should get you started.

Get the Flash Player to see this video.


You can also download the overview presentation for viewing offline.

Draft 1.0 Cloud Storage Specification Available

The SNIA has released a final working draft of the Cloud Data Management Interface (CDMI) version 1.0. The folks that have put this specification together in record time include:

Company – Individual
Bycast Inc. – David Slik
Cisco Systems – Mike Siefer
Hitachi Data Systems – Eric Hibbard
Iron Mountain – Chris Schwarzer
NetApp Inc. – Alan Yoder
NetApp Inc. – Lakshmi N. Bairavasundaram
Olocity – Scott Baker
Oracle – Mark Carlson
QLogic – Hue Nguyen
Individual – Rich Ramos

Thanks as well for the comments from lots of folks outside of the SNIA. These have all been incorporated into this latest public review draft.

The specification is now in front of the SNIA membership for a vote to become a SNIA Architecture industry standard. It will also become both a National (ANSI) and International (ISO) standard down the road.

I have heard lots of FUD around standardization of cloud interfaces, including the notion that standards take “too long” and that they cannot evolve quickly enough to accommodate the rapid evolution of this new industry. The SNIA Cloud Storage TWG that created this specification was formed last April and has produced a robust 1.0 version of a cloud standard in less than a year – record time for a standards body.

The CDMI standard is extensible by vendors to cover the complete functionality of their cloud and can also be “shrink to fit” for clouds that don’t have all the bells and whistles of the full specification. These extensions are expected to be candidates for future version of the specification as multiple vendors implement them.

Storage and Cloud vendors who have putting off implementing CDMI until 1.0 was available, now no longer have an excuse! Customers of public and private clouds can now start asking their vendors to show adoption of the standard in their roadmaps. Tools and common API libraries can now proceed to add support for a standard cloud storage interface.

To get the Draft 1.0 CDMI specification, go here:

http://snia.org/cloud