Solving Cloud Object Storage Incompatibilities in a Multi-Vendor Community

In early 2024, the SNIA Cloud Storage Technologies Initiative (CSTI) polled attendees of its live webinar, “Navigating the Complexities of Object Storage Compatibility,” and found that 72% of organizations have encountered incompatibility issues between object storage implementations. The results prompted a call to action for SNIA to create an open expert community dedicated to resolving these issues and building best practices for the industry.

Since then, SNIA CSTI has partnered with the SNIA Cloud Storage Technical Work Group (TWG) and successfully organized, hosted, and completed the first SNIA Cloud Object Storage Plugfest (multi-vendor interoperability testing), co-located with the SNIA Developer Conference (SDC) in September 2024 in Santa Clara, CA. Participating Plugfest companies included engineers from Dell, Google, Hammerspace, IBM, Microsoft, NetApp, VAST Data, and Versity Software. Over three days, Plugfest testing uncovered and resolved issues, and included a Birds of a Feather (BoF) session to gain consensus on next steps for the industry. Plugfest contributors are now planning two 2025 Plugfest events: Denver in April and Santa Clara in September.

It’s a collaborative effort that we’ll discuss in detail on November 21, 2024 at our next live SNIA CSTI webinar, “Building a Community to Tackle Cloud Object Storage Incompatibilities.” At this webinar, we will share insights into industry best practices, explain the benefits your implementation may gain with improved compatibility, and provide an overview of how a wide range of vendors is uniting to address real customer issues.

Complexities of Object Storage Compatibility Q&A

72% of organizations have encountered incompatibility issues between various object storage implementations according to a poll at our recent SNIA Cloud Storage Technologies Initiative webinar, “Navigating the Complexities of Object Storage Compatibility.” If you missed the live presentation or you would like to see the answers to the other poll questions we asked the audience, you can view it on-demand at the SNIA Educational Library.

The audience was highly engaged during the live event and asked several great questions. Here are answers to them all.

Q. Do you see the need for fast object storage for AI workloads?

A. Yes, the demand for fast object storage in AI workloads is growing. Initially, object storage was mainly used for backup or archival purposes. However, its evolution into Data Lakes and the introduction of features like the S3 SELECT API have made it more suitable for data analytics. The launch of Amazon’s S3 Express, a faster yet more expensive tier, is a clear indication of this trend. Other vendors are following suit, suggesting a shift towards object storage as a primary data storage platform for specific workloads.
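
To make the analytics angle concrete, here is a minimal sketch of the S3 SELECT API using boto3. The bucket, key, and column names are hypothetical; the point is that the SQL filter runs inside the object store, so only matching rows cross the network.

```python
import boto3

s3 = boto3.client("s3")

# Push a SQL filter down to the object store (hypothetical bucket/key/columns).
resp = s3.select_object_content(
    Bucket="telemetry-bucket",
    Key="readings/2024-01.csv",
    ExpressionType="SQL",
    Expression="SELECT s.sensor_id, s.value FROM S3Object s "
               "WHERE CAST(s.value AS FLOAT) > 90.0",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; Records events carry the filtered bytes.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode(), end="")
```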

Q. As Object Storage becomes more prevalent in the primary storage space, could you talk about data protection, especially functionalities like synchronous replication and multi-site deployments – or is your view that this is not needed for object storage deployments?

What the “T” Means in SNIA Cloud Storage Technologies

The SNIA Cloud Storage Initiative (CSI) has had a rebrand; we’ve added a T for Technologies into our name, and we’re now officially the Cloud Storage Technologies Initiative (CSTI).

That doesn’t seem like a significant change, but there’s a good reason. Our old name reflected the push to gain acceptance of cloud storage, and that debate has been won, big time. One relatively small cloud service provider is currently storing 400PB of clients’ data. Twitter alone stores 300PB of data on Google’s cloud offering. Facebook, Amazon, Alibaba, Tencent – all have huge data storage numbers.

Enterprises of every size are storing data in the cloud. That’s why we added the word “technologies.” The expanded charter and new name reflect the need to support evolving cloud business models and architectures such as OpenStack, software-defined storage, Kubernetes and object storage. It includes data services, orchestration and management, understanding hyperscale requirements, and the role standards play.

So what do we do? The CSTI is an active group that publishes articles and white papers, speaks at industry conferences and presents highly rated webcasts that have been viewed by thousands. You can learn more about the CSTI and check out the Infographic for highlights on cloud storage trends and CSTI activities.

If you’re interested in cloud storage technologies, I encourage you to consider joining our group. We have multiple membership options for established vendors, startups, educational institutions, even individuals. Learn more about CSTI membership here.

Expert Answers to Cloud Object Storage and Gateways Questions

In our most recent SNIA Cloud webcast, “Cloud Object Storage and the Use of Gateways,” we discussed market trends toward the adoption of object storage and the use of gateways to execute on a cloud strategy.  If you missed the live event, it’s now available on-demand together with the webcast slides. There were many good questions at the live event and our expert, Dan Albright, has graciously answered them in this blog.

Q. Can object storage be accessed by tools for use with big data?

A. Yes. Big data tools can access object storage in real time through HDFS-compatible connectors such as the S3 connector. Performance, however, is conditional on latency: object storage based on local hard drives should not be used as the primary storage tier, as it would run very slowly. The guidance is to use hard-drive-based object storage either as an online archive or as a backup target for HDFS.
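
As an illustration of the connector approach, here is a minimal PySpark sketch that reads CSV data directly from an S3-compatible object store over the Hadoop s3a connector. The endpoint, credentials, and bucket are placeholders, and the hadoop-aws package is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3a-analytics-sketch")
    # The s3a connector lets Hadoop-based tools read object storage directly;
    # endpoint and credentials below are placeholders, not real values.
    .config("spark.hadoop.fs.s3a.endpoint", "https://object-store.example.com")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Read CSV objects straight from a (hypothetical) bucket and aggregate.
df = spark.read.option("header", "true").csv("s3a://analytics-bucket/events/")
df.groupBy("event_type").count().show()
```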

Q. Will current block storage or NAS be replaced with cloud object storage + gateway?

A. Yes and no.  It’s dependent on the use case. For ILM (Information Lifecycle Management) uses, only the aged and infrequently accessed data is moved to the gateway+cloud object storage, to take advantage of a lower cost tier of storage, while the more recent and active data remains on the primary block or file storage.  For file sync and share, the small office/remote office data is moved off of the local NAS and consolidated/centralized and managed on the gateway file system. In practice, these methods will vary based on the enterprise’s requirements.
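
For the ILM case described above, the tiering policy can live in the object store itself. Here is a minimal sketch with boto3, assuming a hypothetical bucket and prefix and an S3-compatible store that honors lifecycle rules:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical rule: after 90 days, transition objects under the "projects/"
# prefix to a colder, cheaper storage class.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-ilm-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-cold-data",
                "Status": "Enabled",
                "Filter": {"Prefix": "projects/"},
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```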

Q. Can we use cloud object storage for IoT storage that may require high IOPS?

A. High-IOPS workloads are best supported by local SSD-based object, block or NAS storage. Remote or hard-drive-based object storage is better deployed for low-IOPS workloads.

Q. What about software defined storage?

A. Cloud object storage may be implemented as SDS (Software Defined Storage), but may also be implemented by dedicated appliances. Most cloud object storage services are SDS-based.

Q. Can you please define NAS?

A. The SNIA Dictionary defines Network Attached Storage (NAS) as:

1. [Storage System] A term used to refer to storage devices that connect to a network and provide file access services to computer systems. These devices generally consist of an engine that implements the file services, and one or more devices, on which data is stored.

2. [Network] A class of systems that provide file services to host computers using file access protocols such as NFS or CIFS.

Q. What are the challenges with NAS gateways into object storage? Aren’t there latency requirements for NAS that a typical object store solution can’t meet?

A. The key factor to consider is workload. If the applications accessing data residing on NAS generate a high frequency of reads and writes, then that data is not a good candidate for remote or hard-drive-based object storage. However, it is commonly known that up to 80% of data residing on NAS is infrequently accessed. It is this data that is best suited for migration to remote object storage.
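
A crude way to test the “80% is infrequently accessed” claim on your own NAS is to scan file access times. A minimal sketch, assuming a hypothetical mount point and that the filesystem is not mounted with noatime (which would make atime meaningless):

```python
import os
import time

COLD_AFTER_DAYS = 180  # threshold is an assumption; tune per environment
now = time.time()

def find_cold_files(root):
    """Yield (path, size) for files not accessed within COLD_AFTER_DAYS."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or permission denied
            if (now - st.st_atime) > COLD_AFTER_DAYS * 86400:
                yield path, st.st_size

total = 0
for path, size in find_cold_files("/mnt/nas-share"):  # hypothetical NAS mount
    total += size
print(f"Cold data candidate bytes: {total}")
```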

Thanks for all the great questions. Please check out our library of SNIA Cloud webcasts to learn more. And follow us on Twitter @SNIACloud for announcements of future webcasts.

How Gateways Benefit Cloud Object Storage

The use of cloud object storage is ramping up sharply, especially in the public cloud, where its simplicity can significantly reduce capital budgets and operating expenses. And while it makes good economic sense, enterprises are challenged with legacy applications that do not support the object protocols needed to move data to and from the cloud.

That’s why the SNIA Cloud Storage Initiative is hosting a live webcast on September 26th, “Cloud Object Storage and the Use of Gateways.”

Object storage is a secure, simple, scalable, and cost-effective means of managing the explosive growth of unstructured data that enterprises generate every day. Enterprises have developed data strategies specific to the public cloud: improved data protection, long-term archive, application development, DevOps, data science, and cognitive artificial intelligence, to name a few.

However, these same organizations have legacy applications and infrastructure that are not object storage friendly, but instead use file protocols like NFS and SMB. Gateways enable SMB and NFS data transfers to be converted to Amazon’s S3 protocol while optimizing data with deduplication, providing QoS (quality of service), and adding efficiencies on the data path to the cloud.
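
In spirit, a gateway’s file-to-object path can be reduced to a few lines. This is a toy sketch, assuming a hypothetical NFS mount, endpoint, and bucket; real gateways add caching, QoS, chunk-level deduplication, and multipart handling for large files:

```python
import hashlib
import os
import boto3

# Hypothetical S3-compatible target endpoint and bucket.
s3 = boto3.client("s3", endpoint_url="https://object-store.example.com")
BUCKET = "gateway-target"
seen_hashes = set()  # crude stand-in for a gateway's deduplication index

def sync_tree(root):
    """Copy files from a file-protocol mount into object storage, skipping
    files whose content was already uploaded (whole-file dedup)."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()  # fine for a sketch; stream large files in practice
            digest = hashlib.sha256(data).hexdigest()
            if digest in seen_hashes:
                continue  # identical content already uploaded
            key = os.path.relpath(path, root).replace(os.sep, "/")
            s3.put_object(Bucket=BUCKET, Key=key, Body=data)
            seen_hashes.add(digest)

sync_tree("/mnt/nfs-export")  # hypothetical NFS mount point
```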

This webcast will highlight the market trends toward the adoption of object storage and the use of gateways to execute a cloud strategy, the benefits of object storage when gateways are deployed, and the use cases that are best suited to leverage this solution.

You will learn:

  • The benefits of object storage when gateways are deployed
  • Primary use cases for using object storage and gateways in private, public or hybrid cloud
  • How gateways can help achieve the goals of your cloud strategy without retooling your on-premises infrastructure and applications

We plan to share some pearls of wisdom on the challenges organizations are facing with object storage in the cloud from a vendor-neutral, SNIA perspective. If you need a firm background on cloud object storage before September 26th, I encourage you to watch the SNIA Cloud on-demand webcast, “Cloud Object Storage 101.” It will provide you with a foundation to get even more out of this upcoming webcast.

I hope you will join us on September 26th. Register now to save your spot.

IP-Based Object Drives Now Have a Management Standard

The growing popularity of object-based storage has resulted in the development of Ethernet-connected storage devices, also referred to as IP-Based Drives, that support object interfaces, and in some cases the ability to run applications on the drives themselves. These scale-out storage nodes consist of relatively inexpensive drive-sized enclosures with IP network connectivity, CPU, memory and storage.

While inexpensive to deploy, these solutions require more management than a traditional drive. In order to simplify management of these drives, SNIA has developed and approved the release of the IP-Based Drive Management Specification. On April 20th, the SNIA Cloud Storage Initiative is hosting a live webcast, “IP-Based Object Drives Now Have a Management Standard.” It will be a unique opportunity to learn about this specification from the authors who wrote it. In this webcast, we’ll discuss:

  • Major components of the IP-Based Drive Management Standard
  • How the standard leverages the DMTF Redfish management standard to manage IP-Based Drives (see the sketch after this list)
  • The standard management interface for drives that are part of JBOD (Just A Bunch Of Disks) or JBOF (Just A Bunch Of Flash) enclosures
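
Because the spec builds on Redfish, managing a drive looks like ordinary HTTPS plus JSON. Here is a minimal sketch that walks a Redfish service root; the enclosure address and credentials are hypothetical, and the exact resource tree for IP-based drives comes from the specification itself, not from this example:

```python
import requests

BASE = "https://drive-enclosure.example.com"  # hypothetical enclosure address
AUTH = ("admin", "password")                  # placeholder credentials

# Redfish services expose a well-known entry point at /redfish/v1.
# verify=False is for a lab sketch only; use proper TLS in production.
root = requests.get(f"{BASE}/redfish/v1", auth=AUTH, verify=False).json()
systems_url = root["Systems"]["@odata.id"]    # e.g. /redfish/v1/Systems

systems = requests.get(f"{BASE}{systems_url}", auth=AUTH, verify=False).json()
for member in systems.get("Members", []):
    # Each member link points at one managed system (here, a drive node).
    info = requests.get(f"{BASE}{member['@odata.id']}",
                        auth=AUTH, verify=False).json()
    print(info.get("Id"), info.get("Status", {}).get("Health"))
```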

This standard allows drive management to scale to data centers and beyond, enabling high degrees of automation and software-only management of data centers. Reserve your spot today to learn more and ask questions of the folks behind the spec. I hope to see you on April 20th.

Cloud Object Storage – You’ve Got Questions, We’ve Got Answers

The SNIA Cloud Storage Initiative hosted a live Webcast “Cloud Object Storage 101.” Like any “101” type course, there were a lot of good questions. Here they all are – with our answers. If you have additional questions, please let us know by commenting on this blog.

Q. How do you envision the new role of tape (LTO) in this unstructured data growth?

A. Exactly the same way that tape has always played a part; it’s the storage medium that requires no power to store cold data and is cheap per bit. Although it has a limited shelf life, and although we believe that flash will eventually replace it, it still has a secure and growing role for the foreseeable future.

Q. What are your thoughts on whether object storage can exist outside the bounds of supporting file systems? Block devices directly storing objects using the key as reference and removing the intervening file system? A hierarchy of objects instead of files?

A. All of these things. Objects can be identified by an ID in a flat, non-hierarchical structure; or we can impose a hierarchy by key-to-objectID translation; or indeed, an object may contain complete file systems or be treated like a block device. There are really no restrictions on how we can build metadata that describes all these things over the bytes of storage that make up an object.
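
The key-to-hierarchy translation is easy to see in practice: S3-style stores keep a flat key space, and listing with a delimiter imposes a directory view on it. A minimal boto3 sketch, with hypothetical bucket and prefix:

```python
import boto3

s3 = boto3.client("s3")

# The key space is flat; "directories" exist only by key naming convention.
resp = s3.list_objects_v2(
    Bucket="example-bucket",   # hypothetical bucket
    Prefix="photos/2024/",     # acts like a directory path
    Delimiter="/",             # group deeper keys into CommonPrefixes
)

for sub in resp.get("CommonPrefixes", []):
    print("dir :", sub["Prefix"])   # pseudo-subdirectories
for obj in resp.get("Contents", []):
    print("file:", obj["Key"])      # objects at this "level"
```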

Q. Can you run write-intensive, low-latency apps on object storage, e.g. virtual machines?

A. Yes. Object storage can be made up of the same stuff as other high-performance storage systems; for instance, flash connected via high-bandwidth, low-latency networks. Object stores could even be built over PCIe and NVDIMM.

Q. Is erasure coding (EC) expensive in terms of networking and resources utilization (especially in case of rebuild)?

A. No, that’s one of the advantages of EC. Rebuilds take place by reading data from many disks and writing it to many disks; in traditional RAID rebuilds, the focus is normally on the one disk that’s being rebuilt.
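
To see why rebuild work spreads across many disks, consider the simplest possible erasure code: a single XOR parity over three data shards. This is a toy illustration only; real systems use Reed-Solomon codes over many more shards, but the recovery pattern is the same.

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(shards):
    """Compute one parity shard over equal-length data shards (toy 'EC')."""
    parity = shards[0]
    for s in shards[1:]:
        parity = xor_bytes(parity, s)
    return parity

def rebuild(surviving, parity):
    """Rebuild the one lost shard by reading all survivors plus parity,
    i.e. the rebuild reads are spread across every remaining disk."""
    out = parity
    for s in surviving:
        out = xor_bytes(out, s)
    return out

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data shards on three disks
p = encode(data)                      # parity on a fourth disk
lost = data.pop(1)                    # disk 2 fails
assert rebuild(data, p) == lost       # recovery succeeds
```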

Q. Is there any overhead for small files or object use cases? Do you have a recommended size?

A. Each system will have its own advantages and disadvantages for objects of specific sizes. In general, object stores are designed to store billions of objects, so the number of objects is usually not an issue.

Q. Can you comment on Internet bandwidth limitations on geographically dispersed erasure coded data?

A. Smart caching can make a big difference, but at the end of the day, a geographically EC dispersed object store won’t be faster than a local store. You can’t beat the speed of light.

Q. The suppliers all claim easy exit strategies from their systems. If we were to use one of the on-premise solutions such as ECS or Cleversafe, and then down the road decide to move off-premise, is the migration/egress typically as easy as claimed?

A. In general, any proprietary interface might lock you in. SNIA’s CDMI is vendor-neutral and supported by a number of vendors. Amazon’s S3 is a popular and common interface. Ultimately, vendors want your data on their systems – and that means making it easy to move data from a competing vendor’s system; lock-in is not what vendors want. Talk to your vendor and ask for other users’ experiences to get confirmation of their claims.
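
Because S3-compatible interfaces are so common, a first-cut egress path is often a bucket-to-bucket copy. This is a hedged sketch with hypothetical endpoints and buckets; a real migration must also carry metadata, ACLs, and versioning, use multipart transfers for large objects, and verify checksums:

```python
import boto3

# Two S3-compatible stores: the one you are leaving and the one you are
# moving to. Endpoints and bucket names are placeholders.
src = boto3.client("s3", endpoint_url="https://old-vendor.example.com")
dst = boto3.client("s3", endpoint_url="https://new-vendor.example.com")

paginator = src.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="legacy-bucket"):
    for obj in page.get("Contents", []):
        # Stream each object out of the old store and into the new one.
        body = src.get_object(Bucket="legacy-bucket", Key=obj["Key"])["Body"].read()
        dst.put_object(Bucket="new-bucket", Key=obj["Key"], Body=body)
```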

Q. Based on factual information, where are you seeing the most common use cases for Object Storage?

A. There are many, and each vendor of cloud storage has particular markets. Backup is a common case, as are systems in the healthcare space that treat data such as scans and X-rays as objects.

Q. NAS filers only scale up not out. They are hard to manage at scale. Why use them anymore?

A. There are many NAS systems that scale out as well as up. NFSv4 supports high degrees of scale-out, and there are file systems like Gluster that provide very large-scale solutions indeed, into the multi-petabyte range.

Q. Are there any specific use cases to avoid when considering object storage?

A. Yes. Many legacy applications will not generate any savings or gains if moved to object storage.

Q. Would you agree with industry statements that 80% of all data written today will NEVER be accessed again; and that we just don’t know WHICH 20% will be read again?

A. Yes to the first part, and no to the second. Knowing which 80% is cold is the trick. The industry is developing smart ways of analyzing data to help ensure that cached data is hot data, and that cold data is placed correctly the first time around.

Q. Is there also the possibility to bring “compliance” to object storage? (Thinking about banking, medical and other sensitive data that needs tracking, retention, etc.)

A. Yes. Many object storage vendors provide software to do this.
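
As one concrete example of a compliance feature, S3-style stores expose WORM retention through the Object Lock API. A minimal boto3 sketch, assuming a hypothetical bucket that was created with Object Lock enabled (a prerequisite for this call to succeed):

```python
from datetime import datetime, timezone
import boto3

s3 = boto3.client("s3")

# Hypothetical example: place a multi-year COMPLIANCE-mode retention on a
# record, as regulated industries often require. Until the retain-until date,
# the object version cannot be deleted or overwritten.
s3.put_object_retention(
    Bucket="medical-records",            # hypothetical bucket
    Key="scans/patient-123/ct-001.dcm",  # hypothetical object
    Retention={
        "Mode": "COMPLIANCE",
        "RetainUntilDate": datetime(2032, 1, 1, tzinfo=timezone.utc),
    },
)
```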

Need a Primer on Cloud Object Storage?

There has been a lot of buzz around cloud object storage recently. But before you get deep into all that cloud object storage can do, it’s good to take a step back and make sure you understand the basics. That’s what the SNIA Cloud Storage Initiative is planning to do on July 14th at our live Webcast “Cloud Object Storage 101.”

Many organizations, like large service providers, have already begun to leverage software-defined object storage to support new application development and DevOps projects. Meanwhile, legacy enterprise companies are in the early stages of exploring the benefits of object storage for their particular business and are searching for how they can use cloud object storage to modernize their IT strategies, store and protect data, while dramatically reducing the costs associated with legacy storage sprawl.

This Webcast will highlight the market trends towards the adoption of object storage, the definition and benefits of object storage, and the use cases that are best suited to leverage an underlying object storage infrastructure.

Join us on July 14th to learn:

  • How to accelerate the transition from legacy storage to a cloud object architecture
  • The benefits of object storage
  • Primary use cases
  • How object storage can enable your private, public or hybrid cloud strategy without compromising security, privacy or data governance

I hope you’ll register today to join my colleague, Nancy Bennis, Director of Alliances at Cleversafe (an IBM company), and me for this tutorial on cloud object storage.

On-Demand Cloud Storage Webcasts Worth Watching

As the SNIA Cloud Storage Initiative (CSI) starts 2016 with a new set of educational programs and webcasts on topics of interest to those developing, implementing and managing cloud storage, I thought it might be a good time to remind everyone of the vendor-neutral educational work the CSI delivered in 2015.

I’m particularly proud of the work the CSI has done through BrightTalk (a web-based content delivery platform) in producing live hour-long tutorials on a wide variety of subjects.

What you may not know is that these are also recorded, and you can play them back when it’s convenient to you. I know that we have a global audience, and that when we deliver the live version it may be in the middle of your busy working day – or even in the middle of the night.

As part of SNIA, the CSI supports the development of technical storage standards, and that means some of our audience are developers. For those of you who are interested in more technical presentations, we had two developer-focused BrightTalks:

Hierarchical Erasure Coding: Making Erasure Coding Usable

This talk covered two different approaches to erasure coding – a flat erasure code across JBOD, and a hierarchical code with an inner code and an outer code; it compared the two approaches on different parameters that impact the IT business and provided guidance on evaluating object storage solutions.

Expert Panel: Cloud Storage Initiatives – An SDC Preview

At the 2015 Storage Developer Conference (SDC) we presented on a variety of topics:

  • Mobile and Secure – Cloud Encrypted Objects using CDMI
  • Object Drives: A new Architectural Partitioning
  • Unistore: A Unified Storage Architecture for Cloud Computing
  • Using CDMI to Manage Swift, S3, and Ceph Object Repositories

We discussed how encrypted objects can be stored, retrieved, and transferred between clouds, how Object Drives allow storage to scale up and down by single drive increments, end-user and vendor use cases of the Cloud Data Management Interface (CDMI), and we introduced Unistore – an innovative unified storage architecture that efficiently integrates heterogeneous HDD and SCM devices for Cloud storage systems.
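
Since CDMI comes up in several of these talks, it may help to show how plain it is on the wire: just HTTP plus JSON with CDMI content types. A minimal read sketch, assuming a hypothetical endpoint, path, and credentials (the headers and JSON fields follow the CDMI specification):

```python
import requests

# Ask for a data object using the CDMI content type and version header.
headers = {
    "Accept": "application/cdmi-object",
    "X-CDMI-Specification-Version": "1.0.2",
}
resp = requests.get(
    "https://cloud.example.com/cdmi/mycontainer/hello.txt",  # hypothetical
    headers=headers,
    auth=("user", "secret"),  # placeholder credentials
)
obj = resp.json()
print(obj["objectID"])   # globally unique ID assigned by the store
print(obj["metadata"])   # user and system metadata travel with the object
print(obj["value"])      # the object's data, per its valuetransferencoding
```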

(As an added bonus, all these SDC 2015 presentations and others can be found here http://www.snia.org/events/storage-developer/presentations15.)

OpenStack has had a big year, and the CSI contributed to the discussion with:

OpenStack File Services for High Performance Computing

We looked at how OpenStack can consume and control file services appropriate to High Performance Computing in a cloud and multi-tenanted environment, and investigated two approaches to integration. One approach is to have OpenStack manage the storage infrastructure services using Cinder, Nova and Neutron to provide HPC Filesystem as a Service. We also reviewed a second option of using Manila file services for OpenStack to control the HPC file system deployment and manage the exports, etc. We discussed the development of the Lustre Manila driver and its current progress.

Hybrid clouds were also in the news. We delivered two sessions, specifically targeted at end users looking to understand the technologies:

Hybrid Clouds: Bridging Private & Public Cloud Infrastructures

Every IT consumer is using cloud in one form or another, and just as storage buyers are reluctant to select a single vendor for their on-premises IT, they will choose to work with multiple public cloud providers. But this desirable “many vendor” cloud strategy introduces new problems of compatibility and integration. To provide a seamless view of these discrete storage clouds, Software Defined Storage (SDS) can be used to build a bridge between them. This presentation explored how SDS, with its ability to deploy on different hardware and its rich automation capabilities, can extend its reach into cloud deployments to support a hybrid data fabric that spans on-premises and public clouds.

Hybrid Clouds Part 2: Case Study on Building the Bridge between Private & Public

There are significant differences in how cloud services are delivered to various categories of users. The integration of these services with traditional IT operations remains an important success factor but also a challenge for IT managers. The key to success is to build a bridge between private and public clouds. This Webcast expanded on the previous Hybrid Clouds: Bridging Private & Public Cloud Infrastructures webcast where we looked at the choices and strategies for picking a cloud provider for public and hybrid solutions.

Lastly, we looked at some of the issues surrounding data protection and data privacy (no, they’re not the same thing at all!).

Privacy v Data Protection: The Impact of Int’l Data Protection Legislation on Cloud

Governments across the globe are proposing and enacting strong data privacy and data protection regulations by mandating frameworks that include noteworthy changes like defining a data breach to include data destruction, adding the right to be forgotten, mandating the practice of breach notifications, and many other new elements. The implications of this and other proposed legislation on how the cloud can be utilized for storing data are significant. This webcast covered:

  • EU “directives” vs. “regulation”
  • General data protection regulation summary
  • How personal data has been redefined
  • Substantial financial penalties for non-compliance
  • Impact on data protection in the cloud
  • How to prepare now for impending changes

Moving Data Protection to the Cloud: Trends, Challenges and Strategies

This was a panel discussion; we talked about various new ways to perform data protection using the cloud and the many advantages of using the cloud this way.

You can access all the CSI BrightTalk Webcasts on demand at the SNIA Website. Many of you will also be happy to learn that PDFs of the Webcast slides are also available there.

We had a good 2015, and I’m looking forward to producing more great educational material during 2016. If you have a topic you’d like to see the CSI cover this year, please comment below in this blog. We value input from all.

Thanks for your support and hopefully we’ll see you some time this year at one of our BrightTalk webcasts.

OpenStack File Services for HPC Q&A

We got some great questions during our Webcast on how OpenStack can consume and control file services appropriate for High Performance Computing (HPC) in a cloud and multi-tenanted environment. Here are answers to all of them. If you missed the Webcast, it’s now available on-demand. I encourage you to check it out and please feel free to leave any additional questions at this blog.

Q. Presumably we can use filesystems other than ZFS for the underlying filesystems in Lustre?

A. Yes, there are plenty of other filesystems that can be used other than ZFS. ZFS was given as an example of a modern, scale-up filesystem that has recently been integrated, but essentially you can use most filesystem types, with some having more advantages than others. What you are looking for is a filesystem that addresses the weaknesses of Lustre in terms of self-healing and scale-up. So any filesystem that allows you to easily grow capacity whilst also being capable of protecting itself would be a reasonable choice. Remember, Lustre doesn’t do anything to protect the data itself. It simply places objects in a distributed fashion across the Object Storage Targets.

Q. Are there any other HPC filesystems besides Lustre?

A. Yes, there are, and depending on your exact requirements, Lustre might not be appropriate. Gluster is an alternative that some have found slightly easier to manage and that provides some additional functionality. IBM has GPFS, which has been implemented as an HPC filesystem, and other vendors have their scale-out filesystems too. An HPC filesystem is simply a scale-out filesystem capable of very good throughput with low latency. So under that definition a flash array could be considered a high-performance storage platform, or a scale-out NAS appliance with some fast disks. It’s important to understand your workload’s characteristics and demands before making the choice, as each system has pros and cons.

Q. Does “embarrassingly parallel” require bandwidth or latency from the storage system?

A. Depending on the workload characteristics, it could require both. Bandwidth is usually the first demand, though, as data is shipped to the nodes for processing. Obviously, the lower the latency, the faster jobs can start and run, but latency is not critical, as there is only limited communication between nodes, which is what normally drives the low-latency demand.
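
The defining property is that tasks never talk to each other, which is why sustained bandwidth usually matters more than latency. A toy sketch of the pattern follows; the per-chunk work is a stand-in for fetching and processing one object independently:

```python
from concurrent.futures import ProcessPoolExecutor

def process_chunk(chunk_id):
    """Stand-in for fetching one object and crunching it. Tasks never
    communicate with each other, which is what makes the job
    embarrassingly parallel: throughput, not latency, sets the pace."""
    return sum(i * i for i in range(100_000))  # dummy compute

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(process_chunk, range(32)))
    print(len(results), "chunks processed independently")
```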

Q. Would you suggest using Object Storage for NFV, i.e., Telco applications?

A. I would for some applications. The problem with NFV is that it actually captures a surprising breadth of applications, some of which have very limited data storage needs. For example, there is little need for storage in a packet switching environment beyond the OS and binaries needed to stand up the VMs. In this case, object is a very good fit, as it can easily be geographically distributed, ensuring the same networking function is delivered in the same manner. Other applications that require access to filtered data (maybe billing-based applications or content distribution) would also be good candidates.

Q. I missed something in the middle; please clarify, your suggestion is to use ZFS (on Linux) for the local file system on OSTs?

A. Yes, this was one example, and one where some work has recently been done in the Lustre community. This affords the OSSs the capability of scaling capacity upwards as well as offering the RAID-like protection and self-healing that comes with ZFS. Other filesystems can offer some of those same things, so I am not suggesting it is the only choice.

Q. Why would someone want/need scale-up, when they can scale-out?

A. This can often come down to funding. A lot of HPC environments exist in academic institutions that rely on grant funding and sponsorship to expand their infrastructure. Sometimes it simply isn’t feasible to buy extra servers in order to add capacity, particularly if there is already performance headroom. It might also be the case that rack space, power and cooling are factors, in which case adding drives to cope with bigger workloads might be the only option. You do need to consider whether the additional capacity would also provoke the need for better performance, so we can’t just assume that adding disk is enough, but it’s certainly a good option and a requirement I have seen a number of times.