Expert Answers to Cloud Object Storage and Gateways Questions

In our most recent SNIA Cloud webcast, “Cloud Object Storage and the Use of Gateways,” we discussed market trends toward the adoption of object storage and the use of gateways to execute on a cloud strategy.  If you missed the live event, it’s now available on-demand together with the webcast slides. There were many good questions at the live event and our expert, Dan Albright, has graciously answered them in this blog.

Q. Can object storage be accessed by tools for use with big data?

A. Yes. Big data tools can access object storage in real time through HDFS-compatible connectors such as the S3 connector, but performance is conditional on latency: if the object store is built on local hard drives, it should not be used as primary storage because jobs would run very slowly. The guidance is to use hard-drive-based object storage either as an online archive or as a backup target for HDFS.
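As a rough illustration of this access pattern, here is a minimal PySpark sketch that reads archived data from an object store through the Hadoop S3A connector. The endpoint, bucket and path are hypothetical, the exact connector settings vary by Hadoop distribution, and the hadoop-aws libraries must be on the classpath.

    # Minimal sketch: analyze cold data that was tiered off HDFS into
    # an object store, read back over the S3A connector.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("object-store-archive-read")
        # S3A connector setting; exact keys vary by distribution.
        .config("spark.hadoop.fs.s3a.endpoint", "https://object-store.example.com")
        .getOrCreate()
    )

    # Read archived event data in place from the object store.
    df = spark.read.parquet("s3a://archive-bucket/events/2016/")
    print(df.count())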

Q. Will current block storage or NAS be replaced with cloud object storage + gateway?

A. Yes and no. It depends on the use case. For ILM (Information Lifecycle Management) uses, only the aged and infrequently accessed data is moved to the gateway plus cloud object storage, to take advantage of a lower-cost tier of storage, while the more recent and active data remains on the primary block or file storage. For file sync and share, the small-office/remote-office data is moved off the local NAS and consolidated, centralized, and managed on the gateway file system. In practice, these methods will vary based on the enterprise’s requirements.
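As one concrete illustration of the ILM pattern, here is a minimal sketch using boto3 to apply an S3 lifecycle rule that tiers aged objects to colder storage. The bucket name, prefix and day counts are hypothetical assumptions, and a gateway product would typically manage such a policy for you.

    import boto3

    s3 = boto3.client("s3")

    # Hypothetical policy: move data untouched for 90 days to a colder,
    # cheaper storage class, and expire it after roughly seven years.
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-ilm-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "tier-aged-data",
                    "Status": "Enabled",
                    "Filter": {"Prefix": "projects/"},
                    "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                    "Expiration": {"Days": 2555},
                }
            ]
        },
    )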

Q. Can we use cloud object storage for IoT storage that may require high IOPS?

A. High-IOPS workloads are best supported by local SSD-based object, block or NAS storage. Remote or hard-drive-based object storage is better suited to low-IOPS workloads.

Q. What about software defined storage?

A. Cloud object storage may be implemented as SDS (Software Defined Storage), but it may also be implemented with dedicated appliances. Most cloud object storage services are SDS-based.

Q. Can you please define NAS?

A. The SNIA Dictionary defines Network Attached Storage (NAS) as:

1. [Storage System] A term used to refer to storage devices that connect to a network and provide file access services to computer systems. These devices generally consist of an engine that implements the file services, and one or more devices, on which data is stored.

2. [Network] A class of systems that provide file services to host computers using file access protocols such as NFS or CIFS.

Q. What are the challenges with NAS gateways into object storage? Aren’t there latency issues that NAS requires that aren’t available in a typical Object store solution?

A. The key factor to consider is workload. If the applications accessing data residing on NAS generate a high frequency of reads and writes, then that data is not a good candidate for remote or hard-drive-based object storage. However, it is commonly known that up to 80% of data residing on NAS is infrequently accessed. It is this data that is best suited for migration to remote object storage.

Thanks for all the great questions. Please check out our library of SNIA Cloud webcasts to learn more. And follow us on Twitter @SNIACloud for announcements of future webcasts.

 

How Gateways Benefit Cloud Object Storage

The use of cloud object storage is ramping up sharply especially in the public cloud, where its simplicity can significantly reduce capital budgets and operating expenses. And while it makes good economic sense, enterprises are challenged with legacy applications that do not support standard protocols to move data to and from the cloud.

That’s why the SNIA Cloud Storage Initiative is hosting a live webcast on September 26th, “Cloud Object Storage and the Use of Gateways.”

Object storage is a secure, simple, scalable, and cost-effective means of managing the explosive growth of unstructured data enterprises generate every day. Enterprises have developed data strategies specific to the public cloud: improved data protection, long-term archive, application development, DevOps, data science, and cognitive artificial intelligence, to name a few.

However, these same organizations have legacy applications and infrastructure that are not object-storage friendly, but instead use file protocols like NFS and SMB. Gateways enable SMB and NFS data transfers to be converted to Amazon’s S3 protocol, while optimizing the data path to the cloud with deduplication and QoS (quality of service).
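At its core, the translation a gateway performs looks something like the following minimal sketch, which reads a file from a local NFS mount and writes it to an object store over the S3 API. The mount path, bucket and endpoint are hypothetical, and real gateways add caching, deduplication and QoS on top.

    import boto3

    # Hypothetical S3-compatible endpoint; a gateway presents NFS/SMB on
    # one side and speaks S3 on the other.
    s3 = boto3.client("s3", endpoint_url="https://object-store.example.com")

    # A file an application wrote over NFS to the gateway's share.
    local_path = "/mnt/nfs-share/reports/q3.pdf"

    # The gateway re-expresses the file as an object: bucket + key + bytes.
    with open(local_path, "rb") as f:
        s3.put_object(Bucket="example-bucket", Key="reports/q3.pdf", Body=f)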

This webcast will highlight the market trends toward the adoption of object storage and the use of gateways to execute a cloud strategy, the benefits of object storage when gateways are deployed, and the use cases that are best suited to leverage this solution.

You will learn:

  • The benefits of object storage when gateways are deployed
  • Primary use cases for using object storage and gateways in private, public or hybrid cloud
  • How gateways can help achieve the goals of your cloud strategy without retooling your on-premises infrastructure and applications

We plan to share some pearls of wisdom on the challenges organizations are facing with object storage in the cloud from a vendor-neutral, SNIA perspective. If you need a firm background on cloud object storage before September 26th, I encourage you to watch the SNIA Cloud on-demand webcast, “Cloud Object Storage 101.” It will provide you with a foundation to get even more out of this upcoming webcast.

I hope you will join us on September 26th. Register now to save your spot.

Cloud Object Storage – You’ve Got Questions, We’ve Got Answers

The SNIA Cloud Storage Initiative hosted a live Webcast “Cloud Object Storage 101.” Like any “101” type course, there were a lot of good questions. Here they all are – with our answers. If you have additional questions, please let us know by commenting on this blog.

Q. How do you envision the new role of tape (LTO) in this unstructured data growth?

A. Exactly the same way that tape has always played a part: it’s the storage medium that requires no power to store cold data, and it is cheap per bit. Although tape has a limited shelf life, and although we believe that flash will eventually replace it, its future remains secure and growing for the foreseeable term.

Q. What are your thoughts on whether object storage can exist outside the bounds of supporting file systems? Block devices directly storing objects using the key as reference and removing the intervening file system? A hierarchy of objects instead of files?

A. All of these things. Objects can be identified by an ID in a flat, non-hierarchical namespace; or we can impose a hierarchy by key-to-object-ID translation; or indeed, an object may contain complete file systems or be treated like a block device. There are really no restrictions on how we can build the metadata that describes all these things over the bytes of storage that make up an object.
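As a toy illustration of imposing hierarchy on a flat namespace, here is a minimal sketch (the names and layout are hypothetical) that maps path-like keys to object IDs and produces a “directory listing” purely by prefix matching:

    import uuid

    # Flat namespace: object ID -> bytes. No directories exist here.
    objects = {}

    # Hierarchy is imposed purely by a key -> object-ID translation table.
    key_to_id = {}

    def put(key, data):
        oid = uuid.uuid4().hex
        objects[oid] = data
        key_to_id[key] = oid

    def list_prefix(prefix):
        # A "directory listing" is just a prefix scan over the key table.
        return [k for k in key_to_id if k.startswith(prefix)]

    put("/projects/alpha/readme.txt", b"hello")
    put("/projects/alpha/data.bin", b"\x00\x01")
    put("/projects/beta/notes.txt", b"hi")

    print(list_prefix("/projects/alpha/"))  # the "alpha" pseudo-directory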

Q. Can you run write-intensive, low-latency apps on object storage, e.g. virtual machines?

A. Yes. Object storage can be made up of the same stuff as other high-performance storage systems; for instance, flash connected via high-bandwidth, low-latency networks. Object stores could even be built over PCIe and NVDIMM.

Q. Is erasure coding (EC) expensive in terms of networking and resources utilization (especially in case of rebuild)?

A. No, that’s one of the advantages of EC. Rebuilds take place by reading data from many disks and writing it to many disks; in traditional RAID rebuilds, all of the load is focused on the one disk that’s being rebuilt.

Q. Is there any overhead for small files or object use cases? Do you have a recommended size?

A. Each system will have its own advantages and disadvantages for objects of specific sizes. In general, object stores are designed to store billions of objects, so the number of objects is usually not an issue.

Q. Can you comment on Internet bandwidth limitations on geographically dispersed erasure coded data?

A. Smart caching can make a big difference, but at the end of the day, a geographically dispersed, erasure-coded object store won’t be faster than a local store; you can’t beat the speed of light. For example, a fragment stored 3,000 km away incurs roughly 30 ms of round-trip propagation delay in fiber, no matter how fast the storage nodes themselves are.

Q. The suppliers all claim easy exit strategies from their systems. If we were to use one of the on-premise solutions such as ECS or Cleversafe, and then down the road decide to move off-premise, is the migration/egress typically as easy as claimed?

A. In general, any proprietary interface might lock you in. SNIA’s CDMI is vendor neutral and supported by a number of vendors, and Amazon’s S3 is a popular and common interface. Ultimately, vendors want your data on their systems, and that means making it easy to get the data from a competing vendor’s system; lock-in is not what vendors want. Talk to your vendor and ask for other users’ experiences to get confirmation of their claims.
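One practical consequence of a common interface is that the same client code can point at different S3-compatible stores just by changing the endpoint. Here is a minimal sketch; the endpoints and bucket name are hypothetical.

    import boto3

    # The same S3 client code can target different vendors' stores by
    # swapping the endpoint, which is what makes migration tractable.
    for endpoint in (
        "https://on-prem-store.example.com",
        "https://other-vendor.example.net",
    ):
        s3 = boto3.client("s3", endpoint_url=endpoint)
        resp = s3.list_objects_v2(Bucket="migration-test")
        for obj in resp.get("Contents", []):
            print(endpoint, obj["Key"], obj["Size"])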

Q. Based on factual information, where are you seeing the most common use cases for Object Storage?

A. There are many, and each vendor of cloud storage has particular markets. Backup is a common case, as are systems in the healthcare space that treat data such as scans and X-rays as objects.

Q. NAS filers only scale up not out. They are hard to manage at scale. Why use them anymore?

A. There are many NAS systems that scale out as well as up. NFSv4 supports high degrees of scale-out, and there are file systems like Gluster that provide very large-scale solutions indeed, into the multi-petabyte range.

Q. Are there any specific uses cases to avoid when considering object storage?

A. Yes. Many legacy applications will not generate any savings or gains if moved to object storage.

Q. Would you agree with industry statements that 80% of all data written today will NEVER be accessed again; and that we just don’t know WHICH 20% will be read again?

A. Yes to the first part, and no to the second. Knowing which 80% is cold is the trick. The industry is developing smart ways of analyzing data to help ensure that cached data is hot data, and that cold data is placed correctly the first time around.
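As a toy illustration of that kind of analysis, here is a minimal sketch (the threshold and path are arbitrary assumptions) that classifies files as hot or cold by last-access time. Real systems use far richer signals, since many filesystems update access times lazily or not at all.

    import os
    import time

    COLD_AFTER_DAYS = 90  # arbitrary threshold for this sketch
    now = time.time()

    def classify(root):
        hot, cold = [], []
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                age_days = (now - os.stat(path).st_atime) / 86400
                (cold if age_days > COLD_AFTER_DAYS else hot).append(path)
        return hot, cold

    hot, cold = classify("/mnt/nas-share")  # hypothetical NAS mount
    print(len(cold), "cold candidates for object-storage tiering")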

Q. Is there also the possibility to bring “compliance” into object storage? (Thinking about banking, medical and other sensitive data that needs to be tracked, with retention rules, etc.)

A. Yes. Many object storage vendors provide software to do this.

 

Need a Primer on Cloud Object Storage?

There has been a lot of buzz around cloud object storage recently. But before you get deep into all that cloud object storage can do, it’s good to take a step back and make sure you understand the basics. That’s what the SNIA Cloud Storage Initiative is planning to do on July 14th at our live Webcast “Cloud Object Storage 101.”

Many organizations, like large service providers, have already begun to leverage software-defined object storage to support new application development and DevOps projects. Meanwhile, legacy enterprise companies are in the early stages of exploring the benefits of object storage for their particular business and are searching for how they can use cloud object storage to modernize their IT strategies, store and protect data, while dramatically reducing the costs associated with legacy storage sprawl.

This Webcast will highlight the market trends towards the adoption of object storage, the definition and benefits of object storage, and the use cases that are best suited to leverage an underlying object storage infrastructure.

Join us on July 14th to learn:

  • How to accelerate the transition from legacy storage to a cloud object architecture
  • The benefits of object storage
  • Primary use cases
  • How object storage can enable your private, public or hybrid cloud strategy without compromising security, privacy or data governance

I hope you’ll register today to join my colleague, Nancy Bennis, Director of Alliances at Cleversafe (an IBM company), and me for this tutorial on cloud object storage.


New Webcast: Hierarchical Erasure Coding: Making Erasure Coding Usable

On May 14th the SNIA-CSI (Cloud Storage Initiative) will be hosting a live Webcast “Hierarchical Erasure Coding: Making erasure coding usable.” This technical talk, presented by Vishnu Vardhan, Sr. Manager, Object Storage, at NetApp, and me, will cover two different approaches to erasure coding: a flat erasure code across JBOD, and a hierarchical code with an inner code and an outer code. This Webcast, part of the SNIA-CSI developer’s series, will compare the two approaches on different parameters that impact the IT business and provide guidance on evaluating object storage solutions (a toy sketch of the flat approach appears after the list below). You’ll learn:

  • Industry dynamics
  • Erasure coding vs. RAID – Which is better?
  • When is erasure coding a good fit?
  • Hierarchical Erasure Coding: the next generation
  • How hierarchical codes make growth easier
  • Key areas where hierarchical coding is better than flat erasure codes
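To give a very rough feel for what a flat erasure code does, here is a toy sketch using a single XOR parity fragment across data fragments. Real erasure codes such as Reed-Solomon tolerate multiple losses, and the fragment count here is an arbitrary assumption.

    from functools import reduce

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    # Split an object into k equal data fragments (toy: no padding handling).
    data = b"0123456789abcdef" * 2  # 32 bytes, divisible by k
    k = 4
    size = len(data) // k
    fragments = [data[i * size:(i + 1) * size] for i in range(k)]

    # One parity fragment: the XOR of all data fragments (tolerates one loss).
    parity = reduce(xor_bytes, fragments)

    # Simulate losing fragment 2, then rebuild it from the survivors.
    lost = 2
    survivors = [f for i, f in enumerate(fragments) if i != lost]
    rebuilt = reduce(xor_bytes, survivors + [parity])
    assert rebuilt == fragments[lost]
    print("rebuilt fragment", lost, "successfully")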

Register now and bring your questions. Vishnu and I will look forward to answering them.

Object Storage 201 Q&A

Now available on-demand, our recent live CSI Webcast, “Object Storage 201: Understanding Architectural Trade-Offs,” was a highly rated event that almost 250 people have seen to date. We did not have time to address all of the questions, so here are answers to them. If you think of additional questions, please feel free to comment on this blog.

Q. In terms of load balancers, would you recommend a software approach using HAProxy on Linux or a hardware approach with proprietary appliances like F5 and NetScaler?

A. This really depends on your use case. If you need HA load balancers, or load balancers that can maintain sessions to particular nodes for performance, then you probably need commercial versions. If you just need a basic load balancer, using a software approach is good enough.

Q. With billions of objects what Erasure Codes are more applicable in the long term? Reed Solomon where code words are very small resulting in many billions of code words or Fountain type codes such as LDPC where one can utilize long code words to manage billions of objects more efficiently?

A. Tracking erasure-coded fragments has a higher cost than replication, but the tradeoff is higher HDD utilization. Rateless (fountain) coding lowers this overhead because each fragment has equal value, whereas Reed-Solomon requires knowledge of fragment placement for repair.

Q. What is the impact of having HDDs of varying capacity within the object store?  Does that affect hashing algorithms in any way?

A. The smallest logical storage unit is a volume. Because scale-out object storage does not stripe volumes, there is no impact. Hashing, being used for location, does not understand volume size, so a separate database is used, on a per-volume basis, to track open space. Hashing algorithms can be modified to suit the underlying disks; the problem is not so much whether they can be designed a priori for the underlying system, but the rigidity they introduce by tying placement very tightly to topology. That makes failure and exception handling hard.

Q. Do you think RAID6 is sufficient protection with these types of Object Storage Systems or do we need higher parity based Erasure codes?

A. RAID6 makes sense for a direct-attached storage solution where all drives in the RAID set can maintain sync. Unlike filesystems (with a few exceptions), scale-out object storage systems are “storage as a workload” systems that already have protection as part of the system. So the question is what data protection method is used in solution x as opposed to solution y. You must also think about what you are trying to do: protect against a single disk failure, a node failure, or a site failure? For disk failures, RAID is great, but not if you are trying to survive a node or site failure. Site failure is an EC sweet spot, but hard to solve from a deployment perspective.

Q. Is it possible to brief how this hash function decides the correct data placement order among the available storage nodes?

A. Take a look at the following links: http://en.wikipedia.org/wiki/Consistent_hashing and https://swiftstack.com/openstack-swift/architecture/
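For a minimal feel of how consistent hashing places data, here is a toy sketch; the node names and the number of virtual nodes are arbitrary assumptions, and production systems such as OpenStack Swift build their rings far more carefully.

    import bisect
    import hashlib

    def h(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    # Build a ring: each node hashes to many points ("virtual nodes") so
    # load spreads evenly and few keys move when nodes come and go.
    nodes = ["node-a", "node-b", "node-c"]
    ring = sorted((h("%s#%d" % (n, v)), n) for n in nodes for v in range(100))
    points = [p for p, _ in ring]

    def locate(key):
        # An object lands on the first ring point at or after its hash.
        i = bisect.bisect(points, h(key)) % len(ring)
        return ring[i][1]

    for key in ("photos/cat.jpg", "logs/2016-09-26.gz", "backup/db.tar"):
        print(key, "->", locate(key))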

Q. What do you consider to be a typical ratio of controller to storage nodes? Is it better to separate the two, or does it make sense to consolidate where a node is both controller and storage?

A. The flexibility of scale-out object storage makes these two components independently scalable. The systems we test all have separate controllers and storage nodes so we can test this independence. This is also very dependent on the object store technology you use. We know of some object stores that require 1 GB of RAM per TB of data, while others use a tenth of that. The compute requirement depends on whether you are using erasure coding, and which codes. There is no one answer.

Q. Is the data stored in the Storage depository interchangeable with other vendor’s controller units? For instance, can we load LTO tapes from vendor A’s library to Vendor B’s library and have full access to data?

A. The data stored in these systems is covered by the “storage as a workload” principle: the system metadata used to track objects is stored as a function within the controller. I would not expect any content stored to be interchangeable with another system architecture.

Q. Would you consider the Seagate Kinetic Open Storage Platform a radical architectural shift in how object storage can be done?  Kinetic basically eliminates the storage server, POSIX and RAID or all of the “busy work” that storage servers are involved in today.

A. Ethernet drives with a key-value interface provide a new approach to designing object storage solutions. It remains to be seen how compelling they are for TCO and infrastructure availability.

Q. Will the inherent reduction in blast radius by the move towards Ethernet-interface HDDs be a major driver of the Ethernet HDD in object stores?

A. Yes. We define blast radius as the set of hard drives whose access is impacted by a single compute failure. As we lower the number of hard drives connected to each compute node, the blast radius is reduced. For Ethernet drives, you may need redundant Ethernet switches to minimize the blast radius. Blast radius can also be minimized with intelligent data placement in software.

New Webcast: Object Storage – Understanding Architectural Trade-Offs

The Cloud Storage Initiative (CSI) is excited to announce a live Webcast as part of the upcoming BrightTalk Cloud Storage Summit on October 16th, “Object Storage 201: Understanding Architectural Trade-Offs.” It’s a follow-up to the SNIA Ethernet Storage Forum’s “Object Storage 101: Understanding the What, How and Why behind Object Storage Technologies.”

Object-based storage systems are fast becoming one of the key building blocks for a cloud storage infrastructure. They address some of the shortcomings and provide an alternative to more traditional file- and block-based storage for unstructured data.

An object storage system must accommodate growth (and yes, the rumors are true: data growth is a huge and accelerating problem), be flexible in its provisioning, support multiple geographies and legal frameworks, and cope with the inevitable issues of resilience, performance and availability.

Register now for this Webcast. Experts from the SNIA Cloud Storage Initiative will discuss:

  • Object Storage Architectural Considerations
  • Replication and Erasure Coding for resilience
  • Pros and Cons of Hash Tables and Key-Value Databases
  • And more…

This is a live presentation, so please bring your questions and we’ll do our very best to answer them. We hope you’ll join us on October 16th for an unbiased, deep dive into the design considerations for object storage systems.