Earlier this month, the SNIA Cloud Storage Technologies Initiative hosted a fascinating panel discussion “Kubernetes is Everywhere: What About Cloud Native Storage?” where storage experts from SNIA and Kubernetes experts from the Cloud Native Computing Foundation (CNCF) discussed storage implications for Kubernetes. It was a lively and enlightening discussion on key considerations for container storage. In this Q&A blog, our panelists Nick Connolly, Michael St-Jean, Pete Brey and I elaborate on some of the most intriguing questions during the session.
Q. What are the additional/different challenges for Kubernetes storage at the edge – in contrast to the data center?
A. Edge means different things depending on context. It could mean enterprise or provider edge locations, which are typically characterized by smaller, compact deployments of Kubernetes. It could mean Kubernetes deployed on a single node at a site with little or no IT support, or even disconnected from the internet: on ships, oil rigs, or even in space. It can also mean the device edge, like MicroShift running on a small form-factor computer or on an ARM or FPGA card.
One big challenge for Kubernetes at the edge in general is to provide a lightweight deployment. Added components, like container-native storage, are required for many edge applications, but they take up resources. Therefore, the biggest challenge is to deploy the storage resources that are necessary for the workload while keeping your footprint appropriate for the deployment infrastructure.
For example, there are container storage deployments for compact edge clusters, and work is underway on single-node deployments. Another emerging approach is to use data mirroring, data caching, and data federation technologies to provide access between edge devices and enterprise edge deployments, or deployments in the cloud or datacenter.
Q. What does Container Native Storage mean – how does that differ from a SAN?
A. Container-native storage includes Kubernetes services that allow for dynamic and static provisioning, Day 1 and Day 2 operations and management, and additional data services like security, governance, resiliency, and data discovery that must be deployed in the context of the Kubernetes cluster. A SAN could be connected to a cluster via a Container Storage Interface (CSI) driver; however, it would typically not have all the capabilities provided by a container-native storage solution. Some container-native storage solutions, however, can use an underlying SAN or NAS device to provide the core storage infrastructure while delivering the Kubernetes-aware services required by the cluster. In this way, organizations can make use of existing infrastructure, protecting their investment, while still getting the Kubernetes services required by applications and workloads running in the cluster.
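To make that concrete, a CSI-backed array typically surfaces in Kubernetes as a StorageClass. Here is a minimal sketch; the provisioner name and parameters are hypothetical placeholders for whatever your vendor’s CSI driver actually documents.

```yaml
# Hypothetical StorageClass exposing an existing SAN through a CSI driver.
# The provisioner name and parameters are placeholders; consult your
# vendor's CSI documentation for the real values.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: san-block
provisioner: csi.example-san.com        # vendor CSI driver (hypothetical)
parameters:
  fsType: ext4
  pool: gold                            # array-side storage pool (hypothetical)
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```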
Q. You mention that Kubernetes does a good job of monitoring applications and keeping them up and running, but how does it prevent split-brain action on the storage when that happens?
A. This is a function provided by the container-native storage provider. The storage service will include some type of arbiter for the data in order to prevent split-brain. For example, a monitor within the software-defined storage subsystem may maintain a cluster map and the state of the environment in order to provide distributed decision-making. Monitors would typically be configured in an odd number, 3 or 5 depending on the size and topology of the cluster, to prevent split-brain situations. Monitors are not in the data path and do not serve I/O requests to and from the clients.
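As one illustration, Rook’s Ceph operator exposes the monitor quorum size directly in its CephCluster resource. The excerpt below assumes a Rook-Ceph deployment and shows only the monitor-related fields; a real resource needs more configuration.

```yaml
# Illustrative excerpt of a Rook CephCluster resource, assuming Rook-Ceph
# is the chosen container-native storage provider. Three monitors form a
# quorum so the cluster can make distributed decisions and avoid split-brain.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  mon:
    count: 3                     # odd number (3 or 5) to maintain quorum
    allowMultiplePerNode: false  # spread monitors across failure domains
```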
Q. So do I need to go and buy a whole new infrastructure for this or can I use my existing SAN?
A. Some container-native storage solutions can use existing storage infrastructure, so you can typically protect your investment in existing capital purchases while gaining the benefits of the Kubernetes data services required by the cluster and its applications.
Q. How can I keep my data secure in a multi-tenanted environment?
A. There are data security concerns that are answered by the container-native storage solution; however, integration of these services should be considered alongside the other security tools delivered for Kubernetes environments. For example, you should consider the container-native storage solution’s ability to encrypt data at rest as well as data in motion. Cluster-wide encryption should be a default requirement; however, you may also want to encrypt data from one tenant (application) to another. This requires volume-level encryption, and you will want to make sure your provider has an algorithm that creates different keys for clones and snapshots.
You should also consider where your encryption keys are stored. Using a storage solution that integrates with an external key management system protects against hacks from within the cluster. For additional data security, it is useful to review the solution architecture, what the underlying operating system kernel protects, and how its cryptography API is utilized by the storage software. Full integration with your Kubernetes distribution’s authentication process is also important.
In recent years, ransomware attacks have become prevalent. While some systems attempt to protect against ransomware directly, the best advice is to make sure you have proper encryption on your data and a substantial data protection and disaster recovery strategy in place. Data protection in a Kubernetes environment is slightly more complex than in a typical datacenter because the state of an application running in Kubernetes is held by its persistent storage claim. When you back up your data, your data protection solution must have cluster-aware APIs that can capture the cluster context and the application with which the data is associated. Some of those APIs may be available as part of your container-native storage deployment and integrated with your existing datacenter backup and recovery solution. Additional business continuity strategies, like metropolitan and regional disaster recovery clusters, can also be attained. Integration with multi-cluster control plane solutions that work with your chosen Kubernetes distribution can help facilitate a broad business continuity strategy.
Q: What’s the difference between data access modes and data protocols?
A: You create a persistent volume (or PV) based on the type of storage you have. That storage will typically support one or more data protocols. For example, you might have storage set up as a NAS supporting NFS and SMB protocols, giving you file protocols; you might have a SAN set up to support your databases, which use a block protocol; or you might have a distributed storage system with a data lake or archive that runs object protocols. It could even run all three protocols in separate storage pools.
In Kubernetes, you’ll have access to these PVs, and when a user needs storage, they will create a Persistent Volume Claim (or PVC) for their project. Alternatively, some systems support an Object Bucket Claim as well. In any case, when you make that claim request, you do so based on storage classes with different access modes: RWO (ReadWriteOnce, where the volume can be mounted as read-write by a single node), RWX (ReadWriteMany, where the volume can be mounted as read-write by many nodes), and ROX (ReadOnlyMany, where the volume can be mounted as read-only by many nodes).
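As a minimal sketch, a PVC request might look like the following; the storage class name is a hypothetical placeholder (here it reuses the one from the earlier StorageClass sketch).

```yaml
# Minimal PersistentVolumeClaim sketch. The storageClassName is a
# hypothetical placeholder for whatever classes your cluster exposes.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce        # RWO: mountable read-write by a single node
  resources:
    requests:
      storage: 10Gi
  storageClassName: san-block
```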
Different types of storage APIs are able to support those different access modes. For example, a block protocol, like EBS or Cinder, would support RWO. A filesystem like Azure Files or Manila would support RWX. NFS would support all three access modes.
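For instance, an NFS-backed PV can advertise all three access modes; the server address and export path below are hypothetical.

```yaml
# Hypothetical NFS-backed PersistentVolume advertising all three access
# modes; the server and path are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
    - ReadWriteMany
    - ReadOnlyMany
  nfs:
    server: nfs.example.com
    path: /exports/shared
```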
Q. What are object bucket claims and namespace buckets?
A. Object bucket claims are analogous to the PVCs mentioned above, except that they are the method for provisioning and accessing object storage within Kubernetes projects using a storage class. Because the interface for object storage is different from that for block or file storage, there is a separate Kubernetes standard called COSI (the Container Object Storage Interface). Typically, a user wanting to mount an object storage pool would connect through an S3 RESTful protocol. Namespace buckets are used more for data federation across environments. So you could have a namespace bucket deployed with the backend data on AWS, for example, and it can be accessed and read by clients running in Kubernetes clusters elsewhere, like on Azure, in the datacenter, or at the edge.
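As a hedged sketch, an ObjectBucketClaim as implemented by the Rook/NooBaa operators looks like the following; the storage class name is a hypothetical placeholder, and note that COSI defines its own, newer resource types for the same purpose.

```yaml
# Sketch of an ObjectBucketClaim as implemented by the Rook/NooBaa
# operators (COSI defines newer, similar resources). The storageClassName
# is a hypothetical placeholder.
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: analytics-bucket
spec:
  generateBucketName: analytics    # prefix for the generated bucket name
  storageClassName: object-store
```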
Q. Why is backup and recovery listed as a feature of container-native storage? Can’t I just use my datacenter data protection solution?
A. As we mentioned, containers are by nature ephemeral. So if you lose your application, or the cluster, the state of that application is lost. The state of your application in Kubernetes is held by the persistent storage associated with that app. So, when you back up your data, it needs to be in the context of the application and the overall cluster resources, so that when you restore, there are APIs to recover the state of the pod. Some enterprise data protection solutions include cluster-aware APIs and can be used to extend your datacenter data protection to your Kubernetes environment, notably IBM Spectrum Protect Plus, Dell PowerProtect, and Veritas. There are also Kubernetes-specific data protection solutions like Kasten by Veeam, Trilio, and Bacula. You may be able to use your existing enterprise solution; just be sure to check whether it supports cluster-aware Kubernetes APIs.
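Within the cluster itself, one building block many of these tools rely on is the standard Kubernetes VolumeSnapshot API. A minimal sketch follows; the snapshot class name is a hypothetical placeholder tied to your CSI driver.

```yaml
# Minimal VolumeSnapshot sketch using the standard Kubernetes snapshot
# API that many backup tools build on. The snapshot class name is a
# hypothetical placeholder.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: app-data-snap
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: app-data   # the PVC from the earlier example
```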
Q. Likewise, what is different about planning disaster recovery for Kubernetes?
A. Similar to the backup/recovery discussion, since the state of the applications is held by the persistent storage layer, failover and recovery need to include cluster-aware APIs. Beyond that, if you are trying to recover to another cluster, you’ll need a control plane that manages resources across clusters. Disaster recovery really becomes a question about your recovery point objectives and your recovery time objectives. It could be as simple as backing up everything to tape every night and shipping those tapes to another region. Of course, your recovery point might then be a full day, and your recovery time will vary depending on whether you have a live cluster to recover to.
You could also have a stretch cluster, a cluster whose individual nodes are physically separated across failure domains. Typically, you need to be hyper-conscious of your network capabilities, because if you stretch your cluster across a campus or city, for example, you could degrade performance considerably without the proper network bandwidth and latency.
Other options, such as synchronous metro DR or asynchronous regional DR, can be adopted, but your ability to recover, and your recovery time objective, will depend a great deal on the degree of automation you can build into the recovery. Just be aware, and do your homework, as to what control plane tools are available, how they integrate with the storage system you’ve chosen, and whether they align with your recovery time objectives.
Q. What’s the difference between cluster-level encryption and volume-level encryption in this context?
A. For security, you’ll want to make sure that your storage solution supports encryption. Cluster-wide encryption is at the device level and protects against external breaches. As an advanced feature, some solutions provide volume-level encryption as well, which protects individual applications or tenants from others within the cluster. Encryption keys can be created and stored within the cluster, but then anyone with cluster access could compromise those keys, so support for integration with an external key management system is preferable to enhance security.
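As one hedged illustration, the Ceph CSI driver (as deployed by Rook) enables per-volume encryption through StorageClass parameters and can point at an external KMS. The parameter names below follow ceph-csi conventions; the KMS configuration ID is a hypothetical placeholder, and other driver-required parameters are omitted.

```yaml
# Illustrative StorageClass enabling per-volume encryption, assuming the
# Ceph CSI driver as deployed by Rook. The encryptionKMSID references an
# external KMS configuration and is a hypothetical placeholder.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: encrypted-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  encrypted: "true"              # each volume gets its own encryption key
  encryptionKMSID: vault-kms     # external KMS config (hypothetical)
  # ...other driver-required parameters (pool, clusterID, secrets) omitted
reclaimPolicy: Delete
```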
Q. What about some of these governance requirements like SEC, FINRA, GDPR? How does container-native storage help?
A. This is really a question about the security factors of your storage system. GDPR has a lot of governance requirements, and ensuring that you have proper security and encryption in place in case data is lost is a key priority. FINRA is a US regulator of brokerage firms that works with the Securities and Exchange Commission; things like data immutability may be an important feature for financial organizations. Other bodies, like the US government, have encryption requirements such as FIPS, which validates cryptographic modules like the cryptography APIs within an operating system kernel. Storage solutions that make use of those crypto APIs would be better suited for particular use cases. So, it’s not really a question of your storage being certified by any of these regulatory bodies, but rather of ensuring that your persistent storage layer, as integrated with Kubernetes, does not break the compliance of the overall solution.
Q. How is data federation used in Kubernetes?
A. Since Kubernetes offers an orchestration and management platform that can be delivered across many different infrastructures, whether on-premises, in a public or private cloud, and so on, being able to access and read data from a single source across Kubernetes clusters running on differing infrastructures provides a huge advantage for multi- and hybrid-cloud deployments. There are also tools that allow you to federate SQL queries across different storage platforms, whether they are in Kubernetes or not. Extending your reach to data that lives off-cluster helps build data insights through analytics engines and provides data discovery for machine learning model management.
Q. What tools differentiate data acquisition and preparation in Kubernetes?
A. Ingesting data from edge devices or IoT into Kubernetes allows data engineers to create automated data pipelines. Tools within Kubernetes, like Knative, let engineers create triggered events that spawn applications within the system, further automating workflows; see the sketch below. Additional tools, like bucket notifications and Kafka streams, can help with the movement, manipulation, and enhancement of data within the workstream. Many organizations are using distributed application workflows like these to build differentiated use cases on Kubernetes.
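As a hedged sketch of that event-driven pattern, a Knative Eventing Trigger can route, say, bucket-notification events to a processing service. It assumes Knative is installed; the broker name, event type, and subscriber service below are all hypothetical.

```yaml
# Sketch of a Knative Eventing Trigger, assuming Knative is installed.
# The broker name, event type, and subscriber service are hypothetical.
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: on-new-object
spec:
  broker: default
  filter:
    attributes:
      type: com.example.bucket.object-created   # hypothetical event type
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: transform-pipeline                  # hypothetical processing service
```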