Docker – SNIA on Cloud Storage

Get Ready for Part 2 of Kubernetes in the Cloud

June 7, 2019March 29, 2020 Michelle Tidwell Leave a comment

Michelle Tidwell, SNIA Board of Directors

As enterprises move to a hybrid multi-cloud world, they are faced with many challenges. In addition to decisions surrounding what technologies to use, they are also seeing a transformation in traditional IT roles. Storage admins are asked to be more cloud savvy while new roles of cloud admins are emerging to handle the complexities of deploying simple and efficient clouds. Meanwhile, both these roles are asked to ensure a self-service environment is architected so that application developers can get resources needed to develop cutting edge apps not in week, days or hours but in minutes.

Understanding Kubernetes in the Cloud

March 25, 2019March 29, 2020 Mike Jochimsen Leave a comment

Ever wonder why and where you would want to use Kubernetes? You’re not alone, that’s why the SNIA Cloud Storage Technologies Initiative is hosting a live webcast on May 2, 2019 “Kubernetes in the Cloud.”

Kubernetes (k8s) is an open-source system for automating the deployment, scaling, and management of containerized applications. Kubernetes promises simplified management of cloud workloads at scale, whether on-premises, hybrid, or in a public cloud infrastructure, allowing effortless movement of workloads from cloud to cloud. By some reckonings, it is being deployed at a rate several times faster than virtualization.

Containers, Docker and Storage – An Expert Q&A

December 19, 2016December 19, 2016 Alex McDonald Leave a comment

Containers continue to be a hot topic today as is evidenced by the more than 2,000 people who have already viewed our SNIA Cloud webcasts, “Intro to Containers, Container Storage and Docker“ and “Containers: Best Practices and Data Management Services.” In this blog, our experts, Keith Hudgins of Docker and Andrew Sullivan of NetApp, address questions from our most recent live event.

Q. What is the major challenge for storage in containerized environment?

A. Containers move fast. Users can spin up and spin down containers extremely quickly. The biggest challenge in production-bound container environments is simply keeping up with the movement of data.

Docker Engine does not delete base container images when the container is shut down. Likewise, Registry assumes you’ve got unlimited storage on hand. For containers that push frequent revisions (as would be the case in a continuous delivery environment), that leads to a lot of orphaned container images that can fill up all available storage if left unchecked.

There are some community-led scripts that will help to keep things in control. That’s the beauty of community-led technology.

Q. What about the speed of retrieving the data from storage?

A. That’s where being a solid storage architect comes in. Every storage system has different strengths and weaknesses, so it’s important to engineer your solution to fit your performance goals. Docker containers are running on the main kernel of the host system. IO is not constrained by abstraction, as in the case of virtual machines. Rather, it is constrained more by density – hundreds of containers on a host can push massive IOPS, so you want your pipes fat and data sources close to the host systems.

Q. Can you expand on moving Docker Volumes from On-Premise bare metal to Cloud Service Providers? Data Migration? Encryption?

A. None of these capabilities are built-in to Docker Engine. We rely on external storage systems to provide those features. Private-to-cloud replication is primarily a feature of software-based companies, like Portworx, Blockbridge, or Hedvig. Encryption and migration are both common features across other companies as well. Flocker from ClusterHQ is a service broker system that provides many bolt-on features for storage systems they support. You can also use community-supplied services like Ceph to get you there.

Q. Are you familiar with “Flocker” that apparently is able to copy persistent data to another container? Can share your thoughts?

A. Yes. ClusterHQ (makers of Flocker) provide an API broker that sits between storage engines and Docker (and other dynamic infrastructure providers, like OpenStack), and they also provide some bolt-on features like replication and encryption.

Q. Is there any sort of feature in the volume plugins that allows a persistent volume to re-connect to a container if the container is moved across multiple hosts?

A. There’s no feature in plugins to cover that specifically. The plugin API is very simple. In practice, what you would do is write your plugin to expose volumes to Docker Engine on every host that it’s possible to mount that volume. In your container specification, whether it’s a Compose file, DAB file, or what have you, specify the name of your volume. Wherever that unique name is encountered, it will be mounted and attached to the container when it’s re-launched.

If you have more questions on containers, Docker and storage, check out our first Q&A blog: Containers: No Shortage of Interest or Questions.

I also encourage you to join our Containers opt-in email list. It will be a good way to keep up with all the SNIA Cloud is doing on this important technology.

Containers: No Shortage of Interest or Questions

October 10, 2016October 9, 2016 Alex McDonald Leave a comment

Based on record-breaking registration and attendance at our recent SNIA Cloud webcast, Intro to Containers, Container Storage and Docker, It’s clear that containers is a hot topic that folks want to learn more about – especially from a vendor-neutral authority like SNIA. If you missed the live event, it’s now available on-demand together with the webcast slides.

We were bombarded with questions at the live webcast and we ran out of time before we could answer them all, so as promised, here are answers from our expert presenters, Chad Thibodeau and Keith Hudgins. Oh, and please don’t forget to register for part two of this webcast, Containers: Best Practices and Data Management Services, on December 7, 2016.

Q: Would you please highlight key challenges a company may face to move from a hypervisor to container environment?

CT: The main challenge that gets raised in moving from virtual machines to containers is around security as when deployed on bare-metal, all of the containers share the core operating system. However, there are arguments that containers can still be effectively isolated.

KH: Primarily paring down your applications to their minimum running requirements. This can be quite difficult with long-entrenched legacy applications!

Q: With a VM you allocate a finite amount of vCPU and RAM, with a high degree of confidence that those resources will be available to whatever workload is running in the VM. Is that also true of containers – does (or can) the workload get a guaranteed allocation of CPU and memory resources?

CT: Keith, I’ll let you address this one from the application microservice; from the storage side, an SLA or Quality-of-Service can be defined for a container volume if the storage provider offers this capability.

KH: By default, you don’t allocate CPU or ram availability. Most containers are small enough that it’s not a consideration. However, if you need to specify priority, we have a method to do that. Please review the docs here.

Q: Where are microservices most useful? Are there certain environments where they are more likely to be deployed & which verticals or type of solutions/apps will see more benefit?

CT: Microservices can apply to applications within most verticals; for financials it was mentioned that Goldman Sachs is planning on containerizing 90% of their existing applications to web-service such as Netflix. Some of the determining factors are whether the application(s) would benefit from what container technology provides such as rapid deployment, lightweight, portability, and the ability to scale beyond typical monolithic applications.

KH: Microservices are most useful with network-facing applications that don’t require heavy transactional control. Note that it *is* possible to build transactional microservices, but the best practices on that route hasn’t been optimized yet.

Q: What OS version / Hypervisor, support containerization, are working towards cutting the “noisy neighbor” issue?

CT: Containers are supported by both MS Windows and Linux operating systems. The specific version of Linux OS will be more dependent upon the level of capabilities included (Keith, more your area) and MS Windows Server 2016 is the first release of Windows with container (Docker) support.

KH: Docker supports running containers under Windows and Linux kernels. We don’t care whether it’s on metal or virtualized. It’s possible to set affinity groups in a production Docker installation to help manage noisy neighbor issues, but note that fundamentally Docker is NOT a multi-tenant system.

Q: What is “stateful database”? How does it differ from regular databases?

CT: Most databases are stateful such as Oracle, MySQL, Cassandra, MongoDB or Redis. The confusion may be around the Gartner quote which stated “Stateful Database Applications” in which they simply meant that databases are examples of stateful applications.

KH: Any database is by definition stateful. A “stateless” container is one that is running a process that doesn’t store persistent data to disk. This could be a caching system, web application server, load balancer, queue runner, or pretty much any component that doesn’t need to store data. Everything else is “stateful” and needs some way to shove that data into a reliable datastore.

Q: What factors should be considered when choosing between containers and virtual servers for a given project/use case?

CT: The driving factors for container deployments are: portability, minimal footprint (low overhead since no hypervisor or guest OS), rapid provisioning and de-commissioning, scalability and largely open-source based. If any (or all) of these are deemed valuable to you, then you should consider container deployment technology.

KH: That’s a very broad question! It helps to understand that a container is simply a wrapper around one process that is running on a container host. So it’d be one database service, or one web app server, for example. If you can break up your application into a bunch of these single processes and chain those processes together via networking (DB serves data through the network layer to your cache, which supplies the web app, which is behind the load balancer, etc) then it’s a great candidate for containerization.

Q: Sounds like a lightweight hypervisor?

CT: Containers have been compared to virtual machines as a “lightweight VM”; however, there are distinctions mostly around the fact that the hardware resources are not virtualized for containers, but rather the application is abstracted.

KH: That’s not a bad way to start thinking about it. However, you don’t have a second kernel underneath the hypervisor, so there’s no hardware abstraction. Also, in general you don’t want to run a full OS stack per container, just what you need for the application. That way your containers are lean and efficient.

Q: So is there a practical limit to the number of users you need to have for an app in order for containers/microservices to be preferable vs. traditional apps?

CT: Not necessarily. It is more about what you are trying to achieve with the application and the requirements you have around things like: platform agnostic, portability, ease-of-deployment, scalability, etc. But I wouldn’t put a hard number on when containers make more sense over virtual machines or bare-metal deployments for that matter.

KH: Nope! Microservices is far more about making it easier to build and maintain your applications than it is about scaling. Like anything, you can over-abstract your application design and go extra silly with it, but it’s fundamentally about a better way of managing your applications’ lifecycle than it is about how many users you can push through the pipe.

Q: So is graph and memory the same thing?

CT: Keith, I’ll let you address this one.

KH: Nope. Graph refers to our copy-on-write storage for images at runtime. Our docs can explain it way better than I can in a Q&A session. Look here for more info.

Q: Similar to the Docker Container Networking, are there any specific efforts going on around Docker Storage? For example, are you (or will you be) building any products to support features that you mentioned (such as ‘Storage vMotion’ like capabilities)?

CT: Keith, I’ll let you address this one. However, there are initiatives and activities on the storage side around providing vMotion like capabilities for the data and application state.

KH: It’s always a possibility. There’s nothing I can say right now, but stay tuned.

Q: Let me shift gear, here, where does containerization work with NFV, and how should one correlate to the ask of Telco provider(s)?

CT: Keith, I’ll let you address this one–should be right up your alley.

KH: While this webinar is fundamentally about storage technologies, Docker does have a very broad ecosystem of network partners. NFV is a very broad topic and can’t easily be covered in one quick bite, but there are definitely efforts using Docker as both an enablement technology for NFV, as well as integrating Docker’s built-in networking capabilities in an NFV scope for application delivery.

Q: It would be helpful to circle back at the end and summarize what is Open Source and what is a commercial product, I’m trying to grasp what you miss out on by staying just Open Source. I know that excludes the Universal Control Plane but I don’t yet see what UCP delivers, what its USP is.

CT: Keith, I’ll let you address this one–should be right up your alley.

KH: UCP is the only unique commercial component of Docker. It combines a web-based GUI with role-based access control (RBAC) to make it easier to control security and access to Docker components in a production environment. We do maintain a separate codebase for our commercially supported Engine and Registry, but that’s mainly done to maintain a more stable release, with critical patches backported from the upstream open source projects. Fundamentally, CS Engine and DTR are the same product as their open source upstreams, only on a slower, more stable release cycle. Click here for an overview, and links to some more detailed information on what’s involved in our commercial products:

Q: Is there demand for concurrent access, across container hosts, to persistent data? If so, what are those use case scenarios?

CT: Yes, actually if you think about a micro-service architecture, you will most likely have many containers accessing a common set of container data volumes simultaneously. This is exactly the reason for persistent storage–if the containers running the application services get migrated or moved to other physical nodes, they need to maintain access to their respective container data volumes in many cases.

KH: What a great question! That demand is small, but there. In most cases, persistent data is maintained concurrently through clustering processes (database replication, object storage, etc) but there are some edge cases for large file processing (rendering, big data needs) where there are some asks for that capability.

Q: is it possible to run windows applications on Linux container and vice versa?

CT: To my knowledge, you should be able to run Linux applications within the recently announced Windows Server 2016 supported containers (see article here), but you can’t run Windows applications within Linux containers.

KH: No. A container is essentially a process running in a named concurrency group under the kernel. Therefore, you need a Linux kernel to run Linux processes, and likewise for Windows. It will be possible to run Windows and Linux containers under the same management umbrella very soon. We’re waiting on some network features to roll out in Windows Server 2016 SP1 for that capability.

Q: is it a good idea to run legacy apps in a container? Exactly what is the relation between microservices and containers? Container-like technology used to be popular in various UNIX OSes. What is different now? is the best choice a microservice in a container & spin multiple instantiations fast?

CT: Keith, I’ll let you address this one–should be right up your alley.

KH: Container technology is still popular in several UNIX OSes. Under the hood, a Docker Linux container isn’t very different from a Solaris Zone. The difference is primarily the lifecycle tools to build and maintain your containers from both the developer and operations sides. The newer generation of container runtimes is simply much easier to use than older methods. From a Docker perspective: Docker Hub, the ease of use of the ‘Docker’ CLI command tools, and clustering capabilities in Engine are the main differences. As always, design your architecture to fit your team, user, and application needs. However, if you do want to use a microservices approach, maintaining each part of your application stack as a suite of microservices does make running them widely parallel a strong approach.

Q: Is a micro service self-contained with respect to data requirements. Can a service that depends on an external datasource be a micro service?

CT: A micro service is by definition self-contained; however, it also would typically connect to one or more container data volumes. Regarding accessing external data sources, not exactly sure what is meant here, but the micro services can be running on one physical server with the container data volumes being created and managed on a separate DAS/SAN/NAS storage platform.

KH: Yes, absolutely. An API broker for an external, legacy datastore is a good example of a microservice.

Q: How are container images qualified so that they can be trusted for automated pulls?

CT: Keith, I’ll let you address this one–should be right up your alley. I believe that there is NOT a vetting or certification process done by Docker when posting to either the public Hub or to a trusted registry. This would be the responsibility of the container image developer.

KH: In a few ways. First, containers are rarely built from scratch. They are normally built from base images released by trusted providers like Microsoft, Ubuntu, Red Hat, etc. First you should prove trust in that base image through similar methods as you would a VM image. In a Docker Datacenter install, a user with Admin rights can then bring those base images into Docker Trusted Registry (DTR) and then also do a review of internal images built on top of that base before blessing them to go into production. There are also 3rd party security scanning technologies you can use, should that be a concern.

Q: For stateless applications, can Docker help apply updates to the application without taking a downtime? For example, a container is running version n of an application and version n+1 needs to be deployed without causing a downtime to users, could one spin a new container with version n+1 of the application and deploy it?

CT: If the application is truly stateless, then it shouldn’t matter if they are torn down and restarted on another physical server/node to allow the application of a new patch or OS update on the original node. However, this would need to be correctly architected.

KH: Yes. Using Docker Engine in Swarm mode, we provide a command ‘Docker service update’ to do exactly that. Check the docs.

Q: Flocker vs. Convoy vs. others – could you talk about these interfaces and their adoption?

CT: ClusterHQ (Flocker) has developed a generic storage volume plugin that they then provided back to the Docker community to incorporate the Docker engine. I’m not very familiar with Convoy, but it appears to be a Rancher-developed storage plugin that they have made available as open source, but it is not part of the Docker release.

KH: Flocker and Convoy are brokerage-type volume drivers that have the capability to connect with several storage backends. Each has its own API to talk to and manage volumes under the hood. It’s also possible to integrate directly with Docker’s volume API. If you’re mainly interested in integrating purely with Docker Engine, a direct Docker Volume API plugin is the best approach. However, both Flocker and Convoy provide some ease-of-use features and capabilities that might make it attractive to go their routes. Volume API docs are here.

Q: Does the link in communication between different containers that run microservices incur the very load we are trying to escape from monolithic approach?

CT: Keith, I’ll let you address this one–should be right up your alley.

KH: That’s a very philosophical question! I’d argue that using modern API methods like REST over HTTP is so lightweight that the distributed approach makes more sense.

Q: Docker Swarm?

CT: Keith, I’ll let you address this one.

KH: Swarm is our clustering technology to chain together several Docker Engine hosts into one big cluster. Prior to Engine 1.12, it was a standalone product. After 1.12, we added SwarmKit into Engine to make building and maintaining swarms much easier. For more info, check out old Swarm docs and new Swarm docs.

Q: Do the applications need to be re-written/revised to take advantage of Container approach?

CT: Most legacy or monolithic applications will need to be refactored to best take advantage of a micro-service architecture.

KH: Typically, yes. Web applications are already built in a distributed way, so they’re the easiest to convert.

Q: Do microservices implement Unikernels?

CT: Keith, I’ll let you also address this. My take: Containers run microservices and leverage the entire OS (Linux kernel and all of its libraries, drivers, etc.). A unikernel is a very small and minimalistic kernel that doesn’t contain the additional bloat of the full kernel and therefore, is considered to run that much faster and leaner. Docker acquired a unikernels company and will most likely provide support for running microservices with unikernels and how they may provide a container like wrapper.

KH: Not directly. Unikernels are a new method to run arbitrary runtimes under a single kernel. Docker is currently doing some early work with microkernel technology to improve containers, but it’s not rolled into core Engine yet.

Q: Can a Docker container run on bare metal instead of a host OS directly. If yes, what benefits does this approach provide?

CT: A Docker container requires a host OS to run; however, when we refer to “bare-metal” we are referring to a “non-virtualized server”. The key benefit this provides is that you don’t waste efficiencies by eliminating the hypervisor and guest OS and it is also much more manageable as with the hypervisor and guest OS scenario, you have to manage and maintain all of the VMs that may have different guest OS’s and versions.

KH: No. A Docker container needs Docker Engine to run, so you’ll need to run Engine under a supported OS on the metal. Running Engine on a physical server means your containers will get full “on the iron” IO, since there’s no hypervisor abstraction layer between your container and the hardware it’s running on.

Q: Are packaged software companies like Oracle moving to containerization?

CT: Oracle is developing product offerings that are containerized applications. They would be best able to address your question.

KH: Here is Oracle’s GitHub repository of their official Docker containers and here is their official images in Docker Hub.

See? I told you there was no shortage of questions! If you still have one, please comment in this blog below and we’ll get back to you as soon as we can. Follow us on Twitter @SNIACloud to stay up-to-date on what SNIA Cloud is doing with containers. And don’t forget to register for part two of this webcast, Containers: Best Practices and Data Services, on December 7^th. We hope to see you there!