The ability to run analytics from the data center to the Edge, where the data is generated and lives, creates new use cases for nearly every business. The impact of Edge computing on storage strategy was the topic at our recent SNIA Cloud Storage Technologies Initiative (CSTI) webcast, “Extending Storage to the Edge – How It Should Affect Your Storage Strategy.” If you missed the live event, it’s available on-demand. Our experts, Erin Farr, Senior Technical Staff Member, IBM Storage CTO Innovation Team, and Vincent Hsu, IBM Fellow, VP & CTO for Storage, received several interesting questions during the live event. As promised, here are answers to them all.
Q. What is the core principle of Edge computing technology?
A. Edge computing is an industry trend rather than a standardized architecture, though there are organizations like LF EDGE with the objective of establishing an open, interoperable framework. Edge computing is generally about moving workloads closer to where the data is generated and creating new, innovative workloads that this proximity makes possible. Common principles often include managing Edge devices at scale, using open technologies to create portable solutions, and ultimately doing all of this with enterprise levels of security. Reference architectures exist for guidance, though implementations can vary greatly by industry vertical.
Q. We all know connectivity is not guaranteed – how does that affect these different use cases? What are the HA implications?
A. Assuming the requisite retry logic is in place at the various layers (e.g. network, storage, platform, application) as needed, it comes down to how long each of these use cases can tolerate delays until connectivity is restored. The cloud bursting use case would likely be impacted by connectivity delays if the workload burst to the cloud for availability reasons or because it needed time-sensitive additional resources. When bursting for performance, the impact depends on the length of the delay compared with the average time savings gained by bursting. Delays in the federated learning use case might only affect how soon a model gets refreshed with updated data. The query engine use case might avoid being impacted if the data was pre-fetched before the connectivity loss occurred. In all of these cases, it is important that the storage fabric resynchronizes the data into a single unified view (when configured to do so).
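To make the performance-bursting trade-off concrete, here is a minimal sketch of the comparison described above. The function names and the idea of using average runtimes as inputs are our own assumptions for illustration, not part of any product; a real decision would also weigh cost, data gravity, and service-level objectives.

```python
def net_burst_benefit(local_seconds: float,
                      cloud_seconds: float,
                      connectivity_delay_seconds: float) -> float:
    """Seconds gained by bursting once the connectivity delay has been paid.

    All three inputs are hypothetical averages you would measure yourself.
    """
    time_saved = local_seconds - cloud_seconds
    return time_saved - connectivity_delay_seconds


def should_burst(local_seconds: float,
                 cloud_seconds: float,
                 connectivity_delay_seconds: float) -> bool:
    """Bursting for performance only pays off if the net benefit is positive."""
    return net_burst_benefit(local_seconds, cloud_seconds,
                             connectivity_delay_seconds) > 0
```

For example, a job that averages 600 seconds locally and 200 seconds in the cloud still benefits from bursting after a 120-second connectivity delay, but not after a 500-second one.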
Q. Heterogeneity of devices is a challenge in Edge computing, right?
A. It is one of the challenges of Edge computing. How the data from Edge devices is stored on an Edge server may also vary depending on how that data gets shared (e.g. MQTT, NFS, REST). Storage software that can virtualize access to data on an Edge server across these different protocols could reduce application complexity and simplify data management.
Q. Can we say Edge computing is the opposite of cloud computing?
A. From our perspective, Edge computing is an extension of hybrid cloud. Edge computing can also be viewed as complementary to cloud computing, since some workloads are more suitable for the cloud and some are more suitable for the Edge.
Q. What assumptions are you making about WAN bandwidth? Even when caching data locally the transit time for large amounts of data or large amounts of metadata could be prohibitive.
A. Each of these use cases should be assessed through the lens of your industry, business, and data volumes to understand whether the latency in any segment of these flows would be acceptable to you. WAN acceleration, which can be used to ensure certain workloads are prioritized for guaranteed qualities of service, could also be explored to improve or ensure transit times. Integration with Software Defined Networking solutions may also provide mechanisms to mitigate or avoid bandwidth problems.
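As a starting point for that assessment, a back-of-the-envelope transit time estimate can help. The sketch below is only an illustration: the function name is ours, and the `efficiency` factor (the fraction of nominal link bandwidth you actually achieve) is an assumed parameter that you would replace with your own measurements.

```python
def transfer_seconds(data_bytes: float,
                     link_bits_per_second: float,
                     efficiency: float = 0.8) -> float:
    """Estimate how long it takes to move data over a WAN link.

    efficiency is an assumed fraction (0 < efficiency <= 1) of the nominal
    bandwidth that is usable after protocol overhead and contention.
    """
    if link_bits_per_second <= 0:
        raise ValueError("link_bits_per_second must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    return (data_bytes * 8) / (link_bits_per_second * efficiency)


# 1 TB (10**12 bytes) over a 100 Mbit/s link at 80% efficiency:
# roughly 100,000 seconds, or a little under 28 hours.
hours = transfer_seconds(1e12, 100e6) / 3600
```

An estimate like this makes it easier to judge whether local caching, pre-fetching, or WAN acceleration is needed for a given data volume.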
Q. What about the situation where the data resides in an on-premises data center, the machine learning tools used to build the model are in the cloud, and the goal is not to move the data to the cloud (for security reasons), but to run and test the model only on-premises, then score, improve, and finally implement it?
A. The Federated Learning use case allows you to keep the data in the on-premises data center while moving only the model updates to the cloud. If you cannot move model updates either, and the ML tools are containerized and/or the on-premises site can act as a satellite location for your cloud, it may be possible to run the ML tools in your on-premises data center.
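To illustrate the idea that only model updates leave the site, here is a toy sketch of federated averaging for a simple linear model in plain Python. All names (`local_update`, `federated_average`, `demo`) and the sample data are hypothetical, and real deployments would use an ML framework and add protections such as secure aggregation; this is not the implementation discussed in the webcast.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Row = Tuple[Vector, float]  # (features, label)


def local_update(weights: Vector, rows: Sequence[Row],
                 lr: float = 0.1) -> Tuple[Vector, int]:
    """Run one gradient-descent step for a linear model on local rows.

    Only the updated weights and the row count leave the site; the rows
    themselves stay on-premises.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    grad = [0.0] * len(weights)
    for features, label in rows:
        error = sum(w * x for w, x in zip(weights, features)) - label
        for i, x in enumerate(features):
            grad[i] += error * x
    n = len(rows)
    return [w - lr * g / n for w, g in zip(weights, grad)], n


def federated_average(updates: Sequence[Tuple[Vector, int]]) -> Vector:
    """Combine site updates in the cloud, weighting each by its row count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def demo(rounds: int = 50) -> Vector:
    """Two hypothetical sites whose data both follow label = 2 * feature."""
    site_a = [([1.0], 2.0), ([2.0], 4.0)]
    site_b = [([3.0], 6.0)]
    global_weights = [0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, site)
                   for site in (site_a, site_b)]
        global_weights = federated_average(updates)
    return global_weights
```

In this toy setup the aggregated weight converges toward 2.0 even though neither site ever shares its rows with the aggregator.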