Unification of structured and unstructured data has long been a goal – and challenge for organizations. Data Fabric is an architecture, set of services and platform that standardizes and integrates data across the enterprise regardless of data location (On-Premises, Cloud, Multi-Cloud, Hybrid Cloud), enabling self-service data access to support various applications, analytics, and use cases. The data fabric leaves data where it lives and applies intelligent automation to govern, secure and bring AI to your data.
How a data fabric abstraction layer works and the benefits it delivers was the topic of our recent SNIA Cloud Storage Technologies Initiative (CSTI) webinar, “Data Fabric: Connecting the Dots between Structured and Unstructured Data.” If you missed it, you can watch it on-demand and access the presentations slides at the SNIA Educational Library.
We did not have time to answer audience questions at the live session. Here are answers from our expert, Joseph Dain.
Q. What are some of the biggest challenges you have encountered when building this architecture?
A. The scale of unstructured data makes it challenging to build a catalog of this information. With structured data you may have thousands or hundreds of thousands of table assets, but in unstructured data you can have billions of files and objects that need to be tracked at massive scale.
Another challenge is masking unstructured data. With structured data you have a well-defined schema so it is easier to mask specific columns but in unstructured data you don’t have such a schema so you need to be able to understand what term needs to be masked in an unstructured document and you need to know the location of that field without having the luxury of a well-defined schema to guide you.
Q. There can be lots of data access requests from many users. How is this handled?
A. The data governance layer has two aspects that are leveraged to address this. The first aspect is data privacy rules which are automatically enforced during data access requests and are typically controlled at a group level. The second aspect is the ability to create custom workflows with personas that enable users to initiate data access requests which are sent to the appropriate approvers.
Q. What are some of the next steps with this architecture?
A. One area of interest is leveraging computational storage to do the classification and profiling of data to identify aspects such as personally identifiable information (PII). In particular, profiling vast amounts of unstructured data for PII is a compute, network, storage, and memory intense operation. By performing this profiling leveraging computational storage close to the data, we gain efficiencies in the rate at which we can process data with less resource consumption.
We continue to offer educational webinars on a wide range of cloud-related topics throughout the year. Please follow us @SNIACloud to make sure you don’t miss any.