Data access and availability have always been critical to an organisation's operations. Through data management, businesses first optimised how they run; the next big challenge was to democratise data. By ensuring that data consumers across the business could access information quickly and effectively, organisations were able to derive more value from their data and stay competitive.
But while centralised data architectures, such as data lakes and data warehouses, have proved invaluable for consolidating data and making it accessible, they have been less effective at distributing it efficiently. This is due to the inherent inflexibility of these models: collecting all available data in one location inevitably creates a bottleneck. The speed at which both incoming and outgoing data is processed suffers as a result, directly impacting the quality of work of stakeholders across the organisation.
The introduction of data mesh
These issues have recently been addressed by a paradigm shift in the data management field: data mesh. First conceptualised by Zhamak Dehghani, director of emerging technologies at Thoughtworks, data mesh is a decentralised data architecture that replaces centralised data storage with a distributed model. This model is built on individual organisational units called "domains." Each domain is responsible for a limited amount of data tied to a single department, while remaining linked to the larger data network. By reducing data duplication and granting each domain autonomy over the optimal use of its own data, data mesh resolves two of the largest issues of traditional, centralised architectures.
Data mesh architecture holds great potential for the industry. The approach is not without its own risks, namely data silos and renewed duplication across domains, but these can be addressed through data virtualisation. Instead of physically moving data to new locations using batch-oriented processes such as extract, transform and load (ETL), data virtualisation provides a unified, virtual data access layer built on top of many disparate data sources. This organisation-wide layer acts as a connective interface, linking a company's various data sources into one continuous chain. Any request for data is routed through this layer, automatically retrieved from its source, and presented on demand and in real time. This speed is possible because the layer does not store any data itself; it houses only the metadata needed to access information across the various data sources. The model also gives organisations a single control point for implementing security and data governance protocols.
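To illustrate the principle, the following minimal Python sketch shows a hypothetical virtual access layer: it registers only connection metadata for each source and resolves queries on demand, never storing the data itself. The source names and fetch functions are illustrative stand-ins, not a real product's API.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class SourceMetadata:
    """Connection details only; the layer never holds the underlying data."""
    name: str
    fetch: Callable[[str], Any]  # queries the source when a request arrives

class VirtualDataLayer:
    """Routes each request to the registered source and returns results live."""

    def __init__(self) -> None:
        self._catalogue: dict[str, SourceMetadata] = {}

    def register(self, dataset: str, source: SourceMetadata) -> None:
        # Only metadata is kept; no rows are copied or materialised.
        self._catalogue[dataset] = source

    def query(self, dataset: str, request: str) -> Any:
        # Resolved on demand, in real time, at the source itself.
        return self._catalogue[dataset].fetch(request)

# Two hypothetical departmental sources plugged into the same layer.
layer = VirtualDataLayer()
layer.register("sales", SourceMetadata("sales_db", lambda q: f"sales rows for {q!r}"))
layer.register("stock", SourceMetadata("wms_api", lambda q: f"stock levels for {q!r}"))

print(layer.query("sales", "orders in EMEA"))
```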
Data virtualisation as a connective mesh
Through the use of data virtualisation, each individual domain can build virtual models from any given data source. Data consumers do not need to understand the access complexities of the many data sources that feed these models with timely information. The approach also reduces duplication, which further shortens access times by cutting reconciliation effort.
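As a concrete illustration, the sketch below builds a simple virtual model from two hypothetical domain sources, a CRM database and a billing service; the consumer calls the combined view without knowing how either source is accessed. The function names and records are invented for illustration.

```python
def fetch_crm_customers():
    # Stand-in for a call to a CRM database.
    return [{"id": 1, "name": "Acme Ltd"}, {"id": 2, "name": "Globex"}]

def fetch_billing_invoices():
    # Stand-in for a REST call to a billing service.
    return [{"customer_id": 1, "amount": 1200}, {"customer_id": 1, "amount": 300}]

def customer_revenue_view():
    """A virtual model: joins the two sources at request time, stores nothing."""
    totals: dict[int, int] = {}
    for inv in fetch_billing_invoices():
        totals[inv["customer_id"]] = totals.get(inv["customer_id"], 0) + inv["amount"]
    return [
        {"customer": c["name"], "revenue": totals.get(c["id"], 0)}
        for c in fetch_crm_customers()
    ]

print(customer_revenue_view())
```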
Another advantage of these data models is that they are not restricted to any one query language or interface. A developer who is fluent in only one or two of these, whether SQL, REST, OData, GraphQL, or MDX, can still easily access the company-wide data product catalogue without having to write or understand a new language. The data virtualisation layer makes all of a company's data assets accessible, in a user-friendly manner, to whoever requires them.
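The sketch below illustrates this with two hypothetical front ends over the same toy catalogue: one accepts a SQL-style statement, the other a REST-style path, and both resolve to the same data product. The parsing is deliberately naive and the dataset names are assumptions.

```python
# A toy catalogue standing in for the virtual layer; names are hypothetical.
CATALOGUE = {
    "sales": lambda request: f"sales result for {request!r}",
}

def sql_endpoint(statement: str) -> str:
    """Serve consumers who speak SQL: take the dataset from the FROM clause."""
    dataset = statement.upper().split("FROM")[1].split()[0].lower()
    return CATALOGUE[dataset](statement)

def rest_endpoint(path: str) -> str:
    """Serve consumers who prefer REST: map the URL path to the same dataset."""
    dataset = path.removeprefix("/data/").split("?")[0]
    return CATALOGUE[dataset](path)

# The same data product answers both styles of request.
print(sql_endpoint("SELECT * FROM sales WHERE region = 'EMEA'"))
print(rest_endpoint("/data/sales?region=EMEA"))
```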
This is best illustrated by the features of these data-virtualisation-enabled data domains. They require no setup and provide immediate access to data products, with features such as data lineage tracking, self-documentation, change impact analysis, identity management, and single sign-on (SSO). Such capabilities dramatically accelerate the development cycle of data products, a process that is otherwise both labour- and time-intensive.
Targeting data autonomy
Data virtualisation also offers the vital advantage of autonomous data domains. Each domain is free to operate within a fixed set of parameters, selecting the data sources best suited to its specific needs and requirements. This means businesses can reuse their in-house data analytics systems without a complete overhaul. Team members are not compelled to learn new skills or languages, a disruption that would hurt productivity; they can simply adapt existing applications to the new system. Data virtualisation also shields internal processes from the introduction of new models, enabling operations to continue smoothly.
While data virtualisation offers many advantages, it need not replace traditional data repositories such as data warehouses and data lakes. Instead, it should be viewed as a layer of intelligence that enhances any existing data source. When this layer is added to physical data repositories, it ensures that data products remain accessible through the virtual layer at any point, while still being governed by the same protocols applied across the wider data mesh.
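As a sketch of that single control point, the Python below applies one hypothetical role check to every request, whether the data lives in a warehouse or a lake. The source names and the policy itself are assumptions for illustration, not a real platform's governance engine.

```python
# Hypothetical sources: one warehouse table, one lake dataset.
SOURCES = {
    "warehouse_orders": lambda q: f"warehouse rows for {q!r}",
    "lake_clickstream": lambda q: f"lake events for {q!r}",
}

# One governance rule, enforced at the virtual layer rather than per repository.
ALLOWED_ROLES = {"analyst", "engineer"}

def governed_query(role: str, dataset: str, request: str) -> str:
    """Every request passes the same access check, whichever repository holds the data."""
    if role not in ALLOWED_ROLES:
        raise PermissionError(f"role {role!r} may not access {dataset!r}")
    return SOURCES[dataset](request)

print(governed_query("analyst", "lake_clickstream", "sessions from the last 7 days"))
```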
Data mesh is an exciting advancement in the world of data management. By combining the storage capacity of centralised data infrastructures with the flexibility of data virtualisation, it gives businesses a powerful new tool with which to access, browse, and use their available data.
Authored by Ravi Shankar, Senior Vice President, Denodo