The frustration of working in the data industry is that so much time is spent finding, understanding, cleaning and reorganising data rather than putting it to good use. The cause comes down to a gap in the capabilities of our data processing platforms.
In software engineering we teach people that data is private to an application and should only be accessed through the application interface. However, the moment we want to do any form of analysis, we rip the data out of the application, copy it around and start using it for different projects. Very quickly the original context of the data is lost and downstream users waste time reconstructing it.
ODPi Egeria is an open source project delivering embeddable metadata management libraries and interchange technology for our data platforms, ensuring that metadata can flow with the data in a form that is accessible to tools from many vendors. This open metadata management is coupled with open governance APIs that enable business owners to set policies that are then pushed down into the data platforms, engines and tools, simplifying compliance with regulatory requirements and the protection of valuable data assets.
The technology includes a comprehensive metadata type model seeded from many popular standards and enhanced with semantics and governance concepts. The underlying metamodel is a graph designed to be distributed across multiple heterogeneous metadata servers. Metadata is then accessible through replication, event notification and federated queries, so that it is shared and linked to build a rich body of knowledge around the data.
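As a rough illustration of the federated query idea, the sketch below models two in-memory metadata servers that each hold part of a metadata graph, and a query that asks every server and merges the answers. It is a toy under stated assumptions, not Egeria's API: the class and function names (MetadataServer, federated_find, linked_guids) and the type labels used in the example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    guid: str        # globally unique id, so servers can refer to each other's entities
    type_name: str   # illustrative type label, not a real type definition
    name: str
    home: str        # name of the server that owns this entity


@dataclass(frozen=True)
class Relationship:
    type_name: str
    end1: str        # guid of one end of the link
    end2: str        # guid of the other end


@dataclass
class MetadataServer:
    """Hypothetical stand-in for one metadata server holding part of the graph."""
    name: str
    entities: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_entity(self, guid, type_name, name):
        self.entities[guid] = Entity(guid, type_name, name, self.name)

    def link(self, type_name, end1, end2):
        # Either end may live on another server; only the guid is stored here.
        self.relationships.append(Relationship(type_name, end1, end2))

    def find_by_type(self, type_name):
        return [e for e in self.entities.values() if e.type_name == type_name]


def federated_find(servers, type_name):
    """Ask every server for entities of a type and merge results by guid."""
    merged = {}
    for server in servers:
        for entity in server.find_by_type(type_name):
            merged.setdefault(entity.guid, entity)
    return sorted(merged.values(), key=lambda e: e.guid)


def linked_guids(servers, guid):
    """Collect the guids linked to `guid` by relationships held on any server."""
    found = set()
    for server in servers:
        for rel in server.relationships:
            if rel.end1 == guid:
                found.add(rel.end2)
            elif rel.end2 == guid:
                found.add(rel.end1)
    return sorted(found)


if __name__ == "__main__":
    catalog = MetadataServer("catalog")
    glossary = MetadataServer("glossary")
    catalog.add_entity("a1", "Asset", "customer_table")
    glossary.add_entity("t1", "GlossaryTerm", "Customer")
    # The glossary server links its term to an asset owned by the catalog server.
    glossary.link("SemanticAssignment", "a1", "t1")
    print(federated_find([catalog, glossary], "Asset"))
    print(linked_guids([catalog, glossary], "a1"))
```

The point of the sketch is that a relationship stores only identifiers, so a link held on one server can describe an entity owned by another, and a caller sees a single merged graph without knowing where each piece lives.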
In this presentation I will cover the basic mechanisms of Egeria and how its use across our data platforms and tools could revolutionise the data industry.
IBM Distinguished Engineer,