End user companies consistently stress the importance of metadata, as shown in various research studies conducted by Gartner, TDWI, and other analyst firms. So you ask: what has gone wrong? Why are so many ETL and data integration vendors still not fully addressing this important aspect in their products?
Let’s first look at some of the high-level metadata management requirements and see if you agree with these:
- An ETL tool should be flexible enough to handle and record changing application requirements
- An ETL tool should facilitate data governance and data quality initiatives
- An ETL tool should allow you to easily connect to and handle changing data source and target requirements
- An ETL tool should require minimum maintenance
- An ETL tool should allow business users to be included in the ETL/DI process
- An ETL tool should promote reuse of business rules and other ETL project artifacts without requiring the user to identify reuse opportunities
- An ETL tool should allow you to do design-time and run-time reporting and auditing
- An ETL tool should support collaborative development
- An ETL tool should facilitate role-based development
- An ETL tool should include a metadata repository that is fully integrated into the product
I’d say that most vendors in our space will claim to handle most of these high-level requirements, but in practice they show only varying support for the first three requirements above, and most of them fall short on many of the others.
Why is that, you ask? Because many ETL tools were created in the nineties when metadata management was still in its infancy, and even some of the newer low-end open source tools developed in the last ten years weren’t built with a central notion of metadata management. This is the main reason why it’s still very hard for these vendors to deliver on the more complex requirements asked for by businesses today.
So let’s dive a little deeper to see what’s really going on. First, most ETL vendors that offer data quality solutions offer them as separate products and provide very little integrated data quality support within their core ETL engines. The reason is that the ETL processes in traditional, point-to-point data mapping tools make it difficult to build generic rules right into the product that could cover a large number of typical data quality checks. So, while data quality support is available from a number of tools out there, it’s an add-on capability and not part of the core product.
I guess there is very little doubt in anyone’s mind that good built-in metadata management is a must if you want to reduce the maintenance burden of your applications. SSIS users, for example, have complained about the lack of metadata management support for years and know what I am talking about when I point out this issue.
As highlighted in the fifth requirement above, who wouldn’t want their business users to be actively included in the DI lifecycle? Why wouldn’t you want business users to define their rules directly in the product rather than in a separate Excel spreadsheet, which then gets thrown over the wall to the ETL developer, who has to implement these business rules? How do you know if what the ETL developer has implemented is right? Most ETL tools have no built-in feedback mechanism that permits the business user to double-check, much less test, what the developer actually coded.
Without doubt, one would rather have domain experts implement their own rules than make ETL developers responsible for this task. And again, the main reason these types of requirements haven’t been satisfied by traditional ETL tools is that it is very difficult to offer business users the right tools when the foundation for defining these rules is tied to the complexities of the physical metadata.
This is the fundamental flaw of 99% of all ETL and data integration tools, and no vendor other than expressor has addressed this issue the way we do. What’s really needed is a metadata abstraction layer, à la what expressor offers, to turn this important requirement into practical reality.
Not only would you like to have reuse in terms of business rules, but reuse built into the right metadata foundation can go much further. It can give you reuse of dataflow components, data quality checks, domain conversion rules (is this field in Dollars or Euros?), etc. With most traditional tools you are forced to start all over again every time you build your next application. They don’t even let you reuse things you’ve already created as part of the same project. In the cases where reuse is available, it is your responsibility, not the tool’s, to find the reuse opportunities.
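To make the idea of reuse through a metadata abstraction layer a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how expressor or any other product implements it: the names `SemanticType`, `Binding`, and `read_field`, the column names, and the fixed exchange rate are all hypothetical. The point it shows is that a data quality check is defined once against a logical field, while each physical source only contributes a thin binding with its own domain conversion.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class SemanticType:
    """Logical field definition; knows nothing about physical columns."""
    name: str
    is_valid: Callable[[float], bool]  # reusable data quality check


@dataclass(frozen=True)
class Binding:
    """Ties one physical column of one source to a semantic type."""
    column: str
    semantic: SemanticType
    to_canonical: Callable[[Any], float]  # domain conversion for this source


def read_field(record: Dict[str, Any], binding: Binding) -> float:
    """Convert the physical value, then apply the shared quality check."""
    value = binding.to_canonical(record[binding.column])
    if not binding.semantic.is_valid(value):
        raise ValueError(f"{binding.semantic.name}: invalid value {value!r}")
    return value


# Assumption for illustration only: a fixed, made-up exchange rate.
EUR_TO_USD = 1.10

# The rule "amounts must not be negative" is written exactly once.
AMOUNT_USD = SemanticType(name="amount_usd", is_valid=lambda v: v >= 0)

# Two hypothetical sources with different column names and currencies.
us_binding = Binding("AMT", AMOUNT_USD, lambda raw: round(float(raw), 2))
eu_binding = Binding(
    "betrag_eur", AMOUNT_USD, lambda raw: round(float(raw) * EUR_TO_USD, 2)
)
```

With this arrangement, adding a third source means writing one more `Binding`; the validity rule attached to `AMOUNT_USD` is picked up automatically rather than being rediscovered and re-implemented by the developer.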
Reporting on your metadata has tremendous benefits for data lineage and analysis. Wouldn’t you want to go back in time and see what happened to any piece of data or metadata and how it has changed over time? Or to know why a specific business rule was changed or, if it wasn’t, whether it was ever used?
Everyone who has to deal with regulations and compliance should demand this kind of information. It requires a rich metadata model to start with, and your vendor’s metadata repository had better be fully integrated with all the tools that you use throughout the data integration lifecycle.
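As a rough illustration of the questions raised above (what did a rule look like at a given time, why was it changed, and was it ever used?), here is a small Python sketch of a versioned rule record. It is an assumption-laden toy, not a description of any vendor’s repository: `RuleVersion`, `RuleHistory`, and their methods are hypothetical names, and a real repository would persist this information rather than keep it in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class RuleVersion:
    """One recorded state of a business rule, with who/why/when."""
    expression: str
    changed_by: str
    reason: str
    changed_at: datetime


@dataclass
class RuleHistory:
    """Hypothetical in-memory audit trail for a single business rule."""
    name: str
    versions: List[RuleVersion] = field(default_factory=list)
    executions: List[datetime] = field(default_factory=list)

    def change(self, expression: str, changed_by: str,
               reason: str, changed_at: datetime) -> None:
        """Record a new version instead of overwriting the old one."""
        self.versions.append(
            RuleVersion(expression, changed_by, reason, changed_at)
        )

    def record_execution(self, at: datetime) -> None:
        """Note that the rule was applied in a run at the given time."""
        self.executions.append(at)

    def as_of(self, when: datetime) -> Optional[RuleVersion]:
        """Return the version in effect at `when`, or None if none existed."""
        candidates = [v for v in self.versions if v.changed_at <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda v: v.changed_at)

    def was_ever_used(self) -> bool:
        """True if at least one execution has been recorded."""
        return bool(self.executions)
```

Because every change is appended with its reason and author, "going back in time" becomes a simple lookup with `as_of`, and `was_ever_used` answers the usage question from the recorded run-time data.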
More vendors these days do provide some level of collaborative team development, but many tools fall short of delivering on this promise. What’s often compromised is the level of collaboration support you would expect. Without a comprehensive and convincing approach to metadata management, you’ll likely be disappointed in what your vendor has to offer you around collaborative development.
Now the next requirement, role-based development, is another can of worms. So why would you even want such a thing? Well, if you want business users and data stewards to be more active in the development of your DI application, then asking your vendor how they support different roles in a DI project is the right thing to do. Some vendors state that they support role-based development, but the roles they support are limited to the development staff and, if you are lucky, extended to the computer operations group.
Role-based development implies that you provide dedicated interfaces for specific user roles performing specific activities within a project. Don’t assume that your business users would ever be happy to work with an ETL developer-centric product! Different facilities are necessary for different tasks: although you can cut a tree down with a steak knife or slice a turkey with a chain saw, the results are rarely optimal (or pretty).
I believe that by now you get the idea that good metadata management is paramount to a good ETL development environment. So I encourage you to find out if your vendor of choice understands the importance of metadata and has a deep understanding of the requirements associated with it. Metadata has been a stepchild in the ETL world for long enough; expressor has promoted it, and continues to promote it, to center stage.
Michael Waclawiczek, VP Marketing, expressor
Register for our free expressor Studio 3.0 beta program!






