So there I was sitting with a trainee in a class the other day. He showed me what he had done in a competitor’s product and indicated that there was “no code”. What he had developed was a data flow with 30 or so steps. Each step did something fairly simple such as appending a new field to the output. I asked him to open a step up so we could take a look. Each step had a SQL statement, some fairly simple and some complex.
So let me digress for a second here. Is SQL not code? Sure a simple select statement might not be considered code. Often vendor specific SQL is used creating portability issues. When it can’t be done with “simple” SQL, and this is a real hoot, a stored procedure is created. My perception is that SQL is code.
Developers often know how to do things very efficiently using SQL. Developers know SQL so well that they don’t even consider it code, it’s SQL. And therein lays the problem. What about managers, data analysts, directors, C level executives and auditors? If you are a public company and therefore need to comply with the Sarbanes-Oxley Act, you have created a huge problem. You cannot have accountability without traceability. Traceability can only be accomplished with standards and best practices.
No matter which ETL / data integration tool you use you will be writing code to implement the necessary business and transformation logic. Some of this code will be generated for you automatically; some of the code will have to be written by hand. So the question is not whether SQL is the right scripting language for DI. The question is how DI implementations can be more streamlined.
To assemble a customer’s full name in SQL one might do this:
customer_full_name = first_name || “ “ || last_name
Sorry, not very apparent what is going on to a typical business user or perhaps a C level executive. How about this:
customer_full_name = concatenate(first_name, “ “, last_name)
Most technical staff will argue that SQL, or any other language, is well known and so therefore the implementation turnaround will be much better. So ask the business user why is takes 3 months to add a few columns to a data warehouse table. Maybe it is because the business user has absolutely no insight into the process?
DI implementations can be more streamlined by actually providing real insight into the process. I believe that we at expressor have taken the right approach by providing a scripting language that is flexible, powerful, platform independent and most important, easy to read by all users.
BTW, the application mentioned above only needs one operator, no SQL, and zero lines of “code” with expressor.
John Russell, chief scientist, expressor