Monday, March 26, 2012

Permit user to select columns for a report: why not use Dynamic SQL?

I certainly have read lots of negative comments about using Dynamic SQL. And I have read some of the excellent web pages on the subject, such as

http://www.sommarskog.se/dynamic_sql.html

I have learned much from those discussions.

But I still find myself faced with a problem for which Dynamic SQL seems like the only reasonable solution: letting a user select the columns to include in a report, and the sort order for the output data.

In the application of concern, there are maybe seven columns that one might want to have in a report. If the user can choose any combination of these seven columns, and the order in which they are displayed (as well as the SORT order), it would be necessary to write something like 7! (7-factorial = 5040) individual reports to provide all the possibilities that included all 7 columns, plus (7 choose 6, with concern for order) (or 7*6*5*4*3*2 = 5040) ways to choose 6 of the 7 columns (with a group-by taking care of the column not chosen), etc.

(Even if we chose to ignore the order, and let the "presentation layer" of a .NET UI take care of that, there are still (2^7 -1) = 127 different ways to choose the columns for the report, knowing that at least one column must be selected.)

And when another level of detail is added to the data, we must duplicate all the existing reports so that we can have them "with and without" the new column.

This would be hundreds or thousands of reports -- just dealing with 7 columns in the selection list, and possibly the desired order for those 7 columns.

In MS Access it seems one has a "query builder" that is essentially constructing Dynamic SQL statements based on the selections the user makes in the "query builder." In Access one can use the "SQL View" to see the SQL statement written by the "query builder."

Why does this user community seem to frown on Dynamic SQL as a way to make thousands of very similar reports (same tables JOINed together on the same columns) that merely need different columns in the SELECT and GROUP BY clauses? This would essentially be duplicating some of the operation of the "query builder" in MS Access. Is that a bad design?

What am I missing?

In the type of situation you describe, Dynamic SQL is a good option. However, I would never allow the end user to be able to type in column names, or even see the SQL code. That would open you to SQL injection.

Instead, build a form to create the SQL string. Use drop down lists and check boxes for the user to select columns and sort criteria. Build the SQL string 'behind the scene'. Do not allow the user to write anything that you put in the SQL String. If you follow such precautions, you 'may' be able to prevent the use of SQL injection by the end users, but not by anyone familiar with the code. (Upwards of 75% of economic damage comes from within...) Preventing even developers (and power users using Excel, etc.) from having the opportunity to use SQL injection would require using stored procedures.

|||

The key word is "report". For purely reporting, particularly reporting on a database that is not part of your OLTP system with live users trying to get work done, you are absolutely right. Giving users ad hoc access to write reports to your live system is just giving them a license to bring down your system. So be careful and make sure to constrain them some how (query governor, qa processes, etc.) My advice is always to provide a copy of the data (either transformed into a format for easy reporting (like OLAP), or just replicated/snapshotted) that the user can write reports to their hearts delight. If the query brings down the server, no revenue is likely lost.

But you are right, the only thing you gain by using stored procedures for reports is plan saving, and who cares? Reports may be run once a day, whereas OLTP queries are often run once a second (or more in some systems). We use stored procedures for reports only when the queries are so complex they won't fit in a single query (well, when the people writing them can't fathom how to make them into a single query :)

I definitely prefer the no dynamic SQL rule for reports that go off of the OLTP system though. The more fixed the set of queries, the more reasonable the support is, and your OLTP system is where money comes in, and that is where you get paid from. Never want to dissapoint there.

|||

Arnie,

Yes, as you mentioned, the user would select columns in some sort of drop-down list. There would be no room for SQL Injection. The User Interface in .NET would also allow the user to define the column order for the output report display, as well as the sort order for that display. The details have not yet been worked out.

Louis,

The reports I am discussing ARE for the "OLTP System." The main purpose of the OLTP system is to provide reports to users (people are producing reports more than any other sort of transaction). The current report menu tree is approx. 40 selections, each with its own stored procedure (it had been 23 reports, but we just recently added a dozen more, or so; soon we must add approx. 200 more if we don't use Dynamic SQL; each returns 2 or 3 data tables; each is maybe a 500-line stored procedure). But the users are asking for a wider variety of reports (to "slice and dice" the data different ways, to look at the same data from different perspectives, and with different levels of detail), which would cause the menu tree to expand to 10,000 entries or so, each with its own stored procedure unless we make a single stored procedure that uses Dynamic SQL.

So we could either write 5,000,000 lines of code and make a separate procedure for each report, and have the User Interface decide which of the 10,000 stored procedures to use (one per report), or we could write a few Dynamic SQL stored procedures that were each maybe 1000-1500 lines, and which could produce all 10,000 reports. (I hate to think of editing 10,000 stored procedures if we decide to rename a table, or a column in a table, or any similar maintenance activity!) (And there is also the issue of how to provide a .NET menu for the user to select from 10,000 reports, as opposed to providing merely a "select columns, display order, and sort order" for a list of 7 columns.)

These reports are run only hundreds to thousands of times per day, on a busy day; maybe dozens of times per day on a slow day. Because of the volume of the data a report may take a minute or two for the stored procedure to run and for the output to be formatted. Many users can only access a small amount of the data, so their reports run in a few seconds.

The users would not be writing any reports -- they would be merely choosing the columns they wanted in their output, which thereby defines the level of aggregation of any sums, etc., of numeric quantities. The "dynamic SQL builder" would be a stored procedure called by .NET, with the column selections and sort order as part of the input parameters.

The main advantage I see for doing this in Dynamic SQL is that it means we don't have 10,000 nearly identical stored procedures, all doing essentially the same thing -- just returning different columns, with different "group by" and "order by" clauses. Instead we have only one or two Dynamic SQL stored procedures, which build the SQL strings for the 10,000 different reports, based on user selections in the User Interface.

We get paid to provide the users with the reports they need. Different situations require different reports. How can you easily let users choose the data they want in their reports, and produce those somewhat-customized reports, without using Dynamic SQL?

|||

However, if this is a web based app, there still may be the opportunity for sQL injection.

It would be much more secure to collect the tables, columns, criteria, and sorting information into parameters, or simulated 'array' strings, and then pass those into a stored procedure and create the dynamic completely in the stored procedure.

For security purposes, I am against application embedded dynamic SQL. The DBA doens't know what's out there, and it is not easy to find and repair when business requirements force changes to the data store.

|||

>>The reports I am discussing ARE for the "OLTP System." The main purpose of the OLTP system is to provide reports to users (people are producing reports more than any other sort of transaction). <<

So are these realtime reports? Like show me exactly what is going on kinds of things? And what is the problem with returning all columns and tossing out the ones that they don't need? Are you returning that many columns with that much data?

Bottom line is that I would provide a reporting database for reporting, especially if these are not real time.

If they are real time needs, and the performance of returning 10 columns is noticably worse than 5, or 2, then dynamic SQL is not a "sin" it is just something to shy away from. I build dynamic SQL stored procedures to support some of the very same sort of things too. Just beware of the issues you can run into, is really all I think any of us are saying.

|||

Arnie,

I am not the .NET programmer, but I am sure he will not be allowing the users to enter any strings -- merely make selections from drop-down lists. So I think we will be safe from SQL Injection. We haven't yet decided on any parameter passing mechanism, or even whether we would provide these reports by a selection from a 10,000 item menu, or by having the user select columns for display, GROUP BY, and sort order.

Louis,

Yes, these are real-time reports, in the sense that a user wants to make edits to the data, and then run a report and see that the edits are reflected in the report. I suppose the user could tolerate some milliseconds of delay -- especially since it takes a while to make the selections for running a report. We are not dealing with anything like a point-of-sale system wherein the up-to-the-millisecond scanning of barcodes at the cash register might be important. Edits in the data tables are a result of numeric entries from the keyboard, so rather slow in terms of computer processing speed.

As for returning all columns and tossing out the ones they don't need, that doesn't really work since the numeric quantities must be summed according to the "GROUP BY" associated with the columns requested in the report.

The main performance issues I've noticed when returning more columns is that temporary tables used in the report procedures contain enough rows to make indexing beneficial, e.g., 200,000 rows. We will implement some way of informing the user that the number of rows returned exceed the limits of an Excel spreadsheet (approx. 65,000 rows), so the user must select a smaller portion of data for the report.

I fully agree with the notion of using Dynamic SQL only when it appears as the most appropriate solution to the problem at hand.

Thanks for your input in this discussion, from both of you.

Dan

sql

No comments:

Post a Comment