Part 5: Adding Smarts to the Argument Information
Adding Smarts to the Argument Information
By Steven Feuerstein
OTN Member since 2001
Finally, it's time to dive into the details: how to tackle some of the most complex logic involved in analyzing potential overloading ambiguities
Installment 5 of Feuerstein's "Building a Code-Analysis Utility" series
In article 5 of this series, I will dive deep into the heart of OverloadCheck, and tackle some of the most complex logic involved in analyzing potential overloading ambiguities. I'll go into detail about multilevel and string-indexed collections, showing how important it is to hide the complexity of such structures behind procedures and functions.
It makes sense to do this now, because I've finished the implementation of overloadcheck.overloadings. It is, therefore, time to move on to implementation of the supporting packages. Which packages should I look at next? I don't want to tackle my lowest-level packages, such as cc_types, yet. For all I know (and expect), cc_arguments and/or cc_smartargs will have other requirements for this package. Might as well hold off. Following my path down the package hierarchy, it seems to me that cc_smartargs should be next. It was also the package that OverloadCheck relied upon the most, though not exclusively, to get its job done.
In fact, my work in OverloadCheck provides the obvious starting point for building cc_smartargs. What elements from cc_smartargs did I identify in my step-wise refinement as needing to implement overloadcheck.overloadings (see Table 1)?
To determine which program to examine first, I can break down all of them into two basic categories: (1) programs that load up argument information for a package and (2) programs that return information about those arguments. Just one program falls into the first category: load_arguments. All the other programs fall into the second. And because I can't implement the programs that return the data before I figure out how to load and store all that information, it seems clear that I should work on implementing load_arguments first.
Loading the Argument Data
Time for a bit of a reality check. To build cc_smartargs.load_arguments, I'll launch into the careful process of building a complex program from scratch, using top-down design, keeping executable sections small and excessively readable. That does not mean, however, that I should write about every one of these steps. I'll assume you have a good feel for that process and won't subject you to the same information in previous articles of this series.
What I would like to do now is quickly drill-down through the levels of complexity until I get to the point where it makes sense to (finally) design the data structures that will cache the argument information (both the raw stuff from ALL_ARGUMENTS and the processed version with all the goodies I need to do my analysis of ambiguities). When I get that sorted out, I can write the programs that make the information available to OverloadCheck. Below are several layers of refinement of cc_smartargs.load_arguments, with a minimum of explanation. I would like to believe that it more or less speaks for itself. You, my patient reader, must of course be the ultimate judge.
Here, then, is the executable section of load_arguments:
2 initialize (program_in, show_in);
4 g_all_arguments := cc_arguments.fullset (program_in);
6 IF g_all_arguments.COUNT = 0
8 cc_util.pl ('');
9 cc_util.pl ('No arguments found for "' || program_in || '".');
10 cc_util.pl ('');
13 END IF;
14 END load_arguments;
After initializing, it's time to get all the raw argument information from ALL_ARGUMENTS and DBMS_DESCRIBE, which I'll do by calling the cc_arguments.fullset function (still to be written). The function returns all this data into something—and I fully plan for that something to be a collection of records in which each row of that collection corresponds to a row in ALL_ARGUMENTS.
If I don't find any argument data for this package, there isn't much for me to do, so I display a message to the screen. Notice that cc_util makes its first appearance in lines 8-10—the cc_util.pl procedure, which means "print a line." From years of experience, I know that I do not want to make any direct calls to DBMS_OUTPUT.PUT_LINE. The utility is simply too problematic. (For an explanation of my issues with DBMS_OUTPUT.PUT_LINE, see O'Reilly & Associates' Oracle Built-in Packages.) I'll instead construct my own substitute for that program (which I'll build on top of the program).
If cc_arguments.fullset does find some argument information for me, I will call the compute_derived_information procedure to finish the job. What is this job? Though I haven't thought it all through yet, my research has shown that ALL_ARGUMENTS data is very complex and multi-leveled (levels within positions within overloadings within program names). I'm certain that I'll want to do some processing on the raw data to make it more usable and useful in my OverloadCheck analyses (I need to identify those invocations, for example). So let's take a look at what that might involve.
Computing Derived Argument Information
First of all, what information do I need to derive, and what data structures will I need to define to hold that stuff? Given the kinds of loops I have already defined in OverloadCheck, I'd better come up with:
* The list of distinct program names within a package
* For each of those program names, the distinct overloadings
* For each overloading, the type of program (procedure or function)
* For each overloading, the full set of argument information
* For each overloading, the list of "top-level" parameters (appearing in the parameter list of an overloading)
* For each overloading, permutations of all its valid invocations (determined by the number of trailing, defaulted parameters in that overloading)
* To figure out those permutations, the last non-defaulted parameter in the list of each overloading.
When I first started thinking about OverloadCheck, I figured that I would dump the contents of ALL_ARGUMENTS into a collection that mirrored the data dictionary view and then write code to massage that information. This, in fact, is exactly what I'm expecting from cc_arguments. fullset. As I began to think about the code I would need to write, though, I started to have some second thoughts. You see, there is an awful lot of special information encoded in those ALL_ARGUMENTS rows. (For details, see the "Research: All About ALL_ARGUMENTS" section of Installment 1 in this series.)
Relying on the "flat" representation of ALL_ARGUMENTS in my collection means that I would have to write a lot of complicated code to answer such questions as:
* In which rows does one program (and each overloading of that program) start and end?
* How can I tell when a procedure has no arguments? And a function?
It turns out that ALL_ARGUMENTS addresses these cases differently.
Then I remembered what I'd recently learned about collection enhancements in Oracle9i Database: you can now index by strings and you can also define collections within collections. Maybe there is a way to re-organize that ALL_ARGUMENTS data to take advantage of these features and simplify the resulting code!
There is a natural hierarchy in ALL_ARGUMENTS, which can be expressed as:
Each row in ALL_ARGUMENTS represents one parameter (or one sub-element of a parameter, if it's a composite) of one overloading of one program name within a package (or standalone procedure/function).
In other words:
where "breakout" is the individual ALL_ARGUMENTS row that breaks out a parameter into sub-elements, such as fields of a record or attributes of an object type.
That looks useful. What kind of code do I have to write to implement such a hierarchy? I'll approach this in two phases:
1. Get comfortable with the technology.
2. Design what I need for OverloadCheck.
Why the "get comfortable" step? I have found that when I start working with new features, especially more complicated ones, it really pays to play around with it; get a feel for the syntax; uncover quirks, bugs, and undocumented features; and identify some best practices that would make sense when working with the structures. When I have a comfortable working relationship with, in this case, multi-level collections, I can work with a higher level of confidence in my relatively complex, real application. I will also be much more likely to avoid novice bugs and bad design decisions.
Getting Comfortable with Multi-level Collections
I'm going to take a detour from OverloadCheck and write a utility that tests my theory about reorganizing ALL_ARGUMENTS data into a multi-level collection. You can find the final code for this detour in the multilevel.sql file. This file creates a procedure called show_all_arguments, which takes as its single parameter the name of a PL/SQL code object (it can be a package or a stand-alone program) and then displays the ALL_ARGUMENTS information in two formats:
* The "straight" dump via a single-level collection that mirrors the data dictionary view
* A formatted display based on a multi-level collection representation of the same data.
Without going into all the steps of a top-design approach, here is the executable section of my procedure:
The process_name procedure relies on DBMS_UTILITY.NAME_RESOLVE to validate and parse the name. We won't go into the details of that here. Instead, let's move on to loading those arrays. Before I can load them, I have to set up the data structures (declare a set of TABLE TYPEs). Remember that I want to emulate this hierarchy:
Let's see what kind of code I would write to create a data structure. First of all, I have found that I pretty much need to work from the bottom up when designing such structures. So what does the "breakout" collection look like? Using the following TYPE statement should do it:
TYPE breakouts_t IS TABLE OF all_arguments%ROWTYPE
-- Sequential position within a parameter
INDEX BY BINARY_INTEGER;
In other words, each row in a collection of this type contains a single row of information from ALL_ARGUMENTS. The index will simply be a sequential number corresponding to the order in which the data is found in the view.
Now I need to create a second level of collections, this time corresponding to each parameter that appears in the program header:
TYPE parameters_t IS TABLE OF breakouts_t
-- Parameter position
INDEX BY BINARY_INTEGER;
Each row in this collection contains a set of all breakouts for that parameter, in which the key or row number in the collection reflects the position in the parameter list. If the datatype of the parameter is a scalar, like a NUMBER, then there is really a one-to-one correspondence. There is just a single row, in other words, in the collection of type breakouts_t. If the datatype of the third parameter is, on the other hand, a record with four scalar fields, then the breakouts collection for row 3 of the parameters collection will contain six (surprise!) rows: one for the parameter, one for the record, and four for the fields.
Next I need to define a structure that contains all the parameter information for a specific overloading. How about this:
TYPE overloadings_t IS TABLE OF parameters_t
-- Index by overloading #
INDEX BY BINARY_INTEGER;
Each row in this collection is a collection of parameters, whose rows in turn contain collections of breakouts. Are you following this so far?
OK, just one more level. (I don't need a collection for the package itself; I concern myself only with the overloadings and arguments for a single package at a time.) Now I need a collection to hold all the overloadings for each distinct program name:
TYPE programs_t IS TABLE OF overloadings_t
-- Index by program name
INDEX BY all_arguments.object_name%TYPE;
Notice that in this case, the index into the collection is the name of the program, which let's me take advantage of the new string-based indexing feature of Oracle9i Database Release 2.
With these collection types in place, I can declare the top-level collection that holds all of my structured ALL_ARGUMENTS information. I will actually, in show_all_arguments, declare two different collections: a collection that holds the direct dump of ALL_ARGUMENTS and a second that takes the multi-level approach.
-- Dump of ALL_ARGUMENTS
-- Multi-level sorting of ALL_ARGUMENTS data
And now I can load up my arrays with data from ALL_ARGUMENTS. Let's take a look at the difference in the kind of code I need to write for these different collections. (See Listing 1.)
With the multi-level collection loaded, I can now retrieve interesting information easily. Suppose, for example, that I need to get the list of parameters for the fourth overloading of the allargs_test.upd program. Let's compare the code I'd have to write for the different ways of storing the argument information. Assume for all cases that I have declared a collection as follows:
1. Retrieve the information directly from the view. This method is relatively straightforward, but inefficient, as it requires executing queries like the following repeatedly:
BULK COLLECT INTO upd_4_parameters
WHERE owner = USER
AND package_name = 'ALLARGS_TEST'
AND object_name = 'UPD'
AND overload = 4
ORDER BY SEQUENCE, POSITION, LEVEL;
2. Scan through the "flat" collection of ALL_ARGUMENTS data, picking out just the rows I want:
FOR argindx IN l_arguments.FIRST .. l_arguments.LAST
IF l_arguments (argindx).object_name = 'UPD'
AND l_arguments (argindx).overload = 4
upd_4_parameters (NVL (upd_4_parameters.LAST, 0) + 1) :=
3. Perform a single-row assignment to retrieve the data.
upd_4_parameters := l_programs ('UPD')(4);
Well, I know which approach I would prefer!
Now, suppose that I want to display the contents of the l_programs multi-level collection. This entails what is essentially a full table scan, working through multiple levels of the structure. The dump_multilevel_array procedure does just that. Let's take a look at the structure and flow of this program; it's a good example of the use of nested local procedures to organize one's code for maximum readability. (Note: the multilevel.sql file contains the full implementation, including calls to display output at each level, which I have removed to allow us to focus more closely on the logical flow. )
When working with nested programs, it's best to read the program from the bottom up or the inside out. To assist in this process, I like to add a comment in the form -- main after the BEGIN statement of the main executable section of a program. Then I can quickly find the starting point for my code analysis.
I search for "main dump_multilevel_array" and find myself looking at lines 46-51. You see here the familiar code used to scan through a collection from FIRST to LAST with the NEXT method; you can't tell by looking at these lines that the row number is in fact a string—the name of the program. For each row in this list of programs, I simply call show_overloadings to show all the information about the overloadings for that program.
Let's scoot up to show_overloadings by searching for "main show_overloadings." In Listing 2, Lines 36-43 show virtually the same logic as lines 46-51: scan through each row of this nested collection and call show_parameters to show all the parameters for that overloading. Repeating the same process, I find lines 27-33 cruising through the parameters list and showing the breakouts for each parameter. And, finally, on lines 20-24, I display the argument data.
Listing 3 shows the output from show_all_arguments (with a temporarily modified version of noparms2 to include a third argument that's a PL/SQL record type). As you can see, the hidden hierarchy of ALL_ARGUMENTS data becomes exposed and transparently available to developers by assigning those records to the quadruply-nested collection, l_programs.
With this fundamental understanding of how I can convert flat data into highly-structured and accessible information in a multi-level collection, let's move on to the construction of the actual multi-level collection I need for OverloadCheck. It's a bit more complicated than what I used in show_all_arguments.
The Actual Multi-level Collection that OverloadCheck Uses
While the basic hierarchy reflected in the programs_t-overloadings_t-parameters_t-breakouts_t configuration is fine, reality does intrude a bit. One problem with the way I defined these collection types is that each contains only another collection as its datatype. This means that I'm completely unable to keep track of other information that might be useful at each of these levels.
Looking back over my research into ALL_ARGUMENTS and the requirements that were established by my top-down design of OverloadCheck, I can construct the following table of additional information that I need:
Entity Information I need about that entity
* Is it overloaded?
* If it is a function, what is its return clause? This information is quite different in nature from the parameter list. Let's keep it separate.
* What is the last non-defaulted parameter? I'll need to know this to determine the various valid invocations of the overloading.
* What are the characteristics (name, datatype, etc) of this top-level parameter? This information is obtainable from the breakouts_t level, but getting it means writing cumbersome code. Better to identify this information as qualitatively different from sub-element breakouts and store it separately.
* I don't need anything more, at least not at this stage of my work on OverloadCheck.
Now, as it turns out, I can determine if a program is overloaded simply by obtaining the COUNT of rows in the program's overloading collection (if there's more than one row, the program is overloaded), so I don't need to change that collection definition. In the other cases, though, I need to create an intermediate structure that can hold additional information and serve as the type for my collection. A PL/SQL record TYPE offers the perfect solution. Let's see how this changes things. As I said, the highest level in the collection hierarchy will stay the same, as in:
CREATE OR REPLACE PACKAGE cc_smartargs
-- All distinct programs in a package/program
TYPE programs_t IS TABLE OF overloadings_t
INDEX BY all_arguments.object_name%type;
However, my overloadings collection will rely on a record, so I can add some value:
TYPE overloadings_t IS TABLE OF one_overloading_rt
INDEX BY /* overloading # - 0 if not overloaded */
And what does my overloading record look like? Very simply:
-- Overloadings for a single program name
TYPE one_overloading_rt IS RECORD (
The first field of the record contains the full set of parameters, which is actually the third level in my collection hierarchy; I'll define that next. The second field contains all the information about the return clause (if this is a function), which is in itself a single parameter (top-level plus breakouts). So, I'll rely on the same record TYPE, which I'm about to define, to represent such data. The third field is the position of the last non-defaulted parameter in the program's header.
So what does an array of parameter information look like? How about this:
TYPE parameters_tt IS TABLE OF one_parameter_rt
INDEX BY /* Sequential position in parameter list */
Each row is a record containing all the information needed about a single parameter that appears in the program's header. I need a record TYPE because I want to keep track separately of that top-level information and the breakouts:
TYPE one_parameter_rt IS RECORD (
Notice that I have switched gears and am relying on data structures from the cc_arguments package (still to be built!). Why is that? I have defined that package to contain all the native information about data from ALL_ARGUMENTS. At this point in my data structures, I have gotten down to bedrock as far as cc_smartargs is concerned. Everything else that I need is basic argument data and so it's time to fall back on the lower-level package. It's worth noting, however, that I don't know what the structure of these cc_arguments TYPES are. I just figure that I'll need a record to hold all information about a single row of ALL_ARGUMENTS/DBMS_DESCRIBE data (so it likely cannot be a %ROWTYPE but must be a user-defined TYPE), and I will need a collection TYPE to hold all the breakouts for a single parameter.
Note: as you will see by glancing at the package specification for cc_smartargs, I presented the data structures above in precisely the opposite order from how they appear (and must appear) in the code. I went from top to bottom to explain the structures. But when declaring such TYPEs, you have to proceed from the bottom to the top—otherwise the compiler will not be able to resolve the references.
Adding Smarts to the Argument Information: Part 2