Research > MS Thesis > Results

Chapter 4. Results

In this chapter, I will describe the results of carrying out the implementation tasks, dealing with describing the data model, defining rules for data access methods and constructing the code generator; and testing tasks, dealing with applying the implemented system to generating the code for real applications and measuring the results of using this approach.

Data Model Specification

Introduction to Basic XML Syntax

The data definition language I designed uses XML syntax and serves for describing the application’s data model, including some additional features which allow the specification of data access details. The application is described in one file, referred to as the schema, where the application’s data model is represented as a tree. To describe the structure of this tree, I will use XML tree concepts, such as element, child and parent elements, and element attributes. Figure 4 contains an example of such a tree:

Figure 4. XML Tree Syntax.

There are three elements in this tree: ElementA, ElementB, ElementC. ElementA has two children: ElementB and ElementC, both of which have ElementA as their parent element. ElementA also has two attributes: attribute1 and attrbute2, whose respective values are “abc” and “xyz.” For brevity, I will refer to these concepts as element, parent, child, and attribute.

Structure of the Schema

The structure of an application schema is displayed in Figure 5. The indents signify parent-child relationships between elements. The text in parenthesis specifies the number of the element’s occurrences. If the element has no children, it has a text value, which is a string, or is specified by a list of allowed values. The complete description of this data definition language is available in Appendix C in XML Schema format.

The tree’s root element is the application element, which contains one or more namespace elements. A namespace element represents a group of data objects: a single application may have more than one namespaces. Each namespace node contains one name element, which is the namespace’s name, and one or more Class elements. Starting with the Class element’s tree level, the schema becomes rather complicated, which is why I will provide a more detailed description of those elements.

Figure 5. Application Schema Structure.

Class element and its children. The Class element represents a data object. Each Class element contains one Name element, one Type element, one Table element, and zero or more AdditionalSproc elements. The current code generator supports classes of four types:

  • a record class represents a data object with all standard data operations allowed;
  • a readonly class is a data object which cannot be created, modified or deleted;
  • a final class is a data object which cannot be modified after creation. For example, a data object dealing with activity logs, placed orders, etc;
  • a link class is a link between two data objects.

Table element. The Table element holds the collection of the data object’s attributes (Field elements). It also contains additional information about the table, such as whether the table is external or not. External tables are declared, but no fields can be specified: these tables represent data from external data sources (a common situation in a “real world” application), so they are ignored by the code generator. Nevertheless, they must be defined, since generated data access methods might refer to them.

Field element and its children. This element represents an attribute of a data object. The element’s children specify various information, used by the code generator. They also specify additional information and criteria used by the application’s presentation layer to display the field’s data.

AdditionalField element and its children. This element represents an attribute added to a data object during data retrieval operations. For example, retrieving records from a “user” table might require constructing an additional “full name” field on the fly by combining the “first name” and “last name” fields. Like with a regular field element, the element’s children specify additional information and criteria used by the data access code and the application’s presentation layer to retrieve and display this data.

AdditionalSproc element and its children. These elements represent custom data access methods, which must be manually defined. The stored procedures are created manually; their definition in the schema signals to the code generator to generate methods exposing these custom stored procedures to the application’s business logic layer.

Consider the following basic example: we have a blog, which consists of posts; a post can be made by any user; a user can post, modify, or delete blog posts based on their permissions, which are hard-coded into the system. Besides, a user may have multiple permissions. Figure 6 contains part of the schema for such an application.

Figure 6. Sample Application Schema.

This example demonstrates how data objects and relationships between them are described in this data definition language. The full version of this basic example (although the non-essential parts are not included) is available in Appendix D.

The described data definition language is enough for making a list of the required data access methods and generating the code for the data layer – which will be discussed in the next two sections.

Defining Common Data Access Methods

Defining the rules for generating data access methods is the underlying concept in this study. Instead of specifying each method explicitly, I provide rules for generating the most commonly used methods automatically, based on the data model alone. The methods can be grouped into the following types: (1) adding a record, (2) modifying a record, (3) reading a record, (4) deleting a record, and (5) reading a collection of records based on some criteria. In all of these cases, a record may represent an entity (a data object) or a relationship (a data object link).

There were four main problems I had to solve to implement this solution:

  1. dealing with attributes which are generated or updated automatically;
  2. dealing with weak entities;
  3. specifying different sets of fields when retrieving collections of records.
  4. defining the set of data access methods.

I solved the first problem by adding the “readonly type” child element to the field element in my data definition language. Fields marked as “readonly” are treated in a special way, based on the “type” attribute of this element. For example, if a field is marked “created,” its value is generated automatically only once, if it is marked “modified,” its value is generated each time the record is modified.

I solved the second problem by adding a required parameter to the delete method, which determines whether all child entities should be deleted as well. This is a partial solution though, for only part of the child entities might be weak entities, whereas others might be strong entities, which should be preserved even after their parent is deleted. For example, deleting an order record, containing data on the customer, payment and delivery information, should automatically remove all the order-related records, such as sub-orders, containing data on each ordered item (like price and quantity). However, deleting this order record should not delete the customer referenced in it. Therefore, the solution to complex cases is to overwrite the delete method’s default implementation.

The third problem – determining what fields to include in the set when retrieving collections of records – was solved in a way similar to the first problem’s solution. Instead of designing a modeling approach like WebML’s Composition Model (Ceri et al., 2000), I added several additional child elements to the Field element in my data definition language:

  • The ExcludeFromTable element specified whether a field was to be retrieved in a collection of records or not (the default value being “false”). For example, marking this element as “true” for a large text field ensures that the field is never included when a collection of records is retrieved.
  • The IncludeWithParentTable element specified whether a field was to be included in a “join” operation, when two or more tables are joined (the default value being “false”). For example, if table A has a relationship with table B, retrieving a record from A will include all fields from B, which have this element marked as “true.” Consider retrieving a “user” record, which has a field “role Id,” which is a reference to the “role” table, with the field “name” marked as “IncludeWithParentTable.” In that case, the retrieved user record will always include the name of the user’s role.
  • The IncludeInList element specifies whether the field is included when the minimum set of fields is retrieved (the default value being “false”). For example, displaying a list of records in a drop-down box requires only the key field and the field to be displayed. A “user” record might include many fields, but marking only the “name” field as “IncludeInList” (the primary key is retrieved by default), the retrieved collection would include only the unique identifier and the name – which is exactly what is needed.

In order to design the rules for generating data access methods, I decomposed the data access functionality into the following logical groups:

  • Instance-related data object functionality. That includes retrieving or updating a record – i.e., methods accessing the fields of a single record.
  • Non-instance-related data object functionality. This group includes creating and deleting a record and retrieving records with two different sets of fields (all-inclusive – GetRecords and minimized – GetList) based on criteria related to the record’s fields.
  • Non-instance-related data object functionality for each one-to-many relationship. Same as the previous group, except the criteria is the related record – therefore, a set of these methods must be created for each relationship of this type.
  • Non-instance-related data object functionality for each many-to-many relationship. Same as above, except the relation type is different, which results in a different implementation.
  • Non-instance-related functionality for each many-to-many relationship. These methods do not belong to any data object: they provide functionality required by data object links.

The complete list of these methods is provided in Appendix E. Certainly, this is not an exhaustive listing of all possible data access operations; however, my assumption is that these methods are the most commonly used.

Code Generator Implementation

The code generator used in this study was implemented in c# on the .Net platform. It generates SQL code for the database-level part of the code, and c# or VB.Net for the application-level code.

There are two main approaches to code generation, often referred to as passive and active. The passive approach implies generating code only once (or re-generating it each time a modification is required). The active approach includes the option to automatically update previously generated and manually edited code. My code generator is a combination of both approaches.

The application-level code is generated using the passive approach: the generator produces a set of classes, which contain the default data access methods. If a new method is required, or an existing method must be altered, it is done by creating a subclass, which inherits all the functionality of the generated parent class, and may add methods or override existing methods. This has proved to be a viable solution. Consider the following example: the Foo class represents the Foo data object and contains methods getA and getB. We need to add a custom method validateFoo – which we add to Foo. In the next development iteration, we add a new relationship to our data model, which results in new data access functionality – in particular, the Foo class must now contain a getC method. Either we regenerate Foo – and lose validateFoo, or we have to find and manually edit all the occurrences of the Foo data object. Both solutions are unacceptable. Instead, we create a FooBar class, which extends Foo, and add the validateFoo method to the new class. In this case, we can regenerate Foo anytime, with all our changes to FooBar remaining in tact.

The database-level code is generated using both approaches. The stored procedures follow the a pattern similar to the application-level code: there are automatically generated procedures, which are re-generated each time the generator executes, and there are custom procedures, which are not affected by the generator. However, the tables and their structure are automatically updated.

The code generator accepts as input a file with the description of the application and processes it in the following steps:

  1. A Parser object is responsible for parsing the input and generating an parse tree. The parser is also responsible for validating the syntax and structural integrity of the schema in the input file. The objects constituting the application’s abstract syntax contain detailed validation rules for each part of the application, such as checking that field lengths do not exceed their maximum values, that the data types are database-compatible, etc.
  2. A SchemaValidator object is responsible for checking the application schema as a whole, which includes guarding against duplicate class names, duplicate primary keys, maintaining correct references in foreign key descriptors, etc.
  3. A SchemaDatabaseLoader object creates a Database object based on the schema file – which is an abstract model of the database part of the application. An SqlDatabaseLoader connects to the application’s database and does the same based on the schema retrieved from the database. The two abstract databases are compared by a DatabaseComparer object, which insures that the two schemas are compatible (for example, the data type of an existing field cannot be changed to an incompatible data type: a string cannot be converted to an integer, for that might result in loss of data). The DatabaseComparer object exposes several collections, including tables to create, tables to delete, tables to modify, constraints to create, etc., which are then accessed by objects responsible for generating the actual code.
  4. A DatabaseHelper object takes the DatabaseComparer as input, generates all the database-level code, connects to the database and updates it based on the data provided by the DatabaseComparer.
  5. An ApplicationLoader object takes the parse tree as input and creates an abstract syntax tree, which is an abstract model of the application. This object is passed on to several objects, which generate the actual code.

Measuring Results

The data access code for all three applications, described in the previous chapter, was generated successfully. Appendix F contains the schema for the Witness Identification application. The two other application schemas are not included due to their size.

Table 1 displays the scope of each application and the amount of generated code.

Table 1. Amount of Generated Code.

To estimate the effectiveness of my code generation approach, I measured what part of the application’s data access code was generated, which is demonstrated in Table 2.

Table 2. Effectiveness of Code Generation Approach.

To estimate the efficiency of my code generation approach, I measured what part of the generated data access code was used in the application, which is demonstrated in Table 3.

Table 3. Efficiency of Code Generation Approach.

 
contact me
blog
research
about