Chapter 6. Conclusion
Summary of the Study
Subject of the Study and Hypothesis
In this thesis I proposed a new code generation approach to developing data-intensive web-based applications. After examining the concept of specifying and automatically generating part of the application’s code, I concluded that the application’s data layer is the optimal candidate for abstraction, specification and automatic generation.
I hypothesized that it was possible to build a code generator which would significantly improve development of data-intensive web-based applications by generating at least 50% of the data access code based on a specification of the application’s data model. My main argument was that the application’s data model is sufficient for deriving most of the data access functionality. To test this hypothesis I built a code generator and tested it on several applications.
Methodology Overview
The methodology of this study consisted of several parts. First, I studied the target application architecture to determine the most common data access requirements. Next, I analyzed the data access code to determine which parts of it could be generated. This covered both database-level code, such as the database tables and stored procedures, and application-level code, which serves as a bridge between the database and the application’s business logic and presentation layers.
My final step was to design the experiment, which comprised two groups of tasks. The implementation tasks dealt with describing the data model, defining rules for data access methods and constructing the code generator. The testing tasks dealt with applying the implemented system to generate code for real applications and measuring the results of using this approach.
Findings
After designing a data definition language, formulating a set of rules for deriving data access operations from the application’s data model, and implementing the code generator, I tested my approach on three “real world” applications, which differed in their degree of complexity.
The results of testing my approach were twofold. On the one hand, the approach proved effective in the sense that 84% - 99% of the data access code was generated automatically, which supported the study’s hypothesis. On the other hand, only 20% - 35% of the generated code was actually used by the application.
The Balance Between Simplicity and Flexibility
The main lesson I learned from this experiment is that designing a code generation system involves a trade-off between a flexible, yet complex system and a rigid, yet simple one.
I took the complex route, building a system which generated a wide array of data access methods, deriving them from the application’s data model. As a result, the generated code contained most of the required data access functionality, which proved to be a significant improvement in the development process. However, a significant part of that code (65% - 80%) remained unused, making the application’s code harder to understand and navigate. Admittedly, unused generated code is a commonly accepted issue, and such code is often treated as necessary “boilerplate”. And yet the sheer amount of unused code, as well as the added complexity of describing data retrieval rules in the data model, suggests looking for a better approach.
An optimal solution would offer the same concept of generating most of the data access methods (84% - 99%, according to the results of this study) without explicitly describing them, while preserving a balance between the simplicity of the generated data layer and the flexibility of the modeling approach. This balance might be achieved by a simple change to the modeling approach, described in the previous chapter: the introduction of data views.
Defining a set of data views for each data object might solve two major problems. First, it would provide a clean and simple way of describing how data is to be presented to the user, which is an important improvement in terms of separating the description of the structure of the application’s data from the way this data is displayed in different contexts. Second, a data view has the potential to solve the main issue revealed by this study: by assigning data views to a data object, we explicitly specify what types of data retrieval operations are required for each object, thus ensuring that only the required data retrieval methods are generated. Considering that data retrieval methods (as opposed to data access methods in general) are the primary source of unused code, limiting these methods to those actually required by the application could potentially solve the efficiency problem.
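The effect of data views on the amount of generated code can be illustrated with a minimal sketch. It is not the generator built for this study: the DataObject and DataView classes, the method naming scheme, and the exhaustive enumeration of filter combinations are all assumptions made for illustration, standing in for a rule set that derives many retrieval methods from the data model alone.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Tuple


@dataclass
class DataView:
    """Hypothetical data view: the fields shown and the fields used to filter."""
    name: str
    fields: Tuple[str, ...]
    filter_by: Tuple[str, ...] = ()


@dataclass
class DataObject:
    """Hypothetical data object: a name, its fields, and its assigned views."""
    name: str
    fields: Tuple[str, ...]
    views: List[DataView] = field(default_factory=list)


def exhaustive_retrieval_methods(obj: DataObject) -> List[str]:
    """Name one retrieval method per non-empty combination of fields.

    Illustrative stand-in for deriving retrieval methods from the model alone.
    """
    names = []
    for size in range(1, len(obj.fields) + 1):
        for combo in combinations(obj.fields, size):
            names.append(f"get_{obj.name}_by_" + "_and_".join(combo))
    return names


def view_driven_retrieval_methods(obj: DataObject) -> List[str]:
    """Name one retrieval method per data view assigned to the object."""
    names = []
    for view in obj.views:
        suffix = "_by_" + "_and_".join(view.filter_by) if view.filter_by else ""
        names.append(f"get_{obj.name}_{view.name}{suffix}")
    return names


# Hypothetical example object with three fields and two views.
order = DataObject(
    name="order",
    fields=("id", "customer_id", "status"),
    views=[
        DataView("summary", ("id", "status"), filter_by=("customer_id",)),
        DataView("detail", ("id", "customer_id", "status"), filter_by=("id",)),
    ],
)

print(len(exhaustive_retrieval_methods(order)))  # 7 candidate methods
print(view_driven_retrieval_methods(order))      # 2 methods, one per view
```

In this sketch the exhaustive rule yields seven candidate methods for a three-field object, while the view-driven rule yields only the two that the views call for. Whether real applications show a comparable reduction is the open question raised by this study.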
Possibilities for Further Research
There are multiple possibilities for future research in this area, some of which I have mentioned in the last section of Chapter 5. The main goal of future research would be to identify ways to make this model of code generation more efficient. In this respect, an interesting direction of research would be to examine existing applications and try to identify patterns in their usage of data retrieval methods. If strong patterns are identified, they may help in designing a more efficient way of deriving data access methods from the application’s data model.
Other improvements to the code generation system may include generating intermediate code, as well as the use of code templates. The combination of these approaches would make the system very flexible on two levels: (1) code templates would make it possible to change the target application’s architecture without modifying the code generator; (2) an intermediate code representation would provide a mechanism for generating the final code in different implementation languages, again without modifying the code generator.
In conclusion, I acknowledge that, although the study’s hypothesis was supported, the experiment showed the suggested code generation solution to be not very efficient, due to the considerable amount of unused code. Nevertheless, I maintain that the suggested approach has potential, which may be fully realized by finding the optimal balance between simplicity and flexibility through further investigation.
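The two levels of flexibility can be sketched as follows. This is only an illustration under stated assumptions: the RetrievalOp record plays the role of the intermediate representation, the TEMPLATES table plays the role of the code templates, and the db.query call inside the emitted code is a hypothetical API of the target application. None of these names come from the system built for this study.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class RetrievalOp:
    """Hypothetical language-neutral description of one retrieval method."""
    entity: str
    key: str


# Code templates, one per target language. Editing a template changes the
# emitted code without touching the render() function below.
TEMPLATES = {
    "python": Template(
        "def get_${entity}_by_${key}(db, ${key}):\n"
        "    return db.query('${entity}', ${key}=${key})\n"
    ),
    "java": Template(
        "public ${Entity} get${Entity}By${Key}(Db db, Object ${key}) {\n"
        "    return db.query(\"${entity}\", \"${key}\", ${key});\n"
        "}\n"
    ),
}


def render(op: RetrievalOp, target: str) -> str:
    """Render one intermediate operation with the template for `target`."""
    return TEMPLATES[target].substitute(
        entity=op.entity,
        key=op.key,
        Entity=op.entity.capitalize(),
        Key=op.key.capitalize(),
    )


op = RetrievalOp(entity="order", key="id")
print(render(op, "python"))
print(render(op, "java"))
```

The same RetrievalOp is rendered for two target languages, and supporting a third language or a different target architecture would mean adding or editing a template rather than changing render().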