Faster Than The Times: Data Modeling

I decided to post some articles I've been working on. They're pretty cool.

Data modeling: the words bring chills to some and visions of 14 hour days to others but with the tools available today such as UML (Unified Modeling Language) XML (eXtensible Markup Language) and CIM (Common Information Model) modeling your data becomes a matter of taking the customer requirements and matching class structures and interfaces to the data types that are necessary. Unlike the old days where a brute force model would work today’s software needs a more structured approach. With the world a few processor generations from “the gang of four” and Managed .Net as the “Center of the Windows Universe” abstracted components are the new keyword for flexible, extensible and secure code.

Properly modeling a consistent UI\Program flow is in and of itself an “evolutionary process” so the term data modeling even means different things to different people. This makes standardizing modeling methodologies even more difficult. There is also the difference between modeling an existing feature set in a new way and modeling a feature set designed from scratch. In this text the term means “developing features such that they can be accessed from multiple sources, from the Native UI to Collaborative services to testing harnesses.” The old paradigm was to gather what needed to be done and just write functions that did it and perhaps handing off UI duties to someone else. Today with feature sets and customer requirements for collaboration and interoperability growing exponentially an object oriented approach is needed to not only limit the amount of code necessary to implement feature sets and make them accessible between app domains but to make the UI easy to use and update.

.Net was designed with these issues in mind and does an excellent job of abstracting data objects and unifying Windows programs under a memory-managed platform; especially in the 2.0 version of the Framework. Of course it can’t handle every case without extension so an effort is needed so that complex objects such as Network Streams become much smaller sub-objects. Such a model might include; port, machine name\IP, permissions, headers, data streams so rather than trying to determine all of the ways you can use a Network Stream you can create XML Schema based scripts to combine the different sub-objects into platform or application specific descriptions. For example, only one feature needs to access the port and machine name while another can process the permissions. Another feature would then decode the headers and stream to determine further processing requirements. Another feature could be used for encryption of returning data streams to add an extra layer of protection. Depending upon the data, security and speed\concurrency needs any one of the accepted patterns, such as State or Strategy can be used to extend the initial program flow. .Net provides native encryption and compression algorithms for text streams and binary streams through the BinaryFormatter so custom strategies are rarely needed for those services.

Network Stream Object
Client
Server
Request

Response

Permission
Port

PC
Name/IP
Connection

Headers
DataStreams
DataStreams
DataStreams
Processing
Input Validation

In this model the 3rd party external client (EC) can then be any module on any machine in a domain or even on the Internet. The server is contacted by the client with a port number and machine name. Since Managed code enables programmers to use declarative security and Windows authentication much of the security overhead can be encapsulated in the calling thread of the client app domain. This abstraction also means that the client has to have access to the pre-compiled server code. The client can be extended to contain the interface for any 3rd party clients that need to have access to the server code. The server code is the middle tier of the abstraction and is needed by any 3rd party client. This also allows for a client\server interface between the entire Network Stream object as described above and any 3rd party tools. As long as the public portion of the request\response feature of the client remains consistent it is possible for the 3rd party client to extend to do more internal processing without requiring new feature requests.

Because of the encapsulation in the server feature permissions need to be correctly applied to the object space before the connection is even attempted. This type of abstraction also allows each property of the sub-objects to be independent of the others so you only need create one Permission object, one Validation object, and one Connection object for the application space. Since the data stream can be any type of .Net stream, this model allows the developer to use one object (class) for most data types since they can be copied to a stream with customizable headers. By creating a header template lookup, several different complex object types can be returned and decoded by the internal client response feature. The 3rd party client is then totally separated from the internal logic of the model. Only the public features in the internal client are exposed and since the data types are known by the 3rd party developer the objects can be extended for application or platform specific needs. This is especially useful for the ever-increasing amount of internet applications. By combining header templates with overloaded method types, all types of database info can be encapsulated between the server and connection spaces while also allowing for local\remote file access between app domains and physical networks.

With the .Net paradigm each of the sub-objects in the model becomes a class under the same namespace (DataAccess). Encapsulation allows that each of these classes can contain smaller objects which handle a part of the processing. This layered (n-tiered) approach means that different clients can access different parts of the model without having access to any other. That is the function of the request feature within the client (DataAccess.Client) space. It sits between the server (DataAccess.Server) and the 3rd party interface. By merely providing multiple overloads for request types, it is possible to control access to any data stream through any connection. The request feature also works in conjunction with the response feature to encode and decode as necessary while verifying thread identities for large numbers of concurrent users.

The server feature formats and forwards requests to the connection feature after validating input parameters from the request feature. This allows that the client request feature has no access to permissions, meaning that the server is isolated from any 3rd party requests. Since all of the methods in the server feature are internal, all calls to the connection feature must be routed first through the public request feature and then be approved by the server. By using a strong name object for each request, high levels of concurrency can be achieved while maintaining data integrity for each request. The server uses a queue to manage requests and responses. This queue contains request-specific information for thread coordination with the client request\response feature.

The connection feature is the final segment of code and is responsible for processing the headers in the request and retrieving the data stream from storage or creating a new entry. The data can then be encrypted for return. By defining your requests with text scripts it is then possible to have requests come from multiple sources including the Internet for easy transfer. In this feature the emphasis is placed on speed rather than security. This allows optimization of this module without affecting the security of the response feature. This feature handles any external storage interfaces such as SQL databases or XML files by simply overloading access methods based on the header processing. This model can be easily extended or adapted to handle different types of application models. This feature is the most complicated since it has to be coordinated with the design of the storage medium. In the case of databases the developer needs to work well with queries and stored procedures while a file system access application needs to handle NTFS well and some apps need to deal with both while handling transaction concurrency.

The key to this type of model is that “most” usable patterns have already been discovered and can be extended as necessary. Most of these patterns are based on the common File, View Edit, Tools, Options, Help environment (The standard Windows Menu\UI paradigm). Of course, it is never a good idea to try and write initially to a pattern, since the differences in application features and requirements mean that in one case a State pattern may be more efficient than a Strategy or Factory pattern for two apps that perform similar functions. When modeling data for consumption and display the key is to remember that any data can be described using a combination of native .Net types and that the description of the data is ALWAYS more important than the features that use it. In many cases personal or financial information is consumed and must be protected by the interface. By ensuring first and foremost that the data remains consistent throughout the process refactoring will then be useful for optimization. The feature set will then expand as testing of current features continues. This is known as an evolutionary design cycle. It means in essence that you should always keep your code simple and always design your features with testing in mind. Some people consider this method to be “designing to the interface and not the implementation.” Another way of saying this is the user doesn’t need to know the details only the data. For any object space overloading the public entry points enables different types and amounts of data to be processed by the same internal server. By keeping with the abstracted component methodology, you will avoid creating complex methods that don’t allow for high levels of granularity with your object space.

Tools such as NUnit (www.nunit.org) give developers a way to test their features individually or as a live client. Script languages based on XML schemas or UML are much more efficient because they have no code overhead. The same parser that is used for the client scripts can also be extended to include test parameters and environment settings. With tools such as NUnit you need to adhere to the format set aside which sometimes causes increases in the amount of code necessary to determine success of a given test case. It is of course possible to plan for using these types of tools through a script\parser interface but again the idea of modeling is to limit the amount of code you have to write and maintain. Component-based scripting does this and more. It enables “cut and paste” editing, ease of storage, no need to recompile to add new requests. Of course adding features for processing the data in scripts requires new code and schema elements but this type of model means that new features are separated from existing features and lessen the chance of regression failures. This abstraction also enables you to make tools based on a subset of features; such as, setting up initial environments, creating database tables, create web pages using XML\XSLT, or viewing XML documents. All that is needed is a custom client space. Below are listed the basic data objects necessary in each object space of the model. These are determined by either writing out a paragraph or two describing the necessary functionality or the data that needs to be exchanged. The .Net Library 2.0 contains advances in C# such as anonymous methods, which allow “inline” delegates; iterators, which add the “yield return” and “yield break” methods to reduce amount of code necessary for base collections; nullable types, which allows value types to assign null to the type instance; generics, which allow templated base classes for collections of any .Net object type. Look for coverage of these new features, coming soon.

Client Data Objects

HttpWebRequest
HttpWebResponse
WebRequest
WebResponse
XmlDocIn
XmlDocOut
EncryptionKey
EncryptAlgorithm
RequestQueue
MemoryStream
SecurityPrincipal
ThreadPrincipal
RequestType – complex

Server Data Objects

PortNumber
IPAddress
IOPermissions - ACL
WebPermissions - SSL
XMLParser
HeaderLookup
ValidationRegularExpressions
WebService

Connection Data Objects

HeaderBlock
AccessPermissions – ACL\Thread
EncryptionAgorithm
EncryptionKey
NetworkStream
FileStream
XMLFactory

Saturday, December 24, 2005

Data Modeling

No comments:

Faster Than The Times

About Me

Followers

Blog Archive

Links