Persistence: internal workings

A persistence mechanism keeps information between user sessions so that it can be reused in a later session. Applied to the object-oriented domain, persistence solutions try to make objects survive across sessions as if they were still in-memory objects.

In most programs, data persistence is done directly through SQL queries to a relational database. This is what Martin Fowler calls a Transaction Script 1). This approach causes multiple problems: code repetition, dependency on the storage engine, code that is hard to maintain, and so on.

On the other hand, Domain Model driven architectures offer good prospects: the code is more reusable, and the programmer focuses on the essentials and works directly with the domain vocabulary.

To overcome the problems brought by direct database calls and benefit from the domain model approach, we need a solution that lets us persist domain objects through an object-oriented API independent of the storage engine. Moreover, domain model declarations should not depend on this API, so that they remain reusable.

At the time of writing, the PHP world does not yet provide a persistence layer that encourages domain-driven design and lets you write reusable code. Active Record based solutions are widely used by PHP programmers, but this pattern couples the model tightly to the database structure, and current implementations are not designed for future refactoring, since you need to extend a proprietary base class to declare your model.

These are some of the reasons why the Spiral team has chosen to develop a domain-driven-friendly persistence layer based on the Data Mapper design pattern, and more precisely on its Repository form.

Persistence patterns are a subject of much discussion in the PHP world. This may be because the object-oriented approach is relatively new in the language, which is still torn between Java-like full object orientation and the classic quick scripting approach.

In this context, the Spiral persistence layer implementation is obviously biased by our vision of object-oriented programming, and more specifically of domain-driven architectures. We neither consider our implementation the best nor think we have the ultimate solution to all problems; it is open to modification, improvement, and correction. If you are interested in the subject and want to contribute, please contact the Spiral project community. We will be happy to hear your opinion.

Persistence in Spiral: Make your own facades

To use the persistence tools in Spiral, you are encouraged to develop your own repository for each root entity of your domain model. This work may seem painful and useless, but all your repositories can be based on the Spiral Object Repository API. This API aims to provide a simple interface to persist objects independently of the underlying storage engine, and most of the time a repository method takes two lines of code to implement. See the manual for documentation and examples on creating custom repositories.

Well, if the Object Repository is so easy to use, why not use it directly to persist my objects?

Direct calls to the Spiral Object Repository from the service layer of your application make your MVC controllers explicitly dependent on the Object Repository API. This is not forbidden, but be aware that your service controllers will then not be reusable in another context, or if you later want to change the persistence layer.

That is why we encourage you to declare your own repository each time you need to persist entities. In addition, your custom repository behaves as a facade over the Object Repository and so factors your code into reusable methods. Finally, if for some reason you want to write direct data access, it is easy to create a new implementation of your own repository that sends SQL queries directly to the database rather than using the Object Repository API.
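As an illustration of the facade idea, here is a minimal sketch, assuming PHP 7.4 or later. The `ObjectRepository` interface, its `add()` and `get()` methods, and every class name below are hypothetical stand-ins and not the actual Spiral API; the in-memory implementation exists only so that the example runs.

```php
<?php
// Hypothetical stand-in for the Spiral Object Repository API; the real
// interface and its method names may differ.
interface ObjectRepository
{
    public function add(object $object): string; // returns the OID
    public function get(string $oid): ?object;
}

// Trivial in-memory implementation, only here so the sketch can run.
final class InMemoryObjectRepository implements ObjectRepository
{
    /** @var array<string, object> */
    private array $objects = [];

    public function add(object $object): string
    {
        $oid = spl_object_hash($object);
        $this->objects[$oid] = $object;
        return $oid;
    }

    public function get(string $oid): ?object
    {
        return $this->objects[$oid] ?? null;
    }
}

// Plain domain class: it extends nothing and knows nothing about persistence.
final class Customer
{
    public string $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

// The domain-facing contract that services depend on.
interface CustomerRepository
{
    public function save(Customer $customer): string;
    public function find(string $oid): ?Customer;
}

// Facade over the Object Repository: each method is one or two lines.
final class ObjectCustomerRepository implements CustomerRepository
{
    private ObjectRepository $objects;

    public function __construct(ObjectRepository $objects)
    {
        $this->objects = $objects;
    }

    public function save(Customer $customer): string
    {
        return $this->objects->add($customer);
    }

    public function find(string $oid): ?Customer
    {
        $found = $this->objects->get($oid);
        return $found instanceof Customer ? $found : null;
    }
}

$customers = new ObjectCustomerRepository(new InMemoryObjectRepository());
$oid = $customers->save(new Customer('Ada'));
echo $customers->find($oid)->name, PHP_EOL; // prints "Ada"
```

Because services only see `CustomerRepository`, a second implementation that sends SQL queries directly could replace `ObjectCustomerRepository` without touching them.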

If you wonder why it is easy to switch from one implementation to another with no repercussion on other classes, you should learn more about Inversion of Control and the Dependency Injection Container.

Spiral Object Repository API

Introduction

To simplify the implementation of repositories, an object-oriented persistence API has been developed: the Object Repository.

The Object Repository acts as an object-oriented database API. It provides methods to store and retrieve objects as if you were using a simple collection of objects, but with more advanced features that let you query objects filtered by criteria. See the section on querying objects further down for more information on this subject.

In principle, an implementation of this interface could be a simple adapter to an existing object-oriented database. It could also map to other storage techniques, such as native serialization stored in files or calls to web services.

Object/Relational Mapping

The basic implementation in Spiral is an Object/Relational Mapping (ORM) tool. This documentation only describes the internal workings of this ORM and does not explain how to use and configure it. For that information, read the manual.

Implementing an ORM tool raises many problems. To simplify them, we can break the work down into separate features. We want to be able to:

  1. Add a new object to the repository
  2. Retrieve an object from its OID
  3. Update an existing object
  4. Delete an existing object
  5. Query multiple objects by criteria

Other features could be imagined or added as well.
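The five features listed above can be pictured as one small interface. The sketch below assumes PHP 7.4 or later; the `ObjectStore` name, the method signatures, integer OIDs, and the use of a callable as criteria are all assumptions made for illustration, not the Spiral interface. The array-backed implementation replaces the ORM so that the example is self-contained.

```php
<?php
// Hypothetical method names and signatures; the actual interface may differ.
interface ObjectStore
{
    public function add(object $object): int;               // 1. add, returns the OID
    public function get(int $oid): ?object;                 // 2. retrieve from an OID
    public function update(int $oid, object $object): void; // 3. update
    public function delete(int $oid): void;                 // 4. delete
    /** @return object[] */
    public function query(callable $criteria): array;       // 5. query by criteria
}

// Array-backed implementation standing in for a real storage engine.
final class ArrayObjectStore implements ObjectStore
{
    /** @var array<int, object> */
    private array $objects = [];
    private int $nextOid = 1;

    public function add(object $object): int
    {
        $oid = $this->nextOid++;
        $this->objects[$oid] = $object;
        return $oid;
    }

    public function get(int $oid): ?object
    {
        return $this->objects[$oid] ?? null;
    }

    public function update(int $oid, object $object): void
    {
        if (!isset($this->objects[$oid])) {
            throw new OutOfBoundsException("Unknown OID $oid");
        }
        $this->objects[$oid] = $object;
    }

    public function delete(int $oid): void
    {
        unset($this->objects[$oid]);
    }

    public function query(callable $criteria): array
    {
        // Keep only the objects accepted by the criteria, reindexed from 0.
        return array_values(array_filter($this->objects, $criteria));
    }
}

$store = new ArrayObjectStore();
$a = $store->add((object) ['title' => 'Draft', 'published' => false]);
$b = $store->add((object) ['title' => 'News', 'published' => true]);
$store->delete($a);
$published = $store->query(fn (object $o): bool => $o->published);
echo count($published), ' ', $published[0]->title, PHP_EOL; // prints "1 News"
```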

Object representation layers

Our ORM implementation manages three object representation layers:

The 3 layers of object representation: Native objects, Meta objects and Persisted objects

During the writing process, native objects are transformed into a meta representation called meta objects: atomized versions of the native objects that are easier to use in a relational world. This transformation is done by the object introspector component.

These meta objects are then used to persist information in the relational database through the storage engine component.

When reading objects, data persisted in the database is retrieved by the storage engine component and transformed into meta objects. These meta objects can then be instantiated as in-memory objects, again by the object introspector.

Adding a new object to the Object Repository

A native instance has been given to the Object Repository, and we now have to store it in the database. This is not an easy problem to solve, since we want to be able to store native PHP instances that have in-memory relations to other objects and possibly more than one type (class or interface) through inheritance.

This problem is much like serialization: a complex object has to be transformed into a list of atomic elements (such as integers or strings) that characterize it well enough for later unserialization (re-creating the instance from these atomic elements).

It is the role of the Object Introspector to create a Meta Object containing all the atomic information that represents the native instance we want to store.

Since this process is totally independent of the storage engine, we have chosen to separate it from the data writing process. This separation simplifies the code of storage engine drivers, although we are aware that an additional layer often costs performance.

The Object Introspector can use different strategies to collect information from native instances, but we will not study these strategies here. You are encouraged to read the source code to learn more about the implementation details.

A Meta Object contains atomic information about the object:

  • the instantiation class of the native object
  • an array of attributes that have to be persisted (it is up to the Object Introspector to decide which ones and how to get them)

If the original instance contains relations to other objects, the Object Introspector has to add these objects to the Object Repository as well. The corresponding Meta Object attributes then contain only the Object IDs returned by the Object Repository for the added relations.
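The transformation from a native instance to a Meta Object can be sketched as follows, assuming PHP 7.4 or later. This is only one possible collection strategy (reading properties through a closure bound to the object's class), not necessarily one that Spiral uses. The `MetaObject` and `ClosureIntrospector` classes, the `$addRelated` callback standing in for the Object Repository, and the `oid-N` identifiers are hypothetical.

```php
<?php
// Hypothetical shape of a Meta Object: class name plus atomic attributes.
final class MetaObject
{
    public string $className;
    /** @var array<string, mixed> */
    public array $attributes;

    public function __construct(string $className, array $attributes)
    {
        $this->className = $className;
        $this->attributes = $attributes;
    }
}

final class ClosureIntrospector
{
    /** @var callable Stores a related object and returns its OID. */
    private $addRelated;

    public function __construct(callable $addRelated)
    {
        $this->addRelated = $addRelated;
    }

    public function toMetaObject(object $native): MetaObject
    {
        // Read the properties (private ones included) from inside the class scope.
        $read = Closure::bind(function (): array {
            return get_object_vars($this);
        }, $native, get_class($native));

        $attributes = [];
        foreach ($read() as $name => $value) {
            // A related object is stored separately and replaced by its OID.
            $attributes[$name] = is_object($value) ? ($this->addRelated)($value) : $value;
        }
        return new MetaObject(get_class($native), $attributes);
    }
}

final class Address
{
    private string $city;

    public function __construct(string $city)
    {
        $this->city = $city;
    }
}

final class Person
{
    private string $name;
    private Address $address;

    public function __construct(string $name, Address $address)
    {
        $this->name = $name;
        $this->address = $address;
    }
}

$stored = [];
$introspector = new ClosureIntrospector(function (object $related) use (&$stored): string {
    $oid = 'oid-' . (count($stored) + 1);
    $stored[$oid] = $related;
    return $oid;
});

$meta = $introspector->toMetaObject(new Person('Ada', new Address('London')));
echo $meta->className, ' ', json_encode($meta->attributes), PHP_EOL;
// prints: Person {"name":"Ada","address":"oid-1"}
```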

An Object ID, or OID, is a unique value that makes it easy to find and identify a persistent object. The Object Repository user can use this OID to retrieve an object directly from the repository. In the case of database mapping, the OID can correspond to the primary key of the row associated with the native object.

Our added object now needs an OID. One solution would be to insert the object directly into the database and get the auto-generated ID back. This solution brings some deadlock issues, as described by Martin Fowler 2). Our solution is to ask the Storage Engine driver to generate an OID. This OID is associated with the Meta Object in the Meta Objects Identity Map, and with the in-memory instance in the Native Objects Identity Map 3).

Finally, the OID is marked as “new” in the Unit Of Work 4). The actual INSERT query you are waiting for is sent later, at the end of the PHP script. This saves communication bandwidth, since only useful queries are sent to the database server.

The whole “add” process is illustrated by the following sequence diagram.

UML sequence diagram showing the "add" process
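The “add” steps described above can also be read as code. The following is a minimal sketch, assuming PHP 7.4 or later. The class names, the `generateOid()` method, and the array-based identity maps and Unit Of Work are hypothetical simplifications; the meta representation here only covers public properties.

```php
<?php
// Hypothetical storage engine driver that hands out OIDs without an INSERT.
final class SequenceDriver
{
    private int $lastOid = 0;

    public function generateOid(): int
    {
        return ++$this->lastOid;
    }
}

final class AddSketchRepository
{
    /** @var array<int, array> Meta Objects Identity Map: OID => meta data */
    public array $metaObjects = [];
    /** @var array<int, object> Native Objects Identity Map: OID => instance */
    public array $nativeObjects = [];
    /** @var array<int, string> Unit Of Work state: OID => "new" */
    public array $unitOfWork = [];

    private SequenceDriver $driver;

    public function __construct(SequenceDriver $driver)
    {
        $this->driver = $driver;
    }

    public function add(object $native): int
    {
        // 1. Build the meta representation (public properties only, for brevity).
        $meta = [
            'class' => get_class($native),
            'attributes' => get_object_vars($native),
        ];
        // 2. Ask the driver for an OID instead of inserting a row.
        $oid = $this->driver->generateOid();
        // 3. Register the OID in both identity maps.
        $this->metaObjects[$oid] = $meta;
        $this->nativeObjects[$oid] = $native;
        // 4. Mark the OID as "new"; no INSERT is sent at this point.
        $this->unitOfWork[$oid] = 'new';
        return $oid;
    }
}

$repository = new AddSketchRepository(new SequenceDriver());
$oid = $repository->add((object) ['title' => 'Hello']);
echo $oid, ' ', $repository->unitOfWork[$oid], PHP_EOL; // prints "1 new"
```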

Retrieving an object from its OID

The process of retrieving an object from its OID is very simple.

First, the Object Repository checks in the Identity Map whether the requested instance is already in memory. If so, the Object Repository just has to return the instance.

We have judged that it is not necessary to read data from the database again if it has already been done. This follows the adage that a man with two watches never knows what time it is. The problem is well explained (once again) by Martin Fowler 5).

If this choice causes problems, the behavior could be changed or adapted in future versions, notably to handle multiple applications accessing the same database at the same time.

If the requested object does not yet exist in the Identity Map, the Object Repository asks the Storage Engine driver to find the object matching the OID. The object is then added to the Identity Map and returned to the user.
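The lookup order described above (Identity Map first, storage engine second) can be sketched as follows, assuming PHP 7.4 or later. `CountingDriver` and `IdentityMappedRepository` are hypothetical names, and the driver returns plain objects built from an array instead of reading a database.

```php
<?php
// Hypothetical storage engine driver that counts how often it is read.
final class CountingDriver
{
    public int $reads = 0;
    /** @var array<int, array> */
    private array $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function find(int $oid): ?object
    {
        $this->reads++;
        return isset($this->rows[$oid]) ? (object) $this->rows[$oid] : null;
    }
}

final class IdentityMappedRepository
{
    /** @var array<int, object> Identity Map: OID => in-memory instance */
    private array $identityMap = [];
    private CountingDriver $driver;

    public function __construct(CountingDriver $driver)
    {
        $this->driver = $driver;
    }

    public function get(int $oid): ?object
    {
        // Already in memory: return the same instance without reading again.
        if (isset($this->identityMap[$oid])) {
            return $this->identityMap[$oid];
        }
        // Otherwise ask the driver, then remember the instance.
        $object = $this->driver->find($oid);
        if ($object !== null) {
            $this->identityMap[$oid] = $object;
        }
        return $object;
    }
}

$driver = new CountingDriver([7 => ['name' => 'Ada']]);
$repository = new IdentityMappedRepository($driver);
$first = $repository->get(7);
$second = $repository->get(7);
var_dump($first === $second);  // bool(true): one instance per OID
echo $driver->reads, PHP_EOL;  // prints "1": the second call never reached the driver
```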

Updating an existing object

If an object is added to the Object Repository and already exists in the Identity Map, it is considered an already persisted object that needs to be updated.

Note that it is not possible to update an object that has not been retrieved through the Object Repository, since there is no way to know whether the object should be added or updated.

This limitation, which is more an implementation choice than a problem, is open to discussion.

Objects retrieved from the database are instantiated. An AOP layer then monitors the use of each object and notifies the Object Updates Observer when changes are made to it. The Object Updates Observer then updates the corresponding Meta Object and marks the object as “dirty” in the Unit Of Work.

That is why there is no need to inform the repository that an object has changed: the Object Updates Observer has already done so.

AOP, or Aspect Oriented Programming, aims to separate the concerns of a process into multiple aspects. With AOP it is possible to alter the executed code of an object to change its behavior. You can learn more on AOP in FIXME

As when you add an object, the Object Repository actually sends an UPDATE query to the database only when the repository modifications are committed.
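To give an idea of how a change can be reported without the caller informing the repository, here is a minimal sketch, assuming PHP 7.4 or later. It replaces the AOP layer with a plain proxy built on PHP's `__get` and `__set` magic methods, which is a different and much simpler technique than the one Spiral describes. `UpdatesObserver` and `TrackingProxy` are hypothetical names.

```php
<?php
// Hypothetical observer that records which OIDs have been modified.
final class UpdatesObserver
{
    /** @var array<int, bool> OIDs marked as "dirty" */
    private array $dirty = [];

    public function changed(int $oid): void
    {
        $this->dirty[$oid] = true;
    }

    public function isDirty(int $oid): bool
    {
        return isset($this->dirty[$oid]);
    }
}

// Proxy standing in for the AOP layer: it forwards property access to the
// wrapped object and notifies the observer on every write.
final class TrackingProxy
{
    private object $target;
    private int $oid;
    private UpdatesObserver $observer;

    public function __construct(object $target, int $oid, UpdatesObserver $observer)
    {
        $this->target = $target;
        $this->oid = $oid;
        $this->observer = $observer;
    }

    public function __get(string $name)
    {
        return $this->target->$name;
    }

    public function __set(string $name, $value): void
    {
        $this->target->$name = $value;
        $this->observer->changed($this->oid); // the object is now "dirty"
    }
}

$observer = new UpdatesObserver();
$article = new TrackingProxy((object) ['title' => 'Draft'], 42, $observer);
$before = $observer->isDirty(42);
$article->title = 'Final';
var_dump($before, $observer->isDirty(42)); // bool(false) then bool(true)
```

Reading a property does not mark the object as dirty; only writes go through `__set`.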

Deleting an existing object

When an object is deleted from the repository, it is marked as “deleted” in the Unit Of Work.

The modification is actually applied by sending a DELETE query to the database when the Object Repository changes are committed.
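The deferred behavior of the Unit Of Work for the “new”, “dirty” and “deleted” states can be sketched as follows, assuming PHP 7.4 or later. `DeferredUnitOfWork` and its methods are hypothetical, `commit()` returns labels instead of sending SQL, and the two state-merging rules in the comments are assumptions rather than documented Spiral behavior.

```php
<?php
final class DeferredUnitOfWork
{
    /** @var array<int, string> OID => "new" | "dirty" | "deleted" */
    private array $states = [];

    public function markNew(int $oid): void
    {
        $this->states[$oid] = 'new';
    }

    public function markDirty(int $oid): void
    {
        // Assumption: a new object stays "new", since its INSERT
        // will already carry the latest values.
        if (($this->states[$oid] ?? null) !== 'new') {
            $this->states[$oid] = 'dirty';
        }
    }

    public function markDeleted(int $oid): void
    {
        if (($this->states[$oid] ?? null) === 'new') {
            // Assumption: never inserted, so there is nothing to delete.
            unset($this->states[$oid]);
        } else {
            $this->states[$oid] = 'deleted';
        }
    }

    /** @return string[] one label per query that a real commit would send */
    public function commit(): array
    {
        $verbs = ['new' => 'INSERT', 'dirty' => 'UPDATE', 'deleted' => 'DELETE'];
        $queries = [];
        foreach ($this->states as $oid => $state) {
            $queries[] = $verbs[$state] . ' #' . $oid;
        }
        $this->states = []; // everything is flushed
        return $queries;
    }
}

$unitOfWork = new DeferredUnitOfWork();
$unitOfWork->markNew(1);
$unitOfWork->markDirty(1);   // still a single INSERT
$unitOfWork->markDirty(2);
$unitOfWork->markNew(3);
$unitOfWork->markDeleted(3); // added then deleted: no query at all
$unitOfWork->markDeleted(4);
$queries = $unitOfWork->commit();
echo implode(', ', $queries), PHP_EOL; // prints "INSERT #1, UPDATE #2, DELETE #4"
```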

Query multiple objects by criteria

TODO…
