Published 2011-03-22 00:00:00

Trying to keep up with the Ideas of March - and get a few more posts out.

DB_DataObject is the workhorse behind most of my project work. It saves a huge amount of development time, and makes code considerably simpler to understand, modify and maintain. Recently I've added a few new features to this granddaddy of ORMs, and solved a perpetual data management problem at the same time.

AutoJoins

Last month saw the commit of the autoJoin method to PEAR's SVN server. It is a cut-down version of something that's already in use in the Pman Roojs interface layer. The basic problem it solves is that when you are doing rapid development, you do not really want to spend time writing the code that joins all the related tables together.

For example, a typical query listing people along with their office details would look something like this:

$person = DB_DataObject::factory('person');
$office = DB_DataObject::factory('office');
$person->joinAdd($office, 'LEFT');          // LEFT JOIN office, using the relationship in links.ini
$person->selectAs();                        // select person.* unprefixed
$person->selectAs($office, 'office_id_%s'); // select office.* as office_id_id, office_id_name, etc.
$results = $person->fetchAll();
$ret = array();
foreach ($results as $r) {
    $ret[] = $r->toArray();
}
echo json_encode(array('total' => count($ret), 'data' => $ret));

This relies on a links.ini file describing the table connections:

[person]
office_id = office:id

With the new autoJoin() method, this code is reduced to:

$person = DB_DataObject::factory('person');
$person->autoJoin();   // joins and selects every table linked from person in links.ini
$results = $person->fetchAll();
$ret = array();
foreach ($results as $r) {
    $ret[] = $r->toArray();
}
echo json_encode(array('total' => count($ret), 'data' => $ret));

This not only simplifies the code. If we change the database design, add a company_id field to the person table, and update the links.ini file to

[person]
office_id = office:id
company_id = company:id

The resulting data will then include each person's company details as well, just by adding a single line to links.ini. The original code in Roo.php also enabled filtering and distinct queries to be built automatically - this may one day end up as options passed as a parameter to autoJoin().
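
To make the result shape concrete, here is a rough sketch (not real output) of a single row from the autoJoin example above. The person/office/company column names are invented for illustration, and it assumes autoJoin() prefixes joined columns with the linking column name, following the same convention as the selectAs() call in the first example - check the generated SQL on your own schema.

// Illustrative only - column names are made up, prefixes assume the
// office_id_* / company_id_* convention shown earlier.
print_r($results[0]->toArray());
/*
Array
(
    [id] => 1
    [name] => Alan
    [office_id] => 3
    [company_id] => 7
    [office_id_id] => 3
    [office_id_name] => Head Office
    [office_id_address] => 1 Some Street
    [company_id_id] => 7
    [company_id_name] => Example Ltd
)
*/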


Archiving data

This links.ini schema map has another huge benefit, which I've now realized with a quite simple piece of code that archives data from a database so it can be removed and restored later. It was necessary on two projects. The first was a media monitoring project where the database contained press clippings about client companies; the owner wanted to remove all the clippings and users relating to a specific client that was no longer using the service, since the data was taking up disk space and obviously slowing down queries due to its sheer size.

The other problem area was our mail filtering and anti-spam product. It had a rolling archive of all mail delivered to a client, which was automatically pruned using cron jobs. On most of our servers this was limited to about a week; however, for one client we had over a year's worth of emails for 500 staff in the database. With that amount of data it was consuming a considerable amount of disk space, and certain searches were rather slow.

My concept for solving both issues was basically the same: using the relationship data in the links.ini file, it is possible to work out which data in the database is 'owned' by the records being archived, and which data is merely required in order to restore them later.

In the mail example, each email is referenced by a few rows in the Attachment table (so users could quickly search for files). When we export the emails, we can also export and flag for deletion the related Attachment rows. The mail row also references a Sender, which would have to exist if we restored the data, but which we do not need to delete.
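
Here is a minimal sketch of that classification step, using nothing more than the parsed links.ini data. The table and column names (mail, sender_id, attachment.mail_id) are invented to match the description above, and the real code in Pman does considerably more than this:

// links.ini format: [table]  column = othertable:othercolumn
$links = parse_ini_file('links.ini', true);

$core = 'mail'; // the table we are archiving (assumes it has a links.ini entry)

// "Required" records: tables the core rows point at (e.g. sender).
// These must exist when we restore, but should not be deleted.
$required = array();
foreach ($links[$core] as $col => $target) {
    $parts = explode(':', $target);
    $required[] = $parts[0];
}

// "Dependent" (owned) records: tables whose rows point back at the core
// table (e.g. attachment.mail_id = mail:id). These are exported and
// deleted along with the core rows.
$dependent = array();
foreach ($links as $table => $cols) {
    if ($table == $core) {
        continue;
    }
    foreach ($cols as $col => $target) {
        if (strpos($target, $core . ':') === 0) {
            $dependent[] = $table . '.' . $col;
        }
    }
}

print_r($required);   // e.g. Array ( [0] => sender )
print_r($dependent);  // e.g. Array ( [0] => attachment.mail_id )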

The solution to all this is currently a class called Dump, available in the Pman Admin module; it could quite easily be moved into a DB_DataObject_Dump library. As long as your DB_DataObject configuration is set up correctly, it takes a table and a query, locates all the records from related tables that can be deleted or that are required for restoration, and then creates a number of files:

a restore.sql - which can recreate the data that has been removed, using a large number of INSERT statements for the core data, the dependent records and the related records.
a deletes.sql - which removes the core data and the dependent records from the database.

Along with this it also supports hooks (a simple class-based method API): methods that can be implemented on the DataObjects being dumped, which let you copy and restore the files relating to the database records (for example, our original emails) - a rough sketch follows the list below. The second part of the script uses that information to generate the following shell scripts:

a restore.sh - which copies files from the backup into the original location
a backup.sh - which copies the files from the original location to the backup location
a delete.sh - which deletes the files from the original location.
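
As a rough illustration of the hook idea - the method name, signature and the maildir_path column are invented here, not the actual API the Dump class expects - a DataObject for the mail table might expose something like this so the dumper knows which files on disk belong to each row:

// DataObjects/Mail.php (hypothetical hook - the real Dump class may use a
// different method name and return format)
class DataObjects_Mail extends DB_DataObject
{
    public $__table = 'mail';

    // Tell the dumper which files belong to this row, so they can be
    // listed in backup.sh / restore.sh / delete.sh.
    function listRelatedFiles()
    {
        // e.g. the raw message stored outside the database
        // (maildir_path is an illustrative column name)
        return array($this->maildir_path);
    }
}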

All I need to do now is write a nice UI for it all...



