09 Apr 2012

Javascript Templating, AngularJS and Roo.XTemplate

Ok, as I said earlier, a nice Easter break meant I could get back to coding for fun. The other issue I had to deal with over the weekend was how to do the templating for the Javascript application.

Having written a template engine for PHP, I can pretty much say that using a template engine (one which automatically escapes output) is essential for outputting HTML. It makes everything more maintainable, and reduces the risk that, as an error-prone human being, you open your application up to exploits.
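
The classic mistake an auto-escaping engine protects you from, as a one-line illustration:

// hand-rolled output: one forgotten htmlspecialchars() and you have an XSS hole
echo "<div>Hello " . $_GET['name'] . "</div>";

// what an auto-escaping template engine effectively does for every variable
echo "<div>Hello " . htmlspecialchars($_GET['name']) . "</div>";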

Since most of the display logic in my applications is moving away from PHP generating HTML to PHP generating JSON, with the Javascript UI rendering this data and displaying it to the user, there has been one area I've slacked off on: sorting out a solution for taking this JSON data from the PHP backend and rendering an elegant HTML front end to show the end user.

Obviously this is not going to be indexed via google (somehow JSON / Javascript interfaces are not very google friendly). But for applications that require authentication, this not only helps scale the application under load, it also makes it far easier to maintain.

My first effort to solve this issue was based on the original Template code in Roo (Roo.Template), however initially I decided to do the implementation in PHP. The concept was to take a group of HTML template files, and use PHP to convert those files into Javascript functions, so that calling the generated Javascript function with 'data' as the argument would return the HTML to be rendered.
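
The idea, as a minimal sketch (hypothetical names, and far cruder than the real implementation):

// compile '<b>{name}</b>' into a Javascript function that renders it
function compileTemplate($name, $html)
{
    $js = addslashes($html);   // escape quotes so the HTML can live in a JS string
    $js = preg_replace('/\{(\w+)\}/', "' + data.\$1 + '", $js);
    return "function tpl_{$name}(data) { return '{$js}'; }";
}

echo compileTemplate('hello', '<b>{name}</b>');
// outputs: function tpl_hello(data) { return '<b>' + data.name + '</b>'; }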

The implementation supported things like conditional inclusion, embedding Javascript, nested functions, looping and much more. It was, however, horribly crude and rather fragile when the template syntax was invalid. I used it for a few files relating to the ticket rendering in web.mtrack; however, I was never really happy with the solution.

Read on for my opinion / review of Angular and the introduction to the updated Roo.XTemplate.

08 Apr 2012

Roo.XComponent introduction

http://www.roojs.com/roojs1/docs/symbols/Roo.XComponent.html
With a nice long Easter holiday, I finally got a chance to hack on some of those todo items that I'd been putting off for quite a while. One of the core mini tasks was to enable mtrack information inside my main email/accounting do-everything platform.

The Pman codebase, as I've mentioned before, forms the core of pretty much all the applications I work on now, mostly intranet or extranet focused web applications that work like desktop applications. All of these applications are built up of various components; for example the Accounting module has components like the general ledger, invoice management, timesheet tracking and editing, and pricing management.

The whole premise of the Pman project codebase on the Javascript side is to register all the components sometime after the page is loaded; then, after the user has logged in or authentication has been checked, it will create and render all these components in sequence, appending them to the container panels.

In the case of the Accounting module, there is a default top level module called 'Pman', which the Accounting 'Tab' is added to, then all the sub components are added to that. This whole design was done a couple of years ago, and has been working very well: it's reliable, and adding extra components is just a matter of creating a file, then saying what the parent is and in what sequence it will appear.

Read on to find out how this all works...

31 Jan 2012

Roo J Solutions Limited is recruiting

Since we have been very busy already this year, I have now almost completed the process of migrating from a sole proprietorship into a limited company. Roo J Solutions Limited is now a registered Hong Kong company. We are now looking for full or part-time staff (based in Hong Kong).

Please read the full post for details. 

14 Jan 2012

Free your data... seed webkit browser mirror button

One of the great things about the internet is the availability of cheap or free services online, so many clients are using gmail, dropbox, github etc. for their business operations. But all too often they forget that these services are often playing the oldest game in the technology industry: "Vendor Lock-in".

While the ones I mentioned are not too bad (you can cheaply and easily rescue or back up your data to another location, or move to an alternative provider), not all of them are like that.

We are in the middle of a migration project from Netsuite (a SaaS Oracle-based ERP system) to Xtuple, which is an open source ERP system based around postgresql. This is a slow and painful migration, as there is no standard for ERP data, and exporting is slow and clumsy over SOAP. Anyway, as a pleasant distraction from this large migration, the same client also wanted us to look at migrating from backpack, a 37signals product.

Backpack, unlike all the SaaS systems I mentioned, has deliberately made it hard, or practically impossible, to migrate from their services. The primary offering of backpack is an online file storage service where you can give clients or suppliers the ability to share files and folders. It is only web based (unlike dropbox or box.net), and there is no desktop client that you can use to access the files other than the web interface.

When I started looking at how the company could extract the data, I tried out a few of the classic tools, like wget and httrack; however the heavy use of javascript, and the convoluted login system with login keys, ensured that those kinds of tools did not work. The other requirement was the ability to organise the files into folders; by just mirroring the site, you would end up with thousands of folders called asset/123123/ where the number is probably the UID of the database record.

So how to rescue the data... Read on for the trick..

15 Dec 2011

Deleting the View and Controller..

This is NOT a post for people who do not use MVC (please delete your code, and write it properly). Anyway, as anybody who has used or written a reasonable framework in PHP knows, MVC is pretty much the golden rule for implementation. There are a dozen frameworks out there based around the principles, with different levels of complexity.

My own framework was designed around those principles, and for many years worked perfectly for those classic 'display a crapload of HTML pages using information from a database' sites. The Model (DB_DataObject), View (HTML_Template_Flexy) and Controller (classes that extend HTML_FlexyFramework_Page) delivered pages. Designing sites basically involved gluing all these pieces together. As the sites grew over time, shared code usually ended up in the Models, and each page had a controller which might render the shared templates. All was well, and code was reasonably easy to maintain and extend.

Now, however, almost all the projects I've worked on in the last few years use the Roo Javascript library (the ExtJS fork), and are built on top of the Pman components (originally a project management tool, that grew into a whole kit of parts). One of the key changes in the way the code is written is how little code is now needed to get the information from the database to the end user.

Obviously the whole HTML templating has been thrown out the window (other than the first primary HTML page); the whole user interface is built with Javascript, and generated by user interface builder tools. The interaction of the interface is handled by signals (listeners) on the Roo Javascript components. These in turn call the back end (PHP code) and almost always retrieve JSON encoded data, which is rendered using the UI toolkit.

When I first started moving to this development model, I tended to retain the previous idea of having multiple controllers to handle the Select/Create/Update/Delete actions. As time went on, rather than have multiple controllers for each of those actions, I would use a single controller to manage a single Model entity (like Product). POST would always update/modify the model, and GET would always just view and query the data.

Eventually I realized that since all these controllers were essentially doing the same thing, a single generic controller should be able to do everything that all these single controllers were doing. And so was born the Pman_Roo class.

So basically {index.php}/Roo/{TableName} provides generic database access for the whole application. Most code development on the PHP side is now contained within the DataObject Models. This greatly enhances code reuse, as similar code ends up closer together. Unlike before, where shared code was moved from the controllers to the model when necessary, now most of the code starts off in the model. This speeds project development up considerably, not to mention the huge savings of not having to try and manipulate data into HTML.

How does it work?


A GET or POST request is received by the server, either from Roo's Form/Grid/Tree or directly via Pman.Request(), a handy wrapper around Roo.Ajax that handles error messages nicely.

The request to {index.php}/Roo/{TableName} checks that the tablename is valid, then goes on to perform the following actions depending on the params supplied. The documentation in the Pman_Roo class is the most up-to-date documentation, and details what calls are made (if available) on the relevant dataobject.

Using the class, it is now possible to handle pretty much any database related query without implementing any controller, while easily managing data permissions.
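
The shape of the requests is roughly this (the table name and parameter names here are illustrative; the Pman_Roo documentation has the real ones):

GET  {index.php}/Roo/Projects              - query rows, returns JSON
GET  {index.php}/Roo/Projects?id=12        - fetch a single row
POST {index.php}/Roo/Projects  id=0 ...    - create a row, returns the new row
POST {index.php}/Roo/Projects  id=12 ...   - update row 12
POST {index.php}/Roo/Projects  _delete=12  - delete row 12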

Snapshot of current documentation is in the extended view.. (latest will be in the source)

02 Sep 2011

Watch-out PHP 5.3.7+ is about.. and the is_a() / __autoload() mess.

Well, for the first time in a very long while, I had to post to the PHP core developers list last week; unfortunately the result was not particularly useful.

The key issue was that 5.3.7 accidentally broke is_a() for a reasonably large number of users. Unfortunately the fixup release 5.3.8 did not address this mistake, and after a rather fruitless exchange I gave up trying to persuade the group (most people on the mailing list) that reverting the change was rather critical (at least Pierre supported reverting it in the 5.3.* series).

Anyway, what's this all about? Basically, if you upgrade to any of these versions and

a) use __autoload(),
or
b) any of your code calls is_a() on a string,

you will very likely get strange failures.


The change in detail.


In all versions of PHP since 4.2, the is_a() signature looked like this:

bool is_a ( object $object , string $class_name )

As a knock-on effect of fixing a bug with is_subclass_of(), somebody thought it was a good idea to make the two functions' signatures consistent, so in 5.3.7+ the signature is now:

bool is_a ( mixed $object_or_string , string $class_name )

And to make matters worse, that change to the first argument ('object_or_string') means is_a() will also call the autoloader if the class is not found.


How is_a() has been used in the past.


On the face of it, this would not seem like a significant change; however, you have to understand the history of is_a(), and why it was introduced. In the early days of PEAR (before PHP 4.2) there was a method called PEAR::isError($mixed), which contained quite a few tests to check if $mixed was an object, and was an instance of 'PEAR_Error'. A while after PHP 4.2 was released, this was changed to use this new wonderful feature, and basically became return is_a($mixed, 'PEAR_Error').

Since PEAR existed before exceptions (and is still a reasonable pattern for handling errors), it became quite common practice to have returns from methods which looked like this:

@return {String|PEAR_Error} $mixed  return some data..

So the caller would check the return using PEAR::isError(), or quite often just is_a($ret, 'PEAR_Error') if you knew that the PEAR class might not have been loaded.

So now comes the change; let's see what happens.

The __autoload() issue.


Personally I never use __autoload(); it's the new magic_quotes for me, making code unpredictable and difficult to follow (read the post about how require_once is part of your documentation). But anyway, each to their own, and for the PEAR packages I support I will usually commit any reasonable change that helps out people who are using autoload.

So there are users out there using autoload with my PEAR packages, as I quickly found last week. Quite a few of these packages use the is_a() pattern, and the users who had implemented __autoload() had very smartly decided that calling autoload with the name of a class that could not or did not exist was a serious error condition, and they either died or threw exceptions.

Unfortunately, since is_a() was sending all of the string data it got straight into __autoload(), this happened rather a lot, leading to a run-around hunt for all calls to is_a(), and code changes being put in to ensure that it never puts a string in the first argument.
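
To see why, a two-line demonstration:

// on 5.3.7/5.3.8 this calls __autoload('Some_Random_Class') if the class
// is not already loaded - before 5.3.7 it simply returned false:
is_a('Some_Random_Class', 'PEAR_Error');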


The is_a(string) issue


While I'm not likely to see the autoload issue in my own code, I'm not sure I really appreciate having to fix it so quickly, without any timetable to adapt to the change. The other change, which may cause random, undetectable bugs, is accepting a string as the first argument.

Imagine this bit of code:

function nextTokString() {
    if (!is_string($this->tok[$this->pos])) {
        return PEAR::raiseError('....');
    }
    return $this->tok[$this->pos++];
}

// ... some code ...
$tok = $this->nextTokString();
if (is_a($tok, 'PEAR_Error')) {
    return $tok;
}
// ... do stuff with the string ...

Now what happens if the token is the string 'PEAR_Error'? is_a() will now return true. The big issue with this is that unless you know about the is_a() change, this bug is going to be next to impossible to find: no warning is issued, is_a() just silently returns true, where before it returned false.
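
Or reduced to a single line:

// with the PEAR_Error class loaded:
// PHP <= 5.3.6 prints bool(false) - a string is not an object
// PHP 5.3.7/5.3.8 prints bool(true) - the string now names the class
var_dump(is_a('PEAR_Error', 'PEAR_Error'));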

I was hoping that PHP 5.3.9 would go out the door with this reverted, or at least a warning added to string usage of is_a(), but nope, none of my efforts at persuasion appear to have worked.

While I do not think the change is particularly necessary (as the use case for the new signature is very rare, and achievable in other ways), I think reverting it before PHP 5.3.7+ went into major deployment was rather critical (yes, it can take months before PHP releases start commonly arriving on servers). Then, if it's deemed a necessary change (by vote), go for it in 5.4, and add a warning in the next version in the 5.3 series.

  

Anyway, the fixes / workarounds:


The simplest fix is to prefix the test with is_object().

eg.

if (is_a($tok,'PEAR_Error')) {

becomes

if (is_object($tok) && is_a($tok,'PEAR_Error')) {

or, if you do not need to support PHP4:

if ($tok instanceof PEAR_Error) {


While you could start looking through the code and determining whether you really need to prefix each test with is_object(), the reality is, unfortunately, that it may be simpler to stick this extra code in, just in case you start delivering strings where objects were expected.

Update


This has been fixed in 5.3.9; however, part of this mess derives from some confusion over instanceof.

When PHP5 was released and instanceof was added, using it when the class did not exist caused a fatal error:

if ( $tok instanceof Unknown_Class ) {

However, this was changed in 5.1 to not cause a fatal error. The documentation is not totally clear on this, especially for anyone who used PHP 5.0.*.

Unfortunately, since migration times are slow, supporting 5.0-5.1 is a reality of life for anyone writing libraries (actually most of the libraries I write for still provide support for PHP4). So using any 'new' feature of the language basically prevents you from supporting older versions of PHP with new code.

In this case, PHP 5.0 usage has slipped below 0.3%, so removing support for it should be fine.



11 Aug 2011

Cli parsing in FlexyFramework, PEAR Console_GetArg

And another rare article gets published; I've been slacking off posting recently, as I've been busy getting some interesting sites online. The biggest was a rather fun viral advertising campaign on facebook, www.facebook.com/deargoodboy, which I ended up project managing after originally only committing to do the facebook integration.

Anyway, back to the open source stuff. One of the tasks I've had on my todo list for a while is revamping the CLI handling of my framework, which probably has a tiny following, but is one of those incredibly simple, yet powerful backbones to all my projects.

While this article focuses on the changes to the framework, it should also be of interest to others developing frameworks, and anyone interested in using PEAR's Console_GetArg.


25 Apr 2011

Gtk3 introspection updates and Unusable Unity..

Well, as Gnome 3 is out, it has to be tested. Luckily I've not got a huge deployment to sort out, but as I have a few applications that use Gtk, I thought it was about time I upgraded one of my machines to see what chaos I will have to deal with in the future.

So it was one of my Ubuntu boxes that got the pleasure of a Natty and Gnome3 PPA upgrade. (I use debian on my other development box, which actually got destroyed last week with a complete disk failure, although I suspect the motherboard may have problems... It's getting old like me...)

Upgrading to Natty is not too bad; from what I remember it only took a small amount of brain surgery to get it to boot correctly after the upgrade. But once up, you get the pleasure of the Unity desktop. My first impressions of Unity were not too hot. My wife uses it on her netbook, and it's great there; after the initial shock of me upgrading without her knowing, she actually said it was a lot better than compiz, although she missed the special effects.

But after using Unity on big screens, it just became unbearable. Detached menus may seem like a cool idea, and are quite handy on a netbook, but they are an absolute nightmare when using things like gimp on dual head full HD monitors; my wrists hurt after a few minutes...

Along with that is the removal of the Applications/Places/System menus, which while clunky are still handy for quickly finding applications. A classic example of this is Alleyoop Memory Checker, a very nice wrapper around valgrind. In the Unity world, if you do not know the name of the application, then finding it is a huge mouse journey around big icons.

As for the left icon menu, all I can say is that I'm not the world's best designer (although at least I did study it), but it's so graphically noisy that it's unusable. It's basically a bad re-invention of Docky/Cairo Dock, which do a far better job of providing a similar role.

So after all that I did try to get gnome-shell going, but unfortunately the Gnome3 PPA build is not currently working, and also has a rather nasty habit of removing all usable desktop environments. I ended up adding xterm to one of the /etc/Xorg/X.sessiond files and starting up gnome-panel, mutter and docky to produce a usable desktop for the time being, while I wait to test out the latest gnome-shell.

So on with the harder stuff: Gtk3 and introspection.

One of the key applications I use to develop is app.Builder.js; it's a drag/drop interface for building web applications, that also allows you to fill in all the code and associate it clearly with the element and the event occurring. It's written in Javascript, and uses Gnome Seed to run on the desktop. As I've mentioned before, Seed is a bridge layer between the Webkit Javascript engine and GObject introspection, the now standard way to interface Gnome/Gtk/Glib etc. projects to non-C languages, eg. Python, Javascript (and others...)

With the introduction of Gtk3, GObject introspection has also been updated, and the updated mix between the two had quite a few knock-on effects on the builder I had written using Gtk2 and pre-0.9 versions of GObject introspection. Here's a general summary of the changes.

TreeIter and TextIter 

The latest version of GObject introspection has a feature called caller allocates. Previously with Seed we had to create an instance of the TreeIter ourselves, and the TreeIter argument was of type 'inout' (the iter would be sent into the method, and returned out), eg.
var iter = new Gtk.TreeIter();
model.get_iter_from_string(iter, path);
// iter would now contain the tree iter for that path..

In newer versions, the iter is an 'out' value, which means you have to create an object for the iter to be added to. eg.
var iret = {};
model.get_iter_from_string(iret, path);
// iret.iter now contains the tree iter.

TreeSelection 

Since the get_selected method of GtkTreeSelection now has two out values, the call has changed from

OLD:
var iter = new Gtk.TreeIter();
selection.get_selected(model, iter);

NEW
var sret = {};
selection.get_selected(sret);
// sret now contains { model: **THE MODEL**, iter: **THE ITER** }

TreeModel get_value


Since get_value does not have a return value, Seed will return the 'out' values as the return object.

OLD:
var value = new GObject.Value('');
model.get_value(iter, 2, value);
print(value.value);

NEW
var str = model.get_value(iter, 2).value.get_string();
print(str);

Drag pixmaps become surfaces

This is a pure Gtk3 API change (BC break)

OLD:
var pix = widget.create_row_drag_icon ( path);
Gtk.drag_set_icon_pixmap (ctx, pix.get_colormap(),   pix,  null, ..... )

NEW:
var pix = widget.create_row_drag_icon ( path);
Gtk.drag_set_icon_surface(ctx, pix);

Drag drop data passing..


The drag drop signals appear to work OK; however I've not managed to get the data to go back and forth. A quick workaround is to just use some form of global variable to store the currently dragged item (I doubt you will get more than one dragged item at once..)

Drag drop API

A lot of these appear to have played musical chairs:
GtkWidget.prototype.drag_source_set -> Gtk.drag_source_set
Gtk.drag_source_set_target_list -> GtkWidget.prototype.drag_source_set_target_list
Gtk.drag_dest_set -> GtkWidget.prototype.drag_dest_set

Internal Seed changes


I've added a few more fixes to Seed in the last few weeks, mostly to handle compiling correctly and detecting the correct version of introspection. For the most part it's working fine; however I'm still a bit baffled by a Glib memory corruption bug, which occurred after multiple model.set_value and model.get_value calls. After running valgrind, I managed to stop the corruption occurring by increasing the allocated size for a struct by 1 byte.
Around lines 546 and 640 of seed-engine.c the change goes something like this:
-                  out_args[n_out_args].v_pointer = g_malloc0 (size);
+                  out_args[n_out_args].v_pointer = g_malloc0 (size+ 1);

Around line 738 of seed-structs.c:
-  object = g_slice_alloc0 (size);
+  object = g_slice_alloc0 (size +1);
 




10 Apr 2011

How to spam in PHP..

Well, after having written a huge anti-spam system, it's now time to solve the reverse problem: sending out huge amounts of email. Only kidding, but the idea of scaling email sending using PHP is quite interesting.

The reason this has been relevant in the last two weeks is twofold: first off, my slow and sometimes painful rewrite of mtrack has got to the point of looking at email distribution. Along with this, I have a project that needs to distribute press releases and track responses. Since both projects now use the same underlying component framework (Pman.Core and Pman.Base), it seemed like an ideal time to write some generic code that can solve both issues.

Classic mailing, you press, we send...

I've forgotten how many times I've written code that sends out email. Pretty much all of it tends to be of the variety where the user of the web application presses a button, then the backend code generates one or many emails, and sends them out, most frequently using SMTP to the localhost mailserver.

In most cases this works fine. You might run into trouble if your local mailserver is down or busy, but for the most part it's a very reliable way to send out less than 10 emails in one go.

Queues and bulk sending

One of my associates makes a tiny amount of money by offering the service of sending out newsletters and news about bars and restaurants. To do this he bought a commercial PHP package, which I occasionally have the annoying task of debugging and maintaining. What is interesting about this package are the methods it uses to send out email. Basically, once you have prepared a mailout and selected who it goes to, it creates records in a table that goes something like this:

User ID | Mailout ID
123     | 34
124     | 34
...
There are two methods to then send these mailouts: the first is via the web interface, which uses a bit of ajax refresh to keep reloading the page and send out a number of emails in one go (eg. 10 at a time); or there is the cron version that periodically runs and tries to send out all the mails in that table.
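
The cron variant boils down to something like this (a rough sketch with hypothetical table names, not the actual package code; assume $db, $subject and $body are set up earlier):

$res = $db->query("SELECT u.email, q.user_id, q.mailout_id
                     FROM mailout_queue q
                     JOIN users u ON u.id = q.user_id");
while ($row = $res->fetchRow(DB_FETCHMODE_OBJECT)) {
    // hand the message to the localhost MTA, and dequeue it on success
    if (mail($row->email, $subject, $body)) {
        $db->query("DELETE FROM mailout_queue
                     WHERE user_id = {$row->user_id}
                       AND mailout_id = {$row->mailout_id}");
    }
}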

This method always sends to the localhost mailserver, and lets that sort out the bounces, queuing, retrying etc. It has a tendency to be very slow, and to use up a huge amount of memory if sending out huge volumes of email. Most of it gets stuck in the mailserver queue, and the spammer has no real idea if the end users might have received it. If the mailserver gets stuck or blocked, the messages can often sit in the queue until they expire 2 days later, by which time the event at the bar may have already occurred.

The MTrack way

I'm not sure if I mentioned it before, but I was intrigued by the method used by mtrack when I first saw it. For those unaware of what mtrack is, it's an issue tracker based on trac. One of its jobs is to send out emails to anyone 'watching' a project, bug etc.

From my understanding of what mtrack was doing (the original code has long been forgotten and removed now), it sets up a 'watch' list, eg. Brian is watching Project X, and Fred is watching Issue 12.

When Issue 12 changed, or someone committed something to Project X, no actual email was sent at that moment. This obviously removed a failure point on the commit or bug update, and if you had 100's of people watching an issue (like launchpad, for example), it would prevent the server hanging while it was busy sending all the emails.

The unfortunate downside was that to make the notifications work, a cron job was required; this cron job had to hunt down all the changes that had occurred and cross reference them with all the people who may have been watching those issues. The code for this was mindblowingly complex, and I suspect was a port of the original trac code.

As somebody who basically looks at really complex conditional code and wonders 'is that really the best way to do this', I really had to come up with an alternative.

Bulk mailing done right....

So to solve my issues with mtrack and the other project, I devised a system that was both simple and fast at the same time. Here's the lowdown.

First off, for both the mtrack and mailout systems, the distribution list is generated when the web application user pushes the button. So for mtrack, somebody updates the ticket (adding a comment, for example), and the controller for the ticket page basically does a few things:

a) If you have modified the ticket owner (or developer), make sure they are on the 'watch list' of subscribers.
b) ask the watch list (the dataobject code for core_watch) to generate a list of people to notify (in our core_notify table), making sure we do not send an email to the person filling in the form (as he knows he just did that, and does not need to be reminded).

For the other mailout system, it also just generates rows in the core_notify table. Actually, since the database table for the distribution targets differs in that application, we have a separate table called XXXX_notify; using the joys of DB_DataObject and object orientation, that class just extends the core_notify dataobject. From what I remember, the only bit of code in that class is var $__table = 'XXXX_notify', since the links.ini handles the reference table data.
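
In code, the whole class is little more than this (the class names here are illustrative, from memory):

class XXXX_DataObjects_Notify extends Pman_Core_DataObjects_Core_notify
{
    var $__table = 'XXXX_notify';   // same columns as core_notify, different table
}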

And now for the really cool part: sending the mails out. Obviously this is done via cron jobs (so as not to disrupt the user interface). The backend consists of two parts (pretty much how a mailserver works). The first is the queue runner: it runs through the notify table, and makes a big list of ids of what to send out. This uses the ensureSingle() feature of HTML_FlexyFramework, to ensure only one instance of the queue runner can be going at once.

Then, rather than sequentially sending each email, it proc_open's a new PHP process to send each one. This enables the queue to send many emails concurrently, rather than relying on a single pipeline. The code monitors these sub processes, and ensures that only a fixed number are running at the same time; we do not want to look too much like a spammer to our ISP.

The code that sends out the single email can then use MX resolution, and send direct to the real email server, and log results (success, failure or try later.)
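
A condensed sketch of the queue runner's process management (hypothetical script and variable names, not the actual Pman code):

$max     = 5;        // how many sender processes may run at once
$running = array();

foreach ($pendingIds as $id) {
    // reap finished children until we are under the limit
    while (count($running) >= $max) {
        foreach ($running as $k => $p) {
            $st = proc_get_status($p);
            if (!$st['running']) {
                proc_close($p);
                unset($running[$k]);
            }
        }
        usleep(100000);   // check again in 0.1s
    }
    // spawn a child that resolves MX, delivers one message and logs the result
    $running[$id] = proc_open(
        'php mail-one.php ' . escapeshellarg($id),
        array(),
        $pipes
    );
}

// wait for the stragglers
foreach ($running as $p) {
    do {
        $st = proc_get_status($p);
        usleep(100000);
    } while ($st['running']);
    proc_close($p);
}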

Now to actually test all this....


22 Mar 2011

DataObjects links.ini and Archiving database data - an ideas of march post..

Trying to keep up with the Ideas of March - and get a few more posts out.

DB_DataObject is the workhorse behind most of my project work. It saves a huge amount of development time, and makes code considerably simpler to understand, modify and maintain. Recently I've added a few new features to this granddaddy of ORMs, and solved a perpetual data management problem at the same time.

AutoJoins

Last month saw the commit of the autoJoin() method to PEAR's SVN server; this is a cut down version of something that's already in use in the Pman Roojs interface layer. The basic problem it solves is that when you are doing rapid development, you do not really want to be spending time writing the code that adds all the tables together into a join.

For example, a typical 'list people with their office details' would look something like this:

$person = DB_DataObject::factory('person');
$office = DB_DataObject::factory('office');
$person->joinAdd($office, 'LEFT');
$person->selectAs();
$person->selectAs($office, 'office_id_%s');
$results = $person->fetchAll();
$ret = array();
foreach ($results as $r) {
    $ret[] = $r->toArray();
}
echo json_encode(array('total' => count($ret), 'data' => $ret));

This would rely on a links.ini file describing the table connections

[person]
office_id = office:id

With the new autoJoin() method, this code is reduced to 

$person = DB_DataObject::factory('person');
$person->autoJoin();
$results = $person->fetchAll();
$ret = array();
foreach ($results as $r) {
    $ret[] = $r->toArray();
}
echo json_encode(array('total' => count($ret), 'data' => $ret));

This not only simplifies the code, but if we change the database design and add a company field to the person table and change the links.ini file to

[person]
office_id = office:id
company_id = company:id

Then the resulting data will include the person's company details, just by adding a single line in the links.ini. The original code in Roo.php enabled filtering and distinct queries to be built automatically; this may one day end up as some of the options in a parameter for autoJoin().


Archiving data

This links.ini schema map file has another huge benefit, which I've now realized with a quite simple piece of code that archives data from a database, so it can be removed and restored later. This was necessary on two projects. In one, the database contained press clippings about a client company for a media monitoring project; the owner wanted to remove all the clippings and users relating to a specific client, as they were no longer using the service and it was taking up disk space. Along with this, it was obviously slowing down the queries due to the large data size.

The other problem area was our mail filtering and anti-spam product. It had a rolling archive of all mail delivered to a client, which was automatically pruned using cron jobs. On most of our servers this was limited to about a week; however for one client we had data in the database for over a year's worth of emails for 500 staff. With that amount of data, it was consuming a considerable amount of disk space, and certain searches were rather slow.

My concept for solving both issues was basically the same: using the relationship data in the links.ini file, it is possible to work out what data in the database is 'owned', and what is merely required, when exporting and restoring data.

In the mail example, each email is referenced by a few rows in the Attachment table (so the client could quickly search for files). So when we export the emails, we can also export and flag for deletion the related Attachments. Along with this, the mail row has references to the Senders, who would have to exist if we restored the data; however we do not need to delete them.
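
Expressed as a links.ini fragment (the table and column names here are illustrative, not the real schema):

[mail]
sender_id = sender:id    ; required for a restore, but left alone on delete

[attachment]
mail_id = mail:id        ; owned by the mail - exported and deleted along with it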

The solution to all this is currently a class available in the Pman Admin module, called Dump. It could quite easily be moved into a library, DB_DataObject_Dump. As long as you have your DB_DataObject configuration set up correctly, it takes a table and a query, locates all the records from related tables that can be deleted, or are required for a restoration, and then creates a number of files:

a restore.sql - which can recreate the data that has been removed, using a large number of INSERT statements for the core data, dependent records and related records.
a deletes.sql - which removes the core data and dependent records from the database.

Along with this, it also supports hooks (a dumb class-method API): methods that can be implemented in the DataObjects being dumped, which enable you to copy and restore the files relating to the database records (for example our original emails). The second part of the script uses that information to generate the following shell scripts:

a restore.sh - which copies files from the backup into the original location
a backup.sh - which copies the files from the original location to the backup location
a delete.sh - which deletes the files from the original location.

All I need to do now is write a nice UI for it all...



