PEAR

11 Aug 2011

CLI parsing in FlexyFramework, PEAR Console_GetArg

And another rare article gets published - I've been slacking off posting recently, as I've been busy getting some interesting sites online. The biggest was a rather fun viral advertising campaign on Facebook, www.facebook.com/deargoodboy, which I ended up project managing, after originally only committing to do the Facebook integration.

Anyway, back to the open source stuff. One of the tasks that's been on my todo list for a while is revamping the CLI handling of my framework, which probably has a tiny following, but is one of those incredibly simple yet powerful backbones to all my projects.

While this article focuses on the changes to the framework, it should also be of interest to others developing frameworks, and anyone interested in using PEAR's Console_GetArg.
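
As a taster, here's a minimal sketch of the package in question (assuming the post refers to the PEAR package Console_Getargs; the option names below are invented for the example):

require_once 'Console/Getargs.php';

// define the options the script accepts
$config = array(
    'verbose' => array('short' => 'v', 'max' => 0,
                       'desc' => 'enable verbose output'),
    'config'  => array('short' => 'c', 'min' => 1, 'max' => 1,
                       'desc' => 'path to an ini file'),
);

$args = Console_Getargs::factory($config);

if (PEAR::isError($args)) {
    // print an automatically generated usage message
    echo Console_Getargs::getHelp($config) . "\n";
    exit(1);
}

echo $args->getValue('config') . "\n";

Run it with something like php script.php -c /etc/app.ini - factory() hands back an options object, or a PEAR_Error when the arguments don't validate.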


22 Mar 2011

DataObjects links.ini and archiving database data - an Ideas of March post...

Trying to keep up with the Ideas of March - and get a few more posts out.

DB_DataObject is the workhorse behind most of my project work. It saves a huge amount of development time, and makes code considerably simpler to understand, modify and maintain. Recently I've added a few new features to this granddaddy of ORMs, and solved a perpetual data management problem at the same time.

AutoJoins

Last month saw the commit of the autoJoin method to PEAR's SVN server. This is a cut-down version of something that's already in use in the Pman Roojs interface layer. The basic problem it solves is that when you are doing rapid development, you do not really want to be spending time writing the code that joins all the tables together.

For example, a typical listing of people with their office details would look something like this:

// list people with their office details, using an explicit join
$person = DB_DataObject::factory('person');
$office = DB_DataObject::factory('office');
$person->joinAdd($office, 'LEFT');
$person->selectAs();
$person->selectAs($office, 'office_id_%s');
$results = $person->fetchAll();
$ret = array();
foreach ($results as $r) {
    $ret[] = $r->toArray();
}
echo json_encode(array('total' => count($ret), 'data' => $ret));

This relies on a links.ini file describing the table connections:

[person]
office_id = office:id

With the new autoJoin() method, this code is reduced to:

$person = DB_DataObject::factory('person');
$person->autoJoin(); // builds the joins and selectAs() mappings from links.ini
$results = $person->fetchAll();
$ret = array();
foreach ($results as $r) {
    $ret[] = $r->toArray();
}
echo json_encode(array('total' => count($ret), 'data' => $ret));

This not only simplifies the code; if we change the database design, add a company field to the person table, and change the links.ini file to

[person]
office_id = office:id
company_id = company:id

then the resulting data will include the person's company details, just by adding a single line to the links.ini. The original code in Roo.php enabled filtering and distinct queries to be built automatically - this may one day end up as options in a parameter to autoJoin().


Archiving data

This links.ini schema map has another huge benefit, which I've now realized with a quite simple piece of code that archives data from a database so it can be removed and restored later. This was necessary on two projects. In one, the database contained press clippings about client companies for a media monitoring project; the owner wanted to remove all the clippings and users relating to a specific client, as they were no longer using the service and the data was taking up disk space. Along with this, it was obviously slowing down queries due to the large data size.

The other problem area was our mail filtering and anti-spam product. It had a rolling archive of all mail delivered to a client, which was automatically pruned using cron jobs. On most of our servers this was limited to about a week; however, for one client we had over a year's worth of emails for 500 staff in the database. With that amount of data, it was consuming a considerable amount of disk space, and certain searches were rather slow.

My concept for solving both issues was basically the same: using the relationship data in the links.ini file, it is possible to work out which data in the database is 'owned' and which is required when exporting and restoring data.

In the mail example, each email is referenced by a few rows in the Attachment table (so that files could be searched for quickly). So when we export the emails, we can also export and flag for deletion the related Attachments. Along with this, each mail row has references to the Senders, who would have to exist if we restored the data; however, we do not need to delete them.

The solution to all this is currently a class available in the Pman Admin module called Dump (it could quite easily be moved into a library, DB_DataObject_Dump). As long as you have your DB_DataObject configuration set up correctly, it takes a table and a query, locates all the records from related tables that can be deleted or are required for a restoration, then creates a number of files:

  • restore.sql - recreates the data which has been removed, using a large number of INSERT statements for the core data, dependent records and related records.
  • deletes.sql - removes the core data and its dependent records from the database.

Along with this, it also supports hooks (or rather a dumb class-based method API): methods that can be implemented in the DataObjects being dumped, which enable you to copy and restore the files relating to the database records (for example, our original emails); a rough sketch of such a hook follows the list below. The second part of the script uses that information to generate the following shell scripts:

  • restore.sh - copies files from the backup into the original location.
  • backup.sh - copies files from the original location to the backup location.
  • delete.sh - deletes the files from the original location.
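
To give a feel for the hook idea, here's a rough sketch - the method names are invented for illustration, not the actual Pman Dump API - of a DataObject that owns files on disk:

// hypothetical hooks on a dumped DataObject (names are illustrative only)
class DataObjects_Mailmessage extends DB_DataObject
{
    var $__table = 'mailmessage';

    // tell the dumper which files on disk belong to this row,
    // so they can be listed in backup.sh and delete.sh
    function listRelatedFiles()
    {
        return array($this->spool_path);
    }

    // called when restoring: copy a backed-up file to its original location
    function restoreRelatedFile($backupPath)
    {
        copy($backupPath, $this->spool_path);
    }
}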

All I need to do now is write a nice UI for it all...





13 Feb 2011

SQL change management done right.

I'm slowly going through mtrack at present, breaking the HTML rendering into templates, and making nice clean classes for fetching the underlying data. All this is so that quite a few ideas can be implemented once it's in a state to be easily changed.

The more deeply I go through mtrack, though, there are parts you look at and think, "that's a really smart way of handling that problem". Like the change control auditing, where each component has an ID for the last update (and creation); this maps to an event table record containing the who/when data. Much cleaner than the more common practice of using two datetime and user fields.

However, as always, there are parts of the code where you want to pull your hair out and say, "no way should you solve a problem like this". Mtrack's SQL change management is one of those areas.

Its approach is not uncommon, and I've seen it before: basically it uses an XML file to define the schema, and then has a series of files - schema1.xml, schema2.xml, schema3.xml... and so on.

The installer works out what the current version is, then compares schemas between the current and selected versions, and works out how to update the database. There is a driver each for PostgreSQL and SQLite.

My fundamental issue with all this is that, while on the face of it this does not seem like a bad idea, it ignores the plain and simple fact that there is already a language for database schema definitions and modifications, and it's called SQL!

Anyone who uses SQL can quickly read an SQL file and understand what it's doing, even if it's in a different dialect (SQLite etc.), but putting that type of information in an XML file just adds noise. Worse, it involves learning and remembering a small subset of knowledge that is only relevant to a tiny problem domain. And worst of all, it's just plain annoying to read.

For my projects I'm lucky in only having to deal with a single database vendor (usually MySQL). To manage changes I keep an SQL file updated; it contains all the table definitions along with the later changes. To update the database, I just pipe it through the mysql command line with the '-f' (force) switch. This is a trivial, simple and effective way to keep the database schema in sync with the project code. There are even wrappers in the Component Framework code to do this simple task for you.
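
For example (the file name here is illustrative), updating a database is just:

mysql -f -u myuser -p mydatabase < sql/schema.sql

The -f switch tells mysql to carry on after errors, so CREATE TABLE statements for tables that already exist simply fail and get skipped, while any new tables and later ALTER TABLE lines are applied.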

The big problem, however, is that SQL, for all these great benefits, has turned out to be horribly incompatible between database vendors. If I carry on with the above idea of keeping my database modifications in an SQL file, I end up with one file for each of the databases I want to support. Then I not only have to keep each of the schema files up to date, but also have to remember the syntax for multiple database vendors, and apply each change to every file. Not really a good long-term plan.

So rather than keep multiple files up to date, I wondered: why not convert the SQL schema changes from one dialect to another? That way I keep a single SQL file, as I currently do, and make it feasible for anyone to install using the database of their choice.

This is one of those "why has no-one done this before?" moments, but for the life of me I could not find anything that came up quickly on Google. So I had a look at what was 'close enough' for this idea to work and, what a surprise, most of the code for it is already in PEAR.

The SQL_Parser package, as very basically introduced at http://www.sjhannah.com/blog/?p=16, provides pretty much all the backend code for a solution. However, there was not previously any code in either the parser or the writer/compiler to actually deal with DDL commands like ALTER TABLE.

I've just committed the changes for this, so you can now very easily extend the current SQL_Parser_Compiler class to output your favourite dialect of SQL, based on reading an SQL file containing the base changes.
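
A very rough sketch of the round trip - the constructor and method names here are my best guess at the package's API, so treat them as assumptions and check the SQL_Parser source before copying:

require_once 'SQL/Parser.php';
require_once 'SQL/Parser/Compiler.php';

// parse the MySQL-flavoured schema/changes file into a parse tree
$parser = new SQL_Parser(null, 'MySQL');
$tree = $parser->parse(file_get_contents('sql/schema.sql'));

// a subclass of SQL_Parser_Compiler can then walk that tree and
// write out the equivalent statements in another dialect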

For an example of how to use it in real life, here's a nice simple one from the forked mtrack codebase.

And here's the commit that makes it possible..

And finally, a good example of an SQL file that can be run through the parser/generator.


21 Sep 2009

PEAR state of play, why move to PEAR2

Just before I saw Brendan's post about PHP4 compatibility in PEAR, I had been getting a few queries about making a couple of my PEAR packages more 'PHP5' compatible or PEAR2 ready.

From my perspective, pretty much all of the packages I maintain are (as far as I know) PHP5 'compatible'; however, they may emit E_STRICT errors.

This brings up an interesting question, which I guess all the current maintainers, users and contributors have come across: how much value is added to the existing packages by that level of support?

From an 'ideal world' / 'perfect coding' perspective, they would benefit from these changes. But as somebody who earns an income by delivering projects as quickly and efficiently as possible, the return on investment for making those changes is tiny, if not negative.

Since the packages generally just work, making the changes required would not really change that 'just work' situation. And, as Jamie Zawinski famously said, "How will this software get him laid?"

Two of the biggest changes I'm aware of for this 'PHP5 compatibility' issue are the 'static' method prefix and getting rid of 'var', both of which completely break PHP4 compatibility (and yes, we still maintain PHP4 applications, and clients rarely have the budget for changes like this). Making these changes would mean I would have to either freeze or deprecate PHP4 support, or start maintaining dual versions. (Personally I would prefer a hook in the package builder that would do the replacement for me, so I could upload two packages on each release.)
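
To make the conflict concrete, here are the two styles side by side (class names invented for the example):

// PHP4-compatible style - emits E_STRICT noise under PHP5
class Foo4 {
    var $bar;
    function factory() { return new Foo4(); }
}

// PHP5 style - the 'public'/'static' keywords are parse errors under PHP4
class Foo5 {
    public $bar;
    public static function factory() { return new Foo5(); }
}

The same source file simply cannot satisfy both interpreters, hence the freeze/deprecate/dual-version dilemma.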

Going forward, PEAR2 is still in a gestation period (PHP 5.3 and namespace support have only just come out), with the result that any code that targeted PHP5.3/PEAR2 early has aged very quickly (e.g. requiring changes to handle the final namespace syntax). This may start changing soon; however, I suspect it will take a significant investment of time to start creating PEAR2 packages for existing code (which has a rather poor return on investment). And without a reasonable base number of packages, the attraction of submitting code to PEAR2 is lessened. A classic chicken-and-egg situation.

At the same time, there is no real alternative to PEAR2; pretty much all other 'framework' solutions have been built around the assumption that you have to accept the majority of the framework to utilize a single package. Which is even worse than the pains that PEAR(1) imposes on you.

All that said, if you want to send me patches to fix any big PHP5 issues in my packages, please don't hesitate; I will try and make the changes.


13 Mar 2009

DataObjects and Flexy releases - and an Ext/Roo builder for FlexyFramework.

Well, after a busy few months, things have gone quiet again. Hopefully it's a short-term thing, but it has given me a bit of time to get back to the less profitable things in life (like blogging).

PEAR releases

Yes, after quite a few emails bugging me to release updates to DataObjects and Template Flexy, I finally got round to getting them out the door (and even fixing a few bugs after they got out). I've also made an effort to get Services_JSON onto the PEAR release system, as I use it quite a bit, and it's been sitting as a proposal for well over a year.

Facebook

Yes, that other great waster of time. I've finally set up an account for myself - "AK BK Consulting" - after my first effort at using Facebook fell apart, due to the mess it made of mixing family stuff and professional stuff (I use my wife's account for family matters). I set this one up so I can join up with PHP developers anywhere and IT people in Hong Kong. So feel free to add me as a friend (as I don't have many ;) - this blog should be syndicated into my page...

ExtJS / Roo builder for HTML_FlexyFramework


I was messing around this week writing a builder for HTML_FlexyFramework, my little lightweight page loader that integrates quite nicely with DataObjects, Template Flexy and PEAR in general. Part of the incentive for this was seeing a little project that a potential client had developed with some Windows application that generated a whole site, starting from the database schema.

The idea was quite nice, and the interface of the builder was quite friendly, but the downside was that the resulting code was just unusable gibberish. So rather than work out how to add features to it, I wondered if, using DataObjects' Generator as a core, I could generate a whole ExtJS/RooJS interface from the database, and then edit that to quickly get a frontend up and running.

The code's sitting in my akpear repo as RooJS_DB_DataObject (it actually writes ExtJS 1.1 code). It does the basic tasks of setting up a grid for each table and a simple form to edit each table, along with some limited joins based on the links.ini.

If you want to try it out, it runs like this:
php RooJS_DB_DataObject/cli.php 'mysql://dbuser:dbpass@dbhost/dbname'

A nice little proof of concept... It's got some idea of how to 'update' its generated code, but I've disabled that at present. It should, however, give you a quick way to jumpstart an ExtJS application.




14 Jul 2007

Crazy require_once optimization rears its head again.

There's been a long thread over the last couple of weeks covering require_once, and some rather crazy ideas to change the PEAR standards to do all sorts of strange things to load code.

What is most ridiculous is the benchmarking that is being done to 'prove' that we should all move our code loaders to some 'allfiles.php' and we are magically going to run 15% faster.

What this whole concept fails to take into account is that loading up all the PHP files, whether tiered one to another or all from one place, is such a small part of a PHP page responding to a request.

Think of what is happening when you look at this page.
  • You start with a bootstrapper (that sets config options) - probably quite quick, as it just sets some variables.
  • Then you run into the framework class that works out which file to load based on the URL you are typing. This does quite a bit of text manipulation, making best guesses about where things might be. (This is quite generic and could be made specific to the application quite easily.)
  • We then do the page action stuff, like pulling data from databases, which requires quite a few PEAR libraries and does some really slow crap like connecting to databases and pulling data. This stage probably does far too many queries and pulls down too much data that it doesn't use.
  • Then we do the output stuff. Normally we are using a compiled template (so we don't load all the template parsing code - well, at least we save a bit of effort here), so we need to pull down at least the Flexy class, then the compiled template.
Now, looking at that whole process and thinking it's a bit slow, you would probably go through the following steps to optimize it.
  • Replace it with a statically generated page!!!
Well, yeah, that's it!...

Only kidding... but if you were doing this to a normal page, you would probably want to do the following:
  • reduce / remove database calls
  • cache as much as possible.
  • reduce / remove libraries needed and code used.
  • move any data intensive code into C
  • use an opcode cacher.
At no point would you get so silly as to re-jig the 'require_once' calls just to get a 15% saving on the code-loading component, because - wait for it - the code-loading component of your application (when using APC) takes such an insignificant percentage of the total running time.

Working out how to remove a whole file, or pre-process some bit of data, would be a far better use of your time. And if you were running a team focusing on this, you would probably have fired the guy who wasted all that time on such a pointless task as moving the require_once lines around.

Sounds like moving the deck chairs on the Titanic?

24 Nov 2006

Multilanguage setup for Flexy and FlexyFramework.

Having just released the incredibly minor update to the Flexy template engine in PEAR, I can now release the documentation that goes with it...

FlexyFramework, together with HTML_Template_Flexy, includes the ability to quickly create multilingual websites by translating the templates.

I've set this up a few times now, and kept meaning to document it - so here goes.

Having set up FlexyFramework (see the previous post about this),

... The full instructions are in the Extended entry....


23 Jul 2005

Flexy, the condom of template engines, updated

After seeing quite a few "fixed XSS problem" CVS commits on the mailing lists, I finally remembered that a release of Flexy was overdue. The thought "that would not have happened if they had used Flexy" came to mind a bit too often. Flexy is designed to be safe by default (like a firewall, where you have to try hard to open ports).

HTML_Template_Flexy has not had much added to it in the last six months, probably as it just 'works', tries not to get in your way, and doesn't do too much. But there have been a few bugs left over from the last release in January, along with a few tiny feature requests. So I finally got round to tidying up a release.

There is nothing mind-shattering in this release, but users and potential users may be interested in the extra pages in the manual, which should come online by Monday:
  • flexy:nameuses
  • flexy:tojavascript jsvar="phpvar"
Along with a pretty complete list of configuration options that I updated a few weeks ago.

As far as I know there is not too much more missing... but some of these are a bit critical:
  • {_( translation markers )_}, and how to use the translation toolkit.
  • flexy:raw="somevar", for putting text within tags (as otherwise the syntax would break HTML editors).
  • Using some of the modifiers from the Savant plugin (hint: {xxx():numberformat}).
I'm sure there is more, but that's all I could think of at present..



04 Apr 2005

PHP as a template engine, or recipe for disaster?

Whenever someone starts talking about template engines, there's an equally vocal community that gently suggests that PHP is a great template engine. Well, I think this week that sounded a lot like bollocks...

The PEAR website, while not a masterpiece of PHP code, has been written by some pretty smart people, and uses (in parts) the concept of PHP as a template engine. Last week, however, we got a very polite email to the group mentioning that it was possible to perform cross-site scripting attacks on some pages.

The root of the issue was that it was outputting variables (either directly from input or indirectly) which had not been escaped correctly for HTML or JavaScript, so it was possible to make your favourite JavaScript hacks work through the URL.

While the issues with pearweb were not that serious, they did illustrate the problem of simple PHP templating compared with more complex engines like Flexy.

When I wrote Flexy, I'd been doing webdev for quite a while, and realized that, like everyone else, I make mistakes (some may say, like my opinions on this blog). So to some degree I tend to prefer my applications to protect me from myself, while at the same time allowing me to deliberately break things.

One of the more unusual features of Flexy is that all tags, e.g. {stuffThatOutputsVariables} or method calls, are HTML-escaped by default (unless you explicitly add the :h modifier). Not only that, these tags simply don't work within JavaScript blocks: you are forced to use the <flexy:tojavascript> tag to send variables to the JavaScript code, again reducing the chances of accidentally letting your friendly hacker have fun with your site.
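
A short sketch of what that looks like in a template (the variable names are made up for the example, and the tojavascript attribute follows the jsvar="phpvar" pattern from the manual):

<!-- output is HTML-escaped automatically; :h switches the escaping off -->
<p>{userComment}</p>
<p>{trustedMarkup:h}</p>

<!-- {tags} are inert inside script blocks; variables must be passed explicitly -->
<flexy:tojavascript userName="userName"></flexy:tojavascript>
<script type="text/javascript">
    alert(userName);
</script>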

So while PHP templates have some advantages, in that they lack the requirement for compiling, that penalty seems a small price to pay for the extra protection. So Flexy's new catchphrase may be: "Put your condom on, and use the Flexy Template Engine..."

27 Mar 2005

Generating Excel, again.

For a change, I've taken a break from bashing internals, and got back to real work. (More on DBDO later this week, hopefully.)

One of my ongoing projects, which has been dragging on longer than I would have liked, is a shipping management application. I think it's mentioned in the archives, but for anyone who missed it, it is a mid-sized XUL application which deals primarily with the management of a trading company's shipping requirements. I originally outsourced the main development, and have been tidying up and refining the code as we near final deployment (which, as usual, has taken longer than expected).

This week I sat down and focused on the last major part of the project: reporting. Almost all the requirements for reporting include the ability to download an Excel file of the data, so I had previously been making heavy use of PEAR's Spreadsheet_Excel_Writer. In using it, I have gone through various stages of evolution:
  • Writing raw Spreadsheet_Excel_Writer code in PHP. This becomes very tedious, is not amazingly readable, somewhat breaks the separation of display and computation, and tends to be less flexible over a long period of time.
  • Using a Gnumeric file as a template, using XML_Tree to merge data with it, and outputting via Spreadsheet_Excel_Writer. This helped by providing a simpler API for spreadsheet writing and by moving some of the layout/look and feel into the Gnumeric template, but the code for doing it was not quite as elegant as I would have liked.
  • Using JavaScript to read the HTML tables and create a CSV file, which is sent to the server and straight back again with a text/csv mimetype (forcing the browser to open it in Excel/OpenOffice etc.). This was nice from an architectural point of view, but lacked any formatting.
  • And finally, this week: using JavaScript to generate a Spreadsheet_Excel_Writer-specific XML file (by mixing an XML template file with the HTML content of the page), sending it to the server, and then letting PHP use the DOM extension and simple iteration with Spreadsheet_Excel_Writer to generate the file.
This week's solution, while not quite complete, has a number of key advantages, some of which only appeared after I started using it:
  • No display-level code goes into the Action->Data manipulation stage (we just store the data, ready for the template engine/template to render).
  • It is possible to visualize the data prior to it ending up in the Excel file,
    • hence debugging the data output and finding issues is a lot quicker.
  • More code reuse:
    • the library for XML to Excel is simple to reuse,
    • the code for extracting the data from the HTML and generating the XML is simple enough to copy and paste, and it may eventually be possible to turn it into a JS library.
  • It offers infinite possibilities for formatting and changing layout.
  • It is less memory intensive, as the data retrieval/storage and the Excel file creation are broken up into two separate processes.
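
As a rough sketch of the server-side half of this pipeline (the XML element names are invented for the example; the real template format carries formatting details as well), the PHP end looks something like:

require_once 'Spreadsheet/Excel/Writer.php';

// the XML built by the browser-side JavaScript, posted back to this script
$dom = new DOMDocument();
$dom->loadXML($_POST['xls']);

// stream the generated workbook straight back to the browser
$workbook = new Spreadsheet_Excel_Writer();
$workbook->send('report.xls');
$sheet =& $workbook->addWorksheet('Report');

$row = 0;
foreach ($dom->getElementsByTagName('row') as $rowNode) {
    $col = 0;
    foreach ($rowNode->getElementsByTagName('cell') as $cellNode) {
        $sheet->write($row, $col++, $cellNode->textContent);
    }
    $row++;
}
$workbook->close();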

The extended entry includes a few more details....