Flourish PHP Unframework
This is an archived copy of the forum for reference purposes

How efficient is the SQL?

posted by rirez 7 years ago

How efficient with SQL is Flourish? For example, if we do a simple fActiveRecord call to pull out a row from a database, does Flourish automatically fetch all the row's data? Or does it wait for a request to one of its columns, and proceed to fetch just that?

As another example, if I were to count all the rows in a database, would it be okay to simply build a full fRecordSet object and using count() on it? Does building an fRecordSet object try to get all the data (which would be catastrophic for extremely large tables) or does it wait for us to do an operation?

Poking around in the source seems a bit unclear, though I did find a function called tally() in fRecordSet that suggests that more queries are done on the fly as data is requested. I can see both options having drawbacks, though: lazy querying leads to more queries in the long run, while the alternative can easily load way too much data than necessary.

Some explanations would be really useful!

The Flourish ORM (and the whole database subsytem, with the exception of fUnbufferedResult) uses eager loading, so all data it loaded in a single query/request. When you load build an fRecordSet, all columns from all rows are load in a single query that has its results buffered in memory. When you create an fActiveRecord object, if it hasn't already been created during the current execution, all columns are loaded in a single query.

In my experience, this is more efficient than lazy loading. Lazy loading only sends the data you ask for when you need it, but involves more round-trips and more PHP code execution.

The result of this is that you should only ask for data you are planning on using, and possibly segment your database tables such that blobs and text columns be in separate tables if you only rarely access them. So if you only want to count records that match a pattern, it is definitely better to use fRecordSet::tally(). That way all of your conditions are used, but a single number is returned rather than processing every row and constructing fActiveRecord objects.

Additionally, I find that for areas that are hit a lot, I tend to write custom SQL inside of methods in my fActiveRecord classes so that I get better performance, but the data access is encapsulated. Basically the ORM helps you get up and running quickly and easy tweak things and handle lots of situations without much work, but all of the convenience and extensibility does cost extra processing. Using a profiler such as xdebug is very useful to identify points in your code that are the most expensive, but also the areas you will get the most benefit from optimizing.

posted by wbond 7 years ago

Thanks for the quick reply!

I see. This would be a reason to further normalize entire databases.

Would, for example, fRecordSet::buildFromSQL help with this issue by only grabbing the columns we need, if we're in a read-only situation? Or is there a better solution for only getting the data we need?

posted by rirez 7 years ago

No, that is not possible. fActiveRecord is built around an interpretation of the active record pattern where an object represents one complete row in a table and encapsulates logic related to that row. There is no support in any form for only loading part of a row. In fact, I would probably expect the extra complexity of implementing that in a flexible manner that supported all of the current functionality would likely further decrease the performance of the ORM. fRecordSet will throw an exception if you don't SELECT * or SELECT table.* in your SQL statement.

If you only want to select specific columns, fDatabase and fResult are the tools in Flourish to do that. fResult::unescape() allows easily mapping raw DB values into PHP objects. Where I work there have been some situations where we've built lighter-weight abstractions on top of fDatabase and fResult for situations where we needed more specialized access, such as situations where we were dealing with poorly constructed legacy database tables that hand hundreds of columns. This approach takes more work, but allows the code to be specialized for your situation.

posted by wbond 7 years ago

I guess this means a good rule of thumb is to use fORM for well-structured databases with well-structured access, and fall back to our usual patterns of SQL work with the help of fDatabase/fResult to keep it sane (and retain that flourish-y goodness) when working with uglier schemas and data.

posted by rirez 7 years ago