Disable Excessive Validation? / Help Please!
Is there anyway to reduce the amount of hand holding flourish does when used fDatabase? When processing large datasets, the amount of checks flourish does for data consistency seriously slows a script down.
For example, I have a table "daily_tt_23" that has a foreign key to the field "stps_id" in table "stps". Each time, I try to insert a record in "daily_tt_23", flourish first confirms the value of "stps_id" actually exists in table "stps". It also encapsulates each insert into a transaction. So now, to process a single SQL command, I'm executing 4 SQL commands.
What I'd really like to see is some way to globally disable all of this consistency checking in advance. That way, depending on the project, you could allow flourish to help you out for low volume stuff or get out of the way for high volume stuff.
Unfortunately, all of this is slowing my scripts down too much. Unless there is a workaround, I won't be able to use flourish for command line processing. I still love flourish and will use it for web sites, but will have to go back to my old techniques for batch processing.
Here is an example of the query log :
090702 17:10:21 1 Query truncate table daily_tt_23
1 Query SHOW CREATE TABLE daily_tt_23
1 Query BEGIN
1 Query SELECT stps_id FROM stps WHERE stps_id = 3
1 Query INSERT INTO daily_tt_23 (stps_id, gta, gta_15, gta_len, egta, egta_15, xlat, dpc, ri, ssn, ngt, created) VALUES (3, 1201200, 12012, 7, 1201200, 12012, 'DPCNGT', '226-012-000', 'GT', 0, 10, '2009-07-02 17:10:21')
090702 17:10:22 1 Query COMMIT
1 Query BEGIN
1 Query SELECT stps_id FROM stps WHERE stps_id = 3
1 Query INSERT INTO daily_tt_23 (stps_id, gta, gta_15, gta_len, egta, egta_15, xlat, dpc, ri, ssn, ngt, created) VALUES (3, 1705341, 17053, 7, 1705341, 17053, 'DPCNGT', '226-012-000', 'GT', 0, 10, '2009-07-02 17:10:22')
1 Query COMMIT
1 Query BEGIN
1 Query SELECT stps_id FROM stps WHERE stps_id = 3
1 Query INSERT INTO daily_tt_23 (stps_id, gta, gta_15, gta_len, egta, egta_15, xlat, dpc, ri, ssn, ngt, created) VALUES (3, 1705927, 17059, 7, 1705931, 17059, 'DPCNGT', '213-001-001', 'GT', 0, 10, '2009-07-02 17:10:22')
1 Query COMMIT
1 Query BEGIN
090702 17:10:23 1 Query SELECT stps_id FROM stps WHERE stps_id = 3
1 Query INSERT INTO daily_tt_23 (stps_id, gta, gta_15, gta_len, egta, egta_15, xlat, dpc, ri, ssn, ngt, created) VALUES (3, 14165644400, 14165, 11, 14165644499, 14165, 'DPCNGT', '226-012-000', 'GT', 0, 10, '2009-07-02 17:10:22')
1 Query COMMIT
1 Query BEGIN
1 Query SELECT stps_id FROM stps WHERE stps_id = 4
090702 17:10:24 1 Query INSERT INTO daily_tt_23 (stps_id, gta, gta_15, gta_len, egta, egta_15, xlat, dpc, ri, ssn, ngt, created) VALUES (4, 1201200, 12012, 7, 1201200, 12012, 'DPCNGT', '226-012-000', 'GT', 0, 10, '2009-07-02 17:10:23')
1 Query COMMIT
1 Query BEGIN
1 Query SELECT stps_id FROM stps WHERE stps_id = 4
1 Query INSERT INTO daily_tt_23 (stps_id, gta, gta_15, gta_len, egta, egta_15, xlat, dpc, ri, ssn, ngt, created) VALUES (4, 1705341, 17053, 7, 1705341, 17053, 'DPCNGT', '226-012-000', 'GT', 0, 10, '2009-07-02 17:10:24')
1 Query COMMIT
1 Query BEGIN
1 Query SELECT stps_id FROM stps WHERE stps_id = 4
1 Query INSERT INTO daily_tt_23 (stps_id, gta, gta_15, gta_len, egta, egta_15, xlat, dpc, ri, ssn, ngt, created) VALUES (4, 1705927, 17059, 7, 1705931, 17059, 'DPCNGT', '213-001-001', 'GT', 0, 10, '2009-07-02 17:10:24')
1 Query COMMIT
1 Query BEGIN
1 Query SELECT stps_id FROM stps WHERE stps_id = 4
090702 17:10:25 1 Query INSERT INTO daily_tt_23 (stps_id, gta, gta_15, gta_len, egta, egta_15, xlat, dpc, ri, ssn, ngt, created) VALUES (4, 14165644400, 14165, 11, 14165644499, 14165, 'DPC', '226-012-000', 'GT', 0, 0, '2009-07-02 17:10:24')
1 Query COMMIT
-
Message #317
I've sort of found the solution to this problem. The answer is to use simple SQL instead of fActiveRecord. Doing this on a very small dataset(18,000 records) reduced my script execution time from an average of 6 seconds down to an average of 3.2 seconds.
I know that doesn't seem like a lot. However, this was just a small subset of my records. Normally, this script will process >250K records multiple times. Other scripts I'm converting will process millions of records. All those seconds will add up.
I also think I can improve the speeds even more by getting rid of the fRecordSet usage as well.
So, now I still get to use all the other flourish goodness, but I'll do the queries the old fashioned way.
justbn07/02/09 18:20:38 -
Message #318
From the SQL you posted, it seems to me you are using the ORM, not just fDatabase. The ORM does automatic data escaping, unescaping, validation with nice messages and more, but obviously not without a cost.
The simplest way to improve your SQL would be to start a transaction manually before you process all of your records:
fORMDatabase::retrieve()->query('BEGIN'); foreach ($records as $record) { $record->store(); } fORMDatabase::retrieve()->query('COMMIT');
However, you are going to find that for massive data sets, pure fDatabase calls will be much more efficient:
$db->query("INSERT INTO table(column, second_column) VALUES (%s, %i)", $column, $second_column);
One of the things on my todo list for fResult is batch unescaping, so instead of calling escape for each column of each row, you will be able to escape a result set with one method call:
$res->unescape('column:datatype', 'second_column:datatype'); foreach ($res as $row) { // ... }
wbond07/02/09 18:26:50
