ADF BC Tuning I: Entity Objects

This week’s post will be the first of a five-week series about an important but little-discussed topic: Tuning your business components for maximum performance. A lot of projects put very little effort towards business components tuning (usually nothing more than improving the SQL of expert-mode VOs), and because of this, a lot of developers new to the framework come away with the (false) impression that business components perform poorly. Business components actually perform quite well, so long as they’re properly tuned.

This week, I’m going to talk about tuning entity objects. Over the next weeks, I’ll cover associations, view objects, view links, and application modules.

Update Batching

Normally, each entity object instance that requires a post (that is, each new instance, deleted instance, or instance with at least one persistent attribute modified) issues its own JDBC command to post itself to the database. For many cases, probably even most cases, this is just fine: The user is editing a single row, and maybe adding/removing four or five detail rows, so we’re talking about somewhere between one and six separate posts to the database.

But cases where this isn’t so are pretty common. In some applications, users might change dozens, hundreds, or even (in rare cases) thousands of rows in a single transaction. Shuttle controls and multi-select lists are serious sources of this, since they often allow very quick creation or deletion of lots of rows. Five trips to the database when it’s time to post and commit changes isn’t a big deal, but even 30 is going to seriously impact application performance, and if it’s repeated across many users, starts to be a strain on resources.

If you’ve got a situation where dozens or hundreds of updates are going to be executed in a single transaction, you should seriously consider using JDBC Update Batching. JDBC Update Batching is a JDBC API that allows you to “batch” multiple DML statements into a single round trip to the database. This adds a bit of overhead (so you shouldn’t go around using it willy-nilly), but for sets of more than about five DML operations, the overhead is considerably smaller than the cost of issuing individual posts.
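Under the hood, JDBC batching means buffering statements with PreparedStatement.addBatch() and sending the whole set in one executeBatch() call. As a framework-free illustration of the idea, here’s a sketch using Python’s standard sqlite3 module, whose executemany() plays an analogous role (the table and data are invented for the example; this is an analogy, not ADF code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, salary INTEGER)")

rows = [(i, 5000 + i) for i in range(1, 101)]

# Unbatched: one statement per row -- in a real client/server database,
# that's one network round trip per row.
for emp_id, salary in rows[:50]:
    conn.execute("INSERT INTO employees VALUES (?, ?)", (emp_id, salary))

# Batched: hand the driver the whole set at once, analogous to
# PreparedStatement.addBatch() / executeBatch() in JDBC.
conn.executemany("INSERT INTO employees VALUES (?, ?)", rows[50:])
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(count)  # 100
```

The per-statement overhead is small either way; what batching saves is the fixed cost of each separate trip to the database, which is why it only pays off above a handful of operations.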

ADF supports update batching for many entity objects. There are, however, a few exceptions to this:

  • JDBC Update Batching does not support RETURNING clauses. This means that ADF can’t use it for any entity object definition that has an attribute explicitly marked as REFRESH AFTER INSERT or REFRESH AFTER UPDATE, that has an attribute of type DbSequence, or that uses a ROWID as a primary key.
  • JDBC Update Batching does not support streaming datatypes. So no LOBs or interMedia.

If you want to use update batching, here’s what you need to do:

  1. Decide which entity objects in your application are good candidates for update batching (the ones where multi-row posts are going to be fairly common, and that aren’t disqualified for the reasons above), and select the “Use Update Batching” option in the Tuning section of their General page, as shown here:
    [Figure: Entity Object Editor Tuning Options]
  2. Set the “When Number of Updates Exceeds” field to determine when update batching will be performed. If the number of new, deleted, or modified entity object instances is less than or equal to the number specified, batch updating will be skipped. Because using the batch API does involve some overhead, I don’t particularly recommend leaving the default of “1” here. You may want to tinker with the number to find the right balance, but 5 or so is a good first pass.

You might note that there’s another tuning option above, “Retain Association Accessor Rowset,” but this is really part of tuning your associations, not your entity objects, so we’ll cover it in the next post.

JDBC Update Batching is a big help if you have anywhere from 5 to a few hundred DML operations you want to execute at once. But it’s not really up to the task of handling thousands of operations, let alone millions–the JDBC batch, at that point, is so big that it starts to cause problems. At the high tens-of-thousands point, you probably don’t even want to *create* all those entity object instances, or query their data into memory; that’s a lot of memory for just one transaction. A better solution for these cases is not to use entity objects at all, but rather to update the table using some sort of package API. Handling huge amounts of data is what the database is for; don’t saddle your poor middle tier with it.

Fault-In

You’re probably familiar with default view objects, the ones created on the “Updatable View Objects” page of the Business Components from Tables wizard, the “View Objects and View Links” page of the Default Data Model Components wizard, or the “New Default View Object” option from an entity object’s context menu. A default view object’s query has:

  • A SELECT clause containing the columns and expressions for each of the entity object’s persistent and SQL-calculated attributes.
  • A FROM clause containing the entity object’s base table
  • Nothing else

So, for example, if the Employees entity object has the attributes EmployeeId, FirstName, LastName, Email, PhoneNumber, JobId, Salary, ManagerId, and DepartmentId, a default view object (which would be called “EmployeesView,” the default name for a default view object) would have the query:

SELECT Employees.EMPLOYEE_ID,
       Employees.FIRST_NAME,
       Employees.LAST_NAME,
       Employees.EMAIL,
       Employees.PHONE_NUMBER,
       Employees.JOB_ID,
       Employees.SALARY,
       Employees.MANAGER_ID,
       Employees.DEPARTMENT_ID
FROM EMPLOYEES Employees

One of the things you learn pretty quickly in ADF is that default view objects are next to useless, except as a starting place, in real applications–they usually query too much, or the wrong things, and don’t include useful joins to reference information (I’ll talk about this in a bit more detail when I cover view objects). But what you’re much less likely to know is that the queries that would get used for default view objects are used by ADF in certain circumstances even if you never use a default view object. One of these circumstances has to do with associations, so we’ll discuss it later, but the other has to do with a feature called fault-in.

Entity Attribute Population from Data: The Standard Case

How does an entity object instance corresponding to an existing database row get created? In the most common case, it’s created when an entity-based view object instance executes its query. Entity object instances are created to store the data, and then view rows are created to point to the entity object instances.

Of course, if the view object isn’t a default view object, it likely won’t retrieve data for every entity object attribute. For example, suppose you have a view object, EmpSalary, with the following query:

SELECT Employees.EMPLOYEE_ID,
       Employees.FIRST_NAME,
       Employees.LAST_NAME,
       Employees.SALARY
FROM EMPLOYEES Employees

If your UI is only going to display each employee’s name and salary, then, most of the time, this query is going to be what you want. But it’s not going to populate all the attributes of the Employees entity object instances it creates–in particular, Email, PhoneNumber, JobId, ManagerId, and DepartmentId. That’s probably fine–you’re not going to display those attributes anyway, and you’re not going to give the user a chance to change one directly.

What Fault-In Is and How it Happens

But what if you have, say, a validation rule on Employees that requires every employee with a salary of less than 10000 to have a manager? Perhaps it’s implemented as a Script Expression entity-level validation rule (set to fire only if Salary isn’t null and Salary or ManagerId has been changed):

source.salary >= 10000 ||
source.managerId != null
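Translated into plain boolean terms (a hypothetical Python helper, just to make the truth table explicit): the expression must evaluate to true for the row to pass, and an employee passes if the salary is at least 10000 or a manager is assigned.

```python
def passes_salary_manager_rule(salary, manager_id):
    """True when the row satisfies the rule: every employee earning
    under 10000 must have a manager. (Name and signature are invented
    for illustration; in ADF this lives in a validation expression.)"""
    if salary is None:  # the rule is configured to fire only when Salary is set
        return True
    return salary >= 10000 or manager_id is not None

print(passes_salary_manager_rule(9500, None))   # False: underpaid, no manager
print(passes_salary_manager_rule(9500, 101))    # True: underpaid, but has a manager
print(passes_salary_manager_rule(12000, None))  # True: well paid, manager optional
```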

If a user tries to change, say, Employee 101’s Salary attribute to a non-null value and then triggers entity-level validation (by, for example, trying to commit the transaction or navigate to a new row), this rule will fire and will look at the ManagerId attribute…which hasn’t been populated yet. What will the framework do?

What it will do, in this case or any other case where business logic requires a not-yet-queried attribute value, is trigger the fault-in feature. This feature executes Employees’ default view object query, limited exclusively to employee 101:

SELECT Employees.EMPLOYEE_ID,
       Employees.FIRST_NAME,
       Employees.LAST_NAME,
       Employees.EMAIL,
       Employees.PHONE_NUMBER,
       Employees.JOB_ID,
       Employees.SALARY,
       Employees.MANAGER_ID,
       Employees.DEPARTMENT_ID
FROM EMPLOYEES Employees
WHERE Employees.EMPLOYEE_ID = 101

This is pretty clever: usually, your users will query a lot more rows than they change, and won’t actually need non-displayed attributes even when they do make a change. So the framework queries only the view object’s attributes in its original big query, and brings in entire entity rows on a per-row basis when additional attributes turn out to be needed. This saves lots of memory up front, particularly if some of the non-VO attributes have large values.
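To make the mechanism concrete, here’s a small Python simulation of fault-in, with sqlite3 standing in for the database (the EmployeeRow class, column list, and sample data are invented for illustration): the row object holds only the attributes the VO query retrieved, and the first access to a missing one triggers a full-row query by primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
    email TEXT, phone_number TEXT, salary INTEGER, manager_id INTEGER,
    department_id INTEGER)""")
conn.execute("INSERT INTO employees VALUES (101, 'Neena', 'Kochhar', "
             "'NKOCHHAR', '515.123.4568', 9000, 100, 90)")

ALL_COLUMNS = ["employee_id", "first_name", "last_name", "email",
               "phone_number", "salary", "manager_id", "department_id"]

class EmployeeRow:
    """Holds whatever attributes the VO query retrieved; faults in the rest."""
    def __init__(self, attrs):
        self._attrs = dict(attrs)

    def get(self, name):
        if name not in self._attrs:
            # Fault-in: re-query the ENTIRE row by primary key, like the
            # entity's default view object query with a key predicate.
            cur = conn.execute(
                "SELECT {} FROM employees WHERE employee_id = ?".format(
                    ", ".join(ALL_COLUMNS)),
                (self._attrs["employee_id"],))
            self._attrs = dict(zip(ALL_COLUMNS, cur.fetchone()))
        return self._attrs[name]

# The EmpSalary VO queried only four attributes...
row = EmployeeRow({"employee_id": 101, "first_name": "Neena",
                   "last_name": "Kochhar", "salary": 9000})
# ...so the first touch of ManagerId pulls in the whole row.
print(row.get("manager_id"))  # 100
```

Note that after that one access, every attribute of the row is in memory, including ones (like a phone number) that the business logic never asked for. That is exactly the trade-off the next section is about.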

But there are cases where it’s really not such a great idea to go about things this way.

When to Avoid Fault-In

Suppose that the Employees entity object, instead of having nine attributes, had 100. After all, 100-column tables aren’t really all that rare. Or suppose that some of those column values were really of considerable size, or required substantial overhead to open a stream to–VARCHAR2(255)s, maybe some LOBs or interMedia types. And suppose that, really, the only attribute value missing from EmpSalary that was likely to be needed for business logic was ManagerId.

In this case, every time fault-in is triggered, in addition to the data you need from the database, you’ll pull in a lot more data: All the data required to populate (or at least prepare streams for) every one of those Employees attributes.

Or here’s another example: suppose the user won’t be querying far more rows than they update, and that most updates involve changing Salary to a non-null value. Say, in a typical transaction, 15 separate rows get their Salary value changed.

That’s 15 fault-in triggers: 15 separate database queries (in addition to the original one, which brought in all the rows). That’s too much query overhead.

How to Avoid Fault-In

If you’re in one of these situations, there are some steps you can take to make fault-in less likely.

One option is to just go ahead and add ManagerId to the EmpSalary view object, so that all the values your business logic needs get queried up front. Sure, you might never need those values, in which case you’ve wasted a tiny bit of memory (the memory required to store ManagerId for all those rows where it won’t be needed). But this is likely to be vastly preferable to either running a high risk of having to query an entire huge row, or running a high risk of having to issue lots of separate fault-in queries.

The other possibilities are less likely to help with the “lots of separate fault-in queries” issue, but they may be very useful in cutting down the amount of data retrieved by each fault-in query, if the issue is just that each row is large:

  • Remove unnecessary entity object attributes. In general, you don’t have to worry too much about entity object attributes mapped to table columns that your application doesn’t really need. As long as they don’t show up in a view object, they won’t get queried, and the mere existence of the attributes creates only the tiniest amount of overhead. But fault-in is a different case: Since all attributes get retrieved in a fault-in situation, it might be worth your time, if fault-in is likely, to prune your entity objects of attributes you know you’ll never need in your Java EE applications.
  • Break up big entity objects. Usually, the mapping between tables and entity objects is one-to-one. But it doesn’t have to be. To avoid serious performance problems with fault-in, you might want to try representing a single table as multiple entity objects with 1-1 associations. You could thus group related attributes together, limiting the chance that unnecessary attributes will be queried during fault-in. You want to be careful here, because this may make your VO queries much slower (with multiple aliases of each table, joined 1-1 by primary key attributes, when a single alias will do), but in a pinch it’s a possibility.
  • Override EntityImpl.doSelect(). This method gets called when fault-in is triggered. If you want to implement something fancy here to determine which attributes should be queried, you can.
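If you do go down the doSelect() road, the idea is to fetch only the columns your business logic is actually likely to need, rather than the entity’s full default query. Here’s a rough sketch of that policy in Python against sqlite3 (not ADF’s actual Java API; the table, data, and the fault_in_selected helper are all invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
             "email TEXT, manager_id INTEGER, resume_blob BLOB)")
conn.execute("INSERT INTO employees VALUES (101, 'NKOCHHAR', 100, x'00')")

def fault_in_selected(employee_id, needed_columns):
    """Selective fault-in sketch: query only the columns business logic
    needs, skipping expensive ones such as LOBs. The column names here
    come from trusted code, never user input, so interpolation is safe."""
    cur = conn.execute(
        "SELECT {} FROM employees WHERE employee_id = ?".format(
            ", ".join(needed_columns)),
        (employee_id,))
    return dict(zip(needed_columns, cur.fetchone()))

# Business logic needs only the manager; the résumé LOB never leaves the DB.
print(fault_in_selected(101, ["manager_id"]))  # {'manager_id': 100}
```

The cost of this approach is that you own the logic for deciding which attributes are “needed,” and attributes you guess wrong about will each trigger another trip to the database.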

That’s enough for now. Next week, we’ll look at how you can tune up the associations between your entity objects.