3-Tier 3/3: Implications of Changing the Application-Data Relationship
The previous post proposed a solution to the security issue detailed in the post before it.
This post looks at a number of implications of this change, in no particular order …
Can It Really Be Done??
In a word: yes. Work I was involved with used this approach for a good decade before I retired.
At the same time, there is the “devil in the details”, as they say. Mainly it is changing the way systems are coded, which I touch on further below.
I have given presentations about some aspects of this at conferences and to other organisations. These tended to be about implementing the strategy within a particular vendor’s database (Oracle) that I was using at the time. Perhaps the paper and presentation from one of those occasions provide further interesting insights. (As an aside, I discovered some years into it that my development team was calling this framework “Barry-cle” as a riff on the vendor’s name‽)
It was also nice to attend a conference subsequent to my presentation linked above and have someone tell me they had tried it at their site. And liked the results!
Out of curiosity, I’ve looked at adapting it to PostgreSQL. PostgreSQL has features that probably make the implementation details simpler. Things I had to implement as Oracle PL/SQL functionality can, I think, become configuration settings in PostgreSQL.
ANSI SQL Security Model Flaw
One of the things this proposal highlights is a flaw in the way that privileges are assigned in Standard SQL: both the confidentiality (i.e., controlling SELECTs) and the integrity (i.e., DELETEs, INSERTs, and UPDATEs) privileges are GRANTed to users/ROLEs.
We never want to trust people(/users) with individual DELETE, INSERT, and UPDATE privileges!
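To put that in concrete terms, here is a minimal sketch of the two approaches side by side. The object names are invented for illustration, and the procedure-signature GRANT syntax shown is PostgreSQL’s; other products phrase it slightly differently:

```sql
-- The Standard's model: trust the user/role with raw DML on the table.
GRANT SELECT, INSERT, UPDATE, DELETE ON orders TO app_user;

-- This series' proposal: confidentiality through VIEWs, and integrity only
-- through EXECUTE on update routines that own the whole transaction.
GRANT SELECT  ON orders_v TO app_user;
GRANT EXECUTE ON PROCEDURE place_order(bigint, integer) TO app_user;
```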
Interestingly, Burroughs DMSII had a mechanism to limit selected integrity privileges by table (or “data set” in its parlance) to specific programs/routines as far back as the 1980s!
Database Organization
Management of database objects is key to making it work, so I offer some details about how I approached it.
The first thing to understand is that application code and its promotion process will impact the database. Any strategy that assumes developers stay out of the database needs to be rethought. The developers need to be given enough access and privilege to be able to develop and test the database-resident code while keeping them away from the data and its schema.
The simple, and obvious, way to do that is to split the data and the code into separate security domains (i.e., a different schema/security role for each). And then the privileges between them need to be well managed. A picture hopefully makes this clearer:
And some comments to help further:
Simple VIEWs are, essentially, based on SELECT * FROM <table>, though I would recommend an explicit column list instead of the *. Application code uses these for simple lookups, or for multi-table SELECTs that are only used by a single piece of middle-tier code and hence not worth an “Application” VIEW. A very strong argument could be made that these should also be served out of the Code Schema, and not out of the Data Schema. (Code outside the database should never access TABLEs directly.)
Application VIEWs join multiple TABLEs in ways that are useful across multiple parts of the application code. Hence they belong in the application’s Code Schema along with the update routines (conforming to the requirements) and other functions an application opts to have in that space. Note that a VIEW can be thought of as a special type of AUTHID DEFINER function, with its parameters passed via the WHERE clause when it is called. (A SQL sketch covering these points, and the GRANTs below, follows this list.)
GRANTs from the Data Schema to the Code Schema should be as restricted as possible. This includes column-constrained INSERT and UPDATE GRANTs. An INSERT, for example, shouldn’t allow overriding of the DEFAULT on columns such as a PRIMARY KEY assigned from an “auto-incrementing sequence”, or an Insertion Timestamp that DEFAULTs to CURRENT_TIMESTAMP. (Oracle lacked column-constrained INSERT GRANTs when I was using it. To achieve something similar, I had a PL/SQL routine that generated pre-INSERT TRIGGER SQL statements to enforce setting of the DEFAULT value on fields expected to have that value at INSERT time. That TRIGGER would also set the Primary Key (PK) from a corresponding SEQUENCE since Oracle didn’t support that as part of the PK definition. Strict naming conventions are required to make that sort of thing work!)
DDL privileges available to a schema should also be as restrictive as possible. In particular, the Code Schema should have no CREATE TABLE privilege. If data needs to persist, then it must be in the Data Schema and under the custodianship of the data administrator.
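To make the picture and the comments above more concrete, here is one way the split might look in PostgreSQL-flavoured SQL. All names are invented for illustration, ownership and search_path details are glossed over, and other products will differ:

```sql
-- Two security domains: a non-login role/schema owning the data,
-- and a non-login role/schema owning the views and update routines.
CREATE ROLE app_data NOLOGIN;
CREATE ROLE app_code NOLOGIN;
CREATE ROLE app_login LOGIN;    -- what the middle tier connects as

CREATE SCHEMA app_data AUTHORIZATION app_data;
CREATE SCHEMA app_code AUTHORIZATION app_code;

-- A table in the Data Schema. GENERATED ALWAYS and the DEFAULT keep the
-- key and the insertion timestamp out of the hands of calling code.
CREATE TABLE app_data.customer (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name       text        NOT NULL,
    email      text        NOT NULL,
    created_at timestamptz NOT NULL DEFAULT current_timestamp
);

-- GRANTs from the Data Schema to the Code Schema: as restricted as
-- possible, including column-constrained INSERT and UPDATE.
GRANT USAGE  ON SCHEMA app_data   TO app_code;
GRANT SELECT ON app_data.customer TO app_code;
GRANT INSERT (name, email), UPDATE (name, email)
             ON app_data.customer TO app_code;

-- The middle tier's login role sees only the Code Schema: SELECT on its
-- VIEWs and EXECUTE on its routines, never the TABLEs themselves, e.g.:
GRANT USAGE ON SCHEMA app_code TO app_login;
-- GRANT SELECT  ON app_code.customer_v TO app_login;
-- GRANT EXECUTE ON PROCEDURE app_code.update_customer(bigint, text, text)
--       TO app_login;
```

Note that, in PostgreSQL at least, a GENERATED ALWAYS identity column even rejects an explicitly supplied value unless OVERRIDING SYSTEM VALUE is used, which gives the sort of protection Oracle needed my generated TRIGGERs for back then.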
The details will vary according to the database chosen. Things on which the framework relies may have different vendor implementations because they aren’t part of the SQL Standard or, perhaps, were implemented by a vendor before the Standard specified something different. For example:
Oracle does not have separate “user” and “schema” concepts, and “roles” are separate from “users”. A “schema” is a “user” that also owns database objects. My paper, referenced above, talks about creating a user account to own the objects, and then resorting to various things to lock that account down so that it becomes a “schema-only” account (i.e., one that can’t be used to log into the database). In researching this post, I discovered that Oracle RDBMS 18c, released in 2018 (some 15 years after my 2003 conference presentation, linked above), finally added the ability to create Schema-Only Accounts specifically.
In PostgreSQL, by contrast, “user” and “role” are the same thing, and “schema” is something different. A “user” is simply a “role” with the LOGIN attribute enabled. An object has both a “schema” in which it exists and a “role” that owns it.
“Schema”, “user”, and “role” are pretty fundamental concepts when setting up a framework under this proposal. The differences in those concepts between Oracle and PostgreSQL, for example, will mean architectural differences in the way each might be used to implement this strategy. When I implemented this in Oracle, for instance, I relied on AUTHID DEFINER PL/SQL procedures to manage some privileged actions that facilitated programmers’ needs. (It could be compared to providing Unix/Linux “sudo”-like access to otherwise risky functionality.) From my understanding of PostgreSQL, some of what that Oracle code did could instead be managed via ROLEs and the privileges they are GRANTed, amongst other things. And doing things by configuration is generally much safer than via coding.
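To make one of those differences concrete: the account-locking gymnastics my paper describes reduce, on each side, to very little nowadays (account names invented):

```sql
-- Oracle 18c and later: a schema-only account; it owns objects
-- but has no means of logging in.
CREATE USER app_data NO AUTHENTICATION;

-- PostgreSQL: the rough equivalent is a non-login role that owns a schema
-- (as in the earlier sketch); a "user" here is just a role with LOGIN.
CREATE ROLE app_data NOLOGIN;
CREATE SCHEMA app_data AUTHORIZATION app_data;
```

And, from my reading, the “sudo”-like AUTHID DEFINER helpers map either onto SECURITY DEFINER functions or, often more simply, onto membership of a suitably privileged ROLE, which is the configuration-over-coding point above.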
Programming Frameworks
This is the biggest hurdle to change. Consider Jakarta EE (formerly J2EE): it includes a framework for managing transactions in the middle tier, which becomes integral to the way code is written.
With our change, though, the transaction is controlled from (indeed, wholly resident in) the backend. So that needs to be worked out. My perspective is that the “programmers” evolved the “middle tier approach” to handling transactions without talking to “me,” the person responsible for the data. And it comes down to what’s more important: code, or data? If a company’s managers are given the choice of losing code versus losing data, they should opt to keep the data every time. It’s the recordkeeping, reflecting the state of the business at any point in time. It’s only by knowing the current state (e.g., how many parts are in stock) that a business can transact (e.g., supply parts to fill an order). Code can always be rebuilt atop the data, and the company will have a chance of surviving. But rebuilding the data will probably be impossible, and the company will go out of business.
So if there’s a large “attack surface” over the data and that can be removed, I contend it’s in the interests of the business to adapt. Especially in today’s context of interconnected systems and threats from anywhere on the globe.
But there’s another way to think about it that should be very appealing to programmers: the update routine in the database can be thought of as a “method” attached to its “object” (conceptually, the related rows in one or more tables). And attaching methods to the data they operate on is exactly the design pattern behind modularity and encapsulation.
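As a minimal sketch of such a “method”, using invented table and routine names and assuming the PostgreSQL-style split sketched earlier, the whole transaction lives behind one callable routine:

```sql
-- An update routine in the Code Schema: conceptually a "method" on the
-- part/stock-movement "object". Called as a single statement, it either
-- all happens or none of it does.
CREATE OR REPLACE PROCEDURE app_code.receive_stock(
    p_part_id  bigint,
    p_quantity integer
)
LANGUAGE plpgsql
SECURITY DEFINER   -- runs with the Code Schema's privileges, not the caller's
AS $$
BEGIN
    IF p_quantity <= 0 THEN
        RAISE EXCEPTION 'quantity must be positive, got %', p_quantity;
    END IF;

    UPDATE app_data.part
       SET qty_on_hand = qty_on_hand + p_quantity
     WHERE part_id = p_part_id;

    INSERT INTO app_data.stock_movement (part_id, quantity)
    VALUES (p_part_id, p_quantity);
END;
$$;

-- The middle tier's entire involvement in the update:
-- CALL app_code.receive_stock(42, 10);
```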
(One day I’ll write about how normalization is for programmers too, refactoring is typically an application of Fourth Normal Form (4NF), and there’s a good reason Dr. Codd missed the connection of his ideas to the transient data of programs.)
See also the Performance and Reusability sections below.
Data Restructuring
Changing the database becomes more challenging as the amount of code against it grows. Especially important is to properly identify and remediate the code that maintains the data structures that are to change.
This assessment becomes much easier when we know all the update code is available in one central location: the database.
It is particularly simple when the database catalogue exposes dependency tracking between objects, such as Oracle’s [All|DBA|User]_Dependencies catalogue views. (PostgreSQL’s pg_depend only seems to track procedure parameter dependencies, and not database object references within the procedure.)
Alas, such tracking will not identify references in dynamically-constructed SQLs. Separate “pattern matching” scans of the source code itself will be needed to find and evaluate those cases. Which is, perhaps, also an argument against allowing dynamic SQL!?
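For example, on Oracle the “what code touches this table?” question becomes a single catalogue query (the schema and table names here are invented):

```sql
-- Every stored object recorded as depending on the APP_DATA.CUSTOMER table.
SELECT owner, name, type
  FROM all_dependencies
 WHERE referenced_owner = 'APP_DATA'
   AND referenced_name  = 'CUSTOMER'
   AND referenced_type  = 'TABLE';
```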
Auditability
Asserting correct configuration of the database components of the architecture is possible via the catalogue tables. This includes queries such as:
ensuring accounts with login privileges have only database privileges for SELECT on VIEWs and EXECUTE on stored procedures;
ensuring ROLEs attached to those accounts also only have those SELECT and EXECUTE database privileges;
ensuring DELETE, INSERT, and UPDATE privileges are only between a data schema and its corresponding code schema; and
ensuring INSERT and UPDATE privileges are based on only the columns that are appropriately settable in each context.
Other catalogue checks will become apparent as the architecture evolves.
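To give a flavour of the first check in that list, here is a PostgreSQL-style query; it should return no rows in a correctly configured database (the catalogue views and their quirks will differ in other products):

```sql
-- Any table-level privilege held by a login-capable role that is not a
-- plain SELECT on a VIEW is a configuration error under this architecture.
SELECT g.grantee, g.table_schema, g.table_name, g.privilege_type
  FROM information_schema.role_table_grants AS g
  JOIN pg_catalog.pg_roles           AS r ON r.rolname      = g.grantee
  JOIN information_schema.tables     AS t ON t.table_schema = g.table_schema
                                         AND t.table_name   = g.table_name
 WHERE r.rolcanlogin
   AND NOT (g.privilege_type = 'SELECT' AND t.table_type = 'VIEW');
```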
In my experience, this transparency around the vastly improved data integrity framework finds favour with Information Security folk and Auditors. Once they understand it!
Performance
A typical update includes a number of SQL commands, along with signalling the START- and END-TRANSACTION boundaries. Executed from the middle tier, each of those statements incurs a network round trip, adding a lot of overhead to the transaction. Regardless of the speed of the network connection.
An update should be faster with only one round-trip over the network to the database (i.e., calling the update function). Multiply that by the number of update transactions occurring and, even in a modestly busy system, it could represent significant savings of both computing resources and network traffic.
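In round-trip terms the difference looks something like this (the statements and the routine name are illustrative only):

```sql
-- Middle-tier-controlled transaction: a network round trip per statement.
BEGIN;                                                                -- 1
UPDATE part SET qty_on_hand = qty_on_hand - 10 WHERE part_id = 42;    -- 2
INSERT INTO order_line (order_id, part_id, qty) VALUES (7, 42, 10);   -- 3
UPDATE orders SET status = 'ALLOCATED' WHERE order_id = 7;            -- 4
COMMIT;                                                               -- 5

-- Backend-controlled transaction: one round trip in total.
CALL app_code.allocate_order_line(7, 42, 10);
```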
Any locks taken by the in-flight transaction, which may block access to resources by others, will also be held for a shorter period. Which increases concurrency. Especially if any of the locked resources are frequently touched by update, and perhaps even query, transactions.
And all of that, of course, means a faster response to the user. Not to mention supporting more users on the same resources.
Reusability
By moving an update procedure further “back” in the network, it becomes more reusable across applications running in different contexts: web services, phone apps, batch processes, …
And code quality is generally better when it is used more often.
Acknowledgements
Thank you to Anthony and Simon for feedback on earlier drafts of this 3-part post.
Thank you also to Johnetta and her team who took on my ideas early in my thinking, and worked with me to figure it out in practice.