Data Demythed

Debunking the "conventional wisdom" of data modelling, normalization, etc.

System Architecture

3-Tier 1/3: The Expanding Attack Surface of 3-Tier Architecture

Burroughs DMSII, Data Security, Oracle, Programming

When I started out in the IT industry (although we didn’t call it that at the time), systems were abstracted at the highest level as:

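[Diagram: User Interface <-> Application Code <-> DataBase. The User Interface and Application Code exchange Request/Response; the Application Code issues Create (INSERT), Read (SELECT), UPDATE, and DELETE against the DataBase.]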
Generic 3-Tier Depiction

Different Tools and Technologies, Same Architecture

What I find interesting is that this has not changed even as the tools and technologies implementing it have gone through a number of iterations. Let’s put some labels on the diagram:

[Diagram, left: IBM 3270 Terminal <-> CICS, Cobol <-> DB2 (CRUD). Right: Web Browser <-> Application Code <-> DataBase (CRUD).]
3-Tier: Proprietary (Left) Versus Open (Right) Technology

The version on the left represents the earlier days of computing, where picking a vendor (IBM in this case) largely dictated the tools used.

The current status represented on the right offers both a myriad of choices at every level of our abstracted architecture and a more complicated environment in which it operates. And that, I assert, should have resulted in changes to the way we architect our systems.

But first I want to explain my perspective of this evolution in more detail …

“Before”: Closed, Proprietary

As noted above, the left side has been labelled based on the selection of IBM as the proprietary platform of choice. Just from my experience, I could’ve annotated it with any of the following (among other things):

Vendor Proprietary Components
Vendor      Terminal        Message Handler   Database
Burroughs   TD830, MT983    GeMCoS, COMS      DMSII
DEC         VT100, VT220    ACMS              Oracle, RDB

In addition to the tools used, the way pieces connected was proprietary and hierarchical: the protocol used to connect terminals to computers (Burroughs BDLC, DEC RS-232, IBM SDLC) was different to that by which the computers talked to each other (Burroughs BNA, DEC DECNet, IBM SNA).

One consistency across them for me was Cobol, specifically COBOL-74. Indeed, the ANSI Standard itself was my COBOL reference for many years, spanning my work on the above systems. I would only refer to a vendor’s manual where the standard specified “implementation-defined.” And even the vendor manuals tended to be a customised version of the ANSI Standard (i.e., with the vendor’s implementation details inserted into the ANSI Standard text).

Interconnecting vendor platforms was challenging. I was part of some work that allowed Burroughs, DEC, and IBM to exchange messages and files. This used specialised (i.e., proprietary, expensive) hardware and software on the Burroughs and DEC machines that allowed them to “talk” the SNA protocols. We also layered a “home-grown” messaging system atop that which guaranteed delivery with no duplicates. It used a message format, in the 1980s, eerily similar to what is now known as JSON. It included a schema layer in which message formats were defined, and subsequently enforced at both the message’s source and its destination. We also added our own implementation of the emerging public/private key mechanism to add a signature to the messages to guarantee their integrity in transit.
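As a rough illustration only, the ideas in that paragraph (a JSON-like message, a schema enforced at both ends, a signature for integrity, and duplicate suppression) can be sketched in a few lines of Python. This is not a reconstruction of the original system: the schema registry, the field names, and the message identifiers are all hypothetical, and an HMAC with a shared key stands in for the public/private key signature so that the sketch needs only the standard library. Guaranteed delivery (retries and acknowledgements) is not modelled.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical schema registry: message type -> required fields and their types.
SCHEMAS = {"payment": {"account": str, "amount": int}}

# Assumption: a shared secret stands in for a real public/private key pair.
SHARED_KEY = b"demo-key-not-for-production"


def validate(msg_type: str, body: dict) -> None:
    """Reject a body that does not match the registered schema exactly."""
    schema = SCHEMAS.get(msg_type)
    if schema is None:
        raise ValueError(f"unknown message type: {msg_type}")
    if set(body) != set(schema):
        raise ValueError("fields do not match schema")
    for field, expected_type in schema.items():
        if not isinstance(body[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")


def sign(content: str) -> str:
    """Compute the integrity tag over the serialised message content."""
    return hmac.new(SHARED_KEY, content.encode("utf-8"), hashlib.sha256).hexdigest()


def seal(msg_id: str, msg_type: str, body: dict) -> str:
    """Sender side: enforce the schema, then wrap the message with its signature."""
    validate(msg_type, body)
    content = json.dumps({"id": msg_id, "type": msg_type, "body": body}, sort_keys=True)
    return json.dumps({"content": content, "signature": sign(content)})


class Receiver:
    """Receiver side: check the signature, enforce the schema, drop duplicates."""

    def __init__(self) -> None:
        self.seen_ids = set()

    def accept(self, envelope: str) -> Optional[dict]:
        outer = json.loads(envelope)
        if not hmac.compare_digest(sign(outer["content"]), outer["signature"]):
            raise ValueError("signature mismatch")
        message = json.loads(outer["content"])
        validate(message["type"], message["body"])
        if message["id"] in self.seen_ids:
            return None  # a retransmission: already processed, so ignore it
        self.seen_ids.add(message["id"])
        return message["body"]
```

A sender would call `seal("m-1", "payment", {"account": "A1", "amount": 100})` and the receiver would pass the result to `accept`, which returns the body the first time and `None` for any repeat of the same identifier.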

“After”: Open, Connected

In stark contrast to the closed nature of the earlier, proprietary networking, the Internet-based version to which we have evolved (on the right above) simplifies networking across vendor devices. At its foundation is the sheer simplicity of the Internet Protocol (IP) packet, which is basically the global address of the device sending the message, the global address of the device to receive it, and the message payload that will be processed by the receiver. This same protocol is used by every device - servers, desktops, laptops, mobile phones, and everything in between - regardless of its role in the network. In other words, having a connection to the Internet opens up a potential “attack surface” of all devices attached to the Internet, across all equipment vendors. Let’s redraw our (very high level) diagram to reflect that:
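To make the "source address, destination address, payload" description concrete, here is a minimal Python sketch that builds and then parses an IPv4 packet with a 20-byte header and no options. The function names are illustrative, the addresses are taken from the ranges reserved for documentation, and the header checksum is deliberately left at zero, so this is a teaching aid rather than something to put on a wire.

```python
import ipaddress
import struct

# Fixed part of the IPv4 header: 20 bytes in network byte order.
IPV4_HEADER = "!BBHHHBBH4s4s"


def build_ipv4_packet(src: str, dst: str, payload: bytes) -> bytes:
    """Prefix the payload with a minimal IPv4 header (no options)."""
    version_ihl = (4 << 4) | 5  # version 4, header length of five 32-bit words
    header = struct.pack(
        IPV4_HEADER,
        version_ihl,
        0,                  # DSCP/ECN
        20 + len(payload),  # total length: header plus payload
        0,                  # identification
        0,                  # flags and fragment offset
        64,                 # time to live
        6,                  # protocol number carried in the payload (6 = TCP)
        0,                  # header checksum, left at zero in this sketch
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )
    return header + payload


def parse_ipv4_packet(packet: bytes) -> dict:
    """Pull out the three things the receiver cares about, plus a little context."""
    (version_ihl, _tos, total_length, _ident, _frag,
     _ttl, protocol, _checksum, src, dst) = struct.unpack(IPV4_HEADER, packet[:20])
    header_length = (version_ihl & 0x0F) * 4
    return {
        "version": version_ihl >> 4,
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
        "protocol": protocol,
        "payload": packet[header_length:total_length],
    }


packet = build_ipv4_packet("192.0.2.1", "198.51.100.7", b"hello")
info = parse_ipv4_packet(packet)
print(info["source"], "->", info["destination"], info["payload"])
# prints: 192.0.2.1 -> 198.51.100.7 b'hello'
```

Nothing in the header says whether the sender is a server, a laptop, or a phone, which is the point being made above: any device can address any other.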

[Diagram, left (Closed, Proprietary): IBM 3270 Terminal <-> CICS, Cobol <-> DB2 (CRUD). Right (Open, Connected): Web Browser <-> Application Code <-> DataBase (CRUD). Labels also include Internet and Firewall.]
3-Tier: Closed Versus Open Technology

Various mechanisms (firewalls, network address translation, private IP addresses, subnetting, …) can control access to devices, including(/especially) within an organisation’s internal network. But discovery of an implementation flaw will quickly result in devices globally being probed from, and potentially compromised by, a global array of sources. As an example, the device from which this page is served is still seeing probes from across the globe to port 5555, long after the 2018 discovery that Android devices were being shipped with that port erroneously enabled by default. And that is despite the device serving this page not being an Android device, and not having a service on that port.

Application Protocols

Another thing that I find interesting is how things are changing at the Application Layer, above the Transport Layer protocols (TCP, UDP, …) which themselves are layered atop IP. HTTP has evolved from a protocol for delivering HTML web pages to something capable of delivering a wide variety of payloads, from simple text to streaming media. The IETF HTTP Working Group even maintains a Best Current Practices (BCP56) document about it.

Amongst the separate, independent Application Layer protocols that have been replaced by HTTP-based equivalents are the following (this list is not intended to be exhaustive):

Command Line Interface (CLI - telnet: port 23, rlogin, rsh, etc.: ports 512-514)
All of these are clear-text, and hence insecure, protocols. To be fair, they also predate the Internet as we know it today. Secure Shell (ssh: port 22) appeared in the 1990s as a secure-by-design alternative, and has replaced them. But a common reason for using a CLI is to manage and maintain a device’s configuration and services. And numerous web(/HTTP)-based server monitoring and control options have been developed aimed at doing that more easily.
Domain Name System (DNS: port 53, 853)
Even the fundamental Internet function of mapping hostnames to IP addresses has an HTTP-based option. Although there are legitimate (in my opinion) concerns about the way it works at the time of writing.
Email Reading (POP3: ports 110, 995; IMAP: ports 143, 993) and Sending (SMTP: ports 25, 465, 587)
Use of these typically requires an email application. But instead of installing one, a lot of people simply resort to browser-based “webmail”, using an HTTP-based service to read and send emails. The IMAP/POP3 and SMTP protocols have “disappeared” behind the web service although, to be fair and unlike many of the other protocols in this list, IMAP/POP3 and SMTP have not themselves been replaced by HTTP-based protocols. Yet?
File Transfer Protocol (FTP: ports 20, 21, 989, 990) with Archie
Sharing files, via both upload and download, was one of the earliest uses of the Internet. As FTP services proliferated, Archie was developed as a search engine across FTP services. Anonymous FTP simplified the download service, and was also easy to replace with an HTTP-based equivalent. WebDAV added a “file system” capability over HTTP, providing a somewhat-equivalent of FTP’s full upload/download functionality (although SFTP, file transfer over SSH and not related to FTP itself, is probably more widely used in this case). Web browsers originally supported both HTTP and FTP (among other protocols listed here), but the major browsers have now dropped FTP support.
Gopher(: port 70) with Jughead, Veronica
A Gopher client(/browser) allowed browsing through a hierarchy of text-based documents, and direct displaying of them with the client. Jughead allowed searching within a site, while Veronica searched across sites. Although a network of Gopher services still exists, HTTP has overwhelmingly replaced it, and the major web browsers have dropped support for it. (As an aside, I am amused - pleased, even - to see that, at the time of writing at least, the Australian Bureau of Meteorology still supports a Gopher-based service.)
Lightweight Directory Access Protocol (LDAP: ports 389, 636)
The need for “address books”, particularly in support of email services, yielded LDAP as another early Internet service. However, the introduction of CardDAV, layered on WebDAV which itself is layered on HTTP, has taken away much of LDAP’s momentum in this role. (In larger organisations, LDAP may well be preferred because of its other capabilities, such as use for authentication and authorization management.)
Line Printer Daemon (LPR: port 515)
This protocol typically required configuration of a vendor’s proprietary printer drivers. Its complexity has been largely replaced by the Internet Printing Protocol (IPP: port 631), which uses the HTTP protocol. Unlike the other entries here, however, this is likely to be a dedicated, “special purpose” server running on the destination printer. But it is still part of the HTTP attack surface in an organisation.
Remote Procedure Call (RPC: port 111)
The need for programs to directly talk to each other across devices was recognised early in the evolution of networking. But the original RPC has largely been supplanted by a plethora of alternatives, many of which encapsulate the RPC payload inside an HTTP request. Indeed, it was watching this start to grow that began my thinking, as documented here, around the turn of the millennium. Whether it was buried somewhere in the main web service, or moved out to other ports (like printing, in the previous item), I thought it represented a significant escalation in the security risks. Time has only reaffirmed my belief as it has grown in scale and complexity.
Usenet and Network News Transfer Protocol (NNTP: ports 119, 433, 563)
This provided a space for global discussions to take place across a network of Usenet servers. A user would connect to a Usenet server using a Usenet newsreader, browse the topic hierarchies, and perhaps post something new or a response to someone else’s post. The Usenet servers would then “ripple” posts across the network to each other as the opportunity became available. At a time when Internet connectivity was less robust and bandwidth more limited, especially globally, this worked extremely well. But now this has been largely replaced by centralised, HTTP-based “forum” discussions easily accessed from anywhere.
Wide Area Information Service (WAIS) over Z39.50(: port 210)
This was based on the concept of searching library catalogue data as abstracted by the Z39.50 standard. That formalism adds a hurdle for people who “simply” want to put out their own information, without having to formally classify and categorise it. Of course, HTTP provides a space for that, supported by the web search engines that made it easier to find stuff via “brute force” text searching. Except that, ironically, search engines have evolved to look for “metadata” in a web page, arguably similar to what cataloguing provided, on the assumption that it will better identify appropriate web pages in a search result.
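As one concrete illustration of a protocol in this list being folded into HTTP, the sketch below shows how a DNS lookup can be packaged for the HTTP-based option mentioned in the DNS entry, following my understanding of DNS over HTTPS (RFC 8484): the ordinary DNS query is encoded in its usual wire format, then base64url-encoded without padding and placed in a `dns` query parameter. The function names are illustrative, `/dns-query` is a commonly used path but is an assumption about any particular server, and nothing is actually sent over the network.

```python
import base64
import struct


def build_dns_query(hostname: str, qtype: int = 1) -> bytes:
    """Encode a single-question DNS query in wire format (qtype 1 = A record)."""
    # Header: ID 0, flags 0x0100 (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # Each label is prefixed with its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS 1 (IN)
    return header + question


def doh_get_path(hostname: str, endpoint_path: str = "/dns-query") -> str:
    """Build the path and query string of an HTTP GET carrying the DNS query."""
    wire = build_dns_query(hostname)
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{endpoint_path}?dns={encoded}"


print(doh_get_path("www.example.com"))
# prints: /dns-query?dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB
```

The same bytes that would once have gone to port 53 now travel as the parameter of a web request, indistinguishable at the network level from any other HTTPS traffic to that server.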

This concentration of Internet services into a single Application Layer protocol (HTTP) puts a single, functionally-large Application Layer attack surface atop the already large attack surface of global Internet addressing.

And it doesn’t matter whether it’s on port 80 (HTTP), 443 (HTTPS), or some other port discoverable by scanning tools. SSL/TLS secures Application Layer payloads transiting the Internet. It does not add any protection to the end-point services themselves.

It’s also unfortunate that we rely on payload encryption at the Application Layer level. It leaves each application protocol (see list above) to implement its own payload encryption mechanism. Many of the protocols eventually added it to their standard, generally using something akin to SSL/TLS. But HTTPS had the advantage of getting it early, quickly becoming supported in coding frameworks, and (as mentioned earlier) being adaptable to a wide variety of payloads.

But more significantly, it means packet inspection as a defensive security strategy has to be part of the Application-level infrastructure since that is where the packet contents become visible (i.e., are decrypted). And, as discussed next, that Application-level infrastructure has also become a very complex part of the architecture.

If IPSec were used more widely (and, unfortunately, even IPv6 made it optional), perhaps the separate protocols for different types of services might have survived because they would have been secure. Instead of being part of the complexity of the Application Layer, defensive packet inspection could then have been close to the packet’s entry into the device, as the payload is decrypted at the IP level before moving up the communication stack.

By providing payload encryption at the IP level, IPSec protects all Transport Layer (TCP, UDP, …) packets from snooping in transit. And hence protects all Application Layer packets (HTTP, DNS, SMTP, IMAP, …) embedded within those. It potentially simplifies the Application Layer by giving that layer a secure network protocol without any application coding for it … although there are obviously reasons why an Application Layer payload encryption mechanism might still be warranted in addition (e.g. protecting personal information even after it arrives in the device).

Application Code

While connectivity has become simpler and global, it’s quite the reverse for application code. The technology choices are not simple and obvious.

Application Languages
There are now numerous language choices. And COBOL is unlikely to be one of them. The choices span the range from compiled to interpreted. And, to complicate things further, an application will likely use a mix of languages.
Application Language Ecosystem
It is no longer sufficient to simply learn a programming language. Language proficiency now also requires using supporting libraries that provide additional functionality. The more popular the language, the larger and broader the array of functionality to discover and learn. And functionality between libraries may overlap, adding complexity to design decisions.
Application Language Frameworks
Methodologies, or “frameworks”, in how a particular language is used have developed and evolved over time. The number of such methodology choices for any language is a function of its age and popularity. A search for “<language> framework” will find “top 10” lists for at least the more popular <language> choices. Frameworks compete even while being based on the same underlying design patterns, with Model-View-Controller (MVC) a common one. Framework choice - including “none” - will affect factors such as development tools (again, with multiple choices), productivity, training needs, team organisation, infrastructure, and the choices of related technologies such as …
Application Servers
Similar to the proprietary days (with an obvious, vendor-supplied choice for the message handler), the early days of the Internet had few options for a web server. Apache quickly established itself as the obvious choice. As web services evolved from simple publishing to application serving, complexity grew from the addition of both features and alternatives. Initially much of it was centred on Java as that looked like becoming the consensus language choice for web-based applications. Java-specific application servers (e.g., Tomcat, JBoss) followed. But time has brought many more choices reflecting in part, perhaps, the range of languages and/or frameworks now used to implement “web services.”
Perhaps the earliest examples in this category (chroot?) were for testing changes in isolation. Operating system mechanisms (e.g., Sun’s zones, or kvm in Linux) provided stronger security and isolation mechanisms. Hypervisors allowed a single device to run multiple, potentially different operating systems in isolation from one another as Virtual Private Servers. As we moved into server farms and then cloud computing, application code was further modularised (e.g., Docker, microservices) to allow its pieces to “float around” to where resources were available to execute it.
Browser Code
Proprietary terminal programmability was limited, supporting things like colour and other highlighting, or defining input fields within the screen layout. Browsers, on the other hand, now have rich capabilities for directly supporting a user’s interaction with an application. JavaScript/ECMAScript are traditional coding languages that can be sent to the browser for execution as part of the application’s user interface to build responsiveness into a web page. HTML and CSS have also evolved to support a level of interactivity by configuration (i.e., without having to code it). Reformatting of the pages on this site for larger versus smaller screens, the use of drop-down menus, and adapting the content format for printing - all via CSS alone - is an example.
Non-Application Components
In addition to the application code and the way it is assembled and served, there are many other pieces mixed in these days. Instrumentation to measure performance and watch for security issues is essential. As mentioned above, checking the payload packet’s integrity before handing it off to the application code will(/should) also be part of this layer given the current state of the technology.

All of this means that if … or, surely more accurately, when someone finds a way into the system(s), they will also find a richly complex space in which to hide their presence.

In Summary

We have largely evolved the architecture from closed, proprietary systems into systems using a myriad of technology pieces to support a large range of functionality. That evolution has not changed the relationship between the application architecture tiers.

Given the increased complexity, unidentified flaws are much more likely.

And it is connected to a global network.

Has that been appropriate?