Debriefing Checklist?

I had the great experience of being technical lead on a multiyear user interface modernization project for an in-house enterprise system. My involvement was as a contractor, so when the project was done, it was time for me to move on and leave the system in the capable hands of the permanent team. The project included just a few developers, a business analyst, and an IT manager. Even with such a tight little team, a review was in order at the end of the project. This was especially true since the client had recently hired a new developer.

I don’t know if there is such a thing as a project debriefing checklist. If not, there ought to be one. I won’t attempt to build a checklist here, but I will describe some of the pieces that we found were useful to review and document.

It took some effort to document all the important bits of the project. It turned out to be a lot! Once I had a first draft written, we arranged for meetings to review it. We worked from an outline, and spent a couple of hours each morning in front of a whiteboard until we got through it all.

Audience

Obviously, for our project we wanted summary reference material for the new hire and for anybody else who might end up working on these systems. Perhaps less obvious is that documents like this are for “future me”. I don’t know if I will ever work on this system again, but I do know that future me will appreciate any documentation that present me provides. I know this because present me is grateful whenever past me left documentation behind.

There’s another potential audience. Management at any level, from management in the trenches through the board of directors, could easily find the project summary useful or enlightening. I don’t think anybody has bothered yet to scrutinize our work, but there’s always hope. We’re proud of what we accomplished.

Project goals and constraints

The goal of our project was straightforward. We had to replace a very old character-based user interface with a modern Windows user interface.

The constraints were more interesting, though. Documenting the constraints is not about making excuses. An extremely common question in software maintenance is: Why did the original programmers do it this way? Digging through the code is often a slow and unproductive way to answer this kind of question. Understanding the system’s history makes it much easier to move the system forward.

Choice of platform, frameworks, and libraries

What alternatives were considered, and why did the chosen technologies win out? Again, understanding the constraints can be a huge benefit to those maintaining the system in the future.

In our case, we had a very specific technical constraint that prevented us from using any of the object-relational mapping (ORM) libraries available at the time, so we effectively had to roll our own. However, that constraint applied only to interoperating with the legacy user interface, which ran in parallel while we gradually migrated one business module at a time from the old user interface to the new one. Now that the legacy user interface has been retired, the constraint no longer exists at all! There is no reason why any new tasks (new business modules) added to the system could not use an off-the-shelf ORM library.

Our case might be a bit of an extreme example of why it’s useful to understand the project constraints. Keep in mind that knowing the historic constraints may help future maintainers in ways you don’t anticipate.

There’s another question to address here. Of the technologies selected for use in the project, were there any which didn’t work out as expected? One of the frameworks we selected for our project became less used and less important over time. As a result, there are two slightly different techniques used in the code, depending on how early in the project the code was written. The maintenance programmers deserve to know why.

Databases and application servers

New developers need to know where everything is. More importantly, they need to know *how much* there is, to help them with task estimates and impact analysis. In other words, they need an asset inventory. By the end of our project, the asset inventory looked very different than it did at the start of the project, so this certainly deserved a section in our document. How many databases are there? What are they used for? What servers are they running on? Where are the maintenance scripts? Who in the organization is responsible for the various development, test, training, and production databases? We addressed all the same questions with regard to the application servers as well.

Peripheral build systems

Our environment required an integrated library which had to be updated and compiled using tools outside our regular development environment. This was seldom changed and didn’t get a lot of attention, so it was an important bit for us to document. The know-how for changing and re-compiling that little library could easily have been lost over time.

Deployment system

We went over a few discussion points regarding the deployment system we used and tailored for this project. We already had a simple deployment checklist, so this was not a “how to”, but instead an overview of the tools we chose and a few of the quirks and subtleties.

Version control

Our document discussed which version control tools we chose and why, as well as whether those would still be our first choice today. We provided an overview of the code repositories in use and the general workflow we had settled into.

The code

Our document provided an overview of the source code in terms of both its physical and its logical arrangement. There were sections discussing how we made use of different architectural design patterns, how we were able to build on top of the existing legacy back-end systems, and the cases where we had to create our own libraries because no existing library would work in our environment. We had sections with summary descriptions of:

  • Error handling
  • Logging
  • Validation system
  • Help system
  • View types (grids, document details, record lookups, etc.)
  • Reporting
  • User preferences
  • User access control
  • Integration to third party applications

Wrap up

At the end of the review, everybody got a copy of the document after I updated it to reflect our discussions in the meetings.

A project’s completion does not always get the project review it deserves. It was a relief that the team agreed to invest the time required for this. I hope that the next time I’m about to dive into an unfamiliar system, a document like this is available to help me along the way.

Frogitecture

Ribbit

What goes into a chatbot architecture? Well, here’s a description of the one running Fraser the Bot.

User -> Instant message server -> fraserthebot.com

Fraser the Bot currently works with five different instant messaging applications. Each instant messaging app has its own API, and they all have to be configured with the URL where they are expected to send messages. The URL must be HTTPS. My servers are running on Amazon Web Services.

Load Balancer -> Core server

The first stop on AWS is at the load balancer. The load balancer hands the request (the message) off to one of Fraser’s core servers, each of which is a Linux virtual machine instance. There can be any number of core servers up and running as necessary to deal with increasing user load.

Core server

Fraser is using a Python web server framework called Tornado for handling requests. The web request handlers and Fraser’s core functions sit within a running Tornado instance. The core server functions are all written in Python.

Request properties

Each instant messaging app sends requests to its configured URL on fraserthebot.com, and each URL is handled by a different request handler function. Each of those request handlers has to parse the arguments out of the request (sometimes HTTP POST, sometimes HTTP GET). Usually the request body is JSON, and the request arguments always include an ID for the chat group, an ID for the individual sender (one user within the chat group), and the sender’s message text. Each instant messaging app has its own additional message properties, but those three properties are common to all.
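Because every app delivers its arguments differently but all supply the same three properties, one way to keep the rest of the core server app-agnostic is to have each per-URL handler normalize its request into a common record. The sketch below shows that idea in plain Python. The app names, URL paths, and JSON/query-string field names are all hypothetical; each real messaging API defines its own.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict
from urllib.parse import parse_qs


@dataclass
class IncomingMessage:
    """The three properties every instant messaging app provides."""
    chat_id: str    # ID of the chat group
    sender_id: str  # ID of one user within the chat group
    text: str       # the sender's message text


def parse_app_a(body: bytes, query: str) -> IncomingMessage:
    # Hypothetical app that POSTs a JSON body.
    data = json.loads(body)
    return IncomingMessage(
        chat_id=str(data["chat"]["id"]),
        sender_id=str(data["from"]["id"]),
        text=data.get("text", ""),
    )


def parse_app_b(body: bytes, query: str) -> IncomingMessage:
    # Hypothetical app that sends its arguments as an HTTP GET query string.
    args = parse_qs(query)
    return IncomingMessage(
        chat_id=args["group"][0],
        sender_id=args["user"][0],
        text=args.get("msg", [""])[0],
    )


# One parser per configured URL path (paths made up for illustration).
PARSERS: Dict[str, Callable[[bytes, str], IncomingMessage]] = {
    "/app-a": parse_app_a,
    "/app-b": parse_app_b,
}


def parse_request(path: str, body: bytes = b"", query: str = "") -> IncomingMessage:
    """Dispatch to the parser registered for this URL path."""
    return PARSERS[path](body, query)
```

Everything downstream of `parse_request` (database lookups, language processing, replies) then only ever sees an `IncomingMessage`, so adding a sixth messaging app would mean writing one more parser and one more URL.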

Core server -> Database server

With the chat group ID and sender ID, the core server has enough information to call a PostgreSQL database server instance, which is also running on Amazon Web Services. This database query goes to a server-side function, which deals with creating records for new chats and gathering chat session information from a few different tables. There may be many core servers, but there is only one database server.

Core server -> Network file system server

Most messages to Fraser require a bit of natural language processing. Each Wikipedia article has been preprocessed, and the resulting data is stored on a network file system server on AWS. As much as possible, the article data is cached in memory in the core server, but when there is a cache miss, the data is fetched from the network file system. There may be many core servers, but there is only one network file system server.
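The “cache in memory, fall back to the network file system on a miss” pattern can be sketched as a small least-recently-used cache. This is an illustration only: the eviction policy, the `max_items` limit, and the one-file-per-article layout under a hypothetical `/mnt/articles` mount are assumptions, not a description of how Fraser actually stores its article data.

```python
from collections import OrderedDict
from pathlib import Path
from typing import Callable


class ArticleCache:
    """In-memory LRU cache in front of a slower article store."""

    def __init__(self, load: Callable[[str], bytes], max_items: int = 1000):
        self._load = load                  # called only on a cache miss
        self._max_items = max_items
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self.misses = 0

    def get(self, title: str) -> bytes:
        if title in self._items:
            self._items.move_to_end(title)   # mark as most recently used
            return self._items[title]
        self.misses += 1
        data = self._load(title)             # e.g. read from the NFS mount
        self._items[title] = data
        if len(self._items) > self._max_items:
            self._items.popitem(last=False)  # evict least recently used
        return data


def load_from_nfs(title: str, root: Path = Path("/mnt/articles")) -> bytes:
    # Hypothetical layout: one preprocessed file per article on the NFS mount.
    return (root / f"{title}.dat").read_bytes()
```

A core server would create one `ArticleCache(load_from_nfs, max_items=...)` at startup; only misses touch the single network file system server, which is why a bigger cache (or a shared one) directly reduces load on that server.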

Core server -> Instant message server

Each user message to Fraser may result in one or more instant messages being sent from Fraser to the user or chat group.

Core server -> Database server

Another call to the database server is made, to store the updated chat session information. This is how the chat session state (the game play state) is preserved from one message to the next. The state is stored in the one database server, rather than on the core servers. That way, the load balancer may send an individual player’s messages to an arbitrary core server and game play works as expected.
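The key property here is that a core server keeps no per-chat state between messages: it loads the session, updates it, and saves it back. The sketch below demonstrates that with two “core server” objects sharing one store. It uses Python’s built-in `sqlite3` in memory purely so the example runs anywhere; the real system uses PostgreSQL with a server-side function. The table layout, the JSON-encoded state, and the `turn` counter are hypothetical.

```python
import json
import sqlite3


class SessionStore:
    """Stand-in for the single shared database server (sqlite3, not PostgreSQL)."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS chat_session ("
            " chat_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )

    def load(self, chat_id: str) -> dict:
        # Create a record for a new chat if needed, then return its saved state.
        self.conn.execute(
            "INSERT OR IGNORE INTO chat_session (chat_id, state) VALUES (?, ?)",
            (chat_id, json.dumps({"turn": 0})),
        )
        row = self.conn.execute(
            "SELECT state FROM chat_session WHERE chat_id = ?", (chat_id,)
        ).fetchone()
        return json.loads(row[0])

    def save(self, chat_id: str, state: dict) -> None:
        self.conn.execute(
            "UPDATE chat_session SET state = ? WHERE chat_id = ?",
            (json.dumps(state), chat_id),
        )
        self.conn.commit()


class CoreServer:
    """Holds no chat state of its own between messages."""

    def __init__(self, name: str, store: SessionStore):
        self.name = name
        self.store = store

    def handle(self, chat_id: str, text: str) -> int:
        state = self.store.load(chat_id)  # fetch game-play state
        state["turn"] += 1                # hypothetical game-play update
        state["last_text"] = text
        self.store.save(chat_id, state)   # persist before replying
        return state["turn"]
```

Because both servers read and write the same store, it does not matter which one the load balancer picks for a given message: `server_a.handle("chat1", ...)` followed by `server_b.handle("chat1", ...)` continues the same game.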

Asynchronous I/O

Calls from the core server to the database, to the network file system, and to the instant messaging server are all done in an asynchronous fashion. That is, the I/O call is launched without sitting and waiting for the response. The core server immediately gets back to work dealing with other requests. Once the I/O call finishes, the core server returns to the task which triggered the I/O call in the first place.
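To make the idea concrete, here is a minimal sketch using plain `asyncio` rather than Tornado (current Tornado versions run their coroutines on the asyncio event loop, but this example does not use Tornado at all). The `fake_io` coroutine is a hypothetical stand-in for the database, file system, and messaging calls; `asyncio.sleep` simulates waiting on the network.

```python
import asyncio
import time


async def fake_io(label: str, seconds: float) -> str:
    # Stand-in for a database, file system, or instant messaging call.
    await asyncio.sleep(seconds)
    return label


async def handle_request(request_id: int) -> list:
    # Each await hands control back to the event loop, so other
    # requests make progress while this one waits on its I/O.
    session = await fake_io(f"session-{request_id}", 0.1)
    article = await fake_io(f"article-{request_id}", 0.1)
    reply = await fake_io(f"sent-{request_id}", 0.1)
    return [session, article, reply]


async def main(n: int) -> float:
    start = time.perf_counter()
    results = await asyncio.gather(*(handle_request(i) for i in range(n)))
    elapsed = time.perf_counter() - start
    print(f"{n} requests, {len(results)} results in {elapsed:.2f}s")
    return elapsed


if __name__ == "__main__":
    asyncio.run(main(10))
```

Each request spends about 0.3 seconds waiting, yet ten of them finish in roughly 0.3 seconds total rather than 3, because a single process interleaves them while they wait.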

What’s missing?

For one, a message queue. Some of the instant message servers expect the bot server to respond within five seconds. Fraser’s server doesn’t take anywhere near that long to respond, but if things got bogged down, some instant message server requests might time out. A message queue would allow the core server to respond to the HTTP requests immediately by storing the message on the queue rather than processing it right then. What else? An in-memory database like Redis, running on another AWS server, would allow the core servers to fetch all the article data from that fast, additional layer of cache rather than from the network file server.
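The message queue idea boils down to acknowledging the HTTP request right away and doing the real work later. The sketch below uses an in-process `asyncio.Queue` only to show that shape. A real deployment would more likely use an external queue service shared by all the core servers, which this example does not model. The `webhook` and `worker` functions and the 200 status are illustrative assumptions.

```python
import asyncio


async def webhook(queue: asyncio.Queue, message: str) -> int:
    # Acknowledge immediately; processing happens later in a worker.
    await queue.put(message)
    return 200  # HTTP status returned to the instant message server


async def worker(queue: asyncio.Queue, processed: list) -> None:
    while True:
        message = await queue.get()
        await asyncio.sleep(0.05)   # stand-in for NLP and database calls
        processed.append(message.upper())
        queue.task_done()


async def demo() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    processed: list = []
    workers = [asyncio.create_task(worker(queue, processed)) for _ in range(2)]
    for text in ["hi", "play", "hint"]:
        await webhook(queue, text)  # each call returns before any processing
    await queue.join()              # wait until every message is handled
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return processed
```

With this arrangement, the response time seen by the instant message server no longer depends on how long natural language processing takes, which is what protects against the five-second timeout.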

Good grief. There must be an easier way.

Certainly! Have a look at Microsoft’s Bot Framework. It has come a very long way since I started this project. In other words, if I were starting a new project today, I might do things differently. 🙂