Friday, December 14, 2007

Open architecture, open organizations

At first sight, these two topics seem totally unrelated. After all, isn't organizational openness a question of corporate strategy and the responsibility of all those suits throwing buzzwords around? And isn't architecture the product of long-haired dudes with poor personal hygiene and little interest in anything but stuff like exactly how a clock signal propagates through a microchip? Let me explain.

Back in my days at the Estonian Tax and Customs Board, we were gathered for a management meeting. One of the topics was whether (and if so, on what terms) we should grant another state agency the right to make a certain query against our system. After a lengthy discussion it was decided that, as the information was not really sensitive and did not fall under data privacy laws either, we should do it. The meeting moved on to other topics while I pinged one of our developers on MSN (we didn't know Skype back then). He took a small query our own systems used, mapped it to WSDL, deployed the result into the x-tee infrastructure we already had in place, ran some tests and was done. By the time we got to the meeting minutes I could report that the solution was live.

Why was the decision-making so much harder than the actual development? Because, by nature, a tax organization is a closed one. It deals with data that absolutely needs to be kept private and runs systems that have to be shut away behind as many firewalls as feasible. Its core is secrecy, its culture closed. Thus, it is naturally reluctant to move towards an open architecture even if one is technically relatively easy to build.

This observation aligns nicely with the holistic organizational view discussed in earlier posts. Organizational culture is part of the organizational architecture that is dependent on the technical architecture (and vice versa of course). Based on that model I'd claim that there is no way to build a really open architecture without having a really open culture in the organization.

Why is all this important? The other day I ran through another team behavior training and heard myself state once again that "open communication and, more generally, the spirit of openness and trust, is the key to successful teamwork". Then it all clicked into place. You see, the greasy-haired guys are as much part of the whole organization as the people with ties. Neither group will probably admit it, but they share the same culture, the same fundamental values and beliefs about how that organization should be run.

An example. When you build an API that is open for people to use, you take on responsibility. Add an element to the payload XML document and all AXIS-based clients will break. Only an outward-oriented person, a person that cares, will voluntarily take on responsibilities like this, and the same applies to organizations. And even when a closed organization has some sort of drive to build that API and make it work, it will not yield the expected results. Documentation for developers? That's secondary. Support? Do we need to talk to strangers? Access? You sure you are not going to flood us with requests? Cooperation? You must be after our client base.
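To make that AXIS remark concrete, here is a minimal sketch of the failure mode (in Python rather than AXIS, with a hypothetical payload): a client whose binding was generated against a fixed contract treats any new element as an error, so the provider cannot extend the payload without breaking every such consumer.

```python
# Minimal sketch, not AXIS itself: a strictly bound client that rejects
# unknown elements. The element names are hypothetical.
import xml.etree.ElementTree as ET

EXPECTED_FIELDS = {"id", "amount", "currency"}  # what the generated binding knows about

def parse_order(payload: str) -> dict:
    """Bind the XML payload to a fixed structure; fail on anything unexpected."""
    root = ET.fromstring(payload)
    fields = {child.tag: child.text for child in root}
    unknown = set(fields) - EXPECTED_FIELDS
    if unknown:
        # A strictly generated stub behaves like this: new elements are an
        # error, not something to be silently ignored.
        raise ValueError(f"unexpected elements in payload: {unknown}")
    return fields

# Works against the original contract...
print(parse_order("<order><id>1</id><amount>9.99</amount><currency>EUR</currency></order>"))

# ...but the moment the provider adds one element, every such client breaks.
try:
    parse_order("<order><id>1</id><amount>9.99</amount>"
                "<currency>EUR</currency><discount>0.10</discount></order>")
except ValueError as err:
    print("client broke:", err)
```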

So an organization with a closed and inward-oriented culture is unlikely to have an open (and thus extensible) architecture.

Turns out the relationship is actually two-way. Cultures can be changed, and deliberately taking on those responsibilities and building an architecture that fosters openness and trust is a great tool for that. Why do you think Amazon opened up their queue APIs and does application hosting? In terms of business it must be minuscule. Only an organization with an atmosphere as open as Google's is able to create and support the number of APIs they have. You can't really do any mashups if your basic assumption is that everybody out there is going to hurt you. Of course, there are organizations that are closed by nature (all sorts of financial institutions, certain state agencies and so forth), but even there the link between the dominant organizational culture and the qualities of the architecture in use needs to be managed explicitly.

The conclusion? An organization needs to be open and outward-oriented on all levels to be successful. And you can't just tell the IT organization to start "doing APIs". It takes a wee bit more than that.

Monday, June 4, 2007

The Big Picture

Chances are that at some point in your life as an architect you have come across UML. Maybe you have even had the dubious pleasure of working with a humongous all-problems-solved tool like Rational Rose. In which case you have probably developed a life-long allergy towards all things UML and are now doing all of your designs with pen and paper.

Anyway, at some point I stumbled across Enterprise Architect, a neat UML tool with fairly low pretensions, and was instantly hooked. It allowed you to throw together nice UML-compliant diagrams and even share them with your team without locking you into some sort of bizarre development process. Sure, it has major usability niggles, is unstable at times, its reporting module is a joke (why on earth can't you just provide proper schemas for your models so people can build their own reports?), it runs awfully slowly on remote databases and is Windows-only. However, it was wieldy enough to provide years of good service.

Now and then, people would come to me to start an argument over UML-based design and how it can lead to endless tinkering with details that would be best solved by actually writing some code. And how useless a model is if it's not derived from the code and that code generation just does not work. I used to counter them with the statement that I was just drawing pictures and EA was providing me a consistent and hopefully universally comprehensible way to do so.
Until recently.

When I first started at Skype I took it upon myself to put down everything I learned about its systems. And, being a devoted EA user, it was clear which tool I would use. At the beginning it was pretty cool: the boxes started to pile up, dependencies appeared and the picture was there. However, as time passed (a month or so) I would work less and less with that diagram. Nobody seemed to have enough detail on the connections and, even worse, the details seemed to change daily. Also, the number of components and their connections grew to the point where the whole thing resembled a piece of modern art rather than a clear map of "what we have".

So I gave up. Afterwards, when somebody (usually a neophyte) approached me and asked for "the Big Picture", I would sometimes make another stab at it but would soon give up. So there it lay, dormant and outdated.

Until one of our team leads approached me with a very clear request. He had an off-site coming up and needed presentation material for the new members of his team. This time I decided to take a different route. I took Omni Graffle (an excellent Mac diagram tool recommended by Dan) and just drew away. A round-edged and shadowed box here, another there. Some arrows, some coloring, and there it was. Sure, it did not have all the details: for it to remain graspable I had to leave some out (and still ended up with a densely populated A2 sheet). Sure, it was not UML, because sometimes an arrow was a SOAP call, sometimes it meant that a module was linked in and sometimes it just meant database usage. And most of our database and queuing infrastructure was not pictured at all, as it is constantly changing and our DBAs generate their own diagrams from its configuration. After some minor corrections (I had forgotten some components) the picture was ready, made its first outing at that off-site and serves its purpose nicely.

So how come I was not able to do something in two years using a professional-grade UML tool but could cook up a useful diagram within hours using a dead-simple diagram drawer? The answer is simple: there's a tool for every purpose and UML is just not good for maintaining a high-level view. It's just too detailed. It's not very good for maintaining nitty-gritty details either as developers usually know much better how long and of what type a particular field should be, but that's another story.

The lesson learned from all this is to use the right tool for the task, and to remember that if you have a really good and handy hammer, most things start to look like nails.

Friday, May 18, 2007

On fragmented layers

In a previous post I described a layered approach to an organization. This time I'd like to extend the model a little and give some areas where it might be a useful decision-making tool. You can see a figure of the stack on the left. The point of the model is that different layers deal with different aspects of how an organization is built and that they are highly interdependent, with changes in any of them causing a cascade of changes both upwards and downwards. It's also worth noting that, almost by definition, all projects impact all of the layers. Usually new pieces are added at the edges, so somebody needs to make sure that the picture stays consistent, that new pieces fit with existing stuff and that they do not cause discrepancies with others. All of this is pretty straightforward for the technical architecture but is often disregarded for the other aspects of a business.

How would one use the model in real life? One useful application I have found is explaining to people why they should consider other things (like new processes or even teams) besides functionality when they are setting up a project. It also helps to visualize responsibilities (who deals with the functional architecture in your company?).

One use I'd like to focus on a little more is the fragmentation aspect of the model. In short, the message goes: if you are to crack a layer, you had better align the crack with the cracks in the neighboring ones.

Consider, for example, a scenario where you have two web applications supported by two different business organizations on two different continents. Which means there's a division in both the business and organizational layers. Of course, the crack runs all the way down and the applications are not integrated in any way. Now what if somebody up in management decides, very sensibly, that it really sucks that customers need to go to two different stores to get their SkypeOut minutes and headsets? Makes perfect sense, and just making the two systems talk to each other is not a fundamental obstacle. However, could you imagine two teams four timezones apart sharing responsibility for what the same piece of code does (i.e. integrating the technical architecture)? Or could you imagine an actual purchase flow (functional architecture) where you buy a SkypeIn number, with all of its details and the finesse of all the legal requirements we have there, and at the same time compare four headsets? Quite difficult, isn't it? Of course all of this could actually be done, but just linking the infrastructure without thinking about how the organizational (how is responsibility shared among the teams?), functional (how do the different purchase flows fit together?), business (what about revenues and, say, the marketing costs of the banners in the store?) or support (do our, say, release cycles need to be synchronized with those of our partner?) dependencies are handled makes little sense.

Monday, May 7, 2007

Time to review

There has been yet another case of rebellion in wonderland recently. Basically, a design decision was challenged long after it had been made and also implemented. A year ago, we had designed (and implemented) a system that had a substantial influence on some billing and destination resolution logic in our calling infrastructure. As this coincided with some other changes in the same modules, a decision was made to start separating that logic from the actual signaling logic, as the latter is highly stable (and needs to be very robust) while the former is liable to change much more often.

At the same time, our developers sought to standardize communication (and the load balancing, redundancy, configuration management and other issues that come with it) between separately deployed components and settled on ICE. So, keen to play around with the new technology, they conducted some tests and a decision was made to use it for the newly created lump of business logic.

Historically, most of our business logic has resided within our databases. Not a bad decision at all given the horizontal and vertical splitting technology plus the Postgres know-how we have in-house. However, this also meant that most of the knowledge of billing and routing internals resided with people who knew databases and were not about to start writing C++ code overnight, especially when it usually did not make any sense to ship data to a remote component for decision-making.

As a result, we ended up with a fairly slim layer of logic between the calling infrastructure and the database that, at first glance, did very little but call a bunch of stored procedures. Of course, come time to deploy the thing, our operations people came asking why the heck they needed to support (and make highly available) an additional component that didn't add any value at all. Which was the rebellion at the beginning of the story.

So we discussed. And we ended up with an understanding that, in terms of design, developers still find value in that layer, as the decision of _which_ procedures to call is quite significant. Also, the data structures that get passed between it and the calling infrastructure are complex, and it would be unwise to build the serialization into the flat structures required by the database into every one of the calling systems. Some of the supportability concerns (but not all) the ops guys had could actually be solved quite easily, too. No major change in the architecture, then.
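For illustration, here is a minimal sketch of what such a thin layer does. The real component is not Python, and every procedure and field name below is hypothetical, but the two responsibilities defended above are visible: deciding which procedure to call, and flattening a complex structure into what the database expects.

```python
# A sketch only: hypothetical procedure and field names, Python/psycopg2
# standing in for the real C++/ICE component.
import psycopg2

def resolve_destination(conn, call):
    """Decide which stored procedure to call and flatten the nested call
    structure into the flat arguments the database expects."""
    # The decision of _which_ procedure to invoke is the significant part.
    if call["callee"]["type"] == "pstn":
        proc = "billing.resolve_pstn_destination"   # hypothetical
    else:
        proc = "billing.resolve_p2p_destination"    # hypothetical
    # Serialization into flat structures happens here, once, instead of in
    # every calling system.
    args = (
        call["caller"]["account_id"],
        call["callee"]["number"],
        call["rate_plan"]["id"],
        call["started_at"],
    )
    with conn.cursor() as cur:
        cur.callproc(proc, args)
        return cur.fetchone()

# Usage sketch:
# conn = psycopg2.connect("dbname=billing")
# route = resolve_destination(conn, call_record)
```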

The reason I'm writing about this is that there are several very important conclusions to draw from the event:
  • Your architectural decisions should take into account the organization you operate in. In some other situation, the very idea of moving logic from a middleware layer into a database would have been pure lunacy (most organizations struggle to do the opposite), but given the stuff our DBAs pull off on a regular basis it's not that bad
  • Challenges are valuable, and regular ones are even better. No design decision should be cast in stone, no concept should be considered OK just because "this is the way things are done". Although in most cases you still end up retaining the original idea, sometimes you don't. And this is where architectural evolution happens
  • Work closely with the operations people. They provide a very good reality check. Helps with deployment grief, too

Friday, April 20, 2007

On re-use

Here's a situation for you. You have an information system that runs a substantial internet-based service. That service has web-based interfaces for billing and self-service purposes. Now imagine that there is a need to branch out, to deliver the service through different channels. For example, to have one application that fits into the client, one that works in regular browsers, one that is targeted at people with bad eyesight and one that works on handheld devices.

So the question is: how much of the original web offering can you re-use? It is obvious that all of the business logic, like billing, core provisioning and customer management services, needs to be re-used; that's a no-brainer. But what about the application and presentation layers? The presentation layer is also quite trivial: as the various new versions are meant for different devices and audiences, they clearly need different ways to manifest themselves.

The application logic is the one that will cause you trouble. Let's stop pretending that we are talking about a random company here and admit that this is Skype. For example, the flow of purchasing a SkypeIn number is a very complex beast with tailored integration points for different countries, various ways to pick out the numbers and so on. So how would one re-use that?

One way to do it would be to use the same application logic everywhere and just re-skin it into WAP or slightly more compact HTML for the various distribution channels. This, however, does not work, because different devices tend to have different requirements for the page flow (which is a manifestation of the application logic), too. For WAP, for example, you probably want to generate most of the pages into a deck from one request so you don't have to go back to the server every time the user clicks "next". For the client-based version you probably want to ignore several corner cases and make the flow a couple of pages shorter. And so forth.

The other way would be to re-write the whole thing and put a semi-intelligent fourth tier in place that can handle the workflow, decide when and how to talk to integration partners, hand out number pools and so on. This would work, but it adds an additional layer of complexity, the existing stuff has to be re-written (re-writing something for purely technical reasons is always a bad idea) and there is no guarantee that this tier would actually contain any useful logic once you are done. Close, but no cigar.

Instead of these options, I'd say just take a deep breath and do not re-use. Lots of people will now go "no, no, no! You will get yourself into a world of pain every time a common piece of logic changes, as you will need to go and make the change in all of those flows and you will surely forget something". In my mind, it is much worse to make a major business logic change without explicitly going over all the places that use it. And if you are already doing that, you might as well implement the change right there. Application logic is a combination of delivery-channel requirements and business logic, so any change to the latter is going to have an impact on the application logic that is specific to the channel. Meaning that you would most probably go and tweak all the flows anyway. And if you forget any, you will be in trouble either way.
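As a sketch of what "not re-using the application logic" can look like (hypothetical names, Python purely for illustration): each channel keeps its own flow, both flows call into the same shared business logic, and a change to that logic forces an explicit walk through every flow.

```python
# A sketch of the "don't re-use the flow" approach, with hypothetical names:
# shared business logic below, channel-specific application logic on top.

def reserve_number(country: str, user_id: str) -> str:
    """Shared business logic: number reservation, billing etc. live here."""
    return f"+{country}-555-0100"  # stand-in for the real provisioning call

def web_purchase_flow(user_id: str, country: str) -> list:
    """Full browser flow: every legal step, every corner case."""
    steps = ["pick_country", "legal_requirements", "pick_number", "confirm", "pay"]
    number = reserve_number(country, user_id)
    return steps + [f"reserved {number}"]

def client_purchase_flow(user_id: str, country: str) -> list:
    """Client-embedded flow: shorter, skips corner cases, same business logic."""
    steps = ["pick_number", "pay"]
    number = reserve_number(country, user_id)
    return steps + [f"reserved {number}"]

# A change to reserve_number() forces you to walk through both flows anyway;
# keeping the flows separate just makes that walk explicit.
print(web_purchase_flow("alice", "372"))
print(client_purchase_flow("alice", "372"))
```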

In summary: do not break your head over re-use. In the case of application logic, you sometimes need to gather your courage and not re-use at all.

Friday, April 6, 2007

Metaphor: drawing a horse

I found myself in the middle of a major discussion the other day. When defining the scope of a project, should one also state the things that are _out_ of the scope? Some people said that this would lead to wishful thinking and to describing the whole world in the "out of scope" section. The others said that everything _in_ scope could never be described in adequate detail anyway, so describing the outer world is the only way to go. So I got to thinking (considering I also had to present on the issue) and ended up with an analogy:

Say you wanted to draw a horse and did something like this (art lovers, look away now):
It ain't pretty, is it? One could also use a different approach:

Isn't going to win any prizes either. But combine the two:

And you get something that is still ugly but at least gives a sort of holistic view of the animal. So the point is that you need both the inside-out and outside-in perspectives on a project to get a sufficient understanding of its scope.

Wednesday, March 28, 2007

Go or nogo?

Whenever people make decisions on whether to go ahead with a project, they think about whether it makes business sense. Of course they do. What happens much more rarely is people thinking about how much sense it makes in terms of the other architecture layers. The organizational architecture gets some attention, but below that, it's all forgotten.

However, most project/product failures come from the new stuff not fitting with the existing things in terms of functional, technical or support architecture. Of course, discrepancies there will mean changes to the original plan and can ultimately be translated into dollars that end up in the business case, but wouldn't it be much cheaper to consider this at the beginning rather than all the way through the project?

This issue is also tied to the "10-minute-crap" problem: whenever people come up with an idea that is really easy to implement and makes some business sense they go "well, it does not bring massive business benefits but it's easy to do, so let's just build it". Wrong. Because this is how you end up distorting all the underlying layers. Why would one do something stupid just because it takes very little effort?

To recap: go/no go decisions should be made by considering all the aspects of the project, not just the business top line.

Thursday, March 22, 2007

Aspects of an Architect

My friend and colleague Sergei remarked the other day, after listening in on a job interview, that an architect should really have either some managerial experience or the potential to become a manager.

This is entirely true, I think. Being a leader, or having the potential to become one, would be even better. Drawing UML diagrams of Things to Be is just a tiny piece of the work an architect has to do. He or she also has to make sure people actually understand the drawings, has to tackle resistance in build-or-buy situations, guarantee the preservation of knowledge if the project is put on hold or team members change, and so on. All of this requires some managerial pedigree to be successful at. The thing is that, believe it or not, organizations are made of people and the computers are just there to help.

So next time you are asked at a job interview how to choose between two major technologies, you had better mention the skills of the people around you in your answer.

Tuesday, March 20, 2007

Metaphor: Dripping water

It is said that the best way for a leader to communicate his or her vision, values and beliefs is through telling stories. This is also true for architects who, being engineers and pragmatic and all, tend to cut the stories short and turn them into metaphors that help to explain complex concepts to fellow engineers and also to customers. XP is big on metaphors and I have found them useful, too. This is why this series was created: every now and then I stumble upon one that seems to work and convey an idea particularly well. Which means that it just might be worth sharing. Let's see how many of those I can come up with.

We had a team at Skype that struggled with incoming feature requests. They were short on resources and their importance in the business was growing rapidly. So everybody was busy throwing business ideas, feature requests, project proposals and bug reports at them, to the point where most of the key people spent most of their time reviewing, estimating and designing solutions for them. Which, clearly, had to stop. This is not an uncommon problem and the obvious solution was to introduce a quantization mechanism that would allow the constant flow to be split into manageable chunks. The only problem was that the customers did not want to wait; they wanted their thing to happen yesterday. Enter the Dripping Water Torture metaphor. Everybody has probably heard that in medieval times people were tortured with water dripping on their skulls for lengthy periods of time. Even the Mythbusters did an episode on it. This is how it feels being constantly bombarded with all sorts of functional requests.
Now imagine that instead of water dripping on your forehead and slowly driving you bonkers, a man would turn up once a day and throw a bucket of cold water at you. Unpleasant as that might be, it's not much of a torture. The message was heard loud and clear and now there is at least an understanding of what we are trying to solve with the recent process changes.

Sunday, March 18, 2007

Book: Skunk Works

The other day I got my hands on a copy of "Skunk Works" by Ben R. Rich and Leo Janos (thanks, Lauri!), which describes the internals of a unit of Lockheed that was (and still is) focused on developing high-tech weaponry for various parts of the US military. I just breathed it in and have probably already bored everybody to death with quotes from it. The book is fantastic not only because it describes how some people get to build their own hundred-million-dollar gadgets and send them flying over remote parts of Russia, but because it shows how to set up, maintain and sell an extremely efficient team of highly skilled people that spits out a constant flow of staggering innovation. Let's take a look at some of the main points the book makes:

Get isolated, concentrated and focused
Kelly Johnson, the legendary leader and founder of Skunk Works, was an engineer. He was also an autocratic and charismatic leader. The former gave him a very pragmatic mind and the latter the ability to enforce his vision of a perfect engineering environment. In addition, the unit was forced to work under heavy secrecy, which led to physical separation from the rest of the Lockheed organization and a very high clearance overhead for every person added to the team. Also, the premises allocated to the team were spartan to say the least, as it was yet to prove its worth. All of this combined led to a very small, intimately integrated team following a clear vision in complete isolation from corporate inertia. Which, in my mind, is the recipe for a perfect task force. I think most of the Skype core functionality was written this way (of course, the guys did not have a large organization to separate from in the beginning, but that has all changed now).

Get down and dirty
Designers of the aircraft worked as close as possible to the people actually assembling them. Which meant that any part not fitting, any design flaw and any change could be implemented immediately and both parties got instant feedback from each other. This is the way a system architect should work.

2/3HBS=BS
The author describes how he went to his boss, Kelly, to ask for a recommendation for the Harvard Business School. Kelly says:

... You don't need Harvard to teach you that it is more important to listen than to talk. You can get straight A's from all your Harvard profs, but you'll never make the grade unless you are decisive: even a timely wrong decision is better than no decision. The final thing you'll need to know is don't halfheartedly wound problems - kill them dead. That's all there is to it.
After Ben graduates, Kelly asks him for his appraisal of the studies and Ben writes down the above equation. Until now I thought I was the only one who found that my MBA studies did little but confirm what I had already learned from experience, and it is good to know that much smarter people have got a similar impression from much better schools.

The rules
The fourteen basic rules Kelly wrote down for the cooperation between Skunk Works and the military. They are too long to quote here but they encapsulate a perfect relationship between a technology contractor and its customer.

Always take two steps
After the U2 had flown over the USSR unscratched for about a year, it was clear to Kelly Johnson that the Soviets would eventually find a way to shoot it down (which eventually happened three years later) and that whatever improvements were made to it, the hostiles would again figure out countermeasures within a couple of years. Ergo, the only sensible thing to do was to take _two_ steps at a time, not one. So he went on and initiated what eventually became the Blackbird. That was 2030s technology in the 1960s. The point is that you should really think big and not let yourself be hindered by what is thought to be impossible.

All in all, the book was a true revelation: I have seen too many management books that are just full of crap. This one says it all in a couple of pages and fills the rest with examples of how it was all applied to create technology that, some 40 years later, is still to be surpassed. Read it.

Goodbye and welcome!

It's time I got serious. Well, sort of. This means that from now on, my serious thoughts and more thorough writings on software architecture and management in general go into a separate blog called Human Architecture and the rest of my passions and moans will stay at The Place for BelZaah. Let's just say that posts going "I had a crappy day" and "A software architect at Skype thinks this or that" really should go to two different places.

About the name. I used to be a technology-oriented person. In recent years, however, I have grown to understand that technology by itself does not do anything; it always acts in the context of an organization and its people. Hence, system architecture should always be considered in relation to the people who implement, use and live with it.