Monday, January 7, 2008

On wars and focus

Just read a very interesting article in Wired about how the situation in Iraq can be traced back to neglecting certain aspects of the war and over-emphasizing others. What the author says is that the US developed and deployed an extremely good technical solution for efficient killing, only to find out that killing the opponent is the easy part of that particular war.

So why is that article, filed under politics slash security in Wired, important in this context? It is actually relevant in two ways.

Firstly, it illustrates very nicely how very different social networks work and how important they are. Apparently, westerners, whose social networks have been rapidly transforming in the last ten years, are almost incapable of grasping a society that relies heavily on the concept of belonging to exactly the right branch of the right religion, belonging to a tribe, village etc.

Secondly, what the US Army did is something that happens frequently all over the world. Companies and groups of people focus on a small part of the problem as opposed to the whole issue at hand. They do so because:
  • Other aspects of the problem are surrounded by the SEP-Field. Getting a volleyball net to a village can't possibly be the job of the Army, can it? Whether or not the software we build actually fits the real business process and functional concepts can't be a problem for an engineer, can it? This, of course, does not mean one should be running around sticking their nose into other people's business. Just gently pointing out that you see a problem is usually OK.
  • People love to deal with stuff they know well and tend to subconsciously push back stuff they are not sure how to handle. This is the reason there are a ton of good ideas out there that are either implemented in a horrible way, lack any marketing or are not productized in any way: engineers come up with an impossibly cool technical solution but forget there is more to a good product than just a slinky way of sending packets around.
  • People are static. If the world has worked a certain way "for ever", there is no reason it should behave otherwise all of a sudden. It really does not work this way. The game has moved on and the stuff that used to take care of the problem just fine (i.e. killing off all the people who do not comply) now addresses only one aspect of the issue. The problem has grown arms and legs.

The same phenomenon of dangerously narrow focus is related to the TMP-mentality (Technology-Money-People) that states that every problem can be solved by throwing either technology, more money or more people at it. The core problem, of course, does not go anywhere. Only in that case the focus is so narrow it does not even exist. "No, We Need a Neural Network" is a prime example of this.

Of course, going too far the other way and attempting to stop all disease in order to fix your aching throat is also dangerous. So you need to be very careful in defining what exactly is the problem that needs to be fixed.

Are we doomed? Can nothing be done? Relax, all is not lost. As long as somebody (it could be you!) keeps these tendencies in mind, tries to look at the thing as a whole and is able to voice the concerns in a way that gets the point across there is still hope.

Friday, December 14, 2007

Open architecture, open organizations

At first sight, these two topics seem totally unrelated. After all, isn't organizational openness a question of corporate strategy and the responsibility of all these suits throwing buzzwords around? Isn't architecture the product of long-haired dudes with poor personal hygiene and little interest in anything but stuff like exactly how a clock signal propagates in a microchip? Let me explain.

Back in the days of the Estonian Tax and Customs Board, we were gathered for a management meeting. One of the topics was whether or not (and if so, on what terms) we should grant another state agency the right to make a certain query towards our system. After a lengthy discussion it was decided that, as the information was not really sensitive and did not fall under data privacy laws either, we should do it. The meeting progressed with other topics while I pinged one of our developers on MSN (didn't know Skype back then). He took a small query our own systems used, mapped it to WSDL, deployed the result into the x-tee infrastructure we already had in place, ran some tests and was done. By the time we got to the meeting minutes I could report that the solution was live.

Why was the decision-making much harder than the actual development? Because, by nature, a tax organization is a closed one. It deals with data that absolutely needs to be kept private and runs systems that have to be shut away behind as many firewalls as feasible. Its core is secrecy, its culture closed. Thus, it is naturally reluctant to move towards an open architecture even if it is technically relatively easy to build.

This observation aligns nicely with the holistic organizational view discussed in earlier posts. Organizational culture is part of the organizational architecture that is dependent on the technical architecture (and vice versa of course). Based on that model I'd claim that there is no way to build a really open architecture without having a really open culture in the organization.

Why is all this important? The other day I ran yet another team behavior training and heard myself state once again that "open communication and, more generally, the spirit of openness and trust, is the key to successful teamwork". Then it all clicked into place. You see, the greasy-haired guys are as much part of the whole organization as the people with ties. Neither will probably admit it, but they share the same culture, the same fundamental values and beliefs about how that organization should be run.

An example. When you build an API that is open for people to use, you take responsibility. Add an element to the payload XML document and all AXIS-based clients will break. Only an outward-looking person, a person that cares, will voluntarily take on responsibilities like this. The same applies to organizations. And even when a closed organization has some sort of drive to build that API and make it work, it will not yield the expected results. Documentation for developers? That's secondary. Support? Do we need to talk to strangers? Access? You sure you are not going to flood us with requests? Cooperation? You must be after our client base.
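To make the "add an element and clients break" point concrete, here is a minimal sketch in Python. It does not reproduce how AXIS works internally; `parse_strict` is only a stand-in for any client built against a fixed payload schema, and the `customer` payload and its field names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload fields the client was originally built against.
KNOWN_FIELDS = {"id", "name"}


def parse_strict(xml_text):
    """Stand-in for a schema-bound client: reject any unexpected element."""
    root = ET.fromstring(xml_text)
    record = {}
    for child in root:
        if child.tag not in KNOWN_FIELDS:
            raise ValueError(f"unexpected element: {child.tag}")
        record[child.tag] = child.text
    return record


def parse_tolerant(xml_text):
    """Read the known fields and skip everything else."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root if child.tag in KNOWN_FIELDS}


OLD_PAYLOAD = "<customer><id>42</id><name>Ann</name></customer>"
# The API owner adds one element; nothing else changes.
NEW_PAYLOAD = (
    "<customer><id>42</id><name>Ann</name>"
    "<email>ann@example.com</email></customer>"
)
```

Both parsers accept `OLD_PAYLOAD`, but only `parse_tolerant` survives `NEW_PAYLOAD`. The API owner rarely controls which kind of client is out there, which is exactly why publishing an API means taking on responsibility.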

So an organization with a closed and inward-oriented culture is unlikely to have an open (and thus extensible) architecture.

Turns out the relationship is actually two-way. Cultures can be changed, and deliberately taking on those responsibilities, building an architecture that fosters openness and trust, is a great tool for that. Why do you think Amazon opened up their queue APIs and does application hosting? In terms of business it must be minuscule. Only an organization with as open an atmosphere as Google's is able to create and support the number of APIs they have. You can't really do any mashups if your basic assumption is that everybody out there is going to hurt you. Of course, there are organizations that are closed by nature (all sorts of financial institutions, certain state agencies and so forth) but the link between the dominant organizational culture and the qualities of the architecture in use needs to be managed explicitly.

The conclusion? An organization needs to be open and outward-looking on all levels to be successful. And you can't just tell the IT organization to start "doing APIs". It takes a wee bit more than that.

Monday, June 4, 2007

The Big Picture

Chances are that at some point in your life as an architect you have come across UML. Maybe even had the dubious pleasure of working with a humongous all-problems-solved tool like Rational Rose. In which case you have probably developed a life-long allergy towards all things UML and are now doing all of your designs with pen and paper.

Anyway, at some point I stumbled across Enterprise Architect, a neat UML tool with fairly low pretensions, and was instantly hooked. It looked fairly neat and allowed you to throw together nice UML-compliant diagrams and even share them with your team without locking you into some sort of bizarre development process. Sure, it has major usability niggles, is unstable at times, its reporting module is a joke (why on earth can't you just provide proper schemas for your models so people can draw their own reports?), it runs awfully slow on remote databases and is Windows-only. However, it was wieldy enough to provide years of good service.

Now and then, people would come to me to start an argument over UML-based design and how it can lead to endless tinkering with details that would be best solved by actually writing some code. And how useless a model is if it's not derived from the code and that code generation just does not work. I used to counter them with the statement that I was just drawing pictures and EA was providing me a consistent and hopefully universally comprehensible way to do so.
Until recently.

When I first started at Skype I took it upon myself to put down everything I learned about its systems. And, being a devoted EA user, it was clear what the tool for that would be. At the beginning it was pretty cool: the boxes started to pile up, dependencies appeared and the picture was there. However, as time passed (a month or so) I would work less and less with that diagram. It seemed that nobody had enough detail on the connections and, even worse, the details seemed to change daily. Also, the number of components and their connections grew to a point where the whole thing resembled a modern art piece rather than a clear map of "what we have".

So I gave up. Afterwards, when somebody (usually a neophyte) approached me and asked for "the Big Picture", I would sometimes make another stab at it but would soon give up. So there it lay, dormant and outdated.

Until one of our team leads approached me with a very clear request. He had an off-site coming up and needed presentation material for new members of his team. This time I decided to take a different route. I took OmniGraffle (an excellent Mac diagram tool recommended by Dan) and just drew away. A round-edged and shadowed box here, another there. Some arrows, coloring and there it was. Sure, it did not have all the details: you need to be able to still grasp the thing, so I had to leave some out (and still ended up with a densely populated A2 sheet). Sure, it was not UML because sometimes an arrow was a SOAP call, sometimes it meant that a module was linked in and sometimes it just meant database usage. And most of our database or queuing infrastructure was not pictured at all, as it is in constant change and our DBAs generate their own diagrams from its configuration. After some minor corrections (I had forgotten some components) the picture was ready, had its first outing at that off-site and serves its purpose nicely.

So how come I was not able to do something in two years using a professional-grade UML tool but could cook up a useful diagram within hours using a dead-simple diagram drawer? The answer is simple: there's a tool for every purpose and UML is just not good for maintaining a high-level view. It's just too detailed. It's not very good for maintaining nitty-gritty details either as developers usually know much better how long and of what type a particular field should be, but that's another story.

The lesson learned from all this is to use the right tool for the task, and to remember that if you have a really good and handy hammer, most things start to look like nails.

Friday, May 18, 2007

On fragmented layers

In a previous post I described a layered approach to an organization. This time I'd like to extend the model a little and give some areas where it might be a useful decision-making tool. You can see a figure of the stack on the left. The point of the model is that different layers deal with different aspects of how an organization is built and that they are highly interdependent with changes in any of them causing a cascade of changes both upwards and down. It's also worth noting that, almost by definition, all projects impact all of the layers. Usually new pieces are added to the ends so somebody needs to make sure that the picture stays consistent, new pieces fit with existing stuff and do not cause discrepancies with others. All of this is pretty straightforward for the technical architecture but is often disregarded for the other aspects of a business.

How would one use the model in real life? One useful application I have found is explaining to people why they should consider other things (like new processes or even teams) besides functionality when they are setting up a project. It also helps to visualize responsibilities (who deals with the functional architecture in your company?).

One use I'd like to focus on a little more is the fragmentation aspect of the model. In short, the message goes: if you are to crack a layer, you had better align the crack with the cracks in neighboring ones.

Consider, for example, a scenario where you have two web applications supported by two different business organizations on two different continents. Which means there's a division in both the business and organizational layers. Of course, the crack runs all the way and the applications are not integrated in any way. Now what if somebody up in management decides, very sensibly, that it really sucks that customers need to go to two different stores to get their SkypeOut minutes and headsets? Makes perfect sense, and just making two systems talk to each other is not a fundamental obstacle. However, could you imagine two teams 4 timezones apart sharing responsibility for what the same piece of code does (i.e. integrating the technical architecture)? Or could you imagine an actual purchase flow (functional architecture) where you buy a SkypeIN number with all of its details and the finesse of all the legal requirements we have there and at the same time compare 4 headsets? Quite difficult, isn't it? Of course all of this could actually be done, but just linking the infrastructure without thinking about how the organizational (how is responsibility shared among the teams?), functional (how do the different purchase flows fit together?), business (what about revenues and, say, marketing costs of the banners in the store?) or support (do our, say, release cycles need to be synchronized with the ones of our partner?) dependencies are handled makes little sense.

Monday, May 7, 2007

Time to review

There has been yet another case of rebellion in wonderland recently. Basically, a design decision was challenged long after it had been made and also implemented. A year ago, we had designed (and implemented) a system that had a substantial influence on some billing and destination resolution logic in our calling infrastructure. As this coincided with some other changes in the same modules, a decision was made to start separating that logic from the actual signaling logic, as the latter is highly stable (and needs to be very robust) while the former is liable to change much more often.

At the same time, our developers sought to standardize communication (and the load balancing, redundancy, configuration management etc. issues that come with it) between separately deployed components and settled on ICE. So, keen to play around with the new technology, they conducted some tests and a decision was made to use it for the newly created lump of business logic.

Historically, most of our business logic has resided within our databases. Not a bad decision at all given the horizontal and vertical splitting technology plus Postgres know-how we have in-house. However, this also meant that most of the knowledge of billing and routing internals resided with people who knew databases and were not about to start writing C++ code overnight, especially when it usually did not make any sense to ship data to a remote component for decision-making.

As a result, we ended up with a fairly slim layer of logic between the calling infrastructure and the database that, at first glance, did very little but call a bunch of stored procedures. Of course, come time to deploy the thing, our operations people came asking why the heck they needed to support (and make highly available) an additional component that didn't add any value at all. Which was the rebellion at the beginning of the story.

So we discussed. And ended up with an understanding that in terms of design, developers still find value in that layer as the decision _which_ procedures to call is quite significant. Also, the data structures that get passed between it and the calling infrastructure are complex and it would be unwise to build serialization into flat structures required by the database into all of the calling systems. Some of the supportability concerns (but not all) the ops guys had could actually be solved quite easily, too. No major change in the architecture, then.
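As a rough illustration of what such a thin layer does, here is a sketch in Python (the real component was not written in Python, and every name below, from `CallRequest` to `bill_prepaid_call`, is hypothetical). It shows the two jobs the developers wanted to keep in one place: choosing which procedure to call and flattening a nested request into the flat arguments a database procedure expects.

```python
from dataclasses import dataclass


@dataclass
class Party:
    number: str
    country: str


@dataclass
class CallRequest:
    # Nested structure as the calling infrastructure would pass it around.
    caller: Party
    callee: Party
    prepaid: bool


def choose_procedure(request):
    """Decide which (hypothetical) stored procedure handles the request."""
    if request.prepaid:
        return "bill_prepaid_call"
    return "bill_postpaid_call"


def flatten(request):
    """Turn the nested request into a flat argument tuple."""
    return (
        request.caller.number,
        request.caller.country,
        request.callee.number,
        request.callee.country,
    )


def build_db_call(request):
    """Return (procedure name, flat arguments), ready for a database driver."""
    return choose_procedure(request), flatten(request)
```

Without a layer like this, every calling system would need its own copy of `choose_procedure` and `flatten`, which is the duplication the developers argued against.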

The reason I'm writing about this event is that there are several very important conclusions to draw from it:
  • Your architectural decisions should take into account the organization you operate with. In some other situation, the very idea of moving logic from a middleware layer to a database would have been pure lunacy (most organizations struggle to do the opposite) but given the stuff our DBAs pull off on a regular basis it's not that bad.
  • Challenges are valuable, regular ones are even better. No design decision should be cast in stone, no concept should be considered OK only because "this is the way things are done". Although in most cases you still end up retaining the original idea, sometimes you don't. And this is where architectural evolution happens.
  • Work closely with the operations people. They provide a very good reality check. Helps with deployment griefs, too.

Friday, April 20, 2007

On re-use

Here's a situation for you. You have an information system that runs a substantial internet-based service. That service has web-based interfaces to it for billing and self-service purposes. Now imagine that there is a need to branch out, to build the services into different channels. For example, have one application that can fit into the client, one that works on regular browsers, one that is targeted at people with bad eyesight and one that works on handheld devices.

So the question is: how much of the original web offering can you re-use? It is obvious that all of the business logic, like billing, core provisioning and customer management services, needs to be re-used; that's a no-brainer. But what about the application and presentation layers? The presentation layer is also quite trivial: as the various new versions are meant for different devices and audiences, they clearly need a different way to manifest themselves.

The application logic is the one that will cause you trouble. Let's stop pretending that we are talking about a random company here and admit that this is Skype. For example, the flow for purchasing a SkypeIn number is a very complex beast with tailored integration points for different countries, various ways to pick out the numbers etc. So how would one re-use that?

One way to do it would be to use the same application logic everywhere and just re-skin it into WAP or slightly more compact HTML for the various distribution channels. This, however, does not work, because different devices tend to have different requirements towards the page flow (which is a manifestation of the application logic), too. For WAP, for example, you probably want to generate most of the pages into a deck from one request so you don't have to go back to the server every time the user clicks "next". For the client-based version you probably want to ignore several corner cases and make the flow a couple of pages shorter. And so forth.

The other would be to re-write the whole thing and build a semi-intelligent fourth tier that can handle the workflow, decide when and how to talk to integration partners, give out number pools etc. This would work, but it adds an additional layer of complexity, the existing stuff has to be re-written (re-writing something for technical reasons is always a bad idea) and there is no guarantee that this thing would actually contain any useful logic after you are done. Close, but no cigar.

Instead of these options, I'd say just take a deep breath and do not re-use. Lots of people will now go "no, no, no! You will get yourself into a world of pain every time a common piece of logic changes, as you need to go and make a change in all of those flows and you will surely forget something". In my mind, it is much worse to make a major business logic change without explicitly going over all the places that use it. And if you are already doing that, you might as well implement the change right there. Application logic is a combination of delivery channel requirements and business logic, so any change to the latter is going to have an impact on the application logic that is specific to the channel. Meaning that you would most probably go and tweak all the flows anyway. And if you forget any, you will be in trouble either way.
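A toy sketch of what "re-use the business logic, do not re-use the application logic" might look like, in Python. The function names, flow steps and prices are all invented for illustration and are not taken from the real Skype store.

```python
def price_number(country):
    """Shared business logic: the one place that knows what a number costs."""
    prices = {"EE": 5, "US": 3}  # made-up values for illustration
    return prices[country]


def web_flow(country):
    """Browser channel: one step per page, corner cases included."""
    return [
        "choose country",
        "pick number from list",
        "enter legal details",
        f"confirm price: {price_number(country)}",
    ]


def client_flow(country):
    """In-client channel: a deliberately shorter flow written separately."""
    return [
        "choose country",
        f"confirm price: {price_number(country)}",
    ]
```

Both flows call the same `price_number`, so a pricing change lands in one place. A change to the steps themselves has to be made per channel, which is the explicit walk-through the paragraph above argues you should be doing anyway.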

In summary: do not break your head about re-use. In case of the application logic, you sometimes need to gather your courage and not re-use at all.

Friday, April 6, 2007

Metaphor: drawing a horse

I found myself in the middle of a major discussion the other day. When defining the scope of a project, should one also state things that are _out_ of the scope? Some people said that this would lead to wishful thinking and describing the whole world in the "out of scope" section. The others said that everything _in_ scope could never be described in adequate detail anyway, so describing the outer world is the only way to go. So I got to thinking (considering I also had to present on the issue) and ended up with an analogy:

Say you wanted to draw a horse and did something like this (art lovers, look away now):
It ain't pretty, is it? One could also use a different approach:

Isn't going to win any prizes either. But combine the two:

And you get something that is still ugly but at least gives a sort of holistic view of the animal. So the point is that you need both the inside-out and outside-in perspectives on a project to get a sufficient understanding of its scope.