Monday, June 06, 2005

Threading .. the next revolution

I've been thinking for years that the next major language/runtime improvement to follow automatic memory management is automated synchronization. Not the heavy-handed top-level "synch all" but something that is smart about what needs synchronization and what doesn't. It will have to know about deadlocks and stale reads, and atomicity violations, and prevent them all with reliable accuracy.

Why this, and why now? It seems like a logical next step after memory management, and I'm far from being the first to have that notion. In the last couple of years, however, it has become apparent that some of the push for this kind of solution will come from the hardware that our systems are running on. Industry pundits have long been saying that single-core speeds are going to stop increasing. That prediction has now become real enough that multiple cores on a single die and hyper-threading approaches are in production. To take advantage of these performance increases, you often must change the way your application functions to focus on multi-threaded capabilities.

Tools that have some of this knowledge already exist, mostly geared towards development-time analysis. What we'll need, however, is run-time handling of the problem, backed by some kind of language adaptation (either new features, or new usage of old features).

At first glance, this problem looks a lot like the memory allocation problem. There will be run-time tradeoffs for development-time gains. It should simplify the tedious and complicated business of determining where synchronization issues are. Memory management took a development-time tool (memory analyzers) and turned it into a run-time action (garbage collection); automatic synchronization would do the same for synchronization analysis.

However, there's a major difference. Where many developers were willing to put up with an overall runtime slowdown in order to gain an increase in developer efficiency, that won't be the case for automatic synchronization. The entire point of automatic synchronization is to take advantage of the potential performance gains in the new thread-heavy architecture. Add to that the fact that, unlike most applications, the runtime synchronization must be 100% correct (possibly provably so), and you have a big task.

I've said something here that needs elaboration. It is my belief that most multithreaded applications today work, but are not correct. It's been my experience that if you examine a complex multithreaded system closely enough, you are almost guaranteed to find situations where there exists the potential for deadlock or dirty-read issues. In practice, these issues "almost never matter", either because of outside guarantees ("That thread never runs when this thread is running") or because the timing scenarios involved, while possible, are very unlikely. An automated system would not have the luxury of being lucky, and as a result will incur overhead that hand-synchronized systems will not need to handle.

Speaking of handling, what would it have to handle? Here's a list of synchronization problems off the top of my head:
  • stale thread information
  • atomic reads/writes
  • deadlocks
  • transactions
The last one is the interesting one. It's not enough to have perfect field-level automatic synchronization. You must be able to designate a group of changes as being written atomically. A classic example is a bank balance transfer. The new values in 'checking' and 'savings' must be written at the same time, to prevent any other thread from reading a mismatched pair of values. I think this is where new language features, or new usages of old features, are going to drive a new approach to programming.
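To make the balance-transfer example concrete, here is a minimal hand-synchronized sketch in Java. The `Accounts` class and its method names are made up for illustration. It shows the kind of grouping a programmer has to spell out by hand today, not what an automatic system would generate.

```java
// Hypothetical illustration: the class and method names are assumptions.
final class Accounts {
    private long checking;
    private long savings;

    Accounts(long checking, long savings) {
        this.checking = checking;
        this.savings = savings;
    }

    // Both fields change while holding the same lock, so no other thread
    // can observe 'checking' updated and 'savings' not yet updated.
    synchronized void transferToSavings(long amount) {
        if (amount < 0 || amount > checking) {
            throw new IllegalArgumentException("invalid amount: " + amount);
        }
        checking -= amount;
        savings += amount;
    }

    // Readers take the same lock and get both values as one consistent pair.
    synchronized long[] snapshot() {
        return new long[] { checking, savings };
    }
}
```

Field-level synchronization alone would not be enough here: each field could be individually safe while a reader still sees the transfer half-done. The lock has to cover the whole group of writes and the paired read.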

Of course, nothing is new under the sun. That 'classic' example of transactions is just like writing to a database. Both operations need to be done atomically to the database, so that if they fail, they fail together. Approaches from relational database programming can certainly be applied here, leading us to think of shared memory like a database. Even so, I don't think there's a standard API or approach to database save transactions currently in wide use across languages, and probably not even within any open language. Closed languages like Delphi and others may well have a single dictated approach.

Now, obviously I'm not the first to come up with the idea of automatic synchronization. There are academic papers written 15 years ago or more on the subject. Graph theory certainly has something to say about this kind of thing, and parts of its tenets are even older. There's a lot to read even before getting started, and it's a big project.

But big projects, in the XP way, should be reduced to smaller projects. So I ask: "What is the smallest amount of the above problem that we can solve, and still be useful?"

Tuesday, May 24, 2005

Combining Code Review with Paired Programming

Combining Code Review with Paired Programming:
An approach by Kevin Klinemeier and Charlie Sheppard

The first rule of the Code Review: You don't leave the code review with more work to do.
The second rule of the Code Review: You don't leave the code review with more work to do.
The third rule of the Code Review: If this is your first time, you have to ... uh... listen. We're not that hardcore.

Code reviews can be stressful, frustrating, and/or a waste of time. This approach attempts to avoid a lot of that by being more like a pair programming session (with keyboard and projector) than a review of a bunch of printed-out code. Take a laptop with your favorite IDE and the latest version of the codebase. Sit down with a group of 3-5 developers and pick a few classes to review.

As you review a piece of code, rather than noting suggestions as tasks for later (see rule #1), actually make those changes to the codebase. If they're too large to be made in a single code review, pare the change down to what you *can* do, and do that. At the end, check in your changes and be done with it.

This approach tends to be much more interesting for the participants because there's more interaction, and is more productive because you can see your changes made as you suggest them.

It is important to be serious about the no-extra-work rule. This keeps your developers from thinking, "Ugh, we have to go to an hour-long meeting from which I'll gain two more tasks to do this week. Screw that, I don't have time." Sacrifice your changes before you sacrifice the no-extra-work clause. Adding extra work in the review causes the review to be skipped in 'crunch time', and ultimately to fall by the wayside.

In general, when you start or amend your code review process, let your developers know that they should expect that they *will* find changes. Nobody is capable of keeping all the possible considerations in mind while they're writing software. That's why we do code reviews, and why paired programming is so powerful. Also, even if your software is very, very well written, you don't put several developers in a room and ask, "Hey, what do you think of this piece of code?" and get "No changes" as a result. It just doesn't happen.

One last thing, and this is obvious: be polite. There may very well be a reason the code is the way it is, or even if there isn't (it really is that bad), it's likely the author and the rest of the team already know better. Laugh at yourselves from time to time, and overall stay positive. Talk more about what can be fixed, rather than problems left behind.

A Better Mock Objects explanation

So, despite my February posts to the contrary, last week I finally understood what Mock Objects are. Or more directly, I finally understood what they aren't. Martin Fowler's article entitled Mocks Aren't Stubs really laid it out well. Stubs are used for state-based testing. Mocks are used for interaction-based testing. Using a Mock when what you want is a Stub generally results in the interaction-testing features getting "in your way". This is the case for the www.mockobjects.com mock implementation of the various HTTPxxx interfaces. They add a lot of useful methods for testing interaction, but when all I want is to set attributes, they're "in the way".
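To illustrate the difference, here is a small hand-rolled sketch in Java. The `Request` interface, `Greeter`, and both test doubles are hypothetical names invented for this example; they are not the mockobjects.com API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface standing in for a request-like collaborator.
interface Request {
    Object getAttribute(String name);
    void setAttribute(String name, Object value);
}

// Stub: only holds state, so the test asserts on results afterwards.
class StubRequest implements Request {
    private final Map<String, Object> attributes = new HashMap<>();
    public Object getAttribute(String name) { return attributes.get(name); }
    public void setAttribute(String name, Object value) { attributes.put(name, value); }
}

// Mock: records each call and fails verify() if the calls differ
// from the expectations the test set up beforehand.
class MockRequest implements Request {
    private final List<String> expected = new ArrayList<>();
    private final List<String> actual = new ArrayList<>();
    void expectGet(String name) { expected.add("get:" + name); }
    void expectSet(String name) { expected.add("set:" + name); }
    public Object getAttribute(String name) { actual.add("get:" + name); return null; }
    public void setAttribute(String name, Object value) { actual.add("set:" + name); }
    void verify() {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but saw " + actual);
        }
    }
}

// Hypothetical unit under test.
class Greeter {
    void greet(Request request) {
        Object user = request.getAttribute("user");
        request.setAttribute("greeting", "Hello, " + user);
    }
}
```

With the stub, a test sets the "user" attribute, calls `greet`, and checks the "greeting" attribute. With the mock, the test declares the expected calls up front and calls `verify()` at the end. If all you want is to set attributes, the expectation machinery is exactly the part that gets "in the way".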

Martin Fowler's Article:

http://www.martinfowler.com/articles/mocksArentStubs.html

Friday, February 11, 2005

Mock Objects explanation

I wrote what I think was a very good explanation of the Mock Objects approach on the junit mailing list today, so I've copied it below.

One thing I found interesting was the way I ended the message. I said, "I think Mock Objects are fun".

What a curious thing to find fun. It's true, though. Writing Mock Objects gives me a special kind of gee-whiz satisfaction that I can't exactly explain. It may have something to do with the fact that writing Mock Objects frequently involves little "tricks" like scoping changes or subclassing and overriding major functionality. In normal code, I'd consider these tricks to be an aberration and a threat to readability and maintainability. For some reason, they're acceptable to me in Mock Objects.

I wonder now if the Mock Objects I've written are horrible monstrosities full of hacks and completely unusable to those who come after me. I hope not, and in general I don't think so. If there's a readability issue I'm worried about, it's when I make a mock object that needs to implement just one method out of an interface, and I let Eclipse just code-generate the rest of the class. Sure, I move the implemented methods to the top, but it isn't exactly clear.

This seems to raise the question, though: If my very-clever Mock Objects are maintainable, then why do I hate and avoid this kind of work in my regular projects? Is it a question of scale, perhaps? I could (but don't) declare most of my Mock Object implementations as final, since nothing subclasses them. Is it that there is more subclassing going on in my actual project code, and hence more need to keep things simple? I have my doubts about that as well.

Perhaps I just need to kill my darlings, but ... they're so cute.

-Kevin

The original text:

Often, many of the inputs and outputs of your system are its interaction with other classes. Those classes frequently have states or take actions that are difficult to test (sniffing packets, throwing exceptions).

In order to avoid this problem, you use a class that appears to your unit under test as though it is the tool it usually calls, but under the covers you've stripped out its functionality and replaced it with something you can test with.

Let me assume that in your example, you pass in some kind of network-access object in your constructor, which is then used by various methods. You need to test what your code does when that network object reports that it cannot contact any network at all. This is a very difficult situation to simulate with the actual code.

You create (or if you're lucky, download) a class that subclasses your network access object. That class may be called "AlwaysFailingMockNetworkAccessObject". It is bone-stupid, and is in actuality not a network access object at all. The only thing it does is fail when its access methods are called.

So in your test, you construct your class with this AlwaysFailingMockNetworkAccessObject, and then make your assertions about your class' behavior.

Another test might use a PlaybackMockNetworkAccessObject (another class I made up). This object might not contact a network at all, but accept in its constructor a set of data to return as though it came from a network. Using this, you can write a set of assertions based on a set of data.
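The two made-up classes above might look something like the following Java sketch. `NetworkAccessObject`, its `fetch` method, and `StatusChecker` (the unit under test) are all assumptions invented for illustration.

```java
import java.io.IOException;

// Hypothetical collaborator; a real one would open sockets.
class NetworkAccessObject {
    byte[] fetch(String address) throws IOException {
        throw new UnsupportedOperationException("real network access not shown in this sketch");
    }
}

// Bone-stupid: the only thing it does is fail when asked for data.
class AlwaysFailingMockNetworkAccessObject extends NetworkAccessObject {
    @Override
    byte[] fetch(String address) throws IOException {
        throw new IOException("no network available");
    }
}

// Never touches a network; returns the data handed to its constructor.
class PlaybackMockNetworkAccessObject extends NetworkAccessObject {
    private final byte[] cannedData;

    PlaybackMockNetworkAccessObject(byte[] cannedData) {
        this.cannedData = cannedData.clone();
    }

    @Override
    byte[] fetch(String address) {
        return cannedData.clone();
    }
}

// Hypothetical unit under test: receives the network object in its constructor.
class StatusChecker {
    private final NetworkAccessObject network;

    StatusChecker(NetworkAccessObject network) {
        this.network = network;
    }

    String status(String address) {
        try {
            return "received " + network.fetch(address).length + " bytes";
        } catch (IOException e) {
            return "offline";
        }
    }
}
```

A test constructs `StatusChecker` with one of the two mocks and asserts on the string it returns, without any real network being involved.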

One common first reaction to the Mock Objects approach is that "It doesn't *really* test the system" since it doesn't involve some major component. This is when Unit Testers reply, "It tests everything I'm concerned about -- I don't need to test the Network Access implementation, it isn't my problem." You'll also be doing an integration test before you release. This, however, is a unit test, and the Mock Objects approach helps you test just your unit.

Monday, January 31, 2005

Puzzle Pirate Banker

I wonder if this would be a viable mini-business in Puzzle Pirates:

The "system" (aka the man) charges you a 10% or so fee to transfer your poe between islands. This fee goes up if you want to transfer between islands in different archipelagos.

Could a player or group of players who have significant idle holdings undercut this practice?

The problem is that the poe needs to get there. Whether via an able crew or just general accumulation, let's say for the sake of argument that the poe is at several locations already.

One could create a website (that is updated -- the not fun part) that knows how much poe is where. Users can enter the place their poe is and where they want it to go. The system will then tell them how much they can transfer (the lower of the poe on hand at those two locations), and the fee.
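The calculation such a site would do is small. Here is a rough Java sketch; the `PoeBroker` class, its method names, and the 8% broker fee used in the note below are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: class, method names, and the fee rate are assumptions.
class PoeBroker {
    private final Map<String, Long> poeOnHand = new HashMap<>();
    private final int feePercent;

    PoeBroker(int feePercent) {
        this.feePercent = feePercent;
    }

    void recordHoldings(String island, long poe) {
        poeOnHand.put(island, poe);
    }

    // The lower of the poe on hand at the two islands, as described above.
    long maxTransfer(String from, String to) {
        return Math.min(poeOnHand.getOrDefault(from, 0L), poeOnHand.getOrDefault(to, 0L));
    }

    // Fee in whole poe, rounded down.
    long fee(long amount) {
        return amount * feePercent / 100;
    }
}
```

For instance, with 5000 poe recorded at one island and 3000 at another, the most a customer could move is 3000. At an assumed 8% broker rate the fee would be 240 poe, against 300 at the system's 10%.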

The player would then have to arrange a time to meet with a trusted broker to close the deal.

Building trust is obviously an issue, as the broker cannot trust the customer. In-game events like auctions and such should be enough to get the ball rolling. False accusations of fraud from dishonest players will be a problem, especially if they occur while the system is getting started.

It may be that the majority of transactions serve to centralize the poe, or worse, decentralize it to where it is scattered and not useful. Talented sailors will then have to be enlisted. Perhaps the sailors could be commissioned, without the need to trust them implicitly. If there exists a general trust in the organization, it could itself advertise for "poe delivery", paying a percentage on the income, perhaps at either isle. This percentage would need to be at least 5% lower than the 'transaction rate' charged to customers.

Scale is a problem, as a system of trust would need to be created among the brokers. However, once established, brokers could operate as independent agents to a large degree. They can maintain their own network of poe, post their own bounties for poe deliveries, etc. Indeed, they must maintain their own poe, as no-one can be trusted with the "communal pool". To this end, they will need to make their own deliveries. If agents are competing with each other to make transactions, this will tend to ensure that deliveries are as timely as possible.

Trust could be "ensured" by requiring new agents to provide a "Security Deposit". Agents are then allowed to make transactions up to their security deposit. This provision could be removed for longtime agents. Dishonest players, however, can abuse agents by falsely reporting fraud (certain to happen). This will be indistinguishable from actual fraud.

Agents would be easier to trust if the poe were paid to a central system rather than the agent himself. Perhaps deposits on private sloops or somesuch can be arranged. This would necessitate an investment of purchasing sloops for each new 'source'. Except that doesn't help at all. Dishonest agents can simply falsely report delivery of the poe and collect the replacement payment and their fee.

Is there any way for the agent to "prove" the transaction took place? Screenshots can be faked, chatlogs are laughable. Banking data is private. Perhaps ships can again be helpful? Their transaction logs can show who took the poe and how much, though poe can only be removed by crewmembers. A pirate may only be in one crew at a time, and the customer may not have the ability to add the Agent, nor the inclination to leave his crew to receive his shipment.

These troubles would be acceptable for large amounts (10k+ perhaps), and perhaps the risks are minor enough to dispense with these precautions for smaller amounts.

This entire idea could be expanded to any type of commodity, actually. Storage of anything but poe is difficult, however.

Alpha

Welcome to my head.