Friday, November 30, 2012

Real Discoverability

An important aspect of languages and toolkits is discoverability. The ability to answer questions like:

What other code (including display code, configuration) uses this method/class?
What code or behavior does this configuration option reflect?
Where is this text displayed?
Where is this data from this DTO used?  What business logic does it impact, or is it just display?

This is a property that dynamically typed languages can lose as a result of that dynamic typing. (That's not to say that statically typed languages preserve it in all, or even most, cases -- more on that later). One way to think about it is to compare the difference in result to searching for string matches on "getName(" and comparing that result with the reality of the codebase.  Some problems will include:

False positives: the text search picks up calls to methods on other objects that happen to have the same name
Stupid false negatives: you need to be a little sophisticated to get both "methodName(" and "methodName (" etc.
Method overloading causes more problems: If you're looking for the method that takes a string, not the one that takes a Customer, that's tricky to do by text searching alone. You'll have to go through them by hand.
Metaprogramming totally wrecks your string search. Assigning the method to a variable then passing that off to something else to be executed hoses your text search.

Statically typed languages aren't a whole lot better, actually. As soon as you add an interface, you've abstracted the call to the method. Even though it's a useful abstraction, you can't tell what happens in practice. And static languages' reflection attributes wreck things as surely as function pointers.

What if the virtual machine recorded things for you, and piped that data back to your IDE? Assuming you're running this in either production or a test environment that has good coverage, that could make it clear what paths are frequent and what sections are never called -- in production meaning they're unused, and in test code meaning untested. For example, a Ruby IDE could use this to generate many of the features of statically typed langauges, like automated refactoring, and to the point of this article: call hierarchy discovery, without all the headaches above.

This same approach might be viable at the framework level. Rather than trying to derive from the code and the configuration where data is displayed, use testing or production to record it in practice. The former is like proving code correct -- technically possible, but very very difficult. The latter is much like the real-world testing we do: rather than trying to construct a proof of correctness of the button, just push the button and see what happens. In this case, we record what did happen and use it to inform the rest of the system and our tools.