Friday, December 21, 2012

Scalatron: The best language tutorial / learning environment ever made

Let me expand on how great the Scalatron initial experience has been.

Scalatron has this fantastic introduction in several steps.  Each step produces a bot that does something, and by pressing a couple of buttons in the in-browser UI, you can see it work as well as tweak it yourself.  Here's a screenshot:

[Screenshot: the Scalatron in-browser IDE]
On the left is the tutorial, in the middle is the code, and on the right is the sandbox that allows you to actually see your little 'bot running around, limited to what it's allowed to "see".  All this is regular in-browser stuff, no plugins.  And the highest praise I can give it is that it "just works".  Edit the code in the middle, press 'run in sandbox' and it does.  Compile errors?  The build step pops up an error box at the bottom with line numbers.  Smooth.

And then the interaction with the tutorial:  Every block in the tutorial has a "load into editor" button that drops the code directly into the editor pane.  Which means no copy/paste errors, and it works in blocks, so if you're working on the missile-launching section and you messed up the movement section with a half-baked idea, the tutorial lets you reset just one part.  This is really polished, catching use cases like, "you asked to load it into the editor, but you haven't saved the work there.  Save first?"

The tutorials were the foundation of the thoughts and opinions on the language that you can read in my previous post.  They take you through the major language features by introducing real problems you need to solve: how do I parse a string?  How do I re-use a function?  What are vars and vals?  None of it feels contrived just to make a point.

And this IDE has room for growth.  There's a little [<<] button on the tutorial that lets you take off those training wheels, and continue on with your bot development.  Save your work, give it a label, and load it into the Scalatron battle instance that loaded up when you started Scalatron.  Instant gratification.

One complaint I have is that it's not very clear from this IDE how to restart the tournament without restarting the Scalatron process -- the same process that runs your IDE also runs the tournament.  Restarting it doesn't always interrupt the IDE, but when it does...

My other complaint is that there's no way to do TDD with it.  That's a show stopper for me: if I write too much code without a test I start to get physically ill.  So now it's time to get a Scala environment up and running.  This has often been the stopping point for me in other languages/environments, as I really have very little patience for cobbling together build scripts or downloading disparate parts.  More on that in the next post.

Thursday, December 20, 2012

If Socrates was a Scrummaster

This blog moved to my company's site:  http://www.davisbase.com/if-socrates-was-a-scrummaster/

Thanks for reading!

Wednesday, December 19, 2012

Getting Scala with Scalatron

I've always liked Scala, but like many of my early dating experiences, it has been from afar... wistfully looking at Scala projects on github, thinking that one day I'd take the plunge and ask one of those nice projects out.

Then I found a geek that was easy for me to connect with: Scalatron.  Here ends the dating analogy.

Scalatron is a framework for writing competitive AI bots that compete in an online arena.  But Scalatron is also super easy (hey, the dating analogy is over, remember?).  Download the zipfile, extract it, type java -jar Scalatron.jar and jump into the very nice tutorials, and get quick visual feedback on the code you're writing.  Plus, it's fun and you can kill stuff.

Scala's Goals


Scala's goals remind me a lot of Java's goals when it first came out.  Java was a Real OO language, where everything is possible (mostly), but the easiest way to do things was the OO way -- at least compared to C++ or the interpreted languages of the time.  Java may be getting eclipsed by C# these days, but that's a different article.

Scala's goals are to be a Real Functional language where everything is possible, but the easiest way to do things is the Functional way.  So it has all the tools to make it really obvious when you're being non-functional.  val vs. var, for example, really prompts you to say, "Hey man, why you gotta go and make this mutable?  You trying to make problems for yourself?"
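Here's what I mean, as a quick snippet of my own (not from the tutorial):

 val pi = 3.14159   // immutable: reassigning pi is a compile error
 var count = 0      // mutable: allowed, but you had to ask for it

 count = count + 1  // fine
 // pi = 3.0        // error: reassignment to val

One keystroke of difference, and the compiler holds you to it.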

And it wants to have a type system that doesn't get in the way, or leave you in the dark.  This addresses my big complaint with dynamic languages.  Dynamic languages are nice right up until you're working with unfamiliar codebases, which unfortunately is immediately.  Ruby frameworks routinely made me upset by having no types defined at all, not even named ones.  What does this method take?  A hash.  Ok, not helpful.  So I open up the code and ask, "what does it do with that hash?  Oh, it passes it to another method.  Great."  And pretty soon you figure out that it's hashes all the way down.


"But Kevin!" the dynamic programmers say, "That's the flexibility!  You can pass it anything, that's the point!"  Anything?  Yeah, what if I pass it some drek?  Will it like it?  I suspect it won't be good for either of us.

Type Inference to Tuples

The middle ground.  Strong types and a compiler that cares (vs. leaving us to write unit tests to do what a type system could do for us), but no more Collection collection = new Collection() redundancy, with the type name stuttered over and over.  It's interesting to work with a type inference system, because it's such a perfect window into why we had no inference before.  I find myself looking around the code saying, "Wait, what type is this again?"  The compiler knows, but I don't.  It's like one of those logic puzzles that is 10 steps deep in transitive logic.  And if I were writing code in vi or something, it would be a pain.  But I don't do that anymore.  I'm not using an IDE right now, but I'm sure one will become my main way of working, and I have full confidence that the Scala IDEs will let me hover over something and see its type.
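A taste of what the compiler quietly works out, in a snippet of my own:

 val names = List("alpha", "beta", "gamma")   // inferred: List[String]
 val pairs = names.map(n => (n, n.length))    // inferred: List[(String, Int)]

No type annotations anywhere, yet both values are fully statically typed -- exactly the information an IDE hover can surface.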

But Functional programming's no-side-effects rule means that you're frequently going to have multiple return values, which in many languages generates a lot of types.  But this is Functional programming, so Tuples are built right in, with both type inference and static constructors to keep the code from being redundant.  Syntactic sugar?  That's a funny way to spell 'readability'.
val myTuple = ("something", "somethingElse")
Except I do have a complaint:  _1 and _2 aren't particularly good names for the contents of a Tuple.  Time will tell if this is a problem, but I suspect it might be.  The type system will tell me I'm getting a handful of strings, but what are they?  There's some more syntax so that you can assign things directly to meaningful names, which is nice:
val (firstName, lastName) = someTupleReturningFunc()
But how did I know what that function did in the first place?  Especially if it's cleverly passing tuples from its component parts?  Ideally I'd like for the API to be discoverable, and type information isn't always enough.

I guess at that point you create an object (via Scala's object keyword, which is distinct from class) and give everything proper names.  I read online about using a case class to extend a tuple, which looks nice but apparently has some real complications.  I'm just a novice, of course; perhaps there's a better solution still that doesn't fall all the way back to making POJO beans.
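For what it's worth, here's the plain case class version I'd reach for, with hypothetical names and no tuple-extending cleverness:

 case class FullName(firstName: String, lastName: String)

 def someNameReturningFunc(): FullName = FullName("Kevin", "Klinemeier")

 val name = someNameReturningFunc()
 name.firstName   // a discoverable name instead of _1

One line of declaration buys back the discoverability the tuple threw away.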

The first touch that I have with real Functional programming is almost always with some kind of mapping function.  And that's even the famous Function Heard Round The World, Google's MapReduce.  I like how it looks in Scala, returning a Tuple for a given input.
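Here's a toy version of that shape, my own snippet of the word-count classic:

 val words = List("to", "be", "or", "not", "to", "be")

 // map: each input word becomes a (key, value) tuple
 val mapped = words.map(w => (w, 1))

 // reduce: group by key and sum the counts
 val counts = mapped.groupBy(_._1).map { case (w, ps) => (w, ps.map(_._2).sum) }
 // counts: Map(to -> 2, be -> 2, or -> 1, not -> 1)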

No return statements?  What about readability and accidents?

And here again is an "aha" moment for me.  The Scala 'way' is to have no return statements.  This is an element of consistency when looking at quick function definitions.  And Scala, like Javascript, loooves functions.  And if most functions are going to be one-liners, having a return keyword in there is just clutter.  So the quick-closure definition of a function doesn't use the return keyword.  And neither should regular functions.
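In practice, the last expression in the function is its value.  A snippet of my own:

 // no return keyword: the if/else is itself an expression
 def grade(score: Int): String =
   if (score >= 90) "A"
   else if (score >= 80) "B"
   else "C"

 def double(x: Int) = x * 2   // in a one-liner, return really would be clutter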

Language Habits: names for everything

I can see that my Java habits are going to give my initial Scala code a strong accent.  Like the charming Eastern European habit of dropping articles like "a" and "the", I'm going to probably name lots of stuff that I don't need to in the beginning.  Right from Scalatron's example:
val rest = tokens(1).dropRight(1)               // e.g. "a=1,b=2)" minus the trailing ')'
val params = rest.split(',')                    // "a=1,b=2" -> ["a=1", "b=2"]
val strPairs = params.map(s => s.split('='))    // "a=1" -> ["a", "1"]
val kvPairs = strPairs.map(a => (a(0),a(1)))    // ["a", "1"] -> ("a", "1")
val paramMap = kvPairs.toMap                    // pairs -> Map("a" -> "1", "b" -> "2")
That's familiar code right there.  But we don't need all those names, technically.  The way all the cool kids are doing it these days is this:
val paramMap =
    tokens(1)
    .dropRight(1)
    .split(',')
    .map(_.split('='))
    .map(a => (a(0),a(1)))
    .toMap

Get off my lawn!  I mean, uh, what a charming way to string everything together.  But what it represents here is the Functional point of view.  These are all simultaneous operations we are invoking on the original value.  The intermediate names aren't useful.  I have to admit I'm not convinced.  But then I can't explain to my Lithuanian friends why "a" and "the" are important, either.

Function scoped Functions

And hey, if we're going to have functions that we pass around like variables, there's no reason we can't have locally-scoped functions, is there?  In fact, this is pretty key.  In OO, methods exist to change the state of the object, so we had only a couple of levels of method visibility.  If you felt like you wanted to hide some methods from other methods, that was a pretty good hint that you actually needed to divide the object into parts.

But now we don't usually have state to deal with, and we're going to be using a LOT more functions.  So there you go, dawg.   
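Something like this, my own sketch riffing on the parameter-parsing example above:

 // toPair is visible only inside parse -- no object-splitting required
 def parse(line: String): Map[String, String] = {
   def toPair(s: String): (String, String) = {
     val parts = s.split('=')
     (parts(0), parts(1))
   }
   line.split(',').map(toPair).toMap
 }

 parse("a=1,b=2")   // Map(a -> 1, b -> 2)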

Objects vs Classes vs Case Classes

This distinction seems a little fine at first blush.  First, terminology.  Just like in Java and other OO languages, objects are things that are instantiated, and classes are definitions and names for things that you may choose to instantiate (or not, I suppose).  And Case Classes are a special thing that ... some people apparently hate.  I think that's where I'm going to have to pick up next time.
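Until then, here's a minimal sketch of the three declarations as I currently understand them (novice-sized grain of salt included):

 // class: a template you instantiate yourself
 class Patient(val name: String, val room: String)

 // object: a singleton, declared and instantiated in one shot
 object PatientRegistry {
   val capacity = 300
 }

 // case class: equals, hashCode, toString, and pattern matching for free
 case class Name(first: String, last: String)

 val n = Name("Kevin", "Klinemeier")
 val Name(first, _) = n   // first == "Kevin"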

Friday, November 30, 2012

Real Discoverability

An important aspect of languages and toolkits is discoverability. The ability to answer questions like:

What other code (including display code, configuration) uses this method/class?
What code or behavior does this configuration option reflect?
Where is this text displayed?
Where is this data from this DTO used?  What business logic does it impact, or is it just display?

This is a property that dynamically typed languages can lose as a result of that dynamic typing.  (That's not to say that statically typed languages preserve it in all, or even most, cases -- more on that later.)  One way to think about it is to compare the results of a plain text search for "getName(" against what's actually true in the codebase.  Some problems will include:

False positives: the text search picks up calls to methods on other objects that happen to have the same name
Stupid false negatives: you need to be a little sophisticated to get both "methodName(" and "methodName (" etc.
Method overloading causes more problems: If you're looking for the method that takes a string, not the one that takes a Customer, that's tricky to do by text searching alone. You'll have to go through them by hand.
Metaprogramming totally wrecks your string search.  Assigning the method to a variable and then passing that off to something else to be executed hoses your text search (see the sketch below).
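That last point in miniature, as a contrived Scala sketch (the names are mine):

 class Customer { def getName(): String = "Kevin" }

 val c = new Customer
 val f: () => String = c.getName _   // the method, lifted into a value
 def runLater(g: () => String) = g()
 runLater(f)   // getName runs here, but no "getName(" appears at this call site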

Statically typed languages aren't a whole lot better, actually. As soon as you add an interface, you've abstracted the call to the method. Even though it's a useful abstraction, you can't tell what happens in practice. And static languages' reflection attributes wreck things as surely as function pointers.

What if the virtual machine recorded things for you, and piped that data back to your IDE?  Assuming you're running this in either production or a test environment that has good coverage, that could make it clear which paths are frequent and which sections are never called -- in production meaning they're unused, and in test code meaning untested.  For example, a Ruby IDE could use this to provide many of the features of statically typed languages, like automated refactoring and, to the point of this article, call-hierarchy discovery, without all the headaches above.

This same approach might be viable at the framework level. Rather than trying to derive from the code and the configuration where data is displayed, use testing or production to record it in practice. The former is like proving code correct -- technically possible, but very very difficult. The latter is much like the real-world testing we do: rather than trying to construct a proof of correctness of the button, just push the button and see what happens. In this case, we record what did happen and use it to inform the rest of the system and our tools.

Thursday, July 05, 2012

Acceptance Testing: what is it good for?

Overview

Acceptance Testing is an approach to providing fast feedback based on business scenarios.  It helps teams avoid the brittleness and long delays associated with automated GUI testing.  I'll also look at a specific tool, Gherkin, and explore in detail how its approach allows us to build automation steps that are business-readable, very reusable, and potentially even allow non-technical people to author tests with no development involvement.

What puts the "Acceptance" in "Acceptance Testing?"

First, let's get the context right: the word Acceptance here is the same as the Acceptance in Acceptance Criteria, the "back of the card" additional details on a User Story that the team needs in order to estimate the work.  It's unrelated to the Acceptance in User Acceptance Testing, except maybe at some overarching goal of making people happy.

The Acceptance in Acceptance Criteria does relate to the ceremony at the end of the work where the Product Owner decides whether to accept the work or not.  Our Acceptance Criteria, then, are the criteria that the Product Owner will use to make that decision.  (Or at least some of the criteria -- they're not intended to be a contract.)

Acceptance Testing is expressing those criteria as tests.  At this level, it could be anything that the Product Owner can dream up.  All of that should be tested, but not all of it can be automated.  This puts it squarely in Q2 of the testing quadrants that are at the heart of Agile Testing by Lisa Crispin and Janet Gregory:

[Diagram: the Agile Testing quadrants]
Behavior Driven Development with Gherkin

With all this in mind, let's look at a User Story, some Acceptance Criteria, and a Gherkin test.  We'll use the User Story format of "As a ... I want to ... so that ...", and add Acceptance Criteria as bullet points afterward.
As a nurse,
I want to scan a patient's id and get a list of prescribed medications
so that I give the correct medication to the correct patient
    • Patient's name and current room number are displayed
    • List of all currently prescribed medications is displayed, ordered alphabetically
    • Medications already administered are indicated in red.
    • If the patient is not found, a message is displayed with the info from the scanned id
Let's look at another format, the Gherkin BDD format.
Given precondition
When actor + action
Then observable result
This standard format, like the standard User Story format, is a lot like the tool pegboard that you have in your garage:  It tells you where to put things so you can find them quickly, and tells you when something is missing.

Extending the garage metaphor, Gherkin itself is only a tool -- it can be used for many purposes including Unit Testing, GUI testing, load testing, etc.  Let's look at how to use Gherkin to create  Acceptance Tests from our Acceptance Criteria.

Our first criterion is that the patient's name and current room number are displayed.  Let's first re-word that in the Gherkin format, called a Scenario:

Given a patient is in a particular room
When the nurse scans the patient's id
Then the patient's name and room number are displayed.
Okay, big whoop.  This isn't very helpful -- it's just wordier.  In this case, when the actor and action are the same as the user story, it's not very interesting.  But let's add some specific examples to it:
Given a patient with id 1234 named Kevin Klinemeier is in room 305B
When the nurse scans patient id 1234
Then Kevin Klinemeier and 305B is displayed
Now that we have specifics in place, this is an Acceptance Test.  This looks like the kind of thing we could actually automate, and it begins to suggest some other scenarios:  What happens if the wrong patient id is scanned?  Can there be two patients with the same ID?  Can there be two patients in the same room?  But first, this is just English-- how does it get automated?

Each step in the BDD Scenario is implemented by a developer.  If we were working with a user interface, it might look like this:

First choose the variable parts -- here, the patient id, the name, and the room number:
Given a patient with id 1234 named Kevin Klinemeier is in room 305B
Then write some "wiring and glue" code to put the variables into place, described here as UI-automation pseudocode:

 public void givenAPatient(patientID, patientName, roomNumber) {  
   window.open(PATIENT_ENTRY_SCREEN)  
   writeIntoTextBox("patient_id", patientID)  
   writeIntoTextBox("patient_name",patientName)  
   writeIntoTextBox("room_number", roomNumber)  
 }  

Great, that makes sense.  Except it stinks!  Our goal as Agile Testers is to provide fast feedback, right?  But now we can't run our test until the GUI is finished.  Furthermore, automated GUI testing is itself notoriously brittle.  Instead of a simple writeIntoTextBox method, we're more likely to have something that looks like this:

 writeIntoTextBox("xpath:=/html/body/div/div[2]/div/div/form/label/input[1]",patientID);  

That's xpath in there, and it's specifying an input box at a particular place on the page.  If the page changes the order of the fields, the test breaks.  If it changes the location of the fields, the test breaks.  If it removes something that is above the fields, the test breaks.  You get the idea.

Instead, we avoid the problems of GUI automation by testing in the "middle".  The unit test level is too early; it's too technical and doesn't speak to the business.  The GUI layer is too late; it doesn't give the team feedback on behavior and design at a time when problems can still be resolved.  The sweet spot is to test services, and hence to design services for this kind of testing.  This is best described by Mike Cohn's testing pyramid:

[Diagram: Mike Cohn's test automation pyramid]
Part of the feedback we are providing is on whether a given software design is Testable.  There are many roadblocks to testability at the GUI level, but far fewer at the service level.  Furthermore, when we use these BDD Scenarios as the starting place for testing at the service level, our services end up not only testable but grounded in the underlying business concepts, and that's just what we want out of a good service layer.

To complete the example, let's look at a service-layer implementation for the step we have above:

Given a patient with id 1234 named Kevin Klinemeier is in room 305B
 public void givenAPatient(patientID, patientName, roomNumber) {  
   patientEntryService.createPatient(patientID,patientName,roomNumber);  
 }  

When the nurse scans patient id 1234
 public void scanPatientId(patientID) {  
    result = patientScanService.scan(patientID);  
 }  

Then Kevin Klinemeier and 305B is displayed
 public void shouldContainPatientNameAndRoomNumber(patientName, roomNumber) {
    assert.that(result.getName).isEqualTo(patientName);  
    assert.that(result.getRoomNumber).isEqualTo(roomNumber);  
 }  

BDD and Building Blocks

We now have one working example: the patient scan.  It looks like maybe a lot of work.  The exciting part of BDD, and the thing that makes this all scale over a long-term software development process, is what happens when we move on to the next Scenario.  Let's pick one of the other Scenarios we came up with before: scanning a patient that doesn't exist.  First let's write it with Given .. When .. Then:

Given a patient with id 5678 named Joe Justice is in room 205A
When the nurse scans patient id 0001
Then an error is displayed

Look at the first two steps: they're free!  Well, we already paid for them, but they're free to re-use in this new scenario.  The only new wiring to write is looking at the error that is returned.

Once we have a few of these basic building blocks, we can really take that development effort and amplify it through everyone else on the team who can create these near-English Scenarios:  QA, BA, Product Owner, perhaps even Customers and Users.  As an example, with just the four different steps we've completed so far, we can create and automatically run all the following test Scenarios without further development help:

Scenario: Duplicate patient ID
Given a patient with id 2838 named Bill Bates is in room 203D
And a patient with id 2838 named Clay Cummings is in room 405F
When the nurse scans patient id 2838
Then an error is displayed

Scenario: Multiple patients in room
Given a patient with id 8432 named Amelia Anderson is in room 403B
And a patient with id 9392 named Nelly Newborn is in room 403B
When the nurse scans patient id 8432
Then Amelia Anderson and 403B is displayed
And Nelly Newborn and 403B is displayed

These scenarios might also drive discussion:  should we be able to tell the difference between an error for a patient id that is missing vs an error for a patient id with multiple entries?  That's just the kind of business-level feedback we're looking to provide, delivered before the GUI is even started.

Summary

Acceptance tests that express business concepts are powerful, but automation at the GUI level is problematic and waiting for the GUI layer to be built introduces long delays for testing.  Instead, provide faster feedback by automating these tests at the "middle" layer (service layer) where the business concepts can be expressed, but the GUI details are not yet in the way.
Using the BDD Gherkin format creates tests that are not only automatable, but are also built from small reusable blocks that reduce the overall cost of automation and allow non-technical team members to extend the automated test suite.

Wednesday, April 11, 2012

BDD and big datasets

One of the challenges that I hear in my classes on Agile Testing is around Behavior Driven Development and big datasets.  The intro to a lot of BDD tools looks something like this:

Given a customer named 'John Smith' who is 45 years old
When I execute a check for retirement eligibility
Then the result should be false

This is, of course, the Gherkin language from Cucumber, which has implementations in many languages including Ruby, Java, and .Net.  It's pretty compelling -- it succinctly describes preconditions, actions, and expected results.  Until you think about applying it; that's when the trouble starts.

In this post, I'll talk about big datasets.  In the above example, what stands out as too simple is the assumption that we can insert a customer with only two attributes (name and age).  In practice, our customer records have dozens or in some cases almost a hundred fields.  Gherkin has the ability to do tables by using the vertical pipe symbol, so do we create something like this?

Given a customer with these fields:
     |  name  |  age | street 1 | street 2 | city | state | zip | ssn | credit card | ccExpiry | signupDate | currentBalance | blah | blah | blah |
    | john smith | 42 | 4823 third | | seattle | wa | 98173 | 592-93-5382 | 4324 4322 3345 2838 | 11/15 | 04/12 | 438.27 | moo | beep | foo |

When I execute a check for retirement eligibility
Then the result should be false

I hope not, because that stinks.  It pretty quickly strips away most of the readability benefit that we were getting from this tool in the first place.  From a data point of view, it's also a mess because the tabular format requires us to flatten a lot of our dataset, making it harder to maintain.  And from a communication point of view, it requires us to really pan for gold -- some of these columns have an impact on whether the customer is eligible for retirement, and some of them are just boilerplate junk that we have to provide in order to create a customer.  Which is which?

The distinction (impactful vs. boilerplate) is especially important when we try to maintain the test.  If 80% of the fields are boilerplate, we get into the habit of changing data to make the test pass, which makes it very easy to make a 'maintenance' change that actually invalidates the test.  In our example above, it might be reasonable to assume that it's just name and age.  But what if it was address, too?  It's possible that retirement eligibility (whatever that means) varies by state or county.  We need to remove these attributes from the test, and have the test focus only on the details that impact the outcome.

The first step is to listen.  How do our product owner and businesspeople talk about the customers?  I mean beyond calling them big fat jerks when they're at the bar after work.  Are there certain classes of customers that we can specify?  Let's try that:

Given a West Coast customer aged 45
When I execute a check for retirement eligibility
Then the result should be false

But what does West Coast customer mean?  This definition is contained in the step definition, and includes a set of Reasonable Defaults.  Every customer needs a name, so the system makes one.  And within the context of our West Coast Customer class, it makes up an address in one of the western coastal states.  This makes it clear what the real dependencies are in our test:  The address should be somewhere on the West Coast, and a particular age.  Everything else about the customer is irrelevant to this test.

On the step definition side, there are some decisions to be made.  First is what to parameterize.  In the beginning, do the Simplest Thing That Could Possibly Work and parameterize none of it:

     @Given("a West Coast customer aged (.*)") ...

If, as I would expect, we end up with several different additional datapoints that we want to 'override' about our West Coast Customer, then the Builder pattern will be useful.  Specifically Builder rather than Factory, so that we avoid combinatorial explosion of the options for overrides.  The WestCoastCustomerBuilder might look like this after some time:

   public class WestCoastCustomerBuilder {
      public void buildAndInsertCustomer() { /* ... */ }
      public void setAge(int age) { /* ... */ }
      public void setSSN(String ssn) { /* ... */ }
      public void setCurrentBalance(double balance) { /* ... */ }
   }

And here's how it might be used in several step definitions:


    @Given("a West Coast customer aged (.*)")
    public void buildWestCoastWithAge(int age) {
        builder.setAge(age);
        builder.buildAndInsertCustomer();
    }

    @Given("a West Coast customer aged (.*) with ssn (.*)")
    public void buildWestCoastWithSSN(int age, String ssn) {
        builder.setSSN(ssn);
        builder.setAge(age);
        builder.buildAndInsertCustomer();
    }

This shows the re-use the builder pattern gives us.  Critics will point out that we could also just have the second definition call the first, or avoid the need altogether with an optional clause in the step definition.  True, but that wouldn't provide as good an example of re-use.

The defaults could either be static data (all West Coast addresses are my house), a random selection from a given dataset (choose a random address among these 50), or pure random generation (make up an address on a numbered street in a west coast city).  As with all things, do the simplest first and see if that works for you.
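Here's a sketch of those three strategies -- my own Scala, hypothetical names, and the same idea works in whatever language your step definitions are written in:

 import scala.util.Random

 object WestCoastDefaults {
   // 1. Static data: every default customer gets the same address
   val staticAddress = "123 Main St, Seattle, WA 98101"

   // 2. Random selection from a fixed dataset
   val knownAddresses = Seq(
     "4823 Third Ave, Seattle, WA 98173",
     "900 Market St, San Francisco, CA 94102")
   def randomKnownAddress() = knownAddresses(Random.nextInt(knownAddresses.size))

   // 3. Pure random generation: a made-up address on a numbered street
   val streets = Seq("3rd", "12th", "45th", "82nd")
   val cities = Seq("Seattle, WA 98101", "Portland, OR 97209")
   def generatedAddress() =
     s"${Random.nextInt(9999) + 1} ${streets(Random.nextInt(streets.size))} Ave, ${cities(Random.nextInt(cities.size))}"
 }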

In summary: whenever our 'required' datasets start to harm readability (and hence maintainability), we should refactor the test to specify only those inputs that affect the observed result.  All the boilerplate should be handled by generalized step definitions that can supply reasonable defaults.  This keeps the test clear and focused, while creating clear and reusable step definitions.