How do you evaluate testers? A question from StackExchange

I frequent many online forums. It’s a great way to grow and learn, as the discussions are fantastic opportunities to challenge ourselves with questions we don’t face in our day-to-day work. I also enjoy the questions that I do encounter in my day-to-day work, because when answering them (or reading answers) in a context different from mine, I discover new ways of thinking about things I normally do out of routine.

In English, an interesting site is the Testing.StackExchange Questions & Answers site. Unfortunately, the site is less active than it could be, but there are good questions and very good answers on it. People like Alan Page, Michael Bolton and BJ Rollison answer questions there, so it’s a good place to ask yours.

This week an interesting question popped up on the Testing.StackExchange forum: How can you evaluate the performance of a tester? How can you compare testers’ efficiency?
This is a question that occupies me once a year, when my company does employee evaluations, and my attempt at answering it is in this post.
Can you suggest other points of view? What else would you recommend considering? How do you evaluate testers?




The question goes something like this:

If I assign one requirement to two testing engineers, one tester will come up with 10 test cases for the given requirement and the other one will come up with 15 test cases. Both of them claim to cover all the scenarios with their tests.

How can I decide which one is better? Apart from considering the minimum amount of test cases…

Are there other factors to decide who the most efficient tester is?

My answer:
If you don’t want to consider the minimum number of test cases, you can consider the one with the greatest number of test cases then…

Jokes aside, trying to determine the efficiency of testers based on numbers like that will lead you very far from the answer you are looking for. For instance, to put it simply, both testers may be running the exact same tests, just divided differently into 10 or 15 cases. Or both testers may be doing a terrible job, but the numbers alone won’t tell you that. The number of tests executed or planned does not show the contribution of a tester to a project. By counting test cases you are looking at something very superficial.
The same will happen with any other simple, measurable dimension.

In order to decide which tester is better, we have to do it in the same way we decide which of our friends is the best friend. It takes time and intimate acquaintance.
Which tester gives better and more useful information to managers? To programmers? To peers? Which one communicates better? Which one is funnier? Which one makes the job easier for the rest of the team?
There’s no single discipline in which you can rate testers fairly. One may find the most bugs, another may prevent the most bugs, another may help the most peers, and yet another may have the best relationship with programmers. And the other one, that one in the corner that looks less efficient than all the other testers… well, that one is the one doing the support work so all the other testers can shine.

Think of all the dimensions in which testers affect a product or project. You have to subjectively study the performance of the testers in all these dimensions in order to compare them.
Testing skills, communication skills, teaching skills, writing skills, team skills, truth skills, learning skills…

Yes, it sounds difficult. Maybe because it is difficult.
But for any other arbitrary measurement you pick to make the task easy instead, you may as well randomly draw name slips from a hat.

.

While we are at it, do we really need to know which one is the most efficient? Does this comparison benefit our judgment and decisions (and consequently the team and product)?
Done wrongly, you may end up with a group of all inefficient testers. Again, think of the similarities with your group of friends: if you propose a clear-cut competition between your pals to determine which one is your best friend, some of them will stop being your friend just for proposing it, and others will become worse friends precisely by trying to meet the defined criteria.

If you have to rate, think about evaluating teammates individually rather than comparatively. It may show you facets of your team you hadn’t noticed. A conversation with each tester about his own performance can do wonders, too.

And do you really care which one is more efficient? Would you prefer efficiency over honesty? Over willingness? Over results?

.

(And BTW, if either of the two testers really tells you he “covered all the scenarios”, then he is not a very efficient tester… An efficient tester knows it is impossible to cover all scenarios, even more so by just looking at a requirement, before actually running the app to get a feel for it.)

Well, that’s my point of view, and I invite dissent and discussion.
Write your own opinion below!

20 thoughts on “How do you evaluate testers? A question from StackExchange”

  1. A testing job is influenced by the tester’s personality and it is very subjective. That’s why you can’t find two people who test in the same way. It is a very difficult task to evaluate or to compare two testers. I think testers should be evaluated as a team, not as individuals, because every person improves the team in a unique way.

    1. Hi James!
      Even comparisons between teams carry the same problems as comparisons between individuals. How about assessing the value of the organization by what it is, with all teams inside?

  4. I have had experience with both the military and civilian rating systems. What I have found is that the civilian system is not as clear. They will rate you by the numbers, but judge you by subjective criteria which you will never see. All you see is the numbers. On the military side, they are forced to document exactly why a person holds a specific subjective rating. For instance: Shmuel, through the use of humor and clear communications, focused the team to achieve effort A in a short time. Team members commented how easy and fun it was to be on his team.
    The numbers have a place, but the subjectivity must always carry the day.

    1. Gary, this military system is interesting. So they do put a lot of weight in the subjective part of evaluation…

      I wonder what makes the big difference between the military and civilian systems. Is it that the civilian system is blindly following results to the point they forget where the results come from?

  5. Nice post. I think efficiency doesn’t have to have numbers. There are two types of testers: those who can test and those who cannot… and I am sure a good manager can soon tell them apart. The hard part is knowing, among the ones who can test, who is more efficient… and that comes down to what makes them stand apart… and this cannot be answered by the number of defects logged or tests executed. It’s more subjective.

  6. Shmuel,

    Good, thought-provoking post.

    I’m reminded of the sentiment I read recently that resonated with me: “When a metric becomes a target, it is no longer valid as an objective metric.” I know I’m paraphrasing someone, but I’m not sure who. I think there is a lot of truth to it.

    I’m not suggesting that numerical metrics would be perfect were it not for conflicts of interest that encourage people to game the system. I’m saying that given how strong the incentives are to game the system, I agree with you that it is a tricky question to try to answer.

    – Justin

    1. Thanks for the comment, Justin!

      I liked your quote. Now, one observation regarding it:
      When you use a metric to measure something ‘dead’, like the height of a building, you can believe you’re doing a true measurement.
      But when you use a metric to measure (or worse when you call it ‘evaluate’) a living, feeling and caring person, you are always transforming the metric system into a target. People will behave in whatever way you measure them.
      Come to think of it, it’s like measuring a child against the door. Kids will always stretch their bodies to get a higher pencil mark, even if they’re not getting any explicit or tangible reward for being tall… The feeling of progress makes children work involuntarily towards a different measurement. When you measure a person, you are interfering with the measurement you think you are taking.

  7. Thank you Shmuel. I should bring you together with one of my acquaintances. He is a senior official in a workers’ union and claims that since there is no exact way to measure the workers, they shouldn’t be measured at all.

    1. Erez, I’d be very interested in hearing his perspective and learning from his experience.
      (If you have friends that think the opposite, I’ll still be interested to hear and learn too).

  8. So what you are saying is that there is no systematic way to tell how good a tester is.
    When you work in an organization you do need a way to value the employee, in this case, the tester, in order to motivate better results and reward the better. How will you do it?

    1. Thanks for the comment, Erez.

      What I am saying is that there is a systematic way to tell how good a tester is. But it is not an easy “paint by numbers” method. In fact, including numbers is a step toward the result going awry.

      The systematic method involves trust and respect, and like I wrote, it requires intimacy and subjectivity (I don’t know why people are afraid of subjectivity when you can’t avoid it, only mask it).
      The systematic method involves constant feedback and conversations about value. People usually know (or can talk about) if/how they can do the details better if they have a clear picture of their general situation.

      Tom DeMarco (yes, the one who once said “You can’t control what you can’t measure”) wrote in a recent article that “Most things that really matter—honor, dignity, discipline, personality, grace under pressure, values, ethics, resourcefulness, loyalty, humor, kindness—aren’t measurable.” These things are not measurable in numbers, but they are easy to feel.
      Link to the article: http://www2.computer.org/cms/Computer.org/ComputingNow/homepage/2009/0709/rW_SO_Viewpoints.pdf .

      Talking with the tester and with the team, you can get a sense of who provides value in his own eyes and in the eyes of others. And this value can be of a kind that surprises you. If you set specific, defined criteria beforehand, you’re sure to miss these kinds.

      I also recommend reading Michael Bolton’s comment on Matthew Heusser’s metrics post: http://xndev.blogspot.com/2009/05/metrics-schmetrics.html?showComment=1241714160000#c4345701353180041559

      This came out long. Maybe I’ll expand it and turn it into a follow-up blog post :)

      1. Yes Shmuel, That’s a tough one…

        Most managers tend to do that 2 days after the company evaluation deadline (that’s for test managers, developers usually take longer :-) ).
        But to be fair, one needs not only to track & weigh so many factors, but also to do that over the whole period since the last review.

        Again & again, I find myself having to explain to R&D and project managers that the number of bugs one finds is not a measure.
        One tester can work on a new GUI application feature and find many trivial bugs, while his colleague may work for the same 2 months on a most crucial embedded feature and find only 2 critical bugs.
        Who did a better job?
        No one could tell from these details.

        Kobi

        1. Kobi, thanks for the comments.
          I’d like to hear more about the conversations with managers. How do those explanations go? Can you share some of your experience with these conversations? I’d love to learn more.

      2. Shmuel,
        I generally agree with your argument that subjective parameters are an important part of evaluating a tester. But shouldn’t we use objective parameters as well? Don’t you think that, for example, counting bugs by number and weight, or other numerical values, can provide valid input for that purpose? You may be able to find some combination of such numbers that can be added to the subjective impression of the employee, don’t you think so?

          Yes, they can provide inputs, but the value they add is minimal, so we may as well ignore them, especially since they are dangerous…

          It does not make a difference if a tester found 35 or 50 bugs, it’s the same number… But if he found 0 or 2 bugs, it is a good number to look at, the guy may be a lousy tester, right? On the other hand, if he found 1 or 2 bugs because he is a lousy tester, but the manager did not perceive that without looking at the numbers, then the manager is a lousy manager too. :)
