Joseph Mate's Blog

Unit Testing Common Requirements

Posted in Java, Programming, Unit Testing by Joseph on August 28, 2016

Suppose you wanted to unit test java.util.Map. How would you verify each implementation? Take a quick look at the Map javadoc, and you’ll find 20 implementing classes. Each Map implementation has common unit tests for requirements that you want to avoid copying. In this post I illustrate two techniques you can use for testing common requirements over implementations by testing my simplified version of java.util.Map.

I also encountered a similar problem when taking the Coursera Scala Parallel Programming course. The course’s assignments required sequential and parallel implementations. I noticed I ended up copying and pasting my tests of the sequential implementation to the parallel one. I came up with these solutions to de-duplicate the code.

Introduction to Example

In this example we implement a simplified Map interface with three methods, get, put, and getKeys:

public interface Map<K,V>  {
	void put(K key, V value);
	V get(K key);
	List<K> getKeys();
}

We provide two implementations of the interface: InefficientMap, which uses two ArrayLists to store the keys and values, and SortedMap, which guarantees that getKeys() returns the keys in sorted order. The implementations are purposely naive because the focus of this article is the testing aspect, not performance.
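As a point of reference, here is a minimal sketch of the two-ArrayList idea; the actual InefficientMap in the post's repository may differ in detail. It includes the simplified Map interface so the sketch is self-contained:

```java
import java.util.ArrayList;
import java.util.List;

// The simplified Map interface from this post (not java.util.Map).
interface Map<K, V> {
    void put(K key, V value);
    V get(K key);
    List<K> getKeys();
}

// A minimal sketch of the two-ArrayList approach.
class InefficientMap<K, V> implements Map<K, V> {
    private final List<K> keys = new ArrayList<>();
    private final List<V> values = new ArrayList<>();

    public void put(K key, V value) {
        int i = keys.indexOf(key);   // linear scan, hence "inefficient"
        if (i >= 0) {
            values.set(i, value);    // key exists: replace its value
        } else {
            keys.add(key);
            values.add(value);
        }
    }

    public V get(K key) {
        int i = keys.indexOf(key);
        return i >= 0 ? values.get(i) : null;
    }

    public List<K> getKeys() {
        return new ArrayList<>(keys); // defensive copy
    }
}
```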

Method 1: Abstract Class

Abstract classes can be used to test the common requirements of differing implementations. This method produces the least duplicated code: all the common test cases live in the base class, which uses a factory method to obtain an instance of the implementation.

To apply this pattern, follow these steps:

  1. Create an abstract class for the common unit tests
  2. Add an abstract method that requires the extender to provide a new instance of the real implementation
  3. Add @Tests that verify the requirements of the interface
  4. Create a unit test class for each implementation and extend from the abstract unit test class
  5. Provide the real implementation expected by the abstract method
  6. Add any additional @Tests that test requirements specific to the implementation

Step 1 and 2: Creating the Abstract Class

You create an abstract class with an abstract factory method for returning the real implementation.

public abstract class InheritingMapTest {
	protected abstract Map<Integer,Integer> makeMap();
}

Step 3: Add @Tests

In the abstract unit test class, we include requirements that are common to all maps. The tests make use of the factory method for getting an instance of the implementation.

Here we look at only one of the many test cases for the Map interface. The factory provides us with a mechanism for obtaining a new instance of the implementation.

Getting the value using a key that was replaced should return the most recent value:

@Test
public void testValueReplaced() {
	Map<Integer,Integer> map = makeMap();
	map.put(0, 10);
	map.put(0, 12);
	assertEquals(Integer.valueOf(12), map.get(0));
}

Step 4 and 5: Create a Unit Test Class for each Implementation

Now for each implementation we extend from the abstract class and implement the factory method.

public class InheritingSortedMapTest extends InheritingMapTest {
	protected Map<Integer,Integer> makeMap() {
		return new SortedMap<>(IntComparator.intComparator);
	}
}

Step 6: Add Additional Tests of Uncommon Requirements to Subclass

The sorted map comes with the additional requirement that the getKeys() method returns the keys in sorted order. We add tests of this new requirement into the subclass.

@Test
public void testKeysSorted() {
	SortedMap<Integer,Integer> map = new SortedMap<>(IntComparator.intComparator);
	map.put(2,12);
	map.put(1,11);
	map.put(3,13);
	List<Integer> keys = map.getKeys();
	for(int i = 0 ; i < 3; i++) {
		assertEquals(Integer.valueOf(i+1),keys.get(i));
	}
}

Method 2: Composition

An alternative to this pattern is to include all the tests in a helper class and have each unit test call its testAll() method. The problem with this solution is that you end up with a method that calls all your test methods, which duplicates some code. More importantly, it makes it more difficult to figure out which test failed because you have to inspect the stack trace instead of clicking on the failed test in an IDE like Eclipse.

  1. Create a factory interface
  2. Create a test class that accepts the factory in the constructor
  3. Add tests that verify the requirements of the interface
  4. Call each test method from a public ‘testAll()’ method
  5. Create a unit test class for each implementation that creates an instance of the test class
  6. Provide an implementation of the factory to the test class
  7. Add any additional @Tests that test requirements specific to the implementation

Step 1 and 2: Create the Factory and Test Class

The ‘Testing Class’ has a Factory for creating new instances of the implementation when it needs it.

public class CompositionMapTest {
	public interface MapFactory {
		/**
		 * Return a new instance of your implementation of Map<>;
		 */
		Map<Integer,Integer> make() ;
	}

	private MapFactory mapFactory;

	public CompositionMapTest(MapFactory mapFactory) {
		this.mapFactory = mapFactory;
	}
}

Step 3 and 4: Add Tests

We make use of the factory to create an instance of the implementation and verify the object matches the specification.

private void testValueReplaced() {
	Map<Integer,Integer> map = mapFactory.make();
	map.put(0, 10);
	map.put(0, 12);
	assertEquals(Integer.valueOf(12), map.get(0));
}

The ‘Testing Class’ invokes all the test methods in a “testAll()” method which the real test classes will call.

public void testAll() {
	...
	testValueReplaced();
	...
}

Step 5 and 6: Create a class for each implementation

To reuse the existing tests of the ‘Test Class’ we create an instance of it, providing it with a Factory for creating implementations. We invoke the test class’s “testAll()” method to check our implementation.

public class CompositionSortedMapTest {
	@Test
	public void testBasicRequirements() {
		CompositionMapTest basicTests = new CompositionMapTest(
			new CompositionMapTest.MapFactory() {
				public Map<Integer,Integer> make() {
					return new SortedMap<>(IntComparator.intComparator);
				}
			});
		basicTests.testAll();
	}
}

Step 7: Add any additional tests for uncommon requirements

We write tests as we would normally for requirements that are not common to all Maps.

@Test
public void testKeysSorted() {
	SortedMap<Integer,Integer> map = new SortedMap<>(IntComparator.intComparator);
	map.put(2,12);
	map.put(1,11);
	map.put(3,13);
	List<Integer> keys = map.getKeys();
	for(int i = 0 ; i < 3; i++) {
		assertEquals(Integer.valueOf(i+1),keys.get(i));
	}
}

Conclusion

If you notice you’re copying and pasting unit test code for common requirements, consider one of these two techniques. However, these techniques are only guides because you will encounter cases that do not fit cleanly. As an exercise for the reader: how would you extend these techniques to apply to java.util.WeakHashMap? If you want to have a go, I’ve made the source available on my GitHub.


How To Avoid Busy Waiting

Posted in C++, Computer Science, Java, Job Hunt, Programming by Joseph on February 4, 2016

Update 2021-04-08: Viliam spotted an error. Thank you again! I called notify() and wait() on the implicit this rather than the lock! This adds to my argument that the best way to wait for something is to use a Future. There are so many ways to get it wrong.

Programming Interviews Exposed asks us to describe busy waiting and how to avoid it. The book provides a decent solution, but not the best solution. I’ll first introduce the problem and then build upon the solution until we have our best solution.

Busy Waiting

You have busy waiting when one thread waits for a result from another thread by spinning in a NOOP/empty loop until that result arrives. We will explore waiting for results from two sub tasks to demonstrate and solve this problem.

    Thread thread1 = new Thread(new Subtask1Runnable());
    Thread thread2 = new Thread(new Subtask2Runnable());
    thread1.start();
    thread2.start();
    while( waitingForThread1() && waitingForThread2() ) {
        // empty loop
    }
    // do something with result from thread1 and thread2

You want to avoid this pattern because it wastes CPU cycles:

  1. Other threads you have running could have made use of those cycles
  2. Other processes sharing the same machine could have also used those cycles
  3. If the machine only has one CPU, thread2 could have made use of those cycles

Poor Solution: Sleep

The issue is that we’re wasting CPU cycles by repeatedly checking the condition. So what if we made the thread sleep for a bit between checks?

    while( waitingForThread1() && waitingForThread2() ) {
        Thread.sleep(1000); // Sleep for 1 second
    }

This is an improvement over busy waiting as we’re now wasting fewer cycles. However,
there are still some drawbacks to this pattern:

  1. You’re still wasting cycles. It may seem like you’re wasting one cycle every
    second. However, every time the thread wakes up it has to reload the thread’s stack and registers, and it has to recompute whether the other threads are done. There’s still significant wastage.
  2. How do you set the sleep duration?
    1. Set it too long, and there is a long delay between the threads finishing and
      the main thread recognizing that it’s done
    2. Set it too short, and you increase the amount of wasted cycles

Better But Still Poor Solution: Exponential Backoff

To take away a bit of the tuning, we can add an exponential backoff. To describe the process, imagine that your plane gets delayed for some unknown period of time. What do you do? Well first you might go to the washroom for a few minutes. Then you might get something to eat for half an hour. Then you might play a game on your cellphone for an hour. Finally, you’ll read a book for the next couple of hours. Your waiting tasks become progressively larger as you wait more. That’s the idea behind exponential backoff. You sleep for progressively longer durations as you wait for longer periods of time.

    private static final int MAX_BACKOFF=1000*60*60; // 1 hour
    private static final int INITIAL_BACKOFF=100; // 100ms
    int backOff=INITIAL_BACKOFF;
    while( waitingForThread1() && waitingForThread2() ) {
        Thread.sleep(backOff); // sleep for the current back off duration
        backOff=backOff<<1; // bit shift to the left to multiply by 2
        if(backOff > MAX_BACKOFF) {
            backOff = MAX_BACKOFF; // cap the back off at one hour
        }
    }

Now this solution isn’t the best possible solution yet for a couple of reasons:

  1. Ideally we want the main thread to wake up only once or twice
    1. Once when the first sub task completes
    2. A second time when the second sub task completes
  2. We still have to play with the starting back off and max back off parameters to get decent performance

Decent Solution: Locking or Monitors (Based on Programming Interviews Exposed)

As I hinted in the previous section, is there a way to wake up the sleeping thread just once as each task finishes? This is the solution that Programming Interviews Exposed proposes. You set up a lock and have the main thread wait on it. The two threads then ‘notify’ when they have finished. Each ‘notify’ wakes the main thread, and the main thread checks whether both threads have finished.

This method uses the fewest cycles possible, but it is error prone. If not coded properly, you can end up with race conditions or deadlocks. Consider our two sub task example:

    Object lock = new Object(); // shared object to synchronize on
    int doneCount = 0;
    Thread thread1 = new Thread(new Runnable() {
        public void run() {
            // ...
            // do sub task 1
            // ...
            synchronized(lock) {
                doneCount++;
                lock.notify();
            }
        }
    });
    Thread thread2 = new Thread(new Runnable() {
        public void run() {
            // ...
            // do sub task 2
            // ...
            synchronized(lock) {
                doneCount++;
                lock.notify();
            }
        }
    });

    // this is the main thread again
    synchronized(lock) {
        thread1.start();
        thread2.start();

        while(doneCount < 2) {
            lock.wait();
        }
    }

This solution is great as the main thread only wakes up twice: once when sub task 1 finishes, and a second time when sub task 2 finishes.

However, in this example we can mess up in many ways:

  1. If I started the threads outside the synchronized block, the two threads
    could finish before the main thread calls ‘wait()’. This would leave the main thread sleeping forever because no future notifies are called.
  2. If I put the count outside of the synchronized block, the two threads might
    update it at the same time and I’ll get doneCount==1 instead of 2!
  3. If I forget to call notify() on one of the threads, then the main thread
    might never wake up because it didn’t get one of the notifications.
  4. Placing the entire sub task in the synchronized block instead of just the
    shared data accesses would have let only one sub task run at a time, resulting in
    no parallelism.
  5. In a previous version of the above pseudocode I wrote notify() and wait() instead of lock.notify() and lock.wait()! If you use a shared Object lock monitor, you have to remember to call notify() and wait() on that lock instead of this.

As a result, I do not recommend the solution presented in Programming Interviews Exposed.

Good Solution: Futures and Other Concurrency Libraries

Instead of trying to carefully code this pattern every time, use libraries that have already implemented these patterns correctly. In our example, where we’re waiting for two results from two threads, we can apply Futures.

    ExecutorService executor = ...
    Future<String> future1 = executor.submit( new Callable<String>() {
        public String call() {
            /* do work for first result here */
        }
    });
    Future<String> future2 = executor.submit( new Callable<String>() {
        public String call() {
            /* do work for second result here */
        }
    } );
    String result1 = future1.get();
    String result2 = future2.get();


From the example we can immediately see that it’s the most elegant and least bug-prone solution. But more importantly, we still use the fewest cycles possible, as execution of the main thread only resumes when each of the futures completes.
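A complete, runnable version of the sketch above (the thread pool size and the result strings are illustrative choices; any ExecutorService works):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureExample {
    // Submits two sub tasks and blocks on each result without spinning.
    static String runBoth() throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        // Each sub task runs on a pool thread and produces a String.
        Future<String> future1 = executor.submit(() -> "result of sub task 1");
        Future<String> future2 = executor.submit(() -> "result of sub task 2");

        // get() parks the main thread until the sub task completes:
        // no busy loop, no sleep duration to tune.
        String combined = future1.get() + ", " + future2.get();
        executor.shutdown();
        return combined;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runBoth());
    }
}
```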

Here we looked at an example waiting on two sub tasks to complete. However, there are many different waiting scenarios, and the standard libraries of C# and Java provide many concurrency facilities to make developers’ lives easier. Before trying to code up your own monitor or locking, see if your problem has been solved before.
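For instance, when all you need is to block until N tasks signal completion, java.util.concurrent.CountDownLatch already implements the wait/notify bookkeeping from the previous section. A sketch:

```java
import java.util.concurrent.CountDownLatch;

public class LatchExample {
    // Blocks until both sub tasks call countDown(); returns true when done.
    static boolean runBoth() throws InterruptedException {
        CountDownLatch done = new CountDownLatch(2); // one count per sub task
        Runnable subTask = () -> {
            // ... the sub task's real work would go here ...
            done.countDown(); // replaces the hand-written lock.notify()
        };
        new Thread(subTask).start();
        new Thread(subTask).start();

        done.await(); // parks the main thread; no spinning, no sleep tuning
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("both sub tasks finished: " + runBoth());
    }
}
```

The latch cannot suffer the lost-notify race from the hand-rolled version: a countDown() that happens before await() is still counted.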

Summary

If you want to avoid busy waiting, apply the right concurrency library for your problem. If the library doesn’t exist, you can try using monitors and locks, but be warned that there are many opportunities to mess up. Finally, don’t use busy loops or sleep to wait on tasks.

Preparing For Interviews

Posted in Job Hunt, Programming by Joseph on January 29, 2016

I have completed my Master’s Thesis and all that’s left is paperwork. As a result, it’s time to prepare for interviews. I’m going through ‘Programming Interviews Exposed’ as practice. You can follow my progress at my GitHub repository. Wish me luck in the interview process! I will occasionally make posts if I find something interesting, or can provide deeper insight than the book.

So far I have learned the following things, unrelated to the book, by going through the book:

  • How to properly use github
    • For the longest time, I’ve been committing to the master branch instead of merging into master from a separate branch
  • How difficult it is to handle Unicode characters in C++
  • How to use Gradle for your java builds of small projects
    • I’ve found Gradle far more convenient than using ant/maven and ivy
  • How to do unit testing in C++

Also, by implementing the solutions I’ve been able to practice many things which I would not get exposure to by working through the problems in my head.

Firstly, I reviewed my C/C++ knowledge, which hasn’t been used intensely since undergrad. I’m now comfortable with things like manipulating pointers, differences between Java and C++, and the standard library. One interesting caveat: when you want a hash table data structure, do not use a map<>. That’s actually a map backed by a tree data structure, and some interviewers will notice this. You’ll want to use unordered_map<>, which was added in the C++11 standard.

Secondly, by implementing and testing the solution, you discover edge cases you might not have considered by working the problem through in your head. I hit many more edge cases than I expected to with my heap implementation.

Finally, during interviews you will be asked to code up the solution. You do not want to waste your precious time recalling C++ or Java libraries. You want to be able to focus as much of your energy on tackling the problem, instead of tackling the language.

Wish me luck!


Pitfalls of Integer.hashCode() and Long.hashCode() With Partitioning

Posted in Java by Joseph on April 16, 2012

Problem

So we want to partition our data into nice, even chunks. If you use Integer.hashCode() or Long.hashCode() and mod by the number of partitions, then the size of each partition will depend on the distribution of your data and the number of partitions. Here’s a quick example that I wrote up to demonstrate.

Here we have 8 partitions and all the numbers are even. I place each number into a bucket by taking its modulus. Here is the code followed by the results:

import java.util.Random;

public class HashCodeIssue {

    private static Random random;

    public static void main(String [] args) {
        int partitions = Integer.parseInt( args[0] );
        long seed = Integer.parseInt( args[1] );
        random = new Random(seed);
        int [] buckets = new int[partitions];

        for(int c = 0; c < 100000; c++) {
            buckets[hash(getNextNumber()) % partitions]++;
        }

        for(int c = 0; c< buckets.length; c++) {
            System.out.println("size of bucket " + c + ": " + buckets[c]);
        }

    }

   /**
    * Using a distribution of numbers divided evenly over
    * even numbers.
    */
    private static Integer getNextNumber() {
        return new Integer(random.nextInt(Integer.MAX_VALUE/2) * 2);
    }

   /**
    * Returning the default hashCode() implementation.
    */
    public static int hash(Integer i) {
        return i.hashCode();
    }
}
jmate@jmate-Satellite-C650D:~/Desktop/hashcode-issue$ java HashCodeIssue 8 890453985
size of bucket 0: 24781
size of bucket 1: 0
size of bucket 2: 24959
size of bucket 3: 0
size of bucket 4: 25163
size of bucket 5: 0
size of bucket 6: 25097
size of bucket 7: 0
jmate@jmate-Satellite-C650D:~/Desktop/hashcode-issue$ java HashCodeIssue 8 234985345
size of bucket 0: 25250
size of bucket 1: 0
size of bucket 2: 24735
size of bucket 3: 0
size of bucket 4: 24866
size of bucket 5: 0
size of bucket 6: 25149
size of bucket 7: 0

Root Cause

Let’s take a look at the source code of Integer.hashCode() to find out why the numbers are not being evenly distributed, despite using a hash.

java/lang/Integer.java

/**
* Returns a hash code for this {@code Integer}.
*
* @return  a hash code value for this object, equal to the
*          primitive {@code int} value represented by this
*          {@code Integer} object.
*/
public int hashCode() {
    return value;
}

java/lang/Long.java

/**
 * Returns a hash code for this {@code Long}. The result is
 * the exclusive OR of the two halves of the primitive
 * {@code long} value held by this {@code Long}
 * object. That is, the hashcode is the value of the expression:
 *
 * <blockquote>
 *  {@code (int)(this.longValue()^(this.longValue()>>>32))}
 * </blockquote>
 *
 * @return  a hash code value for this object.
 */
public int hashCode() {
    return (int)(value ^ (value >>> 32));
}

These hash codes are implemented for speed; as a result, they don’t do a good job of jumbling up the numbers. Unfortunately, for developers who want partitions of uniform sizes, this is not a good thing. The issue is that all the values share a common divisor with the number of partitions (every value is of the form y = x*2, and the partition count 8 = 4*2 is also a multiple of 2). As a result, the values all end up in the same buckets.
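You can see the interaction directly: because Integer.hashCode() is the identity, an even value mod an even partition count is always even, so odd-numbered buckets never receive anything. A minimal sketch:

```java
public class EvenBuckets {
    // Counts how many of 1000 even values land in odd-numbered buckets.
    static int oddBucketCount(int partitions) {
        int[] buckets = new int[partitions];
        for (int x = 0; x < 1000; x++) {
            int value = x * 2;                           // every value is even
            // Integer.hashCode() just returns the value itself.
            buckets[Integer.valueOf(value).hashCode() % partitions]++;
        }
        int odd = 0;
        for (int b = 1; b < partitions; b += 2) {
            odd += buckets[b];
        }
        return odd;
    }

    public static void main(String[] args) {
        // An even value mod 8 is always even, so the odd buckets stay empty:
        System.out.println("values in odd buckets: " + oddBucketCount(8));
    }
}
```

Running this prints `values in odd buckets: 0`, matching the empty buckets 1, 3, 5, and 7 in the output above; with 7 partitions the odd buckets fill normally.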

Solution

There are a couple of solutions to this problem.

1. Adjust The Number Of Partitions

This seems unintuitive at first, but REDUCING the number of partitions to 7 will flatten out the sizes of our partitions, because the common divisor 2 shared by all the even values does not divide the prime number seven.

jmate@jmate-Satellite-C650D:~/Desktop/hashcode-issue$ java HashCodeIssue 7 890453985
size of bucket 0: 14153
size of bucket 1: 14335
size of bucket 2: 14377
size of bucket 3: 14265
size of bucket 4: 14364
size of bucket 5: 14179
size of bucket 6: 14327
jmate@jmate-Satellite-C650D:~/Desktop/hashcode-issue$ java HashCodeIssue 7 234985345
size of bucket 0: 14193
size of bucket 1: 14256
size of bucket 2: 14374
size of bucket 3: 14362
size of bucket 4: 14267
size of bucket 5: 14321
size of bucket 6: 14227

2. Use A Better Hash

Often you have no idea what distribution of numbers you have, but you still want to distribute the numbers uniformly over your partitions. Or, you need a particular number of partitions that shares a common divisor with a lot of the numbers coming in. A safe way to handle this is to use a hash function that jumbles up the numbers better. In the code below, instead of using the default hash implementation, I use the MD5 hash function.

/**
* Hash function that uses MD5.
*/
private static int hash(Integer i, int partitions)
throws java.security.NoSuchAlgorithmException {
    MessageDigest m = MessageDigest.getInstance("MD5");
    m.reset();
    m.update(toBytes(i));
    byte[] digest = m.digest();
    BigInteger bigInt = new BigInteger(1,digest);
    return bigInt.mod(new BigInteger(1,toBytes(partitions))).abs().intValue();
}

private static byte[] toBytes(Integer ii) {
    int i = ii.intValue();
    byte[] result = new byte[4];

    result[0] = (byte) (i >> 24);
    result[1] = (byte) (i >> 16);
    result[2] = (byte) (i >> 8);
    result[3] = (byte) (i /*>> 0*/);

    return result;
}
jmate@jmate-Satellite-C650D:~/Desktop/hashcode-issue$ java HashCodeIssue 8 890453985
size of bucket 0: 12629
size of bucket 1: 12510
size of bucket 2: 12550
size of bucket 3: 12470
size of bucket 4: 12689
size of bucket 5: 12339
size of bucket 6: 12311
size of bucket 7: 12502
jmate@jmate-Satellite-C650D:~/Desktop/hashcode-issue$ java HashCodeIssue 8 234985345
size of bucket 0: 12472
size of bucket 1: 12533
size of bucket 2: 12632
size of bucket 3: 12634
size of bucket 4: 12489
size of bucket 5: 12352
size of bucket 6: 12616
size of bucket 7: 12272

Conclusion

If your performance depends on the size of your partitions, you should take special care when using Integer.hashCode() or Long.hashCode(). By carefully selecting the number of partitions, you can avoid the problem. If you are uncertain about the distribution of your numbers or you cannot change the number of partitions, you should use a hash function that does a better job of jumbling up the numbers.

Now I realize this example was really contrived because the numbers being partitioned were even. I chose it because it’s really simple to understand and illustrates that if you don’t use a good hash function or have the correct number of partitions, this problem is going to bite you when your numbers have a special distribution. The last time I saw this problem in the real world, the numbers we were partitioning on usually had a particular ending like XXXXX7831. If you have encountered this problem, post a comment! I am interested in hearing about it.