Java Volatile Keyword

The Java volatile keyword is used to mark a Java variable as “being stored in main memory”. More precisely, every read of a volatile variable is read from the computer’s main memory rather than from the CPU cache, and every write to a volatile variable is written to main memory, not just to the CPU cache.

Actually, since Java 5 the volatile keyword guarantees more than just that volatile variables are written to and read from main memory. I will explain that in the following sections.

The Java volatile Visibility Guarantee

The Java volatile keyword guarantees visibility of changes to variables across threads. This may sound a bit abstract, so let me elaborate.

In a multithreaded application where the threads operate on non-volatile variables, each thread may copy variables from main memory into a CPU cache while working on them, for performance reasons. If your computer contains more than one CPU, each thread may run on a different CPU. That means, that each thread may copy the variables into the CPU cache of different CPUs. This is illustrated here:

[Figure: java-volatile-1]

With non-volatile variables there are no guarantees about when the Java Virtual Machine (JVM) reads data from main memory into CPU caches, or writes data from CPU caches to main memory. This can cause several problems which I will explain in the following sections.

Imagine a situation in which two or more threads have access to a shared object which contains a counter variable declared like this:

public class SharedObject {

    public int counter = 0;

}

Imagine too, that only Thread 1 increments the counter variable, but both Thread 1 and Thread 2 may read the counter variable from time to time.

If the counter variable is not declared volatile, there is no guarantee about when the value of the counter variable is written from the CPU cache back to main memory. This means that the counter variable value in the CPU cache may not be the same as in main memory. This situation is illustrated here:

[Figure: java-volatile-2]

The problem with threads not seeing the latest value of a variable because it has not yet been written back to main memory by another thread, is called a “visibility” problem. The updates of one thread are not visible to other threads.

By declaring the counter variable volatile all writes to the counter variable will be written back to main memory immediately. Also, all reads of the counter variable will be read directly from main memory. Here is how the volatile declaration of the counter variable looks:

public class SharedObject {

    public volatile int counter = 0;

}

Declaring a variable volatile thus guarantees the visibility for other threads of writes to that variable.
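To make the visibility guarantee a bit more concrete, here is a minimal sketch of a worker thread that is stopped through a volatile flag. The class name StoppableWorker and its members are hypothetical and only serve as illustration. Without volatile on the running field, the worker would have no guarantee of ever seeing the change made by the other thread.

```java
// Hypothetical example: names are illustrative, not taken from the article.
class StoppableWorker implements Runnable {

    // volatile: a write by one thread is guaranteed to become visible to other threads
    private volatile boolean running = true;

    // only touched by the worker thread itself
    private long iterations = 0;

    @Override
    public void run() {
        while (running) {       // volatile read on every pass
            iterations++;
        }
    }

    public void stop() {
        running = false;        // volatile write
    }

    // Starts a worker, asks it to stop, and reports whether it terminated in time.
    static boolean demo() {
        StoppableWorker worker = new StoppableWorker();
        Thread thread = new Thread(worker);
        thread.setDaemon(true); // do not keep the JVM alive if something goes wrong
        thread.start();
        worker.stop();
        try {
            thread.join(5000);  // wait at most five seconds
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !thread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + demo());
    }
}
```

Only one thread writes the flag here and the new value does not depend on the old one, which is the kind of situation where volatile alone is sufficient.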

The Java volatile Happens-Before Guarantee

Since Java 5 the volatile keyword guarantees more than just the reading from and writing to main memory of variables. Actually, the volatile keyword guarantees this:

  • If Thread A writes to a volatile variable and Thread B subsequently reads the same volatile variable, then all variables visible to Thread A before writing the volatile variable, will also be visible to Thread B after it has read the volatile variable.
  • The reading and writing instructions of volatile variables cannot be reordered by the JVM (the JVM may reorder instructions for performance reasons as long as the JVM detects no change in program behaviour from the reordering). Instructions before and after can be reordered, but the volatile read or write cannot be mixed with these instructions. Whatever instructions follow a read or write of a volatile variable are guaranteed to happen after the read or write.

These statements require a deeper explanation.

When a thread writes to a volatile variable, not just the volatile variable itself is written to main memory. All other variables changed by the thread before writing to the volatile variable are flushed to main memory as well. When a thread reads a volatile variable, it will also read all other variables from main memory which were flushed to main memory together with the volatile variable.

Look at this example:

Thread A:
    sharedObject.nonVolatile = 123;
    sharedObject.counter     = sharedObject.counter + 1;

Thread B:
    int counter     = sharedObject.counter;
    int nonVolatile = sharedObject.nonVolatile;

Since Thread A writes the non-volatile variable sharedObject.nonVolatile before writing to the volatile sharedObject.counter, both sharedObject.nonVolatile and sharedObject.counter are written to main memory when Thread A writes to sharedObject.counter (the volatile variable).

Since Thread B starts by reading the volatile sharedObject.counter, both sharedObject.counter and sharedObject.nonVolatile are read from main memory into the CPU cache used by Thread B. By the time Thread B reads sharedObject.nonVolatile it will see the value written by Thread A.
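The Thread A / Thread B fragments above can be turned into a small runnable sketch. The wrapper class VisibilityDemo and the demo() method are hypothetical names added for illustration; the SharedObject fields follow the example. The reader thread spins on the volatile counter and only then reads the non-volatile field.

```java
// Hypothetical wrapper around the Thread A / Thread B example.
class VisibilityDemo {

    static class SharedObject {
        int nonVolatile = 0;        // plain field
        volatile int counter = 0;   // volatile field
    }

    // Returns the value of nonVolatile that the reader thread observed.
    static int demo() {
        SharedObject shared = new SharedObject();
        int[] seen = new int[1];

        // Plays the role of Thread B
        Thread reader = new Thread(() -> {
            while (shared.counter == 0) {
                // spin until the volatile write becomes visible
            }
            seen[0] = shared.nonVolatile;   // read after the volatile read
        });
        reader.setDaemon(true);
        reader.start();

        // Plays the role of Thread A
        shared.nonVolatile = 123;               // plain write first
        shared.counter = shared.counter + 1;    // volatile write second

        try {
            reader.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("reader saw nonVolatile = " + demo());
    }
}
```

Because the write to nonVolatile comes before the volatile write, and the read of nonVolatile comes after the volatile read, the reader is guaranteed to observe 123 once it has seen the updated counter.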

Developers may use this extended visibility guarantee to optimize the visibility of variables between threads. Instead of declaring each and every variable volatile, only one or a few need be declared volatile. Here is an example of a simple Exchanger class written after that principle:

public class Exchanger {

    private Object   object       = null;
    private volatile boolean hasNewObject = false;

    public void put(Object newObject) {
        while(hasNewObject) {
            //wait - do not overwrite existing new object
        }
        object = newObject;
        hasNewObject = true; //volatile write
    }

    public Object take(){
        while(!hasNewObject){ //volatile read
            //wait - don't take old object (or null)
        }
        Object obj = object;
        hasNewObject = false; //volatile write
        return obj;
    }
}

Thread A may be putting objects from time to time by calling put(). Thread B may take objects from time to time by calling take(). This Exchanger can work just fine using a volatile variable (without the use of synchronized blocks), as long as only Thread A calls put() and only Thread B calls take().
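As a rough usage sketch, the class can be exercised with exactly one putting thread and one taking thread. To keep the block self-contained, the class is repeated here under the hypothetical name SimpleExchanger; ExchangerDemo and its demo() method are illustrative additions as well.

```java
import java.util.ArrayList;
import java.util.List;

// Copy of the Exchanger from the text, renamed for this sketch.
class SimpleExchanger {
    private Object object = null;
    private volatile boolean hasNewObject = false;

    public void put(Object newObject) {
        while (hasNewObject) {
            // busy-wait: do not overwrite an object that has not been taken yet
        }
        object = newObject;
        hasNewObject = true;    // volatile write publishes object as well
    }

    public Object take() {
        while (!hasNewObject) { // volatile read
            // busy-wait: nothing new to take yet
        }
        Object obj = object;
        hasNewObject = false;   // volatile write
        return obj;
    }
}

class ExchangerDemo {

    // One thread puts three messages, one thread takes three messages.
    static List<Object> demo() {
        SimpleExchanger exchanger = new SimpleExchanger();
        List<Object> received = new ArrayList<>();

        Thread producer = new Thread(() -> {
            for (int i = 1; i <= 3; i++) {
                exchanger.put("message-" + i);
            }
        });
        Thread consumer = new Thread(() -> {
            for (int i = 0; i < 3; i++) {
                received.add(exchanger.take());
            }
        });
        producer.setDaemon(true);
        consumer.setDaemon(true);
        producer.start();
        consumer.start();
        try {
            producer.join(5000);
            consumer.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The busy-wait loops keep the sketch short but burn CPU time, so this is only meant to show the visibility mechanics, not a recommended hand-off design.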

However, the JVM may reorder Java instructions to optimize performance, if the JVM can do so without changing the semantics of the reordered instructions. What would happen if the JVM switched the order of the reads and writes inside put() and take()? What if put() was really executed like this:

while(hasNewObject) {
    //wait - do not overwrite existing new object
}
hasNewObject = true; //volatile write
object = newObject;

Notice the write to the volatile variable hasNewObject is now executed before the new object is actually set. To the JVM this may look completely valid. The values of the two write instructions do not depend on each other.

However, reordering the instruction execution would harm the visibility of the object variable. First of all, Thread B might see hasNewObject set to true before Thread A has actually written a new value to the object variable. Second, there is now not even a guarantee about when the new value written to object will be flushed back to main memory (well, the next time Thread A writes to a volatile variable somewhere).

To prevent situations like the one described above from occurring, the volatile keyword comes with a “happens-before guarantee”. The happens-before guarantee ensures that read and write instructions of volatile variables cannot be reordered. Instructions before and after can be reordered among themselves, but the volatile read/write instruction cannot be reordered with any instruction occurring before or after it.

Look at this example:

sharedObject.nonVolatile1 = 123;
sharedObject.nonVolatile2 = 456;
sharedObject.nonVolatile3 = 789;

sharedObject.volatile     = true; //a volatile variable

int someValue1 = sharedObject.nonVolatile4;
int someValue2 = sharedObject.nonVolatile5;
int someValue3 = sharedObject.nonVolatile6;

The JVM may reorder the first 3 instructions, as long as all of them happen before the volatile write instruction (they must all be executed before the volatile write instruction).

Similarly, the JVM may reorder the last 3 instructions as long as the volatile write instruction happens before all of them. None of the last 3 instructions can be reordered to before the volatile write instruction.

That is basically the meaning of the Java volatile happens before guarantee.

volatile is Not Always Enough

Even if the volatile keyword guarantees that all reads of a volatile variable are read directly from main memory, and all writes to a volatile variable are written directly to main memory, there are still situations where it is not enough to declare a variable volatile.

In the situation explained earlier where only Thread 1 writes to the shared counter variable, declaring the counter variable volatile is enough to make sure that Thread 2 always sees the latest written value.

In fact, multiple threads could even be writing to a shared volatile variable, and still have the correct value stored in main memory, if the new value written to the variable does not depend on its previous value. In other words, if a thread writing a value to the shared volatile variable does not first need to read its value to figure out its next value.

As soon as a thread needs to first read the value of a volatile variable, and based on that value generate a new value for the shared volatile variable, a volatile variable is no longer enough to guarantee correct visibility. The short time gap between reading the volatile variable and writing its new value creates a race condition where multiple threads might read the same value of the volatile variable, generate a new value for the variable, and overwrite each other’s values when writing the value back to main memory.

The situation where multiple threads are incrementing the same counter is exactly such a situation where a volatile variable is not enough. The following sections explain this case in more detail.

Imagine that Thread 1 reads a shared counter variable with the value 0 into its CPU cache, increments it to 1 and does not write the changed value back into main memory. Thread 2 could then read the same counter variable from main memory, where the value of the variable is still 0, into its own CPU cache. Thread 2 could then also increment the counter to 1, and also not write it back to main memory. This situation is illustrated in the diagram below:

[Figure: java-volatile-3]

Thread 1 and Thread 2 are now practically out of sync. The real value of the shared counter variable should have been 2, but each of the threads has the value 1 for the variable in their CPU caches, and in main memory the value is still 0. It is a mess! Even if the threads eventually write their value for the shared counter variable back to main memory, the value will be wrong.

When is volatile Enough?

As I have mentioned earlier, if two threads are both reading and writing to a shared variable, then using the volatile keyword for that is not enough. You need to use a synchronized block in that case to guarantee that the reading and writing of the variable is atomic. Reading or writing a volatile variable does not block other threads from reading or writing it. For this to happen you must use the synchronized keyword around critical sections.

As an alternative to a synchronized block you could also use one of the many atomic data types found in the java.util.concurrent.atomic package, for instance AtomicLong or AtomicReference.
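Here is a small sketch that contrasts the three approaches for a counter incremented by several threads. All class and method names (CounterDemo, runConcurrently and so on) are hypothetical. The volatile-only variant is included purely to show the race condition: its result may come out lower than expected, and how often that happens depends on the machine and the run.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical comparison of a volatile, a synchronized and an atomic counter.
class CounterDemo {

    static class VolatileCounter {
        volatile long count = 0;    // visible, but count++ is still read-modify-write
    }

    static class SynchronizedCounter {
        private long count = 0;
        public synchronized void increment() { count++; }
        public synchronized long get() { return count; }
    }

    // Runs `threads` threads that each execute `task` `perThread` times, then waits for them.
    static void runConcurrently(int threads, int perThread, Runnable task) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    task.run();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    static long volatileTotal(int threads, int perThread) {
        VolatileCounter counter = new VolatileCounter();
        runConcurrently(threads, perThread, () -> counter.count++);  // not atomic
        return counter.count;
    }

    static long synchronizedTotal(int threads, int perThread) {
        SynchronizedCounter counter = new SynchronizedCounter();
        runConcurrently(threads, perThread, counter::increment);
        return counter.get();
    }

    static long atomicTotal(int threads, int perThread) {
        AtomicLong counter = new AtomicLong();
        runConcurrently(threads, perThread, counter::incrementAndGet);
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("volatile:     " + volatileTotal(4, 100_000) + " (may be less than 400000)");
        System.out.println("synchronized: " + synchronizedTotal(4, 100_000));
        System.out.println("atomic:       " + atomicTotal(4, 100_000));
    }
}
```

The synchronized and atomic variants always end up at threads × perThread, because each increment is performed as one indivisible step.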

In case only one thread reads and writes the value of a volatile variable and other threads only read the variable, then the reading threads are guaranteed to see the latest value written to the volatile variable. Without making the variable volatile, this would not be guaranteed.

The volatile keyword is guaranteed to work on 32-bit and 64-bit variables.

Performance Considerations of volatile

Reading and writing of volatile variables causes the variable to be read from or written to main memory. Reading from and writing to main memory is more expensive than accessing the CPU cache. Accessing volatile variables also prevents instruction reordering, which is a normal performance enhancement technique. Thus, you should only use volatile variables when you really need to enforce visibility of variables.

Comparable and Comparator in Java Example

Comparable and Comparator in Java are very useful for sorting collections of objects. Java provides some inbuilt methods to sort arrays of primitive types and arrays or lists of wrapper classes. Here we will first learn how we can sort an array/list of primitive types and wrapper classes, and then we will use the java.lang.Comparable and java.util.Comparator interfaces to sort an array/list of custom classes.

Let’s see how we can sort primitive types or Object array and list with a simple program.

package com.journaldev.sort;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class JavaObjectSorting {

    /**
     * This class shows how to sort primitive arrays, 
     * Wrapper classes Object Arrays
     * @param args
     */
    public static void main(String[] args) {
        //sort primitives array like int array
        int[] intArr = {5,9,1,10};
        Arrays.sort(intArr);
        System.out.println(Arrays.toString(intArr));
        
        //sorting String array
        String[] strArr = {"A", "C", "B", "Z", "E"};
        Arrays.sort(strArr);
        System.out.println(Arrays.toString(strArr));
        
        //sorting list of objects of Wrapper classes
        List<String> strList = new ArrayList<String>();
        strList.add("A");
        strList.add("C");
        strList.add("B");
        strList.add("Z");
        strList.add("E");
        Collections.sort(strList);
        for(String str: strList) System.out.print(" "+str);
    }
}

Output of the above program is:

[1, 5, 9, 10]
[A, B, C, E, Z]
 A B C E Z

Now let’s try to sort an array of objects.

package com.journaldev.sort;

public class Employee {

    private int id;
    private String name;
    private int age;
    private long salary;

    public int getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }

    public long getSalary() {
        return salary;
    }

    public Employee(int id, String name, int age, int salary) {
        this.id = id;
        this.name = name;
        this.age = age;
        this.salary = salary;
    }

    @Override
    //this is overridden to print the user friendly information about the Employee
    public String toString() {
        return "[id=" + this.id + ", name=" + this.name + ", age=" + this.age + ", salary=" +
                this.salary + "]";
    }

}

Here is the code I used to sort the array of Employee objects.

//sorting object array
Employee[] empArr = new Employee[4];
empArr[0] = new Employee(10, "Mikey", 25, 10000);
empArr[1] = new Employee(20, "Arun", 29, 20000);
empArr[2] = new Employee(5, "Lisa", 35, 5000);
empArr[3] = new Employee(1, "Pankaj", 32, 50000);

//sorting employees array using Comparable interface implementation
Arrays.sort(empArr);
System.out.println("Default Sorting of Employees list:\n"+Arrays.toString(empArr));

When I tried to run this, it threw the following runtime exception.

Exception in thread "main" java.lang.ClassCastException: com.journaldev.sort.Employee cannot be cast to java.lang.Comparable
	at java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:290)
	at java.util.ComparableTimSort.sort(ComparableTimSort.java:157)
	at java.util.ComparableTimSort.sort(ComparableTimSort.java:146)
	at java.util.Arrays.sort(Arrays.java:472)
	at com.journaldev.sort.JavaSorting.main(JavaSorting.java:41)

Comparable and Comparator

Java provides the Comparable interface, which should be implemented by any custom class if we want to use the Arrays or Collections sorting methods. The Comparable interface has a compareTo(T obj) method which is used by the sorting methods; you can check any Wrapper, String or Date class to confirm this. We should override this method in such a way that it returns a negative integer, zero, or a positive integer if “this” object is less than, equal to, or greater than the object passed as argument.

After implementing Comparable interface in Employee class, here is the resulting Employee class.

package com.journaldev.sort;

import java.util.Comparator;

public class Employee implements Comparable<Employee> {

    private int id;
    private String name;
    private int age;
    private long salary;

    public int getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }

    public long getSalary() {
        return salary;
    }

    public Employee(int id, String name, int age, int salary) {
        this.id = id;
        this.name = name;
        this.age = age;
        this.salary = salary;
    }

    @Override
    public int compareTo(Employee emp) {
        //let's sort the employee based on id in ascending order
        //returns a negative integer, zero, or a positive integer as this employee id
        //is less than, equal to, or greater than the specified object.
        return Integer.compare(this.id, emp.id);
    }

    @Override
    //this is required to print the user friendly information about the Employee
    public String toString() {
        return "[id=" + this.id + ", name=" + this.name + ", age=" + this.age + ", salary=" +
                this.salary + "]";
    }

}

Now when we execute the above snippet for Arrays sorting of Employees and print it, here is the output.

Default Sorting of Employees list:
[[id=1, name=Pankaj, age=32, salary=50000], [id=5, name=Lisa, age=35, salary=5000], [id=10, name=Mikey, age=25, salary=10000], [id=20, name=Arun, age=29, salary=20000]]

As you can see, the Employees array is sorted by id in ascending order.

But in most real-life scenarios, we want sorting based on different parameters. For example, as a CEO, I would like to sort the employees based on salary, whereas HR would like to sort them based on age. This is the situation where we need to use the Java Comparator interface, because the Comparable.compareTo(Object o) method implementation provides only one sort order and we can’t choose the field on which we want to sort the objects.

Java Comparator

The Comparator interface has a compare(Object o1, Object o2) method that takes two Object arguments. It should be implemented in such a way that it returns a negative int if the first argument is less than the second one, zero if they are equal, and a positive int if the first argument is greater than the second one.
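One detail of this contract is easy to get wrong: computing the result by subtracting the two values can overflow for int values that are far apart, which flips the sign. The following sketch, with the hypothetical class name CompareContractDemo, contrasts subtraction with Integer.compare.

```java
import java.util.Comparator;

// Hypothetical demo of why subtraction is a fragile way to honour the compare() contract.
class CompareContractDemo {

    // Compact, but a - b can overflow int and then has the wrong sign.
    static final Comparator<Integer> BY_SUBTRACTION = (a, b) -> a - b;

    // Integer.compare never overflows.
    static final Comparator<Integer> BY_COMPARE = (a, b) -> Integer.compare(a, b);

    public static void main(String[] args) {
        int small = Integer.MIN_VALUE;
        int large = 1;

        // small is clearly less than large, so a negative result is expected.
        System.out.println(BY_SUBTRACTION.compare(small, large)); // prints 2147483647 (wrong sign)
        System.out.println(BY_COMPARE.compare(small, large));     // prints a negative number
    }
}
```

For small non-negative values such as ages or ids, subtraction happens to work, but Integer.compare and Long.compare are the safer habit.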

The Comparable and Comparator interfaces use Generics for compile-time type checking.

Here is how we can create different Comparator implementation in the Employee class.

    /**
     * Comparator to sort employees list or array in order of Salary
     */
    public static Comparator<Employee> SalaryComparator = new Comparator<Employee>() {

        @Override
        public int compare(Employee e1, Employee e2) {
            return Long.compare(e1.getSalary(), e2.getSalary());
        }
    };

    /**
     * Comparator to sort employees list or array in order of Age
     */
    public static Comparator<Employee> AgeComparator = new Comparator<Employee>() {

        @Override
        public int compare(Employee e1, Employee e2) {
            return e1.getAge() - e2.getAge();
        }
    };

    /**
     * Comparator to sort employees list or array in order of Name
     */
    public static Comparator<Employee> NameComparator = new Comparator<Employee>() {

        @Override
        public int compare(Employee e1, Employee e2) {
            return e1.getName().compareTo(e2.getName());
        }
    };

All the above implementations of Comparator interface are anonymous classes.

We can pass these comparators as an argument to the sort functions of the Arrays and Collections classes.

//sort employees array using Comparator by Salary
Arrays.sort(empArr, Employee.SalaryComparator);
System.out.println("Employees list sorted by Salary:\n"+Arrays.toString(empArr));

//sort employees array using Comparator by Age
Arrays.sort(empArr, Employee.AgeComparator);
System.out.println("Employees list sorted by Age:\n"+Arrays.toString(empArr));

//sort employees array using Comparator by Name
Arrays.sort(empArr, Employee.NameComparator);
System.out.println("Employees list sorted by Name:\n"+Arrays.toString(empArr));

Here is the output of the above code snippet:

Employees list sorted by Salary:
[[id=5, name=Lisa, age=35, salary=5000], [id=10, name=Mikey, age=25, salary=10000], [id=20, name=Arun, age=29, salary=20000], [id=1, name=Pankaj, age=32, salary=50000]]
Employees list sorted by Age:
[[id=10, name=Mikey, age=25, salary=10000], [id=20, name=Arun, age=29, salary=20000], [id=1, name=Pankaj, age=32, salary=50000], [id=5, name=Lisa, age=35, salary=5000]]
Employees list sorted by Name:
[[id=20, name=Arun, age=29, salary=20000], [id=5, name=Lisa, age=35, salary=5000], [id=10, name=Mikey, age=25, salary=10000], [id=1, name=Pankaj, age=32, salary=50000]]

So now we know that if we want to sort a Java object array or list, we need to implement the Comparable interface to provide default sorting, and we should implement the Comparator interface to provide different ways of sorting.

We can also create a separate class that implements the Comparator interface and then use it.

Here are the final classes we have for explaining Comparable and Comparator in Java.
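Assuming Java 8 or later is available, the static factory methods on Comparator offer a shorter way to build such comparators than anonymous or separate classes. The sketch below uses a hypothetical, trimmed-down Emp class as a stand-in for Employee; ComparatorFactoryDemo and sortedNames() are illustrative names too.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch; assumes Java 8+ for Comparator.comparingInt / comparingLong / thenComparing.
class ComparatorFactoryDemo {

    // Trimmed-down stand-in for the Employee class.
    static class Emp {
        private final int id;
        private final String name;
        private final long salary;

        Emp(int id, String name, long salary) {
            this.id = id;
            this.name = name;
            this.salary = salary;
        }

        int getId() { return id; }
        String getName() { return name; }
        long getSalary() { return salary; }

        @Override
        public String toString() { return name; }
    }

    // Sort by salary, ascending
    static final Comparator<Emp> BY_SALARY = Comparator.comparingLong(Emp::getSalary);

    // Sort by id first, then by name when the ids are equal
    static final Comparator<Emp> BY_ID_THEN_NAME =
            Comparator.comparingInt(Emp::getId).thenComparing(Emp::getName);

    // Sorts a fixed sample array with the given comparator and returns the names in order.
    static String sortedNames(Comparator<Emp> order) {
        Emp[] emps = {
            new Emp(10, "Mikey", 10000),
            new Emp(1, "Pankaj", 50000),
            new Emp(5, "Lisa", 5000),
            new Emp(1, "Mikey", 10000)
        };
        Arrays.sort(emps, order);
        return Arrays.toString(emps);
    }

    public static void main(String[] args) {
        System.out.println(sortedNames(BY_SALARY));        // prints [Lisa, Mikey, Mikey, Pankaj]
        System.out.println(sortedNames(BY_ID_THEN_NAME));  // prints [Mikey, Pankaj, Lisa, Mikey]
    }
}
```

These factory methods use Integer.compare and Long.compare style comparisons internally, so they also avoid the overflow risk of subtraction.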

package com.journaldev.sort;

import java.util.Comparator;

public class Employee implements Comparable<Employee> {

    private int id;
    private String name;
    private int age;
    private long salary;

    public int getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }

    public long getSalary() {
        return salary;
    }

    public Employee(int id, String name, int age, int salary) {
        this.id = id;
        this.name = name;
        this.age = age;
        this.salary = salary;
    }

    @Override
    public int compareTo(Employee emp) {
        //let's sort the employee based on id in ascending order
        //returns a negative integer, zero, or a positive integer as this employee id
        //is less than, equal to, or greater than the specified object.
        return Integer.compare(this.id, emp.id);
    }

    @Override
    //this is required to print the user friendly information about the Employee
    public String toString() {
        return "[id=" + this.id + ", name=" + this.name + ", age=" + this.age + ", salary=" +
                this.salary + "]";
    }

    /**
     * Comparator to sort employees list or array in order of Salary
     */
    public static Comparator<Employee> SalaryComparator = new Comparator<Employee>() {

        @Override
        public int compare(Employee e1, Employee e2) {
            return Long.compare(e1.getSalary(), e2.getSalary());
        }
    };

    /**
     * Comparator to sort employees list or array in order of Age
     */
    public static Comparator<Employee> AgeComparator = new Comparator<Employee>() {

        @Override
        public int compare(Employee e1, Employee e2) {
            return e1.getAge() - e2.getAge();
        }
    };

    /**
     * Comparator to sort employees list or array in order of Name
     */
    public static Comparator<Employee> NameComparator = new Comparator<Employee>() {

        @Override
        public int compare(Employee e1, Employee e2) {
            return e1.getName().compareTo(e2.getName());
        }
    };
}

Here is the separate class implementation of the Comparator interface that compares two Employee objects first on their id and, if the ids are the same, on their name.

package com.journaldev.sort;

import java.util.Comparator;

public class EmployeeComparatorByIdAndName implements Comparator<Employee> {

    @Override
    public int compare(Employee o1, Employee o2) {
        int flag = Integer.compare(o1.getId(), o2.getId());
        if(flag==0) flag = o1.getName().compareTo(o2.getName());
        return flag;
    }

}

Here is the test class where we are using different ways to sort Objects in java using Comparable and Comparator.

package com.journaldev.sort;

import java.util.Arrays;

public class JavaObjectSorting {

    /**
     * This class shows how to sort custom objects array/list
     * implementing Comparable and Comparator interfaces
     * @param args
     */
    public static void main(String[] args) {

        //sorting custom object array
        Employee[] empArr = new Employee[4];
        empArr[0] = new Employee(10, "Mikey", 25, 10000);
        empArr[1] = new Employee(20, "Arun", 29, 20000);
        empArr[2] = new Employee(5, "Lisa", 35, 5000);
        empArr[3] = new Employee(1, "Pankaj", 32, 50000);
        
        //sorting employees array using Comparable interface implementation
        Arrays.sort(empArr);
        System.out.println("Default Sorting of Employees list:\n"+Arrays.toString(empArr));
        
        //sort employees array using Comparator by Salary
        Arrays.sort(empArr, Employee.SalaryComparator);
        System.out.println("Employees list sorted by Salary:\n"+Arrays.toString(empArr));
        
        //sort employees array using Comparator by Age
        Arrays.sort(empArr, Employee.AgeComparator);
        System.out.println("Employees list sorted by Age:\n"+Arrays.toString(empArr));
        
        //sort employees array using Comparator by Name
        Arrays.sort(empArr, Employee.NameComparator);
        System.out.println("Employees list sorted by Name:\n"+Arrays.toString(empArr));
        
        //Employees list sorted by ID and then name using Comparator class
        empArr[0] = new Employee(1, "Mikey", 25, 10000);
        Arrays.sort(empArr, new EmployeeComparatorByIdAndName());
        System.out.println("Employees list sorted by ID and Name:\n"+Arrays.toString(empArr));
    }

}

Here is the output of the above program:

Default Sorting of Employees list:
[[id=1, name=Pankaj, age=32, salary=50000], [id=5, name=Lisa, age=35, salary=5000], [id=10, name=Mikey, age=25, salary=10000], [id=20, name=Arun, age=29, salary=20000]]
Employees list sorted by Salary:
[[id=5, name=Lisa, age=35, salary=5000], [id=10, name=Mikey, age=25, salary=10000], [id=20, name=Arun, age=29, salary=20000], [id=1, name=Pankaj, age=32, salary=50000]]
Employees list sorted by Age:
[[id=10, name=Mikey, age=25, salary=10000], [id=20, name=Arun, age=29, salary=20000], [id=1, name=Pankaj, age=32, salary=50000], [id=5, name=Lisa, age=35, salary=5000]]
Employees list sorted by Name:
[[id=20, name=Arun, age=29, salary=20000], [id=5, name=Lisa, age=35, salary=5000], [id=10, name=Mikey, age=25, salary=10000], [id=1, name=Pankaj, age=32, salary=50000]]
Employees list sorted by ID and Name:
[[id=1, name=Mikey, age=25, salary=10000], [id=1, name=Pankaj, age=32, salary=50000], [id=5, name=Lisa, age=35, salary=5000], [id=10, name=Mikey, age=25, salary=10000]]

java.lang.Comparable and java.util.Comparator are powerful interfaces that can be used to sort objects in Java.

Comparable vs Comparator

  1. The Comparable interface provides a single way of sorting, whereas the Comparator interface is used to provide different ways of sorting.
  2. To use Comparable, the class itself needs to implement it, whereas to use Comparator we don’t need to make any change in the class.
  3. The Comparable interface is in the java.lang package, whereas the Comparator interface is in the java.util package.
  4. We don’t need to make any code changes on the client side to use Comparable; the Arrays.sort() and Collections.sort() methods automatically use the compareTo() method of the class. For Comparator, the client needs to pass the Comparator instance to the sort method, which then calls its compare() method.

What does RandomAccess mean?

RandomAccess is a marker interface, like the Serializable and Cloneable interfaces. None of these marker interfaces defines methods; instead, they identify a class as having a particular capability.

In the case of Serializable, the interface specifies that if the class is serialized using the serialization I/O classes, then a NotSerializableException will not be thrown (unless the object contains some other class that cannot be serialized). Cloneable similarly indicates that the use of the Object.clone() method for a Cloneable class will not throw a CloneNotSupportedException.

The RandomAccess interface identifies that a particular java.util.List implementation has fast random access. A more accurate name for the interface would have been FastRandomAccess. This interface tries to define an imprecise concept: how fast is fast? The documentation provides a simple guide: if repeated access using the List.get() method is faster than repeated access using the Iterator.next() method, then the List has fast random access. The two types of access are shown in the following code examples:

Object o;
for (int i=0, n=list.size(); i < n; i++)
  o = list.get(i);

Object o;
for (Iterator itr=list.iterator(); itr.hasNext(); )
  o = itr.next();

There is a third loop that combines the previous two loops to avoid the repeated Iterator.hasNext() test on each loop iteration.

Object o;
Iterator itr=list.iterator();
for (int i=0, n=list.size(); i < n; i++)
  o = itr.next();

This last loop relies on the normal situation, where List objects cannot change in size while they are being iterated without an exception of some sort occurring. So, since the loop size remains the same, you can simply count the accessed elements without testing at each iteration whether the end of the list has been reached. This last loop is generally faster than the second loop, which tests Iterator.hasNext() on each iteration. In the context of the RandomAccess interface, the first loop using List.get() should be faster than both of the loops that use Iterator.next() for a list to implement RandomAccess.

How is RandomAccess used?

So now that we know what RandomAccess means, how do we use it? With the other two marker interfaces, Serializable and Cloneable, there are two aspects to using them:

  • Defining classes which implement them, and
  • Using their capabilities via ObjectInput/ObjectOutput and Object.clone().

RandomAccess is a little different. Of course, we still need to decide whether any particular class implements it, but the possible classes are severely restricted: RandomAccess should only be implemented in java.util.List classes. And most such classes are created outside of projects; e.g., the SDK provides the most frequently used implementations, and subclasses of the SDK classes do not need to implement RandomAccess, as they will automatically inherit the capability where appropriate.

The second aspect, using the RandomAccess capability, is also different. Whether a class is Serializable or Cloneable is automatically detected when you use ObjectInput/ObjectOutput and Object.clone(). But RandomAccess has no such automatic support. You need to explicitly check whether a class implements RandomAccess using the instanceof operator:

if (listObject instanceof RandomAccess)

Then you must explicitly choose the appropriate access method, List.get() or Iterator.next(). Clearly, if we test for RandomAccess on every loop iteration, we would be making a lot of redundant calls, and probably losing the benefit of RandomAccess as well. So the pattern to follow in using RandomAccess makes the test outside the loop. The canonical pattern looks like:

Object o;
if (list instanceof RandomAccess)
{
  for (int i=0, n=list.size(); i < n; i++)
  {
    o = list.get(i);
    //do something with object o
  }

}
else
{
  Iterator itr = list.iterator();
  for (int i=0, n=list.size(); i < n; i++)
  {
    o = itr.next();
    //do something with object o

  }
}
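As a rough, self-contained illustration of the pattern above, the sketch below sums a list of integers and picks the loop style once, outside the loop. The class name RandomAccessSum and the summing task are hypothetical, and the code uses generics, which arrived after the 1.4 release discussed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

// Hypothetical helper class; the name and the summing task are illustrative only.
final class RandomAccessSum {

    // Sums the elements, choosing the loop style once, outside the loop.
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            // Indexed access is cheap for RandomAccess lists such as ArrayList.
            for (int i = 0, n = list.size(); i < n; i++) {
                total += list.get(i);
            }
        } else {
            // Sequential access suits lists such as LinkedList.
            Iterator<Integer> itr = list.iterator();
            for (int i = 0, n = list.size(); i < n; i++) {
                total += itr.next();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(sum(new ArrayList<>(values)));  // takes the indexed branch
        System.out.println(sum(new LinkedList<>(values))); // takes the iterator branch
    }
}
```

Both calls return the same total; only the access method differs.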

The speedup from using RandomAccess

I tested the four code loops shown in this article, using the 1.4 beta release, separately testing the -client and -server options. To test the effect of the RandomAccess interface, I used the java.util.ArrayList and java.util.LinkedList classes. ArrayList implements RandomAccess, while LinkedList does not. ArrayList has an underlying implementation consisting of an array with constant access time for any element, so using the ArrayList iterator is equivalent to using the ArrayList.get() method, but with some additional overhead. LinkedList has an underlying implementation consisting of linked node objects, so it has access time proportional to the shortest distance of the element from either end of the list; iterating sequentially through the list can shortcut the access time by traversing one node after another.

Times shown are the average of three runs, and all times have been normalized to the first table cell; i.e., the time taken by the ArrayList to iterate the list using the List.get() method, using java -client.

Table 1: Access times for loop types and access methods

Loop type (loop test) and access method | ArrayList, java -client | LinkedList, java -client | ArrayList, java -server | LinkedList, java -server
loop counter (i<n) and list.get() | 100% | too long | 77.5% | too long
iterator (Iterator.hasNext()) and Iterator.next() | 141% | 219% | 109% | 213%
iterator (i<n) and iterator.next() | 121% | 205% | 98% | 193%
RandomAccess test with loop from row 1 or 3 | 100% | 205% | 77.5% | 193%

Note that HotSpot is capable of optimizing away accesses that are unnecessary, so the test accessed and operated on the list elements in a way that could not eliminate the list element access. The test code is available here.

The most important results are in the last two rows of the table. The last row shows the times obtained by making full use of the RandomAccess interface; the row before that shows the best general technique for iterating lists, if RandomAccess were not available. The size of the lists I used for the test (and consequently, the number of loop iterations required to access every element) was sufficiently large that the instanceof test had no measurable cost in comparison to the time taken to run the loop. Consequently, we can see that there was no cost (but also no benefit) in adding the instanceof RandomAccess test when iterating the LinkedList, whereas the ArrayList was iterated more than 20% quicker when the instanceof test was included.

Forward and backward compatibility

What should you do if you are implementing code now? Obviously, you can start developing with a 1.4 (beta) release, but this is not an option everywhere. There are three aspects to using RandomAccess if you are developing code now:

  1. You may want to include code referencing RandomAccess without moving to 1.4; many development environments cannot be upgraded rapidly or to a beta release.
  2. Many projects need their code to be able to run in any JVM, so the code needs to be backwards-compatible to run in JVMs using releases earlier than 1.4, where RandomAccess does not exist.
  3. You will want to make your code forward-compatible so that it will automatically take advantage of RandomAccess when running in a 1.4+ JVM.

Making RandomAccess available to your development environment is the first issue, and this can be as simple as adding the RandomAccess interface to your classpath. Any version of the SDK can create the RandomAccess interface. The definition for RandomAccess is

package java.util;
public interface RandomAccess {}

This interface can be created using javac, as follows:

  1. Create a directory called temp
  2. In temp, create a directory called java
  3. In java, create a directory called util
  4. In util, create a file called RandomAccess.java, containing the definition just given
  5. Compile RandomAccess.java using javac: javac RandomAccess.java

Now including temp in your classpath should enable classes that refer to RandomAccess to be compiled.

Some Java integrated development environments (IDEs) can make it difficult to add a class to the core SDK packages. If this is the case for your IDE, your only hope is that it accepts an external classpath for compilation purposes, in which case the custom-generated RandomAccess class will need to be held in that external classpath.

We also need to handle RandomAccess in the runtime environment. For pre-1.4 environments, the test if (listObject instanceof RandomAccess) will generate a NoClassDefFoundError at runtime, when the JVM tries to load the RandomAccess class. For the instanceof test to be evaluated, the class has to be loaded; however, we can guard the test so that it is only executed if RandomAccess is available. The simplest way to do this is to check if RandomAccess exists, setting a boolean guard as the outcome of that test:

static boolean RandomAccessExists;
..

  //execute this as early as possible after the
  //application starts
  try
  {
    Class c = Class.forName("java.util.RandomAccess");
    RandomAccessExists = true;
  }
  catch (ClassNotFoundException e)
  {
    RandomAccessExists = false;
  }

Then, finally, we will need to change our instanceof tests to use the RandomAccessExists variable as a guard:

if (RandomAccessExists && (listObject instanceof RandomAccess) )

Now we have the solution for all three aspects mentioned at the beginning of this section:

  1. The RandomAccess interface can be created and compiled easily with any SDK, and this manually-compiled version can be used at compilation time as a stand-in, to compile any code which refers to RandomAccess.
  2. The guarded instanceof test will automatically revert to the Iterator loop if RandomAccess does not exist, and should avoid throwing a NoClassDefFoundError in pre-1.4 JVMs.
  3. The guarded instanceof test will also automatically use the faster loop branch when RandomAccess does exist and the list object implements it.
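The guard described in this section could be gathered into one small utility, sketched below. The class name RandomAccessGuard and the method useIndexedLoop are hypothetical, and raw collection types are used deliberately so that the code stays in the pre-generics style of the releases discussed here. On any 1.4+ JVM the guard is simply true.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

// Hypothetical utility; the class and method names are illustrative only.
final class RandomAccessGuard {

    // Evaluated once, when this class is initialized.
    static final boolean RANDOM_ACCESS_EXISTS = detect();

    private static boolean detect() {
        try {
            Class.forName("java.util.RandomAccess");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // True only when the interface exists and the list implements it.
    // The && short-circuits, so instanceof is skipped when the guard is false.
    static boolean useIndexedLoop(List list) {
        return RANDOM_ACCESS_EXISTS && (list instanceof RandomAccess);
    }

    public static void main(String[] args) {
        System.out.println(useIndexedLoop(new ArrayList()));
        System.out.println(useIndexedLoop(new LinkedList()));
    }
}
```

Callers would then branch on useIndexedLoop(listObject) once, before entering either loop.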

Overview of Method References

Method references are a feature of Java 8. They are effectively a subset of lambda expressions: wherever a lambda expression can be used, it might be possible to use a method reference instead, but not always. A method reference can only call a single method, which limits the places it can be used unless your code is written to cater for it.

It would be a good idea if you knew the notation for a method reference. In fact, you have probably already seen it assuming you read the title. If not then just look below.

Person::getName 

The example above is equivalent to the lambda expression person -> person.getName(), where person is an instance of Person. Let me tell you a bit more about when you can use method references and show some examples, as it makes a lot more sense with them.

Types of Method References

Type | Syntax | Method reference | Lambda expression
Reference to a static method | Class::staticMethod | String::valueOf | s -> String.valueOf(s)
Reference to an instance method of a particular object | instance::instanceMethod | s::toString | () -> s.toString()
Reference to an instance method of an arbitrary object of a particular type | Class::instanceMethod | String::toString | s -> s.toString()
Reference to a constructor | Class::new | String::new | () -> new String()
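To make the four kinds concrete, here is a small sketch that assigns one method reference of each kind to a standard functional interface from java.util.function. The class name MethodReferenceKinds and the variable names are made up for illustration, and String::toUpperCase is swapped in for the arbitrary-object case so the result is visible.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical demo class; the variable names are illustrative only.
final class MethodReferenceKinds {

    static String describe() {
        String s = "string";

        // Static method: same as n -> String.valueOf(n)
        Function<Integer, String> valueOf = String::valueOf;

        // Instance method of a particular object: same as () -> s.toString()
        Supplier<String> bound = s::toString;

        // Instance method of an arbitrary object: same as t -> t.toUpperCase()
        Function<String, String> unbound = String::toUpperCase;

        // Constructor: same as () -> new String()
        Supplier<String> constructor = String::new;

        return valueOf.apply(42) + "," + bound.get() + ","
                + unbound.apply("abc") + "," + constructor.get().length();
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints 42,string,ABC,0
    }
}
```

Which functional interface fits depends on where the argument comes from: the bound form needs no argument because the receiver is already fixed, while the unbound form takes the receiver as its first argument.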

Reference to a Static Method

import java.util.Arrays;
import java.util.List;

public class StaticMethodReference{
    public static void main(String args[]) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        // Method reference
        list.forEach(StaticMethodReference::print);
        // Lambda expression
        list.forEach(number -> StaticMethodReference.print(number));
        // normal
        for(int number : list) {
            StaticMethodReference.print(number);
        }
    }
    public static void print(final int number) {
        System.out.println("I am printing: " + number);
    }
}

Here, it calls the static method StaticMethodReference.print. This example is pretty simple. There is a static method, and for each element in the list, it calls this method using the element as the input.

Reference to an Instance Method of a Particular Object

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ParticularInstanceMethodReference {
    public static void main(String args[]) {
        final List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        final MyComparator myComparator = new MyComparator();
        // Method reference
        Collections.sort(list, myComparator::compare);
        // Lambda expression
        Collections.sort(list, (a,b) -> myComparator.compare(a,b));
    }

    private static class MyComparator {
        public int compare(final Integer a, final Integer b) {
            return a.compareTo(b);
        }
    }
}

Here, it calls the instance method myComparator.compare, where myComparator is a particular instance of MyComparator.

Reference to an Instance Method of an Arbitrary Object of a Particular Type

import java.util.Arrays;
import java.util.List;

public class ArbitraryInstanceMethodReference {
    public static void main(String args[]) {
        final List<Person> people = Arrays.asList(new Person("dan"), new Person("laura"));
        // Method reference
        people.forEach(Person::printName);
        // Lambda expression
        people.forEach(person -> person.printName());
        // normal
        for (final Person person : people) {
            person.printName();
        }
    }
    private static class Person {
        private String name;
        public Person(final String name) {
            this.name = name;
        }
        public void printName() {
            System.out.println(name);
        }
    }
}

This calls the method Person.printName for each Person object in the list. Person is the particular type, and the arbitrary object is the instance of Person that is used during each loop. This looks very similar to a reference to a static method, but the difference is how the object is passed to the method reference. Remember, a static method reference passes the current object into the method, whereas an arbitrary-object method reference invokes the method on the current object.

Reference to a Constructor

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.Supplier;

public class ConstructorMethodReference {
    public static void main(String args[]) {
        final List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        // Method Reference
        copyElements(list, ArrayList<Integer>::new);
        // Lambda expression
        copyElements(list, () -> new ArrayList<Integer>());
    }
    private static void copyElements(final List<Integer> list, final Supplier<Collection<Integer>> targetCollection) {
        // Method reference to a particular instance
        list.forEach(targetCollection.get()::add);
    }
}

This is the example I had the most trouble trying to make, as no matter how hard I thought, I couldn’t think of a way this could be used in something complicated. I am sure my opinion would change if I used Java 8 while at work, but for now, I do not see why this type of method reference is particularly useful. The example uses the Supplier functional interface to pass ArrayList<Integer>::new into the copyElements method.
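For what it is worth, two places where constructor references commonly turn up are factories expressed as a Function and collectors that need a Supplier for the target collection. The sketch below shows both; the class name ConstructorReferenceUses and its members are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class; the names are illustrative only.
final class ConstructorReferenceUses {

    // A Function maps its argument to a one-argument constructor:
    // same as text -> new StringBuilder(text)
    static final Function<String, StringBuilder> BUILDER = StringBuilder::new;

    // Collectors.toCollection takes a Supplier, so TreeSet::new
    // stands for () -> new TreeSet<>()
    static TreeSet<Integer> sortedCopy(List<Integer> source) {
        return source.stream().collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(BUILDER.apply("ab").append("c")); // prints abc
        System.out.println(sortedCopy(Arrays.asList(3, 1, 2, 1))); // prints [1, 2, 3]
    }
}
```

In both cases the constructor reference lets the caller decide which concrete class gets created without the called code naming it.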

Conclusion

In conclusion, method references can be used to make your code even more concise, but they have some restrictions on when they can be used and what they can do. If you simplify your code by using a lambda expression, then you might be able to make it even shorter by using a method reference. Eventually, your code will be so short your bosses will wonder what you have even been doing as you have only written a few lines of code!

Design of the Shutdown Hooks API

The following Q&A addresses some of the design issues of the Shutdown Hooks API.

Why don’t you provide information as to why the VM is shutting down?

On some platforms a native process can’t distinguish a shutdown due to exit from a shutdown due to termination. Other platforms provide much richer capabilities, in some cases including notification of system suspension and restart or of imminent power failure. In short, it’s impossible to generalize such information in a portable way.

Will shutdown hooks be run if the VM crashes?

If the VM crashes due to an error in native code then no guarantee can be made about whether or not the hooks will be run.

Why are shutdown hooks run concurrently? Wouldn’t it make more sense to run them in reverse order of registration?

Invoking shutdown hooks in their reverse order of registration is certainly intuitive, and is in fact how the C runtime library’s atexit procedure works. This technique really only makes sense, however, in a single-threaded system. In a multi-threaded system such as the Java platform the order in which hooks are registered is in general undetermined and therefore implies nothing about which hooks ought to be run before which other hooks. Invoking hooks in any particular sequential order also increases the possibility of deadlocks. Note that if a particular subsystem needs to invoke shutdown actions in a particular order then it is free to synchronize them internally.
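One way a subsystem might order its own shutdown actions is to register a single hook that runs them sequentially. The sketch below is only an illustration of that idea, not part of the API: the class OrderedShutdown and its methods are hypothetical, and it uses Java 8 syntax for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical subsystem helper; names are illustrative only.
final class OrderedShutdown {

    private final Deque<Runnable> actions = new ArrayDeque<>();

    // Later registrations run first, mirroring atexit-style ordering.
    synchronized void register(Runnable action) {
        actions.push(action);
    }

    // Runs every registered action sequentially on the calling thread.
    synchronized void runAll() {
        while (!actions.isEmpty()) {
            actions.pop().run();
        }
    }

    // Registers one hook thread that performs all actions in order.
    void install() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::runAll));
    }
}
```

The VM still sees just one hook, so the ordering is entirely the subsystem's own business.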

Why are hooks just threads, and unstarted ones at that? Wouldn’t it be simpler to use Runnable objects, or Beans-style event and listener patterns?

The approach taken here has two advantages over the more obvious, and more frequently suggested, callback-oriented designs based upon Runnable objects or Beans-style event listeners.

First, it gives the user complete control over the thread upon which a shutdown action is executed. The thread can be created in the proper thread group, given the correct priority, context, and privileges, and so forth.

Second, it simplifies both the specification and the implementation by isolating the VM from the hooks themselves. If shutdown actions were executed as callbacks then a robust implementation would wind up having to create a separate thread for each hook anyway in order for them to run concurrently. The specification would also have to include explicit language about how the threads that execute the callbacks are created.

Aren’t threads pretty expensive things to keep around, especially if they won’t be started until the VM shuts down?

Most implementations of the Java platform don’t actually allocate resources to a thread until it’s started, so maintaining a set of unstarted threads is actually very cheap. If you look at the internals of java.lang.Thread you can see that its various constructors just do security checks and initialize private fields. The native start() method does the real work of allocating a thread stack, etc., to get things going.

What about Personal and Embedded Java? Won’t starting threads during shutdown be too expensive on those platforms?

This API may not be suitable for the smaller Java platforms. Threads in the Java 2 Platform carry more information than threads in JDK 1.1 and p/eJava. A thread has a class loader, it may have some inherited thread-local variables, and, in the case of GUI apps, it may be associated with a specific application context. Threads will come to carry even more information as the platform evolves; for example, the security team is planning to introduce a notion of per-thread user identity in their upcoming authentication framework.

Because of all this contextual information, shutdown hooks would be harder to write and maintain if they were just Runnable objects or Beans-style event listeners. Suppose that a Runnable shutdown hook, or an equivalent event listener, needed a specific bit of thread-contextual information in order to carry out its operations. Such information could be saved in some shared location before the hook is registered. While this is merely awkward, suppose further that threads acquire some new type of contextual information in a future release. If an operation invoked by the hook also evolves to need that information then the code that registers the hook would have to be amended to save that information as well. Making hooks be threads instead of Runnables or event listeners insulates them from this sort of future change.

Okay, but won’t I have to write a lot of code just to register a simple shutdown hook?

No. Simple shutdown hooks can often be written as anonymous inner classes, as in this example:

Runtime.getRuntime().addShutdownHook(new Thread() {
    public void run() { database.close(); }
});

This idiom is fine as long as you’ll never need to cancel the hook, in which case you’d need to save a reference to the hook when you create it.
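If cancellation might be needed, a sketch along the following lines keeps the reference so that removeShutdownHook can be called later. The class name CancellableHook and the printed message are placeholders.

```java
// Hypothetical example class; the hook body is a placeholder.
final class CancellableHook {

    static boolean registerThenCancel() {
        // Keep a reference to the hook so it can be de-registered later.
        Thread hook = new Thread() {
            public void run() { System.out.println("cleaning up"); }
        };
        Runtime runtime = Runtime.getRuntime();
        runtime.addShutdownHook(hook);

        // Returns true if the hook was registered and has now been removed.
        return runtime.removeShutdownHook(hook);
    }

    public static void main(String[] args) {
        System.out.println(registerThenCancel());
    }
}
```

Because the hook is removed before the VM exits, its run() method is never invoked in this sketch.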

What about security? Can an untrusted applet register a shutdown hook?

If there is a security manager installed then the addShutdownHook and removeShutdownHook methods check that the caller’s security context permits RuntimePermission("shutdownHooks"). An untrusted applet will not have this permission, and will therefore not be able to register or de-register a shutdown hook.

What happens if a shutdown hook throws an exception and the exception is not caught?

Uncaught exceptions are handled in shutdown hooks just as in any other thread, by invoking the uncaughtException method of the thread’s ThreadGroup object. The default implementation of this method prints the exception’s stack trace to System.err and terminates the thread. Note that uncaught exceptions do not cause the VM to exit; this happens only when all non-daemon threads have finished or when the Runtime.exit method is invoked.

Why did you add the Runtime.halt method? Isn’t it pretty dangerous?

The halt method is certainly powerful, and it should be used with the utmost caution. It’s provided so that applications can insulate themselves from shutdown hooks that deadlock or run for inordinate amounts of time. It also allows applications to force a quick exit in situations where that is necessary.

What happens if finalization-on-exit is enabled? Will finalizers be run before, during, or after shutdown hooks?

Finalization-on-exit processing is done after all shutdown hooks have finished. Otherwise a hook may fail if some live objects are finalized prematurely.

How To Stop A Thread In Java?

How do you stop a thread in Java? Nowadays this is a popular question in Java interviews, because the stop() method has been deprecated for safety reasons. As stop() is deprecated, the interviewer will be interested in what logic you would use to stop a thread. There are two ways to stop a thread in Java: one uses a boolean variable and the other uses the interrupt() method. In this post, we will discuss both of these methods.

How To Stop A Thread In Java Using A boolean Variable?

In this method, we declare a boolean variable called flag in the thread and initially set it to true. The task to be performed is kept in a while loop inside the run() method, with flag as the loop condition, so the thread continues to run until flag becomes false. We have defined a stopRunning() method that sets flag to false, which stops the thread. Whenever you want to stop the thread, just call this method. Also notice that we have declared flag as volatile. This makes the thread read its value from main memory, ensuring that it always sees the updated value.

class MyThread extends Thread
{
    //Initially setting the flag as true
    
    private volatile boolean flag = true;
    
    //This method will set flag as false
    
    public void stopRunning()
    {
        flag = false;
    }
    
    @Override
    public void run()
    {
        //Keep the task in while loop
        
        //This will make thread continue to run until flag becomes false
        
        while (flag)
        {
            System.out.println("I am running....");
        }
        
        System.out.println("Stopped Running....");
    }
}
public class MainClass
{  
    public static void main(String[] args)
    {
        MyThread thread = new MyThread();
        
        thread.start();
        
        try
        {
            Thread.sleep(100);
        }
        catch (InterruptedException e)
        {
            e.printStackTrace();
        }
        
        //call stopRunning() method whenever you want to stop a thread
        
        thread.stopRunning();
    }   
}

Output :

I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
Stopped Running….

How To Stop A Thread In Java Using interrupt() Method?

In this method, we use the interrupt() method to stop a thread. Whenever you call interrupt() on a thread, it sets the interrupted status of that thread. The running thread can check this status with the static Thread.interrupted() method, which also clears it. This status is used in a while loop to stop the thread.

class MyThread extends Thread
{   
    @Override
    public void run()
    {
        while (!Thread.interrupted())
        {
            System.out.println("I am running....");
        }
        
        System.out.println("Stopped Running.....");
    }
}
public class MainClass
{  
    public static void main(String[] args)
    {
        MyThread thread = new MyThread();
        
        thread.start();
        
        try
        {
            Thread.sleep(100);
        }
        catch (InterruptedException e)
        {
            e.printStackTrace();
        }
        
        //interrupting the thread
        
        thread.interrupt();
    }   
}

Output :

I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
I am running….
Stopped Running…..
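One detail worth knowing when checking the interrupted status: the static Thread.interrupted() method clears the status as it reads it, whereas the instance method isInterrupted() leaves it untouched. The small sketch below shows the difference on the current thread; the class name InterruptStatusDemo is made up for this illustration.

```java
// Hypothetical demo class showing how the two status checks differ.
final class InterruptStatusDemo {

    static boolean[] observe() {
        Thread current = Thread.currentThread();

        // Set the interrupted status on the current thread.
        current.interrupt();

        // isInterrupted() reads the status without clearing it.
        boolean viaIsInterrupted = current.isInterrupted();

        // Thread.interrupted() reads the status and clears it.
        boolean firstCheck = Thread.interrupted();

        // The status was cleared by the previous call.
        boolean secondCheck = Thread.interrupted();

        return new boolean[] { viaIsInterrupted, firstCheck, secondCheck };
    }

    public static void main(String[] args) {
        boolean[] result = observe();
        System.out.println(result[0] + " " + result[1] + " " + result[2]); // prints true true false
    }
}
```

This is why the while loop in the example above exits exactly once per interrupt: the check that ends the loop also resets the status.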

9 Anti-Patterns Every Programmer Should Be Aware Of

A healthy dose of self-criticism is fundamental to professional and personal growth. When it comes to programming, this sense of self-criticism requires the ability to detect unproductive or counter-productive patterns in designs, code, processes, and behaviour. This is why a knowledge of anti-patterns is very useful for any programmer. This article is a discussion of anti-patterns that I have found to be recurring, ordered roughly based on how often I have come across them, and how long it took to undo the damage they caused.

Some of the anti-patterns discussed have elements in common with cognitive biases, or are directly caused by them. Links to relevant cognitive biases are provided as we go along in the article. Wikipedia also has a nice list of cognitive biases for your reference.

And before starting, let’s remember that dogmatic thinking stunts growth and innovation so consider the list as a set of guidelines and not written-in-stone rules. And if I missed anything that you consider to be important, feel free to comment below!

1   Premature Optimization

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
- Donald Knuth

Although never is often better than *right* now.
- Tim Peters, The Zen of Python

What is it?

Optimizing before you have enough information to make educated conclusions about where and how to do the optimization.

Why it’s bad

It is very difficult to know exactly what will be the bottleneck in practice. Attempting to optimize prior to having empirical data is likely to end up increasing code complexity and room for bugs with negligible improvements.

How to avoid it

Prioritize writing clean and readable code that works first, using known and tested algorithms and tools. Use profiling tools when needed to find bottlenecks and optimize the priorities. Rely on measurements and not guesses and speculation.
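As a very rough illustration of measuring before deciding, the sketch below times a straightforward duplicate check against a set-based alternative. Everything in it (the class MeasureFirst, the duplicate-check task, the input size) is a made-up example, and wall-clock timing like this is crude; a real profiler or benchmark harness gives far more trustworthy numbers.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical example; the task and names are illustrative only.
final class MeasureFirst {

    // Straightforward version: compares every pair of elements.
    static boolean hasDuplicateSimple(int[] values) {
        for (int i = 0; i < values.length; i++) {
            for (int j = i + 1; j < values.length; j++) {
                if (values[i] == values[j]) {
                    return true;
                }
            }
        }
        return false;
    }

    // Candidate optimization: a single pass that remembers seen values.
    static boolean hasDuplicateWithSet(int[] values) {
        Set<Integer> seen = new HashSet<>();
        for (int value : values) {
            if (!seen.add(value)) {
                return true;
            }
        }
        return false;
    }

    // Crude wall-clock timing of one run, in nanoseconds.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i; // all distinct, so both versions scan everything
        }
        long simple = timeNanos(() -> hasDuplicateSimple(data));
        long withSet = timeNanos(() -> hasDuplicateWithSet(data));
        System.out.println("simple: " + simple + " ns, set-based: " + withSet + " ns");
    }
}
```

Whether the difference matters depends on how often the code runs and on realistic input sizes, which is exactly what the measurement is meant to reveal.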

Examples and signs

Caching before profiling to find the bottlenecks. Using complicated and unproven “heuristics” instead of a known mathematically correct algorithm. Choosing a new and untested experimental web framework that can theoretically reduce request latency under heavy loads while you are in early stages and your servers are idle most of the time.

The tricky part

The tricky part is knowing when the optimization is premature. It’s important to plan in advance for growth. Choosing designs and platforms that will allow for easy optimization and growth is key here. It’s also possible to use “premature optimization” as an excuse to justify writing bad code. Example: writing an O(n^2) algorithm to solve a problem when a mathematically correct O(n) algorithm exists, simply because the O(n) algorithm is harder to understand.

tl;dr

Profile before optimizing. Avoid trading simplicity for efficiency until it is needed, backed by empirical evidence.

2   Bikeshedding

Every once in a while we’d interrupt that to discuss the typography and the color of the cover. And after each discussion, we were asked to vote. I thought it would be most efficient to vote for the same color we had decided on in the meeting before, but it turned out I was always in the minority! We finally chose red. (It came out blue.)
- Richard Feynman, What Do You Care What Other People Think?

What is it?

Tendency to spend excessive amounts of time debating and deciding on trivial and often subjective issues.

Why it’s bad

It’s a waste of time. Poul-Henning Kamp goes into depth in an excellent email here.

How to avoid it

Encourage team members to be aware of this tendency, and to prioritize reaching a decision (vote, flip a coin, etc. if you have to) when you notice it. Consider A/B testing later to revisit the decision, when it is meaningful to do so (e.g. deciding between two different UI designs), instead of further internal debating.

Richard Feynman was not a fan of bikeshedding.

Examples and signs

Spending hours or days debating over what background color to use in your app, or whether to put a button on the left or the right of the UI, or to use tabs instead of spaces for indentation in your code base.

The tricky part

Bikeshedding is easier to notice and prevent in my opinion than premature optimization. Just try to be aware of the amount of time spent on making a decision and contrast that with how trivial the issue is, and intervene if necessary.

tl;dr

Avoid spending too much time on trivial decisions.

3   Analysis Paralysis

Want of foresight, unwillingness to act when action would be simple and effective, lack of clear thinking, confusion of counsel […] these are the features which constitute the endless repetition of history.
- Winston Churchill, Parliamentary Debates

Now is better than never.
- Tim Peters, The Zen of Python

What is it?

Over-analyzing to the point that it prevents action and progress.

Why it’s bad

Over-analyzing can slow down or stop progress entirely. In the extreme cases, the results of the analysis can become obsolete by the time they are done, or worse, the project might never leave the analysis phase. It is also easy to assume that more information will help decisions when the decision is a difficult one to make ― see information bias and validity bias.

How to avoid it

Again, awareness helps. Emphasize iterations and improvements. Each iteration will provide more feedback with more data points that can be used for more meaningful analysis. Without the new data points, more analysis will become more and more speculative.

Examples and signs

Spending months or even years deciding on a project’s requirements, a new UI, or a database design.

The tricky part

It can be tricky to know when to move from planning, requirement gathering and design, to implementation and testing.

tl;dr

Prefer iterating to over-analyzing and speculation.

4   God Class

Simple is better than complex.
- Tim Peters, The Zen of Python

What is it?

Classes that control many other classes and have many dependencies and lots of responsibilities.

Why it’s bad

God classes tend to grow to the point of becoming maintenance nightmares ― because they violate the single-responsibility principle, they are hard to unit-test, debug, and document.

How to avoid it

Avoid having classes turn into God classes by breaking up the responsibilities into smaller classes with a single clearly-defined, unit-tested, and documented responsibility. Also see “Fear of Adding Classes” below.

Examples and signs

Look for class names containing “manager”, “controller”, “driver”, “system”, or “engine”. Be suspicious of classes that import or depend on many other classes, control too many other classes, or have many methods performing unrelated tasks.

God classes know about too many classes and/or control too many.

The tricky part

As projects age and requirements and the number of engineers grow, small and well-intentioned classes turn into God classes slowly. Refactoring such classes can become a significant task.

tl;dr

Avoid large classes with too many responsibilities and dependencies.

5   Fear of Adding Classes

Sparse is better than dense.
- Tim Peters, The Zen of Python

What is it?

Belief that more classes necessarily make designs more complicated, leading to a fear of adding more classes or breaking large classes into several smaller classes.

Why it’s bad

Adding classes can help reduce complexity significantly. Picture a big tangled ball of yarn. When untangled, you will have several separate strands instead. Similarly, several simple, easy-to-maintain and easy-to-document classes are much preferable to a single large and complex class with many responsibilities (see the God Class anti-pattern above).

A tangled ball of yarn. Large classes have a tendency to turn into the software equivalent of this. (Photo by absolut_feli on Flickr)

How to avoid it

Be aware of when additional classes can simplify the design and decouple unnecessarily coupled parts of your code.

Examples and signs

As an easy example consider the following:

class Shape:
    def __init__(self, shape_type, *args):
        self.shape_type = shape_type
        self.args = args

    def draw(self):
        if self.shape_type == "circle":
            center = self.args[0]
            radius = self.args[1]
            # Draw a circle...
        elif self.shape_type == "rectangle":
            pos = self.args[0]
            width = self.args[1]
            height = self.args[2]
            # Draw rectangle...

Now compare it with the following:

class Shape:
    def draw(self):
        raise NotImplementedError("Subclasses of Shape should implement method 'draw'.")

class Circle(Shape):
    def __init__(self, center, radius):
        self.center = center
        self.radius = radius

    def draw(self):
        # Draw a circle...
        pass

class Rectangle(Shape):
    def __init__(self, pos, width, height):
        self.pos = pos
        self.width = width
        self.height = height

    def draw(self):
        # Draw a rectangle...
        pass

Of course, this is an obvious example, but it illustrates the point: larger classes with conditional or complicated logic in them can, and often should, be broken down into simpler classes. The resulting code will have more classes but will be simpler.

The tricky part

Adding classes is not a magic bullet. Simplifying the design by breaking up large classes requires thoughtful analysis of the responsibilities and requirements.

tl;dr

More classes are not necessarily a sign of bad design.

6   Inner-platform Effect

Those who do not understand Unix are condemned to reinvent it, poorly.
- Henry Spencer

Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
- Greenspun’s tenth rule

What is it?

The tendency for complex software systems to re-implement features of the platform they run in or the programming language they are implemented in, usually poorly.

Why it’s bad

Platform-level tasks such as job scheduling and disk cache buffers are not easy to get right. Poorly designed solutions are prone to introduce bottlenecks and bugs, especially as the system scales up. And recreating alternative language constructs to achieve what is already possible in the language leads to hard-to-read code and a steeper learning curve for anyone new to the code base. It can also limit the usefulness of refactoring and code analysis tools.

How to avoid it

Learn to use the features provided by your OS or platform instead. Avoid the temptation to create language constructs that rival existing constructs (especially if it’s because you are not used to a new language and miss your old language’s features).

Examples and signs

Using your MySQL database as a job queue. Reimplementing your own disk buffer cache mechanism instead of relying on your OS’s. Writing a task scheduler for your web-server in PHP. Defining macros in C to allow for Python-like language constructs.
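The same effect shows up on a smaller scale at the language and standard-library level. As an illustrative sketch only (the function names are hypothetical), here is a hand-rolled cache in Python next to the standard library's functools.lru_cache, which already does the job:

```python
from functools import lru_cache

# Inner-platform version: a home-grown cache bolted onto the function.
_fib_cache = {}


def fib_handrolled(n):
    if n in _fib_cache:
        return _fib_cache[n]
    result = n if n < 2 else fib_handrolled(n - 1) + fib_handrolled(n - 2)
    _fib_cache[n] = result
    return result


# Platform version: let the standard library handle the caching.
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

Both compute the same values, but the second has no cache bookkeeping to maintain or get wrong.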

The tricky part

In very rare cases, it might be necessary to re-implement parts of the platform (JVM, Firefox, Chrome, etc.).

tl;dr

Avoid re-inventing what your OS or development platform already does well.

7   Magic Numbers and Strings

Explicit is better than implicit. (Tim Peters, The Zen of Python)

What is it?

Using unnamed numbers or string literals instead of named constants in code.

Why it’s bad

The main problem is that the semantics of the number or string literal is partially or completely hidden without a descriptive name or another form of annotation. This makes understanding the code harder, and if it becomes necessary to change the constant, search and replace or other refactoring tools can introduce subtle bugs. Consider the following piece of code:

def create_main_window():
    window = Window(600, 600)
    # etc...

What are the two numbers there? Assume the first is the window width and the second is the window height. If it ever becomes necessary to change the width to 800 instead, a search and replace would be dangerous since it would change the height in this case too, and perhaps other occurrences of the number 600 in the code base.

String literals might seem less prone to these issues but having unnamed string literals in code makes internationalization harder, and can introduce similar issues to do with instances of the same literal having different semantics. For example, homonyms in English can cause a similar issue with search and replace; consider two occurrences of “point”, one in which it refers to a noun (as in “she has a point”) and the other as a verb (as in “to point out the differences…”). Replacing such string literals with a string retrieval mechanism that allows you to clearly indicate the semantics can help distinguish these two cases, and will also come in handy when you send the strings for translation.
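A string retrieval mechanism of this kind can be sketched in a few lines. The catalog, the key names, and get_string below are all hypothetical; the point is only that the key carries the meaning, so two identical English strings stay distinguishable:

```python
# Hypothetical string catalog: the key encodes the meaning, not the English text.
STRINGS = {
    "point.noun": "point",  # as in "she has a point"
    "point.verb": "point",  # as in "to point out the differences"
}


def get_string(key):
    """Look up a user-facing string by its semantic key."""
    return STRINGS[key]


# Rewording the verb does not touch the noun, unlike a blind search and replace.
STRINGS["point.verb"] = "indicate"
```

In a real project a translation library would usually play this role; Python's gettext module, for instance, has a pgettext function (Python 3.8 and later) that takes a context argument for this purpose.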

How to avoid it

Use named constants, resource retrieval methods, or annotations.
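Applied to the window example above, a minimal sketch could look like this. The Window class is a hypothetical stand-in for whatever a real GUI toolkit provides, and the constant names are illustrative:

```python
from dataclasses import dataclass


@dataclass
class Window:
    # Hypothetical stand-in for a GUI toolkit's window class.
    width: int
    height: int


# Named constants: each number now says what it means.
MAIN_WINDOW_WIDTH = 600
MAIN_WINDOW_HEIGHT = 600


def create_main_window():
    window = Window(MAIN_WINDOW_WIDTH, MAIN_WINDOW_HEIGHT)
    # etc...
    return window
```

Changing the width to 800 is now a one-line edit to MAIN_WINDOW_WIDTH that cannot accidentally touch the height or any unrelated 600 elsewhere in the code base.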

Examples and signs

A simple example is shown above. This particular anti-pattern is very easy to detect (except for a few tricky cases mentioned below).

The tricky part

There is a narrow grey area where it can be hard to tell if certain numbers are magic numbers or not. For example, the number 0 in languages with zero-based indexing. Other examples are the use of 100 to calculate percentages, 2 to check for parity, etc.

tl;dr

Avoid having unexplained and unnamed numbers and string literals in code.

8   Management by Numbers

Measuring programming progress by lines of code is like measuring aircraft building progress by weight. (Bill Gates)

What is it?

Strict reliance on numbers for decision making.

Why it’s bad

Numbers are great. The main strategy to avoid the first two anti-patterns mentioned in this article (premature optimization and bikeshedding) was to profile or do A/B testing to get some measurements that can help you optimize or decide based on numbers instead of speculating. However, blind reliance on numbers can be dangerous. For example, numbers tend to outlive the models in which they were meaningful, or the models become outdated and no longer accurately represent reality. This can lead to poor decisions, especially if they are fully automated ― see automation bias.

Pryzbylewski from the show "The Wire" teaching a classroom.

Do you find yourself commiserating with Pryzbylewski from the HBO show The Wire, Season 4?

Another issue with reliance on numbers for determining (and not merely informing) decisions is that the measurement processes can be manipulated over time to achieve the desired numbers instead (see observer-expectancy effect). Grade inflation is an example of this. The HBO show The Wire (which, by the way, if you haven’t seen, you must!) does an excellent job of portraying this issue of reliance on numbers, by showing how the police department and later the education system have replaced meaningful goals with a game of numbers. Or, if you prefer charts, the following one, showing the distribution of scores on a test with a passing score of 30%, illustrates the point perfectly.

Score distribution of the 2013 high school exit exam in Poland with passing score of 30%. Source in Polish, and the Reddit post that I first saw this in.

How to avoid it

Use measurements and numbers wisely, not blindly.

Examples and signs

Using only lines of code, number of commits, etc. to judge the effectiveness of programmers. Measuring employee contribution by the number of hours they spend at their desks.

The tricky part

The larger the scale of operations, the higher the number of decisions that will need to be made, and this means automation and blind reliance on numbers for decisions begins to creep into the processes.

tl;dr

Use numbers to inform your decisions, not determine them.

9   Useless (Poltergeist) Classes

It seems that perfection is attained, not when there is nothing more to add, but when there is nothing more to take away. (Antoine de Saint-Exupéry)

What is it?

Useless classes with no real responsibility of their own, often used to just invoke methods in another class or add an unneeded layer of abstraction.

Why it’s bad

Poltergeist classes add complexity, extra code to maintain and test, and make the code less readable—the reader first needs to realize what the poltergeist does, which is often almost nothing, and then train herself to mentally replace uses of the poltergeist with the class that actually handles the responsibility.

How to avoid it

Don’t write useless classes, or refactor to get rid of them. Jack Diederich has a great talk titled Stop Writing Classes that is related to this anti-pattern.
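As a small illustration of such a refactoring in Python (the class and function names here are hypothetical, not taken from the talk), a class that only stores an argument and forwards one call can usually collapse into a plain function:

```python
import json


# Poltergeist: exists only to hold an argument and forward one call.
class ConfigLoaderManager:
    def __init__(self, text):
        self.text = text

    def load(self):
        return json.loads(self.text)


# Without the poltergeist: a plain function does the same job.
def load_config(text):
    return json.loads(text)
```

Callers go from ConfigLoaderManager(text).load() to load_config(text). If the function adds nothing over json.loads either, it can go too.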

Examples and signs

A couple of years ago, while working on my master’s degree, I was a teaching assistant for a first-year Java programming course. For one of the labs, I was given the lab material which was to be on the topic of stacks and using linked lists to implement them. I was also given the reference “solution”. This is the solution Java file I was given, almost verbatim (I removed the comments to save some space):

import java.util.EmptyStackException;
import java.util.LinkedList;

public class LabStack<T> {
    private LinkedList<T> list;

    public LabStack() {
        list = new LinkedList<T>();
    }

    public boolean empty() {
        return list.isEmpty();
    }

    public T peek() throws EmptyStackException {
        if (list.isEmpty()) {
            throw new EmptyStackException();
        }
        return list.peek();
    }

    public T pop() throws EmptyStackException {
        if (list.isEmpty()) {
            throw new EmptyStackException();
        }
        return list.pop();
    }

    public void push(T element) {
        list.push(element);
    }

    public int size() {
        return list.size();
    }

    public void makeEmpty() {
        list.clear();
    }

    public String toString() {
        return list.toString();
    }
}

You can only imagine my confusion looking at the reference solution, trying to figure out what the point of the LabStack class was, and what the students were supposed to learn from the utterly pointless exercise of writing it. In case it’s not painfully obvious what’s wrong with the class, it’s that it does almost nothing! It simply passes calls through to the LinkedList object it instantiates. The class changes the names of a couple of methods (e.g. makeEmpty instead of the commonly used clear), which will only lead to user confusion. The error checking logic in pop is unnecessary since LinkedList’s pop already does the same (but throws a different exception, NoSuchElementException, yet another possible source of confusion); only peek differs in behavior, since LinkedList’s peek returns null on an empty list instead of throwing. To this day, I can’t imagine what was going through the authors’ minds when they came up with this lab material. Anytime you see classes that do anything similar to the above, reconsider whether they are really needed or not.

Update (May 23rd, 2015): There were interesting discussions over whether the LabStack class example above is a good example or not on Hacker News as well as below in the comments. To clarify, I picked this class as a simple example for two reasons: firstly, in the context of teaching students about stacks, it is (almost) completely useless; and secondly, it adds unnecessary and duplicated code with the error-handling code that is already handled by LinkedList. I would agree that in other contexts, such classes can be useful but even in those cases, duplicating the error checking and throwing a semi-deprecated exception instead of the standard one and renaming methods to less-commonly-used names would be bad practice.

The tricky part

The advice here at first glance looks to be in direct contradiction of the advice in “Fear of Adding Classes”. It’s important to know when classes perform a valuable role and simplify the design, instead of uselessly increasing complexity with no added benefit.

tl;dr

Avoid classes with no real responsibility.