A brief overview of the Fork/Join Framework in Java

Introduction

The Fork/Join framework is a framework to solve a problem using a concurrent divide-and-conquer approach. It was introduced to complement the existing concurrency API. Before its introduction, the existing ExecutorService implementations were the popular choice for running asynchronous tasks, but they work best when the tasks are homogeneous and independent. Running dependent tasks and combining their results with those implementations was not easy. With the introduction of the Fork/Join framework, an attempt was made to address this shortcoming. In this post, we will take a brief look at the API and solve a couple of simple problems to understand how it works.

Solving a non-blocking task

Let’s jump directly into code. Let’s create a task which would return the sum of all elements of a List. The following steps represent our algorithm in pseudo-code:

1. Find the middle index of the list
2. Divide the list in the middle
3. Recursively create a new task which will compute the sum of the left part
4. Recursively create a new task which will compute the sum of the right part
5. Add the result of the left sum, the middle element, and the right sum

Here is the code –

import java.util.List;
import java.util.concurrent.RecursiveTask;

import lombok.extern.slf4j.Slf4j;

@Slf4j
public class ListSummer extends RecursiveTask<Integer> {
  private final List<Integer> listToSum;

  ListSummer(List<Integer> listToSum) {
    this.listToSum = listToSum;
  }

  @Override
  protected Integer compute() {
    if (listToSum.isEmpty()) {
      log.info("Found empty list, sum is 0");
      return 0;
    }

    int middleIndex = listToSum.size() / 2;
    log.info("List {}, middle index: {}", listToSum, middleIndex);

    List<Integer> leftSublist = listToSum.subList(0, middleIndex);
    List<Integer> rightSublist = listToSum.subList(middleIndex + 1, listToSum.size());

    ListSummer leftSummer = new ListSummer(leftSublist);
    ListSummer rightSummer = new ListSummer(rightSublist);

    leftSummer.fork();
    rightSummer.fork();

    Integer leftSum = leftSummer.join();
    Integer rightSum = rightSummer.join();
    int total = leftSum + listToSum.get(middleIndex) + rightSum;
    log.info("Left sum is {}, right sum is {}, total is {}", leftSum, rightSum, total);

    return total;
  }
}

Firstly, we extend the RecursiveTask subtype of ForkJoinTask. This is the type to extend when we expect our concurrent task to return a result. When a task does not return a result but only performs an effect, we extend the RecursiveAction subtype instead. For most of the practical tasks that we solve, these two subtypes are sufficient.
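For comparison, here is a minimal RecursiveAction sketch (a hypothetical ListPrinter, not part of the original example) that performs a side effect instead of returning a result:

import java.util.List;
import java.util.concurrent.RecursiveAction;

// Hypothetical example: prints all elements of a list using the same
// divide-and-conquer structure, but without returning a result.
public class ListPrinter extends RecursiveAction {
  private final List<Integer> list;

  ListPrinter(List<Integer> list) {
    this.list = list;
  }

  @Override
  protected void compute() {
    if (list.size() <= 1) {
      list.forEach(System.out::println); // small enough, just do the work
      return;
    }
    int middle = list.size() / 2;
    // invokeAll forks both subtasks and waits for their completion
    invokeAll(new ListPrinter(list.subList(0, middle)),
              new ListPrinter(list.subList(middle, list.size())));
  }
}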

Secondly, both RecursiveTask and RecursiveAction define an abstract compute method. This is where we put our computation.

Thirdly, inside our compute method, we check the size of the list that is passed through the constructor. If it is empty, we already know the result of the sum, which is zero, and we return immediately. Otherwise, we divide our list into two sublists and create two instances of our ListSummer type. We then call the fork() method (defined in ForkJoinTask) on these two instances –

leftSummer.fork();
rightSummer.fork();

This causes the tasks to be scheduled for asynchronous execution; the exact mechanism used for this purpose will be explained later in this post.

After that, we invoke the join() method (also defined in ForkJoinTask) to wait for the results of the two parts –

Integer leftSum = leftSummer.join();
Integer rightSum = rightSummer.join();

The results are then summed with the middle element of the list to get the final result.

Plenty of log messages have been added to make the example easier to understand. However, when we process a list containing thousands of entries, it might not be a good idea to have this detailed logging, especially logging the entire list.

That’s pretty much it. Let’s create a test class now for a test run –

import static org.assertj.core.api.Assertions.assertThat;

import java.util.List;
import java.util.concurrent.ForkJoinPool;

import org.junit.jupiter.api.Test;

public class ListSummerTest {

  @Test
  public void shouldSumEmptyList() {
    ListSummer summer = new ListSummer(List.of());
    ForkJoinPool forkJoinPool = new ForkJoinPool();
    forkJoinPool.submit(summer);

    int result = summer.join();

    assertThat(result).isZero();
  }

  @Test
  public void shouldSumListWithOneElement() {
    ListSummer summer = new ListSummer(List.of(5));
    ForkJoinPool forkJoinPool = new ForkJoinPool();
    forkJoinPool.submit(summer);

    int result = summer.join();

    assertThat(result).isEqualTo(5);
  }

  @Test
  public void shouldSumListWithMultipleElements() {
    ListSummer summer = new ListSummer(List.of(
        1, 2, 3, 4, 5, 6, 7, 8, 9
    ));
    ForkJoinPool forkJoinPool = new ForkJoinPool();
    forkJoinPool.submit(summer);

    int result = summer.join();

    assertThat(result).isEqualTo(45);
  }
}

In the test, we create an instance of ForkJoinPool. A ForkJoinPool is a unique ExecutorService implementation for running ForkJoinTasks. It employs a special algorithm known as the work-stealing algorithm. Contrary to the other ExecutorService implementations, where there is a single queue holding all the tasks to be executed, in a work-stealing implementation each worker thread gets its own work queue. Each thread starts executing tasks from its own queue.

When we detect that a ForkJoinTask can be broken down into multiple smaller subtasks, we do break it into smaller tasks, and then we invoke the fork() method on those tasks. This invocation causes the subtasks to be pushed into the executing thread’s queue. During the execution, when one thread exhausts its queue and has no tasks left to execute, it can “steal” tasks from another thread’s queue (hence the name “work-stealing”). This stealing behaviour is what results in better throughput than with the other ExecutorService implementations.

Earlier, when we invoked fork() on our leftSummer and rightSummer task instances, they were pushed into the work queue of the executing thread, after which they were “stolen” by other active threads in the pool (and so on), since those threads did not have anything else to do at that point.
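As an aside, a common fork/join idiom (a sketch, not what the code above does) is to fork only one subtask and compute the other directly in the current worker thread, saving one scheduling round trip. Inside compute(), that would look like this:

leftSummer.fork();                        // schedule the left half asynchronously
Integer rightSum = rightSummer.compute(); // process the right half on the current thread
Integer leftSum = leftSummer.join();      // then wait for the forked left half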

Pretty cool, right?

Solving a blocking task

The problem we just solved is non-blocking in nature. If we want to solve a problem which performs some blocking operation, then we will need to change our strategy to get good throughput.

Let’s examine this with another example. Let’s say we want to create a very simple web crawler. This crawler will receive a list of HTTP links, execute GET requests to fetch the response bodies, and then calculate the response length. Here is the code –

import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

import lombok.extern.slf4j.Slf4j;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.util.EntityUtils;

@Slf4j
public class ResponseLengthCalculator extends RecursiveTask<Map<String, Integer>> {
  private final List<String> links;

  ResponseLengthCalculator(List<String> links) {
    this.links = links;
  }

  @Override
  protected Map<String, Integer> compute() {
    if (links.isEmpty()) {
      log.info("No more links to fetch");
      return Collections.emptyMap();
    }

    int middle = links.size() / 2;
    log.info("List {}, middle index: {}", links, middle);
    ResponseLengthCalculator leftPartition = new ResponseLengthCalculator(links.subList(0, middle));
    ResponseLengthCalculator rightPartition = new ResponseLengthCalculator(links.subList(middle + 1, links.size()));

    log.info("Forking left partition");
    leftPartition.fork();
    log.info("Left partition forked, now forking right partition");
    rightPartition.fork();
    log.info("Right partition forked");

    String middleLink = links.get(middle);
    HttpRequester httpRequester = new HttpRequester(middleLink);
    String response;
    try {
      log.info("Calling managedBlock for {}", middleLink);
      ForkJoinPool.managedBlock(httpRequester);
      response = httpRequester.response;
    } catch (InterruptedException ex) {
      log.error("Error occurred while trying to implement blocking link fetcher", ex);
      response = "";
    }

    Map<String, Integer> responseMap = new HashMap<>(links.size());

    Map<String, Integer> leftLinks = leftPartition.join();
    responseMap.putAll(leftLinks);
    responseMap.put(middleLink, response.length());
    Map<String, Integer> rightLinks = rightPartition.join();
    responseMap.putAll(rightLinks);

    log.info("Left map {}, middle length {}, right map {}", leftLinks, response.length(), rightLinks);

    return responseMap;
  }

  private static class HttpRequester implements ForkJoinPool.ManagedBlocker {
    private final String link;
    private String response;

    private HttpRequester(String link) {
      this.link = link;
    }

    @Override
    public boolean block() {
      HttpGet getRequest = new HttpGet(link);
      try (CloseableHttpClient client = HttpClientBuilder
          .create()
          .disableRedirectHandling()
          .build()) {
        log.info("Executing blocking request for {}", link);
        CloseableHttpResponse httpResponse = client.execute(getRequest);
        log.info("HTTP request for link {} has been executed", link);
        this.response = EntityUtils.toString(httpResponse.getEntity());
      } catch (IOException e) {
        log.error("Error while trying to fetch response from link {}: {}", link, e.getMessage());
        this.response = "";
      }
      return true;
    }

    @Override
    public boolean isReleasable() {
      return false;
    }
  }
}

We create an implementation of ForkJoinPool.ManagedBlocker in which we put the blocking HTTP call. This interface defines two methods – block() and isReleasable(). The block() method is where we put our blocking call. After we are done with our blocking operation, we return true to indicate that no further blocking is necessary. We return false from the isReleasable() implementation to indicate to a fork-join worker thread that the block() method is potentially blocking in nature. A fork-join worker thread invokes the isReleasable() implementation first, before it invokes the block() method. Finally, we submit our HttpRequester instance to our pool by invoking the static ForkJoinPool.managedBlock() method. After that, our blocking task starts executing. When it blocks on the HTTP request, the ForkJoinPool.managedBlock() method also arranges for a spare thread to be activated if necessary to ensure sufficient parallelism.
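For clarity, the essential behaviour of ForkJoinPool.managedBlock(), as described in its javadoc, boils down to the following loop (a sketch), possibly after arranging for a spare thread to keep the pool busy:

// The essential contract of ForkJoinPool.managedBlock(blocker), per its javadoc:
while (!blocker.isReleasable()) {
  if (blocker.block()) {
    break; // block() returned true: no further blocking is necessary
  }
}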

Let’s take this implementation for a test drive then! Here’s the code –

import static org.assertj.core.api.Assertions.assertThat;

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;

import org.junit.jupiter.api.Test;

public class ResponseLengthCalculatorTest {

  @Test
  public void shouldReturnEmptyMapForEmptyList() {
    ResponseLengthCalculator responseLengthCalculator = new ResponseLengthCalculator(Collections.emptyList());
    ForkJoinPool pool = new ForkJoinPool();

    pool.submit(responseLengthCalculator);

    Map<String, Integer> result = responseLengthCalculator.join();
    assertThat(result).isEmpty();
  }

  @Test
  public void shouldHandle200Ok() {
    ResponseLengthCalculator responseLengthCalculator = new ResponseLengthCalculator(List.of(
        "http://httpstat.us/200"
    ));
    ForkJoinPool pool = new ForkJoinPool();

    pool.submit(responseLengthCalculator);

    Map<String, Integer> result = responseLengthCalculator.join();
    assertThat(result)
        .hasSize(1)
        .containsKeys("http://httpstat.us/200")
        .containsValue(0);
  }

  @Test
  public void shouldFetchResponseForDifferentResponseStatus() {
    ResponseLengthCalculator responseLengthCalculator = new ResponseLengthCalculator(List.of(
        "http://httpstat.us/200",
        "http://httpstat.us/302",
        "http://httpstat.us/404",
        "http://httpstat.us/502"
    ));
    ForkJoinPool pool = new ForkJoinPool();

    pool.submit(responseLengthCalculator);

    Map<String, Integer> result = responseLengthCalculator.join();
    assertThat(result)
        .hasSize(4);
  }
}

That’s it for today, folks! As always, any feedback/improvement suggestions/comments are highly appreciated!

All the examples discussed here can be found on Github (specific commit).

A big shout out to the awesome http://httpstat.us service, it was quite helpful for developing the simple tests.

REST vs WebSocket Comparison and Benchmarks

One of the common questions asked during my #JavaEE7 presentations around the world is: how do WebSockets compare with REST?

First of all, REST is a style of architecture, so what people really mean is RESTful HTTP. An architectural style cannot be compared with a technology, but the terms are used so loosely that they are commonly substituted for each other.

Let’s start with a one-line definition of WebSocket:

Bi-directional and full-duplex communication channel over a single TCP connection.

WebSocket solves a few issues with REST, or HTTP in general:

  • Bi-directional: HTTP is a uni-directional protocol where a request is always initiated by the client, the server processes it and returns a response, and then the client consumes it. WebSocket is a bi-directional protocol with no pre-defined message patterns such as request/response. Either the client or the server can send a message to the other party.
  • Full-duplex: HTTP allows the request message to go from client to server, and then the server sends a response message to the client. At any given time, either the client is talking to the server or the server is talking to the client. WebSocket allows the client and the server to talk independently of each other.
  • Single TCP Connection: Typically, a new TCP connection is initiated for an HTTP request and terminated after the response is received. A new TCP connection needs to be established for the next HTTP request/response. For WebSocket, the HTTP connection is upgraded using the standard HTTP Upgrade mechanism, and the client and server communicate over that same TCP connection for the lifecycle of the WebSocket connection.
  • Lean protocol: HTTP is a chatty protocol. Here is the set of HTTP headers sent in a request message by the Advanced REST Client Chrome extension:

    And the response headers received from WildFly 8:

    These are 663 characters exchanged for a trivial “Hello World” echo. The source code for this simple application is here. For WebSocket, after the initial HTTP handshake, the data is minimally framed with 2 bytes.
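For reference, an echo endpoint like the one just mentioned takes only a few lines with the JSR 356 API. This is a minimal sketch (the endpoint path is illustrative), deployable on any Java EE 7 container such as WildFly 8:

import javax.websocket.OnMessage;
import javax.websocket.server.ServerEndpoint;

@ServerEndpoint("/echo")
public class EchoEndpoint {

  @OnMessage
  public String echo(String message) {
    // The returned value is sent back to the client over the same TCP connection.
    return message;
  }
}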

Let’s take a look at a micro-benchmark that shows the overhead imposed by REST over a WebSocket echo endpoint. The payload is just a simple text array populated with ‘x’. The source code for the benchmark is available here.

The first graph shows the time (in milliseconds) taken to process N messages for a constant payload size.

[Graph: websocket-rest-messages]

Here is the raw data that feeds this graph:

[Table: websocket-rest-constant-payload]

This graph and the table shows that the REST overhead increases with the number of messages. This is true because that many TCP connections need to be initiated and terminated and that many HTTP headers need to be sent and received. The last column particularly shows the multiplication factor for the amount of time to fulfill a REST request.

The second graph shows the time taken to process a fixed number of messages by varying the payload size.

[Graph: websocket-rest-payload]

Here is the raw data that feeds this graph:

[Table: websocket-rest-constant-messages]

This graph shows that the incremental cost of processing the request/response for a REST endpoint is minimal and most of the time is spent in connection initiation/termination and honoring HTTP semantics.

These benchmarks were generated on WildFly 8 and the source code for the benchmark is available here.

Together, the graphs show that WebSocket is a more efficient protocol than RESTful HTTP. But does that mean it will replace RESTful HTTP?

The answer to that, at least in the short term is, NO!

  • WebSocket is a low-level protocol; think of it as a socket on the web. Everything, including a simple request/response design pattern, how to create/update/delete resources, status codes, etc., needs to be built on top of it. All of these are well defined for HTTP.
  • WebSocket is a stateful protocol, whereas HTTP is a stateless protocol. WebSocket connections are known to scale vertically on a single server, whereas HTTP can scale horizontally. There are some proprietary solutions for WebSocket horizontal scaling, but they are not standards-based.
  • HTTP comes with a lot of other goodies such as caching, routing, multiplexing, gzipping and lots more. All of these need to be defined on top of WebSocket.
  • How will Search Engine Optimization (SEO) work with WebSocket? It works very well for HTTP URLs.
  • Proxies, DNS servers, and firewalls are not yet fully aware of WebSocket traffic. They may allow port 80 but might restrict traffic by snooping on it first.
  • Security with WebSocket is an all-or-nothing approach.

This blog does not provide any conclusion because it’s meant to trigger thoughts!

And if you want a complete introduction to the JSR 356 WebSocket API in Java EE 7, then watch the recently concluded webinar at vJUG.

So, what do you think ?

Let’s talk about connection pools.

I claim that:

Default settings of most popular connection pools are poor!

For you, it means:

Go review your connection pool settings.

You might have a problem if you rely on the default settings. You may get memory leaks and an unresponsive application (even if the load is not high at all).

Below I will show some of the most important settings and my recommendations for how they should really be configured.

What is a connection pool?

A plain web application that needs to read or write data from a database does it like this:

  1. Open a connection to DB        // takes N ms
  2. read/write data
  3. close the connection

(By the way, in good old CGI applications, this was the only possible approach.)

This approach is perfectly fine in many cases, and you probably don’t need anything more. But it has some disadvantages for high-performance systems:

  • Step 1 can take some time. Probably tens or hundreds of milliseconds (it depends, of course).
  • It’s easy to forget Step 3 (close the connection), which causes a connection leak (resulting in memory leaks and other problems).

A new hero

That’s why another approach was born: an application may open a bunch of connections in advance and hold them open all the time. The bunch of open connections is called a connection pool. Any operation then looks like this:

  1. Take a DB connection from pool        // blazingly fast in most cases
  2. read/write data
  3. return the connection to the pool
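In Java, this pattern usually hides behind a DataSource. Here is a minimal sketch using HikariCP; the JDBC URL and credentials are hypothetical placeholders. Note that close() on a pooled connection returns it to the pool rather than really closing it:

import java.sql.Connection;
import java.sql.PreparedStatement;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PooledAccessExample {
  public static void main(String[] args) throws Exception {
    HikariConfig config = new HikariConfig();
    config.setJdbcUrl("jdbc:postgresql://localhost:5432/mydb"); // hypothetical database
    config.setUsername("user");                                 // hypothetical credentials
    config.setPassword("secret");
    HikariDataSource dataSource = new HikariDataSource(config);

    // Steps 1 and 3 are handled by the pool: getConnection() borrows a pooled
    // connection, and close() (via try-with-resources) returns it to the pool.
    try (Connection connection = dataSource.getConnection();
         PreparedStatement statement = connection.prepareStatement("SELECT 1")) {
      statement.execute(); // step 2: read/write data
    }

    dataSource.close(); // shut the pool down when the application stops
  }
}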

Seems cool. But new power always means new problems.

… and new problems

When using a connection pool, we need to solve (at least) the following questions:

  • How many connections should we keep open?
  • How long should they be kept?
  • What if they appear to be broken?
  • What if the application needs more connections than the pool currently has?
  • What if somebody forgets to return a connection to the pool?

To answer these questions, connection pools have a lot of settings, and their default values are mostly bad. Intrigued? Let me show you.

Basic settings

I will consider the 2 most popular connection pools in the Java world:

  • c3p0
  • HikariCP

The basic parameters, of course, are:

  • min size (minimum number of connections that should be open at any moment)
  • initial size (how many connections application opens at start)
  • max size (maximum number of connections in pool)

By the way, these are the only settings that have reasonable defaults. Here they are:

               c3p0   HikariCP
min size       3      10
initial size   3      10
max size       15     10

Let’s continue with more problematic settings.

Critical settings

checkoutTimeout

How long the application can wait to get a connection from the pool.

  • c3p0 setting: checkoutTimeout
  • HikariCP setting: connectionTimeout

Default values:

                  c3p0          HikariCP   I recommend
checkoutTimeout   0 (endless)   30 s       1 ms

Both default values are just a disaster.

As I mentioned, in most cases getting a connection from the pool is blazingly fast. Except when the pool has no more open connections: then the pool needs to acquire a new connection (which takes less than a second, as a rule). But if maxSize has been reached, the pool cannot open a new connection and just waits until somebody returns their connection to the pool. And if the application has a connection leak (a bug which prevents connections from being returned), the pool will never get that connection back!

What happens then?

In the case of c3p0, we end up with all threads frozen in the following state:

01 "qtp1905485420-495 13e09-3211" #495 prio=5 os_prio=0 tid=0x00007f20e078d800 nid=0x10d7 in Object.wait() [0x00007f204bc79000]
02
03    java.lang.Thread.State: WAITING (on object monitor)
04
05 at java.lang.Object.wait(Native Method)
06
07 at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable()
08
09 - locked <0x00000000c3295ef8> (a com.mchange.v2.resourcepool.BasicResourcePool)
10
11 at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource()
12
13     
14
15     at org.hibernate.jpa.internal.QueryImpl.getResultList()
16
17     at domain.funds.FundsRepository.get()
18
19     

It may seem that the HikariCP default of 30 seconds is a bit better. No, it doesn’t really help in high-performance applications. During those 30 seconds, a lot of new requests may come in, and all of them are just frozen. The application will apparently get an OutOfMemoryError soon. Any waiting just postpones the death of the application for a few seconds.

That’s why I recommend setting checkoutTimeout to the minimal possible value: 1 ms. Unfortunately, we cannot set it to 0 because 0 means endless waiting. The sooner we fail, the more chances we give the working threads to complete their job. And we can clearly inform the user that the application is currently overloaded, and they should try again later.
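Programmatically, this can be configured roughly as follows (a sketch; note that, to my knowledge, recent HikariCP versions reject connectionTimeout values below about 250 ms, so use the smallest value your version accepts):

import com.mchange.v2.c3p0.ComboPooledDataSource;
import com.zaxxer.hikari.HikariConfig;

// c3p0: fail fast instead of letting requests queue up forever
ComboPooledDataSource c3p0DataSource = new ComboPooledDataSource();
c3p0DataSource.setCheckoutTimeout(1); // milliseconds

// HikariCP: enforces a lower bound on this setting (~250 ms)
HikariConfig hikariConfig = new HikariConfig();
hikariConfig.setConnectionTimeout(250); // milliseconds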

testConnectionOnCheckout

Sometimes connections in the pool may die. The database can close them on its own initiative, or a system administrator can just break a network cable. That’s why the pool should monitor connection liveness.

The easiest setting for doing that is “testConnectionOnCheckout” in c3p0 (I haven’t found a similar setting in HikariCP; it seems to be always enabled).

Default values:

                           c3p0    HikariCP   I recommend
testConnectionOnCheckout   false   true?      true

Definitely, it should be enabled by default!

Otherwise, you will end up with lots of exceptions like these in the log:

org.hibernate.TransactionException: Unable to rollback against JDBC Connection
    at o.h.r.j.i.AbstractLogicalConnectionImplementor.rollback()
    at o.h.r.t.b.j.i.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.rollback(JdbcResourceLocalTransactionCoordinatorImpl.java:294)

P.S. If you want to achieve even better performance, you may consider testing connections in the background instead of on checkout (see the c3p0 sketch after this list):

  • testConnectionOnCheckout=false
  • testConnectionOnCheckin=true
  • idleConnectionTestPeriod=10
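Applied to c3p0 programmatically, the same three settings look like this (a sketch):

ComboPooledDataSource dataSource = new ComboPooledDataSource();
dataSource.setTestConnectionOnCheckout(false); // don't pay the test cost on every checkout
dataSource.setTestConnectionOnCheckin(true);   // validate connections when they are returned
dataSource.setIdleConnectionTestPeriod(10);    // seconds between background liveness tests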

preferred test query

But how exactly should the pool test connections?

The problem is that it depends on the database.

By default, both pools test connections by executing

  • “connection.isValid()” (in case of JDBC4), or
  • “connection.getMetaData().getTables()” (in case of JDBC3)

This may be slow because “getTables()” retrieves meta-information about all tables each time. A recommended value is something like:

  • “SELECT 1” (in case of MySql), or
  • “SELECT 1 FROM DUAL” (in case of Oracle) etc.

By executing this simple and fast query, the pool can check whether a connection is still alive.
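Programmatically (a sketch, reusing the c3p0DataSource and hikariConfig objects from the earlier sketch; choose the query that matches your database):

c3p0DataSource.setPreferredTestQuery("SELECT 1");          // c3p0, e.g. for MySQL
hikariConfig.setConnectionTestQuery("SELECT 1 FROM DUAL"); // HikariCP, e.g. for Oracle
// With a JDBC4 driver, HikariCP recommends leaving connectionTestQuery unset
// so that the faster Connection.isValid() check is used instead.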

max idle time

How long an unused connection may stay in the pool.

  • c3p0 setting: maxIdleTime
  • HikariCP setting: idleTimeout

Default values:

              c3p0          HikariCP     I recommend
maxIdleTime   0 (endless)   10 minutes   1..10 minutes

It’s probably not a big deal, but every open connection

  • holds some resources inside database
  • prevents other systems from getting connections to the same database (every database has some limit of maximum possible number of connections)

That’s why it’s a good idea to close unused (idle) connections. I recommend setting this value to a non-endless period. Several minutes is probably reasonable.

min pool size

How many connections the pool should always keep open (even if unused).

  • c3p0 setting: minPoolSize
  • HikariCP setting: minimumIdle

Default values:

              c3p0   HikariCP        I recommend
minPoolSize   3      max pool size   0…N

For the same reason, it’s probably a good idea to close unused connections. I would set this value to 0 or 1 in most cases. If some user unexpectedly decides to log in to your application at midnight, they will just wait a few more milliseconds. Not a big deal.

max connection age

How long a connection may live in the pool (no matter whether it’s idle or in use).

  • c3p0 setting: maxConnectionAge
  • HikariCP setting: maxLifetime

Default values:

                   c3p0          HikariCP     I recommend
maxConnectionAge   0 (endless)   30 minutes   say, 30 minutes

Just in case, it’s probably a good idea to close connections from time to time. It probably helps to avoid some memory leaks.

A quote from HikariCP documentation:

“We strongly recommend setting this value, and it should be several seconds shorter than any database or infrastructure imposed connection time limit.”

unreturned connection timeout

One typical problem is a connection leak: some buggy code took a connection from the pool and didn’t return it. How can this problem be detected?

Fortunately, we have a good setting for this case:

  • c3p0 setting: unreturnedConnectionTimeout
  • HikariCP setting: leakDetectionThreshold

Default values:

                              c3p0       HikariCP   I recommend
unreturnedConnectionTimeout   disabled   disabled   5 minutes?

If any buggy code took a connection and didn’t return it within 5 minutes, the pool will forcibly reclaim the connection and write warnings like this:

[C3P0PooledConnectionPoolManager] Logging the stack trace by which the overdue resource was checked-out.
java.lang.Exception: DEBUG STACK TRACE: Overdue resource check-out stack trace.
    at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource()
    at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1885)
    at domain.application.ApplicationReportSender.sendWeeklyReport(ApplicationReportSender.java:63)

This will help you to find out where the guilty code is.
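Programmatically (again a sketch, reusing the objects from the earlier sketches):

c3p0DataSource.setUnreturnedConnectionTimeout(300);           // seconds (5 minutes)
c3p0DataSource.setDebugUnreturnedConnectionStackTraces(true); // log the checkout stack trace
hikariConfig.setLeakDetectionThreshold(300_000);              // milliseconds (5 minutes)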

Conclusion

I gave an overview of some connection pool settings. There are more of them. I gave some advice which seems reasonable from my experience. But your application may have a different load, and your users may behave differently. My advice may seem stupid to you.

No problem. Don’t trust me. But please, don’t trust the defaults either.

Go check your pool settings!

Using the IdentityHashMap in Java

In this article, we’ll discuss IdentityHashMap from the java.util package.

What Will We Learn?

  1. IdentityHashMap Class Overview
  2. IdentityHashMap Class Constructors
  3. IdentityHashMap Class Methods
  4. IdentityHashMap Class Example

Learn Java Collections Framework in-depth at the Java Collections Framework Tutorial.

1. IdentityHashMap Class Overview

This class is not a general-purpose Map implementation! While it implements the Map interface, it intentionally violates Map’s general contract, which mandates the use of the equals method when comparing objects. It is designed for use only in the rare cases where reference-equality semantics are required.

IdentityHashMap is a hash-table-based implementation of the Map interface. A normal HashMap compares keys using the equals() method, but IdentityHashMap compares its keys using the == operator.

Note that this implementation is not synchronized. If multiple threads access an identity hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. For example:

   Map m = Collections.synchronizedMap(new IdentityHashMap(...));

2. IdentityHashMap Class Constructors

  • IdentityHashMap() — Constructs a new, empty identity hash map with a default expected maximum size (21).
  • IdentityHashMap(int expectedMaxSize) — Constructs a new, empty map with the specified expected maximum size.
  • IdentityHashMap(Map<? extends K, ? extends V> m) — Constructs a new identity hash map containing the key-value mappings in the specified map.

3. IdentityHashMap Class Methods

  • void clear() — Removes all of the mappings from this map.
  • Object clone() — Returns a shallow copy of this identity hash map: the keys and values themselves are not cloned.
  • boolean containsKey(Object key) — Tests whether the specified object reference is a key in this identity hash map.
  • boolean containsValue(Object value) — Tests whether the specified object reference is a value in this identity hash map.
  • Set<Map.Entry<K,V>> entrySet() — Returns a Set view of the mappings contained in this map.
  • boolean equals(Object o) — Compares the specified object with this map for equality.
  • void forEach(BiConsumer<? super K, ? super V> action) — Performs the given action for each entry in this map until all entries have been processed or the action throws an exception.
  • V get(Object key) — Returns the value to which the specified key is mapped, or null if this map contains no mapping for the key.
  • int hashCode() — Returns the hash code value for this map.
  • boolean isEmpty() — Returns true if this identity hash map contains no key-value mappings.
  • Set<K> keySet() — Returns an identity-based set view of the keys contained in this map.
  • V put(K key, V value) — Associates the specified value with the specified key in this identity hash map.
  • void putAll(Map<? extends K, ? extends V> m) — Copies all of the mappings from the specified map to this map.
  • V remove(Object key) — Removes the mapping for this key from this map if present.
  • void replaceAll(BiFunction<? super K, ? super V, ? extends V> function) — Replaces each entry’s value with the result of invoking the given function on that entry until all entries have been processed or the function throws an exception.
  • int size() — Returns the number of key-value mappings in this identity hash map.
  • Collection<V> values() — Returns a Collection view of the values contained in this map.

4. IdentityHashMap Class Example

As we know, IdentityHashMap is a hash-table-based implementation of the Map interface. A normal HashMap compares keys using the equals() method, but IdentityHashMap compares its keys using the == operator. Hence, "a" and new String("a") are considered two different keys. The default expected maximum size of an IdentityHashMap is 21, while the default initial size of a normal HashMap is 16.

import java.util.IdentityHashMap;

public class IdentityHashMapExample {
  public static void main(final String[] args) {
    final IdentityHashMap<String, String> identityHashMap = new IdentityHashMap<String, String>();
    identityHashMap.put("a", "Java");
    identityHashMap.put(new String("a"), "J2EE");
    identityHashMap.put("b", "J2SE");
    identityHashMap.put(new String("b"), "Spring");
    identityHashMap.put("c", "Hibernate");
    for (final String str : identityHashMap.keySet()) {
      System.out.println("Key : " + str + " and Value : " + identityHashMap.get(str));
    }
    System.out.println("Size of map is : " + identityHashMap.size());
    System.out.println("Here 'a' and new String('a') are considered as separate keys");
  }
}
Output:
Key : a and Value : Java
Key : b and Value : J2SE
Key : c and Value : Hibernate
Key : b and Value : Spring
Key : a and Value : J2EE
Size of map is : 5
Here 'a' and new String('a') are considered as separate keys

Further Learning

Collections Framework – EnumMap
Collections Framework – IdentityHashMap
Collections Framework – CopyOnWriteArraySet
Collections Framework – EnumSet
Collections Framework – CopyOnWriteArrayList

Java Integer Cache – Why Integer.valueOf(127) == Integer.valueOf(127) Is True

In an interview, one of my friends was asked: if we have two Integer objects, Integer a = 127; Integer b = 127;, why does a == b evaluate to true when they are supposedly holding two separate objects? In this article, I will try to answer this question and also try to explain the answer.

Short Answer

The short answer to this question is: direct assignment of an int literal to an Integer reference is an example of auto-boxing, where the literal-to-object conversion code is handled by the compiler. During the compilation phase, the compiler converts Integer a = 127; to Integer a = Integer.valueOf(127);.

The Integer class maintains an internal IntegerCache for integers, which by default ranges from -128 to 127, and the Integer.valueOf() method returns objects from that cache for values in this range. So a == b returns true because a and b both point to the same object.

Long Answer

In order to understand the short answer, let’s first understand Java’s types. All types in Java lie under two categories:

  1. Primitive Types: There are 8 primitive types (byte, short, int, long, float, double, char and boolean) in Java, which hold their values directly in the form of binary bits.
    For example, int a = 5; int b = 5;: here a and b directly hold the binary value of 5, and if we try to compare a and b using a == b, we are actually comparing 5 == 5, which returns true.
  2. Reference Types: All types other than primitive types lie under the category of reference types, e.g. classes, interfaces, enums, arrays, etc. Reference types hold the address of the object instead of the object itself.
    For example, Integer a = new Integer(5); Integer b = new Integer(5);: here a and b do not hold the binary value of 5. Instead, a and b hold the memory addresses of two separate objects, where both objects contain the value 5. So if we try to compare a and b using a == b, we are actually comparing those two separate memory addresses, and hence we get false. To perform actual equality on a and b, we need to perform a.equals(b). Reference types are further divided into 4 categories: Strong, Soft, Weak and Phantom references.

And we know that Java provides wrapper classes for all primitive types and support auto-boxing and auto-unboxing.

// Example of auto-boxing, here c is a reference type
Integer c = 128; // Compiler converts this line to Integer c = Integer.valueOf(128);

// Example of auto-unboxing, here e is a primitive type
int e = c; // Compiler converts this line to int e = c.intValue();

Now, if we create two Integer objects a and b and try to compare them using the equality operator ==, we will get false because both references are holding different objects:

Integer a = 128; // Compiler converts this line to Integer a = Integer.valueOf(128);
Integer b = 128; // Compiler converts this line to Integer b = Integer.valueOf(128);
System.out.println(a == b); // Output -- false

But if we assign the value 127 to both a and b and try to compare them using the equality operator ==, we will get true. Why?

Integer a = 127; // Compiler converts this line to Integer a = Integer.valueOf(127);
Integer b = 127; // Compiler converts this line to Integer b = Integer.valueOf(127);
System.out.println(a == b); // Output -- true

As we can see in the code, we are assigning different objects to a and b, but a == b can return true only if both a and b are pointing to the same object.

So how is the comparison returning true? What’s actually happening here? Are a and b pointing to the same object?

Well till now we know that the code Integer a = 127; is an example of auto-boxing and compiler automatically converts this line to Integer a = Integer.valueOf(127);.

So it is the Integer.valueOf() method which is returning these integer objects which means this method must be doing something under the hood.

And if we take a look at the source code of the Integer.valueOf() method, we can clearly see that if the passed int literal i lies between IntegerCache.low and IntegerCache.high, then the method returns an Integer object from the IntegerCache. The default values for IntegerCache.low and IntegerCache.high are -128 and 127 respectively.

In other words, instead of creating and returning new Integer objects, the Integer.valueOf() method returns an Integer object from the internal IntegerCache if the passed int literal lies between -128 and 127 (inclusive).

/**
 * Returns an {@code Integer} instance representing the specified
 * {@code int} value.  If a new {@code Integer} instance is not
 * required, this method should generally be used in preference to
 * the constructor {@link #Integer(int)}, as this method is likely
 * to yield significantly better space and time performance by
 * caching frequently requested values.
 *
 * This method will always cache values in the range -128 to 127,
 * inclusive, and may cache other values outside of this range.
 *
 * @param  i an {@code int} value.
 * @return an {@code Integer} instance representing {@code i}.
 * @since  1.5
 */
 public static Integer valueOf(int i) {
     if (i >= IntegerCache.low && i <= IntegerCache.high)
         return IntegerCache.cache[i + (-IntegerCache.low)];
     return new Integer(i);
 }

Java caches Integer objects which fall into the -128 to 127 range because this range of integers is used a lot in day-to-day programming, which indirectly saves some memory.

As you can see in the following image, the Integer class maintains an inner static IntegerCache class which acts as the cache and holds Integer objects from -128 to 127, and that’s why when we ask for the Integer object for 127 we always get the same object.

[Image: Integer.valueOf(127)]

The cache is initialized on first use, when the class gets loaded into memory, because of a static block. The maximum range of the cache can be controlled by the -XX:AutoBoxCacheMax JVM option.

This caching behaviour is not applicable to Integer objects only; similar to Integer.IntegerCache, we also have ByteCache, ShortCache, LongCache, and CharacterCache for Byte, Short, Long, and Character respectively.

Byte, Short, and Long have a fixed cache range of -128 to 127 (inclusive), but for Character the range is from 0 to 127 (inclusive). The range can be modified via an argument only for Integer, not for the others.
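A quick demonstration of these boundaries (a sketch; the expected output on a JVM with default settings is shown in the comments):

System.out.println(Long.valueOf(127L) == Long.valueOf(127L)); // true  (inside the cache)
System.out.println(Long.valueOf(128L) == Long.valueOf(128L)); // false (outside the cache)
System.out.println(Character.valueOf((char) 127) == Character.valueOf((char) 127)); // true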

You can find the complete source code for this article on this Github Repository and please feel free to provide your valuable feedback.