GC Explained: Heap :: Generational Garbage Collectors

The JVM heap is divided into two generations: the Young Generation and the Old Generation (sometimes called Tenured). The Young Generation is further divided into Eden and two Survivor spaces. Both generations also have Virtual spaces, reserved memory that garbage collectors use to resize the other regions, mainly to meet different GC goals.

Weak Generational Hypothesis

Why is the heap divided into Young and Old Generations? Because most objects are created and used for only a relatively short period of time. In GC theory this observation is called the Weak Generational Hypothesis. Imagine objects that are created and used only inside a loop: assuming they are not scalarized (replaced by their fields through escape analysis), every iteration discards the previously created objects and creates new ones.
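The loop scenario can be sketched as follows. This is only an illustration; the class and method names are made up for this example. Each iteration allocates a temporary object that nothing references once the iteration ends, which is exactly the kind of garbage a generational collector expects to find in the Young Generation.

```java
// Illustrative sketch: every iteration creates a short-lived temporary object.
class ShortLivedObjects {

    // Sums the lengths of labels such as "item-0", "item-1", ...
    static long sumOfLengths(int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            // A fresh StringBuilder per iteration; after this iteration
            // nothing refers to it any more, so it is garbage.
            StringBuilder temp = new StringBuilder("item-").append(i);
            total += temp.length();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfLengths(1_000_000));
    }
}
```

Whether the JIT compiler removes these allocations entirely (scalar replacement) depends on the JVM and cannot be assumed.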

Object Lifecycle

Objects start their journey in Eden, in the Young Generation. When Eden fills up, a so-called Minor GC is performed: all application threads are stopped (a stop-the-world pause), objects that are no longer used are discarded, and all surviving objects are moved from Eden to the first Survivor space (S0). At the next Minor GC, the surviving objects in S0 move to the second Survivor space (S1), and all live objects from Eden go to S1 as well. Note that the Survivor space now holds objects of different ages: some came straight from Eden, others had already survived an earlier collection. The following Minor GC moves the objects from S1 back to S0, so the Survivor spaces swap roles at every collection. Why are there two Survivor spaces, and why do they swap? When an object reaches a certain age threshold, it is promoted to the Old Generation. Promotions leave holes in the Survivor space, and this fragmentation is easily eliminated by copying all live objects from one Survivor space to the other at every Minor GC.

Eventually, when the Old Generation fills up, a Major GC is performed on it, which cleans it up and, depending on the collector, compacts that space. Whether and how stop-the-world pauses occur during a Major GC depends on the specific GC algorithm in use.

Besides Minor and Major GC, there is also a Full GC, which cleans the entire heap: both the Young Generation (as in a Minor GC) and the Old (Tenured) Generation (as in a Major GC). Because a Full GC includes a Minor GC, it also causes stop-the-world pauses.
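One way to observe these collections from inside a program is the standard java.lang.management API. The sketch below lists the collectors the running JVM reports, with their collection counts and accumulated times. The class name GcReport is made up for this example, and the collector names printed depend on the JVM and the selected GC algorithm, so no particular output is assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative sketch: prints one line per garbage collector known to this JVM.
class GcReport {

    static List<GarbageCollectorMXBean> collectors() {
        return ManagementFactory.getGarbageCollectorMXBeans();
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : collectors()) {
            // Count and time are cumulative since JVM start (-1 if undefined).
            System.out.printf("%s: collections=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

On many HotSpot configurations one bean corresponds to young-generation collections and another to old-generation or full collections, but that mapping is collector-specific.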

Summary

There are two main advantages to dividing the heap into two generations. First, processing only a portion of the heap is usually faster than processing all of it, so stop-the-world pauses are shorter. Second, during a Minor GC every object in Eden is either moved or discarded, which means that this part of the heap is compacted automatically.

Assignments, Initialization, and the JVM

Stack and Heap:

  • Local variables (method variables) live on the stack.
  • Objects and their instance variables live on the heap.

Literals and primitive casting:

  • Integer literals can be decimal, octal (e.g. 013), or hexadecimal (e.g. 0x3d).
  • Literals for longs end in L or l.
  • Float literals end in F or f; double literals end in a digit, D, or d.
  • The boolean literals are true and false.
  • Literals for chars are a single character inside single quotes: 'd'.
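The literal forms listed above can be sketched in a few lines. The class and constant names are made up for this example.

```java
// Illustrative sketch of the literal forms described above.
class LiteralsDemo {
    static final int DECIMAL = 11;
    static final int OCTAL = 013;           // leading 0 means octal: 1*8 + 3 = 11
    static final int HEX = 0x3d;            // 3*16 + 13 = 61
    static final long BIG = 3000000000L;    // too large for an int, so the L suffix is required
    static final float F = 1.5f;            // without the f suffix this would be a double
    static final double D = 1.5;            // 1.5d is also legal
    static final char C = 'd';
    static final boolean B = true;

    public static void main(String[] args) {
        System.out.println(DECIMAL + " " + OCTAL + " " + HEX + " " + BIG);
        System.out.println(F + " " + D + " " + C + " " + B);
    }
}
```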

Scope:

  • Scope refers to the lifetime of a variable.
  • There are four basic scopes:
    • Static variables live basically as long as their class lives.
    • Instance variables live as long as their object lives.
    • Local variables live as long as their method is on the stack; however, if their method invokes another method, they are temporarily unavailable.
    • Block variables (e.g. in a for or an if) live until the block completes.

Basic Assignments:

  • Literal integers are implicitly ints.
  • Integer expressions always produce an int-sized result, never smaller.
  • Floating-point literals are implicitly doubles (64 bits).
  • Narrowing a primitive truncates the high-order bits.
  • Compound assignments (e.g. +=) perform an automatic cast.
  • A reference variable holds the bits that are used to refer to an object.
  • A reference variable can refer to a subclass of the declared type, but not to a superclass.
  • When creating a new object, e.g. Button b = new Button(), three things happen:
    1. A reference variable b of type Button is made.
    2. A new Button object is created.
    3. The Button object is assigned to the reference variable b.

Using a variable or array element that is uninitialized and unassigned:

  • When an array of objects is instantiated, the objects within the array are not instantiated automatically, but all the references get the default value of null.
  • When an array of primitives is instantiated, its elements get default values.
  • Instance variables are always initialized with a default value.
  • Local/automatic/method variables are never given a default value. If you attempt to use one before initializing it, you will get a compiler error.
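A minimal sketch of the narrowing, compound-assignment, and default-value rules above. The class, field, and method names are made up for this example.

```java
// Illustrative sketch of narrowing casts, compound assignment, and default values.
class AssignmentDemo {
    int count;                               // instance variable: gets the default value 0

    static byte narrow(int value) {
        return (byte) value;                 // explicit cast keeps only the low-order 8 bits
    }

    static byte compoundAdd(byte start, int amount) {
        byte b = start;
        b += amount;                         // compiles: += inserts an implicit (byte) cast
        // b = b + amount;                   // would NOT compile: b + amount is an int
        return b;
    }

    public static void main(String[] args) {
        int[] numbers = new int[3];          // primitive elements default to 0
        String[] names = new String[3];      // reference elements default to null
        System.out.println(narrow(300));     // 300 is 0x12C; the low byte is 0x2C
        System.out.println(compoundAdd((byte) 10, 5));
        System.out.println(numbers[0] + " " + names[0] + " " + new AssignmentDemo().count);
        // int local; System.out.println(local);  // compiler error: local is not initialized
    }
}
```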

Passing variables into methods:

  • Methods can take primitives and/or object references as arguments.
  • Method arguments are always copies.
  • Method arguments are never actual objects (they can be references to objects).
  • A primitive argument is an unattached copy of the original primitive.
  • A reference argument is another copy of a reference to the original object.
  • Shadowing occurs when two variables with different scopes share the same name. This leads to hard-to-find bugs and hard-to-answer exam questions.
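The copy semantics above can be sketched like this. The names are made up for this example. A method can change an object through its copy of the reference, but it cannot change the caller's primitive or make the caller's variable point at another object.

```java
// Illustrative sketch: arguments are always copies.
class PassByValueDemo {

    static void increment(int n) {
        n++;                                   // changes only the local copy
    }

    static void mutate(StringBuilder sb) {
        sb.append("!");                        // changes the shared object via the copied reference
    }

    static void reassign(StringBuilder sb) {
        sb = new StringBuilder("other");       // rebinds only the local copy of the reference
    }

    static int afterIncrement() {
        int x = 1;
        increment(x);
        return x;                              // still 1
    }

    static String afterMutate() {
        StringBuilder sb = new StringBuilder("hi");
        mutate(sb);
        return sb.toString();                  // the caller sees the appended character
    }

    static String afterReassign() {
        StringBuilder sb = new StringBuilder("hi");
        reassign(sb);
        return sb.toString();                  // the caller still sees its original object
    }

    public static void main(String[] args) {
        System.out.println(afterIncrement() + " " + afterMutate() + " " + afterReassign());
    }
}
```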

Arrays declaration Construction and Initialization:

  • Arrays can hold primitives or objects, but the array itself is always an object.
  • When you declare an array, the brackets can be to the left or to the right of the name.
  • It’s never legal to include the size of the array in the declaration.
  • You must include the size of the array when you construct it (using new) unless you are creating an anonymous array.
  • Elements in an array of objects are not automatically created, although primitive array elements are given default values.
  • You will get a NullPointerException if you try to use an array element in an object array, if that element does not refer to a real object.
  • Arrays are indexed beginning with zero.
  • An ArrayIndexOutOfBoundsException occurs if you use a bad index value.
  • Arrays have a length variable whose value is the number of array elements.
  • The last index you can access is always one less than the length of the array.
  • Multi-dimensional arrays are just arrays of arrays.
  • The dimensions in a multi-dimensional array can have different lengths.
  • An array of primitives can accept any value that can be promoted implicitly to the array’s declared type; e.g. a byte variable can go in an int array.
  • An array of objects can hold any object that passes the IS-A (or instanceof) test for the declared type of the array. For example, if Horse extends Animal, then a Horse object can go into an Animal array.
  • If you assign an array to a previously declared array reference, the array you are assigning must be the same dimension as the reference you are assigning it to.
  • You can assign an array of one type to a previously declared array reference of one of its supertypes; for example, a Honda array can be assigned to an array declared as type Car (assuming Honda extends Car).
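Several of the array rules above, sketched in code. Car and Honda follow the example in the text; everything else is a name made up for this illustration.

```java
// Illustrative sketch of array declaration, construction, and assignment rules.
class ArrayDemo {
    static class Car {}
    static class Honda extends Car {}

    // A multi-dimensional array is an array of arrays; rows may differ in length.
    static int[][] jagged() {
        int[][] rows = new int[3][];       // only the first dimension is required
        rows[0] = new int[1];
        rows[1] = new int[2];
        rows[2] = new int[3];
        return rows;
    }

    // Returns true if reading a[index] throws ArrayIndexOutOfBoundsException.
    static boolean hitsBadIndex(int[] a, int index) {
        try {
            int unused = a[index];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] left;                        // brackets to the left of the name
        int right[];                       // brackets to the right of the name
        left = new int[] {1, 2, 3};        // anonymous array: no size given
        right = new int[2];                // size required; elements default to 0
        Car[] cars = new Honda[2];         // a Honda array IS-A Car array
        System.out.println(left.length + " " + right[0] + " " + cars[0]);
        System.out.println(jagged()[2].length + " " + hitsBadIndex(left, left.length));
    }
}
```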

Initialization Blocks:

  • Static initializer blocks run once, when the class is first loaded.
  • Instance initialization blocks run every time a new instance is created. They run after all super-constructors and before the constructor’s code has run.
  • If multiple init blocks exist in a class, they follow the rules stated above and they run in the order in which they appear in the source file.
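The ordering rules above can be traced with a small sketch. All names are made up for this example; each block appends a marker to a shared log so the order becomes visible.

```java
// Illustrative sketch: records the order in which init blocks and constructors run.
class InitOrderDemo {
    static final StringBuilder LOG = new StringBuilder();

    static class Parent {
        Parent() { LOG.append("parent-ctor "); }
    }

    static class Child extends Parent {
        static { LOG.append("static "); }      // runs once, when Child is initialized
        { LOG.append("init-1 "); }             // first instance init block
        Child() { LOG.append("child-ctor "); }
        { LOG.append("init-2 "); }             // second instance init block, in source order
    }

    // Clears the log, creates one Child, and returns what was recorded.
    static String trace() {
        LOG.setLength(0);
        new Child();
        return LOG.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(trace());   // first instance: the static block runs as well
        System.out.println(trace());   // later instances: instance blocks and constructors only
    }
}
```

Both instance init blocks run after the Parent constructor and before the body of the Child constructor, even though the second block appears after the constructor in the source.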

Using Wrappers:

  • The wrapper classes correlate to the primitive types.
  • Wrappers have two main functions:
    • To wrap primitives so that they can be handled like objects.
    • To provide utility methods for primitives (usually conversions).
  • The three most important method families are:
    • xxxValue() takes no arguments and returns a primitive.
    • parseXxx() takes a String, returns a primitive, and throws NumberFormatException (NFE) on malformed input.
    • valueOf() takes a String, returns a wrapped object, and throws NFE on malformed input.
  • Wrapper constructors can take a String or a primitive, except for Character, which can only take a char.
  • Radix refers to bases (typically) other than 10; octal is radix 8, hex is radix 16.
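A short sketch of the three method families, using Integer as the wrapper. The class and helper names are made up for this example; the Integer methods themselves are standard.

```java
// Illustrative sketch of xxxValue(), parseXxx(), and valueOf() using Integer.
class WrapperDemo {

    static int parse(String s) {
        return Integer.parseInt(s);              // String -> primitive
    }

    static Integer wrap(String s) {
        return Integer.valueOf(s);               // String -> wrapper object
    }

    static double unwrap(Integer i) {
        return i.doubleValue();                  // wrapper -> primitive (xxxValue family)
    }

    static int parseRadix(String s, int radix) {
        return Integer.parseInt(s, radix);       // e.g. radix 8 for octal, 16 for hex
    }

    static boolean isNumber(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {      // thrown for malformed input
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("42") + " " + unwrap(wrap("7")));
        System.out.println(parseRadix("13", 8) + " " + parseRadix("3d", 16));
        System.out.println(isNumber("abc"));
    }
}
```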

Boxing:

  • As of Java 5, boxing allows you to convert primitives to wrappers and wrappers to primitives automatically.
  • Using == with wrappers created through boxing is tricky: those with the same small values (typically in the range -128 to 127) will be ==, larger values will not be ==.
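The == pitfall can be sketched as follows. The names are made up for this example. Boxed Integer values from -128 to 127 are cached, so two boxes of the same small value are the same object; for larger values the result of == depends on the JVM's cache settings, so the safe comparison is equals().

```java
// Illustrative sketch: == compares references, equals() compares values.
class BoxingDemo {

    static boolean sameReference(int value) {
        Integer a = value;       // autoboxing, equivalent to Integer.valueOf(value)
        Integer b = value;
        return a == b;           // true only if both boxes are the same object
    }

    static boolean sameValue(int value) {
        Integer a = value;
        Integer b = value;
        return a.equals(b);      // compares the wrapped int values
    }

    public static void main(String[] args) {
        System.out.println(sameReference(127));    // inside the cached range
        System.out.println(sameReference(1000));   // outside it: typically a different result
        System.out.println(sameValue(1000));
    }
}
```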

Advanced Overloading:

  • Primitive widening uses the “smallest” method argument possible.
  • Used individually, boxing and var-args are compatible with overloading.
  • You CANNOT widen from one wrapper type to another (IS-A fails).
  • You can box and then widen (an int can become an Object, via an Integer).
  • You can combine var-args with either widening or boxing.
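A sketch of how the compiler chooses between overloads under these rules. All method names are made up for this example. For a primitive argument, the compiler prefers widening over boxing, and boxing over var-args.

```java
// Illustrative sketch of overload resolution with widening, boxing, and var-args.
class OverloadDemo {
    static String pick(long x)    { return "widening to long"; }
    static String pick(Integer x) { return "boxing to Integer"; }
    static String pick(int... x)  { return "var-args"; }

    // Widening picks the smallest parameter type that fits the argument.
    static String fit(short x) { return "short"; }
    static String fit(long x)  { return "long"; }

    // An int argument is boxed to Integer, which IS-A Object.
    static String viaObject(Object o) { return "boxed, then widened to Object"; }

    public static void main(String[] args) {
        int i = 5;
        byte b = 1;
        System.out.println(pick(i));                   // widening beats boxing and var-args
        System.out.println(pick(Integer.valueOf(5)));  // exact wrapper match
        System.out.println(pick(1, 2));                // only the var-args method accepts two ints
        System.out.println(fit(b));                    // byte widens to short rather than long
        System.out.println(viaObject(i));
    }
}
```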

Garbage collection:

  • In Java, garbage collection (GC) provides automated memory management.
  • The purpose of GC is to delete objects that can’t be reached.
  • Only the JVM decides when to run the GC; you can only suggest it.
  • You can’t know the GC algorithm for sure.
  • Objects must be considered eligible before they can be garbage collected.
  • An object is eligible when no live thread can reach it.
  • To reach an object, you must have a live, reachable reference to that object.
  • Java applications can run out of memory.
  • Islands of objects can be GC’ed, even though they refer to each other.
  • Request garbage collection with System.gc() or Runtime.getRuntime().gc().
  • The class Object has a finalize() method.
  • The finalize() method runs at most once per object, before the garbage collector deletes that object.
  • The garbage collector makes no guarantees; finalize() may never run.
  • You can make an object ineligible for GC from within finalize().
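Reachability can be probed, in a hedged way, with java.lang.ref.WeakReference, which does not keep its referent alive. In the sketch below (all names made up for this example), an object that is still reachable through a static field survives a GC request, while an island of two objects that only refer to each other becomes eligible. Because System.gc() is only a request, the program reports whether the island was collected instead of assuming it.

```java
import java.lang.ref.WeakReference;

// Illustrative sketch: reachable objects survive; islands become eligible.
class ReachabilityDemo {
    static class Node { Node other; }

    static Node root;   // a static field keeps its object reachable

    static boolean survivesWhileReachable() {
        root = new Node();
        WeakReference<Node> probe = new WeakReference<>(root);
        System.gc();                          // a request, not a command
        return probe.get() != null;           // still strongly reachable via root
    }

    static WeakReference<Node> makeIsland() {
        Node a = new Node();
        Node b = new Node();
        a.other = b;                          // a and b refer to each other...
        b.other = a;
        return new WeakReference<>(a);        // ...but nothing else refers to them after return
    }

    public static void main(String[] args) {
        System.out.println("reachable object kept: " + survivesWhileReachable());
        WeakReference<Node> island = makeIsland();
        System.gc();
        System.out.println(island.get() == null ? "island collected" : "island not collected yet");
    }
}
```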

JVM Architecture

Every Java developer knows that bytecode is executed by the JRE (Java Runtime Environment). But many don’t know that the JRE contains an implementation of the Java Virtual Machine (JVM), which analyzes the bytecode, interprets it, and executes it. As developers, we should know the architecture of the JVM, because it enables us to write code more efficiently. In this article, we will look more deeply at the JVM architecture and its different components.

What is the JVM?

A Virtual Machine is a software implementation of a physical machine. Java was developed with the concept of WORA (Write Once Run Anywhere), which relies on the code running on a VM. The compiler compiles a Java source file into a .class file; that .class file is then given to the JVM, which loads and executes it. Below is a diagram of the architecture of the JVM.

JVM Architecture Diagram

How Does the JVM Work?

As shown in the above architecture diagram, the JVM is divided into three main subsystems:

  1. Class Loader Subsystem
  2. Runtime Data Area
  3. Execution Engine

1. Class Loader Subsystem

Java’s dynamic class loading functionality is handled by the class loader subsystem. It loads, links, and initializes a class file when the class is first referenced at runtime, not at compile time.

1.1 Loading

Classes are loaded by this component. The Bootstrap ClassLoader, Extension ClassLoader, and Application ClassLoader are the three class loaders that make this possible.

  1. Bootstrap ClassLoader – Responsible for loading classes from the bootstrap classpath, which is essentially rt.jar (in Java 8 and earlier). This loader is given the highest priority.
  2. Extension ClassLoader – Responsible for loading classes from the ext folder (jre\lib\ext). Since Java 9, its role is taken by the Platform ClassLoader.
  3. Application ClassLoader – Responsible for loading classes from the application-level classpath, for example the path given by the CLASSPATH environment variable.

These class loaders follow the Delegation Hierarchy Algorithm when loading class files: each loader first delegates a request to its parent and tries to load the class itself only if the parent cannot find it.
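The loader hierarchy can be inspected at runtime with Class.getClassLoader() and ClassLoader.getParent(). In this sketch (class name made up for the example), core classes such as String report null, which is how the bootstrap loader is represented. The exact loader names printed differ between Java versions, so none are assumed here.

```java
// Illustrative sketch: walks the class loader chain of an application class.
class LoaderDemo {

    static ClassLoader loaderOf(Class<?> type) {
        return type.getClassLoader();
    }

    public static void main(String[] args) {
        // Core classes are loaded by the bootstrap loader, reported as null.
        System.out.println("String loaded by: " + loaderOf(String.class));

        // Application classes start at the application loader; follow the parents upward.
        ClassLoader loader = loaderOf(LoaderDemo.class);
        while (loader != null) {
            System.out.println(loader);
            loader = loader.getParent();
        }
    }
}
```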

1.2 Linking

  1. Verify – The bytecode verifier checks whether the generated bytecode is valid; if verification fails, we get a verification error (java.lang.VerifyError).
  2. Prepare – Memory is allocated for all static variables, which are assigned their default values.
  3. Resolve – All symbolic references are replaced with direct references into the Method Area.

1.3 Initialization

This is the final phase of class loading. Here all static variables are assigned their original values (the ones written in the source code), and the static blocks are executed.
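That initialization happens on first use, not at program start, can be seen in a small sketch. All names are made up for this example. The nested class Config is initialized only when its static field is first read.

```java
// Illustrative sketch: a class is initialized on its first active use.
class LazyInitDemo {
    static final StringBuilder LOG = new StringBuilder();

    static class Config {
        static int limit = 10;                        // assigned during initialization
        static { LOG.append("Config initialized;"); } // static block runs in the same phase
    }

    static String trace() {
        LOG.append("before use;");
        int value = Config.limit;                     // first use triggers initialization
        LOG.append("limit=").append(value).append(';');
        return LOG.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace());
    }
}
```

Note that limit is deliberately not final: reading a compile-time constant (a static final int with a literal value) would not trigger initialization.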

2. Runtime Data Area

The Runtime Data Area is divided into 5 major components:

  1. Method Area – All class-level data is stored here, including static variables. There is only one method area per JVM, and it is a shared resource.
  2. Heap Area – All objects, with their instance variables, and arrays are stored here. There is also only one Heap Area per JVM. Since the Method and Heap areas are shared among multiple threads, the data stored in them is not thread safe.
  3. Stack Area – A separate runtime stack is created for every thread. For every method call, one entry, called a Stack Frame, is made in the stack memory. All local variables are created in the stack memory. The stack area is thread safe since it is not a shared resource. The Stack Frame is divided into three subentities:
    1. Local Variable Array – Holds the method’s local variables and their corresponding values.
    2. Operand Stack – Acts as the runtime workspace for any intermediate operations that need to be performed.
    3. Frame Data – All symbols corresponding to the method are stored here. In case of an exception, the catch block information is maintained in the frame data.
  4. PC Registers – Each thread has its own PC register, which holds the address of the currently executing instruction. Once the instruction has been executed, the PC register is updated with the next instruction.
  5. Native Method Stacks – A native method stack holds native method information. A separate native method stack is created for every thread.

3. Execution Engine

The bytecode which is assigned to the Runtime Data Area will be executed by the Execution Engine. The Execution Engine reads the bytecode and executes it piece by piece.

  1. Interpreter – The interpreter starts interpreting bytecode quickly, but executes it slowly. Its disadvantage is that when a method is called multiple times, it has to be interpreted anew every time.
  2. JIT Compiler – The JIT compiler neutralizes this disadvantage of the interpreter. The Execution Engine uses the interpreter to execute bytecode, but when it finds repeated code it uses the JIT compiler, which compiles that bytecode into native code. The native code is then used directly for repeated method calls, which improves the performance of the system.
    1. Intermediate Code generator – Produces intermediate code
    2. Code Optimizer – Responsible for optimizing the intermediate code generated above
    3. Target Code Generator – Responsible for Generating Machine Code or Native Code
    4. Profiler – A special component, responsible for finding hotspots, i.e. whether the method is called multiple times or not.
  3. Garbage Collector – Collects and removes unreferenced objects. Garbage collection can be requested by calling System.gc(), but its execution is not guaranteed. The garbage collector only collects objects that were created with new.

Java Native Interface (JNI): The JNI interacts with the Native Method Libraries and makes the native libraries available to the Execution Engine.

Native Method Libraries: A collection of the native libraries required by the Execution Engine.

A Detailed Breakdown of the JVM

The JVM is the virtual machine on which Java code executes. It’s responsible for converting byte code into machine-specific code.

HotSpot JVM Architecture

Diagram: HotSpot JVM Architecture

Now, let’s discuss each and every component of JVM architecture in detail. It consists of a variety of components, and we’ll start with the classloader subsystem.

Classloader Subsystem of the JVM  

The classloader is a subsystem of the JVM that is used to load class files. It verifies class files using a bytecode verifier, and a class file is only loaded if it is valid.

Runtime Data Areas of JVM 

Method Area 

The method area is also called the class area. The method area stores data for each and every class, like fields, constant pools, and method data and information.

Heap 

The heap is the place where all objects are stored in JVM. The heap even contains arrays because arrays are objects.

Java Threads (Java Thread Stacks) 

Each thread has its own stack. How are stack frames created when a thread calls a new method? Whenever a new method is called, a new stack frame is created and pushed on top of that thread’s stack.

What do thread stacks contain? They have all the local variables, all the parameters, and all the return addresses. Stacks never store objects, but they store object references.

Program Counter Registers (PC Registers) 

Each thread’s program counter (PC) register contains the address of the instruction currently being executed; once that instruction completes, the register is updated to point to the next instruction.

Native Internal Threads (Native Thread Stack) 

Native internal threads contain all the information related to native platforms. For example, if we’re running the JVM on Windows, it will contain Windows-related information. Likewise, if we’re running on Linux, it will have all the Linux-related information we need.

Execution Engine

The Execution Engine contains the JIT (Just In Time) Compiler, the Garbage Collector, and the Interpreter.

JIT Compiler

The JIT Compiler compiles bytecode to machine code at runtime and improves the performance of Java applications.

Of course, JIT compilation does require processor time and memory usage. When the JVM first starts up, lots of methods are called. Compiling all of these methods might affect startup time significantly, though a program ultimately might achieve good performance.

Methods are not compiled when they are called the first time. For every method, the JVM maintains a call count, which is incremented every time the method is called. The JVM interprets a method until its call count exceeds the JIT compilation threshold. This threshold helps the JVM start quickly while still improving performance: it has been selected carefully by the JVM developers to balance startup time against long-term performance.

Therefore, very frequently used methods are compiled soon after the JVM has started, and less frequently used methods are compiled later.

After a method is compiled, its call count is reset to zero, and subsequent calls to the method increment its call count. When the call count of a method reaches a JIT recompilation threshold, the JIT compiler compiles method a second time, applying more optimizations as compared to optimizations applied in the previous compilation. This process is repeated until the maximum optimization level is reached. The most frequently used methods are always optimized to maximize the performance benefits of using the JIT compiler.

As an illustration, let’s say the JIT recompilation threshold is 2 (real thresholds are much larger).

Once a method has been compiled, its call count starts again from zero. After two further calls the count reaches 2 (the JIT recompilation threshold), and the JIT compiler compiles the method a second time, applying more optimizations.

Garbage Collector

Garbage collection is the process by which the JVM clears objects (unused objects) from the heap to reclaim heap space.

Interpreter

Interpreter is responsible for reading the bytecode and then executing the instructions.

Native Method Libraries of the JVM

The native method interface is an interface that connects the JVM with the native method libraries for executing native methods.

If we are running the JVM (a Java application) on Windows, then the native method interface (Windows method interface) will connect the JVM with the Windows method libraries (native method libraries) for executing Windows methods (native methods).

You may write your application purely in Java, but there are certain situations where Java code alone might not meet your requirements. Programmers use the JNI to write the Java native methods when an application cannot be written purely in Java.

The most important JVM Components related to performance are:

  • Heap
  • JIT (Just In Time) Compiler and
  • Garbage collector

Diagram: key components of HotSpot JVM for performance.

Three components (the heap, JIT (Just In Time) compiler, and Garbage collector) are related to JVM’s performance tuning.

All objects are stored in the heap. The heap is created at JVM initialization, and the garbage collector manages it.

There are many VM (JVM) options for:

  • Increasing and decreasing the heap size, so that objects are managed for best performance.
  • Selecting different garbage collectors, depending on your requirement.
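On HotSpot, the heap size is typically set with -Xms (initial size) and -Xmx (maximum size), and a collector is selected with a flag such as -XX:+UseSerialGC, -XX:+UseParallelGC, or -XX:+UseG1GC. The sketch below, with a class name made up for this example, uses the standard Runtime API to print the heap limits the running JVM actually ended up with, which is a simple way to check that such options took effect.

```java
// Illustrative sketch: reports the heap sizes seen by the running JVM.
class HeapInfo {

    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();     // upper bound the heap may grow to
    }

    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();   // heap currently reserved by the JVM
    }

    static long freeHeapBytes() {
        return Runtime.getRuntime().freeMemory();    // unused part of the committed heap
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        System.out.println("max:       " + maxHeapBytes() / mib + " MiB");
        System.out.println("committed: " + committedHeapBytes() / mib + " MiB");
        System.out.println("free:      " + freeHeapBytes() / mib + " MiB");
    }
}
```

For example, after compiling, running it as java -Xms64m -Xmx256m HeapInfo should report a maximum close to 256 MiB; the exact figure can differ slightly between collectors.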

Meanwhile, as for the JIT compiler:

  • The JIT Compiler compiles bytecode to machine code at runtime and improves the performance of Java applications.
  • JIT Compiler tuning is rarely needed for newer versions of the JVM.

How Is Java a Platform-Independent Language?

Once source code (i.e. a .java file) is compiled on one platform (bytecode is formed), that bytecode can be executed (interpreted) on any other platform running a JVM.

Every platform has a different JVM implementation. For example, the JVM for Windows is different from the JVM for Linux.

This diagram helps demonstrate its independence:

The JVM is a very powerful and flexible runtime platform for languages such as Java, Groovy, Scala, and Clojure. It provides a large number of libraries, and these languages are completely interoperable with Java.