Serialize Java objects using GZIP streams (GZIPInputStream and GZIPOutputStream)
Serialization is the process of converting an object into a sequence of bytes so that it can be stored in a file or a memory buffer, or sent across a network, and later reconstructed. Wikipedia offers a nice overview of serialization, so if you have time, please check this article. If this is the first time you have heard about this concept, you can also check the official Java documentation on the topic.
Recently I had to write a serialization mechanism for a hobby application of mine. I had some very big objects (graphs represented as matrices) that had to be stored as files for later use. Serialization is not hard in Java, but the results are not always satisfactory. For example, every graph object took around 100 MB of my free and precious hard disk space … and space is always an issue on my “workspace” partition (probably because I start so many “projects” and never finish them).
The workaround is relatively simple: instead of using a plain FileOutputStream / FileInputStream in conjunction with an ObjectOutputStream / ObjectInputStream, we can wrap the file streams in a GZIPOutputStream / GZIPInputStream and serialize the big objects as gzip files. The results were better than I expected: the files took 3 to 4 times less space. In my case the additional time spent compressing / decompressing the objects while writing / reading them is not a problem, but note that the extra stream layer (the GZIP streams) does introduce a time penalty.
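The wrapping idea can be sketched in a few lines. The example below is only an illustration, not the utility class from this post: the class name GzipWrapSketch and the helpers toBytes / fromBytes are hypothetical, and it writes to in-memory byte arrays instead of files so that it runs without touching the disk. A small 256 x 256 matrix of random values stands in for the big graph objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: names and the in-memory sink are assumptions.
class GzipWrapSketch {

    // Serializes obj to a byte array, optionally through a GZIP layer.
    static byte[] toBytes(Object obj, boolean gzip) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            // The only difference between the two strategies is this wrapper.
            OutputStream target = gzip ? new GZIPOutputStream(sink) : sink;
            try (ObjectOutputStream out = new ObjectOutputStream(target)) {
                out.writeObject(obj);
            } // closing the object stream also finishes the GZIP stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Reads an object back, undoing the GZIP layer if one was used.
    static Object fromBytes(byte[] bytes, boolean gzip) {
        try {
            InputStream source = new ByteArrayInputStream(bytes);
            if (gzip) {
                source = new GZIPInputStream(source);
            }
            try (ObjectInputStream in = new ObjectInputStream(source)) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a size x size matrix filled with values in [0, 100).
    static int[][] randomMatrix(int size, long seed) {
        Random random = new Random(seed);
        int[][] matrix = new int[size][size];
        for (int[] row : matrix) {
            for (int j = 0; j < size; ++j) {
                row[j] = random.nextInt(100);
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        int[][] matrix = randomMatrix(256, 42L);
        byte[] plain = toBytes(matrix, false);
        byte[] zipped = toBytes(matrix, true);
        int[][] restored = (int[][]) fromBytes(zipped, true);

        System.out.println("plain:  " + plain.length + " bytes");
        System.out.println("gzip:   " + zipped.length + " bytes");
        System.out.println("intact: " + Arrays.deepEquals(matrix, restored));
    }
}
```

With real files, the ByteArrayOutputStream / ByteArrayInputStream would simply be replaced by a FileOutputStream / FileInputStream; the rest of the chain stays the same.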
To demonstrate, I will start by designing a class that generates “very large objects”. The objects must support serialization, so our class implements java.io.Serializable. This is a “marker interface”: it does not declare any methods that need to be implemented.
import java.io.Serializable;

/**
 * A dumb class that is generating not very large, but decently
 * large java objects.
 *
 * @author Andrei Ciobanu
 * @date 3 DEC, 2010
 */
public class VeryLargeObject implements Serializable {
    public static final int SIZE = 1 << 12;

    public String tag;
    public int[][] bigOne = new int[SIZE][SIZE];

    {
        // Initialize bigOne
        for (int i = 0; i < SIZE; ++i) {
            for (int j = 0; j < SIZE; ++j) {
                bigOne[i][j] = (int) (Math.random() * 100);
            }
        }
    }

    public VeryLargeObject(String tag) {
        this.tag = tag;
    }
}
The VeryLargeObject class (not a recommended name for a class) encapsulates a two-dimensional array of size [1 << 12][1 << 12]. That means the array has 4096 * 4096 = 1 << 24 = 16777216 elements; at 4 bytes per int that is 64 MiB of raw data (I believe it consumes enough memory to prove the concept). The second step is to build a utility class that contains the functions necessary for serialization / deserialization. To compare the two strategies, I wrote two pairs of functions: [saveObject(…), loadObject(…)] and [saveGZipObject(…), loadGZipObject(…)]. The big difference between the two pairs is that the second uses additional