The process of converting an object into an associated sequence of bits, so that we can store it in a file, a memory buffer or share it across a network, with the sole purpose of later resurrecting it, is called Serialization Wikipedia offers a nice insight of what serialization is, so if you have time, please check this article . If this is the first time you hear about this concept you can check the official java documentation on this topic .

Recently I had to write a Serialization mechanism for a hobby application of mine . I had some very big objects (graphs represented as matrices) that had to be somehow stored as files for later usage . Serialization is not hard in Java, but the results are not always satisfactory . For example every graph object was using around 100M of my free and precious hdd space … and space is always an issue on my “workspace” partition (probably because I start so many “projects” and I never finish them) .

The work-around for this issue is relatively simple, instead of using a simple FileOutputStream / FileInputStream in conjunction with an ObjectOutputStream / ObjectInputStream we would better “wrap” the initial streams through a GZIPOutputStream / GZIPInputStream, and serialize the big objects as gzip files . The results are better than I expected, as the space consumption was reduced dramatically (3 or 4 times less space) . In my case the additional runtime for zipping / unzipping the objects before reading / writing them is not a problem, but note that because of the additional stream encapsulation (the GZIP streams), a time penalty appears .

To better demonstrate what I was saying I will start by designing a class that generates “very large objects” . The objects must support serialization, so our class implements java.io.Serializable . This is a “marker interface” and does not contain any methods that need to be implemented .

The VeryLargeObject class (not a recommended name for a class) encapsulates a bi-dimensional array of size [1 << 12][1 << 12] . That means the array has 4096 * 4096 elements = 1 << 24 elements = 16777216 elements (I believe it consumes enough memory to prove the concept) . The second step is to build an util class that contains the functions necessary for serialization / de serialization . For comparing the two strategies, I had to write two pair of functions [saveObject(…), loadObject(…)] and [saveGZipObject(…), loadGZipObject(…)] . The big difference between the two pairs is that the second use additional GZip streams .

While the GZip versions looks almost the same, with the exception of the additional wrapper:

The interesting fact is that the object serialized with the saveObject(…) is going to be persisted in a file of 67149946 bytes (about 64M), while the file generated with the GZip stream occupies no more than 21479554 bytes (around 20M) .

One thought on “Serialize java objects using GZip streams (GZIPInputStream and GZIPOutputStream)

Leave a reply

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url=""> 

required

Are we human, or are we dancer *