Serialize java objects using GZip streams (GZIPInputStream and GZIPOutputStream)

The process of converting an object into an associated sequence of bits, so that we can store it in a file, a memory buffer or share it across a network, with the sole purpose of later resurrecting it, is called Serialization Wikipedia offers a nice insight of what serialization is, so if you have time, please check this article . If this is the first time you hear about this concept you can check the official java documentation on this topic .

Recently I had to write a Serialization mechanism for a hobby application of mine . I had some very big objects (graphs represented as matrices) that had to be somehow stored as files for later usage . Serialization is not hard in Java, but the results are not always satisfactory . For example every graph object was using around 100M of my free and precious hdd space … and space is always an issue on my “workspace” partition (probably because I start so many “projects” and I never finish them) .

The work-around for this issue is relatively simple, instead of using a simple FileOutputStream / FileInputStream in conjunction with an ObjectOutputStream / ObjectInputStream we would better “wrap” the initial streams through a GZIPOutputStream / GZIPInputStream, and serialize the big objects as gzip files . The results are better than I expected, as the space consumption was reduced dramatically (3 or 4 times less space) . In my case the additional runtime for zipping / unzipping the objects before reading / writing them is not a problem, but note that because of the additional stream encapsulation (the GZIP streams), a time penalty appears .

To better demonstrate what I was saying I will start by designing a class that generates “very large objects” . The objects must support serialization, so our class implements java.io.Serializable . This is a “marker interface” and does not contain any methods that need to be implemented .

The VeryLargeObject class (not a recommended name for a class) encapsulates a bi-dimensional array of size [1 < < 12][1 << 12] . That means the array has 4096 * 4096 elements = 1 << 24 elements = 16777216 elements (I believe it consumes enough memory to prove the concept) . The second step is to build an util class that contains the functions necessary for serialization / de serialization . For comparing the two strategies, I had to write two pair of functions [saveObject(…), loadObject(…)] and [saveGZipObject(…), loadGZipObject(…)] . The big difference between the two pairs is that the second use additional Read More

Generic data structures in C

NOTE: After performing a blog import WordPress messed my code . If you see any problems please let  me know .

In order to prove that generic programming (the style of computer programming in which algorithms are written in terms of to-be-specified-later types that are theninstantiated when needed for specific types provided as parameters) can be achieved in C, let’s write the implementation of a generic Stack data structure . We will follow two possible approaches:

  • Hacking with the #preprocessor;
  • Using the flexibility of the void pointer (void*);

You can always try both of the approaches and see which one is more suitable for your particular case . Also note that there are already generic C libraries available (see GLib ).

1. Hacking with the preprocessor

To understand the magic (not really) behind this approach you will need to be familiar with C macros and the associated concepts: function-like macros, macro arguments, stringification and concatenation. You can find a very nice tutorial on macros here. If you already know your stuff, you can skip the following paragraphs (or you can read them to refresh your memory / find errors and correct me :P). Read More