The process of converting an object into an associated sequence of bits, so that we can store it in a file, a memory buffer or share it across a network, with the sole purpose of later resurrecting it, is called Serialization Wikipedia offers a nice insight of what serialization is, so if you have time, please check this article . If this is the first time you hear about this concept you can check the official java documentation on this topic .

Recently I had to write a Serialization mechanism for a hobby application of mine . I had some very big objects (graphs represented as matrices) that had to be somehow stored as files for later usage . Serialization is not hard in Java, but the results are not always satisfactory . For example every graph object was using around 100M of my free and precious hdd space … and space is always an issue on my “workspace” partition (probably because I start so many “projects” and I never finish them) .

The work-around for this issue is relatively simple, instead of using a simple FileOutputStream / FileInputStream in conjunction with an ObjectOutputStream / ObjectInputStream we would better “wrap” the initial streams through a GZIPOutputStream / GZIPInputStream, and serialize the big objects as gzip files . The results are better than I expected, as the space consumption was reduced dramatically (3 or 4 times less space) . In my case the additional runtime for zipping / unzipping the objects before reading / writing them is not a problem, but note that because of the additional stream encapsulation (the GZIP streams), a time penalty appears .

To better demonstrate what I was saying I will start by designing a class that generates “very large objects” . The objects must support serialization, so our class implements java.io.Serializable . This is a “marker interface” and does not contain any methods that need to be implemented .

The VeryLargeObject class (not a recommended name for a class) encapsulates a bi-dimensional array of size [1 < < 12][1 << 12] . That means the array has 4096 * 4096 elements = 1 << 24 elements = 16777216 elements (I believe it consumes enough memory to prove the concept) . The second step is to build an util class that contains the functions necessary for serialization / de serialization . For comparing the two strategies, I had to write two pair of functions [saveObject(…), loadObject(…)] and [saveGZipObject(…), loadGZipObject(…)] . The big difference between the two pairs is that the second use additional Continue reading

If you’ve tried to write your own calculator (something in the style of gcalctool) you’ve probably had to build a simple converter for your mathematical expressions from infix notation to  RPN (Reverse Polish Notation).

Inspired by this classical SPOJ challenge I wanted to write my own simplified version of an expression converter. If this topic is new for you, or you need to refresh your memory please don’t skip the following paragraph.

Infix notation is the common arithmetic and logical formula notation, in which operators are written infix-style between the operands they act on (e.g. 2 + 5). Unfortunately what seems natural for us, is not as simple to parse by computers as prefix or RPN notations.

RPN also known as the Reverse Polish Notation is mathematical notation wherein every operator (eg. + – * %) follows all of its operands. Examples:

Infix notation Reverse Polish Notation
A + B A B +
A ^ 2 + 2 * A * B + B ^ 2 A 2 ^ 2 A * B * + B 2 ^ +
( ( 1  + 2 ) / 3 ) ^ 4 1  2 + 3 / 4 ^
( 1 + 2 ) * ( 3 / 4 ) ^ ( 5 + 6 ) 1 2 + 3 4 / 5 6 + ^ *

In order to parse and convert a given infix mathematical expression to RPN we will use the shunting-yard algorithm . Just like the evaluation of RPN, the algorithm is stack-based . For the conversion we will use two buffers (one for input, and one for output). Additionally we will use a stack for operators that haven’t been yet added to the output.

A simplified version of the Shunting-yard algorithm (complete version)
  • For all the input tokens [S1]:
    • Read the next token [S2];
    • If token is an operator (x) [S3]:
      • While there is an operator (y) at the top of the operators stack and either (x) is
        left-associative and its precedence is less or equal to that of (y), or (x) is right-associative
        and its precedence is less than (y) [S4]:

        • Pop (y) from the stack [S5];
        • Add (y) output buffer [S6];
      • Push (x) on the stack [S7];
    • Else If token is left parenthesis, then push it on the stack [S8];
    • Else If token is a right parenthesis [S9]:
      • Until the top token (from the stack) is left parenthesis, pop from the stack to the output buffer [S10];
      • Also pop the left parenthesis but don’t include it in the output buffer [S11];
    • Else add token to output buffer [S12].
  • While there are still operator tokens in the stack, pop them to output [S13]

Note: [SN] Relate with code.

I) Java Implementation

The following implementation of the shunting-yard algorithm does not impose any validations. The input should be a valid mathematical expression or else the program may end abruptly or Continue reading