## The merge sort algorithm (implementation in Java)

1. Description

In computer science many algorithms are recursive by nature and they usually follow an approach called divide-and-conquer . In this particular case dividing-and-conquering a problems means to break it into smaller identical sub-problems, resolve the smaller sub-problems recursively (by dividing the smaller sub-problems into even smaller problems), and eventually combine all their results to determine the solution for the initial problem .

The Merge Sort algorithm is based on this technique :

 1. The merge sort algorithm works on arrays (or lists). If the array has its size < = 1, then we consider the array already sorted and no change is done . This is considered to be the limit condition of this recursive approach . [Limit Condition] 2. If size of the array (or list) > 1, then we divide the array into two equally-sized sub-arrays . [Divide] – O(1) 3. At this point we recursively sort both of the two arrays . [Conquer] 2*T*(n/2) 4. We merge the two sorted sub-arrays into one sorted array . [Merge] O(n)

We will start by defining the MERGE function . This is function is responsible with the merging of two sorted arrays. The resulting array (in our case portion of the initial array composed by the two merged sub-arrays) will be also sorted :

Where:

 L The initial unsorted array . llim The starting index of the left sub-array; mlim The stopping index for right sub-array, and the starting index for the right sub-array; rlim The stopping index for the right sub-array; LEFTL, RIGHTL Additional helper arrays . SENTINEL A value to simplify our code, there’s nothing “bigger” than the Sentinel .

After we define the MERGE function, the actual MERGE-SORT function can be written as:

## Serialize java objects using GZip streams (GZIPInputStream and GZIPOutputStream)

The process of converting an object into an associated sequence of bits, so that we can store it in a file, a memory buffer or share it across a network, with the sole purpose of later resurrecting it, is called Serialization Wikipedia offers a nice insight of what serialization is, so if you have time, please check this article . If this is the first time you hear about this concept you can check the official java documentation on this topic .

Recently I had to write a Serialization mechanism for a hobby application of mine . I had some very big objects (graphs represented as matrices) that had to be somehow stored as files for later usage . Serialization is not hard in Java, but the results are not always satisfactory . For example every graph object was using around 100M of my free and precious hdd space … and space is always an issue on my “workspace” partition (probably because I start so many “projects” and I never finish them) .

The work-around for this issue is relatively simple, instead of using a simple FileOutputStream / FileInputStream in conjunction with an ObjectOutputStream / ObjectInputStream we would better “wrap” the initial streams through a GZIPOutputStream﻿ / GZIPInputStream, and serialize the big objects as gzip files . The results are better than I expected, as the space consumption was reduced dramatically (3 or 4 times less space) . In my case the additional runtime for zipping / unzipping the objects before reading / writing them is not a problem, but note that because of the additional stream encapsulation (the GZIP streams), a time penalty appears .

To better demonstrate what I was saying I will start by designing a class that generates “very large objects” . The objects must support serialization, so our class implements java.io.Serializable . This is a “marker interface” and does not contain any methods that need to be implemented .

The VeryLargeObject class (not a recommended name for a class) encapsulates a bi-dimensional array of size [1 < < 12][1 << 12] . That means the array has 4096 * 4096 elements = 1 << 24 elements = 16777216 elements (I believe it consumes enough memory to prove the concept) . The second step is to build an util class that contains the functions necessary for serialization / de serialization . For comparing the two strategies, I had to write two pair of functions [saveObject(…), loadObject(…)] and [saveGZipObject(…), loadGZipObject(…)] . The big difference between the two pairs is that the second use additional Read More

## Converting infix to RPN (shunting-yard algorithm)

If you’ve tried to write your own calculator (something in the style of gcalctool) you’ve probably had to build a simple converter for your mathematical expressions from infix notation to  RPN (Reverse Polish Notation).

Inspired by this classical SPOJ challenge I wanted to write my own simplified version of an expression converter. If this topic is new for you, or you need to refresh your memory please don’t skip the following paragraph.

Infix notation is the common arithmetic and logical formula notation, in which operators are written infix-style between the operands they act on (e.g. 2 + 5). Unfortunately what seems natural for us, is not as simple to parse by computers as prefix or RPN notations.

RPN also known as the Reverse Polish Notation is mathematical notation wherein every operator (eg. + – * %) follows all of its operands. Examples:

 Infix notation Reverse Polish Notation A + B A B + A ^ 2 + 2 * A * B + B ^ 2 A 2 ^ 2 A * B * + B 2 ^ + ( ( 1  + 2 ) / 3 ) ^ 4 1  2 + 3 / 4 ^ ( 1 + 2 ) * ( 3 / 4 ) ^ ( 5 + 6 ) 1 2 + 3 4 / 5 6 + ^ *

In order to parse and convert a given infix mathematical expression to RPN we will use the shunting-yard algorithm . Just like the evaluation of RPN, the algorithm is stack-based . For the conversion we will use two buffers (one for input, and one for output). Additionally we will use a stack for operators that haven’t been yet added to the output.

 A simplified version of the Shunting-yard algorithm (complete version) For all the input tokens [S1]: Read the next token [S2]; If token is an operator (x) [S3]: While there is an operator (y) at the top of the operators stack and either (x) is left-associative and its precedence is less or equal to that of (y), or (x) is right-associative and its precedence is less than (y) [S4]: Pop (y) from the stack [S5]; Add (y) output buffer [S6]; Push (x) on the stack [S7]; Else If token is left parenthesis, then push it on the stack [S8]; Else If token is a right parenthesis [S9]: Until the top token (from the stack) is left parenthesis, pop from the stack to the output buffer [S10]; Also pop the left parenthesis but don’t include it in the output buffer [S11]; Else add token to output buffer [S12]. While there are still operator tokens in the stack, pop them to output [S13] Note: [SN] Relate with code.

I) Java Implementation

The following implementation of the shunting-yard algorithm does not impose any validations. The input should be a valid mathematical expression or else the program may end abruptly or Read More