Sieve of Eratosthenes (finding all prime numbers up to a specific integer)

1. Algorithm Description

Sieve of Eratosthenes is an ancient algorithm in mathematics that allows us to find all prime numbers up to a specific integer value . It works well on small inputs (the maximum value of the interval is 10.000.000 – 20.000.000 ) .

I am not going to enter into too many details, as the algorithm itself is not very complicated and there are plenty resources on the web describing the concept of a sieve in general and the Eratosthenes Sieve in particular . Still I will implement this algorithm, for the sake of having a “complete” (far from the truth) reference on algorithms and data-structures .

The algorithm is based on the following technique:

1. Create an array S of integers with size N (the maximum value of the interval). Mark all positions in the array as possible primes .
2. Start iterating the array from position 2 (the first prime number), and mark every multiple of 2 as non-prime: S[2*2], S[2*3], …
3. Find the next non-prime number (that wasn’t previously marked), in our case 3, and continue to mark as non-prime all its multiples: S[3*2], S[3*3], …
4. Find the next non-prime number (that wasn’t previously marked), in our case 5, and continue to mark as non-prime all its multiples: S[5*2], S[5*3], …
5. Repeat the logic, and stop when the next non-prime number, k, satisfies the following relationship k*k > p .

Sieve of Eratosthenes pseudo-code:

2. Algorithm Implementation Read More

Serialize java objects using GZip streams (GZIPInputStream and GZIPOutputStream)

The process of converting an object into an associated sequence of bits, so that we can store it in a file, a memory buffer or share it across a network, with the sole purpose of later resurrecting it, is called Serialization Wikipedia offers a nice insight of what serialization is, so if you have time, please check this article . If this is the first time you hear about this concept you can check the official java documentation on this topic .

Recently I had to write a Serialization mechanism for a hobby application of mine . I had some very big objects (graphs represented as matrices) that had to be somehow stored as files for later usage . Serialization is not hard in Java, but the results are not always satisfactory . For example every graph object was using around 100M of my free and precious hdd space … and space is always an issue on my “workspace” partition (probably because I start so many “projects” and I never finish them) .

The work-around for this issue is relatively simple, instead of using a simple FileOutputStream / FileInputStream in conjunction with an ObjectOutputStream / ObjectInputStream we would better “wrap” the initial streams through a GZIPOutputStream / GZIPInputStream, and serialize the big objects as gzip files . The results are better than I expected, as the space consumption was reduced dramatically (3 or 4 times less space) . In my case the additional runtime for zipping / unzipping the objects before reading / writing them is not a problem, but note that because of the additional stream encapsulation (the GZIP streams), a time penalty appears .

To better demonstrate what I was saying I will start by designing a class that generates “very large objects” . The objects must support serialization, so our class implements . This is a “marker interface” and does not contain any methods that need to be implemented .

The VeryLargeObject class (not a recommended name for a class) encapsulates a bi-dimensional array of size [1 < < 12][1 << 12] . That means the array has 4096 * 4096 elements = 1 << 24 elements = 16777216 elements (I believe it consumes enough memory to prove the concept) . The second step is to build an util class that contains the functions necessary for serialization / de serialization . For comparing the two strategies, I had to write two pair of functions [saveObject(…), loadObject(…)] and [saveGZipObject(…), loadGZipObject(…)] . The big difference between the two pairs is that the second use additional Read More

Binary GCD (Stein’s Algorithm) in C

1. Algorithm Description

Binary GCD also known as Stein’s Algorithm is an algorithm that computes the greatest common divisor of two (positive) numbers . Discovered in 1967 by the Israeli programmer Josef Stein, it’s an alternative to the classical Euclid’s Algorithm, and is considered to be more efficient than this as it’s replacing divisions and multiplications with bitwise operations . The algorithm is recursive by nature, but loops can be used instead of recursion .

The algorithm can be described by the following rules (here and here) . Note that by B_GCD we will refer to a function that returns the greatest common divisor of two positive numbers .

1. B_GCD(0, 0) is not defined, but for convenience we will consider it 0 (zero), so B_GCD(0,0) = 0 ;
2. B_GCD(num1, 0) = num1 and B_GCD(0, num2) = num2 . The reason for this is beacuse zero is divided by everything ;
3. If num1 and num2 are even, B_GCD(num1, num2) = 2 * B_GCD(num1/2, num2/2), as 2 is a common divisor .
4. If num1 is even and num2 is odd, B_GCD(num1, num2) = B_GCD(num1 /2, num2), as 2 is not a common divisor . The steps are the same if num1 is odd and num2 is even : B_GCD(num1, num2) = B_GCD(num1, num2/2) .
5. If both num1 and num2 are odd, then if num1 >= num2 the B_GCD(num1, num2) = B_GCD((num1-num2)/2, num2), else B_GCD(num1, num2) = B_GCD((num2-num1)/2, num1) .
6. Step 4 and 5 are repeated until num1 = num2, or num1 = 0 .

We can also use pseudo code to describe the above algorithm .

Recursive Version of Binary GCD (Stein Algorithm)

The loop-version of the Binary GCD Algorithm

Read More

Euclid’s Algorithm: Reducing fraction to lowest terms

In the last article I’ve described Euclid’s algorithm for finding the greatest common divisor of two given numbers .A simple programming exercise I found in the book called “Algorithms in C” (Sedgewick) asks us to reduce a given fraction to lowest terms, using the Euclid’s Algorithm .

Fraction = 36 / 120
Fraction = Reduce(Fraction) = 3 / 12

1. C Implementation

Step 1:

We will start by defining the data-structure we will use for encapsulating the fraction information (numerator and denominator):

Step 2:

Usually when we play with C structures we should have two functions, for allocating and de-allocating memory associated with a structure (if you have an object-oriented background, this step will be equivalent with writing a constructor / destructor pair ) .

An important aspect is that the fraction_new() function fails if a 0 (zero) denominator is given .

Step 3:

On this step we will write a function that reduces the fraction to lowest terms . For this we will also need to determine the greatest common divisor of the fraction’s denominator and numerator in order to divide the fraction Read More

Euclid’s Algorithm

Recently I’ve started to implement (or reimplement) the most common algorithms a software developer should know . One of the nicest books I found on this topic is Algorithms in C (Robert Sedgewick) . Of course, there is this and this, but lately I am more interested on the “implementation” side of things than on maths and theory .

1. Algorithm Description

Euclid’s Algorithm is an efficient method for calculating the greatest common divisor of two numbers (aka GCD) . The  GCD of two numbers is considered the largest number that divides both of them (without leaving a reminder) .

Euclid’s algorithm is based on the principle that given two numbers a and b, if a is greater than b than the greatest common divisor of a and b is the same with the common divisor of b and (b – a) . If we transform the above property into “code” (actually pseudo-code) the algorithm looks like this:

The above pseudo-code is called the subtraction-variant . We can of course replace the repetitive subtractions with one division . The division-based version looks like this (in pseudo-code):

There is also a recursive version of the algorithm:

Read More