If you’ve tried to write your own calculator (something in the style of gcalctool) you’ve probably had to build a simple converter for your mathematical expressions from infix notation to  RPN (Reverse Polish Notation).

Inspired by this classical SPOJ challenge I wanted to write my own simplified version of an expression converter. If this topic is new for you, or you need to refresh your memory please don’t skip the following paragraph.

Infix notation is the common arithmetic and logical formula notation, in which operators are written infix-style between the operands they act on (e.g. 2 + 5). Unfortunately what seems natural for us, is not as simple to parse by computers as prefix or RPN notations.

RPN also known as the Reverse Polish Notation is mathematical notation wherein every operator (eg. + – * %) follows all of its operands. Examples:

Infix notation Reverse Polish Notation
A + B A B +
A ^ 2 + 2 * A * B + B ^ 2 A 2 ^ 2 A * B * + B 2 ^ +
( ( 1  + 2 ) / 3 ) ^ 4 1  2 + 3 / 4 ^
( 1 + 2 ) * ( 3 / 4 ) ^ ( 5 + 6 ) 1 2 + 3 4 / 5 6 + ^ *

In order to parse and convert a given infix mathematical expression to RPN we will use the shunting-yard algorithm . Just like the evaluation of RPN, the algorithm is stack-based . For the conversion we will use two buffers (one for input, and one for output). Additionally we will use a stack for operators that haven’t been yet added to the output.

A simplified version of the Shunting-yard algorithm (complete version)
  • For all the input tokens [S1]:
    • Read the next token [S2];
    • If token is an operator (x) [S3]:
      • While there is an operator (y) at the top of the operators stack and either (x) is
        left-associative and its precedence is less or equal to that of (y), or (x) is right-associative
        and its precedence is less than (y) [S4]:

        • Pop (y) from the stack [S5];
        • Add (y) output buffer [S6];
      • Push (x) on the stack [S7];
    • Else If token is left parenthesis, then push it on the stack [S8];
    • Else If token is a right parenthesis [S9]:
      • Until the top token (from the stack) is left parenthesis, pop from the stack to the output buffer [S10];
      • Also pop the left parenthesis but don’t include it in the output buffer [S11];
    • Else add token to output buffer [S12].
  • While there are still operator tokens in the stack, pop them to output [S13]

Note: [SN] Relate with code.

I) Java Implementation

The following implementation of the shunting-yard algorithm does not impose any validations. The input should be a valid mathematical expression or else the program may end abruptly or perform incorrectly.

Step 1 : Declaring and defining operators

// Associativity constants for operators
	private static final int LEFT_ASSOC = 0;
	private static final int RIGHT_ASSOC = 1;

	// Supported operators
	private static final Map<String, int[]> OPERATORS = new HashMap<String, int[]>();
	static {
		// Map<"token", []{precendence, associativity}>
		OPERATORS.put("+", new int[] { 0, LEFT_ASSOC });
		OPERATORS.put("-", new int[] { 0, LEFT_ASSOC });
		OPERATORS.put("*", new int[] { 5, LEFT_ASSOC });
		OPERATORS.put("/", new int[] { 5, LEFT_ASSOC });
		OPERATORS.put("%", new int[] { 5, LEFT_ASSOC });
		OPERATORS.put("^", new int[] { 10, RIGHT_ASSOC });
	}

	/**
	 * Test if a certain is an operator .
	 * @param token The token to be tested .
	 * @return True if token is an operator . Otherwise False .
	 */
	private static boolean isOperator(String token) {
		return OPERATORS.containsKey(token);
	}

	/**
	 * Test the associativity of a certain operator token .
	 * @param token The token to be tested (needs to operator).
	 * @param type LEFT_ASSOC or RIGHT_ASSOC
	 * @return True if the tokenType equals the input parameter type .
	 */
	private static boolean isAssociative(String token, int type) {
		if (!isOperator(token)) {
			throw new IllegalArgumentException("Invalid token: " + token);
		}
		if (OPERATORS.get(token)[1] == type) {
			return true;
		}
		return false;
	}

	/**
	 * Compare precendece of two operators.
	 * @param token1 The first operator .
	 * @param token2 The second operator .
	 * @return A negative number if token1 has a smaller precedence than token2,
	 * 0 if the precendences of the two tokens are equal, a positive number
	 * otherwise.
	 */
	private static final int cmpPrecedence(String token1, String token2) {
		if (!isOperator(token1) || !isOperator(token2)) {
			throw new IllegalArgumentException("Invalied tokens: " + token1
					+ " " + token2);
		}
		return OPERATORS.get(token1)[0] - OPERATORS.get(token2)[0];
	}

Step 2 : Parsing expression

The input in this case should an array of strings, where every little string is a token. The output will also be an array of strings but in RPN order. (Take a look at the code comments, and the algorithm references [Sn]).

	public static String[] infixToRPN(String[] inputTokens) {
		ArrayList<String> out = new ArrayList<String>();
		Stack<String> stack = new Stack<String>();
		// For all the input tokens [S1] read the next token [S2]
		for (String token : inputTokens) {
			if (isOperator(token)) {
				// If token is an operator (x) [S3]
				while (!stack.empty() &amp;&amp; isOperator(stack.peek())) {
					// [S4]
					if ((isAssociative(token, LEFT_ASSOC) &amp;&amp; cmpPrecedence(
							token, stack.peek()) <= 0)
							|| (isAssociative(token, RIGHT_ASSOC) &amp;&amp; cmpPrecedence(
									token, stack.peek()) < 0)) {
						out.add(stack.pop()); 	// [S5] [S6]
						continue;
					}
					break;
				}
				// Push the new operator on the stack [S7]
				stack.push(token);
			} else if (token.equals("(")) {
				stack.push(token); 	// [S8]
			} else if (token.equals(")")) {
				// [S9]
				while (!stack.empty() &amp;&amp; !stack.peek().equals("(")) {
					out.add(stack.pop()); // [S10]
				}
				stack.pop(); // [S11]
			} else {
				out.add(token); // [S12]
			}
		}
		while (!stack.empty()) {
			out.add(stack.pop()); // [S13]
		}
		String[] output = new String[out.size()];
		return out.toArray(output);
	}

Step 3 : Testing the working converter

	public static void main(String[] args) {
		String[] input = "( 1 + 2 ) * ( 3 / 4 ) ^ ( 5 + 6 )".split(" ");
		String[] output = infixToRPN(input);
		for (String token : output) {
			System.out.print(token + " ");
		}
	}

And the output:

1 2 + 3 4 / 5 6 + ^ *

Step 4 : Putting all togheter

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.Stack;

public class ReversePolishNotation {
	// Associativity constants for operators
	private static final int LEFT_ASSOC = 0;
	private static final int RIGHT_ASSOC = 1;

	// Supported operators
	private static final Map<String, int[]> OPERATORS = new HashMap<String, int[]>();
	static {
		// Map<"token", []{precendence, associativity}>
		OPERATORS.put("+", new int[] { 0, LEFT_ASSOC });
		OPERATORS.put("-", new int[] { 0, LEFT_ASSOC });
		OPERATORS.put("*", new int[] { 5, LEFT_ASSOC });
		OPERATORS.put("/", new int[] { 5, LEFT_ASSOC });
		OPERATORS.put("%", new int[] { 5, LEFT_ASSOC });
		OPERATORS.put("^", new int[] { 10, RIGHT_ASSOC });
	}

	/**
	 * Test if a certain is an operator .
	 * @param token The token to be tested .
	 * @return True if token is an operator . Otherwise False .
	 */
	private static boolean isOperator(String token) {
		return OPERATORS.containsKey(token);
	}

	/**
	 * Test the associativity of a certain operator token .
	 * @param token The token to be tested (needs to operator).
	 * @param type LEFT_ASSOC or RIGHT_ASSOC
	 * @return True if the tokenType equals the input parameter type .
	 */
	private static boolean isAssociative(String token, int type) {
		if (!isOperator(token)) {
			throw new IllegalArgumentException("Invalid token: " + token);
		}
		if (OPERATORS.get(token)[1] == type) {
			return true;
		}
		return false;
	}

	/**
	 * Compare precendece of two operators.
	 * @param token1 The first operator .
	 * @param token2 The second operator .
	 * @return A negative number if token1 has a smaller precedence than token2,
	 * 0 if the precendences of the two tokens are equal, a positive number
	 * otherwise.
	 */
	private static final int cmpPrecedence(String token1, String token2) {
		if (!isOperator(token1) || !isOperator(token2)) {
			throw new IllegalArgumentException("Invalied tokens: " + token1
					+ " " + token2);
		}
		return OPERATORS.get(token1)[0] - OPERATORS.get(token2)[0];
	}

	public static String[] infixToRPN(String[] inputTokens) {
		ArrayList<String> out = new ArrayList<String>();
		Stack<String> stack = new Stack<String>();
		// For all the input tokens [S1] read the next token [S2]
		for (String token : inputTokens) {
			if (isOperator(token)) {
				// If token is an operator (x) [S3]
				while (!stack.empty() &amp;&amp; isOperator(stack.peek())) {
					// [S4]
					if ((isAssociative(token, LEFT_ASSOC) &amp;&amp; cmpPrecedence(
							token, stack.peek()) <= 0)
							|| (isAssociative(token, RIGHT_ASSOC) &amp;&amp; cmpPrecedence(
									token, stack.peek()) < 0)) {
						out.add(stack.pop()); 	// [S5] [S6]
						continue;
					}
					break;
				}
				// Push the new operator on the stack [S7]
				stack.push(token);
			} else if (token.equals("(")) {
				stack.push(token); 	// [S8]
			} else if (token.equals(")")) {
				// [S9]
				while (!stack.empty() &amp;&amp; !stack.peek().equals("(")) {
					out.add(stack.pop()); // [S10]
				}
				stack.pop(); // [S11]
			} else {
				out.add(token); // [S12]
			}
		}
		while (!stack.empty()) {
			out.add(stack.pop()); // [S13]
		}
		String[] output = new String[out.size()];
		return out.toArray(output);
	}

	public static void main(String[] args) {
		String[] input = "( 1 + 2 ) * ( 3 / 4 ) ^ ( 5 + 6 )".split(" ");
		String[] output = infixToRPN(input);
		for (String token : output) {
			System.out.print(token + " ");
		}
	}
}

II) Python Implementation

The python implementation is a complete “translation” of the previous Java implementation (only the syntax was changed … in better).

Step 1 : Declaring and defining operators

#Associativity constants for operators
LEFT_ASSOC = 0
RIGHT_ASSOC = 1

#Supported operators
OPERATORS = {
    '+' : (0, LEFT_ASSOC),
    '-' : (0, LEFT_ASSOC),
    '*' : (5, LEFT_ASSOC),
    '/' : (5, LEFT_ASSOC),
    '%' : (5, LEFT_ASSOC),
    '^' : (10, RIGHT_ASSOC)
}

#Test if a certain token is operator
def isOperator(token):
    return token in OPERATORS.keys()

#Test the associativity type of a certain token
def isAssociative(token, assoc):
    if not isOperator(token):
        raise ValueError('Invalid token: %s' % token)
    return OPERATORS[token][1] == assoc

#Compare the precedence of two tokens
def cmpPrecedence(token1, token2):
    if not isOperator(token1) or not isOperator(token2):
        raise ValueError('Invalid tokens: %s %s' % (token1, token2))
    return OPERATORS[token1][0] - OPERATORS[token2][0]

Step 2 : Parsing expression

Just like in the previous example, the algorithm does not impose any validation. The input expression should be composed of valid tokens, or else the program may malfunction or end abruptly.

#Transforms an infix expression to RPN
def infixToRPN(tokens):
    out = []
    stack = []
    #For all the input tokens [S1] read the next token [S2]
    for token in tokens:
        if isOperator(token):
            # If token is an operator (x) [S3]
            while len(stack) != 0 and isOperator(stack[-1]):
                # [S4]
                if (isAssociative(token, LEFT_ASSOC)
                    and cmpPrecedence(token, stack[-1]) <= 0) or
                    (isAssociative(token, RIGHT_ASSOC)
                    and cmpPrecedence(token, stack[-1]) < 0):
                    # [S5] [S6]
                    out.append(stack.pop())
                    continue
                break
            # [S7]
            stack.append(token)
        elif token == '(':
            stack.append(token) # [S8]
        elif token == ')':
            # [S9]
            while len(stack) != 0 and stack[-1] != '(':
                out.append(stack.pop()) # [S10]
            stack.pop() # [S11]
        else:
            out.append(token) # [S12]
    while len(stack) != 0:
        # [S13]
        out.append(stack.pop())
    return out

Step 3 : Testing the converter

if __name__ == '__main__':
    input = "( 1 + 2 ) * ( 3 / 4 ) ^ ( 5 + 6 )".split(" ")
    output = infixToRPN(input)
    print output

And the output:

['1', '2', '+', '3', '4', '/', '5', '6', '+', '^', '*']

Putting all togheter:

'''
Created on Oct 5, 2010

@author: nomemory
'''

#Associativity constants for operators
LEFT_ASSOC = 0
RIGHT_ASSOC = 1

#Supported operators
OPERATORS = {
    '+' : (0, LEFT_ASSOC),
    '-' : (0, LEFT_ASSOC),
    '*' : (5, LEFT_ASSOC),
    '/' : (5, LEFT_ASSOC),
    '%' : (5, LEFT_ASSOC),
    '^' : (10, RIGHT_ASSOC)
}

#Test if a certain token is operator
def isOperator(token):
    return token in OPERATORS.keys()

#Test the associativity type of a certain token
def isAssociative(token, assoc):
    if not isOperator(token):
        raise ValueError('Invalid token: %s' % token)
    return OPERATORS[token][1] == assoc

#Compare the precedence of two tokens
def cmpPrecedence(token1, token2):
    if not isOperator(token1) or not isOperator(token2):
        raise ValueError('Invalid tokens: %s %s' % (token1, token2))
    return OPERATORS[token1][0] - OPERATORS[token2][0]

#Transforms an infix expression to RPN
def infixToRPN(tokens):
    out = []
    stack = []
    #For all the input tokens [S1] read the next token [S2]
    for token in tokens:
        if isOperator(token):
            # If token is an operator (x) [S3]
            while len(stack) != 0 and isOperator(stack[-1]):
                # [S4]
                if (isAssociative(token, LEFT_ASSOC)
                    and cmpPrecedence(token, stack[-1]) <= 0) or
                    (isAssociative(token, RIGHT_ASSOC)
                    and cmpPrecedence(token, stack[-1]) < 0):
                    # [S5] [S6]
                    out.append(stack.pop())
                    continue
                break
            # [S7]
            stack.append(token)
        elif token == '(':
            stack.append(token) # [S8]
        elif token == ')':
            # [S9]
            while len(stack) != 0 and stack[-1] != '(':
                out.append(stack.pop()) # [S10]
            stack.pop() # [S11]
        else:
            out.append(token) # [S12]
    while len(stack) != 0:
        # [S13]
        out.append(stack.pop())
    return out

if __name__ == '__main__':
    input = "( 1 + 2 ) * ( 3 / 4 ) ^ ( 5 + 6 )".split(" ")
    output = infixToRPN(input)
    print output

14 thoughts on “Converting infix to RPN (shunting-yard algorithm)

  1. Pingback: RPN Calculator (using Scala, python and Java) « andreINC.net

  2. HI,
    I have a C# implementation of this algorithm. If you are interested and want to publish, let me know and I will email you the code.

    Reply
  3. @Tom

    Of course you can email me the code if you want, and I’ll publish the code (ofcourse giving you the credits).

    Reply
  4. Here’s a Javascript implementation ripped straight from your Java one. Scouring through t’interwebs there are previous little Javascript implementations that are any good out there, so hopefully this will be useful to others and save them the couple of hours I spent first looking for an existing one then hacking this one together.

    // implementation of shunting yard
    // converted from the Java version at http://andreinc.net/2010/10/05/converting-infix-to-rpn-shunting-yard-algorithm/
    function infixToRPN(expression)
    {
        var LEFT_ASSOC = 0;
        var RIGHT_ASSOC = 1;
        var OPERATORS = {};
        OPERATORS['+'] = [ 0, LEFT_ASSOC ];
        OPERATORS['-'] = [ 0, LEFT_ASSOC ];
        OPERATORS['*'] = [ 5, LEFT_ASSOC ];
        OPERATORS['/'] = [ 5, LEFT_ASSOC ];
        OPERATORS['%'] = [ 5, LEFT_ASSOC ];
        OPERATORS['^'] = [ 10, RIGHT_ASSOC ];
    
        function isOperator(token)
        {
            if (OPERATORS[token]) return true;
            return false;
        }
    
        function isAssociative(token,type)
        {
            if (!isOperator(token)) {
                throw new Error("Invalid token: " + token);
            }
             if (OPERATORS[token][1] == type) {
                return true;
            }
            return false;
        }
    
        function cmpPrecedence(token1, token2) 
        {
            if (!isOperator(token1) || !isOperator(token2)) {
                throw new Error("Invalid tokens: " + token1
                        + " " + token2);
            }
            return OPERATORS[token1][0] - OPERATORS[token2][0];
        }
    
        function peek(array)
        {
            if (array.length &gt; 0) return (array[array.length-1]);
            return null;
        }
        
        // for now, split at space    
        var inputTokens = expression.split(" ");
        
        var out = [];
        var stack = [];
        
        // For all the input tokens [S1] read the next token [S2]
        for (var i = 0; i  0 &amp;&amp; isOperator(peek(stack)))
                {
                    // [S4]
                    if ((isAssociative(token, LEFT_ASSOC) &amp;&amp; cmpPrecedence(
                        token, peek(stack)) &lt;= 0)
                    || (isAssociative(token, RIGHT_ASSOC) &amp;&amp; cmpPrecedence(
                        token, peek(stack))  0 &amp;&amp; peek(stack) != "(")
                {
                    out.push(stack.pop()); // [S10]
                }
                stack.pop(); // [S11]
            } 
            else
            {
                out.push(token); // [S12]
            }
        }
    
        while (stack.length &gt; 0)
        {
            out.push(stack.pop()); // [S13]
        }
        return out;
    
    }
    
    Reply
    • Thanks for your comment and implementation. I’ve almost deserted this site. Recently I’ve decided to “revive” it, so that’s the reason for this very late reply.

      Reply
  5. Pingback: » AdvancedFilter: Parsing the Search String Spreadsheet Budget and Consulting

  6. Pingback: Memorized my reading ;) « Parnurzeal’s Weblog

  7. Pingback: Interesting sites | Latoto

Leave a reply

required


five + 1 =

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>