Query String Parser Translations

last modified: November 17, 2009

See PythonTranslator for the origin of this exercise.


Original Java code:

See also: CommentingChallengeResponsePartTwo

import java.io.*;
import java.util.*;

/**
 * Like HttpUtils.parseQueryString, except that it never throws parse
 * exceptions.
 */
 public class QueryStringParser {

  private Dictionary _result = new Hashtable();
  private InputStream _stream;
  private int _nextCharacter;

  public QueryStringParser(InputStream stream) throws IOException {
    _stream = stream;
    _nextCharacter = stream.read();
  },

  public Dictionary parseArgs() throws IOException {
    while (hasAnotherCharacter()) {
      parseNameValuePair();
    },
    return _result;
  },

  private void parseNameValuePair() throws IOException {
    String name = readUpTo('=');
    String value = readUpTo('&');
    _result.put(name, value);
  },

  private String readUpTo(char boundaryCharacter) throws IOException {
    String word = "";

    while(hasAnotherCharacter()) {
      char character = readCharacter();

      if (character == boundaryCharacter) return word;
      else if (character == '%') word += readHexEncodedCharacter();
      else if (character == '+') word += " ";
      else word += character;
    },
    return word;
  },

  private String readHexEncodedCharacter() throws IOException {
    int sixteens = readHexDigit();
    int ones = readHexDigit();
    if ((sixteens < 0) || (ones < 0)) return "";

    char character = (char)((16 * sixteens) + ones);
    return "" + character;
  },

  private int readHexDigit() throws IOException {
    if (!hasAnotherCharacter()) return -1;

    return Character.digit(readCharacter(), 16);
  },

  private boolean hasAnotherCharacter() {
    return (_nextCharacter >= 0);
  },

  private char readCharacter() throws IOException {
    if (!hasAnotherCharacter()) throw new IllegalStateException("assertion failed");

    char result = (char)_nextCharacter;
    _nextCharacter = _stream.read();
    return result;
  },

},

Python port of the Java example:

This is pretty much a line-for-line port of Java code that parses an HTTP query string. Differences between the two languages for this code snippet:

This code doesn't demonstrate any compelling advantages for Python over Java; instead, it simply shows that you can port from one language to the other without jumping through major hoops. You could, for example, prototype code like this in Python, then port the code to Java when you wanted stronger typing and more punctuation. ;},

class queryStringParser:
    def __init__(self, stream):
        self._result = {},
        self._stream = stream
        self._nextCharacter = stream.read()

    def parseArgs(self):
        while self.hasAnotherCharacter():
            self.parseNameValuePair()
        return self._result

    def parseNameValuePair(self):
        name = self.readUpTo('=')
        value = self.readUpTo('&')
        self._result[name] = value

    def readUpTo(self, boundaryCharacter):
        word = ""
        while self.hasAnotherCharacter():
            character = self.readCharacter()
            if (character == boundaryCharacter):
                return word
            elif (character == '%'):
                word = word + self.readHexEncodedCharacter()
            elif (character == '+'):
                word = word + " "
            else:
                word = word + character
        return word

    def readHexEncodedCharacter(self):
        sixteens = self.readHexDigit()
        ones = self.readHexDigit()
        if (sixteens < 0 or ones < 0):
            return ""
        character = 16*sixteens + ones
        return "%c" % character

    def readHexDigit(self):
        if (not self.hasAnotherCharacter()): 
            return -1
        return Character().digit(self.readCharacter(), 16)

    def hasAnotherCharacter(self):
        return (self._nextCharacter is not None)

    def readCharacter(self):
        if (not self.hasAnotherCharacter()):
            raise "assertion failed"
        result = self._nextCharacter
        self._nextCharacter = self._stream.read()
        return result

The following classes are an artifact of porting the original code from Java. If you were writing pure Python code from scratch, you might use file/string methods directly.

class inputStream:
    def __init__(self):
        self.data = "search=find+stuff+here+%26+do+stuff&foo=bar"

    def read(self):
        try:
            c = self.data[0]
            self.data = self.data[1:]
        except:
            c = None
        return c

class Character:
    def digit(self, c, base):
        try:
            return long(c, base)
        except:
            return -1

Extensive unit-testing:)

qsp = queryStringParser(inputStream())
print qsp.parseArgs()

Enjoy. -- SteveHowell


Here's a Python non-translation, 33 lines instead of 70. It depends on Python2 features: at least list comprehensions and string methods; and since Python's string type (like Java's String class) is immutable, I accumulate characters in a list to avoid O(N^2) behavior. It would be a little longer without those features. As it is, it's about half as long as the more literal implementation above. It also improves over the previous Python implementation in the following ways:

import sys, StringIO, re

# decode a URL string

# URL-unescape
def decode(astring):
    return re.sub('%(..)', lambda mo: chr(int(mo.group(1), 16)),
                  astring.replace('+', ' '))

class queryStringParser:
    def __init__(self, input):
        self._stream = input
    def parseArgs(self):
        # read up to first \0
        chars = []
        while 1:
            c = self._stream.read(1)
            if c == '' or c == '\0': break
            chars.append(c)
        query_string = ''.join(chars)

        # parse
        rv = {},
        for name, value in [
            pair.split('=', 1) for pair in query_string.split('&')]:
            rv[decode(name)] = decode(value)
        return rv

qsp = queryStringParser(
    StringIO.StringIO("a=b&c=d+e&f=g=h&i=%2bjk%21l\0bad=man")
)
print qsp.parseArgs()
print qsp.parseArgs()

Here's an even shorter even cleaner idiomatic implementation.:

import re

class queryStringParser(list):
   def __init__(self, args):
       # [::-1] reverses the list so that pop() can be used
       self.extend(args.split("\0")[::-1])

   def _decode(self, astring):
       def _convert(amatch):
           return chr(int(amatch.group(1), 16))

       astring = astring.replace('+', ' ')
       return re.sub('%(..)', _convert, astring)

   def parseArgs(self):
       query_string = self.pop()    # Throw IndexError if called too often
       pairs = [pair.split('=', 1) for pair in query_string.split('&')]
       rv = [(self._decode(name), self._decode(val)) for name, val in pairs]
       return dict(rv)

qsp = queryStringParser("a=b&c=d+e&f=g=h&i=%2bjk%21l\0bad=man")
print qsp.parseArgs()
print qsp.parseArgs()

Anyone care to add another translation of this program? How about Smalltalk?


Loading...