Free search string tokenization in Python

Want to do some simple lexical parsing in Python? With the standard library's shlex module, you may be able to get something that meets your requirements almost for free. Here is an example I used recently to parse a search string. The requirements were that tokens can be separated by spaces or commas, and that a double-quoted phrase counts as a single token.

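import shlex

def _tokens(query):
    # Treat commas as whitespace so they separate tokens, just like spaces.
    lexer = shlex.shlex(str(query), posix=True)
    lexer.whitespace += ","
    lexer.whitespace_split = True
    return list(lexer)


>>> _tokens("java, perl, c++")
['java', 'perl', 'c++']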

>>> _tokens("java perl c++")
['java', 'perl', 'c++']

>>> _tokens("java perl c++ \"Phil's Staffing\"")
['java', 'perl', 'c++', "Phil's Staffing"]
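One caveat: shlex raises a ValueError when a quote is left unbalanced, which is easy to hit in a search box (an apostrophe outside double quotes is enough). Here is a minimal sketch of one way to cope, assuming that falling back to a plain whitespace split is acceptable for your search. The name safe_tokens is hypothetical, and it uses plain shlex.split for brevity; the same try/except should work around any shlex-based tokenizer.

```python
import shlex

def safe_tokens(query):
    """Hypothetical helper: tokenize with shlex, tolerating unbalanced quotes."""
    text = str(query)
    try:
        return shlex.split(text)
    except ValueError:
        # shlex could not find a closing quote (e.g. a bare apostrophe),
        # so fall back to a plain whitespace split (an assumed policy).
        return text.split()

print(safe_tokens('java "Phil\'s Staffing"'))
print(safe_tokens("java Phil's"))
```

The first call goes through shlex as usual; the second has a dangling apostrophe, so it takes the fallback path instead of raising.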

I'm currently working at NerdWallet, a startup in San Francisco trying to bring clarity to all of life's financial decisions. We're hiring like crazy. Hit me up on Twitter, I would love to talk.

Follow @chase_seibert on Twitter