When you’re trying out Spark with its Python REPL, it’s really easy to write things using simple functions or lambdas. However, it becomes a real pain once you start trying more complex things, because it’s easy to slip up on something like indentation.
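
To make that concrete, here is a rough sketch of the kind of multi-line function that gets awkward to type into the plain REPL. The name `parse_log_line` and the "LEVEL message" log format are made up for illustration. It is exercised with Python's built-in `map` so that it runs without Spark, but the same function could be passed to an RDD's `map`.

```python
def parse_log_line(line):
    """Split a 'LEVEL message' line into a (level, message) tuple.

    Hypothetical helper for illustration; the log format is an assumption.
    """
    parts = line.strip().split(" ", 1)
    if len(parts) != 2:
        # Malformed line: tag it so it can be filtered out later.
        return ("UNKNOWN", line.strip())
    level, message = parts
    return (level.upper(), message)


sample_lines = ["info started job", "error disk full", "oops"]

# Plain-Python stand-in for something like rdd.map(parse_log_line).collect()
parsed = list(map(parse_log_line, sample_lines))
print(parsed)
```

One wrong indent in the `if` branch and the REPL makes you retype the whole function, which is where a notebook cell you can edit and rerun helps.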

Try starting pyspark with this command:

IPYTHON_OPTS="notebook" path/to/your/pyspark

This starts an IPython Notebook in your browser, with the SparkContext available as the variable sc. You can start using it like this: