When you’re trying out Spark with its Python REPL, it’s really easy to write things using simple functions or lambdas. However, it becomes a pain in the ass once you start trying more complex stuff, because it’s easy to slip up on something like indentation.
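To make that contrast concrete, here is a rough sketch in plain Python. It uses the built-in map on a list as a stand-in for an RDD, and the names lines and classify are made up for illustration, so treat it as an approximation of what you would type at the pyspark prompt rather than actual Spark code.

```python
# Stand-in data: in the pyspark shell this would be an RDD rather than a list.
lines = ["spark is fast", "", "python repl"]

# Simple logic fits in a one-line lambda, which is easy to type at a prompt.
lengths = list(map(lambda line: len(line), lines))

# Branching logic needs a multi-line def, and at a bare REPL prompt
# every indentation level has to be typed correctly on the first try.
def classify(line):
    words = line.split()
    if not words:
        return "empty"
    if len(words) > 2:
        return "long"
    return "short"

labels = list(map(classify, lines))

print(lengths)
print(labels)
```

The lambda version is a one-liner, while a single misplaced space in classify means retyping the whole function.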
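Try starting pyspark with this command (it relies on IPYTHON_OPTS, which works on Spark 1.x; Spark 2.0 and later removed it in favor of PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS):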
IPYTHON_OPTS="notebook" path/to/your/pyspark
It will start an IPython Notebook in your browser with the SparkContext available as the sc variable. You can start using it like this: