Preprocessing data using PySpark