Consider the following code snippet:
rdd=sc.parallelize([("1001","john","finance","20000"),("1002","Harry","finance",""),("1003","marry","finance","30000"),("1004","jim","finance","")])
The dataset above contains employee information with the schema (employeeId, employeeName, department, salary).
Complete the code to find how many employees have a missing salary component. Choose three options.
rdd.map(lambda x: (x[0],x[3])).filter(lambda x: x[1]=="").count()
rdd.map(lambda x: (x[0],x[3])).filter(lambda x: x[1]<>"").count()
rdd.map(lambda x: (x[0],x[3])).filter(lambda x: x[1]=="null").count()
rdd.map(lambda x: (x[0],x[3])).filter(lambda x: len(x[1])==0).count()
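The sketch below runs each candidate filter on the sample data. It uses plain Python lists in place of the RDD so that it runs without a SparkContext. The helper `count_where` is hypothetical and exists only for illustration: it mimics `.filter(predicate).count()`. The comprehension that builds `pairs` mirrors `.map(lambda x: (x[0], x[3]))`.

```python
# Plain-Python stand-in for the RDD in the question (assumption: no Spark
# needed to show how each filter predicate behaves on this small dataset).
records = [
    ("1001", "john", "finance", "20000"),
    ("1002", "Harry", "finance", ""),
    ("1003", "marry", "finance", "30000"),
    ("1004", "jim", "finance", ""),
]

# Equivalent of rdd.map(lambda x: (x[0], x[3])): keep (employeeId, salary).
pairs = [(x[0], x[3]) for x in records]

def count_where(predicate):
    """Hypothetical helper: equivalent of .filter(predicate).count()."""
    return sum(1 for p in pairs if predicate(p))

missing_eq_empty = count_where(lambda x: x[1] == "")      # option 1
non_empty = count_where(lambda x: x[1] != "")             # option 2 (`<>` in Python 2)
literal_null = count_where(lambda x: x[1] == "null")      # option 3
missing_len_zero = count_where(lambda x: len(x[1]) == 0)  # option 4

# IDs of the employees whose salary field is an empty string.
missing_ids = [p[0] for p in pairs if p[1] == ""]

print(missing_eq_empty, non_empty, literal_null, missing_len_zero)  # → 2 2 0 2
print(missing_ids)  # → ['1002', '1004']
```

Options 1 and 4 both test for an empty salary string, so they count the missing salaries directly. The `<>` operator in option 2 is Python 2 syntax for `!=` and is a SyntaxError in Python 3. That option counts the employees who do have a salary. It returns 2 here only because this sample happens to have as many filled salaries as missing ones. Option 3 matches the literal string "null", which never appears in this data, so it returns 0.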