Consider the following code snippet:

rdd=sc.parallelize([("1001","john","finance","20000"),("1002","Harry","finance",""),("1003","marry","finance","30000"),("1004","jim","finance","")])

The dataset above contains employee information with the schema (employeeId, employeeName, department, salary). A missing salary is stored as an empty string.

Which of the following expressions find how many employees have a missing salary? Choose three options.

a) rdd.map(lambda x: (x[0],x[3])).filter(lambda x: x[1]=="").count()

b) rdd.map(lambda x: (x[0],x[3])).filter(lambda x: x[1]<>"").count()

c) rdd.map(lambda x: (x[0],x[3])).filter(lambda x: x[1]=="null").count()

d) rdd.map(lambda x: (x[0],x[3])).filter(lambda x: len(x[1])==0).count()

Verified Answer
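Correct Options - a, b, d

Note: the first and fourth options (a and d) test for an empty salary string, and each returns 2. The second option (b) uses `<>`, the Python 2 not-equal operator, which was removed in Python 3. It counts the employees whose salary is present and returns 2 here only because the dataset happens to contain two rows of each kind. The third option (c) compares against the literal string "null", which no row contains, so it returns 0.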
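
The four filters can be checked without a Spark cluster by applying the same lambdas to a plain Python list. This is only a rough local sketch: `rows` and the helper `count_matching` are hypothetical names that stand in for the RDD and for the `map(...).filter(...).count()` chain, and `!=` is used in place of `<>` on the assumption that the code runs under Python 3.

```python
# Same rows as the RDD in the question, held in a plain Python list
# (assumption: a local list stands in for sc.parallelize(...)).
rows = [
    ("1001", "john", "finance", "20000"),
    ("1002", "Harry", "finance", ""),
    ("1003", "marry", "finance", "30000"),
    ("1004", "jim", "finance", ""),
]


def count_matching(data, predicate):
    """Hypothetical local stand-in for rdd.map(...).filter(predicate).count()."""
    # Project each row to (employeeId, salary), as the map step does.
    projected = map(lambda x: (x[0], x[3]), data)
    # Count the projected pairs that satisfy the filter predicate.
    return sum(1 for pair in projected if predicate(pair))


empty_string = count_matching(rows, lambda x: x[1] == "")      # first option
not_empty = count_matching(rows, lambda x: x[1] != "")         # second option, with != for <>
literal_null = count_matching(rows, lambda x: x[1] == "null")  # third option
zero_length = count_matching(rows, lambda x: len(x[1]) == 0)   # fourth option

print(empty_string, not_empty, literal_null, zero_length)  # prints: 2 2 0 2
```

The not-equal filter matches the other two counts only because this dataset has as many employees with a salary as without one. Adding one more row with a salary would change its result to 3 while the empty-string checks stay at 2.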
