Consider the following code snippet.

rdd = sc.parallelize(["user1,password1", "user2,password2", "user2,password2", "user4,password4", "user1,password1"])

The above dataset indicates the login details of each user.

The schema is (username, password).

Choose the code snippet that generates the following output (the order of the elements may vary, since reduceByKey does not guarantee ordering): [("user4", 1), ("user2", 2), ("user1", 2)]

a. rdd.map(lambda x: x.split(",")[0]).map(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y).collect()

b. rdd.flatMap(lambda x: x.split(",")[0]).map(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y).collect()

c. rdd.map(lambda x: x.split(",")[0]).flatMap(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y).collect()

d. rdd.map(lambda x: x.split(",")[0]).filter(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y).collect()

Verified Answer
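Correct Option - a (the first snippet). The first map extracts the username from each record, the second map pairs each username with 1, and reduceByKey sums those 1s per username.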
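
The difference between the four snippets can be checked without a Spark cluster. The following is a minimal plain-Python sketch, not PySpark itself: list comprehensions stand in for map, flatMap and filter, and reduce_by_key is a hypothetical helper written here to imitate reduceByKey on a local list. It assumes the same five records as the question.

```python
records = ["user1,password1", "user2,password2", "user2,password2",
           "user4,password4", "user1,password1"]

def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for RDD.reduceByKey (not a PySpark API):
    merges the values of equal keys with func."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged

# First snippet: map -> map -> reduceByKey.
usernames = [line.split(",")[0] for line in records]   # map: keep the username
pairs_a = [(name, 1) for name in usernames]            # map: (username, 1)
counts_a = reduce_by_key(pairs_a, lambda x, y: x + y)  # sum the 1s per username

# Second snippet: flatMap over a string flattens it into single characters,
# so the keys become letters and digits instead of usernames.
chars = [ch for line in records for ch in line.split(",")[0]]
counts_b = reduce_by_key([(ch, 1) for ch in chars], lambda x, y: x + y)

# Third snippet: flatMap over (username, 1) flattens each tuple,
# so the elements are no longer (key, value) pairs.
flattened_c = [item for name in usernames for item in (name, 1)]

# Fourth snippet: filter keeps an element when the predicate is truthy.
# (name, 1) is a non-empty tuple and always truthy, so nothing changes
# and the elements stay plain strings rather than pairs.
filtered_d = [name for name in usernames if (name, 1)]

print(counts_a)
print(counts_b)
print(flattened_c)
print(filtered_d)
```

Only the first pipeline hands reduceByKey a collection of (username, 1) pairs. The second counts characters, and the third and fourth never build key-value pairs at all, so none of them can yield the target output.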
