Consider the following code snippet.
rdd=sc.parallelize(["user1,password1","user2,password2","user2,password2","user4,password4","user1,password1"])
This dataset records the login details of each user.
The schema is (username, password).
Choose the code snippet that generates the following output (the order of the pairs returned by collect() may vary): [("user4",1),("user2",2),("user1",2)]
rdd.map(lambda x:x.split(",")[0]).map(lambda x:(x,1)).reduceByKey(lambda x,y:x+y).collect()
rdd.flatMap(lambda x:x.split(",")[0]).map(lambda x:(x,1)).reduceByKey(lambda x,y:x+y).collect()
rdd.map(lambda x:x.split(",")[0]).flatMap(lambda x:(x,1)).reduceByKey(lambda x,y:x+y).collect()
rdd.map(lambda x:x.split(",")[0]).filter(lambda x:(x,1)).reduceByKey(lambda x,y:x+y).collect()
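The difference between map and flatMap in these options can be sketched in plain Python, without a Spark session. This is only an emulation, not PySpark itself: the names logins, usernames, counts and flat_chars are illustrative, and a Counter stands in for what reduceByKey does with an addition function. It shows that mapping each line to its username and then to a (username, 1) pair yields per-user counts, while flattening the username string yields single characters instead of usernames.

```python
from collections import Counter

# Same records as the RDD in the question (illustrative local copy).
logins = ["user1,password1", "user2,password2", "user2,password2",
          "user4,password4", "user1,password1"]

# map: exactly one output per input, here the username before the comma.
usernames = [line.split(",")[0] for line in logins]

# map to (key, 1) pairs, then sum the values per key.
# This emulates reduceByKey(lambda x, y: x + y).
pairs = [(name, 1) for name in usernames]
counts = Counter()
for name, one in pairs:
    counts[name] += one

# flatMap iterates over whatever the function returns; a string is
# iterated character by character, so usernames are broken apart.
flat_chars = [ch for line in logins for ch in line.split(",")[0]]

print(sorted(counts.items()))  # [('user1', 2), ('user2', 2), ('user4', 1)]
print(flat_chars[:5])          # ['u', 's', 'e', 'r', '1']
```

The sketch sorts the counts only to make the printed result deterministic; in Spark, the order of pairs returned by collect() after reduceByKey depends on partitioning.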