Consider the following code snippet:

rdd=sc.parallelize([("PROD001",8000,"USA"),("PROD002",3000,"INDIA"),("PROD001",8000,"INDIA"),("PROD002",9000,"USA")])

The dataset above gives the total sales amount of each product by country.

The schema is (productid, amount, country)

Which of the following code snippets return the total sales amount? The output should be 28000. Choose TWO options.

a. rdd.map(lambda x:(x[0],x[1])).map(lambda x:x[1]).sum()

b. rdd.map(lambda x:x[1]).sum()

c. rdd.map(lambda x:(x[0],x[1])).flatMap(lambda x : x[1]).sum()

d. rdd.map(lambda x:(x[0],x[1])).filter(lambda x : x[1]).sum()

Verified Answer
Correct options: a and b
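
The reasoning behind the answer can be checked without a Spark cluster. The sketch below is not PySpark code: it uses plain Python lists as a stand-in for the RDD, on the assumption that `map`, `flatMap`, `filter` and `sum` behave like their list-based equivalents for this small dataset. The names `rows`, `pairs`, `kept`, `total_a`, `total_b`, `error_c` and `error_d` are illustrative only. The first two options reduce each record to its amount before summing. The `flatMap` option fails because the lambda returns an integer, which is not iterable. The `filter` option fails because `filter` keeps the whole (productid, amount) tuples, and tuples cannot be added to the numeric starting value of the sum.

```python
from itertools import chain

# Same records as the RDD in the question: (productid, amount, country)
rows = [("PROD001", 8000, "USA"), ("PROD002", 3000, "INDIA"),
        ("PROD001", 8000, "INDIA"), ("PROD002", 9000, "USA")]

# First option: map to (productid, amount), then map to the amount, then sum
pairs = [(x[0], x[1]) for x in rows]
total_a = sum(x[1] for x in pairs)

# Second option: map straight to the amount, then sum
total_b = sum(x[1] for x in rows)

# flatMap option: the function must return an iterable to flatten,
# but x[1] is an int, so flattening raises TypeError
try:
    list(chain.from_iterable(x[1] for x in pairs))
    error_c = None
except TypeError as exc:
    error_c = type(exc).__name__

# filter option: the lambda is only a predicate, so the tuples survive
# unchanged and summing them raises TypeError (int + tuple)
kept = [x for x in pairs if x[1]]
try:
    sum(kept)
    error_d = None
except TypeError as exc:
    error_d = type(exc).__name__

print(total_a, total_b, error_c, error_d)
```

In a real PySpark session the two failing options would be expected to surface the same `TypeError` only when the `sum()` action runs, wrapped in a Spark job failure, since transformations are evaluated lazily.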
