Consider the code snippet below:
rdd=sc.parallelize([("PROD001",8000,"USA"),("PROD002",3000,"INDIA"),("PROD001",8000,"INDIA"),("PROD002",9000,"USA")])
The dataset above gives the total sales amount of each product by country.
The schema is (productid, amount, country).
Which of the code snippets below return the total sales amount? The output should be 28000. Choose TWO options.
rdd.map(lambda x:(x[0],x[1])).map(lambda x:x[1]).sum()
rdd.map(lambda x:x[1]).sum()
rdd.map(lambda x:(x[0],x[1])).flatMap(lambda x : x[1]).sum()
rdd.map(lambda x:(x[0],x[1])).filter(lambda x : x[1]).sum()
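To check the four options without a Spark cluster, the sketch below emulates them in plain Python. It is an approximation, not PySpark itself: the built-in map and filter stand in for the RDD methods of the same name, itertools.chain.from_iterable stands in for flatMap, and the built-in sum stands in for RDD.sum(). The names data, option_1 to option_4, run and results are made up for this illustration. Under those assumptions, only the first two options produce 28000. The flatMap option fails because the lambda returns an int, which cannot be flattened, and the filter option fails because filter keeps the whole (productid, amount) tuples, which cannot be added to a number. In real PySpark the same failures would be expected to surface only when sum() runs, wrapped in a Spark job error.

```python
from itertools import chain

# Same records as in the question: (productid, amount, country)
data = [("PROD001", 8000, "USA"), ("PROD002", 3000, "INDIA"),
        ("PROD001", 8000, "INDIA"), ("PROD002", 9000, "USA")]

def option_1(rows):
    # map to (productid, amount), map to amount, then sum
    pairs = map(lambda x: (x[0], x[1]), rows)
    return sum(map(lambda x: x[1], pairs))

def option_2(rows):
    # map straight to amount, then sum
    return sum(map(lambda x: x[1], rows))

def option_3(rows):
    # flatMap stand-in: flatten whatever the lambda returns per element.
    # The lambda returns an int, which is not iterable.
    pairs = map(lambda x: (x[0], x[1]), rows)
    return sum(chain.from_iterable(map(lambda x: x[1], pairs)))

def option_4(rows):
    # filter keeps every tuple (all amounts are truthy), so sum
    # is asked to add tuples to the starting value 0.
    pairs = map(lambda x: (x[0], x[1]), rows)
    return sum(filter(lambda x: x[1], pairs))

def run(option):
    # Return the total, or a short description of the failure.
    try:
        return option(data)
    except TypeError as exc:
        return "TypeError: " + str(exc)

results = {f.__name__: run(f) for f in (option_1, option_2, option_3, option_4)}
for name, outcome in results.items():
    print(name, "->", outcome)
```

The second option is the more direct of the two working answers, since the intermediate (productid, amount) pair in the first option is discarded immediately.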