Sunday, December 27, 2020

Hadoop Questions and Answers – Spark with Hadoop – 1

This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on "Spark with Hadoop – 1".

1. Spark was initially started by ____________ at UC Berkeley AMPLab in 2009.
a) Mahek Zaharia
b) Matei Zaharia
c) Doug Cutting
d) Stonebraker

Answer: b
Explanation: Matei Zaharia started Spark in 2009 at UC Berkeley's AMPLab. Apache Spark is an open-source cluster computing framework that originated there.

2. Point out the correct statement.
a) RSS abstraction provides distributed task dispatching, scheduling, and basic I/O functionalities
b) For cluster management, Spark supports standalone mode and Hadoop YARN
c) Hive SQL is a component on top of Spark Core
d) None of the mentioned

Answer: b
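Explanation: Spark requires a cluster manager and a distributed storage system. For cluster management it supports standalone mode (a native Spark cluster), Hadoop YARN, or Apache Mesos.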

3. ____________ is a component on top of Spark Core.
a) Spark Streaming
b) Spark SQL
c) RDDs
d) All of the mentioned

Answer: b
Explanation: Spark SQL introduced a data abstraction called SchemaRDD (renamed DataFrame in Spark 1.3), which provides support for structured and semi-structured data.

4. Spark SQL provides a domain-specific language to manipulate ___________ in Scala, Java, or Python.
a) Spark Streaming
b) Spark SQL
c) RDDs
d) All of the mentioned

Answer: c
Explanation: Besides the domain-specific language, Spark SQL provides SQL language support, with command-line interfaces and an ODBC/JDBC server.

5. Point out the wrong statement.
a) For distributed storage, Spark can interface with a wide variety of systems, including Hadoop Distributed File System (HDFS)
b) Spark also supports a pseudo-distributed mode, usually used only for development or testing purposes
c) Spark had over 465 contributors in 2014
d) All of the mentioned

Answer: d
Explanation: With over 465 contributors in 2014, Spark was at that time the most active project in the Apache Software Foundation and among Big Data open source projects.

6. ______________ leverages Spark Core's fast scheduling capability to perform streaming analytics.
a) MLlib
b) Spark Streaming
c) GraphX
d) RDDs

Answer: b
Explanation: Spark Streaming ingests data in mini-batches and performs RDD transformations on those mini-batches of data.
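
The mini-batch idea in this explanation can be illustrated without Spark at all. The following is a minimal plain-Python sketch, not Spark Streaming's API: the helper names `mini_batches` and `count_words` and the sample input are hypothetical, and batches are cut by record count for simplicity, whereas Spark Streaming cuts them by time interval.

```python
from collections import Counter
from itertools import islice


def mini_batches(stream, batch_size):
    """Yield successive lists of up to batch_size records from an iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def count_words(batch):
    """Per-batch transformation: split each line into words and count them."""
    return Counter(word for line in batch for word in line.split())


# Hypothetical input: four lines arriving on a stream, grouped two at a time.
lines = ["spark streaming", "spark core", "graphx", "spark sql"]

# The same transformation is applied independently to every mini-batch.
batch_counts = [count_words(batch) for batch in mini_batches(lines, batch_size=2)]
```

Each element of `batch_counts` holds the word counts for one mini-batch only, which mirrors how a batch-oriented engine can serve streaming workloads by running the same transformation over small slices of the input.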

7. ____________ is a distributed machine learning framework on top of Spark.
a) MLlib
b) Spark Streaming
c) GraphX
d) RDDs

Answer: a
Explanation: MLlib implements many common machine learning and statistical algorithms to simplify large scale machine learning pipelines.

8. ________ is a distributed graph processing framework on top of Spark.
a) MLlib
b) Spark Streaming
c) GraphX
d) All of the mentioned

Answer: c
Explanation: GraphX started initially as a research project at UC Berkeley AMPLab and Databricks, and was later donated to the Spark project.

9. GraphX provides an API for expressing graph computation that can model the __________ abstraction.
a) GaAdt
b) Spark Core
c) Pregel
d) None of the mentioned

Answer: c
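Explanation: Pregel is a vertex-centric model in which vertices exchange messages over a series of supersteps. GraphX provides an API that can express this model, along with an optimized runtime for it.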
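
To make the Pregel abstraction named in question 9 more concrete, here is a minimal plain-Python sketch of vertex-centric computation in supersteps. It is not the GraphX API: the function `pregel_max`, its parameters, and the sample graph are all hypothetical, and the task (spreading the largest vertex value to every reachable vertex) is chosen only because it is small.

```python
def pregel_max(values, edges, max_supersteps=20):
    """Propagate the maximum vertex value through a graph in supersteps.

    values: dict mapping each vertex to its initial value
    edges:  dict mapping each vertex to a list of out-neighbours
    """
    values = dict(values)

    # Superstep 0: every vertex sends its value to its out-neighbours.
    inbox = {v: [] for v in values}
    for v, val in values.items():
        for n in edges.get(v, []):
            inbox[n].append(val)

    for _ in range(max_supersteps):
        outbox = {v: [] for v in values}
        any_message = False
        for v, messages in inbox.items():
            # A vertex only acts when an incoming message improves its value.
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                for n in edges.get(v, []):
                    outbox[n].append(values[v])
                    any_message = True
        if not any_message:
            break  # no vertex sent anything, so the computation halts
        inbox = outbox
    return values


# Hypothetical four-vertex chain a - b - c - d with edges in both directions.
initial = {"a": 3, "b": 6, "c": 2, "d": 1}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result = pregel_max(initial, neighbours)
```

In this sketch every vertex ends up holding the value 6, and the loop stops as soon as a superstep produces no messages, which is the usual halting condition in the Pregel model.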

10. Because of Spark's distributed memory-based architecture, MLlib is ___________ times as fast as Hadoop's disk-based Apache Mahout and even scales better than Vowpal Wabbit.
a) 10
b) 20
c) 50
d) 100

Answer: a
Explanation: Spark has also been shown to scale to over 8,000 nodes in production.

Sanfoundry Global Education & Learning Series – Hadoop.
