This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on “Spark with Hadoop – 1”.
1. Spark was initially started by ____________ at UC Berkeley AMPLab in 2009.
a) Mahek Zaharia
b) Matei Zaharia
c) Doug Cutting
d) Stonebraker
Explanation: Apache Spark is an open-source cluster computing framework. Matei Zaharia started it in 2009 at UC Berkeley's AMPLab.
2. Point out the correct statement.
a) The RDD abstraction provides distributed task dispatching, scheduling, and basic I/O functionalities
b) For cluster management, Spark supports its standalone cluster manager and Hadoop YARN
c) Hive SQL is a component on top of Spark Core
d) None of the mentioned
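Explanation: Spark requires a cluster manager and a distributed storage system. For cluster management, it supports its own standalone cluster manager, Hadoop YARN, and Apache Mesos.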
3. ____________ is a component on top of Spark Core.
a) Spark Streaming
b) Spark SQL
c) RDDs
d) All of the mentioned
Explanation: Spark SQL sits on top of Spark Core and introduced a data abstraction called SchemaRDD (renamed DataFrame in Spark 1.3). This abstraction provides support for structured and semi-structured data.
4. Spark SQL provides a domain-specific language to manipulate ___________ in Scala, Java, or Python.
a) Spark Streaming
b) Spark SQL
c) RDDs
d) All of the mentioned
Explanation: Spark SQL also provides SQL language support, with command-line interfaces and an ODBC/JDBC server.
5. Point out the wrong statement.
a) For distributed storage, Spark can interface with a wide variety of storage systems, including the Hadoop Distributed File System (HDFS)
b) Spark also supports a pseudo-distributed mode, usually used only for development or testing purposes
c) Spark had over 465 contributors in 2014
d) All of the mentioned
Explanation: Spark is one of the most active projects in the Apache Software Foundation and one of the most active Big Data open source projects.
6. ______________ leverages Spark Core's fast scheduling capability to perform streaming analytics.
a) MLlib
b) Spark Streaming
c) GraphX
d) RDDs
Explanation: Spark Streaming ingests data in mini-batches and performs RDD transformations on those mini-batches of data.
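The mini-batch model described in this explanation can be sketched in plain Python, without Spark. Incoming records are grouped into batches, and the same batch job (here a word count, i.e. a flatMap followed by a reduce) runs on each batch. This is an illustrative assumption, not Spark Streaming's API. Real Spark Streaming groups records by a time-based batch interval rather than by record count. The helper names `micro_batches`, `word_count`, and `run_stream` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def micro_batches(records: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an incoming record stream into fixed-size mini-batches.

    Hypothetical helper: Spark Streaming batches by time interval, not by
    count; a count is used here only to keep the sketch deterministic.
    """
    batch: List[str] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


def word_count(batch: List[str]) -> Counter:
    """Per-batch job: split lines into words (flatMap), then count them (reduce)."""
    return Counter(word for line in batch for word in line.split())


def run_stream(records: Iterable[str], batch_size: int) -> List[Counter]:
    """Apply the same batch job to every mini-batch, one result per batch."""
    return [word_count(b) for b in micro_batches(records, batch_size)]


if __name__ == "__main__":
    stream = ["spark streaming", "spark core", "mini batch", "spark"]
    for i, counts in enumerate(run_stream(stream, batch_size=2)):
        print(i, dict(counts))
```

Each batch produces its own independent result, which mirrors how each mini-batch becomes a small batch job over an RDD.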
7. ____________ is a distributed machine learning framework on top of Spark.
a) MLlib
b) Spark Streaming
c) GraphX
d) RDDs
Explanation: MLlib implements many common machine learning and statistical algorithms to simplify large-scale machine learning pipelines.
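To illustrate what "distributed" means for an iterative learning algorithm, here is a plain-Python sketch of data-parallel gradient descent for a one-feature linear regression. Each partition computes a partial gradient over its own points (the "map" side), and the driver sums the partials and takes one step (the "reduce" side). This is a hypothetical illustration of the general pattern under simplifying assumptions (in-memory lists stand in for partitions, fixed learning rate). It is not MLlib's code or API, and the names `partial_gradient` and `train` are made up for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (feature x, label y)


def partial_gradient(partition: List[Point], w: float, b: float) -> Tuple[float, float, int]:
    """Per-partition step: sum squared-error gradients over local points only."""
    gw = gb = 0.0
    for x, y in partition:
        err = (w * x + b) - y
        gw += err * x
        gb += err
    return gw, gb, len(partition)


def train(partitions: List[List[Point]], lr: float = 0.1, iters: int = 1000) -> Tuple[float, float]:
    """Driver loop: share (w, b), gather partial gradients, take one averaged step."""
    w, b = 0.0, 0.0
    for _ in range(iters):
        parts = [partial_gradient(p, w, b) for p in partitions]  # "map" over partitions
        gw = sum(p[0] for p in parts)                              # "reduce" on the driver
        gb = sum(p[1] for p in parts)
        n = sum(p[2] for p in parts)
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


if __name__ == "__main__":
    # Points on y = 2x + 1, split across two "partitions".
    print(train([[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]))
```

Because only small gradient sums leave each partition, the result does not depend on how the points are split. This independence is what lets the work spread across a cluster.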
8. ________ is a distributed graph processing framework on top of Spark.
a) MLlib
b) Spark Streaming
c) GraphX
d) All of the mentioned
Explanation: GraphX started as a research project at UC Berkeley's AMPLab and Databricks and was later donated to the Spark project.
9. GraphX provides an API for expressing graph computation that can model the __________ abstraction.
a) GaAdt
b) Spark Core
c) Pregel
d) None of the mentioned
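Explanation: GraphX is a graph processing framework, not a machine learning library. Its Pregel API expresses graph computations as vertex-centric supersteps in which vertices exchange messages along edges until no vertex has work left.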
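The Pregel model can be sketched in plain Python with connected components, a classic Pregel example (GraphX's own connected-components algorithm is built on its Pregel operator). In each superstep, active vertices send their current label to their neighbors. Messages to the same vertex are combined by taking the minimum, and a vertex stays active only if its label shrank. This is a single-machine illustration of the abstraction, not GraphX's API. The function name `pregel_connected_components` is hypothetical, and edges are treated as undirected.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def pregel_connected_components(
    vertices: List[int], edges: List[Edge], max_supersteps: int = 100
) -> Dict[int, int]:
    """Label every vertex with the smallest vertex id in its (undirected) component."""
    neighbors: Dict[int, List[int]] = {v: [] for v in vertices}
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)

    state = {v: v for v in vertices}  # initial vertex value: its own id
    active = set(vertices)            # first superstep: every vertex sends

    for _ in range(max_supersteps):
        if not active:
            break  # no vertex changed in the previous superstep: halt
        # Send phase: active vertices message their neighbors; combine with min.
        inbox: Dict[int, int] = {}
        for v in active:
            for n in neighbors[v]:
                inbox[n] = min(inbox[n], state[v]) if n in inbox else state[v]
        # Vertex program: adopt a smaller label; only changed vertices stay active.
        active = set()
        for v, msg in inbox.items():
            if msg < state[v]:
                state[v] = msg
                active.add(v)
    return state


if __name__ == "__main__":
    print(pregel_connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
    # prints {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```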
10. Spark's MLlib is ___________ times as fast as the Hadoop disk-based version of Apache Mahout and even scales better than Vowpal Wabbit.
a) 10
b) 20
c) 50
d) 100
Explanation: Spark has proven scalability to over 8000 nodes in production.
Sanfoundry Global Education & Learning Series – Hadoop.