Important RGPV Questions
CE-803(D), Data Analytics, VIII Sem, CE
Unit I: Descriptive Statistics
Q.1 Define probability distribution with an example.
Q.2 What is inferential statistics? How is it different from descriptive statistics?
Q.3 Explain hypothesis testing with an example.
Q.4 What is the p-value and its significance in hypothesis testing?
Q.5 Describe the steps involved in hypothesis testing.
Q.6 What is regression analysis? Explain with an example.
Q.7 Differentiate between correlation and regression.
Q.8 What is ANOVA? Explain its significance.
Q.9 Compare one-way and two-way ANOVA.
Q.10 Explain the assumptions of regression analysis.
Q.11 How is standard deviation used in data analysis?
Q.12 Define and explain normal distribution.
Q.13 What are confidence intervals?
Q.14 What are the null hypothesis and the alternative hypothesis?
Q.15 Numerical: Perform a simple linear regression on a given dataset.
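
For the numerical in Q.15, a minimal sketch of one way to fit a simple linear regression by ordinary least squares is given here. The question does not supply a dataset, so the values in `xs` and `ys` are hypothetical sample data, and the helper name `fit_line` is an assumption made only for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample data (not from the question paper)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} x + {intercept:.2f}")  # y = 0.60 x + 2.20
```

In a written answer, the same steps apply by hand: compute the two means, then the slope as the ratio of the two sums, then the intercept from the means.
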
Unit II: Introduction to Big Data & Technologies
Q.1 Define Big Data. Why is it important today?
Q.2 What are the Four V’s of Big Data?
Q.3 Explain the key drivers for Big Data adoption.
Q.4 What is Big Data Analytics?
Q.5 List and explain applications of Big Data Analytics.
Q.6 What is Hadoop’s parallel world?
Q.7 What is open source technology in the context of Big Data?
Q.8 How is cloud computing used in Big Data?
Q.9 Explain predictive analytics with an example.
Q.10 What is Mobile Business Intelligence?
Q.11 Define Crowd Sourcing Analytics.
Q.12 What is meant by Inter- and Trans-Firewall Analytics?
Q.13 Explain the role of data discovery in Big Data.
Q.14 What is Information Management in Big Data?
Q.15 Compare traditional analytics with Big Data Analytics.
Unit III: Processing Big Data
Q.1 What is data integration in Big Data processing?
Q.2 How is data mapped to the programming framework?
Q.3 Explain the process of connecting and extracting data from storage.
Q.4 How is data transformed for processing?
Q.5 What is the role of data preparation for Hadoop MapReduce?
Q.6 Explain structured vs unstructured data in Big Data.
Q.7 What are the challenges in integrating disparate data stores?
Q.8 Discuss ETL in the context of Big Data.
Q.9 What are the key steps in a Big Data processing pipeline?
Q.10 Explain real-time vs batch processing.
Q.11 What is data ingestion in Big Data systems?
Q.12 How are data quality issues handled during processing?
Q.13 Describe the schema-on-read approach.
Q.14 Explain how MapReduce handles large datasets.
Q.15 Numerical: Create a simple data transformation pipeline with sample data.
Unit IV: Hadoop MapReduce
Q.1 What is Hadoop MapReduce?
Q.2 Describe the components of a MapReduce job.
Q.3 Explain how MapReduce distributes data processing.
Q.4 What are the roles of the JobTracker and the TaskTracker?
Q.5 How is data split in the MapReduce framework?
Q.6 Describe the lifecycle of a MapReduce job.
Q.7 Explain how to monitor the progress of job flows.
Q.8 What are the different execution modes in Hadoop?
Q.9 Compare the local, pseudo-distributed, and fully distributed modes.
Q.10 What are Hadoop daemons?
Q.11 Describe the role of HDFS in Hadoop.
Q.12 How is a MapReduce program created and run?
Q.13 What are the benefits of using Hadoop for Big Data?
Q.14 List the challenges in running Hadoop MapReduce.
Q.15 Numerical: Simulate a word count problem using MapReduce logic.
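
For the numerical in Q.15, the map, shuffle, and reduce steps of a word count can be sketched in plain Python as follows. This is only a single-process simulation of the MapReduce logic, not Hadoop code; the function names and the two input lines are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input split into two lines
lines = ["big data big tools", "data tools data"]
counts = word_count(lines)
print(counts)  # {'big': 2, 'data': 3, 'tools': 2}
```

In a written answer, the same trace can be shown as three tables: the (word, 1) pairs after the map step, the grouped lists after the shuffle, and the final totals after the reduce step.
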
Unit V: Big Data Tools and Techniques
Q.1 What is Apache Pig? How is it different from traditional databases?
Q.2 Compare Pig Latin with SQL.
Q.3 How is Pig installed and run?
Q.4 What are user-defined functions in Pig?
Q.5 List common data processing operators in Pig.
Q.6 What is Apache Hive?
Q.7 How is Hive installed and run?
Q.8 Explain HiveQL with example queries.
Q.9 Compare Hive with RDBMS.
Q.10 How are user-defined functions written in Hive?
Q.11 Describe the architecture of Hive.
Q.12 What is the use of Hive in data warehousing?
Q.13 Explain querying in Hive using practical examples.
Q.14 What is Oracle Big Data?
Q.15 Numerical: Write a Pig Latin script for filtering and grouping sample data.
— Best of Luck for Exam —