Hruday - Talasila

Spark

Learned fundamentals of Apache Spark and distributed data processing.
Developed data processing applications using PySpark's RDD and DataFrame APIs.
Performed ETL operations and transformations on large datasets.
Implemented Spark jobs for batch processing and real-time analytics.
Worked with Spark SQL to query structured data efficiently.

Completed an introductory online course on Apache Spark, gaining a foundation in big data processing and distributed computation. Learned how Spark handles large-scale data through Resilient Distributed Datasets (RDDs), DataFrames, and Spark SQL. Practiced basic transformations, actions, and data pipelines with PySpark in self-guided exercises. The course also covered how Spark integrates with tools such as Hadoop and how it is applied in data engineering workflows and analytics.
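
To illustrate the kind of exercise described above, here is a minimal sketch of a PySpark pipeline that applies lazy transformations, triggers them with an action, and then runs the same aggregation through Spark SQL. The function names (`parse_record`, `total_by_name`), the `payments` view name, and the "name,amount" input format are hypothetical, chosen only for illustration; it assumes a local PySpark installation.

```python
def parse_record(line):
    """Split a 'name,amount' line into a (name, amount) pair (hypothetical format)."""
    name, amount = line.split(",")
    return name.strip(), int(amount)


def total_by_name(lines):
    """Sum amounts per name, once with the RDD API and once with Spark SQL."""
    # Imported here so parse_record stays usable without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("spark-basics-sketch")
        .getOrCreate()
    )
    try:
        rdd = spark.sparkContext.parallelize(lines)

        # Transformations are lazy: map, filter, and reduceByKey only build a plan.
        pairs = rdd.map(parse_record).filter(lambda kv: kv[1] > 0)
        totals = pairs.reduceByKey(lambda a, b: a + b)

        # collect() is an action: it triggers execution and returns results
        # to the driver.
        rdd_result = dict(totals.collect())

        # The same aggregation expressed with a DataFrame and Spark SQL.
        df = spark.createDataFrame(pairs, ["name", "amount"])
        df.createOrReplaceTempView("payments")  # hypothetical view name
        rows = spark.sql(
            "SELECT name, SUM(amount) AS total FROM payments GROUP BY name"
        ).collect()
        sql_result = {row["name"]: row["total"] for row in rows}

        return rdd_result, sql_result
    finally:
        spark.stop()
```

Calling `total_by_name(["alice,10", "bob,5", "alice,7"])` runs both versions on a small in-memory sample; on real datasets the input would typically be read from files rather than passed in as a Python list.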