Spark


Course Info

  • Date: 17 Jun 2024
  • Category: Database
  • Practiced In: Localhost

About Course

  • Learned the fundamentals of Apache Spark and distributed data processing.
  • Developed data processing applications using PySpark and its RDD and DataFrame APIs.
  • Performed ETL operations and transformations on large datasets.
  • Implemented Spark jobs for batch processing and real-time analytics.
  • Worked with Spark SQL to query structured data efficiently.

  • Completed an introductory course on Apache Spark through online platforms, gaining a fundamental understanding of big data processing and distributed computation. Learned how Spark handles large-scale data using resilient distributed datasets (RDDs), DataFrames, and SQL interfaces. Practiced basic transformations, actions, and data pipelines using PySpark in self-guided exercises. The course emphasized how Spark can be integrated with tools like Hadoop and used in real-world scenarios such as data engineering workflows and analytics.