Big data is a blanket term for the non-traditional strategies and technologies needed to gather, organize, process, and extract insights from large datasets. In this course we will discuss the challenges created by big data and some of the state-of-the-art approaches to deal with them. In this curricular unit students will obtain practical experience with the Hadoop, Hive, and Spark tools and understand their role in the analytical workflow of a data scientist.
- Karau, Holden, et al. Learning Spark: Lightning-Fast Big Data Analysis. O’Reilly Media, Inc., 2015.
- White, Tom. Hadoop: The Definitive Guide. O’Reilly Media, Inc., 2012.
- Capriolo, Edward, Dean Wampler, and Jason Rutherglen. Programming Hive: Data Warehouse and Query Language for Hadoop. O’Reilly Media, Inc., 2012.