Mark Johnson is the Regional Director of Consulting - Northeast at Hortonworks, provider of a leading Apache Hadoop distribution, and has over 25 years of application and data technology experience. Prior to joining Hortonworks, he held roles evangelizing fast distributed data solutions at VMware and managing large development teams at CGI. When not working, Mark heads up the New England Java Users Group, one of the world's largest and oldest Java user groups, and spends time relaxing with his family.
Testing “Big Data” can mean a big time investment; several hours are often spent just to realize you made a simple typo. You fix the typo and then wait another couple of hours for your script to, hopefully, run to completion this time. Even if the Big Data script or program runs to completion, are you sure your data analysis is correct? Getting programs to run to completion and ensuring functional accuracy against the requirements are some of the biggest hidden problems in big data today.
During this overview presentation, we will first introduce unit and functional testing techniques and high-level concepts to consider in the Hadoop ecosystem. In the second half of the presentation, we will explore real testing examples using tools and approaches such as PigUnit, JUnit for UDF testing, BeeTest, and Hive testing with limited test data sets.
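To give a flavor of the kind of example covered, here is a minimal PigUnit sketch. The script name wordcount.pig, the alias names data and counts, and the test class are assumptions for illustration only; the real examples in the session will use their own scripts and aliases. The sketch substitutes a small in-memory data set for the script's load step and asserts on the resulting tuples, so a typo or logic error surfaces in seconds rather than hours.

```java
import org.apache.pig.pigunit.PigTest;
import org.junit.Test;

public class WordCountPigTest {

    // Hypothetical Pig script under test; the alias names below are assumptions.
    private static final String PIG_SCRIPT = "wordcount.pig";

    @Test
    public void countsWordsOnSmallInput() throws Exception {
        PigTest test = new PigTest(PIG_SCRIPT);

        // Small in-memory input substituted for the script's LOAD into alias 'data'.
        String[] input = {
            "hadoop",
            "pig",
            "hadoop",
        };

        // Expected tuples produced by the script's 'counts' alias
        // (assumes the hypothetical script orders its output deterministically).
        String[] expected = {
            "(hadoop,2)",
            "(pig,1)",
        };

        // Overrides the input alias with the mock data, runs the script locally,
        // and asserts on the output of the 'counts' alias.
        test.assertOutput("data", input, "counts", expected);
    }
}
```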