Last week, Apache Tez graduated to become a top-level project within the Apache Software Foundation (ASF). This is a major step forward for the project and reflects the momentum built by a broad community of developers from not only Hortonworks but also Cloudera, Facebook, LinkedIn, Microsoft, NASA JPL, Twitter, and Yahoo.
What is Apache Tez and why is it useful?
Apache™ Tez is an extensible framework for building YARN-based, high-performance batch and interactive data processing applications in Hadoop that need to handle terabyte- to petabyte-scale datasets. It allows projects in the Hadoop ecosystem, such as Apache Hive and Apache Pig, as well as third-party software vendors, to express fit-to-purpose data processing applications in a way that meets their unique demands for fast response times and extreme throughput at petabyte scale. Apache Tez provides a developer API and framework for writing native YARN applications that bridge the spectrum of interactive and batch workloads, and it is used with Apache Hive 0.13 as part of Hortonworks Data Platform.
What does “graduation” mean for Apache Tez?
Ultimately, an open source project is only as good as the community that supports it by improving quality and adding the features end users need. The ASF was established to foster exactly this kind of community. It sets governance frameworks and builds communities around open source software projects. Further, the ASF believes (and enforces) that the real life of a project lies in the vibrancy of its community of developers and users.
A project enters the Apache Software Foundation as an incubating project so that it can develop within these frameworks. One of the main criteria for graduation to a top-level project is an evaluation of the diversity and momentum of the project community and of whether it can sustain that momentum without the active guidance of the foundation. The graduation vote from the Apache Software Foundation confirms that Apache Tez has met that bar.
Many factors supported the graduation. There has been strong uptake from important Apache projects such as Apache Hive and Apache Pig, and from other popular projects such as Cascading. Efforts are underway to use Tez as the processing engine for Twitter’s Summingbird and Apache Flink (formerly Stratosphere from TU Berlin). Increasing usage among companies like Microsoft, Yahoo, LinkedIn and Netflix reflects considerable investment that will motivate continued effort and support from this influential user base. Members of these communities have been vocal about the excellent engagement they have had with the Apache Tez community. The number of contributors and committers to the Tez project keeps growing. The fundamental open architecture of Tez is motivating a lot of interest in solving a number of hard problems in distributed data processing. All in all, the future for Apache Tez looks promising, both from a project roadmap perspective and in terms of user adoption and use case scenarios.
What does this mean for you?
Apache Tez can be adopted with confidence in the project’s strength and long-term sustainability. You can be sure that an active and diverse community will support you and that the project will keep evolving with its users. As a community-driven open source project, Tez encourages you to invest in the project by becoming a contributor so that you can meaningfully improve it for yourself and others. Your use cases and your contributions are welcome!