A Comparative Study On Hadoop Ecosystem: Hive And HBase – A Literature Review
Moh Rifqi Zamzami
Moh Riswandha Imawan
Imam Ghozali
Handling vast volumes of data is a pivotal challenge for businesses. This research conducts a systematic literature review comparing Hive and HBase within the Hadoop ecosystem. The study filters relevant articles from diverse sources, including SINTA-accredited national journals, international journals, and Scopus-indexed journals. The results deepen understanding of the roles and contributions of Hive and HBase in processing big data, helping companies select the most suitable solution for their data processing and security needs. The review finds that HBase excels at fast, random read/write operations, while Hive is more efficient for data querying. However, the performance of both tools is influenced by factors such as data size, number of nodes, and system configuration. The study clarifies how Hive and HBase address big data challenges and lays the groundwork for further development. Intended for IT practitioners and researchers, it contributes insights into big data processing within the Hadoop ecosystem.



