How Apache Arrow Is Changing the Big Data Ecosystem

2 years ago thenewstack.io
Summary: This is a summary of an article originally published by The New Stack.

One of the biggest challenges of working with big data (https://thenewstack.io/how-aiops-conquers-performance-gaps-on-big-data-pipelines/) is the performance overhead of moving data between different tools and systems in a data processing pipeline. Serializing and deserializing the same data into a different representation, potentially at every step of the pipeline, makes working with large amounts of data slower and more costly in terms of hardware.

Apache Arrow is an open source project intended to provide a standardized columnar memory format (https://thenewstack.io/apache-arrow-designed-accelerate-hadoop-spark-columnar-layouts-data/) for flat and hierarchical data.

InfluxDB IOx, the new columnar storage engine for InfluxDB (https://www.influxdata.com/products/influxdb-cloud/?utm_source=vendor&utm_medium=referral&utm_campaign=2022-12_spnsr-ctn_apache-arrow-big-data_tns, https://www.influxdata.com/blog/influxdb-engine/?utm_source=vendor&utm_medium=referral&utm_campaign=2022-12_spnsr-ctn_apache-arrow-big-data_tns), uses the Arrow format for representing data and moving data to and from Parquet.

Pandas is able to read data stored in Parquet files by using Apache Arrow behind the scenes.
