Install the pyarrow package:

pip install pyarrow
Read the Parquet file and print its column names and schema (saved here as test.py):

import pyarrow.parquet as pq

table = pq.read_table("000.parquet")
print("Column names: {}".format(table.column_names))
print("Schema: {}".format(table.schema))
Run the script:

python3 test.py
Column names: ['order_status_id', 'source_id', 'source_system_status', 'order_status', 'order_status_group']
Schema: order_status_id: int32
  -- field metadata --
  PARQUET:field_id: '1'
source_id: int16
  -- field metadata --
  PARQUET:field_id: '2'
source_system_status: string
  -- field metadata --
  PARQUET:field_id: '3'
order_status: string
  -- field metadata --
  PARQUET:field_id: '4'
order_status_group: string
  -- field metadata --
  PARQUET:field_id: '5'