Checking the Schema of a Parquet File

Kyle79 2020. 9. 7. 10:21

 

pyarrow

 

import pyarrow.parquet as pq

# Load the file into an Arrow Table (this reads the row data as well as the schema)
pfile = pq.read_table("000.parquet")
print("Column names: {}".format(pfile.column_names))
print("Schema: {}".format(pfile.schema))

 

python3 test.py
Column names: ['order_status_id', 'source_id', 'source_system_status', 'order_status', 'order_status_group']
Schema: order_status_id: int32
  -- field metadata --
  PARQUET:field_id: '1'
source_id: int16
  -- field metadata --
  PARQUET:field_id: '2'
source_system_status: string
  -- field metadata --
  PARQUET:field_id: '3'
order_status: string
  -- field metadata --
  PARQUET:field_id: '4'
order_status_group: string
  -- field metadata --
  PARQUET:field_id: '5'
