Checking the Schema of a Parquet File
Kyle79
2020. 9. 7. 10:21
With pyarrow, load the file into a table and print its column names and schema (saved as test.py):
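import pyarrow.parquet as pq

# Load the Parquet file into a pyarrow Table (this reads all row data)
table = pq.read_table("000.parquet")

# Print the column names, then the full Arrow schema (types plus per-field metadata)
print("Column names: {}".format(table.column_names))
print("Schema: {}".format(table.schema))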
python3 test.py
Column names: ['order_status_id', 'source_id', 'source_system_status', 'order_status', 'order_status_group']
Schema: order_status_id: int32
  -- field metadata --
  PARQUET:field_id: '1'
source_id: int16
  -- field metadata --
  PARQUET:field_id: '2'
source_system_status: string
  -- field metadata --
  PARQUET:field_id: '3'
order_status: string
  -- field metadata --
  PARQUET:field_id: '4'
order_status_group: string
  -- field metadata --
  PARQUET:field_id: '5'