BigData

Checking the Schema of a Parquet File

Kyle79 2020. 9. 7. 10:21

pyarrow

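import pyarrow.parquet as pq

# Load the Parquet file into an in-memory pyarrow Table.
pfile = pq.read_table("000.parquet")
# Print the column names, then the full schema (types and field metadata).
print("Column names: {}".format(pfile.column_names))
print("Schema: {}".format(pfile.schema))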

python3 test.py
Column names: ['order_status_id', 'source_id', 'source_system_status', 'order_status', 'order_status_group']
Schema: order_status_id: int32
  -- field metadata --
  PARQUET:field_id: '1'
source_id: int16
  -- field metadata --
  PARQUET:field_id: '2'
source_system_status: string
  -- field metadata --
  PARQUET:field_id: '3'
order_status: string
  -- field metadata --
  PARQUET:field_id: '4'
order_status_group: string
  -- field metadata --
  PARQUET:field_id: '5'