Liking cljdoc? Tell your friends :D

edd.parquet.core

Parquet file generation for columnar data export.

Schema format: {:description "Table description" :columns [["COL_NAME" :type "description" :required/:optional & opts] ...]}

Supported column types:

  • :string - UTF-8 string (BINARY with STRING logical type)
  • :enum - String with optional validation (BINARY with STRING logical type)
  • :boolean - Stored as string "TRUE"/"FALSE" (BINARY with STRING logical type)
  • :uuid - UUID as string (BINARY with STRING logical type)
  • :date - Date as days since epoch (INT32 with DATE logical type)
  • :double - 64-bit floating point (DOUBLE)
  • :long - 64-bit signed integer (INT64)

Memory considerations:

  • Rows are consumed lazily (one at a time), so prefer passing lazy sequences
  • Parquet writer buffers up to one row group (~128MB) internally
  • Output byte[] is fully materialized in memory

Example: (write-parquet-bytes {:table-name "orders" :schema {:description "Customer orders" :columns [["ORDER_ID" :uuid "Order identifier" :required] ["CUSTOMER" :string "Customer name" :required] ["AMOUNT" :double "Order amount" :required] ["STATUS" :enum "Order status" :required :enum ["PENDING" "SHIPPED" "DELIVERED"]] ["CREATED" :date "Creation date" :optional]]} :rows [{"ORDER_ID" "123e4567-e89b-12d3-a456-426614174000" "CUSTOMER" "Acme Corp" "AMOUNT" 1234.56 "STATUS" "PENDING" "CREATED" "2024-01-15"}] :compression :gzip})

Parquet file generation for columnar data export.

Schema format:
{:description "Table description"
 :columns [["COL_NAME" :type "description" :required/:optional & opts]
           ...]}

Supported column types:
- :string   - UTF-8 string (BINARY with STRING logical type)
- :enum     - String with optional validation (BINARY with STRING logical type)
- :boolean  - Stored as string "TRUE"/"FALSE" (BINARY with STRING logical type)
- :uuid     - UUID as string (BINARY with STRING logical type)
- :date     - Date as days since epoch (INT32 with DATE logical type)
- :double   - 64-bit floating point (DOUBLE)
- :long     - 64-bit signed integer (INT64)

Memory considerations:
- Rows are consumed lazily (one at a time), so prefer passing lazy sequences
- Parquet writer buffers up to one row group (~128MB) internally
- Output byte[] is fully materialized in memory

Example:
(write-parquet-bytes
  {:table-name "orders"
   :schema {:description "Customer orders"
            :columns [["ORDER_ID" :uuid "Order identifier" :required]
                      ["CUSTOMER" :string "Customer name" :required]
                      ["AMOUNT" :double "Order amount" :required]
                      ["STATUS" :enum "Order status" :required :enum ["PENDING" "SHIPPED" "DELIVERED"]]
                      ["CREATED" :date "Creation date" :optional]]}
   :rows [{"ORDER_ID" "123e4567-e89b-12d3-a456-426614174000"
           "CUSTOMER" "Acme Corp"
           "AMOUNT" 1234.56
           "STATUS" "PENDING"
           "CREATED" "2024-01-15"}]
   :compression :gzip})
raw docstring

cljdoc builds & hosts documentation for Clojure/Script libraries

Keyboard shortcuts
Ctrl+kJump to recent docs
Move to previous article
Move to next article
Ctrl+/Jump to the search field
× close