Use Babashka to extract data from Excel Spreadsheets!
bb-excel is a simple Clojure library to extract data from Excel files without relying on the venerable Apache POI ooxml library. See below for rationale.
Add the following dependency to your project.clj
, deps.edn
, or bb.edn
Please note that being a beta version indicates the provisional status of this library, and that features are subject to change.
This library is meant for the simplest of spreadsheets.
The primary function is get-sheet
In the first example below we pull data from a specific sheet within the workbook. It returns a list of hash-maps with :_r key indicating the row within the sheet.
(ns demo
(:require [clojure.pprint :refer [print-table pprint]]
[bb-excel.core :refer [get-sheet get-sheets]]))
(get-sheet "test/data/simple.xlsx" "Shows" )
;=>
({:_r 2, :A 1.0, :B "Sesame Street"}
{:_r 3, :A 2.0, :B "La Femme Nikita"}
{:_r 4, :A 3.0, :B "House M.D"}
{:_r 5, :A 4.0, :B "Breaking Bad"})
Let's print out the results for better readability
(print-table
(get-sheet "test/data/simple.xlsx" "Shows" ))
| :_r | :A | :B |
|-----+------+-----------------|
| 1 | Rank | TV Show |
| 2 | 1.0 | Sesame Street |
| 3 | 2.0 | La Femme Nikita |
| 4 | 3.0 | House M.D |
| 5 | 4.0 | Breaking Bad |
If there is a header row we can use the values from that row as keys.Set :hdr flag in opts to true and specify the row number.
(print-table
(get-sheet "test/data/simple.xlsx" "Shows" {:hdr true :row 1}))
| :_r | Rank | TV Show |
|-----+------+-----------------|
| 2 | 1.0 | Sesame Street |
| 3 | 2.0 | La Femme Nikita |
| 4 | 3.0 | House M.D |
| 5 | 4.0 | Breaking Bad |
You can also format the keys by providing a function. In the example below we keywordize each heading
(print-table
(get-sheet "test/data/simple.xlsx" "Shows" {:hdr true :fxn keyword :row 1}))
| :_r | :Rank | :TV Show |
|-----+-------+-----------------|
| 2 | 1.0 | Sesame Street |
| 3 | 2.0 | La Femme Nikita |
| 4 | 3.0 | House M.D |
| 5 | 4.0 | Breaking Bad |
Another example. skeyword Lowercases, replaces spaces and slashes, excises non-ascii characters and limits to 50 characters every row heading
(print-table
(get-sheet "test/data/simple.xlsx" "Shows"
{:hdr true
:fxn bb-excel.cli/skeyword}))
| :_r | :rank | :tv_show |
|-----+-------+-----------------|
| 2 | 1.0 | Sesame Street |
| 3 | 2.0 | La Femme Nikita |
| 4 | 3.0 | House M.D |
| 5 | 4.0 | Breaking Bad |
That's all well and good but how do we use the same cell addressing format used in excel? For instance if we want B2:C4 then we can leverage the get-range function instead. Now that expects a sheet so
(print-table
(get-range
(get-sheet "test/data/simple.xlsx" "Shows")
"A3:B4"))
| :_r | :A | :B |
|-----+-----+-----------------|
| 3 | 2.0 | La Femme Nikita |
| 4 | 3.0 | House M.D |
Why create another excel library in clojure when you can use docjure or wrap the venerable Apache POI library. Answer is simple. Babashka, my scripting language of choice does not support Apache POI. This library currently reads xlsx files and returns a vector of hashmaps. Each hashmap representing a row in a selected sheet.
Can you improve this documentation? These fine people already did:
Kwabena (Kay) Bosompem & K BEdit on GitHub
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close