Changelog

All notable changes to edgarjure are documented here.

[Unreleased — bug fixes batch 2]

Fixed

edgar.filing/filing-by-accession — silent nil :form on missing index key

  • The accession lookup previously contained a fallback :form-type key that does not exist in the SEC filing index JSON (the correct key is :formType, keywordised by jsonista). When :formType was absent for any reason, :form silently became nil, causing filing-obj to dispatch on nil rather than throwing a useful error. Fixed by removing the fallback entirely; the function now throws ex-info with {:type ::not-found :accession-number "..."} when :formType is absent. Tested in filing-by-accession-form-type-test.

edgar.schema/FilingArgs — :n and :include-amends? required instead of optional

  • FilingArgs declared :n and :include-amends? as required fields, while FilingsArgs (plural) marked its optional fields with {:optional true}. In practice edgar.api/filing always supplies defaults for these keys before calling validate!, so there was no user-visible failure — but calling validate! directly with a minimal map would throw confusingly. Fixed by adding {:optional true} to :n and :include-amends? in FilingArgs, aligning the schema with the actual call-site behaviour. Covered by new schema_test.clj.

edgar.dataset/multi-company-facts — (apply ds/concat []) arity error on empty result

  • When all tickers threw (bad CIK, HTTP error, etc.) or the input vector was empty, the for comprehension produced an empty sequence and (apply ds/concat []) threw an arity exception. Fixed in two parts: (1) each ticker fetch is now wrapped in try/catch and failed tickers return nil, collected via keep identity; (2) ds/concat is only called when (seq rows) is truthy, otherwise an empty dataset is returned. Tested in new dataset_test.clj with with-redefs stubs covering all three cases: empty input, all-fail, and partial-fail.
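
The two-part fix can be sketched as follows. This is a minimal illustration, not the library's code: plain vectors of row maps stand in for tech.ml.dataset values so that it runs without dependencies, and safe-multi-fetch, fetch-rows, and stub-fetch are hypothetical names.

```clojure
(defn safe-multi-fetch
  "Calls fetch-rows for each ticker and drops tickers whose fetch throws.
  Returns a (possibly empty) vector of rows instead of failing on no results."
  [fetch-rows tickers]
  (let [rows (keep (fn [ticker]
                     (try
                       (fetch-rows ticker)
                       (catch Exception _ nil))) ; failed ticker -> nil, dropped by keep
                   tickers)]
    ;; clojure.core/concat accepts zero arguments; the guard matters for
    ;; ds/concat, which the entry above reports as throwing on no arguments.
    (if (seq rows)
      (vec (apply concat rows)) ; stands in for (apply ds/concat rows)
      [])))                     ; stands in for an empty dataset

;; Hypothetical stub: the ticker "BAD" simulates a failed lookup.
(defn stub-fetch [ticker]
  (if (= ticker "BAD")
    (throw (ex-info "lookup failed" {:ticker ticker}))
    [{:ticker ticker :val 1}]))
```

Calling (safe-multi-fetch stub-fetch ["AAPL" "BAD"]) keeps only the AAPL row, which mirrors the partial-fail case covered by the tests.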

edgar.tables/extract-table — sel/select recursively included nested-table <tr> rows

  • extract-table used (sel/select (sel/tag :tr) table-node) to collect rows, which recursively traverses the entire subtree including nested <table> elements. This inflated the row count for filings with nested tables (common in SEC HTML exhibits) and caused column misalignment. Fixed by replacing sel/select with a new private function direct-rows that collects only <tr> nodes that are direct children of the <table>, <thead>, <tbody>, or <tfoot> elements — the same direct-child pattern already used by row-cells for cells. Tested in extract-table-nested-no-double-count-test.
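
A rough sketch of the direct-child idea, assuming simplified hickory-style nodes (maps with :tag and :content only). The real direct-rows is private to edgar.tables and may differ; this version only illustrates why nested rows are no longer collected.

```clojure
(def row-containers #{:thead :tbody :tfoot})

(defn direct-rows
  "Returns <tr> nodes that are children of table-node itself or of its
  direct <thead>/<tbody>/<tfoot> children. It never descends into cells,
  so rows that belong to nested tables are not collected."
  [table-node]
  (mapcat (fn [child]
            (cond
              (not (map? child))             nil      ; text nodes
              (= :tr (:tag child))           [child]
              (row-containers (:tag child))  (filter #(and (map? %) (= :tr (:tag %)))
                                                     (:content child))
              :else                          nil))
          (:content table-node)))

;; A table whose first cell contains another table.
(def nested-table {:tag :table :content [{:tag :tr :content ["inner"]}]})

(def outer-table
  {:tag :table
   :content [{:tag :tbody
              :content [{:tag :tr :content [{:tag :td :content [nested-table]}]}
                        {:tag :tr :content [{:tag :td :content ["b"]}]}]}]})
```

Here (count (direct-rows outer-table)) is 2, whereas a recursive selection of every :tr in the subtree would also pick up the row of the nested table.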

edgar.extract/item-pattern — regex could not match 10-Q Roman-numeral item IDs

  • The item-pattern regex only matched \d{1,2}[AB]? for the item number, making it impossible to match 10-Q item headings like "Item I-1. Financial Statements" or "Item II-1A. Risk Factors". As a result, extract-items on 10-Q filings always returned {}. Fixed by extending the regex with an optional (?:[IVXivx]+\s*[-\s]\s*)? prefix before the digit group. ID normalization (str/replace #"\s*[-\s]\s*(?=\d)" "-" + str/upper-case) is applied in both find-item-boundaries (to produce correctly cased boundary IDs) and in extract-items (to normalize user-supplied :items to match items-10q map keys). Tested in item-pattern-test, find-item-boundaries-10q-test, and extract-items-10q-normalized-ids-test.
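
To make the regex change concrete, here is a small sketch. Only the optional Roman-numeral prefix, the \d{1,2}[AB]? digit group, and the normalization expression come from the entry above; the surrounding pattern (the "item" keyword, the capture group, case-insensitivity) and the names item-pattern-sketch, normalize-item-id, and heading->item-id are assumptions made for illustration.

```clojure
(require '[clojure.string :as str])

;; Assumed overall shape of the pattern; see the note above.
(def item-pattern-sketch
  #"(?i)\bitem\s+((?:[IVXivx]+\s*[-\s]\s*)?\d{1,2}[AB]?)")

(defn normalize-item-id
  "Collapses the separator before the digits to a single hyphen and upper-cases."
  [s]
  (-> s
      (str/replace #"\s*[-\s]\s*(?=\d)" "-")
      str/upper-case))

(defn heading->item-id
  "Returns the normalized item ID found in heading, or nil when none matches."
  [heading]
  (some-> (re-find item-pattern-sketch heading)
          second
          normalize-item-id))
```

With this sketch, "Item II-1A. Risk Factors" and "item ii 1a" both normalize to "II-1A", while a plain 10-K heading such as "Item 7. MD&A" still yields "7".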

Added

test/edgar/schema_test.clj (new file)

  • Comprehensive Malli schema unit tests: validate! helper (nil on success, ex-info on failure, :args and :errors in ex-data), FilingArgs required/optional fields, FilingsArgs optional :form with [:maybe FormType], StatementArgs (:shape enum, :as-of regex), AccessionNumber primitive (dashed format required). Added to test_runner.clj.

test/edgar/dataset_test.clj (new file)

  • Offline tests for multi-company-facts robustness using with-redefs: empty tickers vector returns empty dataset; all tickers failing returns empty dataset; partial failure returns only the rows from the succeeding ticker. Added to test_runner.clj.

[Unreleased]

Fixed

edgar.filings/parse-filings-recent — double (keys recent) call

  • (keys recent) was called twice: once to build the keyword key sequence and once to get the column values. Although Clojure maps have stable iteration order within a single JVM session, calling keys twice is fragile. Bound the result once to raw-ks and reused it for both ks (keywordised) and cols (value extraction). Eliminates any theoretical risk of column/key misalignment.
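
The single-binding pattern looks roughly like this. The function name columnar->rows and the sample keys are hypothetical; the sketch only assumes that the input is a map from column name to a vector of values, as in the SEC submissions JSON.

```clojure
(defn columnar->rows
  "Turns a columnar map such as {\"form\" [...] \"filingDate\" [...]}
  into a sequence of row maps with keyword keys."
  [recent]
  (let [raw-ks (keys recent)                 ; bound once, reused for ks and cols
        ks     (map keyword raw-ks)
        cols   (map #(get recent %) raw-ks)]
    (if (seq cols)
      (apply map (fn [& vs] (zipmap ks vs)) cols)
      [])))
```

Because ks and cols are both derived from the same raw-ks sequence, the nth key always lines up with the nth column.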

[Unreleased — Phase 4: Additional Infrastructure]

Added

Fixture-based offline tests

  • test/edgar/tables_test.clj — fixture HTML string (fixture-tables-html) embedded in the test namespace; fixture-driven tests cover extract-tables end-to-end (:nth, :min-rows, dedup column names), cell-text, row-cells (direct-child-only extraction), extract-table, matrix->dataset column deduplication, parse-number, infer-column, and layout-table?. No network access required.
  • test/edgar/forms/form4_test.clj — fixture XML string; covers parse-issuer, parse-owner, parse-non-derivative.
  • test/edgar/forms/form13f_test.clj — fixture XML string; covers parse-report-summary, parse-holding, is-amendment?.
  • All fixture tests are offline (no HTTP calls) and run under clj -M:test.

Exhibit and XBRL document API (edgar.filing / edgar.api)

  • filing/filing-exhibits — filters the filing index for entries whose :type starts with "EX-". Returns a seq of maps {:name :type :document :description :sequence}.
  • filing/filing-exhibit — returns the first index entry matching a given exhibit type string (e.g. "EX-21"), or nil. Fetch its content with (filing/filing-document filing (:name exhibit)).
  • filing/filing-xbrl-docs — returns index entries whose :type starts with "EX-101" or whose :name ends with ".xsd". Covers instance, schema, calculation, label, presentation, and definition linkbases.
  • Exposed via edgar.api as e/exhibits, e/exhibit, and e/xbrl-docs.

:as-of on e/panel / edgar.dataset/multi-company-facts

  • dataset/multi-company-facts now accepts :as-of "YYYY-MM-DD". When set, observations where :filed > as-of-date are excluded, then deduplication keeps the most recently filed survivor per [ticker concept end] key — identical look-ahead-safe semantics to edgar.financials/dedup-point-in-time.
  • Exposed via (e/panel [...] :as-of "2022-01-01"). The :as-of key is now part of schema/PanelArgs and Malli-validated.
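
The look-ahead-safe semantics can be illustrated with plain maps. This is a sketch under stated assumptions, not the library's implementation: point-in-time is a hypothetical name, observations are assumed to be maps with :ticker, :concept, :end, and :filed keys, and dates are assumed to be ISO strings so that string comparison matches date order.

```clojure
(defn point-in-time
  "Drops observations filed after as-of, then keeps the most recently
  filed survivor per [ticker concept end] key."
  [as-of observations]
  (->> observations
       (remove #(pos? (compare (:filed %) as-of)))   ; exclude :filed > as-of
       (group-by (juxt :ticker :concept :end))
       vals
       (map (fn [group]
              (reduce (fn [best obs]
                        (if (pos? (compare (:filed obs) (:filed best))) obs best))
                      group)))))

(def sample-observations
  [{:ticker "AAPL" :concept "Assets" :end "2021-09-25" :filed "2021-10-29" :val 100}
   {:ticker "AAPL" :concept "Assets" :end "2021-09-25" :filed "2022-10-28" :val 110}
   {:ticker "AAPL" :concept "Assets" :end "2020-09-26" :filed "2020-10-30" :val 90}
   {:ticker "AAPL" :concept "Assets" :end "2020-09-26" :filed "2021-10-29" :val 95}])
```

With as-of "2022-01-01", the value restated in October 2022 is invisible, so the 2021 period resolves to 100 and the 2020 period to its 2021 restatement, 95.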

Malli input validation (edgar.schema / edgar.api)

  • New namespace src/edgar/schema.clj — defines Malli map schemas for every public edgar.api function argument and a shared validate! helper. Invalid args throw ex-info with {:type ::edgar.schema/invalid-args :args {...} :errors {...}}.
  • Schemas: InitArgs, FilingsArgs, FilingArgs, FactsArgs, StatementArgs, FrameArgs, PanelArgs, SearchArgs, SearchFilingsArgs, TablesArgs, FilingByAccessionArgs.
  • Primitive schemas: NonBlankString, TickerOrCIK, ISODate, FormType, ShapeKw, ConceptArg, AccessionNumber, PositiveInt, TaxonomyStr, FrameStr.
  • All public functions in edgar.api call schema/validate! at entry. Inner namespaces (edgar.filings, edgar.xbrl, etc.) do not validate — validation is centralised in the facade.
  • metosin/malli 0.16.4 promoted from :future alias to main deps.edn deps.

[Unreleased — previous batch]

Added

edgar.financials / edgar.api — Point-in-time :as-of option

  • All four financial-statement functions (income-statement, balance-sheet, cash-flow, get-financials and their e/ wrappers) now accept :as-of "YYYY-MM-DD". When set, observations where :filed > as-of-date are excluded before restatement deduplication, giving look-ahead-safe point-in-time results. Default behaviour (no :as-of) is unchanged: latest restated value is returned. Implemented in new private function edgar.financials/dedup-point-in-time.

edgar.forms.form13f (new file)

  • 13F-HR parser — institutional holdings (XML-era, post-2013Q2 only). Registers filing-obj "13F-HR" and filing-obj "13F-HR/A" methods. Returns {:form :period-of-report :report-type :is-amendment? :form13f-file-number :table-entry-count :table-value-total :manager {:name :street :city :state :zip} :holdings <tech.ml.dataset> :total-value}. Holdings dataset columns: :name :cusip :title :value :shares :shares-type :put-call :investment-discretion :other-managers :voting-sole :voting-shared :voting-none. :value and voting columns are Long; :total-value is the sum of the :value column (thousands of USD as SEC reports it). Uses only clojure.xml + clojure.string; no new dependencies.

edgar.forms (new file)

  • Central parser loader namespace. (require '[edgar.forms]) loads all built-in form parsers (currently form4 and form13f) in a single call, solving the discoverability problem where users would forget to require individual parser namespaces. Individual requires ([edgar.forms.form4], [edgar.forms.form13f]) continue to work unchanged.

edgar.tables (new file)

  • extract-tables — parses a filing's HTML with hickory, collects all <table> elements, extracts cell text per row (th + td), uses the first row with ≥2 non-blank cells as the header, aligns data rows to header width, deduplicates column names (suffixes _1, _2), and infers numeric types (strips $, commas, and %; converts (123) to -123). Layout tables with <2 data rows are skipped automatically. Options: :min-rows, :min-cols for post-hoc filtering; :nth to return a single table by index. row-cells uses direct-child filtering (not recursive subtree selection) to avoid double-counting cells in nested tables.
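
As an illustration of the numeric inference rules, the sketch below parses a single cell. It is not the library's parse-number; the name parse-number-sketch, the choice of Double as the result type, and the exact regexes are assumptions.

```clojure
(require '[clojure.string :as str])

(defn parse-number-sketch
  "Parses a table cell into a Double, or returns nil when it is not numeric.
  Strips $, commas, and %, and treats a parenthesised value as negative."
  [s]
  (let [trimmed   (str/trim (or s ""))
        negative? (boolean (re-matches #"\(.*\)" trimmed)) ; accounting-style negative
        cleaned   (str/replace trimmed #"[$,%()\s]" "")]
    (when (re-matches #"-?\d+(\.\d+)?" cleaned)
      (let [n (Double/parseDouble cleaned)]
        (if negative? (- n) n)))))
```

Under these rules "$1,234.50" becomes 1234.5, "(123)" becomes -123.0, and a non-numeric cell such as "n/a" yields nil so the column can stay textual.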

edgar.financials

  • Concept fallback chains — each line item now has a primary GAAP concept and one or more fallback alternatives. Revenue, for example, tries RevenueFromContractWithCustomerExcludingAssessedTax (post-ASC-606) before falling back to Revenues and SalesRevenueNet. The first concept present in the company's facts wins.
  • Duration vs instant filtering — balance sheet uses only observations where :frame ends in "I" (instant snapshots, e.g. CY2023Q4I); income statement and cash flow use only duration observations. Prevents mixing point-in-time and period values.
  • Restatement deduplication — when multiple filings report the same concept+period, the observation with the latest :filed date wins. Implemented via reduce + (pos? (compare (:filed %1) (:filed %2))).
  • :line-item column — all normalized datasets now carry a human-readable label alongside the raw GAAP concept name.
  • :shape :wide option — pass :shape :wide to income-statement, balance-sheet, cash-flow, or get-financials to receive a pivoted dataset with one row per period and one column per line item. Default remains :long.
  • Public concept vars — income-statement-concepts, balance-sheet-concepts, cash-flow-concepts are now public defs. Users can inspect and override the concept fallback chains for non-standard filers.
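
The fallback-chain and instant-frame rules can be sketched in a few lines. The three revenue concept names and the CY2023Q4I frame come from the entries above; the helper names first-available-concept and instant-frame? and the data shapes are hypothetical.

```clojure
(require '[clojure.string :as str])

;; Primary concept first, fallbacks after it.
(def revenue-chain
  ["RevenueFromContractWithCustomerExcludingAssessedTax"
   "Revenues"
   "SalesRevenueNet"])

(defn first-available-concept
  "Returns the first concept in chain that the company actually reports."
  [chain available-concepts]
  (some #(when (contains? available-concepts %) %) chain))

(defn instant-frame?
  "True when the frame label ends in \"I\", e.g. CY2023Q4I."
  [frame]
  (boolean (some-> frame (str/ends-with? "I"))))
```

For a filer that reports only Revenues and SalesRevenueNet, the chain resolves to "Revenues"; a filer that reports none of the three gets nil for that line item.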

edgar.company

  • company-metadata — returns a shaped map extracted from the SEC submissions JSON: :cik :name :tickers :exchanges :sic :sic-description :entity-type :category :state-of-inc :state-of-inc-description :fiscal-year-end :ein :phone :website :investor-website :addresses {:business {...} :mailing {...}} :former-names.

edgar.filings

  • get-daily-filings — lazy seq of all SEC filings submitted on a given date. Accepts an ISO date string ("2026-03-10") or java.time.LocalDate. Optional :form filter and :filter-fn predicate. Implemented via EFTS search-index with dateRange=custom and lazy from= pagination (100 results per page).

edgar.api

  • e/company-metadata — thin wrapper around company/company-metadata.
  • e/daily-filings — thin wrapper around filings/get-daily-filings.
  • e/tables — thin wrapper around tables/extract-tables.
  • e/income, e/balance, e/cashflow, e/financials — all now accept :shape :wide option.

Fixed

edgar.forms.form13f/is-amendment?

  • Previously checked the XML reportType field for the string "RESTATEMENT", which is never present (the field contains values like "13F HOLDINGS REPORT"). This caused :is-amendment? to always be false. Fixed by overriding the value in the merge call inside parse-form13f using (= "13F-HR/A" (:form filing)) — the :form key of the filing map is the correct source.

edgar.financials/dedup-restatements and dedup-point-in-time — bad max-key comparator

  • Both functions previously used (apply max-key #(compare (:filed %) "") group). max-key expects a key function that returns a number, not a comparator. Comparing each :filed against "" does yield an integer, but for a string that integer is just the length of :filed (the same value for every well-formed date), so the result did not depend on which entry was filed last and could silently be the wrong entry, including for nil or empty :filed. Replaced with reduce + (pos? (compare (:filed %1) (:filed %2))), which compares the :filed values with each other and is semantically correct in all cases.
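
A minimal sketch of the reduce-based replacement, using the comparison quoted in the entry above. The function name latest-filed and the sample maps are hypothetical.

```clojure
(defn latest-filed
  "Returns the observation in group with the greatest :filed value.
  compare treats nil as smaller than any string, so an entry with a
  missing :filed never beats one that has a date."
  [group]
  (reduce #(if (pos? (compare (:filed %1) (:filed %2))) %1 %2) group))

(def restated-group
  [{:filed "2023-02-01" :val 2}
   {:filed "2022-02-01" :val 1}
   {:filed nil          :val 0}])
```

Here (latest-filed restated-group) returns the entry filed on 2023-02-01 regardless of where it sits in the group.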

edgar.extract/remove-tables — nil nodes in :content causing NPE

  • clojure.walk/postwalk replaced <table> nodes with nil, leaving nil entries in parent :content vectors. Downstream node-text and flatten-nodes did not guard against nil content entries, causing NullPointerExceptions when extracting items from filings with tables near item boundaries. Fixed by additionally filtering nil from each node's :content during postwalk.
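
The fix can be pictured with a small postwalk over simplified hickory-style nodes (maps with :tag and :content). This is an illustrative sketch rather than the code in edgar.extract; remove-tables-sketch is a hypothetical name.

```clojure
(require '[clojure.walk :as walk])

(defn remove-tables-sketch
  "Replaces <table> nodes with nil and strips the resulting nils from
  each parent's :content, so downstream code never sees nil children."
  [tree]
  (walk/postwalk
   (fn [node]
     (cond
       ;; a table node is dropped entirely
       (and (map? node) (= :table (:tag node)))
       nil

       ;; children are visited first, so removed tables are already nil here
       (and (map? node) (vector? (:content node)))
       (update node :content #(vec (remove nil? %)))

       :else node))
   tree))

(def sample-tree
  {:tag :div
   :content ["before"
             {:tag :table :content [{:tag :tr :content ["cell"]}]}
             "after"]})
```

Applied to sample-tree, the result is a :div whose :content is ["before" "after"], with no nil left where the table used to be.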

edgar.tables/row-cells — recursive subtree selection double-counting nested cells

  • Previously used (sel/select (sel/tag :th) tr-node) and (sel/select (sel/tag :td) tr-node), which select all matching elements in the entire subtree including nested tables. For filings with nested tables this double-counted cells and produced misaligned columns. Fixed by filtering direct <th> and <td> children from :content of each <tr> node.

[Unreleased — earlier batch]

Added

edgar.core

  • In-memory TTL cache on edgar-get. JSON responses are cached keyed by URL: 5 minutes for metadata endpoints, 1 hour for /api/xbrl/ endpoints. Cache is skipped when :raw? true. Evict with (edgar.core/clear-cache!).
  • Exponential backoff retry on 429/5xx responses. http-get-with-retry retries up to 3 times with delays of 2s → 4s → 8s. Throws ex-info with {:type ::http-error :status N :url "..."} on exhaustion or non-retryable 4xx. Applied to both edgar-get and edgar-get-bytes.
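
The retry policy described above can be sketched as a small loop. This is not http-get-with-retry itself: get-with-retry, request-fn, and the :sleep-fn option are hypothetical, and the request function is assumed to return a map with a :status key. The 2s, 4s, 8s delays and the ex-data shape follow the entry above.

```clojure
(def retry-delays-ms [2000 4000 8000])

(defn retryable? [status]
  (or (= 429 status) (<= 500 status 599)))

(defn get-with-retry
  "Calls (request-fn url). Retries on 429/5xx, sleeping 2s, 4s, then 8s.
  Throws ex-info when retries are exhausted or the status is not retryable."
  [request-fn url & {:keys [sleep-fn]
                     :or   {sleep-fn #(Thread/sleep (long %))}}]
  (loop [delays retry-delays-ms]
    (let [{:keys [status] :as resp} (request-fn url)]
      (cond
        (<= 200 status 299)
        resp

        (and (retryable? status) (seq delays))
        (do (sleep-fn (first delays))
            (recur (rest delays)))

        :else
        (throw (ex-info "HTTP request failed"
                        {:type ::http-error :status status :url url}))))))
```

Passing :sleep-fn makes the delays injectable, so the policy can be exercised in tests without actually waiting.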

edgar.filing

  • filing-by-accession — hydrates a complete filing map from an accession number string. Accepts dashed format (0000320193-23-000106) or undashed (000032019323000106). Extracts the CIK from the first 10 digits, fetches the filing index, and returns {:cik :accessionNumber :form :primaryDocument :filingDate} ready for all downstream functions. Throws ex-info with {:type ::not-found} if the accession does not exist.
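
Handling both accession formats might look like the sketch below. parse-accession and its return keys are hypothetical; the 10-2-6 digit layout is taken from the example accession number above, and the first ten digits are returned as the prefix that the entry describes as the CIK.

```clojure
(require '[clojure.string :as str])

(defn parse-accession
  "Accepts a dashed or undashed accession number and returns both forms
  plus the 10-digit prefix. Throws ex-info on malformed input."
  [s]
  (let [digits (str/replace s "-" "")]
    (when-not (re-matches #"\d{18}" digits)
      (throw (ex-info "Malformed accession number" {:accession-number s})))
    {:undashed   digits
     :dashed     (str (subs digits 0 10) "-" (subs digits 10 12) "-" (subs digits 12))
     :cik-prefix (subs digits 0 10)}))
```

Both "0000320193-23-000106" and "000032019323000106" produce the same map, so downstream code only has to deal with one canonical shape.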

edgar.filings

  • latest-effective-filing — returns the most recent non-amended filing for a company and form type. If an amendment (e.g. 10-K/A) exists with a newer :filingDate than the original, the amendment is returned instead.

edgar.xbrl

  • get-concepts — returns a tech.ml.dataset with columns [:taxonomy :concept :label :description], one row per distinct XBRL concept available for a company. Useful for discovering what data is available before calling get-facts-dataset.
  • :label and :description columns added to all facts datasets. flatten-facts now preserves these fields from the XBRL taxonomy response. Affects get-facts-dataset and any downstream datasets built from it.

edgar.api

  • e/filing-by-accession — thin wrapper around filing/filing-by-accession.
  • e/latest-effective-filing — thin wrapper around filings/latest-effective-filing, defaulting :form to "10-K".
  • e/concepts — thin wrapper around xbrl/get-concepts, accepting ticker or CIK.

edgar.forms.form4

  • Form 4 parser (Statement of Changes in Beneficial Ownership / insider trades). Parses issuer, reporting owner, non-derivative and derivative transactions from XML. Registers filing-obj "4" via the standard multimethod. No new dependencies — uses only clojure.xml and clojure.string.

Changed

edgar.filings

  • get-filings now filters out amended filings (10-K/A, 10-Q/A, etc.) by default. Pass :include-amends? true to include them. This is a behaviour change: callers who previously received amendment forms in results and relied on them will need to add :include-amends? true.
  • get-filing gains :include-amends? option (default false), threading through to get-filings.
  • get-filing converted from positional [ticker-or-cik form] to keyword args [ticker-or-cik & {:keys [form n] :or {n 0}}]. Old call (get-filing "AAPL" "10-K") must be updated to (get-filing "AAPL" :form "10-K").

edgar.api

  • e/filings gains :include-amends? option (default false).
  • e/filing gains :include-amends? option (default false).
  • e/facts simplified — delegates filtering to xbrl/get-facts-dataset directly.
  • e/frame updated to new get-concept-frame keyword-arg signature.
  • e/panel passes :concept directly to multi-company-facts.
  • e/pivot unwrapped — pivot-wide now returns a dataset directly.
  • e/income, e/balance, e/cashflow, e/financials simplified — no longer pre-resolve CIK before delegating to edgar.financials.
  • Removed unused require entries for tech.v3.dataset and clojure.string.
  • Removed private coerce-concepts helper (logic moved into xbrl/get-facts-dataset).

edgar.download

  • download-filings! gains :skip-existing? option (default false). When true, checks whether the output file already exists before downloading; returns {:status :skipped :accession-number "..."} if so.
  • download-batch! :parallelism parameter is now honoured. Uses (partition-all parallelism tickers) + pmap per partition for bounded concurrency (previously ignored, used plain pmap).
  • download-batch! now passes :download-all? and :skip-existing? through to download-filings!.
  • All download functions (download-filings!, download-batch!, download-index!) now return structured result envelopes: {:status :ok :path "..."}, {:status :skipped :accession-number "..."}, or {:status :error :accession-number "..." :type ... :message "..."}. Previously returned bare strings or raw exception messages.
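
The bounded-concurrency approach and the result envelopes can be sketched together. bounded-pmap and safe-download are hypothetical names, and download-fn is an assumed function that returns a path or throws; only the partition-all plus pmap idea and the envelope keys come from the entries above.

```clojure
(defn bounded-pmap
  "Applies f to coll in batches of at most `parallelism` items. Each batch
  is fully realized with pmap before the next one starts, so no more than
  `parallelism` calls are in flight at once."
  [parallelism f coll]
  (into []
        (mapcat (fn [batch] (doall (pmap f batch))))
        (partition-all parallelism coll)))

(defn safe-download
  "Wraps a download call in a structured result envelope."
  [download-fn accession-number]
  (try
    {:status :ok :path (download-fn accession-number)}
    (catch Exception e
      {:status           :error
       :accession-number accession-number
       :type             (:type (ex-data e))
       :message          (ex-message e)})))
```

Combining the two, (bounded-pmap 4 #(safe-download download-fn %) accession-numbers) returns one envelope per input, in input order, without letting an exception in one download abort the batch.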

edgar.xbrl

  • get-facts-dataset now accepts & {:keys [concept form sort]}. :concept accepts a string or collection; :form filters by form type; :sort defaults to :desc (uses ds/reverse-rows); pass nil to skip sorting.
  • facts-for-concept, annual-facts, quarterly-facts removed. Use get-facts-dataset options directly.
  • get-concept-frame signature changed from positional [taxonomy concept unit frame] to [concept frame & {:keys [taxonomy unit]}] with defaults taxonomy="us-gaap", unit="USD". Old call (get-concept-frame "us-gaap" "Assets" "USD" "CY2023Q4I") must be updated to (get-concept-frame "Assets" "CY2023Q4I").

edgar.financials

  • income-statement, balance-sheet, cash-flow now accept ticker or CIK interchangeably (call company/company-cik internally). Previously accepted CIK only.

edgar.dataset

  • multi-company-facts option renamed from :concepts to :concept for consistency. :concept accepts a string or collection.
  • pivot-wide now returns a tech.ml.dataset directly (previously returned a seq of maps).
  • cross-sectional-dataset updated to use new get-concept-frame keyword-arg signature.

edgar.extract

  • extract-items now returns full section body text, not just the heading node text. Detection algorithm rewritten: flattens the full hickory tree into a document-order node sequence, identifies item heading boundaries by matching item-pattern across heading-tags, deduplicates by keeping the last match per item-id (body heading wins over TOC entry), then extracts text from the node slice between consecutive boundaries.
  • Return shape changed from {item-id "text"} to {item-id {:title "..." :text "..." :method ...}}. Breaking change: callers that previously did (get result "7") and expected a string must now do (get-in result ["7" :text]).
  • extract-item return shape changed accordingly: returns {:title "..." :text "..." :method ...} or nil (previously a string or nil).
  • Plain-text fallback (extract-items-text) is now wired into the main extract-items dispatch path. Previously defined but never called.
  • :method key added to return maps: :html-heading-boundaries for modern HTML filings, :plain-text-regex for pre-2000 plain-text fallback.
  • batch-extract! arg order changed to [filing-seq output-dir & opts].
  • Removed unused hickory.zip and clojure.zip requires.
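
The boundary-based extraction can be pictured on a flat vector of text nodes. This is a simplified sketch, not the rewritten algorithm itself: last-boundaries and slice-sections are hypothetical names, heading matches are assumed to arrive in document order as {:id :idx} maps, and the :method key is omitted.

```clojure
(require '[clojure.string :as str])

(defn last-boundaries
  "Keeps the last match per :id (body heading wins over TOC entry),
  returned in document order."
  [matches]
  (->> matches
       (reduce (fn [m {:keys [id] :as b}] (assoc m id b)) {}) ; later match overwrites
       vals
       (sort-by :idx)))

(defn slice-sections
  "Returns {item-id {:title ... :text ...}} where :text joins the nodes
  between one boundary and the next."
  [nodes matches]
  (let [bs   (last-boundaries matches)
        ends (concat (map :idx (rest bs)) [(count nodes)])]
    (into {}
          (map (fn [{:keys [id idx]} end]
                 [id {:title (nth nodes idx)
                      :text  (str/join " " (subvec nodes (inc idx) end))}])
               bs ends))))

(def sample-nodes
  ["Table of contents" "Item 1. Business" "Item 7. MD&A"
   "Item 1. Business" "We sell things."
   "Item 7. MD&A" "Results improved." "More detail."])

(def sample-matches
  [{:id "1" :idx 1} {:id "7" :idx 2} {:id "1" :idx 3} {:id "7" :idx 5}])
```

With the sample data, (get-in (slice-sections sample-nodes sample-matches) ["7" :text]) returns the body text of Item 7 rather than the heading alone, matching the new return shape.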

deps.edn

  • Moved five unused dependencies from core :deps to a new :future alias: next.jdbc, honeysql, sqlite-jdbc, malli, and datajure. None are referenced in any source file. Core install weight reduced by ~15 MB.

Fixed

edgar.company

  • search-companies no longer passes &forms=10-K to the EFTS query. Previously excluded companies that had never filed a 10-K.

edgar.filings

  • get-filings now fetches all submission chunks for active filers with >1000 filings. Previously only read [:filings :recent], silently truncating filing history for large filers such as AAPL.

edgar.financials

  • build-statement docstring corrected: returns long-format dataset, not wide. Directs users to (e/pivot (e/income "AAPL")) for wide format.
