clj-thamil.format



c-cv-letters^clj/s

consonants^clj/s

cursor-adjust^clj/s

(cursor-adjust s idx direction)

given a string, a cursor position (idx), and a direction, give the new position of the cursor that is on the boundary of the actual letters


get-in-trie^clj/s

(get-in-trie trie sq)

return the corresponding value from the trie -- either the combined version of the input seq, or the value attached to the terminus of the input seq in the trie


in-trie?^clj/s

(in-trie? sq)

(in-trie? trie sq)

return whether the sequence exists in the trie


index-of^clj/s

a wrapper around the native fn call that gives the index of the first occurrence of a particular substring


inverse-phoneme-map^clj/s

letter-before?^clj/s

(letter-before? s1 s2)

a 2-arg predicate indicating whether the first string comes before the second string, but assuming that each string will only represent individual letters


letter-comp^clj/s

a comparator for strings that represent a single letter that respects தமிழ் alphabetical order


letter-seq^clj/s

a flattened seq of all தமிழ் letters in lexicographical (alphabetical) order -- put another way, in the order of அகர முதல் னகர இறுவாய் as the 2500 yr old grammatical compendium தொல்காப்பியம் states at its outset


letters^clj/s

make-trie^clj/s

(make-trie sequence)

take a sequence (possibly nested) of input sequences, or else a single-level map whose keys are sequences and whose vals are attached to the terminus in the trie. creates a trie, represented as a nested map.
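As a rough illustration of the "trie as a nested map" idea, here is a minimal sketch covering only the map form of the input. The names `sketch-make-trie`, `sketch-get-in-trie`, `sketch-in-trie?`, and the `:terminus` marker key are hypothetical, chosen for this example; the library's internal representation may well differ.

```clojure
;; Assumption: :terminus is a made-up marker key for this sketch,
;; not necessarily what clj-thamil uses internally.
(defn sketch-make-trie
  "Build a nested-map trie from a map of {input-sequence value}."
  [m]
  (reduce (fn [trie [sq v]]
            ;; each element of sq becomes one level of nesting
            (assoc-in trie (concat sq [:terminus]) v))
          {}
          m))

(defn sketch-get-in-trie
  "Return the value attached to the terminus of sq, or nil if absent."
  [trie sq]
  (get-in trie (concat sq [:terminus])))

(defn sketch-in-trie?
  "True when sq was inserted as a complete entry, not merely as a prefix."
  [trie sq]
  (contains? (get-in trie sq) :terminus))

(def toy-trie (sketch-make-trie {"ab" 1, "abc" 2}))
;; toy-trie is {\a {\b {:terminus 1, \c {:terminus 2}}}}
```

With this toy data, `(sketch-get-in-trie toy-trie "abc")` yields 2, while `(sketch-in-trie? toy-trie "a")` is false because "a" is only a prefix of the stored entries.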


phoneme-map^clj/s

a map whose keys are தமிழ் letters and whose values are sequences of the constituent phonemes (represented as strings) of those letters. letters are from the set {உயிர்-, மெய்-, உயிர்மெய்-}எழுத்துகள், phonemes are from the set {உயிர்-,மெய்-}எழுத்துகள்
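To make the shape of such a map concrete, here is a small sketch using a hand-written toy subset. `toy-phoneme-map` and `letters->phonemes` are hypothetical names for this example, and the input is assumed to be already split into letters.

```clojure
;; Toy subset only: each letter maps to a seq of its constituent phonemes.
;; A pure vowel or pure consonant maps to itself; an உயிர்மெய் letter maps
;; to its consonant followed by its vowel.
(def toy-phoneme-map
  {"அ"  ["அ"]
   "ஆ"  ["ஆ"]
   "க்" ["க்"]
   "க"  ["க்" "அ"]
   "கா" ["க்" "ஆ"]
   "ம்" ["ம்"]
   "ம"  ["ம்" "அ"]})

(defn letters->phonemes
  "Expand a seq of letters into a flat seq of phonemes via the toy map."
  [letters]
  (mapcat toy-phoneme-map letters))

(letters->phonemes ["கா" "க" "ம்"])
;; => ("க்" "ஆ" "க்" "அ" "ம்")
```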


phoneme-trie^clj/s

a trie of the individual letters in தமிழ், whose terminus-attached values are sequences of each letter's phonemes -- this trie can be used in str->elems for directly splitting a word into its phonemes


phonemes->str^clj/s

(phonemes->str phoneme-seq)

given a seq of phonemes, create a string where the phonemes are combined into their proper letters
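One way to picture the recombination is a greedy pass that tries a consonant+vowel pair first and falls back to a single phoneme. The sketch below assumes a hand-written inverse map over a toy subset; `toy-inverse-phoneme-map` and `sketch-phonemes->str` are hypothetical names, and this is not claimed to be the library's algorithm.

```clojure
;; Toy inverse map: keys are phoneme vectors, vals are the combined letters.
(def toy-inverse-phoneme-map
  {["அ"] "அ"
   ["ஆ"] "ஆ"
   ["க்"] "க்"
   ["க்" "அ"] "க"
   ["க்" "ஆ"] "கா"
   ["ம்"] "ம்"
   ["ம்" "அ"] "ம"})

(defn sketch-phonemes->str
  "Greedily combine phonemes into letters, preferring 2-phoneme matches."
  [phonemes]
  (loop [ps (seq phonemes), out []]
    (if-not ps
      (apply str out)
      (let [pair    (vec (take 2 ps))
            letter2 (when (= 2 (count pair))
                      (toy-inverse-phoneme-map pair))]
        (if letter2
          ;; consonant + vowel combined into one letter
          (recur (nnext ps) (conj out letter2))
          ;; single phoneme; unknown phonemes pass through unchanged
          (recur (next ps)
                 (conj out (get toy-inverse-phoneme-map
                                [(first ps)]
                                (first ps)))))))))

(sketch-phonemes->str ["க்" "ஆ" "க்" "அ" "ம்"])
;; => "காகம்"
```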


prefix?^clj/s

(prefix? str1 str2)

return whether the 2nd word is a prefix of the 1st word, based on தமிழ் phonemes


seq-index-of^clj/s

(seq-index-of tgt qry)

given a target seq and a query seq, return the 0-based index of the first occurrence of the query seq inside the target seq, or else return -1 (is that Clojure-y, or is returning nil more Clojure-y?). calls seq-prefix? at every index -- only realizes the target seq as needed, but pulls the query seq into memory
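The strategy described (test for a prefix match at each successive position, walking the target lazily) might look roughly like the following. `sketch-seq-index-of` is a hypothetical stand-in written for this example, not the library's source.

```clojure
(defn sketch-seq-index-of
  "Return the 0-based index of the first occurrence of qry in tgt, else -1.
  The query is realized up front; the target is consumed only as needed."
  [tgt qry]
  (let [q (vec qry)      ; pulls the whole query into memory
        n (count q)]
    (loop [s (seq tgt), i 0]
      (cond
        (= q (take n s)) i    ; prefix match at the current position
        (nil? s)         -1   ; target exhausted without a match
        :else            (recur (next s) (inc i))))))

(sketch-seq-index-of [1 2 3 4] [3 4])  ;; => 2
(sketch-seq-index-of [1 2 3 4] [9])    ;; => -1
(sketch-seq-index-of (range) [5 6])    ;; => 5, works on an infinite target
```

In this sketch an empty query matches at index 0, which is a design choice of the example rather than documented library behavior.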


seq-prefix^clj/s

(seq-prefix seq1 seq2)

return the shared prefix between the 2 input sequences
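A shared prefix can be computed by pairing elements up and stopping at the first mismatch. The following is a minimal sketch of that idea; `sketch-seq-prefix` is a hypothetical name and not the library's implementation.

```clojure
(defn sketch-seq-prefix
  "Return the longest common leading run of s1 and s2, as a lazy seq."
  [s1 s2]
  (->> (map vector s1 s2)                  ; pair up elements positionally
       (take-while (fn [[a b]] (= a b)))   ; stop at the first mismatch
       (map first)))

(sketch-seq-prefix [1 2 3] [1 2 4])  ;; => (1 2)
(sketch-seq-prefix "abc" "abd")      ;; => (\a \b)
```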


seq-prefix?^clj/s

(seq-prefix? tgt qry)

return whether the query seq is a prefix of the target


sort-map^clj/s

a map where the key is a தமிழ் letter, and the value is a number indicating its relative position in sort order
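A map of this shape can be derived from an ordered letter seq, and a comparator then falls out of comparing ranks. The sketch below uses a tiny toy subset; all `toy-` names are hypothetical and stand in for the library's `letter-seq`, `sort-map`, `letter-comp`, and `letter-before?` only by analogy.

```clojure
;; Toy subset of letters in alphabetical order (a few vowels, then one
;; உயிர்மெய் letter); the real letter seq is much longer.
(def toy-letter-seq ["அ" "ஆ" "இ" "ஈ" "உ" "க"])

;; letter -> relative position in sort order
(def toy-sort-map (zipmap toy-letter-seq (range)))

(defn toy-letter-comp
  "Comparator over single-letter strings, based on their rank."
  [a b]
  (compare (toy-sort-map a) (toy-sort-map b)))

(defn toy-letter-before?
  "2-arg predicate: does letter a sort before letter b?"
  [a b]
  (neg? (toy-letter-comp a b)))

(sort toy-letter-comp ["க" "இ" "அ"])
;; => ("அ" "இ" "க")
```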


str->elems^clj/s

(str->elems s)

(str->elems trie s & [{:keys [transform] :as opts}])

take a string and split it into chunks based on the input trie. for every maximally long sequence in the trie that is detected in the input string, the terminus-attached value is added to the output sequence if it exists (ex: useful for transliteration / format conversion), or else the string chunk itself is added.
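The "maximally long sequence" behavior can be sketched as a greedy longest-match walk over a nested-map trie. Everything named below (`build-trie`, `longest-match`, `sketch-str->elems`, the `:terminus` marker) is hypothetical and written for this example; it ignores the `transform` option and always attaches a value at each terminus.

```clojure
;; Assumption: :terminus is a made-up marker key for the end of an entry.
(defn build-trie [m]
  (reduce (fn [t [k v]] (assoc-in t (concat k [:terminus]) v)) {} m))

(defn longest-match
  "Walk the trie along chars; return [length value] for the longest
  entry that prefixes chars, or nil when no entry matches."
  [trie chars]
  (loop [node trie, cs (seq chars), n 0, best nil]
    (let [best  (if (and (pos? n) (contains? node :terminus))
                  [n (:terminus node)]
                  best)
          child (when cs (get node (first cs)))]
      (if child
        (recur child (next cs) (inc n) best)
        best))))

(defn sketch-str->elems
  "Split s into elements: attached values for matched trie entries,
  single-character strings for anything unmatched."
  [trie s]
  (loop [cs (seq s), out []]
    (if-not cs
      out
      (if-let [[n v] (longest-match trie cs)]
        (recur (seq (drop n cs)) (conj out v))
        (recur (next cs) (conj out (str (first cs))))))))

(def toy-translit (build-trie {"க" "ka", "கா" "kaa", "ம்" "m"}))

(sketch-str->elems toy-translit "காகம்!")
;; => ["kaa" "ka" "m" "!"]
```

Note how "கா" wins over the shorter entry "க" at the start of the input, while the unmatched "!" passes through as its own chunk.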


str->letters^clj/s

(str->letters s)

take a string and split it into its constituent தமிழ் + non-complex letters (non-complex = all left-to-right, 1-to-1 codepoint-to-glyph encodings -- this includes all Western languages)


str->phonemes^clj/s

(str->phonemes s)

take a string and split it into its constituent தமிழ் phonemes


suffix?^clj/s

(suffix? str1 str2)

return whether the 2nd word is a suffix of the 1st word, based on தமிழ் phonemes


trie-prefix-subtree^clj/s

(trie-prefix-subtree trie sq)

take a trie and a sequence, look up the sequence in the trie, and return the subtree


vowels^clj/s

whitespace?^clj/s

(whitespace? ch)

returns whether a Java Character a.k.a. Unicode codepoint is whitespace or not (according to Java's understanding of Unicode)


word-before?^clj/s

(word-before? str1 str2)

a 2-arg predicate indicating whether the first string comes before the second string lexicographically, handling தமிழ் letters in addition to 1-to-1 codepoint-to-letter encodings


word-comp^clj/s

a comparator for lexicographical comparisons of arbitrary strings (consisting of தமிழ் letters and letters from 1-to-1 encodings)


wordy-char?^clj/s

(wordy-char? ch)

take a Java Character a.k.a. Unicode codepoint and return whether it represents a character that might go into a word or identifier. In other words, it is for Unicode what \w represents in regular expressions for ASCII characters -- namely, alpha-numeric characters


wordy-chunk-and-cursor-pos^clj/s

(wordy-chunk-and-cursor-pos s idx)

given a string and an index number that the cursor is on or before, return the wordy chunk that the cursor is in the middle of, and the cursor pos relative to the chunk. if cursor is before or after a word, or at the beginning or end of string, return a falsey value (ex: nil). accepts idx being at end of string (idx == (count s)).


wordy-chunk-under^clj/s

wordy-seq^clj/s

(wordy-seq s)

take a string and produce a seq of the Unicode-aware version of the \w+ regex pattern - basically, split input string into all chunks of non-whitespace. Originally, I called this fn word-seq, but that name is not accurate for all languages and/or time periods in which there was no separation between words (ex: Thai, Chinese, Japanese, Latin manuscripts, ancient Thamil stone inscriptions, etc.)
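The "chunks of non-whitespace" reading can be sketched on the JVM with `Character/isWhitespace`. This is a JVM-only approximation under that assumption; `sketch-wordy-seq` is a hypothetical name, and the library's own definition of a wordy character may be stricter than "anything that is not whitespace".

```clojure
(defn sketch-wordy-seq
  "Split s into maximal runs of non-whitespace characters (JVM only)."
  [s]
  (->> s
       ;; group consecutive chars by whether they are whitespace
       (partition-by #(Character/isWhitespace (char %)))
       ;; drop the whitespace groups
       (remove #(Character/isWhitespace (char (first %))))
       (map #(apply str %))))

(sketch-wordy-seq "தமிழ் வாழ்க  hello")
;; => ("தமிழ்" "வாழ்க" "hello")
```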

