(cursor-adjust s idx direction)
given a string, a cursor position (idx), and a direction, return the new position of the cursor so that it sits on a boundary between actual letters
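As an illustration only, the sketch below shows one way a cursor could be snapped to a letter boundary. It assumes that an index is a boundary unless the char at that index is a combining mark, that idx counts Java chars, and that direction is one of the keywords :left or :right. The names sketch-cursor-adjust, mark? and boundary? are hypothetical, and the library's actual argument conventions may differ.

```clojure
(defn mark?
  "True when ch is a Unicode combining mark (hypothetical helper)."
  [ch]
  (let [t (Character/getType (char ch))]
    (or (== t Character/NON_SPACING_MARK)
        (== t Character/COMBINING_SPACING_MARK))))

(defn boundary?
  "True when idx does not fall between a base char and its combining mark."
  [s idx]
  (or (<= idx 0)
      (>= idx (count s))
      (not (mark? (nth s idx)))))

(defn sketch-cursor-adjust
  "Move idx in the given direction (:left or :right, an assumption of this
  sketch) until it lands on a letter boundary."
  [s idx direction]
  (let [step (if (= direction :left) dec inc)]
    (first (drop-while #(not (boundary? s %)) (iterate step idx)))))

;; "காகம்" is 5 chars: க ா க ம ்
(sketch-cursor-adjust "காகம்" 1 :right) ;; => 2
(sketch-cursor-adjust "காகம்" 1 :left)  ;; => 0
```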
(get-in-trie trie sq)
return the corresponding value from the trie -- either the combined version of the input seq, or the value attached to the terminus of the input seq in the trie
(in-trie? sq)
(in-trie? trie sq)
return whether the sequence exists in the trie
a wrapper around the native fn call that gives the index of the first occurrence of a particular substring
(letter-before? s1 s2)
a 2-arg predicate indicating whether the first string comes before the second string, assuming that each string represents a single letter
a comparator for strings that represent a single letter that respects தமிழ் alphabetical order
a flattened seq of all தமிழ் letters in lexicographical (alphabetical) order -- put another way, in the order of அகர முதல் னகர இறுவாய், as the 2,500-year-old grammatical compendium தொல்காப்பியம் states at its outset
(make-trie sequence)
take a sequence (possibly nested) of input sequences, or else a single-level map whose keys are sequences and whose vals are attached to the terminus of each key in the trie. creates a trie, represented as a nested map.
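A minimal sketch of a trie represented as a nested map, for illustration only. The names sketch-make-trie and sketch-in-trie? and the :val terminus key are hypothetical, chosen for this example; the library's internal representation may differ. Only a flat collection of sequences and a single-level map are handled here, not nested input.

```clojure
(defn sketch-make-trie
  "Build a nested-map trie from either a collection of seqs or a map of
  seq -> value. Each terminus is marked with the (hypothetical) key :val."
  [input]
  (let [entries (if (map? input)
                  input
                  (map (fn [sq] [sq nil]) input))]
    (reduce (fn [trie [sq v]]
              (assoc-in trie (concat sq [:val]) v))
            {}
            entries)))

(defn sketch-in-trie?
  "True when sq was inserted as a complete sequence, not merely as a prefix."
  [trie sq]
  (contains? (get-in trie (seq sq)) :val))

(def t (sketch-make-trie ["ab" "ac"]))
;; t => {\a {\b {:val nil}, \c {:val nil}}}
(sketch-in-trie? t "ab") ;; => true
(sketch-in-trie? t "a")  ;; => false
```

Marking the terminus with a key that is present even when its value is nil lets a lookup distinguish a stored sequence from a mere prefix of one.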
a map whose keys are தமிழ் letters and whose values are sequences of the constituent phonemes (represented as strings) of those letters. letters are from the set {உயிர்-, மெய்-, உயிர்மெய்-}எழுத்துகள், phonemes are from the set {உயிர்-,மெய்-}எழுத்துகள்
a trie of the individual letters in தமிழ், whose terminus-attached values are sequences of each letter's phonemes -- this trie can be used in str->elems for directly splitting a word into its phonemes
(phonemes->str phoneme-seq)
given a seq of phonemes, create a string where the phonemes are combined into their proper letters
(prefix? str1 str2)
return whether the 2nd word is a prefix of the 1st word, based on தமிழ் phonemes
(seq-index-of tgt qry)
given a target seq and a query seq, return the 0-based index of the first occurrence of the query seq inside the target seq, or else return -1 (is that Clojure-y, or is returning nil more Clojure-y?). calls seq-prefix? at every index -- only realizes the target seq as needed, but pulls the query seq into memory
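The behavior described above can be sketched as follows. This is not the library's source: sketch-seq-prefix? and sketch-seq-index-of are hypothetical names, and the code only mirrors what the docstring says (a prefix check at each index, a lazily consumed target, a fully realized query, and -1 on failure).

```clojure
(defn sketch-seq-prefix?
  "True when qry matches the first (count qry) elements of tgt."
  [tgt qry]
  (let [q (seq qry)
        n (count q)]
    (= q (seq (take n tgt)))))

(defn sketch-seq-index-of
  "0-based index of the first occurrence of qry inside tgt, or -1."
  [tgt qry]
  (let [q (vec qry)]                 ; the query is pulled into memory
    (loop [s (seq tgt), i 0]         ; the target is walked one step at a time
      (cond
        (sketch-seq-prefix? s q) i
        (nil? s)                 -1
        :else (recur (next s) (inc i))))))

(sketch-seq-index-of "hello" "ll")     ;; => 2
(sketch-seq-index-of [1 2 3] [4])      ;; => -1
(sketch-seq-index-of (range) [5 6 7])  ;; => 5, on an infinite target
```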
(seq-prefix seq1 seq2)
return the shared prefix of the 2 input sequences
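One way to compute a shared prefix lazily, shown as a sketch under the hypothetical name sketch-seq-prefix (the library's own implementation is not shown on this page):

```clojure
(defn sketch-seq-prefix
  "Lazy seq of the leading elements that seq1 and seq2 have in common."
  [seq1 seq2]
  (->> (map vector seq1 seq2)              ; pair up elements position by position
       (take-while (fn [[a b]] (= a b)))   ; stop at the first mismatch
       (map first)))

(sketch-seq-prefix [1 2 3 4] [1 2 5]) ;; => (1 2)
(sketch-seq-prefix "abc" "xyz")       ;; => ()
```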
(seq-prefix? tgt qry)
return whether the query seq is a prefix of the target
a map where the key is a தமிழ் letter, and the value is a number indicating its relative position in sort order
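To illustrate how a sort-order map can drive a letter comparator, here is a small sketch. The names sketch-letters, sketch-sort-order, sketch-letter-before? and sketch-letter-comp are hypothetical, and the letter list is only a short slice of the alphabet chosen for the example, not the library's full letter seq.

```clojure
(def sketch-letters
  "A few letters in traditional alphabetical order (illustrative slice only)."
  ["அ" "ஆ" "இ" "க" "ப" "ம" "ன"])

(def sketch-sort-order
  "Letter -> relative position in sort order."
  (zipmap sketch-letters (range)))

(defn sketch-letter-comp
  "Comparator over single-letter strings, based on the positions above."
  [s1 s2]
  (compare (sketch-sort-order s1) (sketch-sort-order s2)))

(defn sketch-letter-before?
  [s1 s2]
  (neg? (sketch-letter-comp s1 s2)))

(sort sketch-letter-comp ["ன" "ப" "அ" "க"]) ;; => ("அ" "க" "ப" "ன")
(sort ["ன" "ப" "அ" "க"])                    ;; => ("அ" "க" "ன" "ப")
```

The second call sorts by codepoint, which places ன before ப. That mismatch with alphabetical order is the reason a dedicated comparator is needed.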
(str->elems s)
(str->elems trie s & [{:keys [transform] :as opts}])
take a string and split it into chunks based on the input trie. for every maximally long sequence in the trie that is detected in the input string, the terminus-attached value is added to the output sequence if it exists (ex: useful for transliteration / format conversion), or else the string chunk itself is added.
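The longest-match splitting described above can be sketched as follows. All names here (sketch-trie, sketch-longest-match, sketch-str->elems) and the :val terminus key are hypothetical, and the transliteration values are made up for the example. The sketch also assumes that a char with no match in the trie is emitted on its own, and that the trie contains no empty key.

```clojure
(def sketch-trie
  "Nested-map trie built from a small, made-up transliteration table."
  (reduce (fn [t [k v]] (assoc-in t (concat k [:val]) v))
          {}
          {"க" "ka", "கா" "kaa", "ம்" "m"}))

(defn sketch-longest-match
  "Walk the trie along the chars of s. Return [length value] for the longest
  trie key that is a prefix of s, or nil when no key matches."
  [trie s]
  (loop [node trie, cs (seq s), n 0, best nil]
    (let [best (if (contains? node :val) [n (:val node)] best)
          nxt  (when cs (get node (first cs)))]
      (if nxt
        (recur nxt (next cs) (inc n) best)
        best))))

(defn sketch-str->elems
  "Split s into chunks, preferring the longest trie key at each position."
  [trie s]
  (loop [s s, out []]
    (if (empty? s)
      out
      (let [m     (sketch-longest-match trie s)
            [n v] (if (and m (pos? (first m))) m [1 nil]) ; no match: take 1 char
            chunk (subs s 0 n)]
        (recur (subs s n) (conj out (if (some? v) v chunk)))))))

(sketch-str->elems sketch-trie "காகம்") ;; => ["kaa" "ka" "m"]
```

The first chunk is "kaa" rather than "ka" because கா is the longer key that matches at that position.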
(str->letters s)
take a string and split it into its constituent தமிழ் + non-complex letters (non-complex = all left-to-right, 1-to-1 codepoint-to-glyph encodings -- this includes all Western languages)
(str->phonemes s)
take a string and split it into its constituent தமிழ் phonemes
(suffix? str1 str2)
return whether the 2nd word is a suffix of the 1st word, based on தமிழ் phonemes
(trie-prefix-subtree trie sq)
take a trie and a sequence, look up the sequence in the trie, and return the subtree
(whitespace? ch)
returns whether a Java Character a.k.a. Unicode codepoint is whitespace or not (according to Java's understanding of Unicode)
(word-before? str1 str2)
a 2-arg predicate indicating whether the first string comes before the second string lexicographically, handling தமிழ் letters in addition to 1-to-1 codepoint-to-letter encodings
a comparator for lexicographical comparisons of arbitrary strings (consisting of தமிழ் letters and letters from 1-to-1 encodings)
(wordy-char? ch)
take a Java Character a.k.a. Unicode codepoint and return whether it represents a character that might go into a word or identifier. In other words, it does for Unicode what \w does in regular expressions for ASCII characters -- which is match alpha-numeric characters
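A possible shape for such a predicate, given as a sketch only. It assumes that "wordy" means a letter, a digit, or a combining mark (the vowel signs and the புள்ளி of தமிழ் are combining marks, so a letters-and-digits test alone would split words). The name sketch-wordy-char? is hypothetical, and the library's actual criteria may differ.

```clojure
(defn sketch-wordy-char?
  "True for letters, digits, and combining marks, per java.lang.Character."
  [ch]
  (let [c (char ch)
        t (Character/getType c)]
    (or (Character/isLetterOrDigit c)
        (== t Character/NON_SPACING_MARK)
        (== t Character/COMBINING_SPACING_MARK))))

(sketch-wordy-char? \a)      ;; => true
(sketch-wordy-char? \u0BBE)  ;; => true  (vowel sign ா)
(sketch-wordy-char? \u0BCD)  ;; => true  (புள்ளி)
(sketch-wordy-char? \space)  ;; => false
```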
(wordy-chunk-and-cursor-pos s idx)
given a string and an index number that the cursor is on or before, return the wordy chunk that the cursor is in the middle of, and the cursor pos relative to the chunk. if cursor is before or after a word, or at the beginning or end of string, return a falsey value (ex: nil). accepts idx being at end of string (idx == (count s)).
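The lookup described above might be sketched like this. The return shape (a vector of [chunk relative-pos]), the simplified wordy? helper (letters and digits only), and the name sketch-wordy-chunk-and-cursor-pos are all assumptions made for the example, not the library's confirmed behavior.

```clojure
(defn wordy?
  "Simplified stand-in for a wordy-char predicate (hypothetical helper)."
  [ch]
  (Character/isLetterOrDigit (char ch)))

(defn sketch-wordy-chunk-and-cursor-pos
  "When the cursor at idx has wordy chars on both sides, return
  [chunk pos-within-chunk]; otherwise return nil."
  [s idx]
  (when (and (< 0 idx (count s))
             (wordy? (nth s (dec idx)))
             (wordy? (nth s idx)))
    (let [start (loop [i idx]          ; scan left to the chunk's first char
                  (if (and (pos? i) (wordy? (nth s (dec i))))
                    (recur (dec i))
                    i))
          end   (loop [j idx]          ; scan right to just past the chunk
                  (if (and (< j (count s)) (wordy? (nth s j)))
                    (recur (inc j))
                    j))]
      [(subs s start end) (- idx start)])))

(sketch-wordy-chunk-and-cursor-pos "hello world" 8)  ;; => ["world" 2]
(sketch-wordy-chunk-and-cursor-pos "hello world" 5)  ;; => nil (just after a word)
(sketch-wordy-chunk-and-cursor-pos "hello world" 11) ;; => nil (end of string)
```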
(wordy-seq s)
take a string and produce a seq of the Unicode-aware version of the \w+ regex pattern - basically, split input string into all chunks of non-whitespace. Originally, I called this fn word-seq, but that name does not hold for all languages and/or periods in which there was no separation between words (ex: Thai, Chinese, Japanese, Latin manuscripts, ancient Thamil stone inscriptions, etc.)
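A rough approximation of a Unicode-aware \w+ can be written with Java's Unicode category classes. This is a sketch under the hypothetical name sketch-wordy-seq, not the library's implementation. It treats letters (\p{L}), combining marks (\p{M}) and numbers (\p{N}) as wordy, so punctuation is dropped along with whitespace.

```clojure
(defn sketch-wordy-seq
  "Seq of maximal runs of letters, combining marks, and numbers in s."
  [s]
  (re-seq #"[\p{L}\p{M}\p{N}]+" s))

(sketch-wordy-seq "hello, world 42") ;; => ("hello" "world" "42")
(sketch-wordy-seq "காகம் கரையும்")    ;; => ("காகம்" "கரையும்")
```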