corona.query


default-term-vectors-settings clj

format-param clj

(format-param p)

format-params clj

(format-params m)

format-values clj

(format-values v)

interesting-terms-per-field->q clj

(interesting-terms-per-field->q interesting-terms-per-field)

mlt-ids->tv-q clj

(mlt-ids->tv-q mlt-ids & [mlt-field-name])

mlt-keys clj

mlt-terms->q clj

(mlt-terms->q mlt-q mlt-terms & [q])

partition-kvs clj

(partition-kvs [k v & rst])

qualified-term? clj

(qualified-term? term tf df mintf mindf minwl)
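
No docstring is given for qualified-term?. Judging only by its argument names, it appears to test a term against the :mlt.mintf, :mlt.mindf and :mlt.minwl thresholds described under query-mlt. The sketch below shows one way such a predicate could look. The name term-passes-thresholds? and the exact comparisons are assumptions made for illustration, not the library's implementation.

```clojure
;; Hypothetical predicate, NOT corona's qualified-term?.
;; Assumes a term is kept when its term frequency, document frequency
;; and length all reach the given minimums.
(defn term-passes-thresholds?
  [term tf df mintf mindf minwl]
  (and (>= tf mintf)             ; frequent enough in the source doc
       (>= df mindf)             ; present in enough docs of the collection
       (>= (count term) minwl))) ; long enough to be meaningful

(term-passes-thresholds? "alien" 2 5 1 3 3) ;=> true
(term-passes-thresholds? "ab" 2 5 1 3 3)    ;=> false (shorter than minwl)
```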

query clj

(query client-config settings)

Builds and executes a Solr query from a settings map.
Uses the Solr /select route.
Returns the decoded response from the Solr service.

query-handler clj

(query-handler client-config handler settings)

query-mlt clj

(query-mlt client-config settings)

A MoreLikeThis query that uses the MLT request handler (/mlt route) to return
documents similar to a matching document identified by the query under :q
(e.g. {:q "id:12345"}).

From the specified document, the MLT handler builds a query behind the scenes
by searching for 'interesting terms' in the fields specified under the :mlt.fl key.

A PriorityQueue is used to collect the scores of all the terms, which are then
added as boost queries to a large boolean query in which each term is set to
SHOULD occur. That way the terms are boosted according to MLT semantics, while
scoring uses ClassicSimilarity behind the scenes.

These values are used to build the boost term queries:
tq = new BoostQuery(tq, boostFactor * myScore / bestScore);
 e.g. (the numbers imply boostFactor = 10, with bestScore = 100)
 Queue = Term1:100, Term2:50, Term3:20, Term4:10
 => Term1:10, Term2:5, Term3:2, Term4:1
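
The normalization in the formula above can be reproduced with a few lines of Clojure. This is only a sketch of the arithmetic: the function name normalized-boosts is made up here, the input map is assumed to be non-empty, and nothing in it calls Solr or Lucene.

```clojure
;; Hypothetical helper mirroring boostFactor * myScore / bestScore.
;; term->score is assumed to be a non-empty map of term -> raw score.
(defn normalized-boosts
  [boost-factor term->score]
  (let [best-score (apply max (vals term->score))]
    (into {}
          (map (fn [[term score]]
                 ;; multiply first, then divide by the best score
                 [term (/ (* (double boost-factor) score) best-score)]))
          term->score)))

(normalized-boosts 10 {"Term1" 100 "Term2" 50 "Term3" 20 "Term4" 10})
;=> {"Term1" 10.0, "Term2" 5.0, "Term3" 2.0, "Term4" 1.0}
```

The best-scoring term always ends up with a boost equal to boost-factor, and every other term is scaled relative to it.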

settings map:

:q <string>, default: "*:*" (everything)
Query terms.

:fq
Filter query. It restricts which documents are returned without affecting
how they are scored.

:mlt.fl <string>, default: "contents"
The fields to use for similarity. 
NOTE: if possible, store term vectors for these fields in the managed-schema file
(e.g. <field name="cat" ... termVectors="true" />).
If term vectors are not stored, MoreLikeThis will generate terms from stored fields.

:mlt.mintf <int>, default: 2
Minimum Term Frequency - the frequency below which terms will be
ignored in the source doc. 
NOTE: Getting good MLT results requires some fine-tuning based on experimentation,
in particular of mlt.mintf. Start low and slowly increase it until you start getting
results that "feel right".

:mlt.mindf <int>, default: 5
Minimum Document Frequency - words that do not occur in at least this many
docs will be ignored.

:mlt.minwl <int>, default: 0
Minimum word length below which words will be ignored.

:mlt.maxwl <int>, default: 0
Maximum word length above which words will be ignored.

:mlt.maxqt <int>, default: 25
Maximum number of query terms that will be included in any generated query.

:mlt.maxntp <int>, default: 5000
Maximum number of tokens to parse in each example doc field that is not stored
with TermVector support.

:mlt.boost <bool>, default: false
If true, the query is boosted by the relevance of each interesting term.

:mlt.qf
Query fields and their boosts using the same format as that used in
DisMaxQParserPlugin. These fields must also be specified in mlt.fl.

:mlt.match.include <bool>, default: true
Specifies whether or not the response should include the matched document
under :match key.

:mlt.match.offset
Specifies an offset into the main query search results to locate the document
on which the MoreLikeThis query should operate. By default, the query operates
on the first result for the q parameter.

:mlt.interestingTerms <["list", "none", "details"]>
Controls how the MoreLikeThis component presents the "interesting" terms
(the top TF/IDF terms) for the query. Supports three values.
- "list" : lists the terms.
- "none" : lists no terms.
- "details": lists the terms along with the boost value used for each term.
Unless mlt.boost=true, all terms will have boost=1.0.

:fl
Fields to return. We force 'id' to be returned so that there is a unique
identifier with each record.

:wt <enum>, default: "json"
Data type returned.

:start <int>, default: 0
Record to start at

:rows <int>, default: 10
Number of records to return.
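
A settings map for query-mlt might look like the following sketch. The document id and the field names in :mlt.fl are made up for illustration, and the commented-out call assumes that a client-config value has been built elsewhere and that the namespace is required under the alias shown.

```clojure
;; Illustrative settings only; the id and field names are hypothetical.
(def mlt-settings
  {:q "id:12345"                    ; the document to find similar items for
   :mlt.fl "title,overview"         ; fields used for similarity
   :mlt.mintf 1                     ; start low, then tune upwards
   :mlt.mindf 3
   :mlt.minwl 3
   :mlt.maxqt 25
   :mlt.boost true                  ; boost by interesting-term relevance
   :mlt.interestingTerms "details"  ; one of "list", "none", "details"
   :start 0
   :rows 5})

(comment
  ;; Assumed usage; requires the corona library and a reachable Solr core.
  (require '[corona.query :as query])
  (query/query-mlt client-config mlt-settings))
```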

query-mlt-tv-edismax clj

(query-mlt-tv-edismax client-config settings)

Like the MoreLikeThis handler query (`query-mlt`), but:

- takes the top-k terms *PER FIELD*; for an explanation, see
  https://github.com/DiceTechJobs/RelevancyFeedback#isnt-this-just-the-mlt-handler

- allows edismax params (e.g. `:boost` `:bf` `:bq` `:qf`)
  NOTE: To better understand boosting methods, see
  https://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/

Special settings:

:mlt.field <string>, default: "id"
The name of the id field

:mlt.ids
A list of ids and boosts e.g. [["12345" 3] ["12346" 2]]

:mlt.top <int> 
The number of top interesting terms to use, per field.

:q
A regular edismax query that is added to the mlt query.

:_route_, default: searches all shards
The value will be hashed to find which shards to search for similar items.

:original-documents_route_, default: searches all shards
The value will be hashed to find which shards the mlt.ids belong to.

Special vars:

${mltq}
This is the computed interesting-term query you can pass in.
e.g. {!boost b=recip(ms(NOW,date),3.16e-11,1,1)^100 v="{!lucene v='(${mltq})'}"}

Supported mlt keys:
:mlt.fl
:mlt.mintf
:mlt.mindf
:mlt.minwl
:mlt.boost
:mlt.qf

IMPORTANT: All mlt.fl fields MUST have termVectors="true" set in the managed-schema
for the mlt query to be integrated into the main q.
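
To make the "top-k terms per field" idea concrete, here is a minimal sketch. The function name top-k-per-field, the input shape (field name -> {term score}) and the sample scores are assumptions for illustration and are not taken from the library.

```clojure
;; Hypothetical helper: keeps the k highest-scoring terms of EACH field,
;; instead of the k best terms across all fields combined.
(defn top-k-per-field
  [k terms-per-field]
  (into {}
        (map (fn [[field term->score]]
               [field (->> term->score
                           (sort-by val >) ; highest score first
                           (take k)
                           (into {}))]))
        terms-per-field))

(top-k-per-field 2 {"title"    {"alien" 9.0 "ship" 4.0 "crew" 2.5}
                    "overview" {"space" 7.0 "planet" 6.5 "signal" 1.0}})
;=> {"title" {"alien" 9.0, "ship" 4.0}, "overview" {"space" 7.0, "planet" 6.5}}
```

A global top-2 over the same data would keep only "alien" and "space", so lower-scoring fields would contribute a single term or none at all; selecting per field keeps every field represented.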

query-term-vectors clj

(query-term-vectors client-config settings)

Settings

:tv <bool>, default: false
If true, the Term Vector Component will run.

:tv.docIds <sequential>
For a list of Lucene document IDs (not the Solr Unique
Key), term vectors will be returned.

:tv.fl <vector>
For a given list of fields, term vectors will be returned.
If not specified, the fl parameter is used.

:tv.all <bool>, default: false
If true, all the boolean parameters listed below (tv.df, tv.offsets,
tv.positions, tv.payloads, tv.tf and tv.tf_idf) will be enabled.

:tv.df <bool>, default: false
If true, returns the Document Frequency (DF) of the term in the collection.
This can be computationally expensive.

:tv.offsets <bool>, default: false
If true, returns offset information for each term in the document.

:tv.positions <bool>, default: false
If true, returns position information.

:tv.payloads <bool>, default: false
If true, returns payload information.

:tv.tf <bool>, default: false
If true, returns document term frequency info for each term in the document.

:tv.tf_idf <bool>, default: false
If true, calculates TF / DF (i.e., TF * IDF) for each term. Please note that
this is a literal calculation of "Term Frequency multiplied by Inverse
Document Frequency" and not a classical TF-IDF similarity measure.
This parameter requires both tv.tf and tv.df to be "true". This can be
computationally expensive. (The results are not shown in example output)
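
The "literal" calculation described for :tv.tf_idf amounts to dividing a term's frequency in the document by its document frequency in the collection. A small sketch of that arithmetic follows; the name literal-tf-idf is hypothetical and this is not presented as the body of the tf-idf function listed in this namespace.

```clojure
;; Hypothetical helper: literal TF * IDF, with IDF taken as 1 / DF.
;; Returns nil when df is not positive, to avoid dividing by zero.
(defn literal-tf-idf
  [tf df]
  (when (pos? df)
    (/ (double tf) df)))

(literal-tf-idf 3 12) ;=> 0.25
(literal-tf-idf 3 0)  ;=> nil
```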

term-vectors-resp->interesting-terms-per-field clj

(term-vectors-resp->interesting-terms-per-field
  tv-resp
  &
  [{qf :mlt.qf
    ids :mlt.ids
    top :mlt.top
    boost :mlt.boost
    mintf :mlt.mintf
    mindf :mlt.mindf
    minwl :mlt.minwl
    :or {top 15 mintf 1 mindf 3 minwl 3}}])

Digests the response from the tvrh handler and creates an interestingTerms map
per matching document using the mlt special keys.
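
The optional settings map in the signature above uses Clojure destructuring with :or defaults. The sketch below reuses the same argument shape in a made-up function, resolved-options, purely to show which values apply when keys are omitted; it does not process a real tvrh response.

```clojure
;; Hypothetical function with the same optional-map destructuring as the
;; signature above. It only reports the option values that would be used.
(defn resolved-options
  [tv-resp & [{top   :mlt.top
               mintf :mlt.mintf
               mindf :mlt.mindf
               minwl :mlt.minwl
               :or   {top 15 mintf 1 mindf 3 minwl 3}}]]
  {:top top :mintf mintf :mindf mindf :minwl minwl})

(resolved-options {})
;=> {:top 15, :mintf 1, :mindf 3, :minwl 3}

(resolved-options {} {:mlt.top 5 :mlt.minwl 4})
;=> {:top 5, :mintf 1, :mindf 3, :minwl 4}
```

Note that :or defaults apply only when a key is absent from the map; a key that is present with a nil value stays nil.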

terms->q clj

(terms->q terms)

terms-per-field->q clj

(terms-per-field->q terms-map)

tf-idf clj

(tf-idf tf df)
