Benchmark suite
This command-line utility simulates catalog submission for a population of hosts. It requires a separate, running instance of PuppetDB to submit catalogs to.
We attempt to approximate a number of hosts submitting catalogs at the specified runinterval with the specified rate of churn in catalog content.
### Running parallel Benchmarks
If you are running up against the upper limit at which Benchmark can submit simulated requests, you can run multiple instances of Benchmark and use the --offset flag to shift the cert numbers.
Example (probably run on completely separate hosts):

```
benchmark --offset 0 --numhosts 100000
benchmark --offset 100000 --numhosts 100000
benchmark --offset 200000 --numhosts 100000
...
```
### Preserving host-map data
By default, each time Benchmark is run, it initializes the host-map catalog, factset, and report data randomly from the given set of base --catalogs, --factsets, and --reports files. When re-running Benchmark, this causes excessive load on PuppetDB because the completely changed catalogs/factsets must be processed.
To avoid this, set --simulation-dir to preserve all of the host-map data between runs as nippy/frozen files. Benchmark will then load and initialize a preserved host matching a particular host-# from these files at startup. Missing hosts (if --numhosts exceeds the number preserved, for example) will be initialized randomly, as by default.
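For example, a hypothetical invocation (the directory name is made up) that preserves simulation state between runs:

```
benchmark --numhosts 1000 --simulation-dir ./benchmark-sim
```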
### Mutating Catalogs and Factsets
The benchmark tool automatically refreshes timestamps and transaction ids when submitting catalogs, factsets, and reports, but the content does not change.
To simulate system drift, code changes, and fact changes, use '--rand-catalog=PERCENT_CHANCE:CHANGE_COUNT' and '--rand-facts=PERCENT_CHANCE:PERCENT_CHANGE'.
The former indicates the chance that any given catalog will perform CHANGE_COUNT resource mutations (additions, modifications, or deletions). The latter is the chance that any given factset will mutate PERCENT_CHANGE of its fact values. These may be set multiple times, provided the PERCENT_CHANCE values do not sum to more than 100%.
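For instance, a hypothetical invocation (the numbers are illustrative) where 20% of catalog submissions make 2 resource mutations, a further 5% make 10, and 25% of factset submissions mutate 3% of their fact values:

```
benchmark --numhosts 1000 --rand-catalog=20:2 --rand-catalog=5:10 --rand-facts=25:3
```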
By default edges are not included in catalogs. If --include-edges is true, then add-resource and del-resource will involve edges as well.

* Adding a resource adds a single 'contains' edge with the source being one of the catalog's original (non-added) resources.
* Deleting a resource removes one of the added resources (if there are any) and its related leaf edge.

By ensuring we only ever delete leaves from the graph, we maintain graph integrity, which is important since PuppetDB validates the edges on ingestion.
This provides only limited exercise of edge mutation, which seemed like a reasonable trade-off given that edge submission is deprecated. Running with --include-edges also impacts the nature of catalog mutation, since original resources will never be removed from the catalog.
See add-resource, mod-resource and del-resource for details of resource and edge changes.
TODO: Fact addition/removal
TODO: Mutating reports
### Viewing Metrics
There are benchmark metrics which can be viewed via JMX.
WARNING: DO NOT DO THIS WITH A PRODUCTION OR INTERNET-ACCESSIBLE INSTANCE! This gives remote access to the JVM internals, including potentially secrets. If you absolutely must (you don't), read about using certs with JMX to do it securely. You are better off using the metrics API or Grafana metrics exporter.
Add the following properties to your Benchmark Java process on startup:
```
-Dcom.sun.management.jmxremote=true
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=5555
-Djava.rmi.server.hostname=127.0.0.1
-Dcom.sun.management.jmxremote.rmi.port=5556
```
Then with a tool like VisualVM, you can add a JMX Connection, and (with the MBeans plugin) view puppetlabs.puppetdb.benchmark metrics.
(add-catalog-varying-fields catalog)
This function adds the fields that change when there is a different catalog. code_id and catalog_uuid should be different whenever the catalog is different.
(add-resource {:keys [original-keys include-edges] :as work-cat} resource-to-clone)
Adds a new resource. The new resource is built to be the same type and of a similar weight as the given resource. This helps keep the catalog relatively stable in overall weight when resources are dropped by del-resource.
If include-edges is true, a single leaf edge is created with a source from the given set of original-keys. This array is passed in and does not contain any of the 'clone-*' resources created by add-resource. This prevents nested relationships from forming between added resources, and in turn allows del-resource in the include-edges case to simply drop a cloned resource and its edge without breaking the graph validated by PuppetDB on ingestion.
(change-resources operation {:keys [resource-hash] :as work-cat})
Dispatches resource change based on operation.
Makes two judgements:
1) If the selected resource has large blob parameters, it routes an :add or :del operation to :mod so as to preserve the overall lumpiness of the catalog. Without this, over time, deletes could drop the blob resources, evening out catalogs unintentionally.
2) If there is only one resource, a :del becomes a :mod.
NOTE: regarding uniformity, we probably need to revisit this, since over time, depending on the number of original resources, it grows more likely that the catalog will reach 1 resource and thereafter all resources will be clones of that single resource.
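A minimal Clojure sketch of these two dispatch rules, not the actual implementation; effective-operation is a made-up name, and resource-has-blob? is the predicate documented below:

```clojure
;; Sketch of the documented dispatch rules only.
(defn effective-operation
  [op resource resource-count]
  (cond
    ;; blob-heavy resources are only ever modified, to preserve the
    ;; catalog's overall lumpiness
    (resource-has-blob? resource) :mod
    ;; never delete the last remaining resource
    (and (= op :del) (= resource-count 1)) :mod
    :else op))
```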
(cli args)
Runs the benchmark command as directed by the command line args and returns an appropriate exit status.
(clone-resource resource)
Build a new resource loosely based off the characteristics of the given resource. Keeps type, tags, and approximate parameter size (in bytes).
(create-storage-dir simulation-dir)
Returns a Path to the directory where simulation host-maps are stored.
If simulation-dir is set, the path will be the absolute path to simulation-dir. Otherwise a temporary directory will be created in tmpdir.
The directory is created as a side effect of calling this function if it does not already exist. Parent directories are not created.
(del-resource {:keys [resource-hash include-edges] :as work-cat} rkey)
Return the resource hash with the chosen resource removed.
But if we have edges, instead choose a resource from the list of cloned resources (from add-resource actions). This is so we can just drop the single leaf edge associated with the cloned resource (we're careful in add-resource to only form a contain relation with original uncloned resources).
If no cloned resources are available to choose from, do nothing, so as not to break the graph.
(director base-url ssl-opts scheduler {:keys [max-command-delay-ms] :as cmd-opts} event-ch seq-end)

(jitter stamp n)
Jitter a timestamp (rand-int n) seconds in the forward direction.
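A minimal sketch of the documented behavior, assuming stamp is a java.util.Date; the timestamp type actually used by Benchmark is an assumption here:

```clojure
;; Sketch only: pushes the stamp forward by up to n seconds.
(defn jitter-sketch
  [^java.util.Date stamp n]
  (java.util.Date. (+ (.getTime stamp) (* 1000 (rand-int n)))))
```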
(load-sample-data dir from-classpath?)
Load all .json files contained in `dir`.
(mod-resource {:keys [resource-hash] :as work-cat} rkey)
Updates resource by touching parameters.
(modify-title title prefix)
Regenerate a title of the same size as the one given, matching cli.generate/pseudonym format. The original ordinal is kept to help with debugging, and to avoid the cost of scanning for a next value.
NOTE: the minimum title-size is 20, but there is still a chance of duplicates in long running benchmarks.
Functions that randomly change a catalog's resources.
(populate-hosts n offset pdb-host include-edges? catalogs reports facts storage-dir)
Returns a lazy sequence of host info maps, reading the data from storage-dir when a suitable file exists, and deriving the data from catalogs, reports, and facts otherwise.
(progressing-timestamp num-hosts num-msgs run-interval-minutes end-commands-in)
Return a function that will return a timestamp that progresses forward in time.
(prune-host-info info factsets catalogs reports)
Adjusts the info to match the current run, i.e. if the current run didn't specify --catalogs, then prune it. We might have extra data when using a simulation dir from a previous run with different arguments.
(rand-catalog-mutation catalog randomize-count include-edges)
Updates id fields that change with a catalog change, and makes randomize-count additions, modifications, and/or removals of resources (and edges if include-edges is true).
(randomize-map-leaf leaf)
Randomizes a fact leaf.
(randomize-map-leaves rand-perc value)
Runs through a map and randomizes a random percentage of leaves.
(rebuild-parameters parameters)
Return resource parameters with changed keys and values of the same number and size.
In order to avoid key collisions, keys are rebuilt with at least five characters.
If changing keys still results in a collision, log an error and return the original parameters.
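A hypothetical REPL interaction; the input map and the output shown are illustrative, not actual output:

```clojure
;; Keys and values are replaced by random strings of the same sizes;
;; the result below is made up to show the shape.
(rebuild-parameters {"ensure" "present", "owner" "root"})
;; => {"qzvxwl" "kmtbsna", "wpxqd" "ztfr"}
```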
(register-resource-counts numhosts)
Setup a metric to track catalog resource counts based on numhosts.
(resource-has-blob? resource)
True if the given resource has a BLOB parameter value. Sample catalogs created by the PuppetDB Generate command may have 'content_blob_*' parameters with large values.
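A hypothetical call; the resource-map shape (string keys, a 'parameters' map) is an assumption based on PuppetDB catalog resources:

```clojure
;; Illustrative only; real resources come from sample catalog JSON.
(resource-has-blob? {"type" "File"
                     "title" "/tmp/example"
                     "parameters" {"content_blob_0" (apply str (repeat 100000 "x"))}})
```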
(send-commands options)
Feeds commands to PDB as requested by args. Returns a map of :join, a function to wait for the benchmark process to terminate (only happens when you pass nummsgs), and :stop, a function to request termination of the benchmark process and wait for it to stop cleanly. These functions return true if shutdown happened cleanly, or false if there was a timeout.
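A hypothetical usage sketch based on the documented return map; `options` stands for the parsed CLI options map, whose contents are elided here:

```clojure
;; :join and :stop are the functions documented above.
(let [{:keys [join stop]} (send-commands options)]
  ;; wait for the run to finish; returns true on clean shutdown
  ;; (join only terminates when nummsgs was passed; otherwise call (stop))
  (join))
```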
(start-rate-monitor rate-monitor-ch run-interval commands-per-puppet-run _state)
Start a task which monitors the rate of messages on rate-monitor-ch and prints it to the console every 5 seconds. Uses run-interval to compute the number of nodes that would produce that load.
(start-simulation-loop numhosts run-interval num-msgs end-commands-in rand-catalogs rand-facts simulation-threads sim-ch host-info-ch read-ch & {:keys [facts catalogs reports include-edges? storage-dir]})
Run a background process which takes host-state maps from read-ch, updates them with update-host, and puts them on write-ch. If num-msgs is not given, uses numhosts and run-interval to run the simulation at a reasonable rate. Close read-ch to terminate the background process.
Return a new parameter of about the same size and type.
TODO: handle arrays and maps.
(touch-parameters parameters)
Return resource parameters with one value changed. Size is the same.
(try-load-file file)
Attempt to read and parse the JSON in `file`. If this fails, an error is logged and nil is returned.
(update-catalog catalog include-edges rand-catalogs uuid stamp)
Updates catalog timestamps and transaction UUIDs that vary with every catalog run. Depending on settings in the rand-catalogs array, may make additional random changes to catalog resources.
(update-factset factset rand-facts stamp)
Updates the producer_timestamp to be current, and randomly updates the leaves of the factset based on a percentage provided in `rand-facts`.
(update-host {:keys [_host catalog report factset] :as state} include-edges rand-catalogs rand-facts get-timestamp)
Perform a simulation step on host-map. Always update timestamps and uuids; randomly mutate other data depending on rand-catalogs and rand-facts.
(update-report report uuid stamp)
configuration_version, start_time, and end_time should always change on subsequent report submissions; this changes those fields to avoid computing the same hash again (causing constraint errors in the DB).