Configuration


Configuration can be specified in environment variables (a limited set of options), in a TOML config file or passed into monstache as program arguments on the command line.

Environment variable names can be suffixed with __FILE. In this case the value of the environment variable is interpreted as a file path: monstache reads the file at that path and uses its contents as the value of the variable.

Note

Command line arguments take precedence over environment variables, which in turn take precedence over the TOML config file. You can verify the final configuration used by monstache by running it with -print-config.
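The precedence rule above can be illustrated with a small Go sketch for a single option. The function resolve and the use of a nil pointer to mean "not set at this level" are hypothetical modelling choices, not monstache's configuration code.

```go
package main

import "fmt"

// resolve picks the effective value of one option: command line first,
// then environment variable, then TOML file, then the built-in default.
// A nil pointer means the option was not set at that level.
func resolve(cli, env, toml *string, def string) string {
	for _, v := range []*string{cli, env, toml} {
		if v != nil {
			return *v
		}
	}
	return def
}

func strPtr(s string) *string { return &s }

func main() {
	// Not given on the command line, so the environment value wins over TOML.
	fmt.Println(resolve(nil, strPtr("http://es-env:9200"), strPtr("http://es-toml:9200"), "http://localhost:9200"))
}
```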

Warning

Keep simple one-line configs above any TOML table definitions in your config file. A TOML table is only ended by another TOML table or the end of the file, so the parser interprets anything below a table header as part of that table. See Issue 58 for more information.

aws-connect

TOML table (default nil)

Enables support for connecting to Elasticsearch using AWS Signature Version 4.

strategy

int (default 0)

The strategy used to configure the AWS credential provider.

Strategy 0 is static and uses the values of access-key and secret-key.

Strategy 1 is file based and loads the credentials from the credentials-file setting, from the file named by the standard AWS_SHARED_CREDENTIALS_FILE environment variable, or from ~/.aws/credentials.

Strategy 2 loads the credentials from the standard AWS environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

Strategy 3 loads the credentials from the default remote endpoints, such as EC2 or ECS roles.

Strategy 4 chains together strategies 1-3 and uses the first strategy that returns a credential.

credentials-file

string (default "~/.aws/credentials")

The credentials file to use. Normally, you need not set this as it will come from either AWS_SHARED_CREDENTIALS_FILE or default to ~/.aws/credentials.

profile

string (default "")

The AWS profile to use from the credentials file. If not provided, the profile named default is used.

watch-credentials

bool (default false)

Set to true to put a watch on the credentials-watch-dir. When a file in the watch dir is changed monstache will invalidate the credentials such that they will be re-established on the next request to Elasticsearch.

credentials-watch-dir

string (default "~/.aws")

The path to a directory to watch for changes if watch-credentials is enabled.

force-expire

string (default "")

A golang duration string, e.g. 5m. If given, monstache will force expire the credentials on this interval.

access-key

string (default "") (env var name MONSTACHE_AWS_ACCESS_KEY)

AWS Access Key

secret-key

string (default "") (env var name MONSTACHE_AWS_SECRET_KEY)

AWS Secret Key

region

string (default "") (env var name MONSTACHE_AWS_REGION)

AWS Region

change-stream-namespaces

[]string (default nil) (env var name MONSTACHE_CHANGE_STREAM_NS)

This option requires MongoDB 3.6 or above.

This option allows you to opt in to using MongoDB change streams.
The namespaces included will be tailed using the watch API. When this option is enabled, direct tailing of the oplog is disabled; therefore you do not need to specify additional regular expressions to filter the set of collections to watch.

If you are using MongoDB 4 or greater you can open a change stream against entire databases or even the entire deployment. To tail a database set the value of the namespace to the database name. For example, instead of db.collection the value would simply be db. To tail the entire deployment use an empty string as the namespace value. For example, change-stream-namespaces = [ '' ].
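The three namespace forms described above can be told apart as in the following Go sketch. The function scope is hypothetical and only illustrates the documented rule, not monstache's parser. It relies on MongoDB database names not containing dots, so an entry like test.foo.bar still classifies as a collection.

```go
package main

import (
	"fmt"
	"strings"
)

// scope describes what a change-stream-namespaces entry watches:
// "" is the whole deployment, "db" a database, "db.coll" a collection.
func scope(ns string) string {
	switch {
	case ns == "":
		return "deployment"
	case !strings.Contains(ns, "."):
		return "database"
	default:
		return "collection"
	}
}

func main() {
	for _, ns := range []string{"", "test", "test.foo"} {
		fmt.Printf("%q -> %s\n", ns, scope(ns))
	}
}
```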

This option may be passed on the command line as ./monstache --change-stream-namespace test.foo

If specified as an environment variable the value should be namespaces separated only by the env-delimiter which defaults to a comma. E.g. MONSTACHE_CHANGE_STREAM_NS=test.foo,test.bar

config-database-name

string (default monstache)

The name of the MongoDB database that monstache will store metadata under. This metadata includes information to support resuming from a specific point in the oplog and managing cluster mode. This database is only written to if you specify cluster-name, enable resume, or set direct-read-stateful. WARNING: If you are listening to changes via change-stream-namespaces, you cannot use the same database both as a change stream source and to store this metadata.

cluster-name

string (default "") (env var name MONSTACHE_CLUSTER)

When cluster-name is given, monstache will enter a high availability mode. Processes with cluster-name set to the same value will coordinate. Only one of the processes in a cluster will sync changes; the other processes will be in a paused state. If the process which is syncing changes goes down for some reason, one of the paused processes will take control and start syncing. See the High Availability section for more information.

delete-index-pattern

string (default *)

When using a stateless delete strategy, set this to a valid Elasticsearch index pattern to restrict the set of indexes that a stateless delete will consider. If monstache only indexes into the indexes a, b, and c, then you can set this to a,b,c. If monstache only indexes into indexes starting with mydb, then you can set this to mydb*.
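To see how the examples above narrow the scope, here is a Go sketch that checks an index name against a comma-separated pattern list. The helper inScope is hypothetical, and path.Match is only an approximation of Elasticsearch wildcard matching; Elasticsearch's own pattern syntax (for example exclusions) is richer.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// inScope reports whether index matches any entry of a comma-separated
// pattern list such as "a,b,c" or "mydb*". path.Match approximates
// Elasticsearch wildcard matching; index names contain no "/".
func inScope(pattern, index string) bool {
	for _, p := range strings.Split(pattern, ",") {
		if ok, err := path.Match(strings.TrimSpace(p), index); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(inScope("mydb*", "mydb.users"), inScope("a,b,c", "d"))
}
```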

delete-strategy

int (default 0)

The strategy to use for handling document deletes when custom indexing is done in scripts.

Strategy 0 (the default) runs a term query by document id across all Elasticsearch indexes matching delete-index-pattern. The delete is only performed if exactly one document is returned by the query.

Strategy 1 (deprecated) stores indexing metadata in MongoDB in the monstache.meta collection and uses this metadata to locate and delete the document.

Strategy 2 completely ignores document deletes in MongoDB.

direct-read-bounded

boolean (default false)

When this option is enabled monstache will ensure that all direct read queries have a min and max set on the query. This ensures that direct reads will complete and not chase new data that is being inserted while the cursor is being exhausted.

direct-read-concur

int (default 0)

This option allows you to control the number of namespaces in direct-read-namespaces which will be syncing concurrently. By default monstache starts reading and syncing all namespaces concurrently. If this places too much stress on MongoDB then you can set this option to an integer greater than 0. If you set it to 1, for example, then monstache will sync the collections serially. Numbers greater than 1 allow you to sync collections in batches of that size.

direct-read-dynamic-exclude-regex

string (default "") (env var name MONSTACHE_DIRECT_READ_NS_DYNAMIC_EXCLUDE_REGEX)

This option is only available in monstache v5 and v6.

This option allows you to exclude any collections that match the given regex when monstache is directed to dynamically register direct-read-namespaces. It is ignored when direct-read-namespaces lists explicit namespaces.

direct-read-dynamic-include-regex

string (default "") (env var name MONSTACHE_DIRECT_READ_NS_DYNAMIC_INCLUDE_REGEX)

This option is only available in monstache v5 and v6.

This option allows you to only include collections that match the given regex when monstache is directed to dynamically register direct-read-namespaces. It is ignored when direct-read-namespaces lists explicit namespaces.

direct-read-namespaces

[]string (default nil) (env var name MONSTACHE_DIRECT_READ_NS)

This option allows you to directly copy collections from MongoDB to Elasticsearch. Monstache allows filtering the data that is actually indexed to Elasticsearch, so you need not necessarily copy the entire collection.

Note

In monstache v5 and v6 you can set direct-read-namespaces to an array containing a single empty string to direct monstache to dynamically discover your collections and perform direct reads on them. Use direct-read-dynamic-include-regex and direct-read-dynamic-exclude-regex to prune the list that is discovered. System collections are never considered for inclusion in the discovery.

Since the oplog is a capped collection, it may only contain a subset of all your data. In this case you can perform a direct sync from MongoDB to Elasticsearch. To do this, set direct-read-namespaces to an array of namespaces that you would like to copy. Monstache will perform reads directly from the given set of db.collection namespaces and sync them to Elasticsearch.

Note

This option may be passed on the command line as ./monstache --direct-read-namespace test.foo --direct-read-namespace test.bar

If specified as an environment variable the value should be namespaces separated only by the env-delimiter which defaults to a comma. E.g. MONSTACHE_DIRECT_READ_NS=test.foo,test.bar

Warning

When direct reads are enabled Monstache still processes change events while the direct reads are being performed.
It does not wait until direct reads are completed to start listening for changes. This is to ensure that any changes that occur during the direct read process get synchronized.

By default, Monstache maps a MongoDB collection named foo in a database named test to the test.foo index in Elasticsearch.

For maximum indexing performance when doing a lot of direct reads, you might want to adjust the refresh interval during indexing on the destination Elasticsearch indices. The refresh interval can be set at a global level in elasticsearch.yml or on a per index basis by using the Index Settings or Index Template APIs.

For more information see Update Indices Settings.

By default, Elasticsearch refreshes every second. You will want to increase this value or turn off refresh completely during the indexing phase by setting the refresh_interval to -1. Remember to reset the refresh_interval to a positive value and do a force merge after the indexing phase has completed if you decide to temporarily turn off refresh, otherwise you will not be able to see the new documents in queries.

Another way to speed up bulk indexing is to set the number_of_replicas to 0 while indexing and then later increase the number of replicas. The following index template shows how one might configure a target index for better indexing throughput by controlling replicas and the refresh interval. The index template needs to be installed before running monstache.

{
  "index_patterns": ["test.*"],
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0,
    "refresh_interval": "30s"
  }
}

direct-read-no-timeout

boolean (default false)

When direct-read-no-timeout is true monstache will set the no cursor timeout flag on cursors opened for direct reads. The default is not to do this since having cursors without timeouts is not generally a good practice. However, for reading very large collections you may find it necessary to avoid cursor timeout errors. An alternative to enabling this setting is to increase the cursor timeout on your MongoDB server or look into using the direct-read-split-max and direct-read-concur options to limit the number of cursors opened for direct reads.

direct-read-split-max

int (default 9)

The maximum number of times to split a collection for direct reads. This setting greatly impacts the memory consumption of Monstache. When direct reads are performed, the collection is first broken up into ranges which are then read concurrently in separate goroutines. If you increase this value you will notice the connection count increase in mongostat when direct reads are performed. You will also notice the memory consumption of Monstache grow. Increasing this value can increase the throughput for reading large collections, but you need to have enough memory available to Monstache to do so. You can decrease this value for a memory constrained Monstache process.

To disable collection splitting altogether, set this option to -1. In this case monstache will not try to segment the collection, but rather use a single cursor for the entire read.

direct-read-stateful

boolean (default false)

When this setting is true, monstache records each direct read namespace as complete, in a collection named directreads in the monstache config database, once the namespace has been fully read.

On subsequent restarts monstache will check this collection and only start direct reads for the namespaces not in the completed list.

This allows you to keep the list of direct read namespaces in the configuration while managing, externally in MongoDB, the list of namespaces that have completed and should not be run again. Deleting the directreads collection and restarting monstache will force a full sync.
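The restart logic described above amounts to a set difference between the configured namespaces and the completed ones. In this Go sketch, pending is a hypothetical helper and the completed map stands in for the contents of the directreads collection, whose document shape is not shown here.

```go
package main

import "fmt"

// pending returns the configured namespaces that are not yet recorded as
// complete. completed stands in for the directreads collection.
func pending(configured []string, completed map[string]bool) []string {
	var out []string
	for _, ns := range configured {
		if !completed[ns] {
			out = append(out, ns)
		}
	}
	return out
}

func main() {
	configured := []string{"test.users", "test.orders", "test.events"}
	completed := map[string]bool{"test.users": true}
	fmt.Println(pending(configured, completed))
}
```

Emptying the completed set (the equivalent of deleting the directreads collection) makes every configured namespace pending again, which is the full-sync case.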

disable-change-events

boolean (default false)

When disable-change-events is true monstache will not listen to change events from the oplog or call watch on any collections. This option is only useful if you are using direct-read-namespaces to copy collections and would prefer not to sync change events.

disable-file-pipeline-put

boolean (default false)

This setting only applies to monstache versions 5 and 6.

When this option is true monstache will not attempt to auto create an ingest pipeline named attachment with a file field at startup when index-files is enabled. In this case the user must create the pipeline before running monstache. For example, the user must issue a command against Elasticsearch as follows prior to running monstache in order to index GridFS files:

PUT _ingest/pipeline/attachment
{
  "description" : "Extract file information",
  "processors" : [
    {
      "attachment" : {
        "field" : "file",
        "indexed_chars" : -1
      }
    }
  ]
}

dropped-databases

boolean (default true)

When dropped-databases is false, monstache will not delete the mapped indexes in Elasticsearch if a MongoDB database is dropped.

dropped-collections

boolean (default true)

When dropped-collections is false, monstache will not delete the mapped index in Elasticsearch if a MongoDB collection is dropped.

elasticsearch-user

string (default "") (env var name MONSTACHE_ES_USER)

Optional Elasticsearch username for basic auth

elasticsearch-password

string (default "") (env var name MONSTACHE_ES_PASS)

Optional Elasticsearch password for basic auth

elasticsearch-urls

[]string (default [ "http://localhost:9200" ]) (env var name MONSTACHE_ES_URLS)

An array of URLs to connect to the Elasticsearch REST Interface

Note

This option may be passed on the command line as ./monstache --elasticsearch-url URL1 --elasticsearch-url URL2

If specified as an environment variable the value should be URLs separated only by the env-delimiter which defaults to a comma. E.g. MONSTACHE_ES_URLS=http://es1:9200,http://es2:9200

elasticsearch-healthcheck-timeout-startup

int (default 15)

The number of seconds to wait for the initial health check to Elasticsearch to respond before giving up and exiting.

elasticsearch-healthcheck-timeout

int (default 5)

The number of seconds to wait for a post-initial health check to Elasticsearch to respond

elasticsearch-version

string (by default determined by connecting to the server)

When elasticsearch-version is provided monstache will parse the given server version to determine how to interact with the Elasticsearch API. This is normally not recommended because monstache will connect to Elasticsearch to find out which version is being used. This option is provided for cases where connecting to the base URL of the Elasticsearch REST API to get the version is not possible or desired.

elasticsearch-max-conns

int (default 4)

The size of the Elasticsearch HTTP connection pool. This determines the concurrency of bulk indexing requests to Elasticsearch. If you increase this value too high you may begin to see bulk indexing failures if the bulk index queue gets overloaded. To increase the size of the bulk indexing queue you can update the Elasticsearch config file:

thread_pool:
    bulk:
        queue_size: 200

For more information see Thread Pool.

You will want to tune this variable in sync with the elasticsearch-max-bytes option.

elasticsearch-retry

boolean (default false)

When elasticsearch-retry is true, a failed request to Elasticsearch will be retried with an exponential backoff policy. The policy is set with an initial timeout of 50 ms, an exponential factor of 2, and a max wait of 20 seconds. For more information on how this works, see Back Off Strategy.

elasticsearch-client-timeout

int (default 0)

The number of seconds before a request to Elasticsearch times out. A setting of 0, the default, disables the timeout.

elasticsearch-max-docs

int (default -1)

When elasticsearch-max-docs is given a bulk index request to Elasticsearch will be forced when the buffer reaches the given number of documents.

Warning

It is not recommended to change this option but rather use elasticsearch-max-bytes instead since the document count is not a good gauge of when to flush. The default value of -1 means to not use the number of docs as a flush indicator.

elasticsearch-max-bytes

int (default 8MB as bytes)

When elasticsearch-max-bytes is given, a bulk index request to Elasticsearch will be forced when a connection buffer reaches the given number of bytes. This setting greatly impacts performance. A high value for this setting will cause high monstache memory usage, as the documents are buffered in memory.

Each connection in elasticsearch-max-conns will flush when its queue gets filled to this size.

elasticsearch-max-seconds

int (default 1)

When elasticsearch-max-seconds is given, a bulk index request to Elasticsearch will be forced when a request has not been made in the given number of seconds. The default value is automatically increased to 5 when direct read namespaces are detected. This keeps flushes from happening too often during direct reads, which would reduce throughput.

elasticsearch-pem-file

string (default "") (env var name MONSTACHE_ES_PEM)

When elasticsearch-pem-file is given, monstache will use the given file path to add a local certificate to the x509 cert pool when connecting to Elasticsearch. This should only be used when Elasticsearch is configured with SSL enabled.

elasticsearch-pki-auth

TOML table (default nil)

Configures the client to use PKI user authentication with Elasticsearch.

cert-file

string (default "") (env var name MONSTACHE_ES_PKI_CERT)

Path to the cert file e.g. the --cert argument to curl

key-file

string (default "") (env var name MONSTACHE_ES_PKI_KEY)

Path to the key file e.g. the --key argument to curl

elasticsearch-validate-pem-file

boolean (default true) (env var name MONSTACHE_ES_VALIDATE_PEM)

When elasticsearch-validate-pem-file is false TLS will be configured to skip verification

enable-easy-json

boolean (default false)

When enable-easy-json is true, monstache will use the easy-json library to serialize requests to Elasticsearch.

enable-http-server

boolean (default false)

Add this flag to enable an embedded HTTP server at localhost:8080

enable-oplog

boolean (default false)

This option only applies to monstache v5 and v6. Enabling it turns on a change event emulation feature that tails the MongoDB oplog directly. It should only be turned on when pairing monstache v5 or v6 with a MongoDB server at a server compatibility version less than 3.6.

enable-patches

boolean (default false)

Set to true to enable storing RFC 7396 (JSON Merge Patch) patches in your Elasticsearch documents.

env-delimiter

string (default ,)

This option is only supported on the command line. The value for this delimiter will be used to split environment variable values when the environment variable is used in conjunction with an option of array type. For example, export MONSTACHE_DIRECT_READ_NS=test.test,foo.bar is split into the namespaces test.test and foo.bar.

exit-after-direct-reads

boolean (default false)

The direct-read-namespaces option gives you a way to do a full sync on multiple collections. At times you may want to perform a full sync via the direct-read-namespaces option and then quit monstache. Set this option to true and monstache will exit after syncing the direct read collections instead of continuing to tail the oplog. This is useful if you would like to run a full sync on a set of collections from a cron job.

fail-fast

boolean (default false)

When fail-fast is true, if monstache receives a failed bulk indexing response from Elasticsearch, monstache will log the request that produced the response as an ERROR and then exit immediately with an error status. Normally, monstache just logs the error and continues processing events.

If monstache has been configured with elasticsearch-retry true, a failed request will be retried before being considered a failure.

file-downloaders

int (default 10)

Number of goroutines concurrently processing GridFS files when index-files is turned on.

file-highlighting

boolean (default false)

When file-highlighting is true, monstache will enable the ability to return highlighted keywords in the extracted text of files, for queries on files which were indexed in Elasticsearch from GridFS.

file-namespaces

[]string (default nil) (env var name MONSTACHE_FILE_NS)

The file-namespaces config must be set when index-files is enabled. file-namespaces must be set to an array of MongoDB namespace strings. Files uploaded through gridfs to any of the namespaces in file-namespaces will be retrieved and their raw content indexed into Elasticsearch via either the mapper-attachments or ingest-attachment plugin.

Note

This option may be passed on the command line as ./monstache --file-namespace test.foo --file-namespace test.bar

If specified as an environment variable the value should be namespaces separated only by the env-delimiter which defaults to a comma. E.g. MONSTACHE_FILE_NS=test.foo,test.bar

filter

[] array of TOML table (default nil)

When filter is given monstache will pass the MongoDB document from an insert or update operation into the filter function immediately after it is read from the oplog. Return true from the function to continue processing the document or false to completely ignore the document. See the section Middleware for more information.

namespace

string (default "")

The MongoDB namespace, db.collection, to apply the script to. If you omit namespace the filter function will be applied to all documents.

script

string (default "")

An inline script. You can use TOML multiline syntax here. The function should take 2 arguments, a doc and a namespace, and return true/false to include or filter the document.

path

string (default "")

The file path to load a script from. Use this or an inline script but not both. Can be a path relative to the directory monstache is executed from or an absolute path.

graylog-addr

string (default "") (env var name MONSTACHE_GRAYLOG_ADDR)

The address of a Graylog server to send logs to in GELF format.

gtm-settings

TOML table (default nil)

The following gtm configuration properties are available. See gtm for details

channel-size

int (default 512)

Controls the size of the go channels created for processing events. When many events are processed at once a larger channel size may prevent blocking in gtm.

buffer-size

int (default 32)

Determines how many documents are buffered by a gtm worker go routine before they are batch fetched from MongoDB. When many documents are inserted or updated at once it is better to fetch them together.

buffer-duration

string (default 75ms)

A string representation of a golang duration. Determines the maximum time a buffer is held before it is fetched in batch from MongoDB and flushed for indexing.

max-await-time

string (default "")

A string representation of a golang duration, e.g. "10s". If set, it will be converted and passed as the maxAwaitTimeMS option for change streams. This determines the maximum amount of time in milliseconds the server waits for new data changes to report to the change stream cursor before returning an empty batch.
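The conversion from a golang duration string to a millisecond value can be sketched with the standard time package. maxAwaitMS is a hypothetical helper; the same parsing applies to other duration options in this document such as buffer-duration and force-expire.

```go
package main

import (
	"fmt"
	"time"
)

// maxAwaitMS converts a golang duration string such as "10s" into the
// millisecond value used for maxAwaitTimeMS. ok is false when the option
// is left empty (its default), in which case nothing would be passed.
func maxAwaitMS(s string) (int64, bool, error) {
	if s == "" {
		return 0, false, nil
	}
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, false, err
	}
	return d.Milliseconds(), true, nil
}

func main() {
	for _, s := range []string{"10s", "1m30s", "75ms", ""} {
		ms, ok, err := maxAwaitMS(s)
		fmt.Printf("%q -> %d ms (set=%v, err=%v)\n", s, ms, ok, err)
	}
}
```

A bare number such as "10" is rejected by time.ParseDuration because it has no unit, so always include one (ms, s, m, h).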

gzip

boolean (default false)

When gzip is true, monstache will compress requests to Elasticsearch. If you enable gzip in monstache and are using Elasticsearch prior to version 5 you will also need to update the Elasticsearch config file to set http.compression: true. In Elasticsearch version 5 and above http.compression is enabled by default. Enabling gzip compression is recommended if you enable the index-files setting.

http-server-addr

string (default :8080) (env var name MONSTACHE_HTTP_ADDR)

The address to bind the embedded HTTP server on if enabled

index-as-update

boolean (default false)

When index-as-update is set to true monstache will sync create and update operations in MongoDB as updates to Elasticsearch. This does not change the fact that Monstache always sends an entire copy of the data in MongoDB. It just means that any existing non-overlapping fields in Elasticsearch will be maintained.

By default, monstache will overwrite the entire document in Elasticsearch. This setting may be useful if you update, out of band, documents in Elasticsearch that monstache previously synced, and would like to retain these updates when the document changes in MongoDB. You will only be able to retain fields in Elasticsearch that do not overlap with fields in MongoDB.

When this setting is turned on, some guarantees about the order of operations applied in Elasticsearch are lost. The reason is that the version field cannot be set with this enabled. By default, the version field is set to the timestamp of the event in MongoDB, and Elasticsearch will only apply a change if its version is greater than or equal to the last indexed version, which preserves ordering.

If you enable this setting and do not see serialized updates in MongoDB being indexed correctly then you can mitigate this problem with the following settings:

elasticsearch-max-conns = 1

[gtm-settings]
buffer-size = 2048
buffer-duration = 4s

index-files

boolean (default false)

When index-files is true monstache will index the raw content of files stored in GridFS into Elasticsearch as an attachment type. By default index-files is false meaning that monstache will only index metadata associated with files stored in GridFS. In order for index-files to index the raw content of files stored in GridFS you must install a plugin for Elasticsearch. For versions of Elasticsearch prior to version 5, you should install the mapper-attachments plugin. In version 5 or greater of Elasticsearch the mapper-attachment plugin is deprecated and you should install the ingest-attachment plugin instead. For further information on how to configure monstache to index content from GridFS, see the section GridFS support.

index-oplog-time

boolean (default false)

If this option is set to true monstache will include 2 automatic fields in the source document indexed into Elasticsearch. The first is oplog_ts which is the timestamp for the event copied directly from the MongoDB oplog. The second is oplog_date which is an Elasticsearch date field corresponding to the time of the same event.

This information is generally useful in Elasticsearch as a last-updated timestamp. It is also valuable for failed indexing requests, since it tells you the point from which to replay after a failure. See the option resume-from-timestamp for information on how to replay oplog events since a given event occurred.

For data read via the direct read feature the oplog time will only be available if the id of the MongoDB document is an ObjectID. If the id of the MongoDB document is not an ObjectID and the document source is a direct read query then the oplog time will not be available.

index-stats

boolean (default false)

When both stats and index-stats are true monstache will write statistics about its indexing progress in Elasticsearch instead of standard out.

The indexes used to store the statistics are time stamped by day and prefixed with monstache.stats., e.g. monstache.stats.2017-07-01 and so on.

As these indexes will accrue over time, you can use a tool like curator to prune them with a Delete Indices action and an age filter.

logs

TOML table (default nil) (env var name MONSTACHE_LOG_DIR)

Allows writing logs to a file using a rolling appender instead of stdout. Supply a file path for each type of log you would like to send to a file. When the MONSTACHE_LOG_DIR environment variable is used then a log file for each log level will be generated in the given directory.

info

string (default "")

The file path to write info level logs to

warn

string (default "")

The file path to write warning level logs to

error

string (default "")

The file path to write error level logs to

trace

string (default "")

The file path to write trace level logs to. Trace logs are enabled via the verbose option.

stats

string (default "")

The file path to write indexing statistics to. Stats logs are enabled via the stats option.

log-rotate

TOML table (default nil)

Use to configure how log files are rotated/managed when logging to files. These options are passed through to the lumberjack logger.

max-size

int (default 500) (env var name MONSTACHE_LOG_MAX_SIZE)

MaxSize is the maximum size in megabytes of the log file before it gets rotated.

max-age

int (default 28) (env var name MONSTACHE_LOG_MAX_AGE)

MaxAge is the maximum number of days to retain old log files based on the timestamp encoded in their filename. Note that a day is defined as 24 hours and may not exactly correspond to calendar days due to daylight savings, leap seconds, etc. Use a value of zero to ignore the age of files.

max-backups

int (default 5) (env var name MONSTACHE_LOG_MAX_BACKUPS)

MaxBackups is the maximum number of old log files to retain. Use a value of zero to retain all old log files (though MaxAge may still cause them to get deleted.)

localtime

boolean (default false)

LocalTime determines if the time used for formatting the timestamps in backup files is the computer's local time. The default is to use UTC time.

compress

boolean (default false)

Compress determines if the rotated log files should be compressed using gzip. The default is not to perform compression.

mapper-plugin-path

string (default "")

The path to a Go plugin compiled as a .so file.

mapping

[] array of TOML table (default nil)

When mapping is given monstache will be directed to override the default index and type assigned to documents in Elasticsearch. See the section Index Mapping for more information.

namespace

string (default "")

The MongoDB namespace, db.collection, to apply the mapping to.

index

string (default "same as namespace including the dot. e.g. test.test")

Allows you to override the default index that monstache will send documents to. By default, the index is the same as the MongoDB namespace.

type

string (default "_doc for ES 6.2+ and the name of the MongoDB collection otherwise")

Allows you to override the default type that monstache will index documents with. Overriding the type is not recommended for Elasticsearch version 6.2+.
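The index and type defaults listed above can be expressed as a small Go function. defaultIndexType is a hypothetical helper that only restates the documented defaults; it ignores any mapping overrides and makes no claim about other normalization monstache may apply to index names.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultIndexType applies the documented defaults: the index is the
// namespace itself, and the type is "_doc" for Elasticsearch 6.2+ or the
// collection name for older versions.
func defaultIndexType(ns string, esMajor, esMinor int) (index, typ string) {
	index = ns
	if esMajor > 6 || (esMajor == 6 && esMinor >= 2) {
		return index, "_doc"
	}
	// Database names cannot contain dots, so everything after the first
	// dot is the collection name (which may itself contain dots).
	if i := strings.Index(ns, "."); i >= 0 {
		return index, ns[i+1:]
	}
	return index, ns
}

func main() {
	fmt.Println(defaultIndexType("test.foo", 7, 17))
	fmt.Println(defaultIndexType("test.foo", 5, 6))
}
```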

pipeline

string (default "")

The name of an existing Elasticsearch pipeline to index the data with. A pipeline is a series of Elasticsearch processors to be executed. An Elasticsearch pipeline is one way to transform MongoDB data before indexing.

max-file-size

int (default 0)

When max-file-size is greater than 0 monstache will not index the content of GridFS files that exceed this limit in bytes.

merge-patch-attribute

string (default json-merge-patches)

Customize the name of the property under which merge patches are stored

mongo-url

string (default localhost) (env var name MONSTACHE_MONGO_URL)

The URL to connect to MongoDB which must follow the Standard Connection String Format

For sharded clusters this URL should point to the mongos router server and the mongo-config-url option must be set to point to the config server.

mongo-config-url

string (default "") (env var name MONSTACHE_MONGO_CONFIG_URL)

This option must only be set for sharded MongoDB clusters. It has the same syntax as mongo-url and must point to the MongoDB config server.

Monstache will read the list of shards using this connection and then set up a listener to react to new shards being added to the cluster at a later time. It will then set up a new direct connection to each shard to listen for events.

Setting the mongo-config-url is not necessary if you are using change-stream-namespaces.
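For a sharded cluster that is not using change-stream-namespaces, the two URLs might be set together as in this sketch. The host names and ports are placeholders for your own mongos router and config server. As simple one-line settings, they belong above any TOML table definitions in the file.

```toml
# Points at the mongos router.
mongo-url = "mongodb://mongos.example.com:27017"
# Points at the config server; only needed for sharded clusters
# when change-stream-namespaces is not used.
mongo-config-url = "mongodb://configsvr.example.com:27019"
```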

mongo-pem-file

string (default "") (env var name MONSTACHE_MONGO_PEM)

This setting only applies to the mgo driver in monstache versions 3 and 4. The driver in monstache 5 and 6 uses the connection string for all settings.

When mongo-pem-file is given, monstache will use the given file path to add a local certificate to the x509 certificate pool when connecting to MongoDB. This should only be used when MongoDB is configured with SSL enabled.

mongo-validate-pem-file

boolean (default true) (env var name MONSTACHE_MONGO_VALIDATE_PEM)

This setting only applies to the mgo driver in monstache versions 3 and 4. The driver in monstache 5 and 6 uses the connection string for all settings.

When mongo-validate-pem-file is false, TLS will be configured to skip certificate verification.

mongo-oplog-database-name

string (default local) (env var name MONSTACHE_MONGO_OPLOG_DB)

When mongo-oplog-database-name is given monstache will look for the MongoDB oplog in the supplied database

mongo-oplog-collection-name

string (default oplog.rs) (env var name MONSTACHE_MONGO_OPLOG_COL)

When mongo-oplog-collection-name is given monstache will look for the MongoDB oplog in the supplied collection. The collection defaults to oplog.rs which is what will be produced when replica sets are enabled. If you are using an old version of MongoDB with master based replication instead of replica sets, then you will need to configure this setting to oplog.$main.

Warning

Previously, if this setting was not supplied, monstache would search for the first collection prefixed with oplog in the local database. Starting in monstache v4.13.1 and v3.20.1, monstache no longer does this dynamic resolution. Since master-based replication is no longer supported in MongoDB, monstache now defaults to oplog.rs and will only use another collection (e.g. oplog.$main) if you explicitly configure it to do so.

mongo-dial-settings

TOML table (default nil)

This setting only applies to the mgo driver in monstache versions 3 and 4. The driver in monstache 5 and 6 uses the connection string for all settings.

The following MongoDB dial properties are available. Timeout values of 0 disable the timeout.

ssl

bool (default false)

Set to true to establish a connection using TLS.

timeout

int (default 15)

Seconds to wait when establishing an initial connection to MongoDB before giving up

read-timeout

int (default 30)

Seconds to wait when reading data from MongoDB before giving up. Must be greater than 0. This should be greater than 10 because Monstache waits 10s for new change events by retrying the query.

write-timeout

int (default 30)

Seconds to wait when writing data to MongoDB before giving up. Must be greater than 0. This should be greater than 10 because Monstache waits 10s for new change events by retrying the query.

mongo-session-settings

TOML table (default nil)

This setting only applies to the mgo driver in monstache versions 3 and 4. The driver in monstache 5 and 6 uses the connection string for all settings.

The following MongoDB session properties are available. Timeout values of 0 disable the timeout.

socket-timeout

int (default 0)

Seconds to wait for a non-responding socket before it is forcefully closed

sync-timeout

int (default 30)

Amount of time in seconds an operation will wait before returning an error in case a connection to a usable server can't be established. Must be greater than 0.
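A sketch of both tables for the mgo driver (monstache 3 and 4 only). Apart from ssl, the values shown are the documented defaults. Because these are TOML tables, they should come after all simple one-line settings in the file.

```toml
[mongo-dial-settings]
ssl = true           # connect using TLS
timeout = 15         # seconds to establish the initial connection
read-timeout = 30    # seconds; keep above 10
write-timeout = 30   # seconds; keep above 10

[mongo-session-settings]
socket-timeout = 0   # 0 disables the timeout
sync-timeout = 30    # seconds; must be greater than 0
```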

mongo-x509-settings

TOML table (default nil)

This setting only applies to the mgo driver in monstache versions 3 and 4. The driver in monstache 5 and 6 uses the connection string for all settings.

Allows one to configure x509 authentication with MongoDB. For more information see x509 auth.

Note

You must configure your mongo-url with the request parameter authMechanism=MONGODB-X509. You must also supply both of the following file paths:

client-cert-pem-file

string (default "")

The path to a PEM encoded file containing the client cert

client-key-pem-file

string (default "")

The path to a PEM encoded file containing the client key
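Putting the pieces together, an x509 setup for the mgo driver might look like the following sketch. The host name and file paths are hypothetical placeholders.

```toml
# The authMechanism request parameter is required for x509 auth.
mongo-url = "mongodb://mongo.example.com:27017/?authMechanism=MONGODB-X509"

[mongo-x509-settings]
client-cert-pem-file = "/etc/monstache/client-cert.pem"
client-key-pem-file = "/etc/monstache/client-key.pem"
```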

namespace-drop-exclude-regex

regex (default "") (env var name MONSTACHE_NS_DROP_EXCLUDE_REGEX)

When namespace-drop-exclude-regex is given, this regex is tested against the namespace, database.collection, of drops in MongoDB. For database drops the namespace will be database-name.$cmd. For collection drops the namespace will be database-name.collection-name. If the regex does not match the namespace, then the operation will be synced.

namespace-drop-regex

regexp (default "") (env var name MONSTACHE_NS_DROP_REGEX)

When namespace-drop-regex is given, this regex is tested against the namespace, database.collection, of drops in MongoDB. For database drops the namespace will be database-name.$cmd. For collection drops the namespace will be database-name.collection-name. If the regex matches the namespace, then the operation will be synced.

namespace-exclude-regex

regex (default "") (env var name MONSTACHE_NS_EXCLUDE_REGEX)

When namespace-exclude-regex is given this regex is tested against the namespace, database.collection, of any insert, update, delete in MongoDB. If the regex matches monstache ignores the event, otherwise it continues processing event filters. By default monstache processes events in all databases and all collections with the exception of the reserved database monstache, any collections suffixed with .chunks, and the system collections. For more information see the section Namespaces.

namespace-regex

regexp (default "") (env var name MONSTACHE_NS_REGEX)

When namespace-regex is given this regex is tested against the namespace, database.collection, of any insert, update, delete in MongoDB. If the regex matches monstache continues processing event filters, otherwise it drops the event. By default monstache processes events in all databases and all collections with the exception of the reserved database monstache, any collections suffixed with .chunks, and the system collections. For more information see the section Namespaces.
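An illustrative combination of the namespace filters follows. The database name mydb and the tmp_ prefix are hypothetical. TOML literal strings (single quotes) are used so that the backslash in \. reaches the regex engine unchanged; inside a double-quoted TOML string it would have to be written as \\.

```toml
# Sync inserts, updates and deletes only for collections in mydb ...
namespace-regex = '^mydb\.'
# ... except collections whose names start with tmp_.
namespace-exclude-regex = '^mydb\.tmp_'
# Sync drops of mydb and its collections; a database drop
# has the namespace mydb.$cmd, which this pattern also matches.
namespace-drop-regex = '^mydb\.'
```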

oplog-date-field-format

string (default 2006/01/02 15:04:05)

Use this option to override the layout for formatting the oplog_date field. Refer to the Format function for the reference time values to use in the layout.

oplog-date-field-name

string (default oplog_date)

Use this option to override the name of the field used to store the oplog date string

oplog-ts-field-name

string (default oplog_ts)

Use this option to override the name of the field used to store the oplog timestamp
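For example, to store the oplog date in RFC 3339 form under differently named fields, the three options could be combined as below. The field names are hypothetical; the layout is written in terms of Go's reference time, Mon Jan 2 15:04:05 MST 2006.

```toml
oplog-date-field-format = "2006-01-02T15:04:05Z07:00"
oplog-date-field-name = "synced_at"
oplog-ts-field-name = "synced_ts"
```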

patch-namespaces

[]string (default nil) (env var name MONSTACHE_PATCH_NS)

An array of MongoDB namespaces on which to enable RFC 7396 (JSON merge patch) patches.

Note

This option may be passed on the command line as ./monstache --patch-namespace test.foo --patch-namespace test.bar

If specified as an environment variable the value should be namespaces separated only by the env-delimiter which defaults to a comma. E.g. MONSTACHE_PATCH_NS=test.foo,test.bar
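In the TOML config file the same list is written as an array. The merge-patch-attribute line is optional and is shown with its default value.

```toml
patch-namespaces = ["test.foo", "test.bar"]
merge-patch-attribute = "json-merge-patches"
```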

pipeline

[] array of TOML table (default nil)

When pipeline is given monstache will call the function specified to determine an array of aggregation pipeline stages to run. See the section Middleware for more information.

namespace

string (default "")

The MongoDB namespace, db.collection, to apply the script to. If you omit the namespace, the pipeline function will be applied to all namespaces.

script

string (default "")

An inline script. You can use TOML multiline syntax here. The function should take 2 arguments, a namespace and a boolean indicating whether or not the data is a change stream. The function should return an array of aggregation pipeline stages. Note, for change streams the root of the pipeline will be the change event with a field fullDocument representing the changed doc. You should alter your pipeline stages according to this boolean. Monstache needs the change event data so do not replace the root of the document in your pipeline for change streams.

path

string (default "")

The file path to load a script from. Use this or an inline script but not both. Can be a path relative to the directory monstache is executed from or an absolute path.

pipe-allow-disk

boolean (default false)

Add this flag to allow MongoDB to use the disk as a temporary store for data during aggregation pipelines
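A sketch of an inline pipeline function, assuming a hypothetical boolean field named active. The function returns a different $match stage depending on whether it is building a change stream pipeline, where the document sits under fullDocument, or a pipeline over the documents themselves. pipe-allow-disk is placed above the table because it is a top-level setting.

```toml
pipe-allow-disk = true

[[pipeline]]
namespace = "mydb.users"
script = """
module.exports = function(ns, changeStream) {
  if (changeStream) {
    // Change events wrap the changed document in fullDocument.
    return [
      { $match: { "fullDocument.active": true } }
    ];
  }
  // Otherwise the document itself is at the root.
  return [
    { $match: { active: true } }
  ];
}
"""
```

Note that MongoDB delete events carry no fullDocument, so a match like this would also filter them out; adjust the stage if deletes must reach Elasticsearch.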

post-processors

int (default 10)

Number of go routines concurrently calling the Process method in any golang middleware plugins installed via mapper-plugin-path.

pprof

boolean (default false)

When pprof is true and the http server is enabled, monstache will make profiling information available.

See Profiling for Go for more information.

print-config

boolean (default false)

When print-config is true monstache will print its configuration and then exit

prune-invalid-json

boolean (default false)

If your MongoDB data contains values like +Infinity, -Infinity, NaN, or invalid dates you will want to set this option to true. The Golang json serializer is not able to handle these values and the indexer will get stuck in an infinite loop. When prune-invalid-json is set to true Monstache will drop those fields so that indexing errors do not occur.

relate

[] array of TOML table (default nil)

Allows one to relate 2 namespaces together such that a change to one causes a sync of the associated namespace

namespace

string (default "")

The namespace of the collection that, when modified, triggers a sync of the with-namespace

with-namespace

string (default "")

The namespace of the collection or view that will be synced when namespace changes

src-field

string (default "_id")

The name of the field in namespace that will be extracted from the change doc and used as the value side of the query into with-namespace

match-field

string (default "_id")

The name of the field in with-namespace that is matched against the src-field value, i.e. the field-name side of the query into with-namespace

match-field-type

string (default "")

Valid values for this property are objectId, string, int, long and decimal. If the property is given the value of the src-field will be converted into the type given (if possible) before being used to query against match-field.

keep-src

bool (default "false")

Whether or not to sync the original change event in addition to the one looked up in with-namespace. By default the original change is ignored and only the document from with-namespace is synced.

max-depth

int (default 0)

If max-depth is greater than 0 then the relationship will only fire if the number of relationships between this relate and the originating event is less than or equal to the value given. By default monstache will continue following relationships until none are left.

dot-notation

bool (default "false")

If match-field is a nested field like foo.bar, then setting dot-notation to true produces the query { "foo.bar": 1 } to MongoDB. If dot-notation is not enabled, the query will be an exact match query sent as { foo: { bar: 1 } }.
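A sketch relating two hypothetical collections: whenever a document in test.comments changes, monstache looks up the parent post in test.posts using the comment's post_id and syncs that post.

```toml
[[relate]]
namespace = "test.comments"
with-namespace = "test.posts"
src-field = "post_id"          # field read from the changed comment
match-field = "_id"            # field queried in test.posts
match-field-type = "objectId"  # convert post_id before querying
keep-src = true                # also sync the comment itself
```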

relate-buffer

int (default 1000)

Number of relate events allowed to queue up before skipping the event and reporting an error. This setting was introduced to prevent scenarios where relate queries get queued up and stall the pipeline. You can increase this value if you are hitting this limit, but monstache will take more memory to hold the events and you may find that MongoDB experiences high CPU due to Monstache performing many queries concurrently to try to clear the buffer. This limit is usually hit when you have a relate config and then do a mass insert or update against the relate namespace.

relate-threads

int (default 10)

Number of go routines concurrently processing relationships when relate is enabled. This dictates the concurrency of queries trying to unload the relate queue.

replay

boolean (default false)

Warning

Replay is currently deprecated in favor of direct-read-namespaces. Replay may be removed in a future release.

When replay is true, monstache replays all events from the beginning of the MongoDB oplog and syncs them to Elasticsearch.

If you've previously synced Monstache to Elasticsearch you may see many WARN statements in the log indicating that there was a version conflict. This is normal during a replay and it just means that you already have data in Elasticsearch that is newer than the point in time data from the oplog.

When resume and replay are both true, monstache replays all events from the beginning of the MongoDB oplog, syncs them to Elasticsearch and also writes the timestamps of processed events to monstache.monstache.

When neither resume nor replay are true, monstache reads the last timestamp in the oplog and starts listening for events occurring after this timestamp (tails starting at the end). Timestamps are not written to monstache.monstache. This is the default behavior.

resume

boolean (default false)

When resume is true, monstache writes the timestamp of MongoDB operations it has successfully synced to Elasticsearch to the collection monstache.monstache. It also reads that timestamp from that collection when it starts in order to replay events which it might have missed because monstache was stopped. If monstache is started with the cluster-name option set, then resume is automatically turned on.

resume-from-timestamp

int64 (default 0)

This option only applies when the resume-strategy is 0 for timestamp based resume.

When resume-from-timestamp (a 64-bit timestamp where the high 32 bits represent the number of seconds since the epoch and the low 32 bits represent an offset within that second) is given, monstache will sync events starting immediately after the timestamp. This is useful if you have a specific timestamp from the oplog and would like to start syncing from after that event.

If you supply an integer that is greater than 0 but less than or equal to the max value of a 32-bit integer then monstache will interpret the value as seconds since the epoch and automatically shift the value 32 bits left.
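The encoding described above can be written as follows, where s is the number of seconds since the epoch and i is the offset within that second. The worked line uses an arbitrary example time.

```latex
ts = s \cdot 2^{32} + i, \qquad 0 \le i < 2^{32}

s = 1600000000,\; i = 0 \;\Rightarrow\; ts = 1600000000 \cdot 4294967296 = 6871947673600000000
```

Since 1600000000 fits in a 32-bit integer, supplying resume-from-timestamp = 1600000000 should be shifted automatically to the same value as resume-from-timestamp = 6871947673600000000.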

resume-name

string (default default)

monstache uses the value of resume-name as an id when storing and retrieving timestamps to and from the MongoDB collection monstache.monstache. The default value for this option is the string default. However, there are some exceptions. If monstache is started with the cluster-name option set then the name of the cluster becomes the resume-name. This is to ensure that any process in the cluster is able to resume from the last timestamp successfully processed. Another exception occurs when worker is enabled. In that case the worker name becomes the resume-name.

resume-strategy

int (default 0)

The strategy to use for resuming streams from previous runs of monstache. Only applies when resume is enabled. This strategy is also used when cluster-name is set to ensure streams are resumed when the active process in the cluster switches.

Strategy 0 (default): timestamp-based resume of change streams. Compatible with MongoDB API 4.0+.

Strategy 1: token-based resume of change streams. Compatible with MongoDB API 3.6+.

Timestamps and tokens are written periodically to the database configured in config-database-name. Timestamps are written to the collection named monstache. Tokens are written to the collection named tokens.
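For example, a deployment on MongoDB 3.6, where timestamp-based resume is not supported, might enable token-based resume as in this sketch. The resume-name value is a hypothetical placeholder; if cluster-name or worker is set, it is replaced as described under resume-name.

```toml
resume = true
resume-strategy = 1          # token-based; compatible with MongoDB API 3.6+
resume-name = "orders-sync"
```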

resume-write-unsafe

boolean (default false)

When resume-write-unsafe is true, monstache sets the safety mode of the MongoDB session so that writes are fire-and-forget. This speeds up writing the timestamps used to resume syncing in a subsequent run of monstache, at the cost of no error checking on those writes. Since errors writing the last synced timestamp are only logged by monstache and do not stop execution, it's not unreasonable to set this to true to get a speedup.

routing-namespaces

[]string (default nil)

You only need to set this configuration option if you use Go or JavaScript plugins that do custom routing, that is, plugins that override the parent or routing attributes. This array should be set to the list of all namespaces on which custom routing is done. This ensures that deletes in MongoDB are routed correctly to Elasticsearch.

script

[] array of TOML table (default nil)

When script is given monstache will pass the MongoDB document into the script before indexing into Elasticsearch. See the section Middleware for more information.

namespace

string (default "")

The MongoDB namespace, db.collection, to apply the script to. If you omit the namespace, the mapping function will be applied to all documents.

routing

boolean (default false)

Set routing to true if you override the index, routing or parent metadata via _meta_monstache

script

string (default "")

An inline script. You can use TOML multiline syntax here. The function should take 2 arguments, a doc and a namespace, and return a modified doc. You can also return true to index the original document or false to ignore the document and schedule any previous documents with the same id for deletion.

path

string (default "")

The file path to load a script from. Use this or an inline script but not both. Can be a path relative to the directory monstache is executed from or an absolute path.
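A minimal inline script, assuming hypothetical fields deleted and password. Returning false skips the document and schedules any previously indexed copy with the same id for deletion; returning the modified doc indexes it.

```toml
[[script]]
namespace = "mydb.users"
script = """
module.exports = function(doc, ns) {
  if (doc.deleted === true) {
    // Skip soft-deleted users and remove any earlier copy from the index.
    return false;
  }
  // Do not send the password hash to Elasticsearch.
  delete doc.password;
  return doc;
}
"""
```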

stats

boolean (default false)

When stats is true monstache will periodically print statistics accumulated by the indexer

stats-duration

string (default 30s)

Sets the duration after which statistics are printed if stats is enabled

stats-index-format

string (default monstache.stats.2006-01-02)

A Go time layout (as used by time.Time Format) that determines the names of the stats indices. By default, stats indices are partitioned by day. To use fewer indices for stats you can shorten this format string (e.g. monstache.stats.2006-01 for one index per month) or remove the time component completely to use a single index.
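For instance, to print statistics once a minute and keep one stats index per month, the three options could be set as follows.

```toml
stats = true
stats-duration = "60s"                          # print statistics every minute
stats-index-format = "monstache.stats.2006-01"  # one stats index per month
```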

time-machine-namespaces

[]string (default nil) (env var name MONSTACHE_TIME_MACHINE_NS)

Monstache is good at keeping your MongoDB collections and Elasticsearch indexes in sync. When a document is updated in MongoDB the corresponding document in Elasticsearch is updated too. Same goes for deleting documents in MongoDB. But what if you also wanted to keep a log of all the changes to a MongoDB document over its lifespan? That's what time-machine-namespaces are for. When you configure a list of namespaces in MongoDB to add to the time machine, in addition to keeping documents in sync, Monstache will index a copy of your MongoDB document at the time it changes in a separate timestamped index.

Say for example, you insert a document into the test.test collection in MongoDB. Monstache will index by default into the test.test index in Elasticsearch, but with time machines it will also index it into log.test.test.2018-02-19. When it indexes it into the time machine index it does so without the id from MongoDB and lets Elasticsearch generate a unique id. But, it stores the id from MongoDB in the source field _source_id. Also, it adds _oplog_ts and _oplog_date fields on the source document. These correspond to the timestamp from the oplog when the data changed in MongoDB. Finally, it routes the document by the MongoDB id so that you can speed up queries later to find changes to a doc.

This lets you do some cool things but mostly you'll want to sort by _oplog_date and filter by _source_id to see how documents have changed over time.

Because the indexes are timestamped you can drop them after a period of time so they don't take up space. If you only want the last couple of days of changes, delete the indexes with older timestamps. Elastic Curator is your friend here.

Note

This option may be passed on the command line as ./monstache --time-machine-namespace test.foo --time-machine-namespace test.bar

If specified as an environment variable the value should be namespaces separated only by the env-delimiter which defaults to a comma. E.g. MONSTACHE_TIME_MACHINE_NS=test.foo,test.bar

time-machine-index-prefix

string (default log)

If you have enabled time machine namespaces and want to change the prefix assigned to the index names use this setting.

time-machine-index-suffix

string (default 2006-01-02)

If you have enabled time machine namespaces and want to suffix the index names using a different date format use this setting. Consult the golang docs for how date formats work. By default this suffixes the index name with the year, month, and day.

time-machine-direct-reads

boolean (default false)

This setting controls whether or not direct reads are added to the time machine log index. This is false by default so only changes read from the oplog are added.
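A sketch enabling the time machine for a single namespace, with the prefix and suffix at their default values. With these settings, a change to a document in test.test on 2018-02-19 would also be indexed into log.test.test.2018-02-19, as in the example given for time-machine-namespaces.

```toml
time-machine-namespaces = ["test.test"]
time-machine-index-prefix = "log"
time-machine-index-suffix = "2006-01-02"   # Go layout: year-month-day
time-machine-direct-reads = false          # only oplog changes are logged
```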

tpl

boolean (default false)

This option is only supported on the command line. When the tpl mode is turned on then any config file passed via -f will be interpreted and executed as a golang template before being loaded. The template will have access to all environment variables. The environment variables will be passed as a map to the template. The env map can be accessed as the dot . symbol in the golang template and values from the map obtained using the index function.

For example, given an environment variable THRESHOLD and running monstache with -tpl -f config.toml, the config.toml might contain:

[[script]]
namespace = "mydb.mycollection"
script = """
module.exports = function(doc) {
    if ( doc.score > {{index . "THRESHOLD"}} ) {
      doc.important = true;
    }
    return doc;
}
"""

verbose

boolean (default false)

When verbose is true, monstache will enable debug logging, including a trace of requests to Elasticsearch.

worker

string (default "") (env var name MONSTACHE_WORKER)

When worker is given monstache will enter multi-worker mode and will require you to also provide the config option workers. Use this mode to run multiple monstache processes and distribute the work between them. In this mode monstache will ensure that each MongoDB document id always goes to the same worker and none of the other workers. See the Workers section for more information.

workers

[]string (default nil)

An array of worker names to be used in conjunction with the worker option.

Note

This option may be passed on the command line as ./monstache --workers w1 --workers w2
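In a config file shared by all processes, the list is a TOML array, and each process is then given its own worker name, for example through the MONSTACHE_WORKER environment variable. The worker names themselves are arbitrary.

```toml
# Shared by every monstache process in the group.
workers = ["w1", "w2"]
# worker is set per process rather than here, e.g. MONSTACHE_WORKER=w1
# for the first process and MONSTACHE_WORKER=w2 for the second.
```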