Search Index Jobs
When you update or upgrade your Dataverse installation or roll out custom metadata schemas (blocks), you also need to take care of your Solr-based search index.
Inplace Re-Indexing
There are two main reasons why you might need to rebuild your search index:
When upgrading to a new Dataverse version, the Solr configuration may have been changed upstream. In these cases, the release notes will advise you to do an “inplace reindex”.
You changed your metadata schema, e.g. renamed fields or changed field types. The existing index cannot be migrated; it has to be rebuilt instead.
For your convenience, a batch Job containing all actions described in the upstream docs is provided. Simply deploy it during off-hours (or fork it and create a CronJob):
kubectl create -f https://gitcdn.link/repo/IQSS/dataverse-kubernetes/release/k8s/dataverse/jobs/inplace-reindex.yaml
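As far as the upstream Dataverse installation guide describes it, an in-place reindex consists of two admin API calls: clearing the index timestamps, then asking Dataverse to continue indexing everything without a timestamp. The following is a minimal sketch of those two steps, not the Job's actual script; it assumes the admin API is reachable at DATAVERSE_URL, a hypothetical variable defaulting to http://localhost:8080.

```shell
#!/bin/sh
# Minimal sketch of an in-place reindex via the Dataverse admin API.
# DATAVERSE_URL is a hypothetical variable; point it at your Dataverse service.
DATAVERSE_URL="${DATAVERSE_URL:-http://localhost:8080}"

reindex_in_place() {
  # Step 1: clear the index timestamps, so Dataverse treats objects as not yet indexed.
  curl -fsS -X DELETE "${DATAVERSE_URL}/api/admin/index/timestamps" || return 1
  # Step 2: start an asynchronous reindex of everything without a timestamp.
  curl -fsS "${DATAVERSE_URL}/api/admin/index/continue" || return 1
}
```

To follow the Job itself instead, kubectl wait --for=condition=complete job/<job-name> blocks until it has finished; <job-name> is whatever name the manifest assigns.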
Hint
Beware: this type of re-index does not guarantee a clean index. See the upstream index guide.
Update Solr schema with custom metadata fields
The Solr container comes with a default index configuration supporting the upstream metadata schemas.
This configuration resides in ${COLLECTION_DIR}/conf (see also important directories of the image).
Dataverse provides an API endpoint that returns a Solr schema configuration matching the metadata schemas present in your Dataverse installation. A forked version of the upstream script, located at $SCRIPT_DIR/schema/update.sh, uses it to generate an updated configuration and reload Solr.
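Conceptually, the update is framed by two HTTP calls: fetching the schema fields from the Dataverse endpoint /api/admin/index/solr/schema and, once the configuration has been regenerated, asking Solr to reload its core through the CoreAdmin API. The sketch below shows only those two calls and leaves out the processing done by update.sh; DATAVERSE_URL, SOLR_URL and the core name collection1 are assumptions for illustration.

```shell
#!/bin/sh
# Sketch of the two HTTP calls around a Solr schema update.
# DATAVERSE_URL, SOLR_URL and SOLR_CORE are hypothetical defaults.
DATAVERSE_URL="${DATAVERSE_URL:-http://localhost:8080}"
SOLR_URL="${SOLR_URL:-http://localhost:8983}"
SOLR_CORE="${SOLR_CORE:-collection1}"

# Download the schema fields Dataverse derives from its metadata blocks into file $1.
fetch_solr_schema() {
  curl -fsS -o "$1" "${DATAVERSE_URL}/api/admin/index/solr/schema"
}

# Ask Solr to reload the core so it picks up the regenerated configuration.
reload_solr_core() {
  curl -fsS "${SOLR_URL}/solr/admin/cores?action=RELOAD&core=${SOLR_CORE}"
}
```

Between the two calls, $SCRIPT_DIR/schema/update.sh is what turns the downloaded fields into an updated configuration; this sketch does not replicate that step.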
Important
You will most likely need to do an Inplace Re-Indexing after deploying new schemas, as many, if not all, schema changes require a rebuild of your index.
… gracefully when starting Solr
As the Solr index configuration is not persisted but loaded from Dataverse, Solr has to ask Dataverse for it on every start. This is done by an init container.
The init container fails gracefully and falls back to the default upstream metadata schemas. Unless you changed those, the worst case is losing searchability of your custom metadata if the configuration is not available during startup.
Hint
To understand the above, keep in mind that the init, sidecar and main Solr containers share /schema via an emptyDir volume.
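The fallback behaviour of such an init container could look roughly like the sketch below: try to fetch the schema from Dataverse and, if that fails, leave the default configuration untouched instead of failing the Pod. The function name, target file and DATAVERSE_URL are hypothetical; the real init container may work differently.

```shell
#!/bin/sh
# Sketch of a graceful init step: prefer the schema served by Dataverse,
# keep the bundled default if Dataverse is unreachable.
DATAVERSE_URL="${DATAVERSE_URL:-http://localhost:8080}"

init_schema() {
  target="$1"          # e.g. a file on the shared /schema volume (hypothetical)
  tmp="${target}.tmp"  # download to a temp file so a failed fetch leaves no partial file
  if curl -fsS --max-time 10 -o "$tmp" "${DATAVERSE_URL}/api/admin/index/solr/schema"; then
    mv "$tmp" "$target"
    echo "schema: loaded from Dataverse"
  else
    rm -f "$tmp"
    echo "schema: Dataverse unavailable, keeping default" >&2
  fi
  return 0             # never block Solr startup
}
```

Returning 0 on both paths is what makes the fallback graceful: Solr always starts, at worst with the default schema.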
… when updating metadata schemas
A sidecar container of the Solr Pod takes care of this, triggered by a webhook. The metadata update Job fires this webhook for you once the metadata blocks have been uploaded.
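adnanh/webhook serves each configured hook under /hooks/<id>, listening on port 9000 by default. A sketch of how a Job could fire the sidecar's hook follows; the service name solr and the hook id update-schema are hypothetical.

```shell
#!/bin/sh
# Sketch: fire the schema-update hook of the Solr sidecar.
# WEBHOOK_URL and HOOK_ID are hypothetical; 9000 is webhook's default port.
WEBHOOK_URL="${WEBHOOK_URL:-http://solr:9000}"
HOOK_ID="${HOOK_ID:-update-schema}"

trigger_schema_update() {
  # -f makes curl exit non-zero on HTTP errors, e.g. when the hook id is unknown.
  curl -fsS -X POST "${WEBHOOK_URL}/hooks/${HOOK_ID}"
}
```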
See also
Webhooks are implemented using https://github.com/adnanh/webhook and can be extended if necessary.
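Extending the webhooks mostly means adding an entry to the hooks file the webhook server reads. The sketch below writes a minimal hook definition using the id, execute-command and command-working-directory fields of adnanh/webhook and shows how the server is started with it; the hook id, SCRIPT_DIR default and working directory are hypothetical examples, not the image's actual configuration.

```shell
#!/bin/sh
# Sketch: write a minimal hooks file for adnanh/webhook.
# SCRIPT_DIR default, hook id and working directory are hypothetical.
SCRIPT_DIR="${SCRIPT_DIR:-/opt/scripts}"

write_hooks_file() {
  # Unquoted heredoc, so ${SCRIPT_DIR} is expanded when the file is written.
  cat > "$1" <<EOF
[
  {
    "id": "update-schema",
    "execute-command": "${SCRIPT_DIR}/schema/update.sh",
    "command-working-directory": "/schema"
  }
]
EOF
}

# Start the webhook server with that file (listens on port 9000 by default).
start_webhook() {
  webhook -hooks "$1" -verbose
}
```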