
3 days ago
Are there any checks that can be done in the Metadb database to confirm that a resync was successful?
https://metadb-project.slack.com/archives/C03ETMG3TGX/p1708107064841009?thread_ts=1708100301.261019&cid=C03ETMG3TGX
From the previous thread: we had a warning during the endsync. Is it ok to run endsync with this type of warning?

34% of current records have not been confirmed by the new snapshot. The unconfirmed records will be marked as deleted.


(From Nassib) This warning is not always a problem. It means that a lot of the current data in Metadb have not (so far) been confirmed by the new snapshot. This could happen if many records were deleted in the source (FOLIO) during the time when the databases were not synchronized. But typically it indicates that a lot of snapshot data have not yet been read from Kafka. If this is the reason for the warning, then running endsync would remove those records in Metadb until they are eventually streamed. So the records would temporarily be unavailable in the current tables in Metadb.


  1 day ago
(Carole) Ok thanks @nassib. In the case of a resync -- should the counts from these 2 queries align?

In Metadb:
SELECT count(*) FROM folio_inventory.instance__ where __current = true;

In FOLIO:
SELECT count(*) FROM tenant_mod_inventory_storage.instance;

(Nassib) Yes


(Carole) Also, with each resync, is it expected that the count of records in Metadb tables will increase by the number of records in the corresponding FOLIO table?
  1 day ago

(Nassib)
(1) Yes.
(2) The counts should be the same when the tables are synchronized.
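
A minimal sketch of this check across a few inventory tables on the Metadb side, assuming the same naming pattern as the instance__ example above (the holdings_record__ and item__ names are illustrative and may not match every deployment); each current-record count would then be compared against the corresponding FOLIO source table:

-- Current-record counts for several Metadb inventory tables.
SELECT 'instance' AS source_table, count(*) AS current_records
  FROM folio_inventory.instance__ WHERE __current = TRUE
UNION ALL
SELECT 'holdings_record', count(*)
  FROM folio_inventory.holdings_record__ WHERE __current = TRUE
UNION ALL
SELECT 'item', count(*)
  FROM folio_inventory.item__ WHERE __current = TRUE;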
  1 day ago

(Carole) Ok -- thanks. After the initial sync process, I observed that these 2 queries aligned:
(1) SELECT count(*) FROM folio_inventory.instance__
(2) SELECT count(*) FROM tenant_mod_inventory_storage.instance

After a resync, I observed that the result of (1) was ~2X the size of (2), and I needed to filter by __current to get counts that aligned more closely.

Safe to assume the filtering by __current is needed?
  1 day ago

(Nassib) Yes, non-current records are not retained by FOLIO.


  1 day ago
(Carole) OK, thanks -- understood

(Nassib) In other words, FOLIO discards the non-current records but Metadb retains them.  So we can only compare counts of the current records.
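
A quick way to see this in the data is to break the Metadb count down by __current; a small sketch using the same instance__ table as in the queries above:

-- Current vs. retained non-current record versions in Metadb.
-- Only the __current = TRUE count is expected to match the FOLIO source table;
-- the FALSE rows are older versions that FOLIO has discarded but Metadb keeps.
SELECT __current, count(*) AS records
FROM folio_inventory.instance__
GROUP BY __current;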


More on Resync

(Carole) When performing a resync of a data source in Metadb
(https://d1f3dtrg62pav.cloudfront.net/doc/#_resynchronizing_a_data_source),
should the associated Debezium connector's snapshot.mode be set to something other than initial? For example, is an ad-hoc snapshot or a signalling table (described below) required for the Metadb resync process?
https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-ad-hoc-snapshots
Or can the same process as followed for the initial sync be assumed?


(John) Hi Carole - I only use the default, which is initial. Essentially, take the snapshot and then go into streaming update mode.

My Debezium configs typically look like this (v1.8 or v1.9):

{
  "name": "<connector_name>",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.dbname": "<db name>",
    "database.hostname": "<db host>",
    "database.password": "<db user password>",
    "database.port": "5432",
    "database.server.name": "<prepend to kafka topic>",
    "schema.exclude.list": ".*_mod_login,.*_mod_pubsub,public,.*pubsub_config,supertenant_mod_.*,.*_mod_kb_ebsco_java,.*__system",
    "table.exclude.list": ".*_mod_agreements\\.entitlement_tag,.*_mod_agreements\\.erm_resource_tag,.*_mod_agreements\\.string_template,.*_mod_agreements\\.string_template_scopes,.*_mod_agreements\\.templated_url,.*_mod_data_export_worker\\.batch_job_execution_params,.*_mod_oai_pmh\\.instances,.*_mod_remote_storage\\.original_locations,.*_mod_remote_storage\\.item_notes,.*_mod_service_interaction\\.dashboard_access,.*app_setting,.*alternate_name,.*databasechangelog,.*databasechangeloglock,.*directory_entry_tag,.*license_document_attachment,.*license_supp_doc,.*license_tag,.*log_entry_additional_info,.*subscription_agreement_supp_doc,.*subscription_agreement_document_attachment,.*subscription_agreement_ext_lic_doc,.*subscription_agreement_tag,.*tenant_changelog,.*tenant_changelog_lock,.*marc_indexers.*,.*rmb_internal.*,.*rmb_job.*",
    "database.user": "<db user>",
    "heartbeat.interval.ms": "30000",
    "heartbeat.action.query": "UPDATE id_dbz.heartbeat_query set last_heartbeat_ts = now();",
    "plugin.name": "pgoutput",
    "publication.autocreate.mode": "filtered",
    "tasks.max": "1",
    "truncate.handling.mode": "include"
  }
}
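
For reference, the heartbeat.action.query above assumes a small table that the connector can update on each heartbeat; a sketch of what creating it might look like (the id_dbz schema, table, and column names simply mirror the query in the config and are not fixed by Debezium):

-- Hypothetical heartbeat target for heartbeat.action.query; run once in the
-- source database. The UPDATE in the config expects a single existing row.
CREATE SCHEMA IF NOT EXISTS id_dbz;
CREATE TABLE IF NOT EXISTS id_dbz.heartbeat_query (
    last_heartbeat_ts timestamptz NOT NULL DEFAULT now()
);
INSERT INTO id_dbz.heartbeat_query (last_heartbeat_ts) VALUES (now());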




(Carole) We are running version 2.3.4 of Debezium (which was the latest at the time of first setting things up) -- assuming it's ok to run a 2.x version?


(John) I don't see why not, as long as it's compatible with your version of Kafka and PostgreSQL. Personally, though, I haven't tried 2.x quite yet. We are still using Kafka 2.8, and 1.8 and 1.9 have suited our needs just fine. I have it on my list to test with 2.x at some point soon.