A Deep Dive into SonarX's Data Completeness Checks

SonarX’s state-of-the-art architecture and stringent Data Quality (DQ) processes make us the golden source of truth for blockchain data analytics. Our commitment to providing the most comprehensive blockchain data is demonstrated in our Data Quality Trust Center, where we provide transparency into our Data Quality processes so users can feel secure knowing that they are leveraging best-in-class data for their use cases. Stay tuned as we continue to push the boundaries of data quality in the Web3 space, setting new standards for completeness, accuracy, and reliability in blockchain data.

In the dynamic world of blockchain, where data is decentralized and constantly evolving, ensuring the completeness and accuracy of information is paramount. SonarX (formerly Sonarverse), a leading provider of Web3 data infrastructure, takes pride in its commitment to delivering complete and curated blockchain data. Our Data Quality process continuously performs rigorous checks across multiple dimensions to guarantee the reliability and completeness of the data we provide.

Blockchain data, when taken directly from nodes, tends to be inconsistent. Factors such as network latency, node synchronization, and variations in data formats can contribute to these inconsistencies. SonarX acknowledges these challenges and, through a meticulous Data Quality process, ensures that our users receive complete, accurate, and reliable data ready for consumption.

Read on to learn more about how SonarX ensures completeness and accuracy in our data.

Reorg Check

Objective

Ensure the relationship between block hash and parent hash on the block level, confirming block linkage and preventing potential content manipulation. This requires handling data adjustments resulting from blockchain reorganizations, ensuring data integrity in the face of conflicting blocks.

Problem Solved

Reorgs, or reorganizations, occur on a blockchain when a longer valid chain replaces a shorter one, usually due to the addition of new blocks or the resolution of network forks. Reorgs help maintain consensus and integrity by ensuring that the majority of network participants agree on the valid transaction history. SonarX’s reorg check detects when previously ingested blocks have been replaced by verifying block linkage, so the dataset always reflects the canonical chain.
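The linkage rule behind this check can be sketched in a few lines of Python: each block's parent hash must equal the hash of the block before it. The field names (`hash`, `parent_hash`, `number`) are illustrative, not SonarX's actual schema.

```python
def find_reorg_points(blocks):
    """Return block numbers whose parent_hash does not match the
    previous block's hash, signalling a possible reorg or bad link."""
    broken = []
    for prev, curr in zip(blocks, blocks[1:]):
        if curr["parent_hash"] != prev["hash"]:
            broken.append(curr["number"])
    return broken

# Toy chain: block 2 points at a parent hash that does not exist.
chain = [
    {"number": 0, "hash": "0xaa", "parent_hash": "0x00"},
    {"number": 1, "hash": "0xbb", "parent_hash": "0xaa"},
    {"number": 2, "hash": "0xcc", "parent_hash": "0xdd"},  # broken link
]
```

Blocks flagged this way would be re-fetched from the node and re-ingested so that the stored chain matches the canonical one.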

Dupes (Duplicates)

Objective

Ensure the prevention of duplicated information in the Snowflake database and blockchain node data, preserving data integrity for metrics and roll-ups.

Problem Solved

By implementing duplication checks, the system identifies and addresses instances of transaction hash duplication across chains (e.g., XDC Network, Optimism, Arbitrum). This prevents adverse effects on metrics and roll-ups, ensuring accurate and reliable data for downstream processes.
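A duplication check of this kind reduces to counting occurrences of each transaction hash. A minimal sketch, using Python's standard library rather than SonarX's actual pipeline:

```python
from collections import Counter

def find_duplicate_tx_hashes(tx_hashes):
    """Return, sorted, any transaction hashes that appear more than once."""
    counts = Counter(tx_hashes)
    return sorted(h for h, n in counts.items() if n > 1)
```

In a warehouse setting the same logic would typically run as a GROUP BY / HAVING COUNT(*) > 1 query over the transactions table.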

Gaps

Objective

Identify and address missing or incomplete data sequences in the Snowflake database, ensuring a continuous and complete record of blockchain data.

Problem Solved

Checks are implemented to verify the presence of a comprehensive data record from the genesis block to the latest block. Some chains, such as Solana and the Ethereum Beacon Chain, have valid gaps: heights or slots that legitimately contain no data. These known gaps are confirmed and aligned with validation expectations, so that any remaining gap is treated as genuinely missing data. This ensures a thorough examination and validation of gaps, maintaining the integrity of the dataset.
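The gap check can be sketched as a set difference between the expected block range and the blocks actually present, with a set of known-valid gaps excluded. This is a simplified illustration, not SonarX's implementation:

```python
def find_gaps(block_numbers, valid_gaps=frozenset()):
    """Return block numbers missing between the lowest and highest
    ingested block, excluding gaps known to be valid (e.g. skipped
    slots on chains like Solana or the Ethereum Beacon Chain)."""
    present = set(block_numbers)
    expected = range(min(present), max(present) + 1)
    return sorted(n for n in expected
                  if n not in present and n not in valid_gaps)
```

Any number this returns represents data that should exist but was never ingested, and would trigger a backfill.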

Check Sums

Objective

Maintain the integrity of blockchain data during flattening or joining processes through the use of checksums, safeguarding the alignment with curated reporting layer tables.

Problem Solved

Utilizing checksums* across the entire chain addresses the complexity of blockchain data, particularly in deep object arrays. This serves as a safeguard against inaccurate data emissions, ensuring that the source of truth consistently aligns with the curated reporting layer tables.

*Checksums are the sum total of transactions, receipts, logs, or any field that we are verifying against a source of truth, such as an RPC call on a blockchain node.
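Under the footnote's definition, a checksum is a record count (of transactions, receipts, logs, etc.) verified against a source of truth such as a node RPC call. A minimal comparison might look like this, with plain per-block count dictionaries standing in for the warehouse and the node response:

```python
def checksum_mismatches(warehouse_counts, node_counts):
    """Return block numbers whose warehouse record count (the
    'checksum') disagrees with the count reported by the node."""
    return sorted(b for b in node_counts
                  if warehouse_counts.get(b) != node_counts[b])

# Illustrative counts: block 101 lost a record during flattening.
warehouse = {100: 12, 101: 7}
node = {100: 12, 101: 8}
```

Comparing simple counts at each processing stage is a cheap way to catch records dropped or duplicated during flattening or joining, before the data reaches the curated reporting layer.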

Table Flow Validation

Objective

Confirm consistency between raw and processed tables, addressing new key-value pairs introduced by blockchain changes and updates.

Problem Solved

Validation mechanisms detect any new key-value pairs, raise alerts, and add them to the Snowflake database in subsequent updates. This ensures that attributes and information remain consistent after processing, accommodating changes introduced by blockchain updates.
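Detecting a new key-value pair amounts to comparing a raw record's keys against the columns the processed table already knows about. A hypothetical sketch:

```python
def unexpected_keys(raw_record, known_columns):
    """Return keys present in a raw record but absent from the
    processed table's known column set; these would trigger an alert
    and a schema update."""
    return sorted(set(raw_record) - set(known_columns))
```

For example, a hard fork that adds a new field to block headers would surface here as an unexpected key, prompting the schema to be extended rather than the field being silently dropped.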

Nulls

Objective

Identify and address instances of empty (null) essential data fields in the blockchain, ensuring completeness and data integrity.

Problem Solved

Validation tests cross-check and address discrepancies when null data is encountered. This ensures the maintenance of data completeness, especially when re-ingestion reveals valid full data after initial null values.

For example, a block was initially ingested with a null block hash; when cross-checked on the Polygon Explorer, the correct block hash was shown as 0xc2c119387cca7c8b87caa67761380a35a127ffb7dfc6c40f709cdbd05ac7f687, and re-ingestion captured the valid value.
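A null check like this scans each row for empty essential fields. The row layout and field names below are illustrative only:

```python
def null_field_violations(rows, required_fields):
    """Return (row_index, field) pairs where an essential field is
    null or missing, so those rows can be re-ingested from the node."""
    violations = []
    for i, row in enumerate(rows):
        for field in required_fields:
            if row.get(field) is None:
                violations.append((i, field))
    return violations
```

Rows flagged here are candidates for re-ingestion, since a null on first ingestion often resolves to full, valid data once the node has caught up.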

SonarX’s validation tests also ensure that each transaction has a matching receipt and that the checksums on the receipts agree. A mismatch between a receipt and its checksum indicates potential inconsistencies in the receipt data, which may result from data tampering or from errors when processing the transaction information.

Match Checks

Objective

Verify the integrity of data within blocks by checking if the block hash matches the transaction and receipt hash, identifying potential tampering.

Problem Solved

Discrepancies between block level and other node calls are identified and addressed during data ingestion. This ensures the consistency of data across diverse calls, safeguarding against tampering and maintaining the integrity of block linkage.
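A hypothetical sketch of one such match check: every transaction should have a receipt, and both records should agree on the block they belong to. The `hash`, `tx_hash`, and `block_hash` field names are assumptions for illustration, not SonarX's schema:

```python
def unmatched_transactions(transactions, receipts):
    """Return hashes of transactions with no matching receipt, or
    whose receipt disagrees on the containing block's hash."""
    receipts_by_tx = {r["tx_hash"]: r for r in receipts}
    bad = []
    for tx in transactions:
        receipt = receipts_by_tx.get(tx["hash"])
        if receipt is None or receipt["block_hash"] != tx["block_hash"]:
            bad.append(tx["hash"])
    return bad
```

Because transactions and receipts come from different node calls, agreement between them is strong evidence that neither side was corrupted or tampered with in transit.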

Token Metadata Refresh

Objective

Ensure that token contract metadata fields are up to date.

Problem Solved

Regular reviews and updates of token metadata labels ensure consistency with both node and off-chain data, addressing the mutability of token information and preventing discrepancies.

Because token information is mutable, SonarX’s regular token metadata refresh ensures that any changes to the metadata are captured as soon as possible.
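The refresh reduces to diffing stored metadata against what the node currently reports and re-ingesting any token that changed. A minimal sketch, assuming metadata is keyed by contract address with illustrative fields like `symbol` and `decimals`:

```python
def stale_tokens(stored, live):
    """Return contract addresses whose stored metadata differs from
    the node's current view, so their records can be refreshed."""
    return sorted(addr for addr, meta in live.items()
                  if stored.get(addr) != meta)

# Illustrative data: one token renamed its symbol on-chain.
stored = {"0xtoken": {"symbol": "ABC", "decimals": 18}}
live = {"0xtoken": {"symbol": "ABCv2", "decimals": 18}}
```

In practice the `live` side would be populated from node calls and off-chain sources on a schedule, so renamed or re-denominated tokens are caught on the next refresh cycle.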