Data Product

No-nonsense Data Quality Monitoring

GknowsysDQ ensures data integrity and quality in real-time data pipelines.

Prominent Features

Discover the robust features that make GknowsysDQ a top-tier data quality monitoring tool.

Data Quality Assurance

Receive instant alerts when data issues occur, ensuring reliable data processing.

Data Integrity

Prevent data corruption by halting data processing when issues are detected.

Quality Rules Library

Choose from a library of quality rules across data sources and stages.

Seamless Integration

Easily integrate GknowsysDQ into existing pipelines using our extensive APIs and our warehouse and data lake integrations.

Data Security

Choose what access you share with GknowsysDQ and what metadata is captured, or use access-less integrations.

Event Notifications

Stay informed of quality issues via SMS, email, or a workflow management system.

Key Concepts

Understand the core concepts that power GknowsysDQ's high-quality data monitoring.

Individual steps in data processing and the different states of the data: processes, ingested data, processed data, and more

Each data processing step is identified as a process. Each stage the data goes through, like ingestion, processing, and reporting, is identified as a component. Each component type has expected behaviours and quality check types in GknowsysDQ.
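For illustration, here is a minimal sketch of how stages might be declared, assuming a hypothetical gknowsysdq Python SDK; the Component and Process names and the type values are assumptions, not the actual API.

    # Hypothetical sketch: the gknowsysdq module, Component/Process
    # classes, and type values are illustrative assumptions.
    from gknowsysdq import Component, Process

    # Each state of the data is a component; its type determines the
    # expected behaviour and the built-in quality check types.
    raw_orders = Component("raw_orders", type="ingested_data")
    clean_orders = Component("clean_orders", type="processed_data")

    # Each data processing step is a process.
    cleanse = Process("cleanse_orders", reads=raw_orders, writes=clean_orders)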

Components interconnect based on dependency order to ensure smooth data flow

Relationships between components capture their interdependence for the purpose of calculating quality. This is, by nature, the inverse of the execution order represented in orchestrator tools like Apache Airflow.

Represents your data transformation process, composed of various components

A pipeline is a logical representation of your data processing flow, modelled as a DAG. Pipelines in GknowsysDQ are more nuanced than those in orchestration tools like Apache Airflow, in the sense that they also represent logical groups and stages.
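Continuing the hypothetical sketch above, the snippet below wires three components into a pipeline; Pipeline and add_dependency are illustrative names, not the documented API. Note how the quality dependencies run opposite to the execution order an orchestrator like Airflow would express.

    # Hypothetical sketch: Pipeline and add_dependency are assumed names.
    from gknowsysdq import Component, Pipeline

    ingested = Component("raw_orders", type="ingested_data")
    processed = Component("clean_orders", type="processed_data")
    report = Component("orders_report", type="report")

    # Airflow would express execution as ingest >> process >> report;
    # quality dependencies point the other way: the report's rating
    # depends on its upstream components.
    pipeline = Pipeline("orders_daily")
    pipeline.add_dependency(upstream=ingested, downstream=processed)
    pipeline.add_dependency(upstream=processed, downstream=report)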

Tests that validate the state of processes or data, associated with components

Checks are executable expectations of data quality. There is a library of checks to choose from, some checks are built in based on the component type, and you can also write your own.
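As a sketch of what a custom check could look like, assuming a hypothetical @check decorator in the same illustrative SDK (the decorator, its parameters, and the pandas-style DataFrame argument are all assumptions):

    # Hypothetical sketch: the @check decorator and its signature are
    # illustrative assumptions, not the documented API.
    from gknowsysdq import check

    @check(component="clean_orders", severity="critical")
    def no_null_order_ids(df):
        # Executable expectation: every processed order has an id.
        return df["order_id"].notna().all()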

Comprehensive reports generated by health checks, providing insights into your pipeline's health

A quality rating generated by GknowsysDQ based on the checks applied. It is a comprehensive metric, calculated by a weighted resolution of the ratings of all dependencies and applied checks.
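The exact formula is internal to GknowsysDQ; purely as an illustration of the check-weighting half of that idea, one plausible reading is a weighted share of passed checks, where important checks carry more weight:

    # Illustrative arithmetic only; the real rating formula and the
    # weights are internal to GknowsysDQ and assumed here.
    def component_rating(check_results, check_weights):
        """Weighted share of passed checks for one component."""
        total = sum(check_weights)
        passed = sum(w for ok, w in zip(check_results, check_weights) if ok)
        return passed / total

    # Two minor checks pass, one important check fails:
    print(component_rating([True, True, False], [1, 1, 3]))  # 0.4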

How GknowsysDQ Works

Import Data Pipelines

Import the processing stages as components and connect them to build pipelines.

Define Health Checks

Choose from the library, ask the GknowsysDQ GPT, or write your own checks.

Enjoy Better Data Quality

Experience heightened confidence in your data with our precision-assured reports.

GknowsysDQ is a cutting-edge quality monitoring and assessment tool, born from our extensive experience with processing vast, multi-tenant data through complex, multi-step processing pipelines. This tool aims to shift the traditional focus from merely processing and logging to analysing data states and their transitions, ensuring a high standard of data quality and reliability.

Historically, monitoring has focused on process logs, which are inadequate for detecting issues in dynamic datasets; even more so when processing happens in recurring cycles. With the sheer volume of data, processing often involves multiple steps, and intermediate steps are usually non-human-readable. Typically, errors are identified only at the final reporting stage. This is particularly problematic in SaaS systems, where scrutinizing every process report for each tenant is impractical. Such oversight can lead to errors in final reports, eroding user trust in the platform.

GknowsysDQ addresses these challenges by automating data QA in multi-tenant, multi-step data processing systems using a "fail-fast, fail-early" approach. This enhances data quality assurance and helps build trust in your data. GknowsysDQ takes an approach similar to log monitoring tools but focuses on data and data pipelines. In data-heavy applications, traditional log monitoring tools are inadequate because the output is data, not logs. The key message is to monitor the data itself, not just the logs.

GknowsysDQ offers robust data quality assurance and monitoring capabilities, complemented by an extensive alerting and notification system and event subscription mechanisms. It integrates seamlessly with various orchestrators and connects to a wide range of data stores. By performing live, synchronized data analysis integrated with the orchestrators, GknowsysDQ provides flexibility in how deeply it is involved in data processing. It also includes APIs that can be used in any codebase to trigger data quality assessments as needed, retrieve results, and make informed decisions.
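As a hedged sketch of what such an API call might look like (the endpoint path, payload fields, and response shape are assumptions for illustration; the real contract lives in the GknowsysDQ API reference):

    # Hypothetical sketch: endpoint, payload, and response fields are
    # assumptions, not the documented GknowsysDQ API.
    import requests

    resp = requests.post(
        "https://dq.example.com/api/v1/assessments",
        json={"component": "clean_orders", "run_id": "2024-06-01"},
        timeout=30,
    )
    result = resp.json()

    # Fail fast: halt downstream processing when quality drops.
    if result["dcr"] < 0.9:
        raise RuntimeError(f"Data quality too low: {result['dcr']}")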

A significant differentiator of GknowsysDQ is its approach to validating a dataset's quality. In typical scenarios, datasets are interdependent, meaning a quality drop in one can affect others downstream. Traditional data quality tools often fail to highlight this impact. GknowsysDQ assigns a Data Confidence Rating (DCR) to a dataset, reflecting the percentage of failed checks, the importance of the failed checks, the dataset's relationships, and its contributions to downstream datasets. This DCR acts like a ripple in a pond, propagating the impact of a failed check through the pipeline to the DCR of the final report.
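To see the ripple effect in miniature, here is an illustrative propagation rule; the blend weight and the choice of taking the weakest upstream are assumptions, not the actual DCR computation:

    # Illustrative only; GknowsysDQ's actual weighting between a dataset
    # and its dependencies is assumed here.
    def propagate_dcr(own_rating, upstream_dcrs, upstream_weight=0.5):
        """Blend a dataset's own rating with its weakest upstream DCR."""
        if not upstream_dcrs:
            return own_rating
        return (1 - upstream_weight) * own_rating + upstream_weight * min(upstream_dcrs)

    ingested = propagate_dcr(0.6, [])           # a failed ingestion check
    processed = propagate_dcr(1.0, [ingested])  # 0.8: the ripple arrives
    report = propagate_dcr(1.0, [processed])    # 0.9: and reaches the report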

At its core, GknowsysDQ is a powerful rule engine with integrations into various data sources and alerting systems, offering flexibility in which use cases it can be applied to and how.