What is Data Integrity?




Data integrity is the maintenance of, and the assurance of, the accuracy and consistency of data over its entire life-cycle. It is a critical aspect of the design, implementation, and usage of any system that stores, processes, or retrieves data. The term is broad in scope and may have widely different meanings depending on the specific context, even under the same general umbrella of computing. It is at times used as a proxy term for data quality, while data validation is a prerequisite for data integrity. Data integrity is the opposite of data corruption. The overall intent of any data integrity technique is the same: ensure data is recorded exactly as intended (such as a database correctly rejecting mutually exclusive possibilities) and, upon later retrieval, ensure the data is the same as when it was originally recorded. In short, data integrity aims to prevent unintentional changes to information. Data integrity is not to be confused with data security, the discipline of protecting data from unauthorized parties.

Any unintended change to data as the result of a storage, retrieval, or processing operation, including malicious intent, unexpected hardware failure, and human error, is a failure of data integrity. If the change is the result of unauthorized access, it may also be a failure of data security. Depending on the data involved, the consequences can range from something as benign as a single pixel in an image appearing a different color than was originally recorded, to the loss of vacation pictures or a business-critical database, to catastrophic loss of human life in a life-critical system.

There are two types of data integrity:

Physical integrity-

Physical integrity deals with challenges associated with correctly storing and fetching the data itself. Challenges to physical integrity include electromechanical faults, design flaws, material fatigue, corrosion, power outages, natural disasters, acts of war and terrorism, and other environmental hazards such as ionizing radiation, extreme temperatures, pressures, and g-forces. Ensuring physical integrity includes methods such as redundant hardware, an uninterruptible power supply, certain types of RAID arrays, radiation-hardened chips, error-correcting memory, use of a clustered file system, using file systems that employ block-level checksums such as ZFS, storage arrays that compute parity calculations such as exclusive or or use a cryptographic hash function, and even having a watchdog timer on critical subsystems.

Physical integrity often makes extensive use of error-detecting algorithms known as error-correcting codes. Human-induced data integrity errors are often detected through the use of simpler checks and algorithms, such as the Damm algorithm or Luhn algorithm. These are used to maintain data integrity after manual transcription from one computer system to another by a human intermediary (e.g. credit card or bank routing numbers). Computer-induced transcription errors can be detected through hash functions.

In production systems, these techniques are used together to ensure various degrees of data integrity. For example, a computer file system may be configured on a fault-tolerant RAID array, but might not provide block-level checksums to detect and prevent silent data corruption. As another example, a database management system might be compliant with the ACID properties, but the RAID controller or hard disk drive's internal write cache might not be.
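As an illustration of this kind of lightweight transcription check, here is a minimal sketch of the Luhn algorithm in Python (the function name and the sample number are only for illustration). It catches most single-digit typos and adjacent-digit swaps introduced when a person retypes a card number.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn check."""
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    # Double every second digit from the right; if doubling produces a
    # two-digit number, subtract 9 (equivalent to summing its digits).
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

print(luhn_valid("79927398713"))  # True  (well-known Luhn test number)
print(luhn_valid("79927398710"))  # False (last digit transcribed wrong)
```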

Logical integrity-

This type of integrity is concerned with the correctness or rationality of a piece of data, given a particular context. This includes topics such as referential integrity and entity integrity in a relational database, or correctly ignoring impossible sensor data in robotic systems. These concerns involve ensuring that the data "makes sense" given its environment. Challenges include software bugs, design flaws, and human errors. Common methods of ensuring logical integrity include check constraints, foreign key constraints, program assertions, and other run-time sanity checks. Both physical and logical integrity share many common challenges, such as human errors and design flaws, and both must appropriately deal with concurrent requests to record and retrieve data, the latter of which is entirely a subject on its own.
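To make the idea concrete, here is a minimal sketch of such run-time sanity checks in Python (the sensor range, record layout, and names are hypothetical). One check rejects an impossible sensor reading, and the other refuses to store an order that references a customer which does not exist, mimicking a foreign key constraint in application code.

```python
customers = {101: "Ada Lovelace", 102: "Alan Turing"}  # hypothetical "parent" table
orders = []

def record_temperature(celsius: float) -> float:
    # Sanity check: readings outside the sensor's physical range are
    # rejected rather than silently stored, like a database CHECK constraint.
    if not -55.0 <= celsius <= 125.0:
        raise ValueError(f"impossible sensor reading: {celsius} C")
    return celsius

def add_order(customer_id: int, product: str) -> None:
    # Referential integrity: every order must point at an existing customer.
    if customer_id not in customers:
        raise KeyError(f"unknown customer id: {customer_id}")
    orders.append({"customer_id": customer_id, "product": product})

add_order(101, "keyboard")        # accepted
# add_order(999, "monitor")       # would raise KeyError: unknown customer id
# record_temperature(400.0)       # would raise ValueError: impossible reading
```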

Data integrity contains guidelines for data retention, specifying or guaranteeing the length of time data can be retained in a particular database. To achieve data integrity, these rules are consistently and routinely applied to all data entering the system, and any relaxation of enforcement could cause errors in the data. Implementing checks on the data as close as possible to the source of input (such as human data entry) causes less erroneous data to enter the system. Strict enforcement of data integrity rules results in lower error rates and saves time otherwise spent troubleshooting and tracing erroneous data and the errors it causes to algorithms.
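As a brief sketch of checking data as close to the point of entry as possible (the field name and accepted format are assumptions, not from the article): a purchase date typed by a person is parsed and validated the moment it arrives, so free-form text never reaches the rest of the system.

```python
from datetime import datetime

def read_purchase_date(raw_value: str) -> datetime:
    # Validate at the source: parse the user's input before it is stored,
    # so malformed values are rejected immediately rather than found later.
    try:
        return datetime.strptime(raw_value.strip(), "%Y-%m-%d")
    except ValueError:
        raise ValueError(f"expected a date like 2020-06-07, got {raw_value!r}")

print(read_purchase_date("2020-06-07"))   # accepted: 2020-06-07 00:00:00
# read_purchase_date("next Tuesday")      # rejected before entering the system
```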

Data integrity also includes rules defining the relations a piece of data can have to other pieces of data; for example, a Customer record may be allowed to link to purchased Products, but not to unrelated data such as Corporate Assets. Data integrity often includes checks on and correction of invalid data, based on a fixed schema or a predefined set of rules, such as rejecting textual data entered where a date-time value is required. Rules for data derivation are also applicable, specifying how a data value is derived based on algorithms, contributors, and conditions, as well as the conditions under which the data value could be re-derived.
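The following sketch (the entity names, linking rule, and derivation formula are all hypothetical) shows both ideas in Python: an order line may only reference a Product, never a Corporate Asset, and its total is a derived value that is recomputed from quantity and unit price rather than typed in by hand.

```python
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    unit_price: float

@dataclass
class CorporateAsset:
    name: str

def make_order_line(item, quantity: int) -> dict:
    # Relation rule: a customer order may only link to a Product.
    if not isinstance(item, Product):
        raise TypeError(f"order lines may not reference {type(item).__name__}")
    # Derivation rule: the total is always computed, never entered manually,
    # so it can be re-derived whenever quantity or unit price changes.
    return {"product": item.name,
            "quantity": quantity,
            "total": quantity * item.unit_price}

print(make_order_line(Product("keyboard", 35.0), 2))   # total derived as 70.0
# make_order_line(CorporateAsset("forklift"), 1)       # rejected: wrong relation
```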

The term data integrity also leads to confusion because it may refer either to a state or a process. Data integrity as a state defines a data set that is both valid and accurate. On the other hand, data integrity as a process describes measures used to ensure the validity and accuracy of a data set or all data contained in a database or other construct. For instance, error checking and validation methods may be referred to as data integrity processes.

Maintaining data integrity is important for several reasons. For one, data integrity ensures recoverability, searchability, traceability (to origin), and connectivity. Protecting the validity and accuracy of data also increases stability and performance while improving reusability and maintainability.

Data increasingly drives enterprise decision-making, but it must undergo a variety of changes and processes to go from raw form to formats more practical for identifying relationships and facilitating informed decisions. Therefore, data integrity is a top priority for modern enterprises.

Data integrity can be compromised in a variety of ways, making data integrity practices an essential component of effective enterprise security protocols. Data integrity may be compromised through:

- Human error, whether malicious or unintentional
- Transfer errors, including unintended alterations or data compromise during transfer from one device to another
- Bugs, viruses/malware, hacking, and other cyber threats
- Compromised hardware, such as a device or disk crash
- Physical compromise to devices

Since only some of these compromises may be adequately prevented through data security, the case for data backup and duplication becomes critical for ensuring data integrity. Other data integrity best practices include input validation to preclude the entry of invalid data, error detection/data validation to identify errors in data transmission, and security measures such as data loss prevention, access control, data encryption, and more.
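One common way to detect the transfer errors mentioned above is to compare a cryptographic hash of the data before and after it moves. The sketch below uses Python's standard hashlib module; the payload contents are purely illustrative.

```python
import hashlib

def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a block of data."""
    return hashlib.sha256(data).hexdigest()

original = b"quarterly-report,revenue=1200000\n"
sent_digest = sha256_digest(original)               # computed before transfer

received = b"quarterly-report,revenue=1200000\n"    # bytes that arrived
if sha256_digest(received) == sent_digest:
    print("integrity verified: data unchanged in transit")
else:
    print("integrity failure: data was altered or corrupted in transit")
```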