Statistics for Data Science
上QQ阅读APP看书,第一时间看更新

Data quality or data cleansing

Do you think a data developer is interested in the quality of data in a database? Of course, a data developer needs to care about the level of quality of the data they support or provide access to. For a data developer, the process of data quality assurance (DQA) within an organization is more mechanical in nature, such as ensuring data is current and complete and stored in the correct format.

With data cleansing, you see the data scientist put more emphasis on the concept of statistical data quality. This includes using relationships found within the data to improve the levels of data quality. As an example, an individual whose age is nine, should not be labeled or shown as part of a group of legal drivers in the United States incorrectly labeled data.

You may be familiar with the term munging data. Munging may be sometimes defined as the act of tying together systems and interfaces that were not specifically designed to interoperate. Munging can also be defined as the processing or filtering of raw data into another form for a particular use or need.