Defining the machine learning problem
As defined by Tom Mitchell, the problem must be a well-defined machine learning problem. The three important questions to be solved at this stage include the following:
- Do we have the right problem?
- Do we have the right data?
- Do we have the right success criteria?
The problem should be such that the outcome that is going to be obtained as a solution to the problem is valuable for the business. There should be sufficient historical data that should be available for learning/training purposes. The objective should be measurable and we should know how much of the objective has been achieved at any point in time.
For example, if we are going to identify fraudulent transactions from a set of online transactions, then determining such fraudulent transactions is definitely valuable for the business. We need to have a sufficient set of online transactions. We should have a sufficient set of transactions that belong to various fraudulent categories. We should also have a mechanism to determine whether the outcome predicted as a fraudulent or nonfraudulent transaction can be verified and validated for the accuracy of prediction.