A dataset for natural language