UNSPSC - Non-English Data Classification: Challenges and How They Were Overcome
- A case study
INTRODUCTION
The client is one of the leading Healthcare and MRO product distributors in Asia, with more than 10 million products listed in their inventory. More than 3 million active products are retailed and featured on their e-commerce website.

The client wanted to classify all the products to UNSPSC (United Nations Standard Products and Services Code). The objective of the product classification was to improve visibility into product movement and to achieve better category-wise spend management.
CHALLENGE
Japanese Language: The product listings and data set were entirely in Japanese.

Insufficient Description: More than 30% of the records lacked a product description detailed enough to identify the product and its application.

Missing Information: The manufacturer’s name and part number were missing for more than 20% of the products.

Local Manufacturers/Suppliers: About 90% of the manufacturers and suppliers were based in Japan.

Multiple Industries: The data spanned multiple industries, including:
  • Healthcare, lab supplies, and pharmaceutical products
  • MRO spare parts and office supplies
  • Automotive and after-market products
  • Electrical equipment and accessories
  • Electronic products, and more
DATA TRANSLATION ISSUES
The following translation issues were encountered.

Mistranslation by both Google and Bing
Part number | Description | Google Translation | Bing Translation | Actual Product
60284 | Unwireチューブラック 白 | Unwire Chew Black White | Unwire Tube Rack White | Slant Rack

Translation conflicts between Google and Bing
Part number | Description | Google Translation | Bing Translation | Actual Product
60270 | チューブラック | Chew Black | Tube Racks | Tube Rack

Product conflicts between input description and source
Part number | Description | Input Product Name | Source Product Name | Actual Product
1-7161-01 | クリーンモップ CM-WC35S Lモップ材質:ポリエステル | Clean Mop CM-WC35SL | Mop Head | Mop Head
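The conflict cases above suggest a simple automated check: compare the two machine translations of each description and send any disagreement to manual sourcing. The following is a minimal sketch under that assumption, not the tool used in this project. The function names `normalize` and `needs_sourcing` are hypothetical, and the translations are assumed to have been fetched already.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and strip a trailing plural 's' from longer words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words)


def needs_sourcing(translation_a: str, translation_b: str) -> bool:
    """Return True when two machine translations disagree after normalization."""
    return normalize(translation_a) != normalize(translation_b)


# Rows taken from the tables above: (part number, Google translation, Bing translation).
rows = [
    ("60284", "Unwire Chew Black White", "Unwire Tube Rack White"),
    ("60270", "Chew Black", "Tube Racks"),
]

# Part numbers whose two translations conflict and should be verified at the source.
flagged = [part for part, google, bing in rows if needs_sourcing(google, bing)]
```

Agreement between the two engines does not prove a translation is correct, as the first table shows, so a check like this can only prioritize records for sourcing; it cannot replace the manual sample review.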
SOLUTION
After examining the input data in detail, we identified the challenges involved and devised a customized strategy for data processing and subsequent categorization.
  • We analyzed the input data to check whether it included all the required information, such as the manufacturer’s name, part number, and description.
  • We translated the input data into English using our in-house translation tool, which draws on the Google and Bing translation APIs. Data samples were checked manually to ensure translation accuracy.
  • Because the products belonged to multiple industries, we grouped and categorized items by industry using our in-house product grouping tool. Each industry sector was then assigned to the relevant subject matter experts to maintain classification accuracy.
  • We resolved translation conflicts and issues by looking up the products on the respective manufacturer’s or supplier’s website, so that each product was identified correctly before classification.
  • Through sourcing, we were able to verify more than 90% of the products and increase classification accuracy. As a result, we classified more than 85% of the products at the commodity level.
  • Some of the product data we received was incomplete or incorrect: the product type, family, size, name, and even numbering were inconsistent or missing. In these cases, we classified the product based on the available input description, cross-checked against the manufacturer’s line of business (LOB).
  • Only when we had exhausted all options and still could not identify a product because of insufficient details did we mark it as “Unable to Classify” and return it to the client with a request for more information.
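The routing logic described in the steps above can be sketched as follows. This is an illustrative outline only, not the in-house tooling. The `Product` fields, the `triage` function, and the rule that the LOB cross-check needs both a description and a manufacturer name are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Fields the solution treats as required input (see the first step above).
REQUIRED_FIELDS = ("manufacturer", "part_number", "description")


@dataclass
class Product:
    manufacturer: Optional[str]
    part_number: Optional[str]
    description: Optional[str]
    # Product name confirmed on the manufacturer/supplier website, if sourcing succeeded.
    sourced_name: Optional[str] = None


def is_blank(value: Optional[str]) -> bool:
    """Treat None and whitespace-only strings as missing."""
    return not (value or "").strip()


def missing_fields(product: Product) -> List[str]:
    """List the required fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS if is_blank(getattr(product, f))]


def triage(product: Product) -> str:
    """Choose a classification route, falling back to 'Unable to Classify' last."""
    if not is_blank(product.sourced_name):
        return "classify from sourced name"
    # Assumed rule: the LOB cross-check needs a description and a manufacturer name.
    if not is_blank(product.description) and not is_blank(product.manufacturer):
        return "classify from description and manufacturer LOB"
    return "Unable to Classify"
```

Keeping “Unable to Classify” as the final fallback mirrors the approach above: a record is returned to the client only after every other route has been tried.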
HOW THE CLIENT BENEFITED
  • Efficient spend and product data management.
  • Increased visibility for the client’s e-commerce site.
  • Vastly improved navigability, thanks to a standard, widely used product taxonomy.
  • Products can be retrieved at a broad, high level as well as at the most granular level of the hierarchy.
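Retrieval at both broad and granular levels follows from the structure of the code itself: a UNSPSC code has eight digits, read as four two-digit levels (segment, family, class, commodity), with higher levels written by zero-padding the remaining digits. A minimal sketch of that roll-up is shown below. The function name `unspsc_levels` is hypothetical, and the sample code is a placeholder, not a real assigned commodity.

```python
import re
from typing import Dict


def unspsc_levels(code: str) -> Dict[str, str]:
    """Split an 8-digit UNSPSC code into its four zero-padded hierarchy levels."""
    # Accept ASCII digits only; full-width digits from Japanese input are rejected.
    if not re.fullmatch(r"[0-9]{8}", code):
        raise ValueError(f"expected an 8-digit code, got {code!r}")
    return {
        "segment": code[:2] + "000000",
        "family": code[:4] + "0000",
        "class": code[:6] + "00",
        "commodity": code,
    }


# Placeholder digits for illustration only.
levels = unspsc_levels("12345678")
```

Grouping products by `levels["segment"]` gives the broad, category-wise view used for spend analysis, while `levels["commodity"]` identifies the individual product type.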
BOTTOM LINE/BUSINESS BENEFITS
  • SEO benefits directly: the UNSPSC classification has a direct bearing on keywords, so SEO strategies can be applied more effectively.
  • Online sales improved markedly thanks to product classification.
  • Categorization makes it easy to search for and identify products down to the finest level of detail.
  • Workflow improved significantly because the classification made it easy to identify the header and product name.