Semantics3 Amazon
As more merchants flock to the web to sell their products, retailers face a deluge of data when they try to see where certain products are being sold and for how much. Parsing all of this data and extracting value from it is a huge challenge. YC-backed Semantics3 has created a database that aims to track every product sold online, and every price it has ever been sold at, and gives retailers an API to this database.
The company, which was founded by classmates in a computer engineering college program in Singapore, indexes several dozen of the top e-commerce sites online and provides a self-serve API so developers can tap into its constantly updated database of consumer products. Why would developers want this data? Retailers need to do UPC lookups, get detailed data for products sold on the web (e.g. consumer electronics or clothing), pull price histories and more.
For instance, retailers could see how much an item cost a month ago and how its price has changed since, so they can optimize their pricing for a similar product. The API also gives retailers access to product data, including name, price, brand, model, color, size, UPC, images, dimensions (width, height, length), weight and purchase links, with search and filtering on all of those parameters.
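To make the month-over-month price comparison and attribute filtering concrete, here is a minimal local sketch in Python. It does not use or reproduce the Semantics3 API; the `Product` class, its field layout, and the helper names `filter_products`, `price_on` and `change_since` are all hypothetical, chosen only to mirror a few of the attributes listed above.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional


@dataclass
class Product:
    # Hypothetical record; fields echo some attributes named in the article.
    name: str
    brand: str
    upc: str
    price: float
    color: Optional[str] = None
    # (date, price) observations, oldest first
    price_history: list = field(default_factory=list)


def filter_products(products, **criteria):
    """Return the products whose attributes equal every given criterion."""
    return [
        p for p in products
        if all(getattr(p, key) == value for key, value in criteria.items())
    ]


def price_on(product, day):
    """Most recent recorded price on or before `day`, or None if none exists."""
    known = [price for seen, price in product.price_history if seen <= day]
    return known[-1] if known else None


def change_since(product, today, days=30):
    """Price difference between `today` and `days` earlier (None if unknown)."""
    then = price_on(product, today - timedelta(days=days))
    now = price_on(product, today)
    if then is None or now is None:
        return None
    return round(now - then, 2)
```

A retailer pricing a similar item could call `filter_products(catalog, brand="Acme")` to narrow the catalog, then `change_since(item, date.today())` to see how far a competitor's price has moved over the past 30 days.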
Additionally, Semantics3 parses data such as the condition of a product (new vs. used), shipping info and availability. The startup takes it a step further with its sales rank, a ranking it calculates for every product that is useful for figuring out what to sell. So you could find products that have a high rank but that only a small number of retailers carry.
Products are refreshed with current information on a schedule set by this rank. The top 1 percent of items are refreshed every hour, and the top 20 percent are refreshed daily. Currently the database lists more than 20 million products and is growing by 5 million products per month.
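The rank-based refresh tiers can be sketched as a small lookup. This is an illustration of the two tiers stated above, not Semantics3's actual scheduler; the function name `refresh_interval` and the convention that rank 1 is the best-selling product are assumptions, and the cadence for items outside the top 20 percent is not given, so the sketch returns `None` there.

```python
from datetime import timedelta
from typing import Optional


def refresh_interval(rank: int, total: int) -> Optional[timedelta]:
    """Map a sales rank (1 = best, an assumed convention) to a refresh interval.

    Returns None for items outside the top 20 percent, since no cadence
    is stated for them.
    """
    if total <= 0 or not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    # Integer comparisons avoid floating-point edge cases at tier boundaries.
    if rank * 100 <= total:   # top 1 percent
        return timedelta(hours=1)
    if rank * 5 <= total:     # top 20 percent
        return timedelta(days=1)
    return None
```

With the 20 million products cited above, these tiers would put roughly 200,000 items on the hourly schedule and 4 million (including those) on at least a daily one.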
On the backend, Semantics3 has built a custom, high-powered data-parsing system that processes close to 500GB a day on 250 nodes, all managed by the startup’s five-person team.