Inclusion of sales price in EstiBot and its effect on appraisal

EstiBot is, without question, the most popular domain appraisal tool. I like its basic idea: the price is calculated mainly from the exact search volume of the domain name and its CPC (the prediction algorithm also draws on a large amount of other data, with parameters including language, category, extension, etc.). As with PageRank, the exact algorithm is not public. EstiBot’s main strength is that it lets you compare domains by prices calculated from these metrics, even if the predicted prices are often much higher than real sale prices.

So I check domains’ EstiBot values on a daily basis. I once owned a very prestigious domain, EnglishLanguage.com. I listed it for sale on Flippa and sold it for USD 15,000, which was a reasonable price according to the comments on the listing. At that time its EstiBot value was USD 88,000. A few months later I checked the EstiBot value again and saw that it had suddenly dropped to USD 16,000, close to the actual sale price. Given how widely EstiBot is used, the appraisal value alone can affect the sale price, so this was quite a frustrating observation, especially for the new owner. Moreover, the correction did not affect similar domain names such as EnglishClub.com, whose EstiBot value stayed at USD 50,000. So, according to EstiBot, a few months earlier EnglishLanguage.com was worth almost twice as much as EnglishClub.com, and now it is worth only about a third as much. Obviously this is wrong, since none of the underlying parameters, such as search volume or CPC, changed during this period. In this case, the inclusion of historical sales data completely changed the ranking of two domains covering the same topic. But how often does this happen?
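
The comparison above is simple arithmetic on the quoted appraisals. A few lines of Python make the before/after ratios explicit and show how close the new appraisal sits to the sale price; the helper name `relative_value` is only illustrative.

```python
def relative_value(domain_appraisal: float, peer_appraisal: float) -> float:
    """Ratio of one domain's appraisal to a comparable domain's appraisal."""
    return domain_appraisal / peer_appraisal

# EstiBot figures quoted in the text (USD)
english_club = 50_000             # EnglishClub.com, unchanged
english_language_before = 88_000  # EnglishLanguage.com before the correction
english_language_after = 16_000   # EnglishLanguage.com a few months later
flippa_sale = 15_000              # actual Flippa sale price

print(f"before: {relative_value(english_language_before, english_club):.2f}x")  # before: 1.76x
print(f"after:  {relative_value(english_language_after, english_club):.2f}x")   # after:  0.32x
# Relative gap between the new appraisal and the recorded sale price
print(f"gap to sale: {(english_language_after - flippa_sale) / flippa_sale:.1%}")  # gap to sale: 6.7%
```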

To see how often such corrections are made, I compared EstiBot predictions with actual sale prices for 180 domains from my recent bidding history on GoDaddy. It is important to note that I mainly select domains based on their previous content and success (e.g. PageRank), not on their names. Thus, I was not expecting a particularly good prediction rate, as EstiBot predicts the value of the name rather than the value of the domain’s history. Here is what I got:

Sale vs EstiBot prices of recent GoDaddy auctions: EstiBot resets the appraisal of a large percentage of domains

Among the 180 tested domains, corrections (model updates) based on the actual sale price were made in 40 cases; these are the points on the diagonal, where the appraisal equals the sale price. Where no correction was made, there was no correlation between predicted and actual sale prices. That alone would not be surprising since, as I stated above, EstiBot is known to focus on search volume rather than on the SEO profile of a domain. However, there is a major flaw: the corrected domains all had PageRank, mostly between 4 and 7, and their sale prices reflected that PageRank, certainly not their names.
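
A minimal sketch of how the diagonal points could be separated from the rest, in Python. It assumes a 5% relative tolerance for counting an appraisal as equal to the sale price, and computes the correlation on a log scale because prices span several orders of magnitude. Both choices, the helper names and the sample records are hypothetical; they are not the data behind the figure above.

```python
import math

def is_corrected(appraisal, sale, rel_tol=0.05):
    """Assumed rule: a point lies 'on the diagonal' if the appraisal is within rel_tol of the sale."""
    return math.isclose(appraisal, sale, rel_tol=rel_tol)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def diagonal_report(records, rel_tol=0.05):
    """records: (domain, estibot_usd, sale_usd). Returns corrected count, rest count, log-log r of rest."""
    corrected = [r for r in records if is_corrected(r[1], r[2], rel_tol)]
    rest = [r for r in records if not is_corrected(r[1], r[2], rel_tol)]
    r = pearson([math.log10(x[1]) for x in rest], [math.log10(x[2]) for x in rest])
    return len(corrected), len(rest), r

# Hypothetical records, for illustration only
records = [
    ("example-a.com", 530, 530),
    ("example-b.net", 1_200, 1_180),
    ("example-c.org", 5, 900),
    ("example-d.info", 2_400, 60),
    ("example-e.com", 300, 1_500),
    ("example-f.com", 150, 80),
]
n_corrected, n_rest, r_rest = diagonal_report(records)
print(f"corrected: {n_corrected}/{len(records)}, r(rest, log scale) = {r_rest:.2f}")
```

Tightening or loosening `rel_tol` changes how many points count as corrected, so the tolerance is worth reporting alongside any count obtained this way.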

To test EstiBot’s appraisal on domains whose sale price is not influenced by a backlink profile, I examined its accuracy on four-character domains. Sale prices were taken from NameBio and compared with EstiBot predictions. As the figure shows, EstiBot overestimates the sale prices of four-character domains. In the figure, red points indicate domains with related sales (in some cases the related domain is exactly the domain in question), and black points indicate domain names with no related sales history. For the latter group, the correlation between predicted and actual sale prices is very low, which is probably why EstiBot includes actual sales history in its model.

Prediction of four-character domains with EstiBot
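
The four-character comparison can be sketched by splitting the domains on whether they have related sales and computing, for each group, the median ratio of EstiBot value to sale price (above 1 means overestimation) and the correlation on a log scale. This is an illustration under assumptions: the rows are invented rather than taken from NameBio, and the grouping flag, log scale and median ratio are illustrative choices of summary, not EstiBot’s method.

```python
import math
from statistics import median

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def summarize(rows):
    """Per group (related sales yes/no): size, median EstiBot/sale ratio, log-log correlation."""
    summary = {}
    for has_related in (True, False):
        group = [r for r in rows if r[3] == has_related]
        estimates = [math.log10(r[1]) for r in group]
        sales = [math.log10(r[2]) for r in group]
        summary[has_related] = {
            "n": len(group),
            # > 1 means EstiBot overestimates the typical sale in this group
            "median_ratio": median([r[1] / r[2] for r in group]),
            "log_correlation": pearson(estimates, sales),
        }
    return summary

# Hypothetical rows: (domain, EstiBot value USD, sale price USD, has related sales)
rows = [
    ("abcd.com", 12_000, 10_000, True),
    ("efgh.com", 4_000, 3_500, True),
    ("ijkl.net", 900, 850, True),
    ("mnop.com", 6_000, 700, False),
    ("qrst.org", 2_500, 1_200, False),
    ("uvwx.com", 8_000, 400, False),
]

for has_related, stats in summarize(rows).items():
    label = "related sales" if has_related else "no related sales"
    print(label, stats)
```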

So this quick investigation reveals two major flaws resulting from the inclusion of real sale prices in EstiBot:

  • The price correction does not seem to carry over to the appraisals of similar domain names. Thus, the EstiBot prediction for a more valuable name with a recorded sale can drop unrealistically compared with a similar name that has not yet been sold (see my example of EnglishLanguage.com and EnglishClub.com), so not just the predicted values but also the ranking of domains is affected.
  • For domain names with high PageRank, the sale price correction results in an unrealistically high EstiBot value. For example, I sold e-polymers.org, a PageRank 7 domain, for thousands of dollars. This sale was not registered in EstiBot, and its predicted value is only USD 5, which indicates that PageRank is not a parameter of the EstiBot prediction algorithm. I agree that USD 5 is about right for the name alone. By contrast, both the EstiBot value and the actual sale price of webtoolkit4.me are USD 530. Obviously, this domain was sold for its PageRank, not its name; the name itself has no real value.

Thus, when reading an EstiBot appraisal, you should check the domain’s sales history, which EstiBot also shows in its results, to see whether the figure is a genuine prediction or simply a recorded sale. Historical sales data should only be used to improve the prediction algorithm, not directly as a prediction. Still, there is no better appraisal tool than EstiBot, but the sporadic inclusion of sale prices clearly distorts the prediction in some cases. Why not keep showing the predicted price and display sales data as additional information?
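
The proposal above, keeping the model’s prediction and the sales history as separate pieces of information, could look like the small record type below. The class and field names are hypothetical and do not describe EstiBot’s actual output format; the example reuses the EnglishLanguage.com figures from earlier, treating the pre-sale appraisal as the model estimate.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Sale:
    price_usd: int
    venue: str

@dataclass
class Appraisal:
    domain: str
    model_estimate_usd: int           # value predicted from the name's metrics
    last_sale: Optional[Sale] = None  # shown alongside, never substituted for the estimate

    def report(self) -> str:
        text = f"{self.domain}: model estimate USD {self.model_estimate_usd:,}"
        if self.last_sale is not None:
            text += f"; last sale USD {self.last_sale.price_usd:,} ({self.last_sale.venue})"
        return text

sold = Appraisal("EnglishLanguage.com", 88_000, Sale(15_000, "Flippa"))
unsold = Appraisal("EnglishClub.com", 50_000)
print(sold.report())    # EnglishLanguage.com: model estimate USD 88,000; last sale USD 15,000 (Flippa)
print(unsold.report())  # EnglishClub.com: model estimate USD 50,000
```

Because the sale lives in its own field, a recorded sale never changes the estimate, so two comparable names stay comparable whether or not one of them has been sold.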

Scientific background: Since in many cases the difference between the sale price and the EstiBot price of the corrected domains is very slight, it can be assumed that these sale prices are not entered as direct corrections but are added to the training set of an already overfitted model. Overfitting here means that EstiBot looks for domains with similar parameters (registered in the same year, with the same search volume, the same number of Google hits, etc.), and the search for domains on which all parameters match ends up finding a single domain. As a result, data within the training set (the domains used to build the prediction model) are “predicted” with unrealistically high precision, while data outside the training set (external data) are predicted with no correlation at all. In effect, an overfitted model becomes a database of its stored data rather than a model for prediction.
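
The overfitting hypothesis can be illustrated with the extreme case described here: a model that, for each domain, looks up the single most similar domain in its training set (a 1-nearest-neighbour lookup). This is a toy sketch on synthetic data, not EstiBot’s model, which is not public; the three features, the price distribution and the helper names are assumptions. Training domains come back with their exact stored price, while held-out domains get the price of some other domain.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def one_nn_predict(train, features):
    """Price of the single most similar training domain (k = 1 nearest neighbour)."""
    nearest = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)))
    return nearest[1]

def make_domain(rng):
    """Hypothetical domain: three normalised features and a price that depends only weakly on them."""
    features = (rng.random(), rng.random(), rng.random())
    # Most of the price variation comes from a random factor the features do not capture
    price = 1000 * (1 + features[0]) * rng.lognormvariate(0, 1)
    return features, price

rng = random.Random(42)
train = [make_domain(rng) for _ in range(50)]
held_out = [make_domain(rng) for _ in range(50)]

# In-sample: every training domain is its own nearest neighbour, so it is "predicted" exactly
train_exact = all(one_nn_predict(train, x) == y for x, y in train)
# Out-of-sample: each prediction is another domain's stored price
r_held_out = pearson([one_nn_predict(train, x) for x, _ in held_out], [y for _, y in held_out])

print("training set reproduced exactly:", train_exact)
print(f"held-out correlation: {r_held_out:.2f}")
```

The perfect in-sample result says nothing about predictive power; only the held-out correlation does, which mirrors the distinction drawn above between stored data and genuine prediction.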

3 Comments

  1. Logan November 30, 2015
  2. Change December 10, 2015
    • Eszter Hazai December 16, 2015