How can Google identify and interpret entities from unstructured content?

Post by **Reddi1** » Thu Jan 30, 2025 6:42 am

As explained in the previous post, "How does Google process information from Wikipedia for the Knowledge Graph?", knowledge databases such as the Knowledge Graph have to balance completeness against accuracy of information. Completeness requires that Google can identify, interpret, and extract information from unstructured data sources. This post looks at how that works.

Table of contents

1 Google's journey to semantic understanding
2 The problem with knowledge databases like Wikipedia and Wikidata
3 Closed vs. Open Extraction of Information
3.1 Example process for a closed extraction of facts/information
4 Detection of tail entities
4.1 Tip for nerds!
4.2 Tip for nerds!
5 Named Entity Recognition
5.1 Tip for nerds!
6 Extraction of events
7 Machine Learning as a central technology for processing unstructured data
8 Assigning new entities to classes and types via unsupervised machine learning
9 Methods to Ensure Up-to-Dateness
10 The Knowledge Vault as Knowledge Graph 2.0
10.1 Tip for nerds!
10.2 Tip for nerds!
11 Conclusion: Google is only at the beginning of extracting unstructured information

Google's journey to semantic understanding
Google has been working on extracting semantic information about objects or entities from unstructured documents since the late 1990s. For example, a Google patent from 1999 is entitled "Extracting Patterns and Relations from Scattered Databases Such as the World Wide Web" (pdf). It is one of the first Google patents on semantic issues.

Read more in the article "How smart is Google? Real semantic understanding or just statistics?"

In the first years of the Knowledge Graph, the first step was extracting structured and semi-structured data. Google is already quite good at extracting and processing information from sources such as Wikipedia or Wikidata. You can find out more in the articles "How does Google process information from Wikipedia for the Knowledge Graph?" and "Everything you need to know about entity types, classes & attributes".

But this can only have been the beginning, since the limitations of such a methodology are obvious.