Beginners’ guide to Knowledge Graphs and Scene-Graphs
This tutorial is meant for beginners. I'll be giving an introduction to graph-based representations and their usage, so a little knowledge of graph data structures is enough for this part of the tutorial. If you have a computing background, you'll be able to understand the importance of scene graphs and knowledge bases in the real world today.
OpenAI's DALL·E was a major motivation for me to look for more optimized and effective ways to generate scenes. Traditional transformer models require huge training datasets and a lot of computational power, which drew me toward knowledge bases and toward projecting information in a uniquely structured way.
In this blog we will cover the following major concepts:
- What are graphs and why are they important?
- What are knowledge graphs?
- What are scene graphs and how are they related to knowledge graphs?
- How do they work and how are they implemented?
- How are knowledge graphs currently being used?
- Scene graphs and neural models
- Image Captioning Intuition
- Further Reading
- References
Note: You can clone the code I am using from the GitHub link, or feel free to use your own.
Graphs??
Graphs are non-linear data structures in which nodes (vertices) carry information and edges connect them. They have been around for a long time and are massively used to solve real-world problems, in domains ranging from social networks to transport networks, you name it.
What makes graphs so special is that they encode relationships between different data items, which gives a complete and rich representation of the given input.
Moreover, graphs help preserve the context of the data inside each node through their relational structure. In a way, graphs resemble the brain: forming new connections reinforces and helps preserve the information that is already stored.
So what is a Knowledge Graph then?
A knowledge graph is a representation of real-world entities and the relationships between them.
It uses RDF triplets to store information in a structured way. Knowledge graphs are used in various applications, such as natural language processing and artificial intelligence, to extract meaningful insights from data. Now you might ask, what is an RDF triplet? Don't worry, I've got you covered.
RDF (Resource Description Framework) triplets are the basic building blocks of RDF, which is a standard for representing information in a machine-readable format. Each RDF triplet consists of three parts:
- Subject: The subject of the triplet represents the resource that the statement is about. It is typically identified by a URI (Uniform Resource Identifier)
- Predicate: The predicate of the triplet represents the relationship between the subject and the object. It is typically identified by a URI.
- Object: The object of the triplet represents the value or resource that is related to the subject. It can be identified by a URI or a literal value (e.g. a string or a number).
Together, these three parts form a statement that describes a relationship between two resources. For example, from the sample graph above, the triplet "a farmer grows wheat" can be represented as:
Subject: “Farmer” (URI)
Predicate: “grows” (URI)
Object: “wheat” (URI)
RDF is used to connect data from different sources in a structured way, which allows for more efficient querying and analysis of the data. Triplets are well suited to storing textual data because they capture concise, meaningful statements about subjects. Knowledge representation in this nodal form has been around for a long time.
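To make this concrete, here is a minimal sketch in Python using the rdflib library; the http://example.org/ namespace and the Farmer, grows, and Wheat identifiers are just placeholders for this example.

```python
# A minimal sketch using rdflib (pip install rdflib); the example.org
# namespace and the Farmer/grows/Wheat terms are illustrative placeholders.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")

g = Graph()
g.bind("ex", EX)

# One RDF triplet: (subject, predicate, object)
g.add((EX.Farmer, EX.grows, EX.Wheat))

# Serialize the graph as Turtle to see the stored statement
print(g.serialize(format="turtle"))
```

Running this prints a single Turtle statement, ex:Farmer ex:grows ex:Wheat, which is exactly the subject-predicate-object structure described above.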
Note: Check out the amazing article by Sebastian Dery for a deeper understanding of the subject matter.
Why knowledge graphs?
Knowledge Graphs (KG) are a powerful tool for representing and organizing information in a structured way. There are several reasons why you might choose to use a KG over other forms of data representation:
- Connectivity: KGs are designed to represent relationships between entities, making it easy to navigate and understand the connections between different pieces of information.
- Scalability: KGs can handle large amounts of data and can be easily expanded as new information is added.
- Flexibility: KGs can be used to represent a wide variety of information, from simple facts to complex relationships between entities.
- Interoperability: KGs use a common data model and ontologies, making it easy to integrate information from different sources and share it with others.
- Searchability: KGs can be queried with structured query languages such as SPARQL (and increasingly through natural-language interfaces), making it easy to find specific information or to explore the graph and discover new insights; see the query sketch after this list.
- Machine-Readable: KGs are represented in machine-readable formats, such as RDF or OWL, making it easy to perform automated reasoning and to use the data with AI and machine learning algorithms.
- Personalization: KGs can be used to model the user's context, preferences, and interactions, enabling personalized and context-aware applications.
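To illustrate the searchability point, here is a toy SPARQL query over the same hypothetical farmer graph, again using rdflib; all of the example.org terms are made up for this example.

```python
# A small illustration of querying a knowledge graph with SPARQL via rdflib.
# The graph and the example.org terms are the same toy data as before.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.Farmer, EX.grows, EX.Wheat))
g.add((EX.Farmer, EX.owns, EX.Tractor))

# "What is the farmer related to, and how?" expressed as a SPARQL query
results = g.query(
    """
    SELECT ?relation ?thing WHERE {
        ex:Farmer ?relation ?thing .
    }
    """,
    initNs={"ex": EX},
)

for relation, thing in results:
    print(relation, thing)
```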
What are scene graphs and why are they relevant here?
Scene graphs are a type of knowledge graph that is used to represent the objects and relationships in an image or video. They are used in computer vision and natural language processing to help machines understand the contents of images and videos.
A scene graph is a directed graph that represents the objects and their relationships in a scene. The nodes in a scene graph represent the objects, and the edges represent the relationships between the objects. The relationships can be hierarchical (e.g. a car is part of a street scene) or semantic (e.g. a person is sitting on a chair).
Scene graphs are useful for a wide range of computer vision tasks, such as image captioning, object detection and recognition, and visual question answering. By representing the objects and relationships in an image or video in a structured way, scene graphs make it easier for machines to understand and reason about the contents of the image or video.
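As a toy illustration of this idea (not any particular library's scene-graph format), such a directed graph can be held in an ordinary graph structure, with objects as nodes and labelled edges as relationships.

```python
# A toy scene graph built with networkx; the objects and relationship
# labels are made up for illustration.
import networkx as nx

scene = nx.DiGraph()

# Nodes are the objects detected in the scene
scene.add_nodes_from(["person", "chair", "car", "street"])

# Edges are the relationships between objects
scene.add_edge("person", "chair", relation="sitting on")   # semantic
scene.add_edge("car", "street", relation="part of")        # hierarchical

for subj, obj, data in scene.edges(data=True):
    print(f"{subj} --{data['relation']}--> {obj}")
```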
How are scene graphs and KGs different?
Scene graphs and knowledge graphs are different types of data structures and they serve different purposes, but they can be used together to improve image understanding.
Knowledge graphs represent real-world entities and their relationships, that is, information in a structured format in a general sense. They can describe people, things, concepts, and so on.
Meanwhile, scene graphs specifically represent the objects in an image, video, or 3D environment, along with their attributes and the spatial relationships between them, such as containment, proximity, and actions.
Since both of these methods are used to map the context and information in an image, we can make our models more effective with a fusion of both concepts for more thorough information retrieval.
So basically, when knowledge graphs are applied to images, they can provide additional information and context to the scene graph, allowing the system to understand the semantic meaning of the objects, concepts, and their relationships in the image. This information can be used to generate more accurate and detailed captions and to perform other tasks such as object recognition and relationship extraction.
Why are they relevant??
Scene graphs are also relevant to the knowledge graph topic because they can be used to generate a knowledge graph from an image or video. By extracting the objects and relationships from the scene, it is possible to create a knowledge graph that represents the entities and their relationships in it. This is useful for tasks such as image retrieval, image-based question answering, and image-based recommendation.
How do KGs work?
Implementing a knowledge graph or a scene graph typically involves several steps:
- Data Collection: The first step is to collect data from various sources. This can include structured data from databases, unstructured data from text documents, and multimedia data from images and videos.
- Data Preprocessing: The next step is to preprocess the data to remove any irrelevant or duplicate information and to ensure that the data is in a format that can be easily used for the knowledge graph or scene graph. This step may involve data cleaning, data integration, and data transformation.
- Entity and Relationship Extraction: The third step is to extract the entities and relationships from the data. This can be done using techniques such as named entity recognition, relationship extraction, and object detection (see the short sketch below).
- Knowledge Graph Construction: Once the entities and relationships have been extracted, the next step is to construct the knowledge graph. This can be done using a graph database, such as Neo4j or Titan, or by using a triple store, such as Virtuoso or Fuseki.
- Scene Graph Construction: If you want to create a scene graph, you would need to use computer vision techniques such as object detection, semantic segmentation, and instance segmentation to extract the objects and relationships from the image or video.
- Data Populating: Once the knowledge graph or the scene graph has been constructed, the next step is to populate it with the entities and relationships that have been extracted.
- Data Querying: Once the knowledge graph or scene graph is populated, it can be queried to extract useful information or to answer specific questions.
- Data Maintenance: Finally, it is essential to maintain the knowledge graph or scene graph by regularly updating it with new data, monitoring for errors, and ensuring that it remains accurate and up to date.
It is important to note that these steps can vary depending on the specific use case and the technology used to implement the knowledge graph or scene graph. Additionally, a number of libraries and frameworks can help you with these steps, such as OpenAI's GPT-3 and Google's TensorFlow.
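As a rough sketch of the extraction and construction steps above, the snippet below uses spaCy's dependency parse to pull simple subject-verb-object triples out of text and load them into a graph. It assumes spaCy and its en_core_web_sm model are installed, and it is only a toy; real pipelines use far more robust entity and relation extraction.

```python
# A rough sketch of entity/relationship extraction followed by graph
# construction. Assumes `pip install spacy networkx` and the spaCy
# en_core_web_sm model; a real pipeline would be far more robust.
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")

text = "A farmer grows wheat. The farmer owns a tractor."

kg = nx.DiGraph()

for sent in nlp(text).sents:
    root = sent.root                                        # main verb
    subjects = [t for t in root.children if t.dep_ == "nsubj"]
    objects = [t for t in root.children if t.dep_ in ("dobj", "obj")]
    for subj in subjects:
        for obj in objects:
            # Each extracted (subject, verb, object) triple becomes an edge
            kg.add_edge(subj.lemma_, obj.lemma_, relation=root.lemma_)

for s, o, d in kg.edges(data=True):
    print(f"({s}) -[{d['relation']}]-> ({o})")
```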
Note: I would recommend taking this free OpenHPI course on knowledge representation, as it is a thorough and well-explained course.
How are knowledge graphs used today?
Knowledge graphs are currently used in a wide range of applications, some of which include:
Search engines:
Knowledge graphs are used to provide more accurate and relevant search results by connecting data from different sources and understanding the relationships between entities. This allows search engines to understand the intent of the user’s query and provide more relevant results.
Recommendation systems:
Knowledge graphs are used to recommend items or content to users by understanding the relationships between the items and the user’s interests. For example, a knowledge graph can be used to recommend similar movies to a user based on the movies they have previously watched.
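As a toy-level illustration of this idea (all names and edges below are invented), a recommendation can be as simple as walking the graph from the movies a user has watched to other movies that share a genre node.

```python
# A toy graph-based recommendation: suggest movies that share a genre node
# with something the user already watched. All nodes and edges are made up.
import networkx as nx

kg = nx.Graph()
kg.add_edge("alice", "Inception", relation="watched")
kg.add_edge("Inception", "sci-fi", relation="has_genre")
kg.add_edge("Interstellar", "sci-fi", relation="has_genre")
kg.add_edge("The Notebook", "romance", relation="has_genre")

def recommend(user):
    watched = {m for m in kg.neighbors(user)
               if kg[user][m]["relation"] == "watched"}
    suggestions = set()
    for movie in watched:
        for genre in kg.neighbors(movie):
            if kg[movie][genre]["relation"] != "has_genre":
                continue
            # Other movies hanging off the same genre node
            suggestions.update(m for m in kg.neighbors(genre)
                               if m not in watched and m != user)
    return suggestions

print(recommend("alice"))   # {'Interstellar'}
```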
Natural Language Processing:
Knowledge graphs are used to improve natural language processing tasks such as named entity recognition, relation extraction, and question answering. They are used to provide more context and understanding of the text, which can improve the accuracy of the NLP models.
Information Retrieval:
Knowledge graphs are used to improve the efficiency and accuracy of information retrieval systems by providing a structured representation of the data and understanding the relationships between the entities.
These are just a few examples of how knowledge graphs are currently being used, and new use cases are being discovered all the time. With the ability to connect data from different sources and understand the relationships between entities, knowledge graphs have the potential to improve many different areas of business and society.
Scene graph & neural models
Scene graphs, as described earlier, are a structured representation of objects and their relationships in an image or video. They are used in computer vision and natural language processing to help machines understand the contents of images and videos. One way to generate scene graphs is through the use of neural models.
(Example scene: a farmer grows wheat on a farm; the farmer owns the farm and a tractor; the farmer uses the tractor to plow the farm.)
Neural models, such as Graph R-CNN, VG-RAM, Neural Motifs, and Scene Graph Generation by Iterative Message Passing, are a type of machine learning model inspired by the structure and function of the human brain. These models are trained on large datasets of images and their corresponding scene graphs and can detect objects and relationships in a snap.
Graph R-CNN is a neural model that uses a convolutional neural network (CNN) to detect objects in the image and a graph neural network (GNN) to detect relationships between the objects. VG-RAM uses a CNN to detect objects and a recurrent neural network (RNN) to detect relationships between objects. Neural Motifs use a CNN to detect objects in the image and a sequence-to-sequence model to detect relationships between the objects. Scene Graph Generation by Iterative Message Passing uses a CNN to detect objects in the image and then uses a message-passing neural network to learn relationships between the objects.
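To give a flavour of the structure these approaches share, here is a heavily simplified PyTorch sketch (not any of the models above) of the two-stage pattern: a backbone produces a feature vector per detected object, and a small head scores a relationship label for every ordered pair of objects. The dimensions, label count, and random "features" are placeholder assumptions.

```python
# A heavily simplified sketch of the two-stage pattern these models share:
# a CNN backbone produces a feature vector per detected object, and a small
# head scores a relationship label for every ordered (subject, object) pair.
# Dimensions, label count, and the random "features" are placeholders.
import torch
import torch.nn as nn

NUM_RELATIONS = 4      # e.g. "on", "holding", "next to", "no relation"
FEATURE_DIM = 256      # size of each object's feature vector

class PairwiseRelationHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * FEATURE_DIM, 128),
            nn.ReLU(),
            nn.Linear(128, NUM_RELATIONS),
        )

    def forward(self, object_features):
        # object_features: (num_objects, FEATURE_DIM)
        n = object_features.size(0)
        subj = object_features.unsqueeze(1).expand(n, n, FEATURE_DIM)
        obj = object_features.unsqueeze(0).expand(n, n, FEATURE_DIM)
        pair_features = torch.cat([subj, obj], dim=-1)
        return self.mlp(pair_features)   # (n, n, NUM_RELATIONS) scores

# Pretend a detector found 3 objects and the backbone produced their features
features = torch.randn(3, FEATURE_DIM)
scores = PairwiseRelationHead()(features)
print(scores.shape)   # torch.Size([3, 3, 4])
```

The argmax over the last dimension would give a predicted relationship for each object pair, which is what gets assembled into the scene graph.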
One current line of research in this field is improving the performance of these models by incorporating more complex and informative relationships between objects. Another focus is scalability, so that the models can handle large and complex images and videos. Researchers are also working on incorporating external knowledge, such as text, to improve the understanding of the scene.
Researchers are additionally focusing on the interpretability of these models, to better understand how they make their predictions and to make them more transparent.
TLDR; Scene Graphs are a powerful tool for understanding the contents of images and videos, and neural models are an effective way to generate scene graphs. With ongoing research in this field, we can expect to see continued improvements in the performance, scalability, and interpretability of these models.
Image captioning with scene-graphs
Image captioning is the process of generating natural language descriptions of an image. It is a task that is used in computer vision and natural language processing to help machines understand the contents of an image.
Scene graphs and knowledge graphs are both used to represent the objects and relationships within an image in a structured way. Scene graphs are specifically used to represent the objects and relationships within an image, while knowledge graphs can be used to represent a broader range of entities and relationships.
In image captioning, scene graphs are used to represent the objects and relationships within the image, and then the information is used to generate natural language descriptions of the image. Scene graphs provide a structured representation of the objects and relationships within an image, which makes it easier for the machine to understand the contents of the image.
Knowledge graphs are also used in image captioning to provide more context and understanding of the image. The knowledge graph can be used to provide more information about the objects and relationships within the image, which can improve the accuracy of image captioning.
Current captioning systems pair visual features with large transformer-based language models (in the spirit of BERT, GPT-3, and Transformer-XL) to produce the descriptions. These models are trained on large datasets of images and their corresponding captions and can generate a natural language description of an image based on the objects and relationships represented in the scene graph.
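For intuition only (this is not how neural captioners work, just a way to see what the scene graph contributes), you can linearize the scene-graph triples into a crude template-based caption; the graph below reuses the hypothetical farmer scene.

```python
# An intuition-level sketch only: turn scene-graph triples into a crude
# caption with string templates. Real captioning systems feed graph and
# image features into trained neural language models instead.
import networkx as nx

scene = nx.DiGraph()
scene.add_edge("farmer", "wheat", relation="grows")
scene.add_edge("farmer", "tractor", relation="owns")
scene.add_edge("tractor", "farm", relation="plows")

def naive_caption(graph):
    clauses = [f"the {subj} {data['relation']} the {obj}"
               for subj, obj, data in graph.edges(data=True)]
    return (", and ".join(clauses) + ".").capitalize()

print(naive_caption(scene))
# "The farmer grows the wheat, and the farmer owns the tractor, and ..."
```

A trained model would instead condition on these triples (and on image features) to produce a fluent, context-aware caption.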
Conclusion
Knowledge graphs and scene graphs are powerful tools for representing and organizing information in a structured way. A knowledge graph is a representation of real-world entities and the relationships between them, while a scene graph represents the objects and their relationships in an image or video. Implementing a knowledge graph or a scene graph involves several steps, including data collection, preprocessing, entity and relationship extraction, graph construction, data population, querying, and maintenance. Knowledge graphs are currently used in a wide range of applications such as search engines, recommendation systems, natural language processing, information retrieval, business intelligence, and healthcare. They provide a holistic view of the data and help to extract meaningful insights. Scene graphs are mostly used in computer vision tasks like image captioning.
I will post more on knowledge graphs in the coming articles. Would love to hear your feedback, Stay tuned for more :)
References
Knowledge Graphs:
- “Knowledge Graphs: An Overview” by Leo Sauermann and Richard Cyganiak, Journal of Web Semantics, 2016.
- “Knowledge Graphs: The Future of Search” by Pranav Bhasin, SEMrush, 2017.
- “Building and Using Knowledge Graphs” by Tom Mitchell, Communications of the ACM, 2015.
- “Knowledge Graphs: The Next Frontier for Search and Personalization” by Rajat Shuvro, Medium, 2018.
- https://web.stanford.edu/~vinayc/kg/notes/How_To_Create_A_Knowledge_Graph_From_Text.html
Scene Graphs:
- “Scene Graph Generation by Iterative Message Passing” by Danfei Xu, Yuke Zhu, Christopher B. Choy, and Li Fei-Fei, CVPR, 2017.
- “Scene Graphs to the Rescue” by Andrew Zisserman, CVPR, 2018.
- “Scene Graphs in Computer Vision: From Objects to Relations” by Sadeep Jayasumana, Tinghui Zhou, Roozbeh Mottaghi, Alexei A. Efros, CVPR, 2018.
- “Scene Graph Generation from Objects, Phrases and Attributes” by David Konopnicki, De-An Huang, and Alexander C. Berg, ECCV 2018.