DBpedia Explorer (WADe) — Scholarly HTML Technical Report

Abstract

This report describes DBpedia Explorer (WADe), a service-oriented Web application that provides interactive visual exploration of a large knowledge collection. The system is built as a Single Page Application (SPA) plus a FastAPI backend that translates user tasks into SPARQL queries executed against the public DBpedia endpoint. The client renders search, category navigation (broader/narrower), entity inspection, and multiple extensions (facets, category map, related entities). The UI is enriched with RDFa and JSON-LD, and each resource of interest is shareable via URL and QR code.

1. Introduction

Large knowledge collections such as DBpedia provide rich RDF data but require expertise to query and interpret. This project bridges that gap by offering a human-friendly SPA that exposes core exploration tasks while preserving machine-readable semantics.

The main goals are: (1) enable useful visual navigation over DBpedia categories and entities, (2) provide modular extensions demonstrating non-trivial operations, and (3) follow Web engineering practices (service-oriented architecture, documentation, reproducible deployment).

2. Requirements & Deliverables Mapping

WADe requirement	How the project satisfies it
Modular Web tool (SPA or extension) with visualizations	React SPA with multiple UI faces and 3+ extensions (facets, category map, related entities).
Knowledge model expressed in SKOS/OWL, accessed via SPARQL	DBpedia categories treated as SKOS-like concepts; broader/narrower navigation; SPARQL queries against DBpedia endpoint.
At least 3 extensions (filtering, layered visualization, trend/comparison, etc.)	Type facets (intelligent filtering), Category map (layered local graph), Related entities (comparison/similarity via shared categories).
Service-oriented architecture + design docs	SPA + FastAPI + DBpedia endpoint, documented in Section 4 (Falsehoods, UI Faces, C4, Design Docs).
Client representations include HTML5 + schema.org + RDFa	Pages include RDFa attributes and JSON-LD injection (schema.org) for category/entity pages.
Each resource exposed by URL and/or QR code	ShareBar component provides copyable URL + QR code for category/entity pages.
Fully implemented and deployed solution	Dockerized containers deployed on Google Compute Engine with Docker Compose.

3. Data Sources & Knowledge Model

3.1 External data source: DBpedia

The application uses the public DBpedia SPARQL endpoint as a read-only source of RDF data. Core interactions are implemented as SPARQL queries.

3.2 Knowledge model reuse (SKOS/OWL/vocabularies)

DBpedia categories are used as knowledge organization units. Category pages are presented as SKOS-like concepts, and relationships are exposed using RDFa: broader categories and narrower categories correspond to hierarchical navigation.

Note: The project intentionally reuses an existing public knowledge base (DBpedia) rather than creating a new taxonomy from scratch.

4. Architecture & Design (Falsehoods • UI Faces • C4 • Design Docs)

4.1 Falsehoods

Data completeness: DBpedia resources may miss labels, abstracts, or rdf:type assertions.
Endpoint performance: SPARQL response times vary; timeouts and transient failures occur.
Model heterogeneity: Entities in the same category can have diverse types and properties.
Result stability: DBpedia evolves; the same query may return different results over time.

Mitigations include defensive UI rendering, strict pagination, limited result sets, and tolerant handling of empty/partial data.
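As an illustration of tolerant handling, the sketch below normalizes a partially filled entity record before it reaches the UI. The field names mirror the entity details listed in Section 5, but the helper names (label_from_uri, normalize_entity) and the fallback rules are assumptions made for this example, not the project's actual code.

```python
from typing import Any


def label_from_uri(uri: str) -> str:
    """Derive a readable fallback label from the last URI segment."""
    tail = uri.rstrip("/").rsplit("/", 1)[-1]
    # DBpedia category URIs look like .../resource/Category:Physics
    if tail.startswith("Category:"):
        tail = tail[len("Category:"):]
    return tail.replace("_", " ")


def normalize_entity(raw: dict[str, Any]) -> dict[str, Any]:
    """Fill safe defaults so rendering code never meets a missing field."""
    uri = raw["uri"]
    return {
        "uri": uri,
        # A missing or empty label falls back to the URI tail.
        "label": raw.get("label") or label_from_uri(uri),
        # A missing abstract stays None so the UI can hide that block.
        "abstract": raw.get("abstract") or None,
        "types": list(raw.get("types") or []),
        "categories": list(raw.get("categories") or []),
    }
```

With this shape, a resource that has no rdfs:label or rdf:type assertions still renders as a titled page with empty sections instead of failing.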

4.2 UI Faces

Search page: query entities/categories, show totals and pagination.
Category page: SKOS-like navigation (broader/narrower), sample entities, category-focused extensions.
Entity page: abstract, categories, rdf:types, entity-focused extensions (related entities).

4.3 C4 Model

C1 — System Context

Users interact with the SPA in a browser. The SPA communicates with the backend API. The backend queries the DBpedia SPARQL endpoint.

C2 — Container Diagram

React SPA: routing, rendering, extensions, RDFa/JSON-LD, QR sharing.
FastAPI backend: REST API, SPARQL query generation, pagination and response shaping.
DBpedia endpoint: external knowledge base providing RDF via SPARQL.

Figure 1 — C4 (Level 1) System Context. End users interact with the SPA, which calls the backend API; the backend queries the external DBpedia SPARQL endpoint.

Figure 2 — C4 (Level 2) Container Diagram. Main runtime containers and protocols.

Figure 3 — Request/Data Flow (Search). End-to-end steps from user action to rendered results.

4.4 Design Docs

Extension system

The SPA supports a slot-based extension registry. Extensions declare an identifier, a target slot, and a render function. This enables modular functionality such as facets, category map, and related entities.
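The registry idea can be sketched independently of the UI framework. The Python version below is only a language-neutral illustration (the SPA itself is written in React); the class names, slot names, and render signature are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Extension:
    id: str                        # unique identifier, e.g. "type-facets"
    slot: str                      # target slot, e.g. "category.sidebar"
    render: Callable[[dict], str]  # render function: page context -> markup


class ExtensionRegistry:
    def __init__(self) -> None:
        self._by_slot: dict[str, list[Extension]] = {}
        self._ids: set[str] = set()

    def register(self, ext: Extension) -> None:
        if ext.id in self._ids:
            raise ValueError(f"duplicate extension id: {ext.id}")
        self._ids.add(ext.id)
        self._by_slot.setdefault(ext.slot, []).append(ext)

    def render_slot(self, slot: str, context: dict) -> list[str]:
        """Render every extension registered for a slot, in registration order."""
        return [ext.render(context) for ext in self._by_slot.get(slot, [])]


registry = ExtensionRegistry()
registry.register(Extension("type-facets", "category.sidebar",
                            lambda ctx: f"Facets for {ctx['label']}"))
registry.register(Extension("category-map", "category.sidebar",
                            lambda ctx: f"Map around {ctx['label']}"))
registry.register(Extension("related-entities", "entity.sidebar",
                            lambda ctx: f"Related to {ctx['label']}"))
```

A page then asks the registry for a slot (for example registry.render_slot("category.sidebar", {"label": "Physics"})) and does not need to know which extensions are installed.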

Input/output data formats

Client ↔ API: JSON over HTTP (REST endpoints).
API ↔ DBpedia: SPARQL queries, JSON results (application/sparql-results+json).
Client annotations: RDFa in HTML5 + JSON-LD (schema.org).
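To make the client annotation format concrete, the following sketch builds a minimal JSON-LD document of the kind that could be injected into a page inside a script element of type application/ld+json. The generic schema.org type Thing and the chosen properties are assumptions; the application may use more specific types.

```python
import json
from typing import Optional


def entity_jsonld(uri: str, label: str, abstract: Optional[str] = None) -> str:
    """Serialize a minimal schema.org description of a DBpedia resource."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Thing",  # assumption: generic type chosen for illustration
        "@id": uri,
        "name": label,
    }
    if abstract:
        # Omit the key entirely rather than emit an empty description.
        doc["description"] = abstract
    return json.dumps(doc, ensure_ascii=False, indent=2)


print(entity_jsonld("http://dbpedia.org/resource/Physics", "Physics"))
```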

Data/task flow example (Search)

User enters query and submits.
SPA sends GET /api/search with q/kind/limit/offset.
Backend constructs SPARQL query and sends it to DBpedia.
Backend transforms bindings into compact JSON results.
SPA renders results and pagination.
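The transformation step in this flow (bindings to compact JSON) could look roughly like the sketch below. The input follows the standard SPARQL 1.1 JSON results layout; the function name compact_bindings and the rule of skipping rows without a URI are assumptions for illustration.

```python
from typing import Any


def compact_bindings(sparql_json: dict[str, Any], kind: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON bindings into the small records the SPA consumes."""
    rows = []
    for b in sparql_json.get("results", {}).get("bindings", []):
        uri = b.get("uri", {}).get("value")
        if not uri:
            continue  # skip malformed rows instead of failing the whole page
        # Fall back to the URI when a row carries no label.
        label = b.get("label", {}).get("value") or uri
        rows.append({"uri": uri, "label": label, "kind": kind})
    return rows


sample = {
    "head": {"vars": ["uri", "label"]},
    "results": {"bindings": [
        {"uri": {"type": "uri", "value": "http://dbpedia.org/resource/Category:Physics"},
         "label": {"type": "literal", "xml:lang": "en", "value": "Physics"}},
        {"label": {"type": "literal", "value": "row without a URI"}},
    ]},
}
print(compact_bindings(sample, "category"))
```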

5. API Design (REST)

The backend exposes a REST API used by the SPA. Endpoints are stateless and return JSON.

5.1 Endpoint overview

Endpoint	Purpose	Notes
GET /health	Health check	Used for deployment verification
GET /api/search	Search entities/categories	Supports kind, limit, offset; returns totals
GET /api/category/{id}	Category details	Broader/narrower + entity count
GET /api/category/{id}/entities	Entities in category	Paginated list
GET /api/category/{id}/facets/types	Type facets	Top rdf:types (best-effort)
GET /api/entity/{id}	Entity details	Label, abstract, types, categories
GET /api/entity/{id}/related	Related entities	Similarity via shared categories (heuristic)

5.2 Input/output examples

Search request: /api/search?q=physics&kind=all&limit=20&offset=0

Search response (example; the results array is truncated to a single item):

{
  "query": "physics",
  "kind": "all",
  "limit": 20,
  "offset": 0,
  "count": 20,
  "total": 161,
  "entityTotal": 90,
  "categoryTotal": 71,
  "results": [
    { "uri": "http://dbpedia.org/resource/Category:Physics", "label": "Physics", "kind": "category" }
  ]
}
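In this example, total equals entityTotal plus categoryTotal (90 + 71 = 161). A client can derive its pagination controls from total, limit, and offset alone, as in the following sketch; the function name and the returned keys are hypothetical.

```python
import math
from typing import Optional


def page_info(total: int, limit: int, offset: int) -> dict[str, Optional[int]]:
    """Derive pagination facts from the totals in a search response."""
    if limit <= 0 or offset < 0 or total < 0:
        raise ValueError("limit must be positive; offset and total non-negative")
    return {
        "pages": math.ceil(total / limit),
        "current": offset // limit + 1,
        # None signals that there is no further (or earlier) page.
        "next_offset": offset + limit if offset + limit < total else None,
        "prev_offset": max(0, offset - limit) if offset > 0 else None,
    }


print(page_info(total=161, limit=20, offset=0))
```

For the response above this gives 9 pages, with the next page starting at offset 20.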

6. SPARQL Queries of Interest

The backend composes SPARQL queries to implement search, navigation, and extensions. Below are representative examples.

6.1 Search (entities/categories)

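PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:  <http://purl.org/dc/terms/>

# NOTE: pseudocode; the backend uses safe formatting + limits
# ?l is the raw label; SELECT may not re-bind a variable already used in WHERE
SELECT ?uri (SAMPLE(?l) AS ?label)
WHERE {
  # ... entity or category patterns ...
  ?uri rdfs:label ?l .
  FILTER(lang(?l) = "en")
  FILTER(CONTAINS(LCASE(STR(?l)), LCASE("physics")))
}
GROUP BY ?uri
ORDER BY ?uri   # a fixed order keeps LIMIT/OFFSET pages stable
LIMIT 20
OFFSET 0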
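The "safe formatting + limits" mentioned in the note can be approximated as follows. This is a minimal sketch, not the backend's actual code: the function names and the MAX_LIMIT cap are assumptions, and the entity/category patterns are left out just as in the listing above.

```python
MAX_LIMIT = 50  # hypothetical cap, not taken from the project


def sparql_string_literal(text: str) -> str:
    """Quote user text as a SPARQL string literal, escaping special characters."""
    escapes = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    return '"' + "".join(escapes.get(ch, ch) for ch in text) + '"'


def search_query(term: str, limit: int, offset: int) -> str:
    """Build the label search query with an escaped term and clamped paging."""
    needle = sparql_string_literal(term.strip())
    limit = max(1, min(limit, MAX_LIMIT))
    offset = max(0, offset)
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?uri (SAMPLE(?l) AS ?label)
WHERE {{
  ?uri rdfs:label ?l .
  FILTER(lang(?l) = "en")
  FILTER(CONTAINS(LCASE(STR(?l)), LCASE({needle})))
}}
GROUP BY ?uri
ORDER BY ?uri
LIMIT {limit}
OFFSET {offset}"""


print(search_query('quantum "field" theory', limit=1000, offset=-5))
```

Escaping the backslash and the double quote prevents a search term from closing the literal early and injecting extra graph patterns; clamping keeps a single request from asking the public endpoint for an unbounded page.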

6.2 Category membership (dct:subject)

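PREFIX dct: <http://purl.org/dc/terms/>

# Illustrative category; the backend substitutes the requested category URI
SELECT ?entity
WHERE {
  ?entity dct:subject <http://dbpedia.org/resource/Category:Physics> .
}
LIMIT 25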

6.3 Type facets (rdf:type counts)

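PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Illustrative category; the backend substitutes the requested category URI
SELECT ?type (COUNT(DISTINCT ?e) AS ?count)
WHERE {
  ?e dct:subject <http://dbpedia.org/resource/Category:Physics> .
  ?e rdf:type ?type .
}
GROUP BY ?type
ORDER BY DESC(?count)
LIMIT 15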

Pragmatic note: Some DBpedia categories yield sparse or noisy results due to data modeling; the UI and extensions treat these as best-effort.
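Because the category URI in the membership and facet queries comes from the request path, it has to be validated before it is placed between angle brackets. The sketch below shows one way to do that for the type-facet query; the helper names, the restriction to the http://dbpedia.org/resource/ prefix, and the upper bound of 50 are assumptions.

```python
import re

# Characters that the SPARQL grammar does not allow inside an IRI reference <...>.
_FORBIDDEN_IN_IRI = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def iri_ref(uri: str) -> str:
    """Wrap a URI in angle brackets, rejecting input that could break out of them."""
    if not uri.startswith("http://dbpedia.org/resource/") or _FORBIDDEN_IN_IRI.search(uri):
        raise ValueError(f"not an acceptable DBpedia resource URI: {uri!r}")
    return f"<{uri}>"


def type_facets_query(category_uri: str, top_n: int = 15) -> str:
    """Build the rdf:type aggregation query for one category."""
    category = iri_ref(category_uri)
    top_n = max(1, min(int(top_n), 50))  # hypothetical upper bound
    return f"""PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?type (COUNT(DISTINCT ?e) AS ?count)
WHERE {{
  ?e dct:subject {category} .
  ?e rdf:type ?type .
}}
GROUP BY ?type
ORDER BY DESC(?count)
LIMIT {top_n}"""


print(type_facets_query("http://dbpedia.org/resource/Category:Physics"))
```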

7. Extensions (3+)

7.1 Extension 1 — Type Facets (Intelligent filtering)

Displays the most common rdf:type values among entities in the selected category. Users can select types and apply a filter. This demonstrates intelligent filtering driven by SPARQL aggregation.
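The filtering step can be illustrated with a small in-memory sketch. Whether the application matches any or all of the selected types, and whether it filters in the client or in SPARQL, is not stated here, so both the match_all switch and the record shape are assumptions.

```python
from typing import Iterable


def filter_by_types(entities: list[dict], selected: Iterable[str],
                    match_all: bool = False) -> list[dict]:
    """Keep entities whose rdf:type values match the selected facet values."""
    wanted = set(selected)
    if not wanted:
        return list(entities)  # no facet selected: nothing is filtered out

    def keep(entity: dict) -> bool:
        types = set(entity.get("types") or [])
        # match_all requires every selected type; otherwise one is enough.
        return wanted <= types if match_all else bool(wanted & types)

    return [e for e in entities if keep(e)]


entities = [
    {"uri": "A", "types": ["dbo:Scientist", "dbo:Person"]},
    {"uri": "B", "types": ["dbo:Person"]},
    {"uri": "C"},  # entity without rdf:type assertions
]
print([e["uri"] for e in filter_by_types(entities, ["dbo:Scientist"])])
```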

7.2 Extension 2 — Category Map (Layered visualization)

Visualizes a local neighborhood of the current category (broader and narrower links), enabling quick contextual navigation. This demonstrates a layered visualization for hierarchical knowledge structures.
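One possible data shape for such a map is a small node/edge structure with a layer per node: -1 for broader categories, 0 for the current category, and 1 for narrower ones. The sketch below builds it from the broader/narrower lists; the function name and output keys are hypothetical.

```python
def category_neighborhood(center: str, broader: list[str], narrower: list[str]) -> dict:
    """Build a layered local graph around one category."""
    layers = {center: 0}
    for uri in broader:
        layers.setdefault(uri, -1)  # first assignment wins if a URI repeats
    for uri in narrower:
        layers.setdefault(uri, 1)

    # Every edge points from the narrower concept to the broader one.
    edges = [{"source": center, "target": b, "rel": "skos:broader"}
             for b in dict.fromkeys(broader) if b != center]
    edges += [{"source": n, "target": center, "rel": "skos:broader"}
              for n in dict.fromkeys(narrower) if n != center]

    nodes = [{"id": uri, "layer": layer} for uri, layer in layers.items()]
    return {"nodes": nodes, "edges": edges}


graph = category_neighborhood(
    "Physics", ["Natural_sciences"], ["Mechanics", "Optics", "Mechanics"])
print(len(graph["nodes"]), len(graph["edges"]))
```

Duplicates in the input lists are collapsed, so a category that is returned twice by the endpoint still appears once in the map.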

7.3 Extension 3 — Related Entities (Comparison/similarity)

Suggests entities related to the current entity by computing overlap of categories. This provides a lightweight similarity function for exploration.
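A simple form of this overlap heuristic counts, for every other entity, how many categories it shares with the current one. The sketch below works on data already fetched from the endpoint; the exact scoring used by the application is not specified, so the shared-count ranking and the function name are assumptions.

```python
from collections import Counter


def rank_related(entity_uri: str, categories: list[str],
                 members_by_category: dict[str, list[str]],
                 top_n: int = 10) -> list[tuple[str, int]]:
    """Rank other entities by the number of categories shared with entity_uri."""
    shared: Counter = Counter()
    for cat in categories:
        # set() guards against duplicate members within one category.
        for other in set(members_by_category.get(cat, [])):
            if other != entity_uri:
                shared[other] += 1
    # Highest overlap first; ties are broken by URI for a stable order.
    return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


members = {
    "Cat:A": ["e1", "e2", "e3"],
    "Cat:B": ["e1", "e2"],
    "Cat:C": ["e4"],
}
print(rank_related("e1", ["Cat:A", "Cat:B"], members))
```

Here e2 shares two categories with e1 and e3 shares one, so e2 is ranked first.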

8. Use of DBpedia SPARQL, SKOS, and OWL

This project relies on the public DBpedia SPARQL endpoint as its primary knowledge source and reuses existing Semantic Web vocabularies rather than defining a new ontology. The goal is to demonstrate pragmatic reuse of established knowledge models in a real Web application.

8.1 DBpedia SPARQL endpoint

All knowledge presented in the application is retrieved dynamically from DBpedia via SPARQL queries. The backend acts as a controlled intermediary that:

translates user-driven tasks into SPARQL queries,
applies limits and pagination to protect the public endpoint,
converts SPARQL JSON bindings into compact JSON structures for the SPA.

Typical SPARQL patterns used include:

dct:subject for category membership,
rdfs:label for human-readable labels,
rdf:type for type-based aggregation and filtering.

8.2 Use of SKOS for category navigation

DBpedia categories are treated as concept-like resources and exposed in the UI following SKOS (Simple Knowledge Organization System) principles.

Although DBpedia categories are not explicitly typed as skos:Concept in all cases, they naturally form a hierarchical knowledge organization that aligns well with SKOS semantics.

In the application:

category pages are rendered as SKOS-like concepts,
broader and narrower relationships are visualized as hierarchical navigation,
RDFa annotations use skos:broader and skos:narrower to express structure.

This enables both human-friendly exploration and machine-readable interpretation of category hierarchies.

Official SKOS documentation: W3C SKOS Reference.

8.3 Use of OWL and RDF types

OWL and RDF are used primarily to expose and aggregate semantic typing information for entities. DBpedia entities are associated with multiple rdf:type assertions, typically drawn from the DBpedia Ontology (dbo: namespace).

In the application, rdf:type information is used to:

display semantic types on entity pages,
compute type-based facets for categories,
support filtering and comparison operations.

While full OWL reasoning is outside the scope of the project, the reuse of OWL-based vocabularies demonstrates how lightweight semantic typing can enhance exploratory interfaces.

9. Linked Data Principles

Use URIs as names for things: DBpedia resource URIs are exposed and clickable.
Use HTTP URIs: All entities and categories use HTTP URIs.
Provide useful information via standards: Data is retrieved via SPARQL; pages include RDFa and JSON-LD.
Include links to other URIs: Category pages link to broader/narrower categories and entities; entity pages link to categories.

10. Deployment

The system is deployed as Docker containers (SPA + API) orchestrated with Docker Compose on Google Compute Engine. Images are published to GitHub Container Registry (GHCR) and pulled by the VM.

10.1 Deployment steps (high level)

Build and push images to GHCR (private).
Provision a GCE VM and install Docker + Compose.
Authenticate the VM to GHCR using a token.
Run docker compose pull and docker compose up -d.

Verify deployment via /health and by interacting with the SPA.

11. User Guide & Case Studies

This section provides a user-oriented guide to the DBpedia Explorer (WADe) application. The goal is to demonstrate how end users can explore, understand, and compare large knowledge collections without prior expertise in Semantic Web technologies. The guide is structured around three concrete case studies, each corresponding to a common exploration task.

Case Study 1 — Exploratory Search and Result Navigation

User goal. Obtain an overview of entities and categories related to a general topic and navigate large result sets efficiently.

Context. Large knowledge bases such as DBpedia contain thousands of resources for common topics. Presenting all results at once would overwhelm users. The search interface therefore focuses on progressive disclosure, pagination, and clear visual differentiation between result types.

Steps.

The user opens the landing page of the application.
The user enters a query (e.g., physics) and explicitly initiates the search.
The system displays a limited number of results per page, visually distinguishing entities from categories.
The user navigates between pages to explore additional results.

Search results for the query "physics", showing both categories and entities, along with pagination controls.

Outcome. The user gains an immediate high-level understanding of the knowledge space related to the query and can decide which results are worth deeper exploration. Pagination and result counts support orientation and prevent cognitive overload.

Case Study 2 — Category Exploration with Hierarchy and Faceted Filtering

User goal. Explore the internal structure of a knowledge domain and narrow down large result sets using meaningful semantic criteria.

Context. Categories in DBpedia act as organizational hubs. However, they may contain thousands of heterogeneous entities. This case study demonstrates how hierarchical navigation and faceted filtering support sense-making.

Steps.

The user opens a category page from the search results (e.g., a physics-related category).
The system presents broader and narrower categories, enabling hierarchical navigation.
The user inspects the Type Facets extension to see the most common rdf:type values among entities.
The user selects one or more types and applies the filter to update the entity list.
The user optionally uses the Category Map extension to visualize the local category neighborhood.

Category page showing hierarchical navigation (broader/narrower) and the Type Facets extension used for intelligent filtering.

Outcome. The user can progressively refine the explored knowledge domain, moving from a broad category to a focused subset of entities. Faceted filtering and layered visualization reduce complexity while preserving semantic meaning.

Case Study 3 — Entity Inspection and Discovery of Related Knowledge

User goal. Understand an individual entity in context and discover related entities based on shared semantic characteristics.

Context. Individual entities in DBpedia often have rich semantic links to categories and other entities. Presenting this information in a structured and visual manner supports comparison and deeper understanding.

Steps.

The user opens an entity page from a category or search result.
The system displays the entity’s label, abstract description, categories, and rdf:type assertions.
The user inspects the Related Entities extension to discover semantically similar entities.
The user shares the entity page using the generated URL or QR code.

Entity page presenting descriptive information, semantic categories, and rdf:type assertions.

Outcome. The user develops a contextual understanding of the entity and can easily transition to related knowledge. Semantic relationships are exposed in a human-readable form, while sharing features support reuse beyond the application.

12. Conclusion

DBpedia Explorer (WADe) demonstrates how a modular SPA can expose Semantic Web datasets through human-friendly exploration and reusable extensions. The system is service-oriented, uses SPARQL and RDF-based knowledge models, provides machine-readable annotations, and is deployed using modern Web engineering practices.