#rdf
2020-09-23
simongray 08:09:13

Hey guys. I’m evaluating Apache Jena (through the Aristotle library) and Neo4j (using the neosemantics RDF plugin) as triplestores for an RDF dataset (wordnets). Ideally, I would have used something even more Datomic-like, but I think that requires investing a significant amount of time writing RDF boilerplate. I find Jena/Aristotle interesting, but I’m enticed by Neo4j offering a bunch of graph-based visualisations. However, Neo4j locks various features behind the enterprise edition and has been removing features from the community edition over time, which makes me worried that I’m backing the wrong horse. Another gripe I have with Neo4j is that it seems to run out of memory and crash on certain queries. Do you have any experience with either?

rickmoynihan 22:09:35

Jena or RDF4J are both safe bets, in that they’re active and will be around for a long time to come. All graph databases in my experience are pretty memory hungry, and some queries will inevitably risk that. I can’t say much about visualisation features for graph data, other than that I’ve rarely found them useful in practice. Graphviz layouts quickly become too big to be useful, and force-directed graph visualisations rarely seem useful to me either.

rickmoynihan 22:09:35

I think genuinely useful visualisations tend to be heavily curated.

rickmoynihan 22:09:20

But I guess it depends on what you’re doing.

simongray 11:09:20

It’s really all about delivering features in very little development time and keeping the system relatively open for potential future development efforts. The datasets are a couple of wordnets (that will eventually be linked) and the users of the system are academics with soft technical skills. Since this needs to be done purely within the limited budget of a single academic project, any kind of generic user interface will be appreciated, in addition to whatever specialised interface I actually have time to develop. Unfortunately, I have a bunch of different requirements to live up to - including periodic synchronisation with an ancient SQL database - and only a few months to do it.

EmmanuelOga 01:10:40

for visualization there's a Gephi SPARQL plugin: https://github.com/gephi/gephi/wiki/SemanticWebImport

EmmanuelOga 01:10:23

if you already have RDF as source, you would just have to write some SPARQL queries
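
As a sketch of that (RDF4J-flavoured; the endpoint URL and the wn:hypernym property below are placeholder assumptions, not something from this thread), a CONSTRUCT query returns triples rather than variable bindings, which is the shape a graph visualization tool wants:

import org.eclipse.rdf4j.model.Model
import org.eclipse.rdf4j.query.QueryResults
import org.eclipse.rdf4j.repository.sparql.SPARQLRepository

// CONSTRUCT yields a set of triples, i.e. a subgraph, rather than a result
// table. The prefix and property here are placeholders; substitute your own.
val hypernymQuery = """
    PREFIX wn: <https://globalwordnet.github.io/schemas/wn#>
    CONSTRUCT { ?a wn:hypernym ?b }
    WHERE     { ?a wn:hypernym ?b }
    LIMIT 1000
""".trimIndent()

// Fetch the subgraph from a remote SPARQL endpoint as an RDF4J Model.
fun fetchSubgraph(endpoint: String): Model =
    SPARQLRepository(endpoint).connection.use { conn ->
        QueryResults.asModel(conn.prepareGraphQuery(hypernymQuery).evaluate())
    }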

EmmanuelOga 01:10:02

I've been playing with RDF4J quite a bit; it has a pretty nice API. Here's a breadth-first search I implemented recently:

import org.eclipse.rdf4j.model.Resource
import org.eclipse.rdf4j.model.impl.LinkedHashModel

// Breadth-first search from `start`, collecting the statements it touches
// into a Model. `sparqlDb` is the surrounding class's RDF4J Repository.
fun bfs(start: Resource, maxDepth: Int = 5, statementLimit: Int = 1024) =
        LinkedHashModel().also { model ->
            sparqlDb!!.connection.use { conn ->
                val queue = ArrayDeque(listOf(Pair(start, 0)))
                val seen = mutableSetOf<Resource>()

                // The limit is checked once per node, so the model can
                // slightly exceed statementLimit.
                while (queue.isNotEmpty() && model.size < statementLimit) {
                    val (next, depth) = queue.removeFirst()

                    if (seen.add(next)) { // add() returns false if already visited
                        for (st in conn.getStatements(next, null, null)) {
                            // Only IRIs/bnodes can be expanded further;
                            // literals are leaves.
                            val obj = st.`object`
                            if (obj is Resource && depth < maxDepth) queue.add(Pair(obj, depth + 1))
                            model.add(st)
                        }
                    }
                }
            }
        }

EmmanuelOga 01:10:18

Kotlin, but you get the idea
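
For reference, a minimal usage sketch (the repository handle and the synset IRI are assumptions), serializing the collected neighbourhood as Turtle with RDF4J's Rio:

import org.eclipse.rdf4j.rio.RDFFormat
import org.eclipse.rdf4j.rio.Rio

// Everything within two hops of a (made-up) starting synset, printed as Turtle.
val start = sparqlDb!!.valueFactory.createIRI("http://example.org/synset/dog-n-1")
Rio.write(bfs(start, maxDepth = 2), System.out, RDFFormat.TURTLE)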

EmmanuelOga 01:10:26

I've also been eyeing https://github.com/jgrapht/jgrapht. It has a collection of graph algorithms, visualizations, and serialization to Graphviz and other formats that you could load into Gephi, etc.
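
A rough sketch of gluing the two together (the Model-to-JGraphT conversion is my own assumption, not an API jgrapht ships; DOTExporter lives in the jgrapht-io artifact):

import java.io.StringWriter
import org.eclipse.rdf4j.model.Model
import org.eclipse.rdf4j.model.Resource
import org.jgrapht.graph.DefaultEdge
import org.jgrapht.graph.DirectedPseudograph
import org.jgrapht.nio.dot.DOTExporter

// A pseudograph allows parallel edges and self-loops, both of which occur
// in RDF. Predicates and literal values are dropped in this sketch.
fun toJGraphT(model: Model): DirectedPseudograph<String, DefaultEdge> {
    val graph = DirectedPseudograph<String, DefaultEdge>(DefaultEdge::class.java)
    for (st in model) {
        val obj = st.`object`
        if (obj is Resource) {
            graph.addVertex(st.subject.stringValue())
            graph.addVertex(obj.stringValue())
            graph.addEdge(st.subject.stringValue(), obj.stringValue())
        }
    }
    return graph
}

// The default DOTExporter numbers the vertices; configure a vertex attribute
// provider if you want the IRIs shown as labels.
fun toDot(model: Model): String =
    StringWriter().also { DOTExporter<String, DefaultEdge>().exportGraph(toJGraphT(model), it) }
        .toString()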