#rdf
2022-12-02
Takis_02:12:04

Hello. As far as I understand, RDF doesn't use an object's internal structure, so there is no schema, and modelling is based on a vocabulary, like English words. It seems natural and simple, but why isn't it widely used? Is it the performance cost of the many joins? Or does RDF add complexity that makes modelling and querying hard?

quoll05:12:43

There's a long history, and many reasons 🙂

quoll05:12:01

To start with, when it was first released, the only formal serialization of RDF was RDF/XML. Consequently, a lot of people thought it was just a type of XML, and a complex one at that. They'd walk away from it before they understood it.

ā˜ļø 2
quoll05:12:56

N3 was around, but it wasn't an official spec. Eventually Turtle came along and addressed some of those problems, but by then a lot of people were already against RDF.

quoll05:12:14

Since then, what I've observed is that many people have never been exposed to graph models, and don't understand how to use graph query languages (it doesn't help that Facebook/Meta released "GraphQL", which is not a general-purpose graph query language)

quoll05:12:29

Not being exposed to it, people aren't interested in learning it. They're very resistant. I encounter a lot of this. Why? Well, it's just because they haven't heard of it, and it doesn't conform to what they know, and why invest all that time and effort into something that doesn't have a lot of market penetration?

quoll05:12:56

They also can't see the immediate benefits. I have been mapping my organization's RDBMS data into a graph, and after taking initial steps, I sat down with the data architect of the original system. I showed him everything I'd done. His response was, "What does this offer that we don't already do?" I tried to explain that it was simpler, and that the graph database was optimized for walking across the data, but he didn't see any benefits at all, and dismissed me.
A few weeks later, I was working with a group who were talking about how they could cache transitive closure data that comes out of the database, and how best to do it. I was confused and asked them why they would not just query for it. They challenged me: "Well, how long would that take?" I felt a bit defensive, and said that it was fast. (My experience on my notebook had been around 800 ms, but if I asked for the entire dataset, then I'd seen it take up to 2 seconds.)
"How fast is fast, though?"
"Ummmm, well, it's taken up to..."
"Is it less than an hour?"
"A... AN HOUR?!??"
"Yes, is it less than that? The existing system takes longer."
"Ummm, it's usually less than a second."

ā¤ļø 2
quoll05:12:45

So even though the benefit here was HUGE, people just weren't looking for it, and didn't understand it.

Bart Kleijngeld10:12:11

I recognize a lot in this story. Our senior data architect chose RDF for modeling our conceptual information models because it's easy to merge and split distributed models across the company, and it's crucial to be able to rely on globally unique IRIs as identifiers for resources. But like @takis_ says, it's also much simpler. When modeling data, it's a breeze to be able to operate under the Open World Assumption, and to simply state what I want to model, such as subproperties or inverse relations. Also, it's easy to append facts that extend existing standards or models without changing those models at all. I'm completely sold, but many people see only the challenges and fail to see many of the benefits. Now, to be fair, it can be quite an investment: finding people, changing technology stacks and processes. But I believe that the problems you'll solve will ultimately make up for that.

Takis_11:12:06

Thank you for the information and the experience. I will try it and see; if it's simple and useful, I think it will be adopted sooner or later.

Mattias12:12:18

Side note - a schema and SQL are a bit like strict types: a guard rail that many feel uncomfortable without. I think trying it out for a while is the only way, but many never take the time.

šŸ‘ 1
rickmoynihan12:12:40

@takis_: I agree with the above answers; RDF/XML in particular has a lot to answer for! :rolling_on_the_floor_laughing: However, I also have another set of reasons… The main one is that I think there are significant differences between RDF, the semantic web, and linked data. All are of course highly related ideas, but they're also separate, independent ideas. However, most people, even practitioners in the field, think they're the same thing, i.e. that they're all just synonyms; and they're really not. Basically, in the early days you had Timbl proselytising a vision for data on the web. Data on the web is then naturally a graph; so you have graph data, and URIs are identifiers, etc., so you basically get RDF; and if you put it on the web at the right places, such that it dereferences, you get linked data. There was also a lot of hand-wavy talk of agents using that data, e.g. software agents booking you a holiday… Then throw into the mix the convergence of academics who had been doing research into description logics and KR, coming out of the AI winter and looking for a new bandwagon to tie their interests to. I believe they also wanted a common DL that they could all use, so every research group didn't need to invent their own for their research projects. This effort became OWL, and they tied it to, and collectively sold it as, the semantic web vision; and Timbl etc. bought into it, as it gave them some formal rigour, a wider community, etc. However, that community of early adopters was heavily on the academic side; and academics are interested in papers, not practical "trivial problems" (which often turn out to be significant engineering challenges).
Anyway, the Semantic Web had a lot of hype from this group, but it was heavily based on logic programming… Now, logic programming is very cool, but it's not suited to every problem; for example, see the failure of Japan's 5th generation computing project (https://en.wikipedia.org/wiki/Fifth_Generation_Computer_Systems). The semantic web had many of these same problems, but also added distributed-systems problems etc., so they made a hard problem even harder 🙂 Anyway, in my view the problem was that the "Semantic Web" vision was never entirely realistic… it's simply not suited to everyone, in the same sort of way the web is. So basically it under-delivered on its promises, and much of the work was overly academic and complicated. Also, in the standardisation processes back then there was a lot of design by committee, without having proven parts of the value proposition. However, we're still left with lots of really good stuff; but sadly, because the technologies and their pros and cons were heavily conflated, people don't see the good bits… For example, a lot of the value in RDF is in having an extensible, accretive data model. That is arguably a much bigger benefit for far more people than OWL will ever be; but very few people talk about it, so it gets ignored. IMHO there's a huge amount of unexploited low-hanging fruit in the worlds of RDF and Linked Data; it just needs to be packaged and marketed appropriately.

šŸ‘ 2
1
rickmoynihan12:12:57

There are other dimensions to this too… For example, triple stores have different performance properties to relational databases, and there are fewer options available, so you might not find one that fits your needs precisely, even though such a thing could exist.

Takis_14:12:46

Thank you, Rick :) I like all databases actually; sometime I will try graph databases too.

Takis_15:12:23

I think the important thing is to see the programming style. For example, with tables you program in a relational way (tables with one value per column); with documents you have arrays and hash-maps, so you program in a more functional way; with graphs you program in a logic-programming way. I think what matters is deciding which programming style you like, instead of asking which is best.

rickmoynihan16:12:06

I think there's obviously a lot of truth to this; it's undeniably true that the data model massively affects how you put data into and out of a system, and in turn how you model it, which has wider implications for what is relevant, what is simple, what is hard, etc. Though I'm not sure I'd characterise the differences in terms of programming paradigms as you have.

Mattias15:12:29

@rickmoynihan Thanks for a fantastic history briefing! Much appreciated 😀

rickmoynihan16:12:02

You're welcome. It's a very rough approximation of the history, as I understand it. It's certainly not entirely accurate, but I think it does offer an explanation of the current state, and of why things aren't more widely adopted. Another perspective is to view it as a continuation of two earlier AI communities, the neats and the scruffies: https://en.wikipedia.org/wiki/Neats_and_scruffies

✅ 1
respatialized15:12:17

I think we might be nearing a comeback for KR and some of the more "old fashioned" AI stuff now that deep learning is hitting the limits of what it can accomplish without folding in other approaches, which may mean a similar resurgence in RDF, knowledge graphs, and the like. I think it's useful to compare and contrast two projects that Meta's AI team released recently for an illustration of why. The first is https://arstechnica.com/information-technology/2022/11/after-controversy-meta-pulls-demo-of-ai-model-that-writes-scientific-papers, a supposedly "scientific AI" that was so adept at producing pseudoscience that they took it offline the week its public demo was released. That's because it was just another stochastic parrot (https://dl.acm.org/doi/10.1145/3442188.3445922) doing probabilistic copy-and-paste, albeit one trained exclusively on scientific literature. It had no direct representation of truth, which to me is about as basic a requirement for science as you can possibly imagine. The second is https://arstechnica.com/information-technology/2022/11/meta-researchers-create-ai-that-masters-diplomacy-tricking-human-players/amp/, an AI able to compete with human players at Diplomacy, a strategy game that involves bluffing, subterfuge, and anticipating the plans and actions of other players - so not just modeling the truth, but also what others believe to be the truth. According to Ernest Davis, the system required to make Cicero an effective player is substantially more complex than the "just throw more data and 60,000 TPUs at a specific neural network architecture" approach that has grabbed headlines recently.
He writes (https://garymarcus.substack.com/p/what-does-meta-ais-diplomacy-winning):
> Strikingly, and in opposition to much of the Zeitgeist, Cicero relies quite heavily on hand-crafting, both in the data sets, and in the architecture; in this sense it is in many ways more reminiscent of classical "Good Old Fashioned AI" than deep learning systems that tend to be less structured, and less customized to particular problems. There is far more innateness here than we have typically seen in recent AI systems.
I think graph data and RDF are much more relevant to the second approach than the first, so there may be a lot of unexplored territory in "good old fashioned AI" now that computers are substantially faster than when the AI winter took hold.

rickmoynihan16:12:24

@afoltzm: I certainly think there's a role to play for GOFAI and KR in the future of AI, and I share your concerns about purely stochastic approaches. In fact, arguably there are vanishingly few purely stochastic systems; for example, an often-neglected fact is the role of techniques like minimax in AlphaGo's design. AlphaGo didn't learn minimax; it was engineered in by people. Then, even if stochastic processes do win out, it's hard to imagine them not essentially learning things like FOPL and set theory etc. and then having to use them. Cicero is interesting; thanks for sharing, I'd not seen it before. About 20 years ago I worked at a now-defunct startup working with multi-agent systems, so there was lots of GOFAI. We used a defeasible reasoning engine we (by which I mean our CTO) had developed, and had an agent communication language with semantics based on mutual belief… and agents could then reason about what they believed other agents believed; and yes, you could also get arbitrarily meta… reasoning about "if they believe I believe X, but I believe Y, then Z", etc. It was all super cool stuff, but something perhaps better explored in academia than in industry 🙂

respatialized16:12:27

Indeed. I was also thinking about AlphaGo when writing my comment - specifically that it was designed to use Monte Carlo tree search (it certainly didn't "learn" search on its own!)

respatialized16:12:19

You should subscribe to Gary Marcus's Substack; he has a similar affinity for GOFAI and it's a useful way to keep up with research developments (without the breathless tech press hype) through his commentary.

šŸ‘ 1
quoll17:12:12

Defeasible logic is interesting, but I think it's too many steps ahead of where industry is. Like:
• Step 1 - low-hanging fruit. E.g. just representing data in a graph can make it easier to navigate, expand upon, etc.
• Step 2 - description of the graphed data, in schema and ontology. This documents the structure, and can automate some of the connections.
• Step 3 - reasoning. This can be rules, which are easy to implement, or it can be based on description logics. This is the promise of OWL.
• Step 4 - advanced reasoning. Defeasible logic is here.
Each step is a reasonable effort beyond the previous one, and from what I've seen, the effort is totally worth it. What does NOT work is trying to skip ahead from where an organization is. The technology might work, but I haven't yet seen a benefit from bringing it into production in a business.

quoll17:12:43

I've seen organizations gaining benefit all the way out at step 3

rickmoynihan13:12:10

Yeah, I agree with all of that @U051N6TTC. I think there's even more value at the low-hanging-fruit stage though; for example, property-oriented thinking; using URIs (or just namespaced) global identifiers; just having extensible data, etc…