1. Connected Data London / 16 Nov 17. RDF vs Property Graph: Is your Graph Semantic? Dr. Jesús Barrasa, Neo4j
2. Quick refresher on LPG (Labeled Property Graph) and RDF
3. RDF: Resource Description Framework. A W3C Recommendation: a model for information exchange (February 1999).
4. The Semantic Web: "RDF is a key tech for developing the Semantic Web". Scientific American article, 2001 (1).
(1) https://www.scientificamerican.com/article/the-semantic-web/
5. Persisted RDF: specialized RDF stores (triple/quad stores); semantic graph databases.
6. LPG (Labeled Property Graph): Sweden (2000-2007). Efficient (graph-native) storage; fast query and traversal; a humane model, close to the way we humans understand and reason about the world.
8. A look at the models
9. GRAPH = VERTICES + EDGES
10. RDF statements (triples): ppl://ann is a person. ppl://ann user ID is @ann. ppl://ann name is Ann Smith. ppl://dan likes ppl://ann.
15. [Diagram: the RDF statements drawn as a graph. Vertices: ppl://ann, ppl://dan, @ann, Ann Smith, _:Person. Edges: ppl://ann is_a _:Person; ppl://ann user_ID @ann; ppl://ann name Ann Smith; ppl://dan likes ppl://ann.]
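The four statements can be sketched as a plain Python set of (subject, predicate, object) tuples. This is a minimal illustration, not an RDF library: IRIs, literals and the blank node `_:Person` are all just strings here. It shows how the vertices fall out of the statements (every subject and object) and how each statement becomes one predicate-labeled edge.

```python
# Minimal sketch: an RDF graph as a set of (subject, predicate, object) tuples.
# Strings stand in for IRIs, literals and the blank node; no RDF library is used.
Triple = tuple[str, str, str]

triples: set[Triple] = {
    ("ppl://ann", "is_a", "_:Person"),
    ("ppl://ann", "user_ID", "@ann"),
    ("ppl://ann", "name", "Ann Smith"),
    ("ppl://dan", "likes", "ppl://ann"),
}

def vertices(graph: set[Triple]) -> set[str]:
    """Every statement contributes its subject and its object as vertices."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}

def edges(graph: set[Triple]) -> list[Triple]:
    """Every statement is one edge, labeled by its predicate."""
    return sorted(graph)

print(sorted(vertices(triples)))  # ['@ann', 'Ann Smith', '_:Person', 'ppl://ann', 'ppl://dan']
print(len(edges(triples)))        # 4
```

Note that the literal "Ann Smith" is a vertex in its own right, exactly like the resource ppl://dan.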
16. LPG: connected objects (with properties). There is a person described by her name, Ann; her user ID, @ann; and a globally unique identifier, <ppl://ann>. There is another person with the unique identifier <ppl://dan>. Dan likes Ann.
20. [Diagram: the same description drawn as a property graph. Vertices: two :Person nodes, { name: Ann, app_user_ID: @ann, uri: ppl://ann } and { uri: ppl://dan }. Edge: a LIKES relationship from Dan to Ann with properties { date: 02/03/17 }.]
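For comparison, the property-graph version can be sketched with two small dataclasses. The `Node` and `Relationship` classes and the id counter are hypothetical, not Neo4j's API; they only illustrate that both vertices and edges carry a unique id plus a set of key-value pairs.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical in-memory labeled property graph: nodes and relationships
# each get a unique id plus a dict of key-value properties.
_ids = count(1)

@dataclass
class Node:
    labels: set[str]
    props: dict[str, object]
    id: int = field(default_factory=lambda: next(_ids))

@dataclass
class Relationship:
    type: str
    start: Node
    end: Node
    props: dict[str, object]
    id: int = field(default_factory=lambda: next(_ids))

ann = Node({"Person"}, {"name": "Ann", "app_user_ID": "@ann", "uri": "ppl://ann"})
dan = Node({"Person"}, {"uri": "ppl://dan"})
likes = Relationship("LIKES", dan, ann, {"date": "02/03/17"})

print(likes.start.props["uri"], likes.type, likes.end.props["name"], likes.props["date"])
# ppl://dan LIKES Ann 02/03/17
```

Unlike the RDF sketch, "Ann" is not a vertex here: it lives inside the node as a property.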
22. RDF graph vs property graph
RDF graph. Vertices: every statement contributes two vertices (its subject and its object). Some are uniquely identified by URIs (resources); others are property values (literals). Edges: every statement produces an edge, labeled with a URI (its predicate). Vertices and edges have NO internal structure.
Property graph. Vertices: unique ID + set of key-value pairs. Edges: unique ID + set of key-value pairs. Vertices and edges have internal structure.
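The structural difference summarized above can be sketched as a naive mapping from triples to a property graph: a triple whose object is a literal becomes a key-value pair on the subject node, and a triple whose object is a resource becomes a relationship. This is a simplified, hypothetical illustration (the `Literal` wrapper and `to_property_graph` are made up here), not how any particular RDF importer works.

```python
from dataclasses import dataclass

# Hypothetical marker for literal values; plain strings are resources (IRIs).
@dataclass(frozen=True)
class Literal:
    value: str

triples = [
    ("ppl://ann", "name", Literal("Ann Smith")),
    ("ppl://ann", "user_ID", Literal("@ann")),
    ("ppl://dan", "likes", "ppl://ann"),
]

def to_property_graph(triples):
    nodes: dict[str, dict[str, str]] = {}
    rels: list[tuple[str, str, str]] = []
    for s, p, o in triples:
        node = nodes.setdefault(s, {"uri": s})
        if isinstance(o, Literal):
            node[p] = o.value            # literal -> key-value pair on the node
        else:
            nodes.setdefault(o, {"uri": o})
            rels.append((s, p, o))       # resource -> relationship between nodes
    return nodes, rels

nodes, rels = to_property_graph(triples)
print(nodes["ppl://ann"])  # {'uri': 'ppl://ann', 'name': 'Ann Smith', 'user_ID': '@ann'}
print(rels)                # [('ppl://dan', 'likes', 'ppl://ann')]
```

In this sketch a second literal for the same predicate would overwrite the first, which is the multivalued-property difference (#3) discussed later.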
23. The query languages
24. Query: Who likes this person named Ann?
Cypher:
    MATCH (who)-[:LIKES]->(a:Person)
    WHERE a.name CONTAINS 'Ann'
    RETURN who
SPARQL:
    prefix ms: <http://myschma.me/>
    prefix rdf: <http://www[...]#>
    SELECT ?who {
      ?a rdf:type ms:Person .
      ?a ms:name ?asName .
      FILTER regex(?asName, 'Ann')
      ?who ms:likes ?a .
    }
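To see what both queries compute, here is a toy evaluation of the same pattern over a set of triples, one step per SPARQL clause. It is a hand-written sketch, not a SPARQL engine; prefixed names such as `ms:Person` are plain strings, and the sample data is illustrative.

```python
import re

# Toy data; prefixed names are just strings here.
triples = {
    ("ppl://ann", "rdf:type", "ms:Person"),
    ("ppl://ann", "ms:name", "Ann Smith"),
    ("ppl://dan", "rdf:type", "ms:Person"),
    ("ppl://dan", "ms:likes", "ppl://ann"),
}

def who_likes(triples, pattern: str) -> set[str]:
    # ?a rdf:type ms:Person .
    persons = {s for s, p, o in triples if p == "rdf:type" and o == "ms:Person"}
    # ?a ms:name ?asName . FILTER regex(?asName, pattern)
    matches = {s for s, p, o in triples
               if p == "ms:name" and s in persons and re.search(pattern, o)}
    # ?who ms:likes ?a .
    return {s for s, p, o in triples if p == "ms:likes" and o in matches}

print(who_likes(triples, "Ann"))  # {'ppl://dan'}
```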
25. End of refresher
26. Modelling differences
27. #1 RDF does not uniquely identify instances of relationships: there cannot be two relationships of the same type between the same pair of nodes. Dan cannot like Ann three times. Just once :(
28. RDF: Dan has liked Ann three times
    prefix sc: <http://schema.org/>
    INSERT DATA {
      <http://dan> sc:name "Dan" .
      <http://ann> sc:name "Ann" .
      <http://dan> sc:likes <http://ann> .
      <http://dan> sc:likes <http://ann> .
      <http://dan> sc:likes <http://ann> .
    }

    PREFIX sc: <http://schema.org/>
    SELECT (COUNT(?p) AS ?count) WHERE {
      <http://dan> ?p <http://ann>
      FILTER(?p = sc:likes)
    }
    ╒═════╕
    │count│
    ╞═════╡
    │1    │
    └─────┘
29. LPG in Neo4j: Dan has liked Ann three times
    CREATE (d {name: "Dan"})-[:LIKES]->(a {name: "Ann"})
    CREATE (d)-[:LIKES]->(a)
    CREATE (d)-[:LIKES]->(a)

    MATCH (d {name: "Dan"})-[l:LIKES]->(a {name: "Ann"})
    RETURN COUNT(l)
    ╒════════╕
    │COUNT(l)│
    ╞════════╡
    │3       │
    └────────┘
[Diagram: node { name: Dan } with three LIKES relationships to node { name: Ann }]
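The two counts differ because an RDF graph is a set of triples, so asserting the same statement three times leaves a single triple, while a property graph stores every relationship as a separate record with its own id. A minimal sketch with plain Python containers (not either database's storage):

```python
# RDF: a set of triples. LPG: a list of relationship records with ids.
rdf_graph: set[tuple[str, str, str]] = set()
lpg_relationships: list[dict] = []

for rel_id in range(3):
    # "Dan likes Ann", asserted three times in both models
    rdf_graph.add(("http://dan", "sc:likes", "http://ann"))
    lpg_relationships.append({"id": rel_id, "type": "LIKES", "start": "Dan", "end": "Ann"})

rdf_count = len(rdf_graph)            # duplicates collapse into one triple
lpg_count = len(lpg_relationships)    # each relationship keeps its own identity
print(rdf_count, lpg_count)  # 1 3
```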
30. #2 In RDF you cannot qualify individual relationships: no attributes on predicates ...
31. LPG in Neo4j: the NYC-SFO connection costs $300 and is 4100 km long
    CREATE ({name: "NYC"})-[:CONNECTION {distanceKm: 4100, costUSD: 300}]->({name: "SFO"})

    MATCH ({name: "NYC"})-[c:CONNECTION]->({name: "SFO"})
    RETURN c.costUSD, c.distanceKm
    ╒═════════╤════════════╕
    │c.costUSD│c.distanceKm│
    ╞═════════╪════════════╡
    │300      │4100        │
    └─────────┴────────────┘
[Diagram: node { name: NYC } connected to node { name: SFO } by a CONNECTION relationship with properties { distanceKm: 4100, costUSD: 300 }]
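32. RDF: the NYC-SFO connection costs $300 and is 4100 km long
    prefix sc: <http://schema.org/>
    INSERT DATA {
      <http://nyc> sc:name "NYC" .
      <http://sfo> sc:name "SFO" .
      <http://nyc> sc:connection <http://sfo> .
      sc:connection sc:distanceKm 4100
    }
The last triple does not describe this particular connection: its subject is the predicate sc:connection itself, so it states something about the property in general, not about the NYC-SFO link.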
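Why attaching the distance to the predicate fails can be sketched by adding a second connection. The NYC-LAX triple below is a hypothetical addition; because neither connection has an identity of its own, the only place a lookup can find a distance is the shared predicate, so both connections report the same value.

```python
# Hypothetical data: the second connection (nyc -> lax) is added for illustration.
triples = {
    ("http://nyc", "sc:connection", "http://sfo"),
    ("http://nyc", "sc:connection", "http://lax"),
    ("sc:connection", "sc:distanceKm", 4100),
}

def distance_of(triples, subj, obj):
    # The connection (subj, sc:connection, obj) cannot carry properties,
    # so the lookup falls back to whatever is said about the predicate.
    if (subj, "sc:connection", obj) not in triples:
        return None
    return next((o for s, p, o in triples
                 if s == "sc:connection" and p == "sc:distanceKm"), None)

print(distance_of(triples, "http://nyc", "http://sfo"))  # 4100
print(distance_of(triples, "http://nyc", "http://lax"))  # 4100 as well
```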
33. Alternatives in RDF (?): modeling workarounds (intermediate nodes); reification (1); singleton property (2).
(1) https://www.w3.org/TR/2004/REC-rdf-primer-20040210/#reification
(2) http://dl.acm.org/citation.cfm?id=2567973
34. Modeling workaround (intermediate node): the connection between NYC and SFO costs 300 USD and is 4100 km long. [Diagram: the LPG (Neo4j) model side by side with the RDF model]
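A minimal sketch of the intermediate-node workaround: the connection becomes a resource of its own and carries the cost and distance. The IRI `http://conn/nyc-sfo` and the predicates `sc:from` / `sc:to` are hypothetical names chosen for this example.

```python
# The connection is promoted to a node; all names here are hypothetical.
triples = {
    ("http://conn/nyc-sfo", "sc:from", "http://nyc"),
    ("http://conn/nyc-sfo", "sc:to", "http://sfo"),
    ("http://conn/nyc-sfo", "sc:costUSD", 300),
    ("http://conn/nyc-sfo", "sc:distanceKm", 4100),
}

def connection_props(triples, origin, dest):
    # Find the connection node(s) linking origin to dest ...
    conns = ({s for s, p, o in triples if p == "sc:from" and o == origin}
             & {s for s, p, o in triples if p == "sc:to" and o == dest})
    # ... then read their remaining properties.
    return {p: o for s, p, o in triples
            if s in conns and p not in ("sc:from", "sc:to")}

print(connection_props(triples, "http://nyc", "http://sfo"))
```

The price of the workaround is visible in the sketch: one relationship became a node plus four triples, and a query that was one hop in the LPG now needs two.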
35. Reification https://www.w3.org/TR/2004/REC-rdf-primer-20040210/#reification
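Standard RDF reification describes a statement with a resource typed `rdf:Statement` and linked to its parts via `rdf:subject`, `rdf:predicate` and `rdf:object`; the cost and distance then hang off that resource. The sketch below builds those triples with plain tuples; the statement IRI `http://stmt/1` and the `sc:` property names are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(stmt_iri, s, p, o, **props):
    # Four reification triples describing the statement (s, p, o) ...
    triples = {
        (stmt_iri, RDF + "type", RDF + "Statement"),
        (stmt_iri, RDF + "subject", s),
        (stmt_iri, RDF + "predicate", p),
        (stmt_iri, RDF + "object", o),
    }
    # ... plus one triple per qualifying property.
    triples |= {(stmt_iri, "sc:" + k, v) for k, v in props.items()}
    return triples

# Reifying a statement does not assert it, so the original triple is kept too.
graph = {("http://nyc", "sc:connection", "http://sfo")}
graph |= reify("http://stmt/1", "http://nyc", "sc:connection", "http://sfo",
               costUSD=300, distanceKm=4100)
print(len(graph))  # 7: 1 original + 4 reification + 2 property triples
```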
36. Singleton property (2) http://dl.acm.org/citation.cfm?id=2567973
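The singleton-property idea gives each relationship instance its own unique predicate, linked back to the generic one, so the instance can be described directly. This sketch follows that idea with plain tuples; the `singletonPropertyOf` link name follows the cited paper, but the `rdf:` prefix and the generated IRIs such as `sc:connection#1` are illustrative assumptions.

```python
from itertools import count

_next = count(1)

def add_singleton(graph, s, generic_p, o, **props):
    sp = f"{generic_p}#{next(_next)}"                # unique predicate for this instance
    graph.add((s, sp, o))                            # the relationship itself
    graph.add((sp, "rdf:singletonPropertyOf", generic_p))
    graph |= {(sp, "sc:" + k, v) for k, v in props.items()}
    return sp

graph = set()
sp = add_singleton(graph, "http://nyc", "sc:connection", "http://sfo",
                   costUSD=300, distanceKm=4100)
print(sp)  # sc:connection#1
```

Because every instance has its own predicate, this also lets Dan like Ann three times, at the cost of one new property per relationship.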
37. #3 RDF can have multivalued properties; in the LPG you use arrays.
38. Example: the genre of this album is Jazz, or more precisely Orchestral Jazz.
SPARQL:
    prefix schema: <http://schema.org/>
    INSERT DATA {
      <http://g.co/kg/m/0567wt> schema:name "Sketches of Spain" ;
                                schema:genre "Jazz", "Orchestral Jazz" .
    }
Cypher:
    CREATE (s:Album {name: "Sketches of Spain", genre: ["Jazz", "Orchestral Jazz"]})
[Diagram: node { name: Sketches of Spain, genre: [ Jazz, Orchestral Jazz ] }]
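Moving between the two representations is mechanical: RDF keeps one triple per value, while the LPG node holds one array-valued property. A minimal sketch with hypothetical helper names `fold_to_node` and `unfold_to_triples`:

```python
from collections import defaultdict

ALBUM = "http://g.co/kg/m/0567wt"
rdf_triples = [
    (ALBUM, "schema:name", "Sketches of Spain"),
    (ALBUM, "schema:genre", "Jazz"),
    (ALBUM, "schema:genre", "Orchestral Jazz"),
]

def fold_to_node(triples, subject):
    """Group values per predicate; predicates with more than one value become arrays."""
    grouped = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            grouped[p].append(o)
    return {p: vals if len(vals) > 1 else vals[0] for p, vals in grouped.items()}

def unfold_to_triples(subject, node):
    """Each array element becomes its own triple again."""
    out = []
    for p, v in node.items():
        for item in (v if isinstance(v, list) else [v]):
            out.append((subject, p, item))
    return out

node = fold_to_node(rdf_triples, ALBUM)
print(node)  # {'schema:name': 'Sketches of Spain', 'schema:genre': ['Jazz', 'Orchestral Jazz']}
```

One design choice hidden here: a property with a single value becomes a scalar, so the same predicate may be a string on one node and an array on another. Always using arrays avoids that, at the cost of less natural single values.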
39. #4 RDF uses quads for named graph definition. No equivalent in LPG (for now).
    prefix schema: <http://schema.org/>
    INSERT DATA {
      GRAPH <http://graph1> { <http://dan> schema:likes <http://ann> }
      GRAPH <http://graph2> { <http://ann> schema:likes <http://emma> }
    }
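A quad is a triple plus a fourth element naming the graph it belongs to, so one store can be sliced per named graph. A minimal sketch with tuples; `graph_triples` is a hypothetical helper that plays the role of a `GRAPH <g> { ... }` block.

```python
Quad = tuple[str, str, str, str]

quads: set[Quad] = {
    ("http://dan", "schema:likes", "http://ann", "http://graph1"),
    ("http://ann", "schema:likes", "http://emma", "http://graph2"),
}

def graph_triples(quads: set[Quad], graph: str) -> set[tuple[str, str, str]]:
    """Return the triples that belong to one named graph."""
    return {(s, p, o) for s, p, o, g in quads if g == graph}

print(graph_triples(quads, "http://graph1"))  # {('http://dan', 'schema:likes', 'http://ann')}
```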
40. #1 RDF vs Graph DBs: apples to apples. #2 The semantics of semantics.
41. #1 RDF vs Graph DBs: apples to apples. What are we comparing when we say "RDF vs Graph DBs…"? Is it models or is it stores?
RDF stores:
❏ Very strongly index-based (some graph DBs are also strongly index-based!! Graph as a feature)
❏ RDF is used mostly on additive, typically slow-changing, if not immutable, datasets
Native graph DB (Neo4j):
❏ Neo4j is graph native
❏ Neo4j excels with highly dynamic datasets and transactional use cases where data integrity is key
42. #1 RDF vs Graph DBs: apples to apples. Separating the model from the storage:
○ RDF does not impose any particular type of data storage
○ Micro-demo: a Turing test of RDFness (*)
(*) https://jesusbarrasa.wordpress.com/2016/11/17/neo4j-is-your-rdf-store-part-1/
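The idea of the "RDFness" test can be sketched as a round trip: import triples into a property-graph-shaped structure, export them again, and check that exactly the same set of triples comes back. This is a hypothetical illustration of the principle, not the demo from the linked post; it ignores datatypes, language tags, blank nodes and multivalued literals.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    value: str

def import_rdf(triples):
    nodes, rels = {}, set()
    for s, p, o in triples:
        props = nodes.setdefault(s, {})
        if isinstance(o, Literal):
            props[p] = o.value          # literal -> node property
        else:
            nodes.setdefault(o, {})
            rels.add((s, p, o))         # resource -> relationship
    return nodes, rels

def export_rdf(nodes, rels):
    # Node properties become literal triples again; relationships stay as they are.
    triples = {(uri, p, Literal(v)) for uri, props in nodes.items() for p, v in props.items()}
    return triples | rels

original = {
    ("http://dan", "sc:name", Literal("Dan")),
    ("http://ann", "sc:name", Literal("Ann")),
    ("http://dan", "sc:likes", "http://ann"),
}
print(export_rdf(*import_rdf(original)) == original)  # True
```

If the store gives back the same triples it was given, an outside observer cannot tell how they were persisted, which is the point of separating the model from the storage.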
44. #1 Demo <RDF> =
46. #2 The semantics of semantics: “suppose you entered the details ‘Philip owns a Mercedes’ where ‘Philip’ and ‘Mercedes’ are both entities and ‘owns’ is a relationship. An inference engine can deduce that Mercedes in this instance is a car whereas in ‘Juan is married to Mercedes’ it would deduce that Mercedes is a person [...] contrast this with the inability of a database to understand anything it isn’t explicitly told then you should be able to see the potential advantages” WRONG!!
47. #2 The semantics of semantics. The semantics in RDF are (JUST) RULES!!! You can write these rules yourself, or you can use ontologies (declarative definitions using standard vocabularies like RDFS/OWL)... but both are an optional layer on top of RDF. Watch out! Reasoning with ontology languages quickly becomes intractable or even undecidable. Or did you mean vocabulary-based semantics?
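"Semantics as rules" can be made concrete with a tiny forward-chaining loop implementing one RDFS rule, the range rule: if `p rdfs:range C` and `x p y`, then `y rdf:type C`. The vocabulary (`:owns`, `:marriedTo`, `:Car`, `:Person`) and the two distinct IRIs `:Mercedes1` / `:Mercedes2` are hypothetical. The "car" and "person" conclusions from the quoted Mercedes example only appear once someone has explicitly declared those ranges; nothing is inferred from the word "Mercedes" itself.

```python
Triple = tuple[str, str, str]

def rdfs_range(graph: set[Triple]) -> set[Triple]:
    """(p rdfs:range C) and (x p y)  =>  (y rdf:type C)"""
    ranges = {(s, o) for s, p, o in graph if p == "rdfs:range"}
    return {(y, "rdf:type", c) for x, p, y in graph for rp, c in ranges if p == rp}

def infer(graph: set[Triple], rules) -> set[Triple]:
    """Apply only the rules passed in, until no new triples are derived."""
    closure = set(graph)
    while True:
        new = set().union(*(rule(closure) for rule in rules)) - closure
        if not new:
            return closure
        closure |= new

data = {(":Philip", ":owns", ":Mercedes1"), (":Juan", ":marriedTo", ":Mercedes2")}
schema = {(":owns", "rdfs:range", ":Car"), (":marriedTo", "rdfs:range", ":Person")}

print((":Mercedes1", "rdf:type", ":Car") in infer(data, [rdfs_range]))           # False: no ranges declared
print((":Mercedes1", "rdf:type", ":Car") in infer(data | schema, [rdfs_range]))  # True
```

Passing the rule list explicitly also mirrors the advice on the next slide to use only the rules you need.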
48. #2 The semantics of semantics. Yes, inference is expensive. When considering it, you should: 1) run it over as small a dataset as possible 2) use only the rules you need 3) consider alternatives to inference. Thanks for the tip, vendor of “the world’s only Enterprise Triple Store” ;-)