Many thanks to Francis Avila (favila) and others.
The on-prem docs recommend using idents to represent enums, though this advice likely dates from before d/pull and attribute+entity predicates existed.
Ident
- They are entities ⇒ can attach extra info (as e.g. with Scala enum values) and make that queryable ⇒ strictly more powerful
- d/pull, contrary to d/entity, returns them as entity IDs, not the keywords
- you get a VAET index of them
- because of the semantics of ident lookup you can rename them safely
Keyword pros
- Can be checked with :db.attr/preds (an ident is an entity ID so it cannot be compared against a set of allowed keywords; that would need :db/ensure)
- d/pull returns them as kwds
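To make the trade-off concrete, here is a minimal sketch of the two modelling options as schema data. The attribute :person/favourite-colour is borrowed from the queries in this post; the my.enums namespace, the allowed-colour? predicate, the :colour/* idents and the set of allowed colours are assumptions made up for illustration. Nothing here talks to a database; it only builds the transaction data you would hand to d/transact.

```clojure
(ns my.enums) ; hypothetical namespace

;; Assumed set of allowed values, for illustration only.
(def allowed-colours #{:blue :green :red})

(defn allowed-colour?
  "Attribute predicate for the keyword variant: true when v is an allowed colour."
  [v]
  (contains? allowed-colours v))

;; Option A: keyword-valued enum, guarded by an attribute predicate.
;; :db.attr/preds takes the fully qualified symbol of a predicate function,
;; which must be on the classpath of the process performing the transaction.
(def keyword-enum-schema
  [{:db/ident       :person/favourite-colour
    :db/valueType   :db.type/keyword
    :db/cardinality :db.cardinality/one
    :db.attr/preds  'my.enums/allowed-colour?}])

;; Option B: ident-valued enum. The attribute is a ref and every enum value
;; is its own entity, so extra attributes could be attached to it later.
(def ident-enum-schema
  [{:db/ident       :person/favourite-colour
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/one}
   {:db/ident :colour/blue}
   {:db/ident :colour/green}
   {:db/ident :colour/red}])
```

With option B, a pull of :person/favourite-colour yields a {:db/id ...} map unless the pattern asks for the ident explicitly, for example [{:person/favourite-colour [:db/ident]}].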
Writing queries that check whether a particular attribute value is in a given set:
(comment
  ;; Set filtering cannot be introspected by the query engine.
  ;; This can be good if the set is large
  ;; and there's no index datomic could use
  ;; to retrieve matching datoms.
  ;; Evaluation cannot be parallel,
  ;; but the intermediate result set will be smaller
  ;; and none of the unification machinery will get involved.

  ;; As a literal:
  [:find ?e
   :where
   [?e :person/favourite-colour ?val]
   [(#{:blue :green} ?val)]]

  ;; As a parameter:
  [:find ?e
   :in $ ?allowed-val-set
   :where
   [?e :person/favourite-colour ?val]
   [(contains? ?allowed-val-set ?val)]]
  #{:green :blue}

  ;; Using unification
  ;; If you bind the items you are filtering by to a var
  ;; datalog will perform filtering implicitly via unification.
  ;; This is good if your filter value is indexed,
  ;; because now the query planner can see it
  ;; and possibly use better indexes or parallelize IO.
  ;; However, this may produce larger intermediate result sets
  ;; and consume more memory because of unification.
  [:find ?e
   :where
   ;; Could use an index
   [(ground [:green :blue]) [?val ...]]
   [?e :person/favourite-colour ?val]]

  [:find ?e
   :where
   ;; Reverse clause order:
   ;; Now it *probably doesn't* use an index?
   ;; Depends on how smart the planner is.
   ;; Worst-case, it's as bad as a linear exhaustive
   ;; equality check of each val
   ;; which may or may not be worse than a hash-lookup
   ;; depending on the size of the set.
   [?e :person/favourite-colour ?val]
   [(ground [:green :blue]) [?val ...]]]

  ;; As a parameter:
  [:find ?e
   :in $ [?val ...]
   :where
   [?e :person/favourite-colour ?val]]
  [:green :blue]

  ;; Use a rule with literals
  ;; In most cases this will be the same as the previous approach,
  ;; but without the "maybe"s because you don't need to trust the query planner.
  ;; This is the most explicit and predictable,
  ;; and definitely parallelizable (rules inherently are).
  ;; But you *must* use literal values.
  [:find ?e
   :in $
   :where
   (or [?e :person/favourite-colour :green]
       [?e :person/favourite-colour :blue])]

  ;; In any given case I would benchmark all three.
  )
Summary: There are three different basic techniques, and they can have dramatically different performance depending on the situation.
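Since the advice is to benchmark all three, a rough harness along the following lines may help. It is only a sketch: time-ms and compare-variants are hypothetical helpers written for this post, the timing is naive wall-clock time with no JIT warm-up handling, and the commented usage assumes that d is an alias for datomic.api and that db is a database value you already have. For serious measurements a proper benchmarking library such as Criterium is likely a better choice.

```clojure
(defn time-ms
  "Runs thunk n times and returns the mean wall-clock time per run in milliseconds."
  [n thunk]
  (let [start (System/nanoTime)]
    (dotimes [_ n] (thunk))
    (/ (- (System/nanoTime) start) 1e6 n)))

(defn compare-variants
  "Takes a map of label -> zero-arg function and returns a map of label -> mean ms."
  [n variants]
  (into {} (map (fn [[label thunk]] [label (time-ms n thunk)])) variants))

;; Hypothetical usage, assuming `d` is datomic.api and `db` is a database value:
(comment
  (compare-variants 20
    {:set-fn      #(d/q '[:find ?e
                          :where
                          [?e :person/favourite-colour ?val]
                          [(#{:blue :green} ?val)]]
                        db)
     :unification #(d/q '[:find ?e
                          :in $ [?val ...]
                          :where
                          [?e :person/favourite-colour ?val]]
                        db [:green :blue])
     :or-literals #(d/q '[:find ?e
                          :where
                          (or [?e :person/favourite-colour :green]
                              [?e :person/favourite-colour :blue])]
                        db)}))
```

Run it against realistic data volumes and set sizes, because the relative ranking of the three techniques depends on both.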
Note: The Datomic Cloud docs seem better, e.g. things are interlinked etc.
The documentation is sometimes quite terse and some things are hard (impossible?) to find.
- According to my tests, naming an attribute with _ at the beginning will cause problems (because it is used for reverse lookup) and should not be used. I would expect this to be mentioned in the :db/ident section of attributes. I was also unable to find the docs to confirm my vague memory of <ns>/_<attr. name> being used for reverse lookups (it is under Pull, not in the query reference, which makes sense if I think about it. Also, using the docs' Search for 'reverse' helps, so perhaps this is more down to my bad searching skills than the docs' fault).
- I would appreciate a guide for people that know SQL, explaining how to translate its constructs to Datalog. Mongo has a nice example. I especially struggled to find how to implement SQL's WHERE <column> IN (val1, val2, …) (it turns out you can use a set as a function, ground, or an or clause, all with unique pros and cons). A page with a FAQ of "How to do X in Datalog/Datomic…" would be very useful.
- I would appreciate "best practices" / guidance to select the most appropriate of multiple approaches, such as for the where .. in … case
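The reverse-lookup naming convention from the first point can be illustrated with two small helpers. This is a sketch under assumptions: reverse-attr and reverse-attr? are hypothetical helpers, and :person/friends and :person/name are made-up attributes, with :person/friends assumed to be a ref attribute (reverse lookup only makes sense for refs).

```clojure
(require '[clojure.string :as str])

(defn reverse-attr
  "Given a forward attribute such as :person/friends, returns the reverse
  form :person/_friends, i.e. the name part prefixed with an underscore."
  [attr]
  (keyword (namespace attr) (str "_" (name attr))))

(defn reverse-attr?
  "True when the name part of the keyword starts with an underscore, which is
  why an attribute named that way clashes with reverse lookup."
  [attr]
  (str/starts-with? (name attr) "_"))

;; A pull pattern as plain data: the name of an entity plus the names of all
;; entities that point at it via :person/friends.
(def referenced-by-pattern
  [:person/name {(reverse-attr :person/friends) [:person/name]}])

;; Hypothetical usage, assuming `d` is datomic.api, `db` a database value
;; and `person-id` an entity id:
(comment
  (d/pull db referenced-by-pattern person-id))
```

A check like reverse-attr? could be run over schema idents before transacting them, to catch attribute names that would be ambiguous.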