Notes from playing with Datomic

Many thanks to Francis Avila (favila) and others.

Enums: idents vs. type/keyword

The on-prem docs recommend using idents to represent enums, though this advice likely dates from before d/pull and attribute and entity predicates existed.

Ident

- They are entities ⇒ you can attach extra info to them (as e.g. with Scala enum values) and make that queryable ⇒ strictly more powerful
- d/pull, contrary to d/entity, returns them as entity IDs, not as keywords (unless you explicitly pull :db/ident)
- You get a VAET index of them
- Because of the semantics of ident lookup you can rename them safely
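A minimal sketch of the ident approach. The attribute :person/favourite-colour is borrowed from the queries further down; the :colour/green and :colour/blue idents, the :colour/hex attribute and the in-memory database URI are made up for illustration. The calls inside the comment form need the on-prem peer library on the classpath.

```clojure
;; Hypothetical schema: the enum attribute is a ref, so each enum value is an entity.
(def colour-schema
  [{:db/ident       :colour/hex              ; extra info attached to every enum value
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}
   {:db/ident       :person/favourite-colour
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/one}])

;; The enum values themselves. Transact these after colour-schema,
;; because :colour/hex must already be installed when it is used.
(def colour-idents
  [{:db/ident :colour/green :colour/hex "#00ff00"}
   {:db/ident :colour/blue  :colour/hex "#0000ff"}])

(comment
  (require '[datomic.api :as d])
  (def uri "datomic:mem://field-notes")      ; hypothetical in-memory database
  (d/create-database uri)
  (def conn (d/connect uri))
  @(d/transact conn colour-schema)
  @(d/transact conn colour-idents)
  @(d/transact conn [{:person/favourite-colour :colour/green}])

  ;; Plain attribute spec: the ref comes back as a {:db/id ...} map.
  (d/q '[:find (pull ?e [:person/favourite-colour])
         :where [?e :person/favourite-colour]]
       (d/db conn))

  ;; Nested spec: ask for the keyword (and the extra info) explicitly.
  (d/q '[:find (pull ?e [{:person/favourite-colour [:db/ident :colour/hex]}])
         :where [?e :person/favourite-colour]]
       (d/db conn)))
```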

Keyword pros

- Can be checked with :db.attr/preds (an ident is an entity ID, so it cannot be compared against a set of allowed keywords; that would need :db/ensure)
- d/pull returns them as keywords
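A sketch of the keyword approach with an attribute predicate. The namespace field-notes.preds, the function allowed-colour? and the set of allowed colours are assumptions made for this example. As far as I understand, :db.attr/preds takes a fully qualified symbol, and the named function must be loadable by the process that runs the transaction.

```clojure
(ns field-notes.preds)   ; hypothetical namespace

;; The closed set of values the enum may take (illustrative).
(def allowed-colours #{:blue :green})

(defn allowed-colour?
  "Attribute predicate: true when v is one of the allowed keywords."
  [v]
  (contains? allowed-colours v))

;; Attribute definition as transaction data.
(def favourite-colour-attr
  {:db/ident       :person/favourite-colour
   :db/valueType   :db.type/keyword
   :db/cardinality :db.cardinality/one
   ;; fully qualified symbol naming the predicate above
   :db.attr/preds  'field-notes.preds/allowed-colour?})
```

With this in place, a transaction asserting a colour outside the set should be rejected, while d/pull keeps returning plain keywords.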

Where <val> in <set>

How to write queries that check whether a particular attribute value is (or is not) in a given set.

(comment
 ;; Set filtering cannot be introspected by the query engine
 ;; This can be good if the set is large
 ;; and there's no index datomic could use
 ;; to retrieve matching datoms.
 ;; Evaluation cannot be parallel,
 ;; but the intermediate result set will be smaller
 ;; and none of the unification machinery will get involved.

 ;; As a literal:
 [:find ?e
   :where
   [?e :person/favourite-colour ?val]
   [(#{:blue :green} ?val)]]

 ;; As a parameter:
 [:find ?e
   :in $ ?allowed-val-set
   :where
   [?e :person/favourite-colour ?val]
   [(contains? ?allowed-val-set ?val)]]
 #{:green :blue}

 ;; Using unification
 ;; If you bind the items you are filtering by to a var
 ;; datalog will perform filtering implicitly via unification.
 ;; This is good if your filter value is indexed,
 ;; because now the query planner can see it
 ;; and possibly use better indexes or parallelize IO.
 ;; However, this may produce larger intermediate result sets
 ;; and consume more memory because of unification.

 [:find ?e
   :where
   ;; Could use an index
   [(ground [:green :blue]) [?val ...]]
   [?e :person/favourite-colour ?val]]

 [:find ?e
  :where
  ;; Reverse clause order:
  ;; Now it *probably doesn't* use an index?
  ;; Depends on how smart the planner is.
  ;; Worst-case, it's as bad as a linear exhaustive
  ;; equality check of each val
  ;; which may or may not be worse than a hash-lookup
  ;; depending on the size of the set.
  [?e :person/favourite-colour ?val]
  [(ground [:green :blue]) [?val ...]]]

 ;; As a parameter:
 [:find ?e
   :in $ [?val ...]
   :where
   [?e :person/favourite-colour ?val]]
 [:green :blue]

 ;; Use a rule with literals
 ;; In most cases this will be the same as the previous approach,
 ;; but without the "maybe"s because you don't need to trust the query planner.
 ;; This is the most explicit and predictable,
 ;; and definitely parallelizable (rules inherently are).
 ;; But you *must* use literal values.
 [:find ?e
  :in $
  :where
  (or [?e :person/favourite-colour :green]
      [?e :person/favourite-colour :blue])]


 ;; In any given case I would benchmark all three.

 )

Summary: There are three basic techniques, and they can have dramatically different performance depending on the situation.
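To compare the techniques on real data, the queries can be kept as plain data and timed from the REPL. This is only a rough sketch: the var names are mine, db is assumed to be a database value you already have, and time with dotimes is a crude stand-in for a proper benchmarking tool.

```clojure
;; 1. Set as a predicate, passed in as a parameter.
(def by-set-fn
  '[:find ?e
    :in $ ?allowed
    :where
    [?e :person/favourite-colour ?val]
    [(contains? ?allowed ?val)]])

;; 2. Unification against a collection binding.
(def by-unification
  '[:find ?e
    :in $ [?val ...]
    :where
    [?e :person/favourite-colour ?val]])

;; 3. An or clause with literal values.
(def by-or
  '[:find ?e
    :where
    (or [?e :person/favourite-colour :green]
        [?e :person/favourite-colour :blue])])

(comment
  ;; Needs the on-prem peer library and an existing database value `db`.
  (require '[datomic.api :as d])

  ;; All three should return the same set of entity ids.
  (d/q by-set-fn db #{:green :blue})
  (d/q by-unification db [:green :blue])
  (d/q by-or db)

  ;; Crude timing; repeat a few times so caches are warm.
  (time (dotimes [_ 100] (d/q by-set-fn db #{:green :blue})))
  (time (dotimes [_ 100] (d/q by-unification db [:green :blue])))
  (time (dotimes [_ 100] (d/q by-or db))))
```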

Notes on Datomic on-prem docs

Note: The Datomic Cloud docs seem better, e.g. things are interlinked etc.

The documentation is sometimes quite terse and some things are hard (impossible?) to find.

- According to my tests, naming an attribute with _ at the beginning causes problems (because the underscore is used for reverse lookup), so such names should be avoided. I would expect this to be mentioned in the :db/ident section of the attribute docs. I was also unable to find the docs confirming my vague memory of <ns>/_<attr. name> being used for reverse lookups (it is under Pull, not in the query reference, which makes sense when I think about it; using the docs' Search for 'reverse' also helps, so perhaps this is down to my bad searching skills rather than the docs' fault).
- I would appreciate a guide for people who know SQL, explaining how to translate its constructs to Datalog. Mongo has a nice example. I especially struggled to find out how to implement SQL's WHERE <column> IN (val1, val2, …) (it turns out you can use a set as a function, ground, or an or clause, each with its own pros and cons). A FAQ page on "How to do X in Datalog/Datomic…" would be very useful.
- I would appreciate "best practices" / guidance for selecting the most appropriate of multiple approaches, such as for the where … in … case.
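For reference, a small sketch of the reverse lookup mentioned in the first point, as I understand it from the Pull docs: an underscore before the name part of an attribute navigates the ref backwards. It assumes :person/favourite-colour is a ref attribute and that a :colour/green ident exists (both illustrative), and that db is an existing database value.

```clojure
;; Pull pattern: from a colour entity, find every person pointing at it.
;; Note the underscore comes after the namespace slash, not before the namespace.
(def people-by-colour-pattern
  [:db/ident
   {:person/_favourite-colour [:db/id]}])

(comment
  ;; Needs the on-prem peer library and an existing database value `db`.
  (require '[datomic.api :as d])
  (d/pull db people-by-colour-pattern :colour/green))
```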

holyjak/datomic-field-notes.adoc