Open Source Software

All of my open source code is visible on my GitHub page, but I collect a few highlights here along with their goals.

1. ocaml-vw

OCaml bindings to the Vowpal Wabbit machine learning system. Vowpal Wabbit is perhaps best known for its contextual bandit support, but it also covers more “standard” tasks such as online classification and regression, Latent Dirichlet Allocation, matrix factorization, and others.

2. word2phrase.py

Python port of T. Mikolov’s word2phrase algorithm, originally proposed in the paper “Distributed Representations of Words and Phrases and their Compositionality” (Mikolov et al., NeurIPS 2013).
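For readers who have not seen the algorithm, here is a minimal sketch of the bigram scoring rule from that paper, score(a, b) = (count(a b) - delta) / (count(a) * count(b)). It is an illustration only, not the code in word2phrase.py: the function names `find_phrases` and `merge_phrases`, and the default values for `delta` and `threshold`, are assumptions made for this example.

```python
from collections import Counter


def find_phrases(sentences, delta=1, threshold=0.2):
    """Score every adjacent word pair and keep those above `threshold`.

    `sentences` is an iterable of token lists. `delta` discounts rare
    pairs so that two infrequent words seen together once do not score
    highly. Both defaults are hypothetical and chosen for this example.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    phrases = {}
    for (a, b), n_ab in bigrams.items():
        score = (n_ab - delta) / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases[(a, b)] = score
    return phrases


def merge_phrases(tokens, phrases):
    """Join each detected pair into a single token, scanning left to right."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


corpus = [
    ["new", "york", "is", "big"],
    ["i", "love", "new", "york"],
    ["new", "york", "is", "busy"],
    ["is", "it", "big"],
]
phrases = find_phrases(corpus, delta=1, threshold=0.2)
merged = merge_phrases(["i", "love", "new", "york"], phrases)
```

Running the two steps repeatedly over the merged output is how longer phrases can be built up from shorter ones.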

3. ccard

Fast command-line utility for approximate counting of distinct strings. It uses the LogLog-Beta algorithm for distinct-value estimation and MetroHash for hashing. Helpful if you work with lots of text data at the command line.
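To give a feel for how this kind of estimator works, here is a rough Python sketch of plain HyperLogLog, the better-known relative of LogLog-Beta. It is not ccard's implementation: it swaps in the classic HyperLogLog estimate with a linear-counting fallback instead of the LogLog-Beta correction, and BLAKE2 from the standard library instead of MetroHash. The function name `hll_estimate` and the parameter `p` are assumptions made for this example.

```python
import hashlib
import math


def hll_estimate(items, p=12):
    """Estimate the number of distinct strings in `items`.

    Uses 2**p registers. Each register remembers the largest "rank"
    (position of the first 1-bit) seen among the hashes routed to it.
    """
    m = 1 << p
    registers = [0] * m
    for item in items:
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        idx = h >> (64 - p)                  # top p bits choose the register
        rest = h & ((1 << (64 - p)) - 1)     # remaining 64 - p bits
        # 1-based position of the leftmost 1-bit in `rest`;
        # equals 64 - p + 1 when `rest` is zero.
        rank = (64 - p) - rest.bit_length() + 1
        if rank > registers[idx]:
            registers[idx] = rank

    alpha = 0.7213 / (1 + 1.079 / m)         # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw


words = ["word-%d" % i for i in range(20000)]
estimate = hll_estimate(words + words)       # duplicates do not change the result
```

With `p=12` the sketch keeps 4096 small registers regardless of input size, and the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%.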

4. flajolet

OCaml library providing approximate (sketching) data structures for tasks such as distinct-value estimation (HyperLogLog), approximate set membership (Bloom filters), frequency estimation (count-min sketch), and top-k queries (also known as approximate frequency tables). Named for the French computer scientist Philippe Flajolet.
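As one concrete example of these structures, here is a small count-min sketch. It is written in Python to keep it short and does not mirror the flajolet library's OCaml API. The class name `CountMinSketch`, its `width` and `depth` parameters, and the use of salted BLAKE2 hashes are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter that never underestimates a count."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _indexes(self, item):
        # One independent-looking hash per row, obtained by salting BLAKE2
        # with the row number.
        data = item.encode("utf-8")
        for row in range(self.depth):
            salt = row.to_bytes(8, "big")
            digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row, col in self._indexes(item):
            self.rows[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so the minimum across rows
        # is the tightest available upper bound on the true count.
        return min(self.rows[row][col] for row, col in self._indexes(item))


sketch = CountMinSketch()
for word in ["a", "b", "a", "c", "a", "b"]:
    sketch.add(word)
count_a = sketch.estimate("a")
```

A wider table lowers the chance of collisions, and more rows lower the chance that every row collides for the same item. Memory use is fixed at `width * depth` counters no matter how many items are added.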

5. ocaml-h3

OCaml bindings to Uber’s H3 hierarchical spatial indexing system.