I am interested in machine learning and its applications in different areas of computer science. Below you can find my peer-reviewed publications and the projects I have worked on. You can also see my Google Scholar profile (although it is not as up-to-date as this page).

Predicting human olfactory perception from chemical features of odor molecules

In early 2015, we formed a team, Biolab Ljubljana, to enter a competition on predicting the odor of molecules. Given 4000+ features describing the chemical structure of a molecule, the task was to predict its intensity, pleasantness, and 19 semantic odor categories ranging from garlic and fishy to spicy and musky. Our team created an ensemble of different machine learning methods, including gradient-boosted trees, ridge regression, and random forests. We achieved 3rd place, and the final aggregated model was close to the theoretical limit of prediction (compared to an individual's test-retest variance). The report was published in Science, where you can find more information about the task.
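The core idea was to combine predictions from several model families for each perceptual target. Below is a minimal sketch of that kind of ensemble in scikit-learn; the actual preprocessing, hyperparameters, and weighting we used differ, so the model settings and the equal-weight averaging here are only illustrative.

```python
# Minimal ensemble sketch: fit several regressor families and average their
# predictions. Hyperparameters and equal weights are illustrative only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge

def fit_ensemble(X_train, y_train):
    """Fit one regressor per family on the molecular features."""
    models = [
        GradientBoostingRegressor(n_estimators=500, max_depth=3),
        Ridge(alpha=10.0),
        RandomForestRegressor(n_estimators=500),
    ]
    for m in models:
        m.fit(X_train, y_train)
    return models

def predict_ensemble(models, X):
    """Average the per-model predictions for one perceptual target."""
    return np.mean([m.predict(X) for m in models], axis=0)
```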

Link to the full paper: Keller et al. – Predicting human olfactory perception from chemical features of odor molecules

Bayesian approximate methods for structured noise data

With Prof. Erik Štrumbelj, we looked at the problem of uncertainty at the beginning of a sports season, for both basketball and football. Most models trained on previous seasons do not account for the large shifts in team strength before the start of a new season, and consequently give overconfident, erroneous predictions. Bayesian statistics, on the other hand, can handle this uncertainty much better.

We developed a model specifically for this kind of sports data, which is count data. Assuming the counts come from a Poisson distribution, we placed a non-informative Gamma prior on the rate and drew samples of the count data for each game, where the likelihood term encompasses all previous games in the season. Earlier in the season the model therefore relies more on the prior and is more uncertain, which led to better predictions at the start of the season than models that do not account for this.
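A minimal sketch of the Gamma-Poisson update behind this idea is below: with few observed games the posterior stays close to the weak prior, so predictive draws are more spread out early in the season. The prior values, example counts, and sampling scheme are illustrative stand-ins, not the exact model from the paper.

```python
# Gamma prior on a Poisson rate: conjugate update, then predictive draws.
import numpy as np

rng = np.random.default_rng(0)

def posterior_predictive_draws(counts, alpha0=0.5, beta0=0.001, n_draws=10000):
    """counts: per-game counts observed so far this season (illustrative prior)."""
    counts = np.asarray(counts)
    alpha = alpha0 + counts.sum()   # shape update with total observed count
    beta = beta0 + counts.size      # rate update, one unit of exposure per game
    lam = rng.gamma(alpha, 1.0 / beta, size=n_draws)  # draws of the Poisson rate
    return rng.poisson(lam)         # predicted count for the next game

early = posterior_predictive_draws([2, 1])                    # 2 games: wide spread
late = posterior_predictive_draws([2, 1, 0, 3, 1, 2, 1, 2])   # 8 games: tighter
print(early.std(), late.std())
```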

Workshop pre-print: Dimitriev, Štrumbelj – Approximate Bayesian Binary, Ordinal Regression with Structured Uncertainty in the Inputs.

Full-length draft: Dimitriev, Štrumbelj – Approximate Bayesian binary and ordinal regression for prediction with structured uncertainty in the inputs.

Iterative unsupervised image segmentation

Building on my diploma thesis with Prof. Matej Kristan, I improved and polished the algorithm and the experimental part. We first oversegment the image into several hundred superpixels and extract useful color and texture features. Our approach is iterative: starting with as many labels as there are superpixels, we gradually reduce this number. We use SVMs (although any supervised classifier can be used) to learn to distinguish superpixels that belong to one label from all the rest. If they cannot be distinguished well enough, they are assigned the same label for the next iteration, and this continues until the algorithm converges and there are no more changes. We also use a Markov random field over the superpixel labels to penalize neighboring superpixels that have different labels, with a penalty proportional to their similarity (so that dissimilar neighbors with different labels are penalized less than similar ones). The journal version is the more recent one and includes an extended framework and additional approaches. A sketch of the merging loop is shown below.
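The sketch below illustrates the label-merging loop with a simplified pairwise criterion in place of the one-vs-rest test, and it omits the MRF regularization; the features, kernel, and threshold are illustrative stand-ins for those described in the paper.

```python
# Iteratively merge superpixel labels that an SVM cannot tell apart.
import numpy as np
from itertools import combinations
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def merge_step(features, labels, threshold=0.6):
    """Merge the first pair of labels whose superpixels are not separable."""
    labels = labels.copy()
    for a, b in combinations(np.unique(labels), 2):
        mask = (labels == a) | (labels == b)
        X, y = features[mask], (labels[mask] == a).astype(int)
        if y.sum() < 3 or (1 - y).sum() < 3:
            continue  # too few superpixels for 3-fold cross-validation
        acc = cross_val_score(SVC(kernel="rbf"), X, y, cv=3).mean()
        if acc < threshold:              # indistinguishable -> same label
            labels[labels == b] = a
            return labels, True
    return labels, False

def segment(features, labels):
    """Repeat merge steps until no pair of labels can be merged."""
    changed = True
    while changed:
        labels, changed = merge_step(features, labels)
    return labels
```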

Full journal paper (preprint): Dimitriev, Kristan – A regularization-based approach for unsupervised image segmentation.

Blog post for the journal version.

Full conference paper: Dimitriev, Kristan – A regularization-based approach for unsupervised image segmentation.

Presented at ERK 2015.

Machine learning for gene microarray data

With Prof. Bosnić, we looked at the performance of several machine learning algorithms and several feature selection methods on four gene expression microarray data sets. This kind of data is characterized by a much larger number of features than samples, e.g. 10-50 samples and 10,000-50,000 features. Most algorithms are not suited to such data, so we evaluated their performance, as well as the usefulness of feature selection techniques, across all four data sets.
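A minimal sketch of this kind of evaluation is below: a univariate feature selector followed by a classifier, scored with cross-validation. The selector, classifier, and number of kept genes are illustrative; the paper compares several of each across the four data sets.

```python
# Feature selection + classifier pipeline for data with far more genes than samples.
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def evaluate(X, y, k=100):
    """X: (n_samples, n_genes) expression matrix with n_genes >> n_samples."""
    # Selection is inside the pipeline, so each cross-validation fold picks
    # its own genes and no information leaks from the held-out sample.
    model = make_pipeline(SelectKBest(f_classif, k=k), SVC(kernel="linear"))
    return cross_val_score(model, X, y, cv=LeaveOneOut()).mean()
```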

Short paper: Dimitriev, Bosnić – Learning From Microarray Gene Expression Data.

Presented at IS 2015.

Visual object tracking challenge (VOT2014)

In 2014, my bachelor’s advisor (along with several others) organized a visual tracking workshop at ECCV 2014, where participants submitted state-of-the-art trackers that were compared with each other and against a baseline they had to outperform. I submitted an improved version of the normalized cross-correlation tracker, which performed well given its simplicity.
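For illustration, here is a minimal sketch of a plain normalized cross-correlation tracker (the VOT2014 entry improved on this basic scheme); the search over the whole frame and the lack of template updates are simplifications.

```python
# Basic normalized cross-correlation tracking with OpenCV template matching.
import cv2
import numpy as np

def track(frames, init_box):
    """frames: iterable of grayscale images; init_box: (x, y, w, h) in frame 0."""
    frames = iter(frames)
    first = next(frames)
    x, y, w, h = init_box
    template = first[y:y + h, x:x + w].astype(np.float32)
    boxes = [init_box]
    for frame in frames:
        # Correlate the template with the frame and take the best-scoring location.
        response = cv2.matchTemplate(frame.astype(np.float32), template,
                                     cv2.TM_CCOEFF_NORMED)
        _, _, _, (bx, by) = cv2.minMaxLoc(response)
        boxes.append((bx, by, w, h))
    return boxes
```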

Workshop paper: Kristan et al. – The Visual Object Tracking VOT2014 challenge results.

Autonomous robotic boat navigation

In a joint project between industry (Harpha Sea) and academia, we looked at ways to improve the autonomous navigation of a robotic marine vessel using image segmentation, tracking, structure from motion, and image stabilization. I worked on the image segmentation part, which built on part of my bachelor’s thesis. An extended abstract presenting our group’s work appeared at a Slovenian computer science and electrical engineering conference.

Extended abstract: Dimitriev et al. – Advanced computer vision methods for autonomous navigation of robotic vessels.