This page was written a few years ago during my PhD and is now somewhat outdated.

My research interests now mostly involve the use of deep learning methods in innovative applications. For instance, we designed a system able to reconstruct a 3D volume from a video clip of 2D ultrasound images without any tracking information [1]. We also work on real-time segmentation of ultrasound images, and on how to leverage it to register pre-operative data or improve system calibration [2].

I am also interested in creating new segmentation and registration algorithms that incorporate the latest progress in machine learning.

- [1] Deep Learning for Sensorless 3D Freehand Ultrasound Imaging. R. Prevost et al., MICCAI 2017.
- [2] Precise Ultrasound Bone Registration with Learning-based Segmentation and Speed of Sound Calibration. M. Salehi & R. Prevost et al., MICCAI 2017.

## >> Model-based Image Segmentation

Within the wide field of medical imaging research, image segmentation is one of the earliest and most important topics. Retrieving the shape of an organ is indeed of high interest for diagnosis, therapy planning and medical research in general. Model-based methods achieve a good trade-off between customization (models carry prior knowledge about the target structure) and genericity (which is important for research efficiency), and are therefore quite popular. The goal of our work is to build an efficient segmentation framework able to leverage all kinds of external information: pre-segmented databases via statistical learning, other images of the same patient via co-segmentation, and user input via live interactions with the algorithm. This work is based on the implicit template deformation framework [Mory et al., 2012], a variational method relying on an implicit representation of shapes.

Level-set methods have been popular because they are very flexible and quite efficient, but they are not always robust and cannot guarantee the topology of the segmentation result (which is usually a very important prior in medical applications). On the other hand, atlas-based methods are much more reliable since they implicitly use a lot of model/context information; they now often achieve state-of-the-art results in a number of clinical applications. However, they carry a huge computational burden (as they require one or several 3D registrations) and are only applicable in clinical settings with standardized acquisitions. The implicit template deformation algorithm is a hybrid approach that tries to combine the advantages of these two classes of methods.

In this framework, we seek the segmentation as the zero level-set of an implicit function $\phi : \Omega \rightarrow \mathbb{R}$, but not just any implicit function. The notion of model comes from the choice of the space $\mathbb{S}$ of admissible implicit functions. The segmentation $\phi$ is defined as a deformed version of an initial implicit function $\phi_0 : \Omega \rightarrow \mathbb{R}$ that acts as a shape prior. The set $\mathbb{S}$ is thus defined as: $$ \mathbb{S} = \{ \phi = \phi_0 \circ \psi \ \text{ with } \ \psi : \Omega \rightarrow \Omega \} $$ The transformation $\psi$ therefore becomes the unknown of the problem, which can be cast as an energy minimization. It should naturally be constrained, both to control the deviation of the segmentation from the shape prior $\phi_0$ and to yield realistic results. As in atlas-based methods, we can enforce $\psi$ to be diffeomorphic, which guarantees that the segmentation has the same topology as the original template $\phi_0$.
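As a toy illustration of this representation, the sketch below evaluates a deformed template $\phi = \phi_0 \circ \psi$ by resampling the template's implicit function at warped coordinates. The helper `deform_template` and the trivial translation-only warp are hypothetical, for illustration only; the actual framework of course uses richer, constrained (e.g. diffeomorphic) deformations.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def deform_template(phi0, psi):
    """Evaluate phi = phi0 o psi on the pixel grid.

    phi0 : 2D array, implicit function of the template (distance-like).
    psi  : array of shape (2, H, W) giving, for each grid point x,
           the warped coordinates psi(x) at which phi0 is sampled.
    """
    # The zero level-set of the returned array is the deformed template.
    return map_coordinates(phi0, psi, order=1, mode='nearest')

# Toy template: a disk of radius 8 centered in a 32x32 grid.
yy, xx = np.mgrid[0:32, 0:32]
phi0 = np.sqrt((yy - 16.0) ** 2 + (xx - 16.0) ** 2) - 8.0

# A trivially simple psi: identity warp plus a 3-pixel translation along x.
psi = np.stack([yy.astype(float), xx.astype(float) + 3.0])
phi = deform_template(phi0, psi)
```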

When the chosen image-based energy can be expressed as a region competition, its gradient (i.e. the forces to be applied) is located only on the boundary of the current segmentation. This allows us to design very efficient implementations (more details in the papers). Our optimized CPU implementation in C++ is able to segment a 3D image in a few seconds on a standard laptop. This efficiency allows us to take user interactions into account in real-time while the algorithm is running [Mory et al., 2012].

Naturally, the segmentation heavily depends on the model $\phi_0$, which should be chosen carefully. In the movie above, the templates were synthetic shapes; yet one feels that this choice is certainly suboptimal. Can we do better by using a database of pre-segmented shapes? We showed in [Prevost et al., 2013] that shape information can be learned via a dedicated process in which both an optimal template and the principal modes of variation are estimated from a collection of shapes. This learning strategy does not require one-to-one correspondences between shape sample points and is not biased by a pre-alignment of the training shapes. We then generalized the implicit template deformation formulation to automatically select the most plausible deformation as a shape prior. This novel framework provides significantly better segmentation results while maintaining the two main properties of implicit template deformation: topology preservation and computational efficiency.

## >> Joint Co-Segmentation and Registration

In the computer vision community, the term co-segmentation denotes the task of finding an object of interest in a collection of images. When dealing with 2D natural images, the consistency of the object's appearance (i.e. its color histogram) across the different images can be leveraged to obtain an unsupervised segmentation [Rubio et al., 2012; Vicente et al., 2010]. Yet objects undergo a projective transformation, and a change of viewpoint may result in a large variation of the object's silhouette: shape consistency is difficult to exploit. Conversely, in most medical applications we have access to three-dimensional data, so the problem of viewpoint disappears. We can therefore give more importance to the shape of the object itself. On the other hand, we want to be able to use images from different acquisition modalities, so the structure to be segmented cannot be assumed to have the same appearance in all images. The settings are therefore very different, and standard methods from computer vision (e.g. [Hochbaum & Singh, 2009]) are not adequate for clinical problems.

Although segmentation and registration are often seen as two separate problems, several approaches have been proposed to perform them simultaneously. Most of them rely on an iconic registration guiding the segmentation (e.g. [Wang & Vemuri, 2005; Pohl et al., 2006; Lu & Duncan, 2012]). Yet they assume that the segmentation is known in one of the images, which is not the case in the co-segmentation applications we consider. Moreover, in several multimodal settings, iconic registration might be bound to fail since visible structures do not always correspond to each other (for example in US/CEUS images). Instead of registering the images themselves, Wyatt and Noble developed a maximum a posteriori formulation to perform registration on label maps resulting from a Markov random field segmentation step [Wyatt & Noble, 2002]. However, no shape model is enforced, and noise or misclassifications may degrade the results.

The co-segmentation framework presented in the next subsection is inspired by [Yezzi et al., 2003]. We rely on the same idea that a single shape should segment multiple images. However, we apply it to different segmentation frameworks, in particular implicit approaches instead of parametric active contours.

We consider here any variational algorithm that consists in minimizing an energy of the following form:

\begin{equation} E_I(\phi) = \int_\Omega f(\phi(x)) \ r_I(x) \ dx + \lambda \ R(\phi) \end{equation} where $f$ is a real-valued function and $r_I(x)$ denotes a pointwise score that is negative when $x$ probably belongs to the target object in the image $I$, and positive when it does not. This is a standard setting in which the optimal implicit function $\phi$ must achieve a trade-off between an image-based term and a regularization term $R$.
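A minimal numerical sketch of such an energy on a discrete pixel grid, with purely illustrative choices for $f$ (a smoothed Heaviside) and $R$ (a quadratic smoothness term) that are not necessarily the ones used in the papers:

```python
import numpy as np

def region_energy(phi, r, lam=0.1, eps=1.0):
    """E_I(phi) = sum f(phi) * r + lam * R(phi) on a pixel grid.

    f is a smoothed Heaviside (close to 1 inside the object, where phi < 0)
    and R is a quadratic smoothness term; both are illustrative choices.
    """
    f = 0.5 * (1.0 - 2.0 / np.pi * np.arctan(phi / eps))
    data_term = np.sum(f * r)                  # image-based term
    gy, gx = np.gradient(phi)
    reg_term = np.sum(gy ** 2 + gx ** 2)       # regularization R(phi)
    return data_term + lam * reg_term

# Toy example: a disk-shaped object encoded in the score map r.
yy, xx = np.mgrid[0:32, 0:32]
dist = np.sqrt((yy - 16.0) ** 2 + (xx - 16.0) ** 2)
r = np.where(dist < 8, -1.0, 1.0)              # negative inside the object
phi_good = dist - 8.0                          # segmentation on the object
phi_bad = np.sqrt((yy - 16.0) ** 2 + (xx - 22.0) ** 2) - 8.0  # shifted away
```

A well-placed segmentation achieves a lower energy than a shifted one, which is exactly what the minimization exploits.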

We are interested in the case where a pair of images $I_1$ and $I_2$ containing the same object is available. For instance, in medical imaging, $I_1$ and $I_2$ can be two images of the same organ acquired with different modalities or at different times. If those images were perfectly aligned with respect to the target organ, the energy could be straightforwardly generalized to perform co-segmentation: \begin{equation} \min_{\phi} E_{I_1, I_2}(\phi) = \frac{1}{2} \int_{\Omega} f(\phi(x)) \ r_{I_1}(x) \ dx + \frac{1}{2} \int_{\Omega} f(\phi(x)) \ r_{I_2}(x) \ dx + \lambda \ R(\phi) \end{equation} Unfortunately, such an assumption almost never holds in medical applications. A more realistic hypothesis is to assume that the target object, segmented by $\phi$, is not deformed between the two acquisitions but only undergoes an unknown global transformation $G_r$, as in [Yezzi et al., 2003]. The co-segmentation energy thus reads:

\begin{equation} \min_{\phi, G_r} E_{I_1, I_2}(\phi, G_r) = \frac{1}{2} \int_{\Omega} f(\phi(x)) \ r_{I_1}(x) \ dx + \frac{1}{2} \int_{\Omega} f(\phi \circ G_r (x)) \ r_{I_2}(x) \ dx + \lambda \ R(\phi) \end{equation} Minimizing $E_{I_1, I_2}$ with respect to $\phi$ and $G_r$ simultaneously can therefore be interpreted as jointly performing segmentation (via $\phi$) and global registration (via $G_r$). We thus couple the two problems of co-segmentation and registration within a common framework, i.e. a single energy minimization problem. This generalizes the more common approach (e.g. [Han et al., 2011; Zagrodsky et al., 2005]) in which the images are first aligned in a preprocessing step.
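To make the role of $G_r$ concrete, the toy sketch below evaluates this joint energy when $G_r$ is restricted to a pure translation (an assumption made here for simplicity; the actual framework allows a richer global transform). The same $\phi$ scores both images, but the second image sees it through the warp $\phi \circ G_r$.

```python
import numpy as np
from scipy.ndimage import shift as translate

def coseg_energy(phi, r1, r2, t, lam=0.1, eps=1.0):
    """Co-segmentation energy for a purely translational G_r (toy case).

    phi    : implicit function of the shared shape.
    r1, r2 : pointwise scores in I1 and I2 (negative inside the object).
    t      : translation vector standing in for the global transform G_r.
    """
    def f(p):  # smoothed Heaviside, close to 1 where p < 0
        return 0.5 * (1.0 - 2.0 / np.pi * np.arctan(p / eps))
    phi_warped = translate(phi, t, order=1, mode='nearest')  # phi o G_r
    gy, gx = np.gradient(phi)
    return (0.5 * np.sum(f(phi) * r1)
            + 0.5 * np.sum(f(phi_warped) * r2)
            + lam * np.sum(gy ** 2 + gx ** 2))

# Toy data: the same disk, shifted by 4 pixels along x in the second image.
yy, xx = np.mgrid[0:32, 0:32]
phi = np.sqrt((yy - 16.0) ** 2 + (xx - 16.0) ** 2) - 8.0
r1 = np.where(np.sqrt((yy - 16.0) ** 2 + (xx - 16.0) ** 2) < 8, -1.0, 1.0)
r2 = np.where(np.sqrt((yy - 16.0) ** 2 + (xx - 20.0) ** 2) < 8, -1.0, 1.0)
```

The correct transform (here a 4-pixel translation) yields a lower energy than the identity, so minimizing over $G_r$ recovers the registration while $\phi$ is being segmented.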

The genericity of the approach is two-fold: (i) it can be applied to a vast class of variational methods, in particular implicit template deformation and robust ellipsoid detection; (ii) it can be used in various clinical settings (e.g. multi-modal organ segmentation, motion tracking, and so on). We showed that it improved on state-of-the-art methods in two different clinical applications: kidney segmentation in 3D contrast-enhanced ultrasound images, and registration of 3D+t sequences of perfusion CT images via co-segmentation of the organ of interest.

## >> Machine Learning via Random Forests

In most fields, the amount of available data is nowadays increasing tremendously, which makes it more difficult to process by hand; fortunately, we can analyze and learn from it. Statistical learning is particularly useful in computer vision (and therefore medical imaging) applications, since the human visual system is extremely complex to model.

Random Forests, which are derived from decision trees, were introduced a long time ago [Breiman, 1984] but only recently popularized [Criminisi et al., 2013]. They are a very useful framework for statistical learning, as they can tackle a wide range of problems (from multi-label classification to regression, but also density estimation and manifold learning) and are very efficient at test time (since they are based on chains of simple rules). We work on both classification and regression forests.

A common application of classification forests in computer vision is learning the appearance of different parts of an image. In medical images, this is often a binary classification of each pixel as inside or outside an organ of interest. Contrary to SVM approaches, for instance, Random Forests not only predict this binary decision at each pixel but also give a confidence value for the prediction. We can therefore consider such response maps as probability maps of the organ of interest (which are often subsequently used within a more elaborate segmentation framework).
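The per-pixel confidence can be sketched with an off-the-shelf forest, here scikit-learn's `RandomForestClassifier` on synthetic pixel features (the two features and their distributions are invented for the example, not taken from our pipeline):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy pixel data: "inside organ" pixels are brighter on average. The two
# features (raw and smoothed intensity) are stand-ins for real image features.
n = 2000
labels = rng.integers(0, 2, n)                       # 1 = inside the organ
intensity = labels * 0.8 + rng.normal(0.0, 0.3, n)
smoothed = labels * 0.8 + rng.normal(0.0, 0.15, n)
X = np.column_stack([intensity, smoothed])

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)

# Unlike a plain binary decision, predict_proba gives a per-pixel confidence,
# which can be reshaped into a probability map of the organ of interest.
proba = clf.predict_proba(X)[:, 1]
```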

One of our main contributions is the auto-context random forest, which is illustrated below and was inspired by the auto-context framework introduced in [Tu, 2010]. The motivation is that it is not always possible to infer the probability of a pixel from local information alone; we also need to use context. We therefore learn a first classifier on some image features (here, a simple and standard random forest), and then exploit this forest as a notion of context. To do so, we learn a second forest, in chain, that uses features from both the original image and the probability map estimated by the first forest. This significantly improves classification performance and embeds both regularity and structural information in the pixelwise prediction.
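The chaining idea can be sketched as follows. Everything here is a toy stand-in: a 1D "image", raw intensity as the only stage-1 feature, and local averages of the stage-1 probability map as hypothetical context features for stage 2.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy 1D "image": a bright object in the middle, corrupted by strong noise.
size = 512
truth = np.zeros(size, dtype=int)
truth[200:320] = 1
image = truth + rng.normal(0.0, 0.8, size)

# Stage 1: a standard forest on image features alone (here, raw intensity).
f1 = RandomForestClassifier(n_estimators=50, random_state=0)
f1.fit(image[:, None], truth)
p1 = f1.predict_proba(image[:, None])[:, 1]          # first probability map

# Stage 2: a second forest, in chain, fed with both the image features and
# context features computed from the stage-1 map (local averages, two scales).
X2 = np.column_stack([image,
                      uniform_filter1d(p1, size=5),
                      uniform_filter1d(p1, size=25)])
f2 = RandomForestClassifier(n_estimators=50, random_state=0).fit(X2, truth)
p2 = f2.predict_proba(X2)[:, 1]                      # auto-context map
```

The context features let the second forest see how its neighborhood was classified, which is how regularity and structure enter the pixelwise prediction.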

## >> Kidney Detection via Robust Ellipsoid Detection

A large number of methods (e.g. Hough transforms [Guil & Zapata, 1997; McLaughlin, 1998]) have already been proposed to detect ellipses in images [Wong et al., 2012]. However, their extension to 3D, though possible, is usually computationally expensive, mainly because of the number of parameters to estimate (9 for a 3D ellipsoid). Furthermore, they do not explicitly use the fact that only one ellipsoid is present in the image. On the other hand, statistical approaches like robust Minimum Volume Ellipsoid (MVE) estimators [Van Aelst & Rousseeuw, 2009] are better suited but require prior knowledge of the proportion of outliers (here the noise, artifacts or neighboring structures), which may vary from one image to another and is thus not available. We therefore propose an original variational framework, both robust and fast, to estimate the best ellipsoid in an image.

In the considered framework, we represent an ellipsoid using an implicit function $\phi : \Omega \rightarrow \mathbb{R}$ parametrized by the center of the ellipsoid $\mathbf{c} \in \mathbb{R}^d$ and by its size and orientation, encoded in a $d \times d$ positive-definite matrix $M$. The detection method should be robust to outliers, i.e. bright voxels coming from noise, artifacts or other neighboring structures. Excluding those outliers is done by estimating a weighting function $w : \Omega \rightarrow [0, 1]$ that provides a confidence score for any point $x$ to be an inlier. The ellipsoid estimation is then formulated as an energy minimization problem with respect to $\mathbf{c}$, $M$ and $w$.

Such a formulation has a statistical interpretation: it amounts to fitting a Gaussian distribution to a subset of the bright pixels. It also turns out that its minimization is surprisingly easy using an alternate scheme on the parameters of the ellipsoid and the confidence function $w$. Closed-form solutions are available and fast to compute: the algorithm converges in a few iterations, typically on the order of a second for a 3D volume.
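The alternate scheme can be sketched as follows: a closed-form weighted Gaussian fit for $(\mathbf{c}, M)$, then a reweighting of $w$ from the Mahalanobis distance. The `fit_ellipsoid` helper, the hard thresholding, and the `chi2_cut` value are illustrative assumptions, not the exact updates of the paper.

```python
import numpy as np

def fit_ellipsoid(points, n_iter=10, chi2_cut=9.0):
    """Alternating scheme sketch: closed-form Gaussian fit + inlier reweighting.

    (c, M) come from the weighted mean and covariance of the points, then w
    keeps only points whose Mahalanobis distance marks them as inliers. The
    chi2_cut threshold is an illustrative choice, not the one from the paper.
    """
    w = np.ones(len(points))
    for _ in range(n_iter):
        c = np.average(points, axis=0, weights=w)        # closed-form center
        d = points - c
        M = (w[:, None] * d).T @ d / w.sum()             # weighted covariance
        maha = np.einsum('ij,jk,ik->i', d, np.linalg.inv(M), d)
        w = (maha < chi2_cut).astype(float)              # update confidences
    return c, M, w

# Toy 2D data: a bright blob of inliers plus far-away outlier structures.
rng = np.random.default_rng(1)
inliers = rng.normal(5.0, 1.0, (300, 2))
outliers = rng.uniform(15.0, 25.0, (30, 2))
c, M, w = fit_ellipsoid(np.vstack([inliers, outliers]))
# c should land near (5, 5), with the outliers driven to zero confidence.
```

Each half-step has a closed form, which is why the scheme converges in only a few iterations.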

We used this method for automatic detection of the kidney in 3D ultrasound and contrast-enhanced ultrasound images. The figure below shows some results on slices of two different datasets (for visualization purposes; the method is actually performed in 3D). The detected ellipsoid is then used as the initialization of a model-based segmentation algorithm.