Publications in learning deep networks
  - Marc'Aurelio Ranzato, Martin Szummer. Semi-supervised Learning of Compact Document Representations with Deep Networks. Proc. Intl. Conf. on Machine Learning (ICML), July 2008, pp. 792-799.
  
 Finding good representations of text documents is crucial in information
retrieval and classification systems. Today the most popular document
representation is based on a vector of word counts in the document. This
representation neither captures dependencies between related words, nor
handles synonyms or polysemous words.
In this paper, we propose an algorithm to learn text document representations
based on semi-supervised autoencoders that are stacked to form a deep
network.  The model can be trained efficiently on partially labeled corpora,
producing very compact representations of documents, while retaining as
much class information and joint word statistics as possible.  We show that
it is advantageous to exploit even a few labeled samples during training.
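A minimal sketch of the kind of architecture described above: stacked autoencoder layers trained greedily, each reconstructing its input on all documents and predicting the class on the labeled subset. The layer sizes, loss weighting and optimizer below are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class SemiSupLayer(nn.Module):
    """One autoencoder layer with an auxiliary classifier on its code."""
    def __init__(self, n_in, n_code, n_classes):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(n_in, n_code), nn.ReLU())
        self.decode = nn.Linear(n_code, n_in)
        self.classify = nn.Linear(n_code, n_classes)

def layer_loss(layer, x, y, alpha=0.1):
    """Reconstruction on all docs; classification only where labels exist."""
    code = layer.encode(x)
    loss = nn.functional.mse_loss(layer.decode(code), x)
    labeled = y >= 0                        # convention: y = -1 means unlabeled
    if labeled.any():
        loss = loss + alpha * nn.functional.cross_entropy(
            layer.classify(code[labeled]), y[labeled])
    return loss, code.detach()              # detached code feeds the next layer

# Greedy layer-wise training on word-count vectors (sizes are assumptions).
x = torch.rand(64, 2000)                    # 64 documents, 2000-word vocabulary
y = torch.full((64,), -1)                   # mostly unlabeled corpus ...
y[:8] = torch.randint(0, 20, (8,))          # ... with a few labeled documents
for n_in, n_code in [(2000, 200), (200, 20)]:
    layer = SemiSupLayer(n_in, n_code, n_classes=20)
    opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
    for _ in range(50):
        loss, code = layer_loss(layer, x, y)
        opt.zero_grad(); loss.backward(); opt.step()
    x = code                                # 20-dim compact codes after last layer
```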
Publications in spoken dialog systems
- M. Szummer, M. Henderson, C. Breslin, M. Gašić, D. Kim, B. Thomson, P. Tsiakoulis, S. Young. The BUDS POMDP Dialogue System. Advances in Neural Information Processing Systems (NIPS) Demo, 2012.
- M. Gašić, C. Breslin, M. Henderson, D. Kim, M. Szummer, B. Thomson, P. Tsiakoulis, S. Young. POMDP-based dialogue manager adaptation to extended domains. SIGDIAL 2013. Best Paper Award.
- M. Gašić, D. Kim, P. Tsiakoulis, C. Breslin, M. Henderson, M. Szummer, B. Thomson, S. Young. Incremental on-line adaptation of POMDP-based dialogue managers to extended domains. Interspeech 2014.
- P. Tsiakoulis, C. Breslin, M. Gašić, M. Henderson, D. Kim, M. Szummer, B. Thomson, S. Young. Dialogue Context Sensitive HMM-Based Speech Synthesis. ICASSP 2014.
- M. Gašić, C. Breslin, M. Henderson, D. Kim, M. Szummer, B. Thomson, P. Tsiakoulis and S. Young. On-line policy optimisation of Bayesian spoken dialogue systems via human interaction. ICASSP 2013.
- S. Young, C. Breslin, M. Gašić, M. Henderson, D. Kim, M. Szummer, B. Thomson, P. Tsiakoulis and E. Tzirkel Hancock. Evaluation of Statistical POMDP-based Dialogue Systems in Noisy Environments. International Workshop on Spoken Dialogue Systems (IWSDS), 2014.
- C. Breslin, M. Gašić, M. Henderson, D. Kim, M. Szummer, B. Thomson, P. Tsiakoulis, K. Yu and S. Young. Continuous ASR for Flexible Incremental Dialogue. Intl. Conf. Acoustics, Speech and Signal Processing (ICASSP).
Publications in information retrieval
  - Martin Szummer, Emine Yilmaz. Semi-supervised Learning to Rank with Preference Regularization. Conf. Information and Knowledge Management (CIKM) Poster, October 2011.
 We propose a semi-supervised learning to rank algorithm. It learns from both labeled data (pairwise preferences or absolute labels) and unlabeled data. The data can consist of multiple groups of items (such as queries), some of which may contain only unlabeled items. We introduce a preference regularizer that encourages similar items to take similar preference values. The regularizer captures manifold structure in the data, and we also propose a rank-sensitive version designed for top-heavy retrieval metrics including NDCG and mean average precision.
 The regularizer is employed in SSLambdaRank, a semi-supervised version of LambdaRank. This algorithm directly optimizes popular retrieval metrics and improves retrieval accuracy over LambdaRank, a state-of-the-art ranker that was used as part of the winner of the Yahoo! Learning to Rank challenge 2010. The algorithm runs in linear time in the number of queries, and can work with huge datasets.
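Schematically, the approach adds a graph-smoothness penalty to the supervised ranking loss; this is a reconstruction from the abstract in our own notation, not the paper's exact formulation:

\[
\min_f \;\; \sum_{(i \succ j)\ \text{labeled}} \ell\big(f(x_i)-f(x_j)\big)
\;+\; \lambda \sum_{i,j} w_{ij}\,\big(f(x_i)-f(x_j)\big)^2 ,
\]

where \(w_{ij}\) is the similarity between items \(i\) and \(j\) (labeled or unlabeled), and the rank-sensitive variant reweights the second sum to emphasize pairs near the top of the ranking.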
- Chang Wang, Emine Yilmaz, Martin Szummer. Relevance Feedback Exploiting Query-Specific Document Manifolds. Conf. Information and Knowledge Management (CIKM), October 2011.
 We incorporate relevance feedback into a learning to rank framework by exploiting query-specific document similarities. Given a few judged feedback documents and many retrieved but unjudged documents for a query, we learn a function that adjusts the initial ranking score of each document. Scores are fit so that documents with similar term content get similar scores, and scores of judged documents are close to their labels. By such smoothing along the manifold of retrieved documents, we avoid overfitting, and can therefore learn a detailed query-specific scoring function with several dozen term weights.
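A minimal sketch of this kind of score smoothing, assuming a cosine-similarity document graph and a quadratic penalty with a closed-form solve; the interpolation weights are illustrative, not the paper's:

```python
import numpy as np

def smooth_scores(X, s0, judged, labels, mu=1.0, lam=10.0):
    """Adjust initial ranking scores over a query-specific document manifold.

    X: (n, d) term vectors of retrieved documents; s0: (n,) initial scores;
    judged: indices of feedback documents; labels: their judgments.
    Minimizes  f'Lf + mu*||f - s0||^2 + lam*sum_judged (f_i - label_i)^2,
    which has the closed-form solution below.
    """
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    W = np.clip(Xn @ Xn.T, 0.0, None)        # cosine similarity graph
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W           # graph Laplacian
    A = L + mu * np.eye(len(s0))
    b = mu * np.asarray(s0, dtype=float)
    A[judged, judged] += lam                 # anchor judged docs to their labels
    b[judged] += lam * np.asarray(labels, dtype=float)
    return np.linalg.solve(A, b)             # smoothed scores for re-ranking

rng = np.random.default_rng(0)
f = smooth_scores(rng.random((100, 50)), rng.random(100),
                  judged=np.array([0, 1, 2]), labels=np.array([1.0, 0.0, 1.0]))
```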
- Daniel Sheldon, Milad Shokouhi, Martin Szummer, Nick Craswell. LambdaMerge: Merging the Results of Query Reformulations. Web Search and Data Mining (WSDM) Poster, February 2011.
 Search engines can automatically reformulate user queries in a variety of ways, often leading to multiple queries that are candidates to replace the original. However, selecting a replacement can be risky: a reformulation may be more effective than the original or significantly worse, depending on the nature of the query, the source of reformulation candidates, and the corpus.  In this paper, we explore methods to mitigate this risk by issuing several versions of the query (including the original) and merging their results. We focus on reformulations generated by random walks on the click graph, a method that can produce very good reformulations but is also variable and prone to topic drift.  Our primary contribution is LambdaMerge, a supervised merging method that is trained to directly optimize a retrieval metric (such as NDCG or MAP) using features that describe both the reformulations and the documents they return.  In experiments on Bing data and GOV2, LambdaMerge outperforms the original query and several unsupervised merging methods. LambdaMerge also outperforms a supervised method to predict and select the best single formulation, and is competitive with an oracle that always selects the best formulation.
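The merging step can be pictured as a gated linear mixture: a small gating network scores each reformulation from query-level features (e.g. drift and overlap signals), and each document's final score sums its per-reformulation scores weighted by those gates. A sketch with assumed feature shapes; the real model learns these weights against NDCG or MAP rather than using the stand-ins here:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def merge(query_feats, doc_feats, doc_ids, w_gate, w_doc):
    """query_feats: (k, q) features of k reformulations;
    doc_feats: list of k (n_i, d) arrays for the docs each one returned;
    doc_ids: list of k id arrays aligned with doc_feats."""
    gates = softmax(query_feats @ w_gate)           # confidence in each reformulation
    scores = {}
    for g, F, ids in zip(gates, doc_feats, doc_ids):
        for s, d in zip(F @ w_doc, ids):
            scores[d] = scores.get(d, 0.0) + g * s  # gated sum over result lists
    return sorted(scores, key=scores.get, reverse=True)

# Two reformulations returning overlapping result lists:
merged = merge(np.array([[0.9, 0.1], [0.4, 0.6]]),
               [np.random.rand(3, 4), np.random.rand(2, 4)],
               [["d1", "d2", "d3"], ["d2", "d4"]],
               w_gate=np.array([1.0, -1.0]), w_doc=np.ones(4))
```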
- Lorenzo Torresani, Martin Szummer, Andrew Fitzgibbon. Efficient Object Category Recognition using Classemes. European Conference on Computer Vision (ECCV), September 2010.
 We introduce a new descriptor for images which allows the construction of efficient and compact classifiers with good accuracy on object category recognition.  The descriptor is the output of a large
number of weakly trained object category classifiers on the image.  The
trained categories are selected from an ontology of visual concepts,
but the intention is not to encode an explicit decomposition of the scene.
Rather, we accept that existing object category classifiers often encode
not the category per se but ancillary image characteristics; and that these ancillary characteristics can combine to represent visual classes unrelated to the constituent categories' semantic meanings.
 The advantage of this descriptor is that it allows object-category queries
to be made against image databases using efficient classifiers (efficient
at test time) such as linear support vector machines, and allows these
queries to be for novel categories.  Even when the representation is
reduced to 200 bytes per image, classification accuracy on object category
recognition is comparable with the state of the art (36% versus 42%), but
at orders of magnitude lower computational cost.
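In outline, the descriptor is just the vector of outputs of many pre-trained weak classifiers, optionally binarized; object-category queries then reduce to a cheap linear scoring pass over the database. The classifier count, feature dimension and random stand-in weights below are assumptions:

```python
import numpy as np

def classeme(feats, W):
    """feats: (d,) low-level image features; W: (C, d) weak classifier weights.
    Returns C bits: which of the C trained concepts fire on this image."""
    return (W @ feats > 0).astype(np.uint8)

rng = np.random.default_rng(0)
W = rng.standard_normal((1600, 512))           # 1600 concepts, 512-d features (assumed)
db = np.array([classeme(rng.standard_normal(512), W) for _ in range(1000)])
svm_w = rng.standard_normal(1600)              # stand-in for a trained linear SVM
ranking = np.argsort(-(db @ svm_w))            # rank the whole database per query
```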
 
 
- Filip Radlinski, Martin Szummer, Nick Craswell. Metrics for Assessing Sets of Subtopics. SIGIR Conf. Research and Development in Information Retrieval, July 2010.
 To evaluate the diversity of search results, test collections have been
developed that identify multiple intents for each query. Intents are
the different meanings or facets that should be covered in a search results
list. This means that topic development involves proposing a set of
intents. We propose four measurable properties of query-to-intent mappings,
allowing for more principled topic development for such test collections.
 
 
- Filip Radlinski, Martin Szummer, Nick Craswell. Inferring Query Intent from Reformulations and Clicks. World Wide Web Conference (WWW), April 2010.
 Many researchers have noted that web search queries are often ambiguous or
unclear. We present an approach for identifying the popular meanings of
queries using web search logs and user click behavior. Evaluating on TREC
queries, we show that our approach produces more complete and user-centric
intents than expert judges. This approach was also used by the TREC 2009 Web
Track judges to obtain more representative topic descriptions from real
queries.
 
 
- Lorenzo Torresani, Martin Szummer, Andrew Fitzgibbon. Learning Query-dependent Prefilters for Scalable Image Retrieval (supplement). Proc. Comp. Vision Pattern Recogn. (CVPR), June 2009.
 We describe an algorithm for similar-image search which is designed to be
efficient for extremely large collections of images.  For each query, a
small response set is selected by a fast prefilter, after which
a more accurate ranker may be applied to each image in the response set.
We consider a class of prefilters comprising disjunctions of conjunctions
("ORs of ANDs") of Boolean features. AND filters can be implemented
efficiently using skipped inverted files, a key component of
web-scale text search engines.  These structures permit search in time
proportional to the response set size.  The prefilters are learned from
training examples, and refined at query time to produce an approximately
bounded response set.
 We cast prefiltering as an optimization problem: for each test query,
select the OR-of-AND filter which maximizes training-set recall for an
adjustable bound on response set size.  This may be efficiently implemented
by selecting from a large pool of candidate conjunctions of Boolean
features using a linear program relaxation.  Tests on object class
recognition show that this relatively simple filter
is nevertheless powerful enough to capture some
semantic information.
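The response-set computation itself is simple to sketch: each AND is an intersection of posting lists (which skipped inverted files make fast), and the OR unions the results. A toy version, with made-up Boolean feature names:

```python
def or_of_ands(inverted, filters):
    """Response set of an OR-of-ANDs prefilter over Boolean features.

    inverted: dict feature -> sorted list of image ids containing it
    filters:  list of conjunctions, each a list of feature names
    """
    response = set()
    for conj in filters:
        lists = [inverted.get(f, []) for f in conj]
        if not all(lists):
            continue                              # some feature matches nothing
        lists.sort(key=len)                       # intersect smallest list first
        acc = set(lists[0])
        for post in lists[1:]:
            acc.intersection_update(post)         # the AND step
        response |= acc                           # the OR step
    return response

inverted = {"furry": [1, 2, 5], "round": [2, 3, 5], "striped": [4, 5]}
print(or_of_ands(inverted, [["furry", "round"], ["striped"]]))  # {2, 4, 5}
```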
 
 
- Martin Szummer, Nick Craswell. Behavioral Classification on the Click Graph. World Wide Web Conference, April 2008, pp. 1241-1242.
 A bipartite query-URL graph, where an edge 
  indicates that a document was clicked for a query, is a useful construct for 
  finding groups of related queries and URLs. Here we use this behavior graph 
  for classification. We choose a click graph sampled from two weeks of image 
  search activity, and the task of ``adult'' filtering: identifying content in 
  the graph that is inappropriate for minors. We show how to perform 
  classification using random walks on this graph, and two methods for 
  estimating classifier parameters.
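A rough sketch of random-walk classification on a bipartite click graph, starting from a handful of manually judged seed queries; the alternating-step scheme and clamping are our simplification of the paper's formulation:

```python
import numpy as np

def walk_class_probs(C, seed, steps=3):
    """Classify click-graph nodes by random-walk label propagation.

    C: (nq, nd) query-by-URL click counts; seed: (nq, k) one-hot labels
    for the few judged queries (zero rows elsewhere).
    """
    Pqd = C / C.sum(axis=1, keepdims=True)         # query -> URL transitions
    Pdq = C.T / C.sum(axis=0, keepdims=True).T     # URL -> query transitions
    q, labeled = seed.astype(float), seed.sum(axis=1) > 0
    for _ in range(steps):
        d = Pdq @ q                                # URLs inherit query mass
        q = Pqd @ d                                # and pass it back to queries
        q[labeled] = seed[labeled]                 # keep judged queries clamped
    return q, d                                    # class probabilities per node

C = np.array([[5, 0, 1], [0, 3, 2], [1, 2, 0]], dtype=float)  # 3 queries x 3 URLs
seed = np.zeros((3, 2)); seed[0, 1] = 1                       # query 0 judged 'adult'
q_probs, d_probs = walk_class_probs(C, seed)
```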
 
 
- Onno Zoeter, Michael Taylor, Ed Snelson, John Guiver, Nick Craswell, Martin Szummer. A Decision Theoretic Framework for Ranking using Implicit Feedback. SIGIR 2008 Workshop on Learning to Rank for Information Retrieval, July 2008.
 This paper presents a decision theoretic ranking system that incorporates 
  both explicit and implicit feedback. The system has a model that predicts, 
  given all available data at query time, different interactions a person might 
  have with search results. Possible interactions include relevance labelling 
  and clicking. We define a utility function that takes as input the outputs of 
  the interaction model to provide a real valued score to the user's session. 
  The optimal ranking is the list of documents that, in expectation under the 
  model, maximizes the utility for a user session. The system presented is based 
  on a simple example utility function that combines both click behavior and 
  labelling. The click prediction model is a Bayesian generalized linear model. 
  Its notable characteristic is that it incorporates both weights for 
  explanatory features and weights for each query-document pair. This allows the 
  model to generalize to unseen queries but makes it at the same time flexible 
  enough to keep in a `memory' where the model should deviate from its feature 
  based prediction. Such a click-predicting model could be particularly useful 
  in an application such as enterprise search, allowing on-site adaptation to 
  local documents and user behaviour. The example utility function has a 
  parameter that controls the tradeoff between optimizing for clicks and 
  optimizing for labels. Experimental results in the context of enterprise 
  search show that a balance in the tradeoff leads to the best NDCG and good 
  (predicted) clickthrough.
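In symbols, the ranking criterion amounts to choosing the list with the highest expected utility under the interaction model (our notation, not necessarily the paper's):

\[
\pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}_{o \,\sim\, p(o \mid \pi, q)}\big[\,U(o)\,\big],
\]

where \(\pi\) ranges over document lists for query \(q\), \(o\) denotes the predicted interactions (clicks, relevance labels), and \(U\) is the utility with the click/label tradeoff parameter described above.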
 
 
- Nick Craswell, Martin Szummer. Random Walks on the Click Graph. SIGIR Conf. Research and Development in Information Retrieval, July 2007, pp. 239-246.
 Search engines can 
  record which documents were clicked for which query, and use these 
  query-document pairs as 'soft' relevance judgments. However, compared to the 
  true judgments, click logs give noisy and sparse relevance information. We 
  apply a Markov random walk model to a large click log, producing a 
  probabilistic ranking of documents for a given query. A key advantage of the 
  model is its ability to retrieve relevant documents that have not yet been 
  clicked for that query and rank those effectively. We conduct experiments on 
  click logs from image search, comparing our ('backward') random walk model to 
  a different ('forward') random walk, varying parameters such as walk length 
  and self-transition probability. The most effective combination is a long 
  backward walk with high self-transition probability.
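Concretely, one step of such a walk moves from node \(i\) to a clicked neighbor \(j\) in proportion to click volume, or stays put with self-transition probability \(s\) (our notation):

\[
P(j \mid i) \;=\; (1-s)\,\frac{C_{ij}}{\sum_{k} C_{ik}}, \qquad P(i \mid i) \;=\; s,
\]

where \(C_{ij}\) counts clicks between query/document nodes \(i\) and \(j\); the backward variant then ranks documents for a query roughly by the probability that a walk ending at the query started at each document.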
 
 
Publications on Handwriting Adaptation 
(Personalization)
  - Martin Szummer, Christopher M. Bishop. Discriminative Writer Adaptation. 10th Intl. Workshop on Frontiers in Handwriting Recognition (IWFHR), October 2006, pp. 293-298.
 We propose a general method 
  for adapting a writer-independent classifier to an individual writer. We 
  employ a mixture of experts formulation, where the classifiers are trained on 
  weighted clusters of writers. The clusters are determined by which experts 
  classify individual writing correctly. The method adapts by choosing the 
  appropriate combination of classifiers for a new user. It applies to any 
  probabilistic discriminative classifier, and adapts discriminatively without 
  modeling the input feature distribution. We apply the method to online 
  character recognition. Specifically, we use a mixture of neural networks as 
  well as a mixture of logistic regressions. We train the mixture via conjugate 
  gradient ascent or via the EM algorithm on 192,000 Latin characters of 98 
  classes and 216 writers, and show adaptation results for 21 
  writers.
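A toy version of the adaptation step: given a few characters from the new writer, reweight the pre-trained experts by how well each one explains them, via an EM-style update on the mixture weights (the names and update details are our assumptions):

```python
import numpy as np

def adapt_expert_weights(expert_probs, y, iters=20):
    """Choose mixture weights for a new writer (EM-style reweighting).

    expert_probs: (m, n, k) class probabilities from m experts on the
    writer's n adaptation characters; y: (n,) their true labels.
    """
    m, n, _ = expert_probs.shape
    like = expert_probs[:, np.arange(n), y]        # (m, n) P(correct | expert)
    w = np.full(m, 1.0 / m)
    for _ in range(iters):
        r = w[:, None] * like                      # E-step: responsibilities
        r /= r.sum(axis=0, keepdims=True)
        w = r.mean(axis=1)                         # M-step: new mixture weights
    return w                                       # combine experts at test time

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=(4, 30))   # 4 experts, 30 chars, 10 classes
w = adapt_expert_weights(probs, rng.integers(0, 10, 30))
```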
 
 
Publications on Conditional Random Fields, applied to 
hand-drawing analysis
  - Martin Szummer, Pushmeet Kohli, Derek Hoiem. Learning Random Fields using Graph Cuts. Book chapter in an MIT Press volume on MRFs, edited by Andrew Blake, Carsten Rother and Pushmeet Kohli, October 2010.
- Martin Szummer, Pushmeet Kohli, Derek Hoiem. Learning CRFs using Graph Cuts. European Conference on Computer Vision, October 2008.
 Many computer vision problems are naturally formulated as 
  random fields, specifically MRFs or CRFs. The introduction of graph cuts has 
  enabled efficient and optimal inference in associative random fields, greatly 
  advancing applications such as segmentation, stereo reconstruction and many 
  others. However, while fast inference is now widespread, parameter learning in 
  random fields has remained an intractable problem. This paper shows how to 
  apply fast inference algorithms, in particular graph cuts, to learn parameters 
  of random fields with similar efficiency. We find optimal parameter values 
  under standard regularized objective functions that ensure good 
  generalization. Our algorithm enables learning of many parameters in 
  reasonable time, and we explore further speedup techniques. We also discuss 
  extensions to non-associative and multi-class problems. We evaluate the method 
  on image segmentation and geometry recognition.
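The learning loop can be sketched as max-margin updates driven by a fast MAP solver: run loss-augmented inference with graph cuts, then step the parameters toward the ground truth and away from the worst violator. `graphcut_map` and `feat` are hypothetical stand-ins, and the projection needed to keep the energy submodular is omitted:

```python
import numpy as np

def learn_crf(examples, feat, graphcut_map, lr=0.1, reg=1e-3, epochs=10):
    """Subgradient max-margin learning of random field parameters.

    examples: list of (x, y) pairs; feat(x, y) -> joint feature vector;
    graphcut_map(x, w, y) -> argmax_y'  w . feat(x, y') + loss(y', y),
    i.e. loss-augmented MAP inference, solvable by graph cuts while the
    implied pairwise energies stay associative (submodular).
    """
    w = np.zeros_like(feat(examples[0][0], examples[0][1]))
    for _ in range(epochs):
        for x, y in examples:
            y_hat = graphcut_map(x, w, y)            # most violating labeling
            w += lr * (feat(x, y) - feat(x, y_hat))  # raise truth, lower violator
            w *= 1.0 - lr * reg                      # L2 shrinkage
    return w
```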
 
 
- Carsten Rother, Vladimir Kolmogorov, Victor Lempitsky, Martin Szummer. Optimizing Binary MRFs via Extended Roof Duality. Proc. Comp. Vision Pattern Recogn. (CVPR), June 2007.
 Many computer vision applications rely on the 
  efficient optimization of challenging, so-called non-submodular, binary 
  pairwise MRFs. A promising graph cut based approach for optimizing such MRFs 
  known as "roof duality" was recently introduced into computer vision. We study 
  two methods which extend this approach. First, we discuss an efficient 
  implementation of the "probing" technique introduced recently by Boros et al. 
  2006. It simplifies the MRF while preserving the global optimum. Our code is 
  400-700 times faster on some graphs than the implementation of [Boros 2006]. Second, 
  we present a new technique which takes an arbitrary input labeling and tries 
  to improve its energy. We give theoretical characterizations of local minima 
  of this procedure. We applied both techniques to many applications, including 
  image segmentation, new view synthesis, super-resolution, diagram recognition, 
  parameter learning, texture restoration, and image deconvolution. For several 
  applications we see that we are able to find the global minimum very 
  efficiently, and considerably outperform the original roof duality approach. 
  In comparison to existing techniques, such as graph cut, TRW, BP, ICM, and 
  simulated annealing, we nearly always find a lower energy.
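For reference, the objects being optimized are binary pairwise energies of the standard form (notation ours, not copied from the paper):

\[
E(\mathbf{x}) \;=\; \sum_i \theta_i(x_i) \;+\; \sum_{(i,j)} \theta_{ij}(x_i, x_j), \qquad x_i \in \{0,1\},
\]

and an instance is non-submodular when some pairwise term violates \(\theta_{ij}(0,0) + \theta_{ij}(1,1) \le \theta_{ij}(0,1) + \theta_{ij}(1,0)\), which is what puts it beyond plain graph cuts and motivates roof duality.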
 
 
- Philip J. Cowans, Martin Szummer. A Graphical Model for Simultaneous Partitioning and Labeling. AI & Statistics (AISTATS), January 2005.
 In this work we develop a graphical model 
  for describing probability distributions over labeled partitions of an 
  undirected graph which are conditioned on observed data. We show how to 
  efficiently perform exact inference in these models, by exploiting the 
  structure of the graph and adapting the sum-product and max-product 
  algorithms. We demonstrate our approach on the task of segmenting and labeling 
  hand-drawn ink fragments, and show that a significant performance increase is 
  obtained by labeling and partitioning simultaneously.
 Best Student Paper Award
 
 
- Yuan Qi, Martin Szummer, Thomas P. Minka. Bayesian Conditional Random Fields. AI & Statistics (AISTATS), January 2005, pp. 269-276.
 We propose Bayesian Conditional Random Fields (BCRFs) for 
  classifying interdependent and structured data, such as sequences, images or 
  webs. BCRFs are a Bayesian approach to training and inference with conditional 
  random fields, which were previously trained by maximizing likelihood (ML) 
  (Lafferty et al., 2001). Our framework avoids the problem of overfitting, and 
  offers the full advantages of a Bayesian treatment. Unlike the ML approach, we 
  estimate the posterior distribution of the model parameters during training, 
  and average predictions over this posterior during inference. We apply two 
  extensions of expectation propagation (EP), the power EP and the novel 
  transformed EP methods, to incorporate the partition function. For algorithmic 
  stability and accuracy, we flatten the approximation structures to avoid 
  two-level approximations. We demonstrate the superior prediction accuracy of 
  BCRFs over conditional random fields trained with ML or MAP on synthetic and 
  real datasets.
 
 
- Yuan Qi, Martin Szummer, Thomas P. Minka. Diagram Structure Recognition by Bayesian Conditional Random Fields. Proc. Comp. Vision Pattern Recogn. (CVPR), June 2005, pp. 191-196. C. Schmid, S. Soatto and C. Tomasi, eds.
 Hand-drawn diagrams present a complex recognition problem. 
  Elements of the diagram are often individually ambiguous, and require context 
  to be interpreted. We present a recognition method based on Bayesian 
  conditional random fields (BCRFs) that jointly analyzes all drawing elements 
  in order to incorporate contextual cues. The classification of each object 
  affects the classification of its neighbors. BCRFs allow flexible and 
  correlated features, and take both spatial and temporal information into 
  account. BCRFs estimate the posterior distribution of parameters during 
  training, and average predictions over the posterior for testing. As a result 
  of model averaging, BCRFs avoid the overfitting problems associated with 
  maximum likelihood training. We also incorporate Automatic Relevance 
  Determination (ARD), a Bayesian feature selection technique, into BCRFs. The 
  result is significantly lower error rates compared to ML- and MAP-trained 
  CRFs.
 
 
- Martin Szummer. Learning Diagram Parts with Hidden Random Fields. Intl. Conf. Document Analysis and Recognition (ICDAR), August 2005, pp. 1188-1193.
 Many diagrams 
  contain compound objects composed of parts. We propose a recognition framework 
  that learns parts in an unsupervised way, and requires training labels only 
  for compound objects. Thus, human labeling effort is reduced and parts are not 
  predetermined, instead appropriate parts are discovered based on the data. We 
  model contextual relations between parts, such that the label of a part can 
  depend simultaneously on the labels of its neighbors, as well as spatial and 
  temporal information. The model is a Hidden Random Field (HRF), an extension 
  of a Conditional Random Field. We apply it to find parts of boxes, arrows and 
  flowchart shapes in hand-drawn diagrams, and also demonstrate improved 
  recognition accuracy over the conditional random field model without 
  parts.
 
 
- Martin Szummer, Yuan Qi. Contextual Recognition of Hand-drawn Diagrams with Conditional Random Fields. 9th Intl. Workshop on Frontiers in Handwriting Recognition (IWFHR), October 2004, pp. 32-37. F. Kimura and H. Fujisawa, eds.
 Hand-drawn diagrams present a 
  complex recognition problem. Fragments of the drawing are often individually 
  ambiguous, and require context to be interpreted. We present a recognizer 
  based on conditional random fields (CRFs) that jointly analyzes all drawing 
  fragments in order to incorporate contextual cues. The classification of each 
  fragment influences the classification of its neighbors. CRFs allow flexible 
  and correlated features, and take temporal information into account. Training 
  is done via conditional MAP estimation that is guaranteed to reach the global 
  optimum. During recognition we propagate information globally to find the 
  joint MAP or maximum marginal solution for each fragment. We demonstrate the 
  framework on a container versus connector recognition task.
 
 
- Martin Szummer, Philip J. Cowans. Incorporating Context and User Feedback in Pen-Based Interfaces. AAAI Fall Symposium, Making Pen-Based Interaction Intelligent and Natural, October 2004, pp. 159-166, FS-04-06. R. Davis and J. Landay et al., eds.
 We propose a joint 
  probabilistic model for grouping and labeling hand-drawn ink strokes. We 
  demonstrate that simultaneous grouping and labeling yields superior accuracy 
  to labeling alone. Our probabilistic formulation has many advantages: exact 
  inference is feasible, and we obtain confidence estimates. We show how to 
  incorporate user feedback by conditioning our model, and discuss different 
  types of inference tasks suited for various user interactions.
 
 
- Balaji Krishnapuram, Christopher M. Bishop, Martin Szummer. Generative Models and Bayesian Model Comparison for Shape Recognition. 9th Intl. Workshop on Frontiers in Handwriting Recognition (IWFHR), October 2004, pp. 20-25. F. Kimura and H. Fujisawa, eds.
 Recognition of hand-drawn shapes is an 
  important and widely studied problem. By adopting a generative probabilistic 
  framework we are able to formulate a robust and flexible approach to shape 
  recognition which allows for a wide range of shapes and which can recognize 
  new shapes from a single exemplar. It also provides meaningful probabilistic 
  measures of model score which can be used as part of a larger probabilistic 
  framework for interpreting a page of ink. We also show how Bayesian model 
  comparison allows the trade-off between data fit and model complexity to be 
  optimized automatically.
 
 
Publications on learning from partially labeled data (semi-supervised learning)
  - Martin Szummer, Tommi Jaakkola. Information Regularization with Partially Labeled Data. Advances in Neural Information Processing Systems (NIPS) 15, January 2003, pp. 1025-1032.
 Classification with partially labeled data requires using a large number 
  of unlabeled examples (or an estimated marginal P(x)), to further constrain 
  the conditional P(y|x) beyond a few available labeled examples. We formulate a 
  regularization approach to linking the marginal and the conditional in a 
  general way. The regularization penalty measures the information that is 
  implied about the labels over covering regions. No parametric assumptions are 
  required and the approach remains tractable even for continuous marginal 
  densities P(x). We develop algorithms for solving the regularization problem 
  for finite covers, establish a limiting differential equation, and exemplify 
  the behavior of the new regularization approach in simple cases.
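Schematically, the penalty sums, over a cover of small regions, the information that location carries about the label within each region (our reconstruction from the abstract; the paper's exact weighting may differ):

\[
J\big(P(y \mid x)\big) \;=\; \sum_{Q \in \mathcal{C}} p(Q)\, I_Q(x; y),
\]

where \(\mathcal{C}\) is the cover, \(p(Q)\) is the marginal mass of region \(Q\), and \(I_Q(x;y)\) is the mutual information between \(x\) and \(y\) with \(x\) restricted to \(Q\); minimizing \(J\) subject to fitting the labeled points discourages label variation inside dense regions.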
 
 
- Chen-Hsiang Yeang, Martin Szummer. Markov Random Walk Representations with Continuous Distributions. Proc. Uncertainty in Artificial Intelligence (UAI), August 2003, pp. 600-607. U. Kjærulff and C. Meek, eds.
 Representations based on random walks can exploit discrete 
  data distributions for clustering and classification. We extend such 
  representations from discrete to continuous distributions. Transition 
  probabilities are now calculated using a diffusion equation with a diffusion 
  coefficient that varies inversely with the data density. We relate this 
  diffusion equation to a path integral and derive the corresponding path 
  probability measure. The framework is useful for incorporating continuous data 
  densities and prior knowledge.
 
 
- Martin Szummer, Tommi Jaakkola. Partially labeled classification with Markov random walks. Advances in Neural Information Processing Systems (NIPS) 14, January 2002, pp. 945-952.
 To classify a large number of unlabeled examples we combine a limited 
  number of labeled examples with a Markov random walk representation over the 
  unlabeled examples. The random walk representation exploits any low 
  dimensional structure in the data in a robust, probabilistic manner. We 
  develop and compare several estimation criteria/algorithms suited to this 
  representation. This includes in particular multi-way classification with an 
  average margin criterion which permits a closed form solution. The time scale 
  of the random walk regularizes the representation and can be set through a 
  margin-based criterion favoring unambiguous classification. We also extend 
  this basic regularization by adapting time scales for individual examples. We 
  demonstrate the approach on synthetic examples and on text classification 
  problems.
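A compact sketch of the representation: build a kNN graph, run the walk for t steps, and use each point's t-step distribution as its representation; k, t and the kernel width below are tunable assumptions:

```python
import numpy as np

def random_walk_representation(X, t=8, k=5, sigma=1.0):
    """t-step Markov random walk representation over a kNN graph.

    Returns P, where P[i, j] = prob. of being at j after t steps from i;
    row i serves as a smoothed, manifold-aware representation of point i.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    keep = np.argsort(d2, axis=1)[:, : k + 1]      # k nearest neighbours + self
    mask = np.zeros_like(W, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    W = np.where(mask | mask.T, W, 0.0)            # symmetric sparse affinities
    A = W / W.sum(axis=1, keepdims=True)           # one-step transition matrix
    return np.linalg.matrix_power(A, t)            # t-step transitions

P = random_walk_representation(np.random.rand(30, 2))
# Labels of unlabeled points can then be inferred by averaging the label
# distributions of the walk's likely start points.
```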
 
 
- Martin Szummer, Tommi Jaakkola. Kernel expansions with unlabeled examples. Advances in Neural Information Processing Systems (NIPS) 13, January 2001, pp. 626-632.
 Modern 
  classification applications necessitate supplementing the few available 
  labeled examples with unlabeled examples to improve classification 
  performance. We present a new tractable algorithm for exploiting unlabeled 
  examples in discriminative classification. This is achieved essentially by 
  expanding the input vectors into longer feature vectors via both labeled and 
  unlabeled examples. The resulting classification method can be interpreted as 
  a discriminative kernel density estimate and is readily trained via the EM 
  algorithm, which in this case is both discriminative and achieves the optimal 
  solution. We provide, in addition, a purely discriminative formulation of the 
  estimation problem by appealing to the maximum entropy framework. We 
  demonstrate that the proposed approach requires very few labeled examples for 
  high classification accuracy.
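Roughly, the expansion can be read as a kernel density estimate with discriminatively fit per-example label responsibilities (our notation):

\[
P(y \mid x) \;\propto\; \sum_{i=1}^{N} \hat{P}(y \mid i)\, K(x, x_i),
\]

where the sum runs over both labeled and unlabeled examples, \(K\) is a kernel, and the responsibilities \(\hat{P}(y \mid i)\) are estimated by EM (or, in the maximum-entropy formulation, purely discriminatively).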
 
 
Publications on image retrieval and texture modeling
  - Wolfgang Sörgel, Sabine Girod, Martin Szummer, Bernd Girod. Computer Aided Diagnosis of Bone Lesions in the Facial Skeleton. Workshop Bildverarbeitung für die Medizin, Aachen, Germany, March 1998.
 We 
  present a system for computer aided diagnosis of bone tumors in the facial 
  skeleton. There are many different lesions with radiographic 
  manifestation in the jaws. Our system helps perform the differential 
  diagnosis of these lesions. The input is a digitized orthopantomograph (OPG) in which 
  the user marks the position of the lesion with a single mouse click. An active 
  contour model then automatically finds the boundaries of the lesion. 
  Graylevel histograms, MRSAR texture features and Gabor filter 
  features are computed for the lesion region. These features are then combined 
  and used to query a database containing expert-diagnosed reference cases. The 
  result is a number of similar cases, with tumor position marked and with 
  available expert annotations. We show good agreement between our results and 
  differential diagnosis given by humans. The system is also a suitable tool for 
  training and education.
 
 
- Martin Szummer, Rosalind W. Picard. Indoor-Outdoor Image Classification. IEEE International Workshop on Content-Based Access of Image and Video Databases (CAIVD), Bombay, India, January 1998, pp. 42-51.
 We show how high-level scene properties can be inferred from 
  classification of low-level image features, specifically for the 
  indoor-outdoor scene retrieval problem. We systematically studied the 
  features: (1) histograms in the Ohta color space; (2) multiresolution 
  simultaneous autoregressive model parameters; and (3) coefficients of a 
  shift-invariant DCT. We demonstrate that performance is improved by computing 
  features on subblocks, classifying these subblocks, and then combining these 
  results in a way reminiscent of ``stacking.'' State of the art single-feature 
  methods are shown to result in about 75-86% performance, while the new 
  method results in 90.3% correct classification, when evaluated on a diverse 
  database of over 1300 consumer images provided by Kodak.
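The subblock-and-combine recipe is easy to sketch; the 4x4 grid, the color histogram feature and the two stand-in classifiers below are illustrative assumptions rather than the paper's exact components:

```python
import numpy as np

def subblock_features(img, grid=4):
    """Split an image into grid x grid subblocks; color histogram per block."""
    h, w, _ = img.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            block = img[i * h // grid:(i + 1) * h // grid,
                        j * w // grid:(j + 1) * w // grid]
            hist, _ = np.histogramdd(block.reshape(-1, 3), bins=4,
                                     range=[(0, 256)] * 3)
            feats.append(hist.ravel() / hist.sum())
    return feats                                    # 16 subblock descriptors

def classify_image(img, block_clf, stack_clf):
    """Stage 1 scores each subblock; stage 2 'stacks' the 16 scores."""
    scores = [block_clf(f) for f in subblock_features(img)]
    return stack_clf(np.array(scores))              # combined scene label

img = np.random.randint(0, 256, (64, 64, 3))
label = classify_image(img, block_clf=lambda f: float(f[:32].sum()),  # toy stage 1
                       stack_clf=lambda s: int(s.mean() > 0.5))       # toy stage 2
```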
 
 
- Rosalind W. Picard, Thomas P. Minka, Martin Szummer. Modeling User Subjectivity In Image Libraries. IEEE Intl. Conf. on Image Processing (ICIP), Lausanne, Switzerland, September 1996, vol. 2, pp. 777-780.
 Beyond the problem of which image analysis models to use in digital 
  libraries (e.g. wavelet, Wold, color histograms) lies the problem of how to 
  combine these models, each with different strengths. Most present systems 
  place the burden of combination on the user, e.g. the user specifies 50% 
  texture features, 20% color features, etc. This is a problem since most users 
  do not know how to best pick the settings for the given data and search 
  problem. This paper addresses this problem, describing research in progress 
  for a system that (1) automatically infers which combination of models best 
  represents the data of interest to the user and (2) learns continuously during 
  interaction with each user. In particular, these two components -- inference 
  and learning -- provide a solution that adapts to the subjective and 
  hard-to-predict behaviors frequently seen when people query or browse image 
  libraries.
 
 
- Martin Szummer, Rosalind W. Picard. Temporal Texture Modeling. IEEE Intl. Conf. on Image Processing (ICIP), Lausanne, Switzerland, September 1996, vol. 3, pp. 823-826.
 Temporal textures are 
  textures with motion. Examples include wavy water, rising steam and fire. We 
  model image sequences of temporal textures using the spatio-temporal 
  autoregressive model (STAR). This model expresses each pixel as a linear 
  combination of surrounding pixels lagged both in space and in time. The model 
  provides a base for both recognition and synthesis. We show how the least 
  squares method can accurately estimate model parameters for large, causal 
  neighborhoods with more than 1000 parameters. Synthesis results show that the 
  model can adequately capture the spatial and temporal characteristics of many 
  temporal textures. A 95% recognition rate is achieved for a 135 element 
  database with 15 texture classes.
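Fitting the STAR model reduces to ordinary least squares: each pixel contributes one regression row built from its causal spatio-temporal neighborhood. A small sketch with an assumed neighborhood (the previous ts frames only; the paper's exact neighborhood shape may differ):

```python
import numpy as np

def fit_star(seq, ts=2, ss=1):
    """Least-squares fit of a spatio-temporal autoregressive (STAR) model.

    seq: (T, H, W) grayscale image sequence.  Each pixel is regressed on a
    causal neighborhood: the (2*ss+1)^2 spatial window in each of the
    previous ts frames.  Returns the coefficient vector.
    """
    T, H, W = seq.shape
    rows, targets = [], []
    for t in range(ts, T):
        for y in range(ss, H - ss):
            for x in range(ss, W - ss):
                nb = seq[t - ts:t, y - ss:y + ss + 1, x - ss:x + ss + 1]
                rows.append(nb.ravel())             # one regression row
                targets.append(seq[t, y, x])        # pixel to predict
    A, b = np.asarray(rows), np.asarray(targets)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef      # usable as recognition features or for synthesis

coef = fit_star(np.random.rand(10, 16, 16))
```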
 
 
- Martin Szummer. An Image Browser that Learns from User Interaction. December 1995.
 Large image databases with millions of images are being built. It 
  is very tedious to browse these databases; the user will only have time to see 
  a small fraction of the images. Currently, there are very few tools that 
  assist the user in finding the right selection of images. This project 
  combines learning algorithms and machine vision techniques to create a 
  flexible and powerful image browser. The user is presented with a selection of 
  images and selects positive and negative examples of the kinds of images they 
  want to see or avoid. The browser analyzes the examples and chooses the 
  best search metrics. It then uses these metrics to find images similar to the 
  examples. The results form a hierarchy that the user can browse with a tree 
  browser. Next, the user selects more positive and negative examples, and the 
  process repeats.
 
 
- Martin Szummer. Temporal Texture Modeling. MIT Media Lab Perceptual Computing Technical Report 346, September 1995.
 Temporal textures are textures with motion. Examples include 
  wavy water, rising steam and a crowd milling about. We model image sequences 
  of temporal textures using the spatio-temporal autoregressive model (STAR). 
  This model expresses each pixel as a linear combination of surrounding pixels 
  lagged both in space and in time. The model provides a basis both for 
  recognition and synthesis. We show how the least squares method can accurately 
  estimate model parameters for large, causal neighborhoods with more than 1000 
  parameters. Synthesis results show that the model can adequately capture the 
  spatial and temporal characteristics of many temporal 
textures.
 
 
Publications in natural language processing
  - Wlodek Zadrozny, Marcin Szummer, Stanislaw Jarecki, David Johnson, Leora Morgenstern. NL Understanding with a Grammar of Constructions. 15th Intl. Conf. on Computational Linguistics (COLING), Kyoto, Japan, August 1994, pp. 1289-1293.
 We 
  present an approach to natural language understanding based on a computable 
  grammar of constructions. A "construction" consists of a set of features of 
  form and a description of meaning in a context. A grammar is a set of 
  constructions. This kind of grammar is the key element of Mincal, an 
  implemented natural language, speech-enabled interface to an on-line calendar 
  system. The system consists of an NL grammar, a parser, an on-line calendar, a 
  domain knowledge base (about dates, times and meetings), an application 
  knowledge base (about the calendar), a speech recognizer, a speech generator, 
  and the interfaces between those modules. We claim that this architecture 
  should work in general for spoken interfaces in small domains. In this paper 
  we present two novel aspects of the architecture: (a) the use of 
  constructions, integrating descriptions of form, meaning and context into one 
  whole; and (b) the separation of domain knowledge from application knowledge. 
  We describe the data structures for encoding constructions, the structure of 
  the knowledge bases, and the interactions of the key modules of the 
  system.
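As a toy illustration of the construction idea (our sketch, not the paper's actual encoding), a construction can be represented as form features paired with a meaning function that consumes the parse and the current context:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Construction:
    """A pairing of form features with a meaning-in-context description."""
    form: dict                             # features of form, e.g. a surface pattern
    meaning: Callable[[dict, dict], dict]  # (parsed form, context) -> semantic frame

# A grammar is a set of constructions; a parser matches form features and
# applies the meaning function in the current context (e.g. the calendar).
schedule = Construction(
    form={"pattern": ["schedule", "<event>", "at", "<time>"]},
    meaning=lambda parse, ctx: {"act": "add", "event": parse["<event>"],
                                "time": parse["<time>"],
                                "calendar": ctx["calendar"]},
)
```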