
DATE: Friday, October 5, 2007
TITLE: Bayesian Multi-LDA for Matrix Factorization
TIME: 3:30 PM
LOCATION: GMCS-214
SPEAKER: Ian Porteous
Information & Computer Science
University of California at Irvine
ABSTRACT:

Matrix factorization algorithms such as SVD and non-negative matrix factorization (NMF) are arguably among the most widely used algorithms in machine learning and data mining. Applications of these techniques can be found in bioinformatics, robotics, computer vision, text analysis, information retrieval, collaborative filtering, and so on. However, SVD and NMF do not assign probabilities to predictions and therefore cannot provide uncertainty estimates. For example, suppose a company wants to send gift cards to customers for items they are likely to want but have not yet purchased. The company can run a matrix factorization algorithm on the joint product-customer matrix of ratings to find people who are likely to purchase item X, but it will have no estimate of uncertainty for those predictions.
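
To make the gift-card example concrete, here is a minimal sketch of a truncated-SVD factorization of a small ratings matrix. It is an illustration only, not the method presented in the talk: the data, the function name `low_rank_predictions`, and the choice of rank 2 are all assumptions. The point is that the output is a matrix of point estimates with no accompanying measure of uncertainty.

```python
import numpy as np

# Hypothetical 4-customer x 5-product ratings matrix (illustrative data only).
ratings = np.array([
    [5.0, 4.0, 1.0, 1.0, 2.0],
    [4.0, 5.0, 1.0, 2.0, 1.0],
    [1.0, 1.0, 5.0, 4.0, 5.0],
    [2.0, 1.0, 4.0, 5.0, 4.0],
])


def low_rank_predictions(matrix, rank):
    """Return the rank-`rank` approximation of `matrix` from a truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep only the `rank` largest singular values and their vectors.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


# Each entry is a single predicted rating; nothing here says how
# confident the model is in any individual prediction.
predicted = low_rank_predictions(ratings, rank=2)
print(np.round(predicted, 2))
```

A probabilistic model would instead return a distribution over each rating, from which an uncertainty estimate could be read off.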

Probabilistic models such as PLSA, and Bayesian extensions such as LDA, do assign probabilities to predictions and have been proposed as text models in the bag-of-words representation. Furthermore, an extension of PLSA to user recommendation systems has been proposed. However, these models treat customers and products differently. In particular, they discover user communities but not product groups. We propose a symmetrized LDA model, which we call "Multi-LDA", that draws information from related products as well as related customers. Additionally, Multi-LDA is not limited to matrix factorization but applies to tensor factorization as well. For example, Multi-LDA could be applied to customer-product-date data.

In addition to describing Multi-LDA, I will discuss how it relates to NMF and LDA and present some experimental results on customer-product ratings, customer-movie ratings, and handwritten digits. I will also outline a nonparametric version, which can estimate the number of customer and product groups automatically.

HOST: Kristin Duncan
   

Computational Science Research Center :: 5500 Campanile Drive :: San Diego, CA 92182-1245 :: (619) 594-3430
©2007 Computational Science Research Center, SDSU - All rights reserved.

Last updated: February 21, 2008 8:38 AM