Bayesian Multi-LDA for Matrix Factorization
TITLE:
Bayesian Multi-LDA for Matrix Factorization
DATE:
Friday, October 5th, 2007
TIME:
3:30 PM
LOCATION:
GMCS 214
SPEAKER:
Ian Porteous, Information & Computer Science, University of California at Irvine
ABSTRACT:
Matrix factorization algorithms such as SVD and non-negative matrix factorization (NMF) are arguably among the most widely used algorithms in machine learning and data mining. Applications of these techniques can be found in bioinformatics, robotics, computer vision, text analysis, information retrieval, collaborative filtering, and so on. However, SVD and NMF do not assign probabilities to predictions and therefore cannot provide uncertainty estimates. For example, suppose a company wants to send gift cards to customers for items they are likely to want but have not yet purchased. The company can run a matrix factorization algorithm on the joint product-customer matrix of ratings to find people who are likely to purchase item X, but it will have no estimate of the uncertainty of those predictions.
Probabilistic models such as PLSA, and Bayesian extensions such as LDA, do assign probabilities to predictions and have been proposed as text models in the bag-of-words representation. Furthermore, an extension of PLSA to user recommendation systems has been proposed. However, these models treat customers and products differently: in particular, they discover user communities but not product groups. We propose a symmetrized LDA model, which we call “Multi-LDA”, that draws information from related products as well as related customers. Additionally, Multi-LDA is not limited to matrix factorization but applies to tensor factorization as well. For example, Multi-LDA could be applied to customer-product-date data.
In addition to describing Multi-LDA, I will discuss how it relates to NMF and LDA and present some experimental results on customer-product ratings, customer-movie ratings, and handwritten digits. I will also outline a nonparametric version, which is able to estimate the number of customer and product groups automatically.
HOST:
Kristin Duncan