<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-17171206</id><updated>2011-12-23T11:01:38.859-08:00</updated><category term='flash'/><category term='venture'/><category term='crowds'/><category term='rotten tomatoes'/><category term='teleportation'/><category term='latex'/><category term='recommender systems'/><category term='localization'/><category term='boost'/><category term='telefonica i+d'/><category term='last.fm'/><category term='events'/><category term='privacy'/><category term='algorithms'/><category term='sigir2010'/><category term='telefonica r+d'/><category term='owl'/><category term='www'/><category term='awk'/><category term='rockbox'/><category term='job'/><category term='GSoC'/><category term='RUP'/><category term='recsys09'/><category term='data analysis'/><category term='bibtex'/><category term='ted talks'/><category term='video editing'/><category term='emi'/><category term='serendipity'/><category term='c++'/><category term='facebook'/><category term='3d fax'/><category term='binutils'/><category term='boost serialization C++'/><category term='jack'/><category term='thesis audio'/><category term='java'/><category term='hci'/><category term='musicstrands'/><category term='acm'/><category term='tim armstrong'/><category term='rest'/><category term='music recommenders'/><category term='Eclipse'/><category term='music business'/><category term='dsl'/><category term='extreme Programming'/><category term='nearest neighbors'/><category term='radiohead'/><category term='mp3'/><category term='framework'/><category term='methodologies'/><category term='error'/><category term='conferences'/><category 
term='nvidia'/><category term='google'/><category term='acm multimedia conference'/><category term='opportunities'/><category term='pig'/><category term='graphical models'/><category term='nvidia cuda allosphere stereo'/><category term='wsdm09'/><category term='collaborative filtering'/><category term='3D scene'/><category term='os x'/><category term='being digital'/><category term='mda'/><category term='data files'/><category term='adsl'/><category term='survey'/><category term='course'/><category term='cbs'/><category term='ratings'/><category term='hive'/><category term='iptv'/><category term='fellows'/><category term='aggregator'/><category term='new york'/><category term='teaching'/><category term='imagenio'/><category term='pulseaudio'/><category term='web science'/><category term='lastfm'/><category term='acm multimedia conference augsburg acmmm07'/><category term='spin-offs'/><category term='startup'/><category term='music'/><category term='open research day'/><category term='explicit feedback'/><category term='sylicon valley'/><category term='company'/><category term='interaction'/><category term='scrum'/><category term='ipod'/><category term='us'/><category term='user modeling'/><category term='wsdm'/><category term='university'/><category term='mobile'/><category term='allosphere'/><category term='data mining'/><category term='publications'/><category term='web'/><category term='recommenders'/><category term='experts'/><category term='wilco'/><category term='mapreduce'/><category term='hadoop'/><category term='firefox'/><category term='modding'/><category term='presentation skills'/><category term='standard'/><category term='netflix'/><category term='catalan'/><category term='web 2.0'/><category term='software engineering'/><category term='kdd'/><category term='link'/><category term='techtransfer'/><category term='xml'/><category term='strands'/><category term='business'/><category term='threads'/><category term='description language'/><category 
term='rock'/><category term='slow'/><category term='roy fielding'/><category term='semantic web'/><category term='models'/><category term='Social Networks'/><category term='spain'/><category term='gaming'/><category term='cinelerra'/><category term='explicit'/><category term='classifiers'/><category term='netflix prize'/><category term='software'/><category term='installing linux'/><category term='content-based'/><category term='presentation data'/><category term='stats'/><category term='multitouch screens'/><category term='pattern language'/><category term='architecture'/><category term='amarok'/><category term='boston'/><category term='noise'/><category term='www2009'/><category term='mystrands'/><category term='object-oriented'/><category term='trust'/><category term='quora'/><category term='umap09'/><category term='game dynamics'/><category term='conference'/><category term='graph'/><category term='annotator'/><category term='barcelona'/><category term='rdf'/><category term='agile'/><category term='antonia font'/><category term='peer review'/><category term='CLAM'/><category term='recsys'/><category term='madrid'/><category term='windows'/><category term='recsys11'/><category term='pipes'/><category term='svm'/><category term='science'/><category term='linux'/><category term='internships'/><category term='computer science'/><category term='research'/><category term='domain-specific'/><category term='patterns'/><category term='programming'/><category term='3D audio'/><category term='sigir09'/><category term='sorting'/><category term='streaming'/><category term='tourism'/><category term='Scons'/><category term='mamagement'/><category term='context'/><category term='matrix factorization'/><category term='scan linux ubuntu epson 2000'/><category term='time'/><category term='implicit'/><category term='jabref'/><category term='campus party'/><category term='search'/><category term='vancouver'/><title type='text'>TechnoCalifornia</title><subtitle 
type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default?start-index=101&amp;max-results=100'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>184</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-17171206.post-8599625116147452790</id><published>2011-11-02T21:29:00.000-07:00</published><updated>2011-11-02T22:31:45.054-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='conference'/><category scheme='http://www.blogger.com/atom/ns#' term='recsys11'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Recsys 2011 - Notes and Pointers</title><content type='html'>&lt;a href="http://recsys.acm.org/2011/images/Chicago_night1.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 518px; height: 201px;" src="http://recsys.acm.org/2011/images/Chicago_night1.jpg" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;I found &lt;a 
href="http://recsys.acm.org/2011/index.shtml"&gt;Recsys&lt;/a&gt; this year to be of very high quality in general. There were many good papers and presentations. The &lt;a href="http://recsys.acm.org/2011/industry_track.shtml"&gt;Industry track&lt;/a&gt; was also of very high quality, with very interesting talks from companies such as Twitter, Facebook, and eBay. Jon Sanders and I also gave two presentations explaining how recommendations have evolved since the Netflix Prize (more on this soon).&lt;br /&gt;&lt;br /&gt;Here are my rough notes with pointers to some papers I considered especially interesting. I have grouped them into five categories that I think summarize the main topics of the conference: (1) Transparency and explanations, (2) Implicit feedback, (3) Context, (4) Metrics and evaluation, and (5) Others. Note that the selection is completely biased towards my personal interests.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;(1) TRANSPARENCY &amp;amp; EXPLANATIONS.&lt;/span&gt; One of the recurring themes was that user trust and the perceived quality of the recommendations were influenced not by accuracy alone, but also by how transparent the system was and by the amount of "explanations" that were added.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Daniel Tunkelang (LinkedIn) gave a very interesting tutorial on "Recommendations as a Conversation with the User", where he focused on these kinds of issues. 
See his slides on &lt;a href="http://thenoisychannel.com/2011/10/31/recsys-2011-tutorial-recommendations-as-a-conversation-with-the-user/"&gt;his blog&lt;/a&gt;.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Neel Sundaresan (eBay) also stressed in his keynote that adding explanations can sometimes be more important than getting the recommendation right.&lt;/li&gt;&lt;li&gt;In the paper "&lt;a href="http://www.usabart.nl/portfolio/KnijnenburgReijmerWillemsen-recsys2011.pdf"&gt;Each to His Own: How Different Users Call for Different Interaction Methods in Recommender Systems&lt;/a&gt;", the authors found that, depending on how expert users are in the domain, they prefer different kinds of recommendations and interaction models. For example, at one extreme, novices prefer top-10 non-personalized recommendations to personalized ones. In general, a hybrid model of interaction is better than either an implicit-only or an explicit-only one.&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;(2) IMPLICIT FEEDBACK.&lt;/span&gt; There were a lot of papers this year on using implicit consumption data instead of (or in combination with) ratings.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The best paper, by Yehuda Koren and Joe Sill, addressed the issue of non-linearity in ratings. "&lt;a href="http://labs.yahoo.com/node/640"&gt;OrdRec: An Ordinal Model for Predicting Personalized Item Rating Distributions&lt;/a&gt;" modifies the standard Matrix Factorization approach to account for the fact that user ratings are ordinal but not numerical. The way they model ratings, with a set of thresholds, can be used in combination with any model, not only SVD-like approaches. 
This paper effectively addresses most of the issues I raised in my previous post "&lt;a href="http://technocalifornia.blogspot.com/2011/04/recommender-systems-were-doing-it-all.html"&gt;We are doing everything wrong...&lt;/a&gt;".&lt;/li&gt;&lt;li&gt;In "&lt;a href="http://unical.academia.edu/NicolaBarbieri/Papers/803078/Modeling_Item_Selection_and_Relevance_for_Accurate_Recommendations"&gt;Modeling Item Selection and Relevance for Accurate Recommendations: A Bayesian Approach&lt;/a&gt;", the authors define the concept of a "Free probabilistic model", in which they try to predict the probability of play and the rating independently.&lt;/li&gt;&lt;li&gt;In "Multi-Value Probabilistic Matrix Factorization for IP-TV Recommendations", the authors present a Matrix Factorization model that allows for multiple observations of the same item. In particular, it is applied to IPTV recommendations, where the fact that the user watched only part of an episode is interpreted as negative feedback.&lt;/li&gt;&lt;li&gt;"&lt;a href="http://www.cs.purdue.edu/homes/fangy/hetrec11-fang.pdf"&gt;Matrix Co-factorization for Recommendation with Rich Side Information and Implicit Feedback&lt;/a&gt;" presents a combined Matrix Factorization model that includes ratings, content features, and implicit feedback. 
The authors use cosine item similarity for weighting negative examples.&lt;/li&gt;&lt;li&gt;In "&lt;a href="http://www.slideshare.net/alansaid/personalizing-tags-a-folksonomylike-approach-for-recommending-movies/download"&gt;Personalizing Tags: A Folksonomy-like Approach for Recommending Movies&lt;/a&gt;", the authors use tags (or categories) as a very simple method of recommending movies: for each user, compute the average rating given to movies with a certain tag.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;(3) CONTEXT.&lt;/span&gt; There were two workshops (&lt;a href="http://cars-workshop.org/"&gt;CARS&lt;/a&gt; and &lt;a href="http://2011.camrachallenge.com/"&gt;CAMRA&lt;/a&gt;), and several papers in the main conference, on how to add contextual information to the recommendations:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;"The Effect of Context-Aware Recommendations on Customer Purchasing Behavior and Trust" is an interesting paper focusing on the evaluation side. The authors include an A/B test for measuring the effect of context-aware recommendations. Using context increased overall sales in dollars but not in number of items. 
That is, users tend to spend more per item.&lt;/li&gt;&lt;li&gt;In the &lt;a href="http://2011.camrachallenge.com/"&gt;CAMRA&lt;/a&gt; workshop, many papers (such as "Temporal Rating Habits: A Valuable Tool for Rater Differentiation" or "Identifying Users From Their Rating Patterns") dealt with identifying which member of a household was the author of a rating, since this was one of the tasks for the contest.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Also related to group recommendations, "Group Recommendation using Feature Space Representing Behavioral Tendency and Power Balance among Members" tries to model what a good recommendation is for a group in which the individuals do not all have the same influence.&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;(4) METRICS AND EVALUATION.&lt;/span&gt; There were several papers that offered different ways to measure accuracy for top-N ranked recommendations.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;"&lt;a href="http://www.slideshare.net/pcastells/acm-recsys-2011-rank-and-relevance-in-novelty-and-diversity-metrics-for-recommender-systems"&gt;Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems&lt;/a&gt;" presents an interesting framework that includes metrics for measuring not only accuracy, but also novelty, diversity, etc.&lt;/li&gt;&lt;li&gt;"Item Popularity and Recommendation Accuracy" is an interesting work on how to remove popularity bias from accuracy metrics. A user study validates that the recall measure is correlated with the user-perceived quality of recommendations. 
Besides proposing a recall metric that removes popularity bias, the author also proposes a popularity-stratified training method that weights negative examples according to how popular they are.&lt;/li&gt;&lt;li&gt;"&lt;a href="http://ucersti.ieis.tue.nl/files/papers/3.pdf"&gt;Evaluating Rank Accuracy based on Incomplete Pairwise Preferences&lt;/a&gt;" proposes a measure called expected discounted rank correlation for the specific case of implicit feedback.&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;(5) OTHERS&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;eBay and UCSC presented "&lt;a href="http://users.soe.ucsc.edu/%7Ejwang30/index.files/recsys175-wang.pdf"&gt;Utilizing Related Products for Post-Purchase Recommendation in E-commerce&lt;/a&gt;". The paper won the best poster award.&lt;/li&gt;&lt;li&gt;There were many papers on Social Recommendations. Just to name one, in "Power to the People: Exploring Neighbourhood Formations in Social Recommender Systems", the authors did a user study to figure out how much users would like and trust recommendations coming from different user groups (those they chose, friends, everyone...). Interestingly, the method of choice did not make much difference... until you told the users what it was.&lt;/li&gt;&lt;li&gt;In "Wisdom of the Better Few: Cold Start Recommendation via Representative based Rating Elicitation", the authors discussed how to select the most informative users and items for cold start. I was surprised to see that our "Wisdom of the Few" approach got paraphrased in a paper title.&lt;/li&gt;&lt;li&gt;There were a couple of very interesting workshops on &lt;a href="http://womrad.org/2011/"&gt;Music Recommendations&lt;/a&gt; and &lt;a href="http://pema2011.cs.ucl.ac.uk/"&gt;Mobile Recommendations&lt;/a&gt; that I had to miss since I was attending others. 
But, they are definitely worth looking into if you are into music or mobile.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/8599625116147452790/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=8599625116147452790' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/8599625116147452790'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/8599625116147452790'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2011/11/recsys-2011-notes-and-pointers.html' title='Recsys 2011 - Notes and Pointers'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-2675018638164046997</id><published>2011-09-25T23:41:00.001-07:00</published><updated>2011-12-23T11:01:38.929-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='interaction'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><category scheme='http://www.blogger.com/atom/ns#' term='presentation data'/><title type='text'>The Recommender Problem &amp; the Presentation Context</title><content type='html'>&lt;!--[if gte mso 9]&gt;&lt;xml&gt;  &lt;w:latentstyles deflockedstate="false" defunhidewhenused="true" defsemihidden="true" defqformat="false" defpriority="99" latentstylecount="267"&gt;   &lt;w:lsdexception locked="false" priority="0" semihidden="false" unhidewhenused="false" qformat="true" name="Normal"&gt;   &lt;w:lsdexception locked="false" priority="9" semihidden="false" unhidewhenused="false" qformat="true" name="heading 1"&gt;   
&lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 2"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 3"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 4"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 5"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 6"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 7"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 8"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 9"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 1"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 2"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 3"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 4"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 5"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 6"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 7"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 8"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 9"&gt;   &lt;w:lsdexception locked="false" priority="35" qformat="true" name="caption"&gt;   &lt;w:lsdexception locked="false" priority="10" semihidden="false" unhidewhenused="false" qformat="true" name="Title"&gt;   &lt;w:lsdexception locked="false" priority="1" name="Default Paragraph Font"&gt;   &lt;w:lsdexception locked="false" priority="11" semihidden="false" unhidewhenused="false" qformat="true" name="Subtitle"&gt;   &lt;w:lsdexception locked="false" priority="22" semihidden="false" unhidewhenused="false" qformat="true" name="Strong"&gt;   &lt;w:lsdexception locked="false" priority="20" semihidden="false" unhidewhenused="false" qformat="true" name="Emphasis"&gt;   &lt;w:lsdexception locked="false" 
priority="59" semihidden="false" unhidewhenused="false" name="Table Grid"&gt;   &lt;w:lsdexception locked="false" unhidewhenused="false" name="Placeholder Text"&gt;   &lt;w:lsdexception locked="false" priority="1" semihidden="false" unhidewhenused="false" qformat="true" name="No Spacing"&gt;   &lt;w:lsdexception locked="false" priority="60" semihidden="false" unhidewhenused="false" name="Light Shading"&gt;   &lt;w:lsdexception locked="false" priority="61" semihidden="false" unhidewhenused="false" name="Light List"&gt;   &lt;w:lsdexception locked="false" priority="62" semihidden="false" unhidewhenused="false" name="Light Grid"&gt;   &lt;w:lsdexception locked="false" priority="63" semihidden="false" unhidewhenused="false" name="Medium Shading 1"&gt;   &lt;w:lsdexception locked="false" priority="64" semihidden="false" unhidewhenused="false" name="Medium Shading 2"&gt;   &lt;w:lsdexception locked="false" priority="65" semihidden="false" unhidewhenused="false" name="Medium List 1"&gt;   &lt;w:lsdexception locked="false" priority="66" semihidden="false" unhidewhenused="false" name="Medium List 2"&gt;   &lt;w:lsdexception locked="false" priority="67" semihidden="false" unhidewhenused="false" name="Medium Grid 1"&gt;   &lt;w:lsdexception locked="false" priority="68" semihidden="false" unhidewhenused="false" name="Medium Grid 2"&gt;   &lt;w:lsdexception locked="false" priority="69" semihidden="false" unhidewhenused="false" name="Medium Grid 3"&gt;   &lt;w:lsdexception locked="false" priority="70" semihidden="false" unhidewhenused="false" name="Dark List"&gt;   &lt;w:lsdexception locked="false" priority="71" semihidden="false" unhidewhenused="false" name="Colorful Shading"&gt;   &lt;w:lsdexception locked="false" priority="72" semihidden="false" unhidewhenused="false" name="Colorful List"&gt;   &lt;w:lsdexception locked="false" priority="73" semihidden="false" unhidewhenused="false" name="Colorful Grid"&gt;   &lt;w:lsdexception locked="false" priority="60" semihidden="false" 
unhidewhenused="false" name="Light Shading Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="61" semihidden="false" unhidewhenused="false" name="Light List Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="62" semihidden="false" unhidewhenused="false" name="Light Grid Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="63" semihidden="false" unhidewhenused="false" name="Medium Shading 1 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="64" semihidden="false" unhidewhenused="false" name="Medium Shading 2 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="65" semihidden="false" unhidewhenused="false" name="Medium List 1 Accent 1"&gt;   &lt;w:lsdexception locked="false" unhidewhenused="false" name="Revision"&gt;   &lt;w:lsdexception locked="false" priority="34" semihidden="false" unhidewhenused="false" qformat="true" name="List Paragraph"&gt;   &lt;w:lsdexception locked="false" priority="29" semihidden="false" unhidewhenused="false" qformat="true" name="Quote"&gt;   &lt;w:lsdexception locked="false" priority="30" semihidden="false" unhidewhenused="false" qformat="true" name="Intense Quote"&gt;   &lt;w:lsdexception locked="false" priority="66" semihidden="false" unhidewhenused="false" name="Medium List 2 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="67" semihidden="false" unhidewhenused="false" name="Medium Grid 1 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="68" semihidden="false" unhidewhenused="false" name="Medium Grid 2 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="69" semihidden="false" unhidewhenused="false" name="Medium Grid 3 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="70" semihidden="false" unhidewhenused="false" name="Dark List Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="71" semihidden="false" unhidewhenused="false" name="Colorful Shading Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="72" semihidden="false" 
unhidewhenused="false" name="Colorful List Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="73" semihidden="false" unhidewhenused="false" name="Colorful Grid Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="60" semihidden="false" unhidewhenused="false" name="Light Shading Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="61" semihidden="false" unhidewhenused="false" name="Light List Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="62" semihidden="false" unhidewhenused="false" name="Light Grid Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="63" semihidden="false" unhidewhenused="false" name="Medium Shading 1 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="64" semihidden="false" unhidewhenused="false" name="Medium Shading 2 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="65" semihidden="false" unhidewhenused="false" name="Medium List 1 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="66" semihidden="false" unhidewhenused="false" name="Medium List 2 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="67" semihidden="false" unhidewhenused="false" name="Medium Grid 1 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="68" semihidden="false" unhidewhenused="false" name="Medium Grid 2 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="69" semihidden="false" unhidewhenused="false" name="Medium Grid 3 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="70" semihidden="false" unhidewhenused="false" name="Dark List Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="71" semihidden="false" unhidewhenused="false" name="Colorful Shading Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="72" semihidden="false" unhidewhenused="false" name="Colorful List Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="73" semihidden="false" unhidewhenused="false" name="Colorful Grid Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="60" 
semihidden="false" unhidewhenused="false" name="Light Shading Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="61" semihidden="false" unhidewhenused="false" name="Light List Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="62" semihidden="false" unhidewhenused="false" name="Light Grid Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="63" semihidden="false" unhidewhenused="false" name="Medium Shading 1 Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="64" semihidden="false" unhidewhenused="false" name="Medium Shading 2 Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="65" semihidden="false" unhidewhenused="false" name="Medium List 1 Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="66" semihidden="false" unhidewhenused="false" name="Medium List 2 Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="67" semihidden="false" unhidewhenused="false" name="Medium Grid 1 Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="68" semihidden="false" unhidewhenused="false" name="Medium Grid 2 Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="69" semihidden="false" unhidewhenused="false" name="Medium Grid 3 Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="70" semihidden="false" unhidewhenused="false" name="Dark List Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="71" semihidden="false" unhidewhenused="false" name="Colorful Shading Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="72" semihidden="false" unhidewhenused="false" name="Colorful List Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="73" semihidden="false" unhidewhenused="false" name="Colorful Grid Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="60" semihidden="false" unhidewhenused="false" name="Light Shading Accent 4"&gt;   &lt;w:lsdexception locked="false" priority="61" semihidden="false" unhidewhenused="false" name="Light List Accent 4"&gt;   &lt;w:lsdexception locked="false" 
priority="62" semihidden="false" unhidewhenused="false" name="Light Grid Accent 4"&gt;   &lt;w:lsdexception locked="false" priority="63" semihidden="false" unhidewhenused="false" name="Medium Shading 1 Accent 4"&gt;   &lt;w:lsdexception locked="false" priority="64" semihidden="false" unhidewhenused="false" name="Medium Shading 2 Accent 4"&gt;   &lt;w:lsdexception locked="false" priority="65" semihidden="false" unhidewhenused="false" name="Medium List 1 Accent 4"&gt;   &lt;w:lsdexception locked="false" priority="66" semihidden="false" unhidewhenused="false" name="Medium List 2 Accent 4"&gt;   &lt;w:lsdexception locked="false" priority="67" semihidden="false" unhidewhenused="false" name="Medium Grid 1 Accent 4"&gt;   &lt;w:lsdexception locked="false" priority="68" semihidden="false" unhidewhenused="false" name="Medium Grid 2 Accent 4"&gt;   &lt;w:lsdexception locked="false" priority="69" semihidden="false" unhidewhenused="false" name="Medium Grid 3 Accent 4"&gt;   &lt;w:lsdexception locked="false" priority="70" semihidden="false" unhidewhenused="false" name="Dark List Accent 4"&gt;   &lt;w:lsdexception locked="false" priority="71" semihidden="false" unhidewhenused="false" name="Colorful Shading Accent 4"&gt;   &lt;w:lsdexception locked="false" priority="72" semihidden="false" unhidewhenused="false" name="Colorful List Accent 4"&gt;   &lt;w:lsdexception locked="false" priority="73" semihidden="false" unhidewhenused="false" name="Colorful Grid Accent 4"&gt;   &lt;w:lsdexception locked="false" priority="60" semihidden="false" unhidewhenused="false" name="Light Shading Accent 5"&gt;   &lt;w:lsdexception locked="false" priority="61" semihidden="false" unhidewhenused="false" name="Light List Accent 5"&gt;   &lt;w:lsdexception locked="false" priority="62" semihidden="false" unhidewhenused="false" name="Light Grid Accent 5"&gt;   &lt;w:lsdexception locked="false" priority="63" semihidden="false" unhidewhenused="false" name="Medium Shading 1 Accent 5"&gt;   &lt;w:lsdexception 
locked="false" priority="64" semihidden="false" unhidewhenused="false" name="Medium Shading 2 Accent 5"&gt;   &lt;w:lsdexception locked="false" priority="65" semihidden="false" unhidewhenused="false" name="Medium List 1 Accent 5"&gt;   &lt;w:lsdexception locked="false" priority="66" semihidden="false" unhidewhenused="false" name="Medium List 2 Accent 5"&gt;   &lt;w:lsdexception locked="false" priority="67" semihidden="false" unhidewhenused="false" name="Medium Grid 1 Accent 5"&gt;   &lt;w:lsdexception locked="false" priority="68" semihidden="false" unhidewhenused="false" name="Medium Grid 2 Accent 5"&gt;   &lt;w:lsdexception locked="false" priority="69" semihidden="false" unhidewhenused="false" name="Medium Grid 3 Accent 5"&gt;   &lt;w:lsdexception locked="false" priority="70" semihidden="false" unhidewhenused="false" name="Dark List Accent 5"&gt;   &lt;w:lsdexception locked="false" priority="71" semihidden="false" unhidewhenused="false" name="Colorful Shading Accent 5"&gt;   &lt;w:lsdexception locked="false" priority="72" semihidden="false" unhidewhenused="false" name="Colorful List Accent 5"&gt;   &lt;w:lsdexception locked="false" priority="73" semihidden="false" unhidewhenused="false" name="Colorful Grid Accent 5"&gt;   &lt;w:lsdexception locked="false" priority="60" semihidden="false" unhidewhenused="false" name="Light Shading Accent 6"&gt;   &lt;w:lsdexception locked="false" priority="61" semihidden="false" unhidewhenused="false" name="Light List Accent 6"&gt;   &lt;w:lsdexception locked="false" priority="62" semihidden="false" unhidewhenused="false" name="Light Grid Accent 6"&gt;   &lt;w:lsdexception locked="false" priority="63" semihidden="false" unhidewhenused="false" name="Medium Shading 1 Accent 6"&gt;   &lt;w:lsdexception locked="false" priority="64" semihidden="false" unhidewhenused="false" name="Medium Shading 2 Accent 6"&gt;   &lt;w:lsdexception locked="false" priority="65" semihidden="false" unhidewhenused="false" name="Medium List 1 Accent 6"&gt;   
&lt;w:lsdexception locked="false" priority="66" semihidden="false" unhidewhenused="false" name="Medium List 2 Accent 6"&gt;   &lt;w:lsdexception locked="false" priority="67" semihidden="false" unhidewhenused="false" name="Medium Grid 1 Accent 6"&gt;   &lt;w:lsdexception locked="false" priority="68" semihidden="false" unhidewhenused="false" name="Medium Grid 2 Accent 6"&gt;   &lt;w:lsdexception locked="false" priority="69" semihidden="false" unhidewhenused="false" name="Medium Grid 3 Accent 6"&gt;   &lt;w:lsdexception locked="false" priority="70" semihidden="false" unhidewhenused="false" name="Dark List Accent 6"&gt;   &lt;w:lsdexception locked="false" priority="71" semihidden="false" unhidewhenused="false" name="Colorful Shading Accent 6"&gt;   &lt;w:lsdexception locked="false" priority="72" semihidden="false" unhidewhenused="false" name="Colorful List Accent 6"&gt;   &lt;w:lsdexception locked="false" priority="73" semihidden="false" unhidewhenused="false" name="Colorful Grid Accent 6"&gt;   &lt;w:lsdexception locked="false" priority="19" semihidden="false" unhidewhenused="false" qformat="true" name="Subtle Emphasis"&gt;   &lt;w:lsdexception locked="false" priority="21" semihidden="false" unhidewhenused="false" qformat="true" name="Intense Emphasis"&gt;   &lt;w:lsdexception locked="false" priority="31" semihidden="false" unhidewhenused="false" qformat="true" name="Subtle Reference"&gt;   &lt;w:lsdexception locked="false" priority="32" semihidden="false" unhidewhenused="false" qformat="true" name="Intense Reference"&gt;   &lt;w:lsdexception locked="false" priority="33" semihidden="false" unhidewhenused="false" qformat="true" name="Book Title"&gt;   &lt;w:lsdexception locked="false" priority="37" name="Bibliography"&gt;   &lt;w:lsdexception locked="false" priority="39" qformat="true" name="TOC Heading"&gt;  &lt;/w:LatentStyles&gt; &lt;/xml&gt;&lt;![endif]--&gt;&lt;!--[if gte mso 10]&gt; &lt;style&gt;  /* Style Definitions */  table.MsoNormalTable  
{mso-style-name:"Table Normal";  mso-tstyle-rowband-size:0;  mso-tstyle-colband-size:0;  mso-style-noshow:yes;  mso-style-priority:99;  mso-style-parent:"";  mso-padding-alt:0in 5.4pt 0in 5.4pt;  mso-para-margin-top:0in;  mso-para-margin-right:0in;  mso-para-margin-bottom:10.0pt;  mso-para-margin-left:0in;  line-height:115%;  mso-pagination:widow-orphan;  font-size:11.0pt;  font-family:"Calibri","sans-serif";  mso-ascii-font-family:Calibri;  mso-ascii-theme-font:minor-latin;  mso-hansi-font-family:Calibri;  mso-hansi-theme-font:minor-latin;  mso-bidi-font-family:"Times New Roman";  mso-bidi-theme-font:minor-bidi;} &lt;/style&gt; &lt;![endif]--&gt;  &lt;p class="MsoNormal" style="margin-bottom: 12pt; line-height: normal; font-family: arial;"&gt;&lt;span style="font-size: 12pt;"&gt;In the traditional formulation of the "Recommender Problem", we have pairs of users and items, and user feedback values for very few of those dyads. The problem is formulated as finding a utility function or model that estimates the missing values.&lt;br /&gt;&lt;br /&gt;In many real-world situations, feedback will be implicit&lt;span style="font-weight: bold;"&gt;**&lt;/span&gt; and binary in nature. For instance, on a web page, a user visiting a URL or clicking on an ad counts as positive feedback. In a music service, a user decides to listen to a song. In a movie service like Netflix, a user deciding to watch a title is taken as an indication that the user liked the movie. In these cases, the recommendation problem becomes predicting the probability that a user will interact with a given item. There is a big shortcoming in using the standard recommendation formulation in such a setting: we don't have negative feedback. All the data we have is either positive or missing. 
And the missing data includes both items that the user explicitly chose to ignore because they were not appealing and items that would have been perfect recommendations but were never presented to the user.&lt;br /&gt;&lt;br /&gt;A similar issue has been dealt with in traditional data mining research, where classifiers need to be trained using only positive examples. In the "&lt;/span&gt;&lt;a href="http://www.cse.ucsd.edu/users/elkan/posonly.pdf"&gt;&lt;span style="font-size: 12pt; color: blue;"&gt;Learning Classifiers from Only Positive and Unlabeled Data&lt;/span&gt;&lt;/a&gt;&lt;span style="font-size: 12pt;"&gt;" SIGKDD 08 paper, the authors present a method to convert each unlabeled example into both a positive and a negative example, each with a different weight related to the probability that a random exemplar is positive or negative. Another solution to this issue is presented in the "&lt;/span&gt;&lt;a href="http://research.yahoo.com/files/HuKorenVolinsky-ICDM08.pdf"&gt;&lt;span style="font-size: 12pt; color: blue;"&gt;Collaborative Filtering for Implicit Feedback Datasets&lt;/span&gt;&lt;/a&gt;&lt;span style="font-size: 12pt;"&gt;" paper by Hu, Koren and Volinsky. In this work, the authors binarize the implicit feedback values: any feedback value greater than zero means positive preference, while any value equal to zero is converted to no preference. A larger implicit feedback value is used as a measure of the "confidence" that the user liked the item, not of "how much" the user liked it. Yet another approach to inferring positive and negative feedback from implicit data is presented in the paper I co-authored with Denis Parra, which I discussed in a &lt;/span&gt;&lt;a href="http://technocalifornia.blogspot.com/2011/07/walk-talk-on-combination-of-implicit.html"&gt;&lt;span style="font-size: 12pt; color: blue;"&gt;previous post&lt;/span&gt;&lt;/a&gt;&lt;span style="font-size: 12pt;"&gt;. 
There, we argue that implicit data can be transformed into positive and negative feedback if aggregated at the right level. For example, the fact that somebody listened only once to a single track in an album can be interpreted as the user not liking that album.&lt;br /&gt;&lt;br /&gt;In many practical situations, though, we have more information than the simple binary implicit feedback from the user. For unlabeled examples that the user did not directly interact with, we can expect to have other information. In particular, we might be able to know whether they were shown to the user or not. This adds very valuable information, but slightly complicates the formulation of our recommendation problem. We now have three different kinds of values for items: positive, presented but not chosen, and not presented. And this is only if we simplify the model. In reality, information related to the presentation can be much richer than this, and we might be able to derive data like the probability that the user actually saw the item, or weigh different interaction events such as mouse-overs and scrolls.&lt;/span&gt;&lt;/p&gt;&lt;span style="font-family: arial;"&gt;  &lt;/span&gt;&lt;br /&gt;&lt;div style="text-align: center; font-family: arial;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-VSfb2LZ66y8/TovvlN5sIrI/AAAAAAAAANk/Zou3UMqH4gw/s1600/NetflixInterface.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 335px; height: 177px;" src="http://2.bp.blogspot.com/-VSfb2LZ66y8/TovvlN5sIrI/AAAAAAAAANk/Zou3UMqH4gw/s200/NetflixInterface.jpg" alt="" id="BLOGGER_PHOTO_ID_5659880779386987186" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;
&lt;span style="font-family: arial;"&gt;  &lt;/span&gt;&lt;p style="font-family: arial;" class="MsoNormal"&gt;&lt;span style="font-size: 12pt; line-height: 115%;"&gt;At Netflix, we are working on different ways to add this rich information about presentation and user interaction to the recommender problem. That is why I was especially interested to find out that this year's SIGIR best student paper award went to a paper that addresses this issue. In the paper "&lt;/span&gt;&lt;a href="http://www.cc.gatech.edu/%7Esyang46/papers/SIGIR11CCF.pdf"&gt;&lt;span style="font-size: 12pt; line-height: 115%;"&gt;Collaborative Competitive Filtering: Learning Recommender Using Context of User Choice&lt;/span&gt;&lt;/a&gt;&lt;span style="font-size: 12pt; line-height: 115%;"&gt;", the authors present an extension to traditional Collaborative Filtering that encodes into the model not only the &lt;b&gt;collaboration &lt;/b&gt;between similar users and items, but also the &lt;b&gt;competition &lt;/b&gt;of items for user attention. They derive the model as an extension to standard latent factor models by taking into account the context in which the user makes the decision. That is, the probability that I select a given item depends on which other items I have as alternatives. Results are preliminary but promising. 
This work is definitely an interesting and appealing starting point for an area with many practical applications.&lt;/span&gt;&lt;/p&gt;&lt;span style="font-family: arial;"&gt;  &lt;/span&gt;&lt;p style="font-family: arial;" class="MsoNormal"&gt;&lt;span style="font-size: 12pt; line-height: 115%;"&gt;However, there are many possible improvements to the model. One of them, mentioned by the authors, is the need to take into account the so-called &lt;b style="mso-bidi-font-weight:normal"&gt;position bias&lt;/b&gt;. An item that is presented in the first position of a list is much more likely to be chosen than one that is farther down. This effect is well known in the search community and has been studied from several angles. I would recommend reading some of the very interesting papers on this topic by Thorsten Joachims and his students. In the paper “&lt;/span&gt;&lt;a href="http://www.cs.cornell.edu/People/tj/publications/radlinski_etal_08b.pdf"&gt;&lt;span style="font-size: 12pt; line-height: 115%;"&gt;How Does Clickthrough Data Reflect Retrieval Quality?&lt;/span&gt;&lt;/a&gt;&lt;span style="font-size: 12pt; line-height: 115%;"&gt;”, for instance, they show that arbitrarily swapping items in a search result list has almost no effect on which positions users click. 
This suggests that the position of an item can be a more important factor than how relevant the item is.&lt;br /&gt;&lt;br /&gt;I would love to hear of other ideas or approaches to deal with this new version of the recommender problem that includes the presentation context, and I would encourage researchers in the area to address an issue of huge potential impact.&lt;/span&gt;&lt;/p&gt;&lt;span style="font-family: arial;"&gt;  &lt;/span&gt;&lt;span style="font-family: arial;"&gt;  &lt;/span&gt;&lt;p style="font-family: arial;" class="MsoNormal"&gt;&lt;span style="font-size: 12pt; line-height: 115%;"&gt;**&lt;b style="mso-bidi-font-weight:normal"&gt;Note&lt;/b&gt;: I am using the word implicit here in the traditional sense of the recommendation literature. The truth is that a user selecting an item is in fact &lt;b style="mso-bidi-font-weight:normal"&gt;explicit&lt;/b&gt; information. However, it can be considered implicit in that the user reveals preferences indirectly by comparing the item to others in a context.&lt;/span&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-2675018638164046997?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/2675018638164046997/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=2675018638164046997' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2675018638164046997'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2675018638164046997'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2011/09/recommender-problem-presentation.html' title='The Recommender Problem &amp; the Presentation 
Context'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-VSfb2LZ66y8/TovvlN5sIrI/AAAAAAAAANk/Zou3UMqH4gw/s72-c/NetflixInterface.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-211881762342667900</id><published>2011-07-28T21:49:00.000-07:00</published><updated>2011-07-31T10:36:02.940-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='netflix'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Joining Netflix</title><content type='html'>Three weeks ago, I started to work for Netflix. Everything has moved so fast, with so many things to do and learn, that it seems like I have already been here for a much longer time!&lt;br /&gt;&lt;br /&gt;I am now managing a small team working on recommendations &amp;amp; personalization in the company that brought recommender systems research into the headlines thanks to the &lt;a href="http://www.netflixprize.com/"&gt;Netflix Prize&lt;/a&gt;. It also feels great to join the company at an exciting time, when it has just reached its 25 millionth customer and is starting its international expansion to &lt;a href="http://blog.netflix.com/2011/07/netflix-is-coming-to-latin-america.html"&gt;Latin America&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;All the fuss created around the Netflix Prize &lt;a href="http://news.cnet.com/8301-17852_3-20078504-71/mit-prof-netflix-has-its-recommendations-wrong/"&gt;might lead some&lt;/a&gt; to believe that rating prediction is all there is to Netflix suggesting a given movie. 
However, I was happy to find out that rating prediction is only one of the many signals that my team uses in creating the final suggestions.&lt;br /&gt;&lt;br /&gt;Awesome place, awesome people, and awesome time to be around. And, btw, &lt;a href="http://www.netflix.com/Jobs?id=7563"&gt;we are hiring&lt;/a&gt;, so let me know if you are interested in joining. (&lt;span style="font-weight: bold;"&gt;Update&lt;/span&gt;: it seems that the jobs link is currently not active outside US/CA... I'm working on getting this fixed)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-EU1JNoF8ZXI/TjI9TqhAlmI/AAAAAAAAANQ/WKPl4DaPcds/s1600/2011-07-20%2B08.19.53.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 189px; height: 252px;" src="http://3.bp.blogspot.com/-EU1JNoF8ZXI/TjI9TqhAlmI/AAAAAAAAANQ/WKPl4DaPcds/s320/2011-07-20%2B08.19.53.jpg" alt="" id="BLOGGER_PHOTO_ID_5634633491833460322" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-211881762342667900?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/211881762342667900/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=211881762342667900' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/211881762342667900'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/211881762342667900'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2011/07/joining-netflix.html' title='Joining Netflix'/><author><name>Xavier 
Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-EU1JNoF8ZXI/TjI9TqhAlmI/AAAAAAAAANQ/WKPl4DaPcds/s72-c/2011-07-20%2B08.19.53.jpg' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-4449493430080955379</id><published>2011-07-18T22:32:00.000-07:00</published><updated>2011-07-20T23:45:07.868-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='music recommenders'/><category scheme='http://www.blogger.com/atom/ns#' term='explicit feedback'/><category scheme='http://www.blogger.com/atom/ns#' term='explicit'/><category scheme='http://www.blogger.com/atom/ns#' term='implicit'/><category scheme='http://www.blogger.com/atom/ns#' term='user modeling'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Walk the Talk: On the Combination of Implicit and Explicit Feedback</title><content type='html'>Last week, &lt;a href="http://www.sis.pitt.edu/%7Edparra/"&gt;Denis Parra&lt;/a&gt; presented our paper entitled "Walk the Talk: Analyzing the Relation between Implicit and Explicit Feedback for Preference Elicitation" at the &lt;a href="http://www.umap2011.org/"&gt;UMAP conference&lt;/a&gt;. The paper won Denis the best student paper award (Congratulations!).&lt;br /&gt;&lt;br /&gt;The paper presents our initial work in analyzing the relation between implicit and explicit feedback. In short, the main question we wanted to answer is how the self-reported preferences users give in a typical 5-star interface relate to what they actually do, as reflected in their consumption patterns. 
Our hypothesis was that there should exist simple models that relate both kinds of feedback. Finding a way to robustly convert implicit feedback into explicit ratings would open the door to applying well-known methods to implicit feedback. But, much more importantly, we could then combine both kinds of input in a single model.&lt;br /&gt;&lt;br /&gt;In order to test our hypothesis, we prepared an experiment in the music domain. We asked last.fm users to take a &lt;a href="http://technocalifornia.blogspot.com/2010/08/study-on-online-music-taste-call-for.html"&gt;survey&lt;/a&gt; in which we queried them about how much they liked albums that were already in their listening history. With this data in hand, we could analyze the relation between implicit and explicit feedback and try to fit a simple model.&lt;br /&gt;&lt;br /&gt;I recommend you read the &lt;a href="http://bit.ly/r1mvkK"&gt;full paper&lt;/a&gt; if you want to get the longer story of our findings, but here is a brief summary:&lt;ul&gt;&lt;li&gt;There is a strong correlation between implicit feedback and self-reported preference (see figure below)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Variables such as recency of interaction or overall popularity do not have a significant effect. Note that in &lt;a href="http://www.princeton.edu/%7Emjs3/salganik_watts08.pdf"&gt;a previous study&lt;/a&gt; by Salganik &amp;amp; Watts, global popularity was found to affect users' perceived quality. However, in that case, unlike in ours, users were made aware of the popularity.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Interaction effect: When listening to music, some people prefer to listen to isolated songs or albums. 
The way they interact with music affects the way they report their taste.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-gtrwSvrIpEI/TifEaf_CfYI/AAAAAAAAANI/IzeMVpRZbaQ/s1600/up-box.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 514px; height: 357px;" src="http://1.bp.blogspot.com/-gtrwSvrIpEI/TifEaf_CfYI/AAAAAAAAANI/IzeMVpRZbaQ/s320/up-box.png" alt="" id="BLOGGER_PHOTO_ID_5631685818591640962" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;After our analysis, we construct a linear model that takes these variables into account by performing a linear regression. Once we have built these models, we can evaluate their performance in a regular recommendation scenario by measuring the error in predicting ratings on a hold-out dataset.&lt;br /&gt;&lt;br /&gt;This paper represents an initial but very promising line of work that we have already improved in several ways, such as using logistic instead of linear regression to account for the non-linearity of the rating scale, or using the regression model as a way to combine both implicit and explicit feedback. 
But I will leave those findings for a future post.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-4449493430080955379?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/4449493430080955379/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=4449493430080955379' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/4449493430080955379'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/4449493430080955379'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2011/07/walk-talk-on-combination-of-implicit.html' title='Walk the Talk: On the Combination of Implicit and Explicit Feedback'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-gtrwSvrIpEI/TifEaf_CfYI/AAAAAAAAANI/IzeMVpRZbaQ/s72-c/up-box.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-7977946010100019846</id><published>2011-04-07T13:59:00.000-07:00</published><updated>2011-04-12T06:32:27.856-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='stats'/><category scheme='http://www.blogger.com/atom/ns#' term='error'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Recommender Systems: We're doing 
it (all) wrong</title><content type='html'>A few days back, there was an interesting post by Judy Robertson on the Communications of the ACM blog. The post, entitled "&lt;a href="http://cacm.acm.org/blogs/blog-cacm/107125-stats-were-doing-it-wrong/fulltext"&gt;Stats: We're doing it wrong&lt;/a&gt;", builds upon a paper from last year's CHI conference reporting that more than 90% of the HCI researchers used the wrong statistical tools when analyzing and reporting on Likert-scale data. A Likert scale is a unidimensional scale on which the respondent expresses the level of agreement with a statement - typically on a 1 to 5 scale in which 1 is strongly disagree and 5 is strongly agree.&lt;br /&gt;&lt;br /&gt;Here is an excerpt from the post that I think is worth highlighting:&lt;br /&gt;&lt;blockquote style="font-style: italic; color: rgb(153, 153, 0);"&gt;Likert scales give ordinal data. That it (sic), the data is ranked "strongly  agree" is usually better than "agree." However, it's not  interval data.  You can't say the distances between "strongly agree"  and "agree" would  be the same as "neutral" and "disagree," for example.  People tend to  think there is a bigger difference between items at the  extremes of the  scale than in the middle (there is some evidence cited in Kaptein's  paper that  this is the case). &lt;strong&gt;For ordinal data, one should use non-parametric statistical tests&lt;/strong&gt; which do not assume a normal distribution of the data. &lt;strong&gt;Furthermore, because of this it makes no sense to report means of likert &lt;/strong&gt;&lt;strong&gt;scale data--you should report the mode&lt;/strong&gt;.&lt;/blockquote&gt;Like Judy, I have to admit that I am not a stats expert either. But in the general case I would agree with the above: Likert-scale data is ordinal and cannot be treated as interval. 
However, whether treating it as interval is &lt;span style="font-weight: bold;"&gt;always&lt;/span&gt; a mistake or can be accepted under some circumstances is something that I am not sure about, and it relates to the rest of this post.&lt;br /&gt;&lt;br /&gt;So for instance, it is not uncommon to find references that clearly state that Likert data can be treated as interval. For example, look at what they say in &lt;a href="http://www.fao.org/docrep/W3241E/w3241e04.htm"&gt;this handbook&lt;/a&gt; edited by the FAO.&lt;blockquote style="font-style: italic; color: rgb(153, 153, 0);"&gt;Likert scales are treated as yielding Interval data by the majority of marketing researchers. &lt;/blockquote&gt;Or look at &lt;a href="http://stats.stackexchange.com/questions/10/under-what-conditions-should-likert-scales-be-used-as-ordinal-or-interval-data"&gt;the answer&lt;/a&gt; to the question of whether Likert data can be treated as interval on Stack Exchange.&lt;br /&gt;&lt;br /&gt;So there might be some circumstances in which, depending on the analysis, Likert data could be treated as interval... I guess. But not in the general case.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Implications for Recommender Systems&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now onto the big question: What does this have to do with Recommender Systems and how does it affect them? To start with, let me ask you the question: Does the Likert (1 to 5) scale relate to anything we use in recommender systems? You got it: &lt;span style="font-weight: bold;"&gt;ratings&lt;/span&gt;!&lt;br /&gt;&lt;br /&gt;So our concern now is whether ratings can be treated as interval data or should instead be treated as ordinal data, just as they are in the general case of the Likert scale. To defend the claim that ratings can be treated as interval, we should have some validation that the distances between consecutive ratings are approximately equal. 
However, just as with Likert scales, we know this is not the case.&lt;br /&gt;&lt;br /&gt;Look at this figure from &lt;a href="http://technocalifornia.blogspot.com/2009/04/i-like-it-i-like-it-not-or-how-miss.html"&gt;our previous work&lt;/a&gt; on measuring noise in ratings.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-0BzBRalrIDo/TZ4xEitd5EI/AAAAAAAAAKs/1l2M5Og6dEo/s1600/fig2b.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 241px;" src="http://4.bp.blogspot.com/-0BzBRalrIDo/TZ4xEitd5EI/AAAAAAAAAKs/1l2M5Og6dEo/s320/fig2b.jpg" alt="" id="BLOGGER_PHOTO_ID_5592961741347480642" border="0" /&gt;&lt;/a&gt;Here we are plotting the probability of finding different kinds of inconsistencies between pairs of ratings. The probability that a user changes her rating between 2 and 3 is almost 0.35 while the probability she changes between 4 and 5 goes down to almost 0.1. This is a clear indication that users perceive that the distance between a 2 and a 3 is much smaller than between a 4 and a 5.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Consequences&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;At this point, we can safely say that ratings are ordinal but not interval data. However, they are treated as a continuous interval scale in most of the recommender systems research! Let us stop to think about a few of the consequences of ratings not being interval data.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Distance Measures:&lt;/span&gt; All the neighbor-based methods in collaborative filtering are based on the use of some sort of distance measure. The most commonly used are Cosine distance and Pearson Correlation. However, both these distances assume a linear interval scale in their computations! 
We should conclude that using these distance measures with rating data is wrong. Other measures such as &lt;a href="http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient"&gt;Spearman's rank correlation&lt;/a&gt; do not assume this. But to be honest, I don't remember having read many papers using Spearman.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Error Measures:&lt;/span&gt; This is my favorite one... The most commonly accepted measure of success for recommender systems is the Root Mean Squared Error (RMSE). But wait, this measure is explicitly assuming that ratings are also interval data! Similar error measures such as the Mean Absolute Error (MAE) also fall into the same trap... banned! So what could we use? Standard Information Retrieval measures such as Precision and Recall do not necessarily assume an interval scale for the ratings, although their mapping to recommendation efficiency may also be questioned. Rank-based measures such as normalized &lt;a href="http://en.wikipedia.org/wiki/Discounted_cumulative_gain"&gt;Discounted Cumulative Gain&lt;/a&gt; (nDCG) seem like our best bet for now.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Matrix factorization&lt;/span&gt;: Most MF techniques in Recommender Systems are in fact optimizing for RMSE. Therefore, we should discard them as statistically incorrect for the same reasons stated above. There are interesting alternatives to this though, like the &lt;a href="http://research.yahoo.com/files/recsys2010_submission_150.pdf"&gt;PureSVD&lt;/a&gt; method presented at RecSys last year, that do not optimize for RMSE but rather for ranking.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Conclusion&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;It is clear that explicit ratings, just like Likert-scale data, have to be treated as ordinal (and not interval) data. 
However, most of the methods and measures currently in use in recommender systems assume in some sense that there is a continuous linear scale in the ratings. Of course I am not advocating throwing all of this research in the trash (among other things, it would include much of mine), but I would advise a drastic change in the way we approach these issues.&lt;br /&gt;&lt;br /&gt;I am writing this post especially in the hope of getting feedback and reactions from you. So I am looking forward to the comments.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(204, 0, 0); font-weight: bold;"&gt;Update:&lt;/span&gt; This post was featured on Ycombinator Hacker News. So far it has received over 6K views and there is a somewhat interesting &lt;a href="http://news.ycombinator.com/item?id=2423313"&gt;comment thread&lt;/a&gt; on Ycombinator.&lt;br /&gt;&lt;br /&gt;(I'd like to thank and acknowledge the contributions of &lt;a href="http://www.sis.pitt.edu/%7Edparra/"&gt;Denis Parra&lt;/a&gt;, &lt;a href="http://www.ci.tuwien.ac.at/%7Ealexis/Welcome.html"&gt;Alexandros Karatzoglou&lt;/a&gt;, and &lt;a href="http://www.ic.unicamp.br/%7Eoliveira/"&gt;Rodrigo Oliveira&lt;/a&gt; to this post through previous very fruitful discussions)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-7977946010100019846?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/7977946010100019846/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=7977946010100019846' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/7977946010100019846'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/17171206/posts/default/7977946010100019846'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2011/04/recommender-systems-were-doing-it-all.html' title='Recommender Systems: We&apos;re doing it (all) wrong'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-0BzBRalrIDo/TZ4xEitd5EI/AAAAAAAAAKs/1l2M5Og6dEo/s72-c/fig2b.jpg' height='72' width='72'/><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-3538183937391045395</id><published>2011-03-18T08:25:00.000-07:00</published><updated>2011-04-28T15:28:19.445-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='noise'/><category scheme='http://www.blogger.com/atom/ns#' term='explicit feedback'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>The Science and the Magic of User Feedback</title><content type='html'>That was the main title of a series of talks I gave in different labs and companies during my recent California tour. In this presentation, I talked about many of our recent projects related to how to interpret user feedback, in general, and in the particular case of recommender systems. 
I talked about our work on &lt;a href="http://technocalifornia.blogspot.com/2009/04/i-like-it-i-like-it-not-or-how-miss.html"&gt;measuring user rating noise&lt;/a&gt;, our follow-up in devising &lt;a href="http://technocalifornia.blogspot.com/2009/08/rate-it-again.html"&gt;algorithms to reduce this natural noise&lt;/a&gt;, and on how you can use &lt;a href="http://technocalifornia.blogspot.com/2009/05/wisdom-of-few.html"&gt;experts instead of crowds&lt;/a&gt; to not only minimize this noise but also address other issues in collaborative filtering.&lt;br /&gt;&lt;br /&gt;I also gave a sneak preview of our results from the &lt;a href="http://technocalifornia.blogspot.com/2010/08/study-on-online-music-taste-call-for.html"&gt;music survey&lt;/a&gt; I announced some time ago. &lt;a href="http://www.sis.pitt.edu/%7Edparra/"&gt;Denis Parra&lt;/a&gt; and I have submitted this work recently and are hoping to get it accepted so that I can tell you a bit more about how to map implicit to explicit feedback.&lt;br /&gt;&lt;br /&gt;&lt;div style="width: 425px;" id="__ss_7256546"&gt;&lt;div style="text-align: center;"&gt; &lt;strong style="display: block; margin: 12px 0pt 4px;"&gt;&lt;a href="http://www.slideshare.net/xamat/the-science-and-the-magic-of-user-feedback-for-recommender-systems" title="The Science and the Magic of User Feedback for Recommender Systems"&gt;The Science and the Magic of User Feedback for Recommender Systems&lt;/a&gt;&lt;/strong&gt; &lt;object id="__sse7256546" height="355" width="425"&gt; &lt;param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sienceandmagicinuserfeedback-110314043749-phpapp02&amp;amp;stripped_title=the-science-and-the-magic-of-user-feedback-for-recommender-systems&amp;amp;userName=xamat"&gt; &lt;param name="allowFullScreen" value="true"&gt; &lt;param name="allowScriptAccess" value="always"&gt; &lt;embed name="__sse7256546" 
src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sienceandmagicinuserfeedback-110314043749-phpapp02&amp;amp;stripped_title=the-science-and-the-magic-of-user-feedback-for-recommender-systems&amp;amp;userName=xamat" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="355" width="425"&gt;&lt;/embed&gt; &lt;/object&gt;&lt;/div&gt;&lt;div&gt; &lt;/div&gt; &lt;/div&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;Update&lt;/span&gt;: Thanks to the guys at &lt;a href="http://sna-projects.com/blog/2011/04/improving-recommendations/"&gt;LinkedIn's SNA group&lt;/a&gt;, I have now added below the video of my presentation at LinkedIn... enjoy!&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;iframe src="http://player.vimeo.com/video/22353044?title=0&amp;amp;byline=0&amp;amp;portrait=0" frameborder="0" height="225" width="400"&gt;&lt;/iframe&gt;&lt;/div&gt;&lt;p style="text-align: center;"&gt;&lt;a href="http://vimeo.com/22353044"&gt;Tech Talk: Xavier Amatriain (Telefonica) -- "The Science and Magic of User and Expert Feedback for Improving Recommendations"&lt;/a&gt; from &lt;a href="http://vimeo.com/talksatlinkedin"&gt;Talks at LinkedIn&lt;/a&gt; on &lt;a href="http://vimeo.com/"&gt;Vimeo&lt;/a&gt;.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-3538183937391045395?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/3538183937391045395/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=3538183937391045395' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3538183937391045395'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/17171206/posts/default/3538183937391045395'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2011/03/science-and-magic-of-user-feedback.html' title='The Science and the Magic of User Feedback'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-8615217470550203105</id><published>2011-03-15T03:56:00.000-07:00</published><updated>2011-03-15T10:41:00.828-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='agile'/><category scheme='http://www.blogger.com/atom/ns#' term='mamagement'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='scrum'/><category scheme='http://www.blogger.com/atom/ns#' term='extreme Programming'/><title type='text'>Managing Research the Agile Way</title><content type='html'>I have previously discussed on this blog how well the &lt;a href="http://technocalifornia.blogspot.com/2008/06/agile-research.html"&gt;Scientific Method adapts to Agile approaches&lt;/a&gt;. These ideas also led me to an unfinished effort to draft an &lt;a href="http://technocalifornia.blogspot.com/2009/06/very-draft-agile-research-manifesto.html"&gt;Agile Research Manifesto&lt;/a&gt;. However, by talking to several people with similar ideas, I realized that these attempts were largely interpreted as an intellectual exercise with little practical application. It is clearly my fault for not having explained that all of this in reality comes from many practical experiences. 
Some of these experiences go back to my PhD years when managing the &lt;a href="http://clam-project.org/"&gt;CLAM framework&lt;/a&gt;, as well as many undergrad student projects. As a matter of fact, during those days I published a practical guide for students on how to do their final project the "agile way" (I still keep the &lt;a href="http://xavier.amatriain.net/PFC/"&gt;webpage&lt;/a&gt;, in Catalan, for historical reasons).&lt;br /&gt;&lt;br /&gt;In any case, in this post I wanted to address the practical side of agile research management by giving you a flavor of how I try to manage projects.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://www.flickr.com/photos/tonymangan/754511201/" title="The Plug-Hole by ~~Tone~~, on Flickr"&gt;&lt;img style="width: 411px; height: 276px;" src="http://farm2.static.flickr.com/1077/754511201_3067a868d7.jpg" alt="The Plug-Hole" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-size:78%;"&gt;(Picture by &lt;a href="http://www.flickr.com/photos/tonymangan/"&gt;~Tone&lt;/a&gt;)&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;The anatomy of a research project&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;What am I talking about when I say a "research project"? Although they might be completely different in theme and even scope, all of the projects that I have in mind when explaining the agile management approach should share at least some of the following properties:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Small-sized team&lt;/span&gt;: It is very likely that we are dealing with a team of one or two researchers. A research team of 3-4 people can already be considered large in my experience.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Very open and imprecise requirements&lt;/span&gt;: Especially at the beginning, we might have a coarse idea or hypothesis to validate. 
However, the approach, method, and scope are likely to be undecided until very late in the game.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;High risk&lt;/span&gt;: By definition, a research project has to be highly innovative and therefore... risky. Our goal is to minimize the cost of a failure and to detect it early on, not to eliminate failure, since this is an intrinsic feature of risk.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Imprecise resources&lt;/span&gt;: The fact that requirements are not clear and risk is high usually means that the resources that can be allocated to the project are also imprecise. If the project is highly successful and proves its interest in the first iterations, it can grow into something larger with more resources added to it. On the other hand, it is also very likely to be killed quickly if it does not yield promising initial results.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;The planning game&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I will usually start off by devoting a couple of weeks to a &lt;span style="font-style: italic;"&gt;Sprint 0&lt;/span&gt; during which the main tasks will be:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Understand what has been done before&lt;/span&gt;: Obviously, this requires lots of reading. However, it is good practice to also start writing at this same time, maybe in an informal wiki or the like.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Define the tools&lt;/span&gt;: Unless you are in a very specific environment, tools are likely to change for every project. Sometimes it is not only about what is the best tool, but also about what the team is most familiar with. 
This is usually an important thing in most projects, but it is more so in a project that is high-risk by nature and should avoid spending lots of time/resources on adapting to new tools.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Define the initial scope&lt;/span&gt;: There is no way you can have a complete picture of what is going to be the output of the project by this time. However, you should be able to list what you think will be the main steps and even some findings you anticipate. This list should be written like an ever-changing Product Backlog (prioritized list of high-level features).&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://www.flickr.com/photos/babyowls/2329783873/" title="Fifteen accounts of life, death, and everything that interferes. by Jenna Carver, on Flickr"&gt;&lt;img style="width: 414px; height: 311px;" src="http://farm4.static.flickr.com/3169/2329783873_3dc3c6a550.jpg" alt="Fifteen accounts of life, death, and everything that interferes." /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;span style="font-size:78%;"&gt;&lt;span&gt;&lt;span&gt;(Picture by &lt;a href="http://www.flickr.com/photos/babyowls/"&gt;Jenna Carver&lt;/a&gt;)&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Prioritizing&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;One of the most important activities that you end up doing when planning any project, be it at the initial phase or at any of its iterations, is prioritizing the different requirements, stories... Doing this in a group meeting is a great way to gain insights on the project and to be strategic.  
Prioritizing tasks is not much different from any cost/benefit analysis: you measure cost, you measure benefit, and then sort items by their benefit/cost ratio.&lt;br /&gt;&lt;br /&gt;In the case of project planning, I usually like to equate cost with "complexity" and benefit with "interest". In other words, the cost of a feature or story is how difficult or complex we anticipate it will be to implement, and the benefit is how interesting or important it is for our final goal. Once you sort items using the interest/complexity ratio, you will find that easy-to-do yet interesting features float to the top, while complex and not-so-important ones sink to the bottom.&lt;br /&gt;&lt;br /&gt;Of course, the interesting discussions happen right in the middle, especially when we have something that seems to be very important but also very complex to achieve.  In these cases, we feel tempted to jump right at the problem and devote 100% of our energy to it. However, one of the agile principles is that things seem more complex when you don't have enough understanding. If you put them off to later iterations, they will eventually become clearer and clearer and end up surfacing to the first positions on your priority list. I have found this sort of &lt;span style="font-style: italic;"&gt;smart procrastination&lt;/span&gt; to be extremely useful for agile research management.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Iterate, Iterate, Iterate&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Once you have come up with your initial product backlog, it is all a matter of breaking the process down into short iterations - I usually plan for one week. At the beginning of each iteration (or &lt;span style="font-style: italic;"&gt;Sprint&lt;/span&gt;), you look at your product backlog, pick some of the top stories and break them down into finer-grained tasks. 
You play the prioritization game on this new list and come up with your next week's scrum/iteration backlog.&lt;br /&gt;&lt;br /&gt;When doing this finer-grained prioritization, I have found it very useful to use the estimated number of hours as the measure of "complexity". Therefore, when picking the top tasks of our list, we will also have an estimate of how feasible it is to complete them during this iteration and how much relative effort will go into each of them. And, if any task is estimated to be more than a day long, do yourself a favor and break it into several tasks.&lt;br /&gt;&lt;br /&gt;Also, it is important that, especially during the first iterations, you realize that the continuation of the project (or at least of the current approach) might be at stake at each iteration. Therefore, when measuring the "importance" of tasks to prioritize, ask yourself how relevant each task will be in convincing you and others that you are onto something, or that you need to change course.&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Test-driven research&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span&gt;If you are familiar with agile methods, you will probably know how important testing is in an agile project. Tests not only guarantee the stability of the project but are actually a way to specify requirements in a more verifiable form. In a similar way, you can think of specifying many of your research hypotheses as tests. For example, you can turn your hypothesis that the effect of a given procedure on your data or population is significant into a verifiable assertion: t-test(D_original, D_after_procedure)&lt;/span&gt; -&gt; p is smaller than 5%. There are many hypotheses and research tasks that can - and should - be written in this form before making it to your prioritized todo list. 
At the very least, you should think about how each of your results will be validated and how you can trust them to be consistent and significant.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt; &lt;a href="http://www.flickr.com/photos/dunechaser/3385957841/" title="*splooch!* Gordon Freeman vs. Master Chief por Dunechaser, en Flickr"&gt;&lt;img style="width: 289px; height: 218px;" src="http://farm4.static.flickr.com/3540/3385957841_85bf7fcca6.jpg" alt="*splooch!* Gordon Freeman vs. Master Chief" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-size:78%;"&gt;(Picture by &lt;a href="http://www.flickr.com/photos/dunechaser/"&gt;Dunechaser&lt;/a&gt;)&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Related approaches&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you are interested in this kind of approach, I recommend you read the article on the &lt;a href="http://cacm.acm.org/magazines/2010/10/99484-score-agile-research-group-management/fulltext"&gt;SCORE method&lt;/a&gt;, which is somewhat related to many of the things I am mentioning here. &lt;a href="http://agile2003.agilealliance.org/files/P6Paper.pdf"&gt;Here&lt;/a&gt; you can read an interesting paper on doing test-driven research. Finally, I find the &lt;a href="http://www.infoq.com/news/2009/09/Pomodoro"&gt;Pomodoro&lt;/a&gt; method a very interesting approach to individual time management. 
Since many research projects end up being quasi-individual, Pomodoro fits them pretty well.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-8615217470550203105?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/8615217470550203105/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=8615217470550203105' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/8615217470550203105'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/8615217470550203105'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2011/03/managing-research-agile-way.html' title='Managing Research the Agile Way'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://farm2.static.flickr.com/1077/754511201_3067a868d7_t.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-78272270456258071</id><published>2011-01-31T13:26:00.000-08:00</published><updated>2011-01-31T14:56:02.312-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='trust'/><category scheme='http://www.blogger.com/atom/ns#' term='Social Networks'/><category scheme='http://www.blogger.com/atom/ns#' term='quora'/><category scheme='http://www.blogger.com/atom/ns#' term='game dynamics'/><title 
type='text'>On Trust Networks and Gamification. Or How Quora can overcome its Hype and embrace long-term Success</title><content type='html'>If you are reading this blog I am pretty sure that you know quite a lot about &lt;a href="http://www.quora.com/"&gt;Quora&lt;/a&gt; by now. If not, you should sign up and try it a bit before you continue reading the post.&lt;br /&gt;&lt;br /&gt;I have to admit it: the first time I saw Quora I thought it looked like a watered-down version of &lt;a href="http://stackoverflow.com/"&gt;stackoverflow&lt;/a&gt;, only with a much broader scope. The ability to follow was nice but... "big deal", I thought. However, I was missing the important point of the seamless integration between Quora and existing OSNs, namely Twitter and Facebook. I always say that for an OSN to succeed it needs to ride on all the previous successful ones (including email if you allow me to stretch the definition of OSN that far), but I missed that part in Quora until its hype began. Having a quick connection to Twitter and Facebook allowed Quora to overcome the always-feared cold-start problem. You sign up to Quora and in no time you are "connected" to all your "friends" and can start following their questions, their answers, votes... cool!&lt;br /&gt;&lt;br /&gt;Well, so it seemed. But in no time, just as quickly as people started hyping the service, they were complaining about it and predicting its failure.  This &lt;a href="http://techcrunch.com/2011/01/31/quora-quora-quora-quora-quora-quora-quora/"&gt;recent post at TechCrunch&lt;/a&gt; does a pretty good job of summarizing and linking to the main Quora bitchmemes. Don't miss the &lt;a href="http://techcrunch.com/2011/01/23/why-i-don%E2%80%99t-buy-the-quora-hype/"&gt;original post by Vivek Wadhwa&lt;/a&gt; or some of the &lt;a href="http://www.quora.com/What-can-be-said-to-Vivek-Wadhwas-criticism-on-TechCrunch-Why-I-Don%E2%80%99t-Buy-the-Quora-Hype"&gt;threads&lt;/a&gt; at Quora itself. 
You should also read &lt;a href="http://scobleizer.com/2011/01/30/why-i-was-wrong-about-quora-as-a-blogging-service/"&gt;this very illustrative piece&lt;/a&gt; on how Scoble went from love to hate in a matter of weeks.&lt;br /&gt;&lt;br /&gt;To summarize, the two biggest complaints are the following: (1) Quora will inevitably be overrun by spam and there will be no way to find good content anymore; and (2) producers of good content (answers) will become tired of the system and progressively leave, making problem (1) even worse.&lt;br /&gt;&lt;br /&gt;While I do agree that these (and many other) issues are very important, I don't see them as inevitable and, in the following paragraphs, I would like to describe two ways to address them. But to start with, let me just state that counting on Quora to survive as an inside-moderated network is not the answer. So what can be done?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Trust networks&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;In a trust network, each node (user) has an associated trust value that is used to decide how much weight that user's contributions carry with the rest of the users. For instance, in a recommender system, I can push content from neighboring nodes I trust while filtering out content coming from nodes with a lower trust value. In more sophisticated versions, trust is not a single value but can be topic-specific. That is, my trust value can be very high for independent music but very low for classic literature. (If you are interested in the general topic of Trust and Social Networks you can read &lt;a href="http://www.amazon.com/exec/obidos/ASIN/1848003552/j16t3i5j15-20"&gt;Golbeck's book&lt;/a&gt; or any of her many publications or presentations available online.)&lt;br /&gt;&lt;br /&gt;So let's go back to Quora now: why should they implement a trust network overlay, and how could they implement a useful one? 
There are several reasons why they should be doing so, but let us focus on the spam issue. You do not want bad answers to get promoted by bad/evil users. The way around it is to not give these users the power to promote answers. And you can do this quite easily by assigning trust values to users and weighting each vote by the voter's trust level. It would then take 100 votes by "level 1" users to get an answer to the level of another one with just one vote by a "level 100" user. Of course, as I was mentioning before, this trust level could be topic-sensitive. Makes sense, doesn't it?&lt;br /&gt;&lt;br /&gt;But there are a number of issues that are still unsolved regarding how to implement this trust network. The first one is: who decides to promote and demote users? My answer is quite simple: users themselves. Whenever your answer gets voted up or down, so would your trust level. And again, how much this level would go up or down would depend on the trust level of the voting user.&lt;br /&gt;&lt;br /&gt;The only important remaining issue with such an approach is how to deal with cold start. But the answer to this would come from the integration with other OSNs I was mentioning at the beginning. If I were implementing this kind of system, I would give users an initial trust level based on their &lt;a href="http://tunkrank.com/"&gt;TunkRank&lt;/a&gt; or their &lt;a href="http://klout.com/"&gt;Klout&lt;/a&gt; Score.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Gamification&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;The other major issue that still needs to be tackled is how we guarantee that users do not become tired of the system and abandon it. I hope it is clear by now that the approach I described above would make things much more interesting for users interested in raising their trust level. In fact, this is very close to what is known as &lt;a href="http://gamification.org/wiki/Gamification"&gt;gamification&lt;/a&gt; (see also game dynamics or game mechanics for closely related concepts). 
Attach a badge to given levels of trust for some topics and you can start competing with Foursquare check-ins.&lt;br /&gt;&lt;br /&gt;The use of badges, or game dynamics in general, in Q&amp;amp;A sites is by no means new. Actually, stackoverflow, which I referred to earlier in the post, delivers topical &lt;a href="http://stackoverflow.com/badges"&gt;badges&lt;/a&gt;. And obtaining the first badge on a given topic can be an important accomplishment worth noting in your resume. But stackoverflow did not come up with this idea out of the blue: levels of expertise in forums have been used for a long time (see the &lt;a href="http://ubuntuforums.org/announcement.php?f=48"&gt;Coffee Cups/Beans in Ubuntu forums&lt;/a&gt;, for instance).&lt;br /&gt;&lt;br /&gt;I am not saying that implementing these two approaches would guarantee Quora's success. But not implementing them will probably guarantee the opposite. We have all seen potential in Quora. Apart from the quick integration with existing OSNs and the ability to follow, there is the real-time component that brings it closer to a Q&amp;amp;A Twitter. 
If they don't fix these potential issues, somebody else will come up with an improved version that can very well be the "next big thing" that some social media gurus were seeing in Quora just some weeks ago.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-78272270456258071?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/78272270456258071/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=78272270456258071' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/78272270456258071'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/78272270456258071'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2011/01/on-trust-networks-and-gamification-or.html' title='On Trust Networks and Gamification. 
Or How Quora can overcome its Hype and embrace long-term Success'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-670858703078172135</id><published>2010-11-16T16:12:00.000-08:00</published><updated>2010-11-25T12:44:27.306-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='presentation skills'/><category scheme='http://www.blogger.com/atom/ns#' term='conferences'/><title type='text'>Did you prepare your talk?</title><content type='html'>I don't consider myself to be a great presenter. As a matter of fact, every time I finish a presentation, I find myself thinking about how many things I screwed up and could have done much better. However, whenever I attend a conference I face the cruel reality: my presentations are way better than most research presentations. If I am really not that good, it can only mean one thing: researchers generally suck at presenting their work. (This is in fact one of the reasons I am against organizing research conferences around oral presentations. But this is another discussion I will leave for another post).&lt;br /&gt;&lt;br /&gt;So, if you have any doubts of whether you could be in that category of good researcher/poor presenter, you can do a quick test: Watch the video below. If you think your last presentation is well summarized in the video, you definitely fit into the group. 
Even if you don't, you might find some tips or advice of interest to you in the rest of this post.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;object height="385" width="480"&gt;&lt;embed src="http://www.youtube.com/v/yL_-1d9OSdk?fs=1&amp;amp;hl=es_ES" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="385" width="480"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;OK, so what are the three basic rules to make a decent presentation? Easy: (1) Prepare yourself, (2) prepare yourself, and (3) prepare yourself.&lt;br /&gt;&lt;br /&gt;At this point, you might already be tempted to stop reading because you disagree with what I am saying. I have found several reasons why people disagree with something as obvious as the fact that making a good presentation requires preparation, but I think all of them are summarized in the following two:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;(a)&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;I'm a natural&lt;/span&gt;: Maybe you are the kind of self-assured person who thinks they have great presentation skills that shine best the more they improvise. I was pretty close to this myself some time ago. But if you fit into this category, there is a very easy test you can do: tape yourself on video in several presentations. If you still think you are great and need no preparation or further skills, congratulations! But chances are that you will then realize how many things you have been doing wrong and how much you can improve. All great presenters I know stress the fact that preparation is key, period.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;(b)&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;I'm a researcher, not a TV Star&lt;/span&gt;: At the other extreme, you might be aware of your limitations but might think that this is not such a big deal. 
You are a researcher and live in the world of formulas, theories, or code. You couldn't care less about what people get from your talks and you would be happy standing up and doing the chicken, chicken, chicken presentation. And this is not an exaggeration: I have seen junior researchers who are still editing slides a couple of hours before their scheduled presentation at a top conference. My take on this is the following: if you think presentation skills are not part of what is required of a researcher, you are wrong.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_xAtUP4Gu6Zk/TO2uqVrAn-I/AAAAAAAAAJw/tje5_aXwlbI/s1600/ted-presenter.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 445px; height: 297px;" src="http://1.bp.blogspot.com/_xAtUP4Gu6Zk/TO2uqVrAn-I/AAAAAAAAAJw/tje5_aXwlbI/s200/ted-presenter.jpg" alt="" id="BLOGGER_PHOTO_ID_5543278758758227938" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;So by this point I will suppose that you are convinced of the importance of preparing research presentations. Ideally, you have also taped yourself and found that there are many things to improve. The question is what to do next. Obviously, I cannot pretend to summarize a presentation skills course in a post. There are thousands of resources out there in the form of books, videos, or similar that you will find without any problem. But I do think that I can pinpoint a few issues that are important and tricks that might help.&lt;br /&gt;&lt;br /&gt;First, I think it is important to separate two kinds of "preparation": (1) mid/long-term preparation aimed at improving your skills, and (2) short-term preparation for your next presentation.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Improving your skills&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Again, you can find many books and resources on how to do this. 
But here are some of the things that you should at least consider:&lt;br /&gt;&lt;br /&gt;(1) Tape yourself:&lt;br /&gt;&lt;br /&gt;This will make you aware of your weak points and where you need to focus your efforts.&lt;br /&gt;&lt;br /&gt;(2) Enjoy the stage:&lt;br /&gt;&lt;br /&gt;Some people have a really hard time every time they go on stage, and this shows. There are many things you can do to learn techniques and improve on this, from playing in a band to taking some acting and performance lessons (I did this and found it very useful and enjoyable).&lt;br /&gt;&lt;br /&gt;(3) Read about it:&lt;br /&gt;&lt;br /&gt;No need to become obsessed. But reading a couple of books or watching some videos that give you tips is not going to hurt. And remember, this is part of your expected skill set as a researcher. If you want a starting point, I can recommend you read a short 12-page essay on "&lt;a href="http://pne.people.si.umich.edu/PDF/howtotalk.pdf"&gt;How to give an academic talk v4.0&lt;/a&gt;" by Paul N. Edwards from U. Michigan.&lt;br /&gt;&lt;br /&gt;(4) Rehearse the techniques:&lt;br /&gt;&lt;br /&gt;It is very good if you have situations where you can rehearse what you learn from the previous steps. Actually, many of the techniques can be applied in "real" life (e.g. when talking to your boss). Others require a more realistic setting. I have been lucky to use the courses at the university as a rehearsal playground for improving my skills.&lt;br /&gt;&lt;br /&gt;(5) If you need help, look for it:&lt;br /&gt;&lt;br /&gt;I have seen many cases of researchers with severe communication problems when presenting. Maybe I sound too harsh here, but I don't think this is acceptable. If you really want to be a researcher but don't think you can get to an acceptable level of presenting, either (a) have some co-author present for you or (b) find some professional help. And the latter would be my preferred option. 
It is not so hard nowadays to find coaches or places that can help you out, and if you agree this is an ability that you need in your job (and, again, you should agree), it is worth investing in it.&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;&lt;br /&gt;&lt;/span&gt;&lt;a href="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/TO2vZ7AcGoI/AAAAAAAAAKA/ElhgjU5si20/s1600/chairs.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 440px; height: 281px;" src="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/TO2vZ7AcGoI/AAAAAAAAAKA/ElhgjU5si20/s320/chairs.jpg" alt="" id="BLOGGER_PHOTO_ID_5543279576234072706" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;&lt;br /&gt;Preparing your next presentation&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Regardless of whether you manage to improve your general presentation skills or not, you will have to face your next presentation sooner or later. When preparing the talk, you should focus on its two main components: the slides and the talk itself.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;The slides&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Again, there are many resources out there on how to prepare your slides, the style, the design... I was lucky to attend a course on Zen presentation style by the guys at &lt;a href="http://www.presentacionesartesanas.com/"&gt;Presentaciones Artesanas&lt;/a&gt;. Zen-style presentations are &lt;a href="http://www.presentationzen.com/presentationzen/2009/05/making-presentations-in-the-ted-style.html"&gt;the kind of slides&lt;/a&gt; you will see at TED, for instance. 
I try to apply some of the techniques in my own slides but (a) I am not a professional presenter (that is, although presentations are important in a researcher's life, I have other things to do), and (b) sometimes, transmitting scientific rigor in a very graphical style is not easy (actually, &lt;a href="http://www.wired.com/wired/archive/11.09/ppt2.html"&gt;according to Tufte&lt;/a&gt;, even PowerPoint should be banned from scientific publications). However, I do recommend understanding some of the design concepts behind the Zen style and maybe using some of them as a basis.&lt;br /&gt;&lt;br /&gt;Once you have found your style, you will need to do the following tasks:&lt;br /&gt;&lt;br /&gt;(1) Know your audience:&lt;br /&gt;&lt;br /&gt;Before you start preparing the presentation, take some time to understand who you will be talking to. Giving a talk at a conference is not the same as pitching your work to business people, presenting to a prospective employer, or, like I did last week, trying to convince high-schoolers of how cool Computer Science is.&lt;br /&gt;&lt;br /&gt;Even if you are only focusing on research presentations at conferences, they are not all the same! Sometimes you will be giving a talk in a setting where everybody is an expert in what you are talking about, while on other occasions only a tiny fraction of the audience works in your field. In my case, I won't use the same kind of approach when presenting at a RecSys conference, where everybody knows about Recommender Systems, as at a generic one like WWW, where I have to assume that most of the audience does not know the topic in depth.&lt;br /&gt;&lt;br /&gt;It is also important to look at the program schedule. The name of your session and the talks immediately before and after yours are going to give you more information about who might be sitting in. 
If you are presenting at a conference with multiple tracks, the talks scheduled at the same time as yours will give you some hint about who is *not* going to be attending yours.&lt;br /&gt;&lt;br /&gt;(2) Find "the message"&lt;br /&gt;&lt;br /&gt;Find a simple take-away message that you want to get through to your audience. In many cases it will be something along the lines of "look how important and interesting my research is, please go ahead and look more into it by reading the paper... and don't forget to cite it in your next publication". But in order to transmit that idea you need to make your point. Therefore, find the answers to: (a) what problem does your work solve, (b) what makes your work different from other solutions, and (c) why should anybody care about it. These three questions should help you find the message. Stick to one or two ideas and refer the audience to the paper for more details. Trying to squeeze too many messages into too little time is a recipe for disaster.&lt;br /&gt;&lt;br /&gt;Some researchers like to add another secondary message thread: (d) I am really smart and what I did is so complicated you might not even grasp it... I particularly dislike this kind of presentation and find it pretentious and boring (maybe because I am not so smart). But hey, I know some people have made quite a career of this so you should be aware.&lt;br /&gt;&lt;br /&gt;(3) Prepare a script&lt;br /&gt;&lt;br /&gt;Once you have identified the "main message," you are ready to prepare the script of the slides. I usually start off by having a bunch of empty slides with only the title on them. With these in place, I can see if I might be going over time, need to sort things out differently... The script will depend on the kind of talk and the time you have to speak. 
But in general, it will have a structure such as:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Introduce context and situation&lt;/li&gt;&lt;li&gt;Formulate the problem and why it is important to solve&lt;/li&gt;&lt;li&gt;Main message (Solution to the problem, consequences, details on the solution...)&lt;/li&gt;&lt;li&gt;Summary of problem and solution&lt;/li&gt;&lt;li&gt;Future work and things to do&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;The script is important, but be ready and willing to change it. You are not likely to get it perfect from the start, and as soon as you start adding more detail you will see a clearer picture. Don't let "sticking to the plan" come back to bite you.&lt;br /&gt;&lt;br /&gt;(4) Make the visuals&lt;br /&gt;&lt;br /&gt;Maybe you think this is the least important part of your presentation. In my experience, I have come to value the visuals very much. Actually, most of the time I spend preparing some presentations goes into looking for appropriate visuals that back up and reinforce the "main message". The less familiar the audience is with your topic, or the less hardcore researchy it is, the more time you will want to spend choosing appropriate visuals. Some well-chosen pictures will make your message stickier. And you might find some images that are so powerful that they make you go back to your script and twist it a bit.&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;br /&gt;The talk&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Once you have the slides more or less ready, you can start preparing the talk itself. Bear this in mind: if you prepare the slides but not the talk, your presentation is likely to suck. Here are some ideas and tips that can help you in the process:&lt;br /&gt;&lt;br /&gt;(1) Tape yourself&lt;br /&gt;&lt;br /&gt;As I mentioned before, taping yourself is one of the best tools I have found for improving your presentation skills. It is also an amazing tool for preparing your next talk. 
If you watch a video of yourself rehearsing the talk, you will be able to analyze what you are explaining wrong, where you are wasting your time, what jokes don't make sense... Besides, it is a perfect timing tool: not only will you get the exact duration of your talk from the video, but also how you distributed it. This will allow you to make sure that you are devoting the right amount of time to getting the "main message" through.&lt;br /&gt;&lt;br /&gt;(2) Test technical issues as many times as possible&lt;br /&gt;&lt;br /&gt;No need to mention &lt;a href="http://en.wikipedia.org/wiki/Murphy%27s_law"&gt;Murphy's law&lt;/a&gt;, I suppose, but if something can go wrong, it will. No matter how much I check things over and over, there are always technical issues that catch me by surprise.&lt;br /&gt;&lt;br /&gt;In my last talk, I had a couple of videos that I knew could be problematic. I spent a lot of time making sure they were working in the presentation. I even tried my laptop with a secondary monitor to make sure. I asked the host well in advance to make sure that I would be able to connect the audio output of my laptop. And on the day, I went to the hall and tested the audio. I was even going to test the videos but the audience was already half in, so I preferred to keep the surprise and tested with some random music instead (big mistake here!). What happened? The videos did not show, so I had to improvise by opening them with another program, and that did not work very well either because of a limitation on the projector's resolution, I think.&lt;br /&gt;&lt;br /&gt;In my case, I like to put myself in situations of technological risk and I like the feeling of doing a complex live demo in a presentation that I know can fail (I guess it is like the adrenaline rush I used to have when playing in a concert). But I have to admit that the safest advice is to keep technical challenges as simple as possible. 
And be obsessive about repeatedly checking those that you know are likely to fail.&lt;br /&gt;&lt;br /&gt;Of course, checking technical details means, among other things, that you need to be in the room for your talk well in advance and test the presentation in the same conditions you are going to give it in later (even if that means missing one of the coffee breaks in the conference!).&lt;br /&gt;&lt;br /&gt;(3) Be ready to improvise&lt;br /&gt;&lt;br /&gt;And my last piece of advice may seem to contradict the rest. I have been talking about the importance of preparing many details of the talk. However, a presentation should always leave room for improvisation and adaptation. There is nothing worse than the feeling that the speaker has learned the talk by heart and is not making any attempt to connect with the audience and the context. Besides, there might be elements during the talk that force you to improvise: a technical issue, a different audience than you expected, a reaction from somebody...&lt;br /&gt;&lt;br /&gt;You should be able to work any kind of external element into your presentation without losing the main message. I don't like having to skip slides since it gives the impression that you are in a rush to finish, but many times there is no alternative: You might have lost precious time trying to play that video, or maybe you went on too long in the introduction and now you need to cut it short. 
It is again very important that you have a clear picture of what the "main message" is and improvise by skipping those slides that are not needed to understand it.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/TO2vKopEwmI/AAAAAAAAAJ4/kljMWuizEUk/s1600/microphone.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 433px; height: 289px;" src="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/TO2vKopEwmI/AAAAAAAAAJ4/kljMWuizEUk/s320/microphone.jpg" alt="" id="BLOGGER_PHOTO_ID_5543279313606197858" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;I hope that some of this advice is useful in your next presentations. But I would like to hear from you: how do you prepare your talks? Any tips or suggestions you want to share in the comments?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-670858703078172135?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/670858703078172135/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=670858703078172135' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/670858703078172135'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/670858703078172135'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2010/11/did-you-prepare-your-talk.html' title='Did you prepare your talk?'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_xAtUP4Gu6Zk/TO2uqVrAn-I/AAAAAAAAAJw/tje5_aXwlbI/s72-c/ted-presenter.jpg' height='72' width='72'/><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-6033418578807580712</id><published>2010-09-29T02:52:00.000-07:00</published><updated>2010-09-29T03:14:50.895-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='internships'/><category scheme='http://www.blogger.com/atom/ns#' term='opportunities'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Internship positions on Recommender Systems</title><content type='html'>At Telefonica Research we are looking for young and talented researchers willing to expand their horizons by working in an exciting environment in beautiful Barcelona. And, as you will know if you follow this blog, I am particularly interested in working with PhD students whose research focus is Recommender Systems but also neighboring areas such as Data Mining, User Modeling, Social Networks, and Information Retrieval. We offer three month internships and interesting conditions.&lt;br /&gt;&lt;br /&gt;Work from previous interns has been published in top conferences such as SIGIR, WWW, Recsys, Web Intelligence... 
(see my &lt;a href="http://xavier.amatriain.net/index_publications.html"&gt;list&lt;/a&gt; of recent publications, most of which include interns)&lt;br /&gt;&lt;br /&gt;And, if you want references of what an internship in Telefonica is like, you might want to contact some of our previous interns in the group:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://an.kaist.ac.kr/%7Emycha/"&gt;Meeyoung Cha&lt;/a&gt;, currently Assistant Professor at KAIST&lt;/li&gt;&lt;li&gt;&lt;a href="http://an.kaist.ac.kr/%7Ehaewoon/"&gt;Haewoon Kwak&lt;/a&gt;, PhD student at KAIST&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.cs.ucl.ac.uk/staff/n.lathia/"&gt;Neal Lathia&lt;/a&gt;, currently Researcher at UCL&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.sis.pitt.edu/%7Ejahn/homepage/Home.html"&gt;Jae-wook Ahn&lt;/a&gt;, PhD student at U. Pittsburgh &lt;/li&gt;&lt;li&gt;&lt;a href="https://www.inf.unibz.it/%7Elbaltrunas/research.html"&gt;Linas Baltrunas&lt;/a&gt;, PhD student at U. Bolzano&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.sis.pitt.edu/%7Edparra/"&gt;Denis Parra&lt;/a&gt;, PhD student at U. Pittsburgh&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.dtic.upf.edu/%7Emramirez/"&gt;Miguel Ramirez&lt;/a&gt;, PhD student at U. 
Pompeu Fabra&lt;/li&gt;&lt;/ul&gt;Please contact me for more details.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-6033418578807580712?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/6033418578807580712/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=6033418578807580712' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/6033418578807580712'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/6033418578807580712'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2010/09/internship-positions-on-recommender.html' title='Internship positions on Recommender Systems'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-1532363530206898266</id><published>2010-09-23T15:53:00.000-07:00</published><updated>2010-09-24T05:02:08.467-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mobile'/><category scheme='http://www.blogger.com/atom/ns#' term='context'/><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><category scheme='http://www.blogger.com/atom/ns#' term='privacy'/><title type='text'>Contextual Movie Recommendations on an iPhone based on Expert Collaborative 
Filtering</title><content type='html'>If you follow this blog, you probably have already read about Expert Collaborative Filtering and &lt;a href="http://technocalifornia.blogspot.com/2009/05/wisdom-of-few.html"&gt;The Wisdom of the Few&lt;/a&gt;. Maybe you also read about our recent implementation of the approach to &lt;a href="http://technocalifornia.blogspot.com/2010/07/music-recommendation-through-expert.html"&gt;recommend music&lt;/a&gt;. Well, if you are at the &lt;a href="http://recsys.acm.org/2010"&gt;Recsys 2010 conference&lt;/a&gt; next week, you will get to see a demo of yet another prototype at Monday's Demo Session.&lt;br /&gt;&lt;br /&gt;We are presenting an iPhone application based on the Expert Collaborative Filtering approach. The application is the result of Josep Bach's undergrad final thesis and you can read the full-blown description of the project in &lt;a href="http://xavier.amatriain.net/pubs/GeolocatedRecommendations.pdf"&gt;his dissertation&lt;/a&gt;. The application, however, is much more than yet another implementation of Expert CF. The main highlights for me are that (a) you can offer personalized recommendations on a phone with 100% privacy guarantees, and (b) you can run a recommendation algorithm on the device, with minimum intervention from the server-side.&lt;br /&gt;&lt;br /&gt;Both of these points follow from the client-server architecture depicted below. The server is in charge of compiling all the public information available on the web by crawling critic websites like Rotten Tomatoes. It also gathers information about local cinemas and their schedules. All this information, which again is public, is stored in a SQL database and shared with devices through a RESTful API.&lt;br /&gt;&lt;br /&gt;The device, in this case an iPhone, but it could be anything else, connects to the server and syncs a local database through the RESTful API. Once this is done, all needed information is local on the device. Plus... 
all the personal information about the user (i.e. ratings on movies in this case). The recommendation algorithm can then run locally and return results in a reasonable time because the set of experts is limited.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_xAtUP4Gu6Zk/TJvdIQTB95I/AAAAAAAAAJI/w4twxI-9vYY/s1600/Architechture.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 585px; height: 364px;" src="http://4.bp.blogspot.com/_xAtUP4Gu6Zk/TJvdIQTB95I/AAAAAAAAAJI/w4twxI-9vYY/s320/Architechture.jpg" alt="" id="BLOGGER_PHOTO_ID_5520248902156154770" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Another important aspect of the application is that we have added contextual features. The recommendations you will get on the app depend on your location and the time of the day. Therefore, it will recommend things that match your taste according to the expert-based prediction but are also playing in a nearby cinema right now.&lt;br /&gt;&lt;br /&gt;We haven't done a full user evaluation yet, but informal results are very encouraging. We hope you can come and test it at Recsys and give us your feedback. We will soon post a video. 
But for now, here are some screenshots of the main app screens.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/TJvc9gpXREI/AAAAAAAAAJA/Nx35djvpU-M/s1600/Recommendation.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 284px;" src="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/TJvc9gpXREI/AAAAAAAAAJA/Nx35djvpU-M/s320/Recommendation.png" alt="" id="BLOGGER_PHOTO_ID_5520248717566231618" border="0" /&gt;&lt;/a&gt;&lt;span style="font-style: italic;"&gt;List of Recommendations given your preferences, critics ratings but also your location&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/TJvc1bWrkqI/AAAAAAAAAI4/_11FodZyXzk/s1600/MovieInfo.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 294px;" src="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/TJvc1bWrkqI/AAAAAAAAAI4/_11FodZyXzk/s320/MovieInfo.png" alt="" id="BLOGGER_PHOTO_ID_5520248578706739874" border="0" /&gt;&lt;/a&gt;&lt;span style="font-style: italic;"&gt;Information on a movie, including critics ratings, your ratings, and also closest cinema that is playing with next show times&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_xAtUP4Gu6Zk/TJvcplCtTUI/AAAAAAAAAIw/15B9f-dQ8V8/s1600/CinemaScreen.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 310px;" src="http://1.bp.blogspot.com/_xAtUP4Gu6Zk/TJvcplCtTUI/AAAAAAAAAIw/15B9f-dQ8V8/s320/CinemaScreen.png" alt="" 
id="BLOGGER_PHOTO_ID_5520248375148891458" border="0" /&gt;&lt;/a&gt;&lt;span style="font-style: italic;"&gt;Screen showing cinemas near your current location (in blue)&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/TJvch9EGiYI/AAAAAAAAAIo/6uqb5n3-kcE/s1600/CinemaInfo.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 300px;" src="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/TJvch9EGiYI/AAAAAAAAAIo/6uqb5n3-kcE/s320/CinemaInfo.png" alt="" id="BLOGGER_PHOTO_ID_5520248244158237058" border="0" /&gt;&lt;/a&gt;&lt;span style="font-style: italic;"&gt;Information on the closest cinema near you&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-1532363530206898266?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/1532363530206898266/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=1532363530206898266' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/1532363530206898266'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/1532363530206898266'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2010/09/contextual-movie-recommendations-on.html' title='Contextual Movie Recommendations on an iPhone based on Expert Collaborative Filtering'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/TJvdIQTB95I/AAAAAAAAAJI/w4twxI-9vYY/s72-c/Architechture.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-6662749976207881997</id><published>2010-09-15T13:26:00.000-07:00</published><updated>2010-09-17T01:32:03.454-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Social Networks'/><category scheme='http://www.blogger.com/atom/ns#' term='search'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>The end of the Age of Search?</title><content type='html'>A couple of days ago, there was an interesting &lt;a href="http://www.nytimes.com/2010/09/13/technology/13search.html?_r=1"&gt;article in the &lt;/a&gt;&lt;a href="http://www.nytimes.com/2010/09/13/technology/13search.html?_r=1"&gt;New York Times&lt;/a&gt; on how social networks are changing the search experience. The truth is that the article is a bit confusing and mixes up several different issues. As a matter of fact, most of the article ends up being an introduction to &lt;a href="http://www.hunch.com/"&gt;Hunch&lt;/a&gt;, a very interesting recommendation site (thus not "search") based on different technologies including social recommendations.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://hunch.com/media/img/graph-bg.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 367px; height: 220px;" src="http://hunch.com/media/img/graph-bg.png" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;I tweeted the link to the article and presented it as further proof of the "End of the Age of Search". 
That got me into a very &lt;a href="http://www.google.com/buzz/xavier.amatriain/TRHavZcgPte/xamat-social-recsys-and-the-end-of-the-Age-of"&gt;interesting Buzz conversation&lt;/a&gt; with &lt;a href="http://glinden.blogspot.com/"&gt;Greg Linden&lt;/a&gt; on why I thought that the age of search is coming to an end. I promised that I would try to write a more elaborate post to make my point if I had the time... and here I am.&lt;br /&gt;&lt;br /&gt;The first time I read about the "end of the age of search" was in an article titled &lt;a href="http://money.cnn.com/magazines/fortune/fortune_archive/2006/11/27/8394347/"&gt;the race to create a smart Google&lt;/a&gt; at CNN Money. As a matter of fact, the discussion on how recommender systems were going to render search engines obsolete was cited by Recsys 2009 organizers and turned into &lt;a href="http://recsys.acm.org/2009"&gt;their homepage&lt;/a&gt; motto.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/_xAtUP4Gu6Zk/TJKiKSv6q8I/AAAAAAAAAIY/GPrnvZoNYkU/s1600/some-questions-cant-be-answered-by-google.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 257px;" src="http://4.bp.blogspot.com/_xAtUP4Gu6Zk/TJKiKSv6q8I/AAAAAAAAAIY/GPrnvZoNYkU/s320/some-questions-cant-be-answered-by-google.jpg" alt="" id="BLOGGER_PHOTO_ID_5517650791197486018" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;So, credit given to Jeffrey O'Brien at CNN Money and Recsys09 organizers, I picked up on this idea and have been elaborating on it in several of my presentations. Two years ago, for instance, I gave a presentation on &lt;a href="http://www.slideshare.net/xamat/recommendations-as-the-future-of-search"&gt;Recommendations as the Future of Search&lt;/a&gt; at an open research day organized by our lab. 
The main story, which I have repeated several times since then, is the following:&lt;br /&gt;&lt;story&gt;&lt;br /&gt;"&lt;br /&gt;Think about it: Search is not an ultimate need for people. What people need is information. The fact that they have been using search and this has been so successful is (mainly) because that is the only tool we gave them.&lt;br /&gt;&lt;br /&gt;Search by itself is not enough to compensate for the ever-growing information overload. First, there is the issue that for most given queries, you will get many more results than a user can ever go through. So you are faced with the problem of how to turn that huge set into the "ten blue links" (i.e. first results page). But there are more or less smart ways to do so by taking into account context, user preferences and so on.&lt;br /&gt;&lt;br /&gt;The main issue, however, is a different one: every search action requires users to explicitly formulate a query. From our geek perspective, we usually forget how difficult it is for a regular user to formulate a query given an information need. Even if you are fairly proficient, it might be complicated to turn a fairly trivial information need into something you can formulate in a simple query (take a look at &lt;a href="http://technocalifornia.blogspot.com/2010/07/being-social.html"&gt;the experiment&lt;/a&gt; I did with some SIGIR attendees when I asked them to search for my daughter's name, which is actually written on &lt;a href="http://xavier.amatriain.net/"&gt;my homepage&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;So, of course, the bottom line of my story has "traditionally" been that Recommender Systems represent a step forward since (a) they provide ways to assess relevance taking into account personal preferences and context, and (b) they can provide results without the need for explicit queries. I still believe Recommender Systems will have much to say in the way search is handled in the future. 
Of course, I work in the area, so you might think my opinion is a bit biased. But you don't have to take my word for it. In this year's &lt;a href="http://www.eurospider.com/acm-sigir-industry-track-2010.html"&gt;industry track at the SIGIR conference&lt;/a&gt; both Yahoo and Google mentioned "implicit search" as one of the most important trends. Now I hear that Google's Schmidt is talking about &lt;a href="http://musically.com/blog/2010/09/08/eric-schmidt-talks-google-music-autonomous-search-and-the-launch-of-google-tv-in-the-us-this-autumn/"&gt;autonomous search&lt;/a&gt;. They are all different ways of talking about &lt;span style="font-weight: bold;"&gt;Recommender Systems&lt;/span&gt; (which maybe, as my friend @mramirez suggested, is not the sexiest name for a research/technology area).&lt;br /&gt;&lt;/story&gt;"&lt;br /&gt;&lt;br /&gt;But we have recently seen some data that can be used as supporting evidence of the end of the Age of Search. Nielsen just published some results that show a &lt;a href="http://blog.nielsen.com/nielsenwire/online_mobile/top-us-search-sites-for-july-2010/"&gt;16% drop in web searches&lt;/a&gt; over the last year. And this is something pretty symptomatic! I disagree with some of the comments in that same post by Nielsen saying that this drop is due to the use of mobile devices. Unfortunately, I cannot say that this is due to the huge success of recommender systems either. 
In my opinion, there is one main reason for this: &lt;span style="font-weight: bold;"&gt;Search is being replaced by Social Networks&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/TJKq64FCBGI/AAAAAAAAAIg/cMGt6oeM3mo/s1600/NielsenSearchEngines.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 473px; height: 200px;" src="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/TJKq64FCBGI/AAAAAAAAAIg/cMGt6oeM3mo/s320/NielsenSearchEngines.png" alt="" id="BLOGGER_PHOTO_ID_5517660421944902754" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;If you think about it, most information needs users have are not about a particular and concrete piece of information such as "who wrote War and Peace". They are actually much less precise needs such as "what is there to do this weekend" or "is there a cool music album I could listen to while I go to work tomorrow". Or even things like "what important stuff has happened in the world today" or "I need to find a job better than the one I have". If you consider information needs like these, you will realize that the answer is much more likely to come out of your social network than out of an artificially formulated query.&lt;br /&gt;&lt;br /&gt;But not only that, I think we would agree that most of the time, when people go on the Internet, they don't have any information need beyond the prototypical "see what's up" or "catch up". And what do they do? They log into Facebook or check Twitter. It is clear that in these cases, search is out of the picture.&lt;br /&gt;&lt;br /&gt;Yet another worrying trend for the future of search is the decrease in the use of the web browser and the increase of "walled internet gardens". 
This idea hit the headlines last month with &lt;a href="http://www.wired.com/magazine/2010/08/ff_webrip/all/1"&gt;Chris Anderson's piece in Wired&lt;/a&gt; and has been covered extensively, so I won't go into it (but this is the main reason I am trying to avoid the use of the word "Web" in favor of "Internet" in this post).&lt;br /&gt;&lt;br /&gt;As a finishing note, I don't want anybody to get the wrong impression that by talking about the end of search as the driver for web development, I am implying that search-oriented companies like Google are doomed. Of course not. Google knows most of what is in this post as well as I do. As I mentioned, they are talking more and more about implicit or autonomous search as a proxy word for recommender systems. And in the case of social networks taking over search, I am pretty convinced they would agree with such a vision for the future. That is why they have been trying so hard to get into the social scene lately with Buzz, Wave... and more recently the rumors are that they are working on a Facebook killer called &lt;a href="http://www.pcworld.com/article/205471/google_gets_serious_about_social_networking_is_google_me_coming_in_2010.html?tk=hp_new"&gt;Google Me&lt;/a&gt; or at least they are looking into adding more and more &lt;a href="http://mashable.com/2010/09/15/google-social-networking/"&gt;social features into their search engine&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;These are interesting times to be doing research in this area because, as Search supremacy comes to an end, we will have more space to fill the void with newer and much cooler ideas.&lt;br /&gt;&lt;br /&gt;As always... 
looking forward to your feedback.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-6662749976207881997?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/6662749976207881997/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=6662749976207881997' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/6662749976207881997'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/6662749976207881997'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2010/09/end-of-age-of-search.html' title='The end of the Age of Search?'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/TJKiKSv6q8I/AAAAAAAAAIY/GPrnvZoNYkU/s72-c/some-questions-cant-be-answered-by-google.jpg' height='72' width='72'/><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-3918881966387555070</id><published>2010-08-25T01:55:00.000-07:00</published><updated>2010-08-25T06:36:30.094-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='music recommenders'/><category scheme='http://www.blogger.com/atom/ns#' term='lastfm'/><category scheme='http://www.blogger.com/atom/ns#' term='music'/><category scheme='http://www.blogger.com/atom/ns#' term='survey'/><title 
type='text'>Study on online music taste: call for participation</title><content type='html'>Are you a music listener and &lt;a href="http://www.lastfm.com"&gt;lastfm&lt;/a&gt; user? Are you interested in helping out research while having the chance to win a $600 Amazon gift card? Please help us understand online music tastes by completing a survey that will only take around 15 minutes of your time and might even be fun!&lt;br /&gt;&lt;br /&gt;All you need to do to participate is go to &lt;a href="http://musicsurvey.webhop.net/MusicSurvey/"&gt;this page&lt;/a&gt; and provide your last.fm username and a valid email. We will check if you meet the requirements (at least 18 y.o. and 5000 scrobbles on lastfm) and we will then send you a link to your personalized survey.&lt;br /&gt;&lt;br /&gt;Thanks for your time!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-3918881966387555070?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/3918881966387555070/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=3918881966387555070' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3918881966387555070'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3918881966387555070'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2010/08/study-on-online-music-taste-call-for.html' title='Study on online music taste: call for participation'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-5887324770605068433</id><published>2010-08-08T14:29:00.001-07:00</published><updated>2010-08-09T01:34:16.854-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='algorithms'/><category scheme='http://www.blogger.com/atom/ns#' term='context'/><category scheme='http://www.blogger.com/atom/ns#' term='matrix factorization'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Multiverse Recommendations (aka using n-dimensional tensor factorization for context-aware collaborative filtering)</title><content type='html'>This post is the first of several in which I will be explaining some of the things we are presenting in the upcoming &lt;a href="http://recsys.acm.org/"&gt;Recsys 2010&lt;/a&gt; conference. The project I will talk about is led by &lt;a href="http://www.ci.tuwien.ac.at/%7Ealexis/Welcome.html"&gt;Alexandros Karatzoglou&lt;/a&gt; and presents a new approach to context-aware recommendations that we have named Multiverse. You can access the full paper &lt;a href="http://xavier.amatriain.net/pubs/karatzoglu-recsys-2010.pdf"&gt;here&lt;/a&gt;, but I will give you a brief description in this post.&lt;br /&gt;&lt;br /&gt;The introduction of context in recommender systems is an area of growing interest. The reason is simple: While we all value the fact that Recommender Systems are able to infer our tastes and recommend new things, it is clear that whatever we like - and are willing to receive - depends on the context. For example, we do not want to receive the same movie recommendations on TV when we are sitting with the kids on a Sunday afternoon as when we are alone in a late-night session. There is a growing body of literature on contextual recommendations. 
In fact, I already &lt;a href="http://technocalifornia.blogspot.com/2009/09/context-aware-recommendations.html"&gt;posted&lt;/a&gt; about context-aware recommendations with micro-profiles on this blog. Also, there is a very good chapter on the topic in the upcoming &lt;a href="http://www.springer.com/computer/ai/book/978-0-387-85819-7"&gt;Recommender Systems Handbook&lt;/a&gt;. But, while we wait for it, you might want to look at some of the publications by &lt;a href="http://ids.csom.umn.edu/faculty/gedas/"&gt;Adomavicius&lt;/a&gt; and &lt;a href="http://pages.stern.nyu.edu/%7Eatuzhili/"&gt;Tuzhilin&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Context takes the recommender problem from a two-dimensional problem, where we have users and items, to an n-dimensional one where we can have many contextual dimensions added. In our work, we have generalized the successful matrix factorization approach to this n-dimensional case. In order to do this, we have used the idea of &lt;a href="http://en.wikipedia.org/wiki/Tensor"&gt;tensors&lt;/a&gt;, which are precisely a generalization of matrices to n dimensions. The following figure illustrates the idea (note that, for simplicity, we are illustrating the 3-dimensional case with just one contextual variable).&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_xAtUP4Gu6Zk/TF8iY3moyvI/AAAAAAAAAII/goKewJTy8KY/s1600/hosvd-tensor.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 174px;" src="http://1.bp.blogspot.com/_xAtUP4Gu6Zk/TF8iY3moyvI/AAAAAAAAAII/goKewJTy8KY/s320/hosvd-tensor.png" alt="" id="BLOGGER_PHOTO_ID_5503155080308247282" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;In the paper, we show how this approach outperforms previously existing methods on a number of different datasets. One of these results is illustrated in the figure below. 
Note how Tensor Factorization (in green) not only outperforms other methods, but it performs better the more contextual information we add. It is also interesting to note how not observing context information (black line) results in worse performance. When we add contextual information to 80% of our data, not using this information yields a result that is almost 50% worse.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_xAtUP4Gu6Zk/TF8iK7FH_aI/AAAAAAAAAIA/Dx2yX8aHZ6Y/s1600/mae-vs-change-prob.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 449px; height: 336px;" src="http://4.bp.blogspot.com/_xAtUP4Gu6Zk/TF8iK7FH_aI/AAAAAAAAAIA/Dx2yX8aHZ6Y/s320/mae-vs-change-prob.png" alt="" id="BLOGGER_PHOTO_ID_5503154840723258786" border="0" /&gt;&lt;/a&gt;The use of context in recommender systems and other areas of information retrieval is a very interesting topic that is likely to get even more attention in the near future. 
We will surely contribute to this.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-5887324770605068433?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/5887324770605068433/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=5887324770605068433' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5887324770605068433'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5887324770605068433'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2010/08/multiverse-recommendations-aka-using-n.html' title='Multiverse Recommendations (aka using n-dimensional tensor factorization for context-aware collaborative filtering)'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_xAtUP4Gu6Zk/TF8iY3moyvI/AAAAAAAAAII/goKewJTy8KY/s72-c/hosvd-tensor.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-4610598741425800006</id><published>2010-07-29T16:07:00.000-07:00</published><updated>2010-07-30T02:15:03.361-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='experts'/><category scheme='http://www.blogger.com/atom/ns#' term='music recommenders'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender 
systems'/><category scheme='http://www.blogger.com/atom/ns#' term='collaborative filtering'/><title type='text'>Music Recommendation through Expert-based Collaborative Filtering</title><content type='html'>In September I will be presenting the paper entitled "&lt;span style="font-style: italic;"&gt;Towards Fully Distributed and Privacy-preserving Recommendations via Expert Collaborative Filtering and RESTful Linked Data&lt;/span&gt;" in the &lt;a href="http://www.yorku.ca/wiiat10/"&gt;2010 International Conference on Web Intelligence&lt;/a&gt; in Toronto. You can read the full paper &lt;a href="http://xavier.amatriain.net/pubs/ahn-xamatriain-WI-2010.pdf"&gt;here&lt;/a&gt;, but in this post I will try to give you a taste of what is hidden behind such a long title.&lt;br /&gt;&lt;br /&gt;This paper should be understood as a continuation of my research on Expert Based Collaborative Filtering -- the so-called Wisdom of the Few. I recommend you take a look at &lt;a href="http://technocalifornia.blogspot.com/2009/05/wisdom-of-few.html"&gt;my previous post&lt;/a&gt; on this issue before moving on.&lt;br /&gt;&lt;br /&gt;So the basic idea from our previous work was to use domain experts as the only asset for creating neighborhood and predicting item utility in a similar way as is done in standard kNN collaborative filtering. We made some claims of how that method provided many practical advantages over standard approaches. We also claimed that the approach was scalable and flexible enough to be used in many domains. Unfortunately, at that point, we did not have time to implement and prove all that.&lt;br /&gt;&lt;br /&gt;The current work presents a practical full-fledged implementation of the approach in the music domain. 
Our goal is to prove some of the previous claims as well as to establish an architectural framework for expert collaborative filtering that provides, among other things, 100% privacy protection.&lt;br /&gt;&lt;br /&gt;The following screenshot will give you an idea of the application. On the client side, it is a stand-alone Flex/AIR application that runs on most operating systems. You can rate music albums, see the ratings from the experts, and get personalized recommendations based on them. We also provide access to extended information for albums via links to Last.fm as well as access to &lt;a href="http://linkeddata.org/"&gt;Linked Data&lt;/a&gt; resources from &lt;a href="http://musicbrainz.org/"&gt;MusicBrainz&lt;/a&gt; and others.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/TD-WDiASj2I/AAAAAAAAAHs/1TVowtYGL0I/s1600/wotf-ui-albuminfo.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 386px; height: 400px;" src="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/TD-WDiASj2I/AAAAAAAAAHs/1TVowtYGL0I/s400/wotf-ui-albuminfo.png" alt="" id="BLOGGER_PHOTO_ID_5494275057827090274" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The key architectural differences between standard and expert Collaborative Filtering are illustrated in the figure below. Note that in our expert CF, user ratings are kept on the client machine. 
Expert ratings, on the other hand, are downloaded to the local machine, and the predictions are computed there, which avoids any privacy breach.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/TD-V5jyaE-I/AAAAAAAAAHk/Qu2gzrv-YnM/s1600/Distribution.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 590px; height: 266px;" src="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/TD-V5jyaE-I/AAAAAAAAAHk/Qu2gzrv-YnM/s400/Distribution.png" alt="" id="BLOGGER_PHOTO_ID_5494274886507041762" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The next figure gives some more details of how we implemented the solution in our case. Again, note that the server is only used to crawl and store expert ratings that are publicly available on the web. Those ratings are then queried from the client through a &lt;a href="http://en.wikipedia.org/wiki/Representational_State_Transfer"&gt;REST&lt;/a&gt;-style web API. The computation of neighbors and predictions is then performed on the local machine.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_xAtUP4Gu6Zk/TD-VsmSVmrI/AAAAAAAAAHc/_PWFrlAi03I/s1600/Wotf_system_process.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 469px; height: 322px;" src="http://1.bp.blogspot.com/_xAtUP4Gu6Zk/TD-VsmSVmrI/AAAAAAAAAHc/_PWFrlAi03I/s320/Wotf_system_process.png" alt="" id="BLOGGER_PHOTO_ID_5494274663839537842" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;You might be wondering where we got our expert ratings from. Whereas in our previous work we crawled movie ratings from &lt;a href="http://www.rottentomatoes.com/"&gt;Rotten Tomatoes&lt;/a&gt;, this time we turned to &lt;a href="http://www.metacritics.com/"&gt;Metacritic&lt;/a&gt;. The figure below shows the number of ratings per critic. 
In the top positions, we can see AllMusicGuide with over 3500 ratings, or Pitchfork, Uncut, and Mojo, with over 3000.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/TD-VdnzGhtI/AAAAAAAAAHU/ygHnnM88If4/s1600/rating_count_per_critic_new.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 492px; height: 353px;" src="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/TD-VdnzGhtI/AAAAAAAAAHU/ygHnnM88If4/s320/rating_count_per_critic_new.png" alt="" id="BLOGGER_PHOTO_ID_5494274406547359442" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;I believe that expert collaborative filtering is a very flexible and valid paradigm in many domains. It can offer better results than other kinds of recommendations while solving many of the shortcomings such as scalability, privacy, or cold-start. We are currently working in other deployments in the mobile space, for example. But I will explain that in a future post.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-4610598741425800006?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/4610598741425800006/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=4610598741425800006' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/4610598741425800006'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/4610598741425800006'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2010/07/music-recommendation-through-expert.html' title='Music Recommendation through Expert-based Collaborative 
Filtering'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_xAtUP4Gu6Zk/TD-WDiASj2I/AAAAAAAAAHs/1TVowtYGL0I/s72-c/wotf-ui-albuminfo.png' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-2910903137943442076</id><published>2010-07-22T09:25:00.000-07:00</published><updated>2010-07-27T02:40:54.590-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sigir2010'/><category scheme='http://www.blogger.com/atom/ns#' term='mobile'/><category scheme='http://www.blogger.com/atom/ns#' term='Social Networks'/><category scheme='http://www.blogger.com/atom/ns#' term='search'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Being Social</title><content type='html'>This was the title of my talk at the &lt;a href="http://www.eurospider.com/acm-sigir-industry-track-2010.html"&gt;SIGIR Industry track&lt;/a&gt; this year. I wanted to post the slides online (see below). However, since there is little explanation in them, I will briefly try to walk you through the story line in this post. 
Also, at the end of the post I added a video of the talk (with some minor gaps). This should help you get the full picture in case you are missing something.&lt;br /&gt;&lt;br /&gt;&lt;div style="width: 425px; text-align: center;" id="__ss_4817393"&gt;&lt;strong style="margin: 12px 0pt 4px; display: block;"&gt;&lt;a href="http://www.slideshare.net/xamat/being-social-4817393" title="Being Social"&gt;Being Social&lt;/a&gt;&lt;/strong&gt;&lt;object id="__sse4817393" height="355" width="425"&gt;&lt;param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sigirpresentation-xamatriain-100722115048-phpapp02&amp;amp;stripped_title=being-social-4817393"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowScriptAccess" value="always"&gt;&lt;embed name="__sse4817393" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sigirpresentation-xamatriain-100722115048-phpapp02&amp;amp;stripped_title=being-social-4817393" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="355" width="425"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div style="padding: 5px 0pt 12px;"&gt;View more &lt;a href="http://www.slideshare.net/"&gt;presentations&lt;/a&gt; from &lt;a href="http://www.slideshare.net/xamat"&gt;Xavier  Amatriain&lt;/a&gt;.&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The presentation was about some of the projects we are doing at the Telefonica Research Group in Barcelona. In particular, I show some of my projects on Recommender Systems but also others led by Karen Church and Josep M. Pujol.&lt;br /&gt;&lt;br /&gt;(slides 2-7)&lt;br /&gt;But first let me introduce &lt;a href="http://www.telefonica.com/"&gt;Telefonica&lt;/a&gt; for those of you who don't know it (probably only applicable if you live in the US or Asia). Telefonica is one of the largest telecom companies in the world (3rd in market cap). 
It has grown significantly in the last 20 years, from a Spain-only company with 12M customers to one that operates in more than 25 countries and has over 260M customers. &lt;a href="http://www.tid.es/"&gt;Telefonica I+D&lt;/a&gt; (or R&amp;amp;D) is the Research and Development branch. It is the largest private research center in Spain and the second largest in Europe. Finally, the Research Group of Telefonica I+D has around 20 permanent research scientists covering areas such as multimedia, mobile and ubiquitous computing, social networks, p2p and content distribution, wireless systems, user modeling and data mining, and HCIR.&lt;br /&gt;&lt;br /&gt;(slides 8-9)&lt;br /&gt;One of the important issues, not only for users but also for a company like ours, is to find ways to deal with information overload. In very few years we have gone from measuring the information we are exposed to per day to measuring it per second. Twitter streams, Facebook updates, photos, videos... It's too much to cope with. Besides, this leads to the so-called "&lt;a href="http://en.wikipedia.org/wiki/The_Paradox_of_Choice:_Why_More_Is_Less"&gt;Paradox of Choice&lt;/a&gt;", after the very interesting book by Barry Schwartz. Having more choices does not necessarily lead to more freedom. In fact, it often leads to the opposite. If we have many choices, we tend to choose less because of &lt;a href="http://en.wikipedia.org/wiki/Analysis_paralysis"&gt;analysis paralysis&lt;/a&gt;. And we tend to choose worse, because we oversimplify the choice and use only superficial features.&lt;br /&gt;&lt;br /&gt;(slides 10-11)&lt;br /&gt;We tend to think that search engines have the answer to everything, but that is not always true. Actually, searching is not an ultimate human need. Accessing relevant information is. One of the reasons search engines are not the ultimate answer to people's information needs is the interface. 
We technical geeks think that formulating the right query is easy, but this is far from trivial for the average non-technical user.&lt;br /&gt;&lt;br /&gt;(slides 12-13)&lt;br /&gt;The good news is that you are not alone. There are many people seeking relevant information. And actually, some of them are your "friends". They might be able to help you find what you need.&lt;br /&gt;&lt;br /&gt;(slides 14-15)&lt;br /&gt;I did an interesting test by posting a question on Twitter. The question was: "What is my daughter's name?". This information is available on my homepage. Still, it is hard to find using any search engine. I received three correct answers hours later. They had all, one way or the other, used my social network (see details of the paths in the slide).&lt;br /&gt;&lt;br /&gt;(slides 16-19)&lt;br /&gt;This leads me to the first project, &lt;a href="http://porqpine.com/"&gt;Porqpine&lt;/a&gt;, led by Josep M. Pujol. Porqpine is a social and distributed search engine that uses the principle of lazy collaboration by letting users collaborate without extra effort. It allows users to find personalized and context-aware answers. And it is stand-alone but can co-exist with other search engines. What it does is locally cache the pages a user visits and record user interactions (e.g., bookmarking). Then, it answers searches by querying the local caches of a user’s friends. Pages that friends have “interacted with” are ranked higher. It also uses a proxy that masks the identity of the friend. It is currently a Firefox add-on that can be downloaded &lt;a href="http://porqpine.com/"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;(slides 20-21)&lt;br /&gt;We can somewhat overcome content overload by using social input. However, nowadays we are beyond content overload. We also suffer from *context overload*. Our information needs also depend on where we are, what time it is, who we are with, what activity we are doing... 
And this is especially relevant if we consider that the web device of the future is not the desktop but rather the mobile device (be it a phone, an iPad...). And a mobile device is not a computer!&lt;br /&gt;&lt;br /&gt;(slides 22-29)&lt;br /&gt;Besides the importance of context, a mobile phone is personal. People also tend to look for more "fresh" content. And there are some queries like "where is the nearest florist?" that are easy to answer. But what about more personal needs like "Where is that cool cocktail bar I went to the other day... I know there were jazz concerts on Thursday and it's near an old church." And what about discovery and serendipity? What about getting help with a decision? Or points of interest in general? Or events?&lt;br /&gt;&lt;br /&gt;(slides 30-36)&lt;br /&gt;All this leads to a question: Can we improve the search and discovery experience of mobile users by providing a readily available connection to their social network? The answer was Karen Church's &lt;a href="http://karenchurch.com/research.html"&gt;SSB&lt;/a&gt; (Social Search Browser). SSB is an iPhone-optimized web application plus a Facebook app. When launched, it centers on the user's current physical location and displays all queries/questions posted by other users in that location. As users pan/zoom, the set of queries is updated. Users can post new queries or interact with the queries of others. We did two field studies in Ireland. The surprising result was that SSB became much more than a tool for finding information. It became a tool for helping and sharing experiences and for supporting curiosity. It was actually seen as an extension of people's social network.&lt;br /&gt;&lt;br /&gt;So I have shown how you can somewhat minimize content and context overload by tapping into your social network. Ideally, for most tasks, you want to rely on your close "friends". However, for many information needs, your friends might not be enough and you need to resort to the crowds. 
We have come to know about the "Wisdom of the Crowds". If I ask enough people, I can be sure that the majority will be right.&lt;br /&gt;&lt;br /&gt;(slides 37-39)&lt;br /&gt;But there is a problem with that: crowds are not always wise. We often don't realize that users are noisy when giving their feedback. Besides, our data is often too sparse to draw correct conclusions. In our paper "&lt;a href="http://technocalifornia.blogspot.com/2009/04/i-like-it-i-like-it-not-or-how-miss.html"&gt;I Like it, I like it not&lt;/a&gt;" we studied how consistent people were in giving their opinion. We found very significant inconsistencies, especially in mild opinions, but also in negative ones.&lt;br /&gt;&lt;br /&gt;(slides 40-44)&lt;br /&gt;So, if we cannot trust the crowds... who can we trust? The experts. As Malcolm Gladwell puts it in Blink, "It is really only experts who can reliably account for their reactions". We know that experts might be biased or trying to steer our opinion. However, even in that case, they will be reliably and consistently doing so. Thus, they can become much better anchor points for predictions. In our &lt;a href="http://technocalifornia.blogspot.com/2009/05/wisdom-of-few.html"&gt;"Wisdom of the Few"&lt;/a&gt;, we presented a Collaborative Filtering approach based on experts from the Web. The basic idea is to find individuals who we can trust to have given reliable opinions on a given domain. These expert opinions are then used to determine which experts are most similar to you. The final prediction is then computed with standard kNN Collaborative Filtering. Expert Collaborative Filtering has many advantages over standard approaches. In particular, it is more scalable and it allows for 100% privacy preservation. This is because user ratings do not need to be shared in a central repository. Expert opinions can be downloaded locally to perform the computation. 
We have developed several prototypes including a music recommender system and a mobile cinema recommender with geolocation.&lt;br /&gt;&lt;br /&gt;(slide 45)&lt;br /&gt;As a final summary: We all probably knew about Information Overload. But now it is not only that, we also have Context Overload. We can cope with both by using our social network. This means using our friends if possible or the crowds when necessary. However, crowds are not always as wise as they might seem and we are better off using experts.&lt;br /&gt;&lt;br /&gt;Hope this is a good enough summary so you get the main message and can follow the pointers to more detailed information. You might also want to watch the talk in the following 3 videos that cover most of it.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:180%;"&gt;Part 1:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;object height="385" width="480"&gt;&lt;param name="movie" value="http://www.youtube.com/v/9AtvjvkLzSE&amp;amp;hl=en_US&amp;amp;fs=1"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;embed src="http://www.youtube.com/v/9AtvjvkLzSE&amp;amp;hl=en_US&amp;amp;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="385" width="480"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;span style="font-size:180%;"&gt;&lt;br /&gt;Part 2:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;object height="385" width="480"&gt;&lt;param name="movie" value="http://www.youtube.com/v/lHJ5H3pseSI&amp;amp;hl=en_US&amp;amp;fs=1"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;embed src="http://www.youtube.com/v/lHJ5H3pseSI&amp;amp;hl=en_US&amp;amp;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="385" width="480"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:180%;"&gt;Part 3:&lt;/span&gt;&lt;br /&gt;&lt;br 
/&gt;&lt;object height="385" width="480"&gt;&lt;param name="movie" value="http://www.youtube.com/v/v8a9d0la_Jw&amp;amp;hl=en_US&amp;amp;fs=1"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;embed src="http://www.youtube.com/v/v8a9d0la_Jw&amp;amp;hl=en_US&amp;amp;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="385" width="480"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-2910903137943442076?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/2910903137943442076/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=2910903137943442076' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2910903137943442076'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2910903137943442076'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2010/07/being-social.html' title='Being Social'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-3961156321204083966</id><published>2010-06-28T15:59:00.000-07:00</published><updated>2010-06-30T06:19:00.320-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='tourism'/><category 
scheme='http://www.blogger.com/atom/ns#' term='mobile'/><category scheme='http://www.blogger.com/atom/ns#' term='hci'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Off the beaten track</title><content type='html'>Next September, Nava Tintarev will be presenting a paper that she and I co-authored at &lt;a href="http://mobilehci2010.di.fc.ul.pt/"&gt;Mobile HCI 2010&lt;/a&gt;, in Lisbon. This paper, entitled "Off the Beaten Track - a mobile field study exploring the long tail of mobile tourist recommendations", presents the results of a field study on tourist recommendations. We sent a number of tourists off to visit Barcelona. They were instructed to use a tailored smartphone app which included recommendations of places they could visit.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/TCspolkfDSI/AAAAAAAAAG0/TgTl1fKCGZw/s1600/almudenamadrid.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 248px;" src="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/TCspolkfDSI/AAAAAAAAAG0/TgTl1fKCGZw/s320/almudenamadrid.png" alt="" id="BLOGGER_PHOTO_ID_5488526348137729314" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;In the paper, we evaluate the effectiveness, satisfaction and divergence from popularity of a personalized recommender system, comparing it to recommending the most popular sites. We found that participants visited more of the recommended points of interest (POIs) for lists with popular but non-personalized recommendations. In contrast, the personalized recommendations led participants to visit more POIs overall and to visit places "off the beaten track". The level of satisfaction in the two conditions was comparable and high, suggesting that our participants were just as happy with the rarer, "off the beaten track" recommendations and their overall experience. 
We believe that personalized recommendations set tourists into a discovery mode with an increased chance for serendipitous findings.&lt;br /&gt;&lt;br /&gt;This paper is the first of a line of research on tourist recommendations that I have just started and hope to be complementing with new publications soon. I will keep you posted in the blog. In the meantime, you can download the full pdf &lt;a href="http://xavier.amatriain.net/pubs/tintarev-mobilehci10.pdf"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-3961156321204083966?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/3961156321204083966/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=3961156321204083966' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3961156321204083966'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3961156321204083966'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2010/06/off-beaten-track.html' title='Off the beaten track'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_xAtUP4Gu6Zk/TCspolkfDSI/AAAAAAAAAG0/TgTl1fKCGZw/s72-c/almudenamadrid.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-376512675604861380</id><published>2010-06-04T16:14:00.000-07:00</published><updated>2010-06-07T09:17:10.215-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='time'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Temporal diversity in Recommender Systems</title><content type='html'>Next month, &lt;a href="http://www.cs.ucl.ac.uk/staff/n.lathia/"&gt;Neal Lathia&lt;/a&gt; will be presenting at &lt;a href="http://www.sigir2010.org/doku.php"&gt;SIGIR&lt;/a&gt; a paper that I collaborated on. In the paper we address the issues of Temporal Diversity and Novelty in Recommender Systems. You can read the paper &lt;a href="http://www.cs.ucl.ac.uk/staff/n.lathia/publications/sigir10.html"&gt;here&lt;/a&gt;, but I will try to give you a brief summary in this post.&lt;br /&gt;&lt;br /&gt;Recommender systems are usually evaluated on their accuracy, that is, their ability to predict how much a user will like/dislike an item given a set of past ratings. However, in any practical scenario, there are many other things that need to be taken into account to evaluate whether a system is giving good and interesting recommendations. One of these relevant issues is the diversity of the top-N recommendation lists. It does not matter how accurate our recommendations are if, time after time, we recommend the same things to the user. A user should expect that the system takes into account her feedback in order to improve and give different and better recommendations.&lt;br /&gt;&lt;br /&gt;In the paper, we evaluate the importance of temporal diversity for users through a user survey. 
Then we analyze the performance of known collaborative filtering algorithms, and we propose different ways to introduce temporal diversity while using traditional recommendation algorithms.&lt;br /&gt;&lt;br /&gt;We found several interesting results on how user rating behavior affects temporal diversity. For instance, users with large profiles are likely to see less diversity. However, the amount of ratings introduced since last recommendation correlates directly with more diversity. This suggests that we need to encourage users to rate while implementing mechanisms that prevent profiles from growing too large therefore preventing diversity. Smart mechanisms for rating "aging" might be useful.&lt;br /&gt;  &lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_xAtUP4Gu6Zk/TAmJKjyHu5I/AAAAAAAAAGY/xIUk7e1nBeo/s1600/accuracy.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 400px;" src="http://1.bp.blogspot.com/_xAtUP4Gu6Zk/TAmJKjyHu5I/AAAAAAAAAGY/xIUk7e1nBeo/s400/accuracy.jpg" alt="" id="BLOGGER_PHOTO_ID_5479061236170079122" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Perhaps an even more interesting finding is that, as illustrated in the figure above, different algorithms perform differently regarding temporal diversity. SVD, for instance, is known to be more precise than kNN in the general case. However, it is interesting to note that it is also much less diverse. 
Therefore, even a simple decision between SVD and kNN as the base of a recommender system cannot be made without considering issues such as the temporal behavior of the algorithms.&lt;br /&gt;&lt;br /&gt;Again, there is much more in the &lt;a href="http://www.cs.ucl.ac.uk/staff/n.lathia/publications/papers/lathia_sigir10.pdf"&gt;paper&lt;/a&gt; and, as always, I am looking forward to your comments and feedback.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-376512675604861380?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/376512675604861380/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=376512675604861380' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/376512675604861380'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/376512675604861380'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2010/06/temporal-diversity-in-recommender.html' title='Temporal diversity in Recommender Systems'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_xAtUP4Gu6Zk/TAmJKjyHu5I/AAAAAAAAAGY/xIUk7e1nBeo/s72-c/accuracy.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-6839223123886857463</id><published>2010-05-24T16:03:00.000-07:00</published><updated>2010-05-27T04:25:56.154-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='conference'/><category scheme='http://www.blogger.com/atom/ns#' term='recsys'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Recsys 2010 Update</title><content type='html'>As most of you know, I am co-chairing (together with &lt;a href="http://marctorrens.net/"&gt;Marc Torrens&lt;/a&gt;) the &lt;a href="http://recsys.acm.org/2010/"&gt;2010 ACM Recommender Systems Conference &lt;/a&gt;(Recsys 2010 for short) to be held in September in Barcelona. After &lt;a href="http://technocalifornia.blogspot.com/2009/04/recsys-2010-in-barcelona.html"&gt;announcing it&lt;/a&gt; in this blog some time back, I thought it was time to give a brief update on the highlights. This is just a summary, but if you want to be up to date, please bookmark the website, or follow us on &lt;a href="http://twitter.com/recsys2010"&gt;Twitter&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I am listing the highlights in more or less chronological order (or as they come to mind). The order is in no way meant to imply importance or relevance.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-weight: bold;"&gt;Venue&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:130%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Hosting a conference in Barcelona is already great. As we explain on the conference website, the city has &lt;a href="http://recsys.acm.org/2010/local-attractions-and-venue/local-attractions/"&gt;much to offer&lt;/a&gt;. But what can be better than having said conference in a convention center surrounded by the sea in the harbor, just down from the Ramblas? 
Well, this is where the&lt;a href="http://www.wtcbarcelona.com/"&gt; Barcelona WTC&lt;/a&gt; is located (see picture below). And although we did have other options in the city, we couldn't help but fall in love with the place.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://recsys.acm.org/2010/files/2010/01/wtc_photo.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 270px;" src="http://recsys.acm.org/2010/files/2010/01/wtc_photo.jpg" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;Workshops&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Recsys workshops were already a huge success last year in NY. But, judging by the high-quality proposals we have this year, it seems they are still getting better!  This year we accepted 8 &lt;a href="http://recsys.acm.org/2010/workshops/"&gt;workshops&lt;/a&gt; (actually, 6 one-day and 1 two-day) and we have scheduled them on two different days: before and after the three days of the main conference. On Sunday Sept. 26th, we have a workshop on Information Fusion, one on Social Recsys, one on Music, and the first part of the two-day Context-aware workshop and challenge. On Thursday Sept. 30th, we have the second part of that workshop plus new workshops on Practical Uses, on e-Learning (in conjunction with the  &lt;a href="http://www.ectel2010.org/" target="_blank" onclick="urchinTracker('/outgoing/www.ectel2010.org/?referer=http%3A%2F%2Frecsys.acm.org%2F2010%2Faccomodation%2F');"&gt;EC-TEL 2010&lt;/a&gt; conference), and one on Recsys evaluation.&lt;br /&gt;&lt;br /&gt;There is so much to choose from that the problem will be deciding which ones not to attend. 
I am really looking forward to all of these workshops.&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;Tutorials&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We have also lined up three &lt;a href="http://recsys.acm.org/2010/tutorials/"&gt;very interesting tutorials&lt;/a&gt; by key figures in the field.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cs.bgu.ac.il/%7Eshanigu/"&gt;Guy Shani&lt;/a&gt;, from Ben Gurion University, will be giving a much-anticipated tutorial on Evaluating Recsys.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www-users.cs.umn.edu/%7Ekonstan/"&gt;Joe Konstan&lt;/a&gt;, one of the fathers of the field and chair of SIGCHI for several years, will be introducing the use of HCI techniques in Recsys.&lt;br /&gt;&lt;br /&gt;Finally, &lt;a href="http://recsys.acm.org/www.baeza.cl/"&gt;Ricardo Baeza-Yates&lt;/a&gt;, VP of Yahoo Research, will be talking about predicting and recommending queries. Apart from being on my PhD committee, Ricardo is the only researcher I know who has a publication with more than 7000 citations.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;" &gt;&lt;span style="font-size:100%;"&gt;Papers&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;We still don't know many details about the main attraction of the conference: the accepted papers. However, we do know that the number of submissions went up from last year. Bearing those numbers in mind, the anticipated acceptance rate will be below 20%... which takes the conference to first-tier levels.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;Hotels&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We have arranged a great &lt;a href="http://recsys.acm.org/2010/accomodation/"&gt;list of hotels&lt;/a&gt; in Barcelona so you have plenty to choose from. 
If you can afford to stay in the Eurostars Grand Marina, that should be your first pick since it is located right at the conference venue and it is an amazing hotel. However, we know most budgets are tight nowadays so we have included two great 4* hotels (conveniently located and both in the top 50 out of 600 on TripAdvisor). We have even added a more affordable 3* hotel that is also conveniently located and has good reviews.&lt;br /&gt;&lt;br /&gt;To be honest, if I had to travel to Barcelona myself, I couldn't find better choices than these.&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;Local Festivity&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Each year, Barcelona celebrates its major festivity at the end of September. This year, the major events are scheduled for the 23rd-26th, just before the start of the conference. If you want an even better taste of our local culture, we recommend you come a couple of days earlier and enjoy the festivity. 
Visit the &lt;a href="http://www.bcn.cat/merce/en/index.shtml"&gt;Merce website&lt;/a&gt; for more details.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;All in all, we are looking forward to a great conference and hope to see you here!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-6839223123886857463?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/6839223123886857463/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=6839223123886857463' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/6839223123886857463'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/6839223123886857463'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2010/05/recsys-2010-update.html' title='Recsys 2010 Update'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-5254408546735478461</id><published>2010-05-03T13:37:00.001-07:00</published><updated>2010-05-06T14:47:55.214-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='publications'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><title type='text'>How many rejections make an acceptance?</title><content type='html'>Now that I have a decent (though not impressive) publication record, I realize that people might think that 
it was easy to get here. Not only that, some of you, now getting your first rejections, might think that researchers like me were fortunate enough not to get the kind of reviews you are getting. Even worse, you might think that we are better than you. Here is a secret for you: I have had many more rejections than acceptances during my career. And, although I have managed to improve my stats significantly (experience counts, that's a fact), I still do!&lt;br /&gt;&lt;br /&gt;I don't want to generalize here. I am sure there are researchers who manage to get all their papers in, starting with their very first submission, and only rarely get pissed at a rejection. But to be honest, I don't know any of them. Top-tier conferences and journals are designed to have low acceptance rates (10 to 20%). Also, they are supposedly designed not to let "bad stuff" in, but they are not optimized to make sure "good stuff" is not left out!&lt;br /&gt;&lt;br /&gt;Here is the best kept secret for researchers: you have to learn to digest your rejections and endure.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-weight: bold;"&gt;A case study&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;As I was writing &lt;a href="http://technocalifornia.blogspot.com/2010/04/frameworks-generate-domain-specific.html"&gt;my last post&lt;/a&gt; on the paper I have published in IEEE Transactions on Software Engineering, I was thinking about the long history of this particular submission and thought it might be good to share it. My goal is not to complain about how the publication system works (although there is meat there for several posts), but rather to encourage you to persist and not give up. I have omitted names of journals to preserve anonymity and to avoid pointing the finger.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;My first move was to send the paper to a 2nd-tier journal. I was working on other journals at that time and this one was not a priority. 
I simply wanted to get the word out about that part of my thesis. This journal seemed ideal. The result is that, for the only time in my life, I got the paper rejected without review! Yes, no reviewers or anything like that. Just the editor telling me that the paper was not relevant for the journal. As later events proved, this was pretty much bullshit.&lt;/li&gt;&lt;li&gt;Because of the disappointment and the work I had put into it, I decided to send it to another journal. Only this time I decided it would be a top 1st-tier journal instead. This time, I did get reviews. I have to admit that the reviews were insightful. The main issue was that the reviewers wanted me to focus more on the practical case and less on the general model. So I got a Major Revision out of this. I am pretty sure I could have gone for it at that point but decided not to. Again, I had other papers to focus on and this one was not my priority. But I kept the paper and the reviews.&lt;/li&gt;&lt;li&gt;The story of the paper would have ended there, had I not received an invitation from a researcher I know. He was guest editor of a special issue of the same journal I had submitted to in (2) and, because he knew my work, was inviting me to submit. Why not? I pulled the draft out of the drawer (figuratively, that is) and started working on the reviews I had previously received from that same journal. Right, isn't it? No, wrong! It turned out that the reviewers were complaining about completely different things. Because this was a special issue, all reviewers were uber-experts in this field. And they questioned technicalities and semantics as if they were very important. However, the reviews were not bad at all. Actually, the paper might have made it this time into the regular journal. But because it was a special issue and they could only accept 5-6 papers out of 50 submissions, it was rejected... 
again!&lt;/li&gt;&lt;li&gt;So here I am, with a paper I had started working on 5 years before and that I knew had been rejected out of bad luck. I took it more or less as it was and sent it to an even better journal (as a matter of fact, out of revenge, I just looked for the software engineering journal with the highest impact factor). And guess what, it was accepted! I did have to do some revisions though. The funny thing is that those revisions essentially took the paper back to how it was in step one, since reviewers asked for fewer details on the practical case and more meat on the general model.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;End of the story. So I have had a paper accepted to a top journal like IEEE Transactions on Software Engineering. This paper is essentially the same one I had 5 years ago, which was rejected without review by a crappy second-tier one. And in between, I had to steer one way and the other several times... what a ride!&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-weight: bold;"&gt;Y. I am convinced! I will send all my drafts tomorrow and hope they eventually make it...&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;NO!&lt;br /&gt;&lt;br /&gt;I don't want the post to be understood as a call to send crappy research anywhere you can in the hope that it makes it at some point. This doesn't work. Period. I review enough papers to know that people send unacceptable stuff (especially to journals). I am talking about papers that are clear rejects after you have spent 5 minutes reading them. I feel the pain, and I am the first one who does not want to waste my time reading pseudo-papers that are sent in the hope that nobody notices how bad they are.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-weight: bold;"&gt;Y. You are so right there! 
People who send unacceptable things for review should be banned from the community, their paper published on a website for everybody to see, and...&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Wait, hold on a sec. That is not the way to go either!&lt;br /&gt;&lt;br /&gt;Maybe you are reading this post and were fortunate enough to do your PhD in a top US university. And you had an advisor who called you into the office twice a week. An advisor who read each of your drafts 20 times and marked it all in red for you to redo many times. If you are in this kind of situation, I have some news for you: you are the exception, not the rule. Many PhD students go to just decent universities and are lucky if they have an advisor who is remotely interested in their thesis. Even in that case, it is likely that the advisor has too many things to do to provide any kind of regular feedback.&lt;br /&gt;&lt;br /&gt;Students and junior researchers use submissions to get feedback on their research... and to learn. And you know what, I think that, as much as I hate reviewing bad papers, they are right to do so. What do you think?&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-weight: bold;"&gt;You're not alone&lt;/span&gt; &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To finish this post/rant, I wanted to point out that having great ideas rejected is part of human history: you are not alone. Read, for instance, &lt;a href="http://www.columbia.edu/%7Exs23/reject.htm"&gt;this list&lt;/a&gt; where X. Sala i Martin has compiled famous ideas that were rejected (some of them, like the xerox machine, time after time). 
Also, &lt;a href="http://michaelnielsen.org/blog/three-myths-about-scientific-peer-review/"&gt;this post &lt;/a&gt;by Michael Nielsen has a very interesting discussion on the reliability of peer review, where he cites other examples of famous scientific ideas that were rejected.&lt;br /&gt;&lt;br /&gt;As always, your comments are most welcome!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-5254408546735478461?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/5254408546735478461/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=5254408546735478461' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5254408546735478461'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5254408546735478461'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2010/05/how-many-rejections-make-acceptance.html' title='How many rejections make an acceptance?'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-7847714984927988462</id><published>2010-04-30T16:22:00.001-07:00</published><updated>2010-05-04T16:31:52.658-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mda'/><category scheme='http://www.blogger.com/atom/ns#' term='framework'/><category 
scheme='http://www.blogger.com/atom/ns#' term='software'/><category scheme='http://www.blogger.com/atom/ns#' term='domain-specific'/><category scheme='http://www.blogger.com/atom/ns#' term='dsl'/><category scheme='http://www.blogger.com/atom/ns#' term='pattern language'/><title type='text'>Frameworks generate Domain Specific Languages</title><content type='html'>"&lt;a href="http://xavier.amatriain.net/pubs/xamatriain-IEEE-TSE-2010.pdf"&gt;Frameworks generate Domain Specific Languages: a case-study in the Multimedia Domain&lt;/a&gt;" is the title of a paper I authored that has just been published in &lt;a href="http://www.computer.org/portal/web/csdl/doi/10.1109/TSE.2010.48"&gt;IEEE Transactions on Software Engineering&lt;/a&gt;. As you might have guessed, the title of the paper is a twist on the classical "&lt;a href="http://portal.acm.org/citation.cfm?id=758674"&gt;Patterns Generate Architectures&lt;/a&gt;" by Kent Beck and Ralph Johnson.&lt;br /&gt;&lt;br /&gt;This publication completes the trilogy of journal articles I intended to get out of my thesis (the two others are "&lt;a href="http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4303016"&gt;A Domain-Specific Metamodel for Multimedia Processing Systems&lt;/a&gt;" in IEEE Transactions on Multimedia, and "&lt;a href="http://www.springerlink.com/content/6744jq80vk0h79pn/"&gt;A framework for efficient and rapid development of cross-platform audio applications&lt;/a&gt;" in ACM Multimedia Systems). My intention was to get this paper published a long time ago but I will save that juicy story for a future post.&lt;br /&gt;&lt;br /&gt;But, in any case, what are we presenting in the paper? 
Before explaining this, you might need to understand, in case you don't already, what a domain-specific language is:&lt;br /&gt;&lt;br /&gt;A &lt;a href="http://en.wikipedia.org/wiki/Domain-specific_language"&gt;Domain-Specific Language&lt;/a&gt; (DSL) is a high-level programming or modeling language tailored for a specific domain. A DSL lacks the flexibility of general-purpose languages but it offers powerful constructs that are closer to the concerns and constructs of the particular domain.&lt;br /&gt;&lt;br /&gt;In this work we claim that the aim of any software development process should be to come up with a DSL. In order to do this we bridge the gap between &lt;a href="http://en.wikipedia.org/wiki/Model-driven_architecture"&gt;Model-driven Architectures&lt;/a&gt; (MDA), &lt;a href="http://en.wikipedia.org/wiki/Pattern_language"&gt;Pattern Languages&lt;/a&gt; and Framework Development and embed all into a coherent and agile development model. This is illustrated in the figure below:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/S9tnFd3MhnI/AAAAAAAAAGQ/m84ChumKdm4/s1600/DevelopmentProcess.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 588px; height: 479px;" src="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/S9tnFd3MhnI/AAAAAAAAAGQ/m84ChumKdm4/s400/DevelopmentProcess.png" alt="" id="BLOGGER_PHOTO_ID_5466075916357371506" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Here is the transcript of the abstract (complete preprint pdf &lt;a href="http://xavier.amatriain.net/pubs/xamatriain-IEEE-TSE-2010.pdf"&gt;here&lt;/a&gt;):&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;span style="font-size:85%;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;blockquote&gt;&lt;div style="text-align: center;"&gt;&lt;span style="font-size:85%;"&gt;"We present an approach to software framework development that 
includes the generation of domain-specific languages (DSL) and pattern languages as goals for the process. Our model is made of three workflows -- framework, metamodel, and patterns -- and three phases -- inception, construction, and formalization. The main conclusion is that when developing a framework we can produce with minimal overhead -- almost as a side-effect -- a metamodel with an associated DSL, and a pattern language. Both outputs will not only help the framework&lt;/span&gt; &lt;span style="font-size:85%;"&gt;evolve in the right direction but will also be valuable in themselves.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;In order to illustrate these ideas, we present a case-study in the multimedia domain. For several years we have been developing a multimedia framework. The process has produced a full-fledged domain-specific metamodel for the multimedia domain, with an associated DSL, and a pattern language. "&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;/blockquote&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-7847714984927988462?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/7847714984927988462/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=7847714984927988462' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/7847714984927988462'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/7847714984927988462'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2010/04/frameworks-generate-domain-specific.html' title='Frameworks generate Domain Specific 
Languages'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_xAtUP4Gu6Zk/S9tnFd3MhnI/AAAAAAAAAGQ/m84ChumKdm4/s72-c/DevelopmentProcess.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-1962172584656599482</id><published>2010-02-28T15:26:00.000-08:00</published><updated>2010-03-07T16:14:38.340-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='streaming'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='mapreduce'/><category scheme='http://www.blogger.com/atom/ns#' term='hive'/><category scheme='http://www.blogger.com/atom/ns#' term='pig'/><category scheme='http://www.blogger.com/atom/ns#' term='pipes'/><category scheme='http://www.blogger.com/atom/ns#' term='hadoop'/><title type='text'>The Hadoop Ecosystem (a personal overview from a non-expert)</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://hadoop.apache.org/images/hadoop-logo.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 300px; height: 71px;" src="http://hadoop.apache.org/images/hadoop-logo.jpg" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;A couple of weeks ago, I attended a three-day course on &lt;a href="http://hadoop.apache.org/"&gt;Hadoop&lt;/a&gt; from the guys at &lt;a href="http://www.cloudera.com/"&gt;Cloudera&lt;/a&gt;. 
Although I had heard and read about Hadoop before, this was a great opportunity to learn many details about Hadoop and find out about several tools that make up the Hadoop ecosystem. If, like me before, you only have a rough idea of what's in Hadoop, you may find this post interesting. Take what I say with a grain of salt, since I am no expert in Hadoop. However, because I am not an expert, I think I can guarantee a fresher look and you can trust I am not trying to sell you the project. But, if you &lt;span style="font-weight: bold;"&gt;are&lt;/span&gt; an expert and you read the post, you might want to give feedback in case I got something wrong.&lt;br /&gt;&lt;br /&gt;Hadoop is an open source Java implementation of the &lt;a href="http://en.wikipedia.org/wiki/MapReduce"&gt;MapReduce&lt;/a&gt; framework introduced by Google. The main developer and contributor to Hadoop, however, is Yahoo. It might seem weird that one of Google's main competitors releases an open source version of a framework they introduced. More so, when Google has recently been granted a patent for it. However, it seems unlikely that Google can enforce their patent. &lt;a href="http://gigaom.com/2010/01/19/why-hadoop-users-shouldnt-fear-googles-new-mapreduce-patent/"&gt;One of the main reasons&lt;/a&gt; is that Map and Reduce functions have been known and used in functional programming for many years. Another reason is that Hadoop has gained huge popularity as part of the Apache project. Enforcing the patent would not get Google much love from the many companies that are now making a living off it or using it as an important component in their web architecture.&lt;br /&gt;&lt;br /&gt;But, before we go into any more detail, it would be good to understand what Hadoop can be used for and when we should think about adopting it. First, and above all, Hadoop is a framework for &lt;span style="font-weight: bold;"&gt;data analysis and processing&lt;/span&gt;. 
Therefore, if you have no data, or if you have no need to process it, do not continue with this post. Hadoop is sometimes presented as an alternative to traditional relational databases. However, it is not a database (although it does provide a NoSQL one called HBase as one of its tools); it is a framework for distributing data processes. Ok, so here was the second keyword: &lt;span style="font-weight: bold;"&gt;distribution&lt;/span&gt;. If you think you can do whatever you need to do on a single machine, you don't need Hadoop. However, you might want to look at it anyway, since distributing your data processes can be cheaper and also much more reliable. And finally, and related to the previous point, using Hadoop only makes sense if you are processing &lt;span style="font-weight: bold;"&gt;large&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;datasets&lt;/span&gt;, and by large I mean several TBs.&lt;br /&gt;&lt;br /&gt;However, even if your problem meets the three previous conditions (distributed processing of large datasets), you still cannot be completely sure Hadoop is your solution. Distributed relational databases are still an option. I won't go into the details, but you might want to read some of the voices that have recently stepped in to defend the scalability of relational databases and their applicability to highly demanding large datasets. These two posts are good reading: "&lt;a href="http://www.yafla.com/dforbes/Getting_Real_about_NoSQL_and_the_SQL_Isnt_Scalable_Lie/"&gt;Getting Real about NoSQL and the SQL-Isn't-Scalable Lie&lt;/a&gt;" and "&lt;a href="https://lwn.net/SubscriberLink/376626/eb49eddf0edda33e/"&gt;SCALE 8x: Relational vs. non-relational&lt;/a&gt;". 
I would also recommend this recent presentation on "&lt;a href="http://www.slideshare.net/jbellis/what-every-developer-should-know-about-database-scalability-pycon-2010"&gt;What every developer should know about database scalability&lt;/a&gt;".&lt;br /&gt;&lt;br /&gt;So now that we have some intuition of when Hadoop may be of interest, let me introduce the two main pieces behind Hadoop: MapReduce and HDFS.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;MapReduce&lt;/span&gt; is a programming model introduced by Google, which is at the core of Hadoop. It is based on the use of two functions taken from functional programming: Map and Reduce. Map processes a (key,value) pair into a list of intermediate (key,value) pairs. Reduce takes an intermediate key and the set of values for that key, and merges them into a (typically smaller) set of output values. Both the mapper and reducer functions are written by the user. The framework groups together intermediate values associated with the same key in order to pass them to the corresponding Reduce.&lt;br /&gt;&lt;br /&gt;MapReduce claims to be generic enough that most data processing tasks can be decomposed in this way. If you are interested in learning more, I recommend you start with &lt;a href="http://labs.google.com/papers/mapreduce.html"&gt;Google's paper&lt;/a&gt;. You can also take a look at &lt;a href="http://www.youtube.com/watch?v=yjPBkvYh-ss&amp;amp;feature=related"&gt;Google's set of videos&lt;/a&gt; introducing the framework. If you want a more "academic" presentation, you might want to take a look at &lt;a href="http://www.youtube.com/watch?v=mVXpvsdeuKU"&gt;these UC Berkeley classes&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The other core piece of Hadoop I mentioned before is the &lt;a href="http://hadoop.apache.org/common/docs/current/hdfs_design."&gt;Hadoop Distributed File System&lt;/a&gt; (&lt;span style="font-weight: bold;"&gt;HDFS&lt;/span&gt;). 
HDFS is the equivalent of the Google File System (GFS) used in the original MapReduce framework. This filesystem is optimized for streaming reads of large files (from several gigabytes to terabytes). Note that HDFS does not let you, for instance, edit a file once it has been written.&lt;br /&gt;&lt;br /&gt;Ok, so now we have the basics in place: how do we use Hadoop? Since Hadoop is written in &lt;span style="font-weight: bold;"&gt;Java&lt;/span&gt;, the most straightforward way to get started is to use its Java API. If you look at the &lt;a href="http://hadoop.apache.org/common/docs/current/mapred_tutorial.html"&gt;Hadoop Map/Reduce tutorial&lt;/a&gt;, for instance, you will see how the framework is introduced through its Java API.&lt;br /&gt;&lt;br /&gt;But if you want to use Hadoop and would rather keep away from Java, there are plenty of other options. First, there is &lt;a href="http://hadoop.apache.org/common/docs/r0.15.2/streaming.html"&gt;Hadoop &lt;span style="font-weight: bold;"&gt;Streaming&lt;/span&gt;&lt;/a&gt;, which lets you use arbitrary programs with Hadoop. Standard input and standard output are used for data flow, and each mapper and reducer is defined in a separate program. This comes in very handy if you want to use Hadoop through a scripting language.
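&lt;br /&gt;&lt;br /&gt;To make the Streaming idea (and the Map and Reduce functions themselves) a bit more concrete, here is a minimal word-count sketch in Python. It is only an illustration under a few assumptions of mine: the file name wordcount.py, the function names map_lines and reduce_lines, and the "map"/"reduce" command-line argument are all hypothetical, and the script relies on the usual Streaming conventions that key and value are separated by a tab and that the reducer receives its input sorted by key.&lt;br /&gt;&lt;pre&gt;

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one tab-separated (word, 1) record per word."""
    for line in lines:
        for word in line.split():
            yield word.lower() + "\t1"


def reduce_lines(sorted_lines):
    """Reducer: add up the counts of each word.

    Assumes well-formed "word TAB count" records, already sorted by word.
    """
    records = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(records, key=lambda rec: rec[0]):
        total = sum(int(count) for _, count in group)
        yield word + "\t" + str(total)


if __name__ == "__main__":
    # Hypothetical convention: pass "map" or "reduce" as the only argument.
    steps = {"map": map_lines, "reduce": reduce_lines}
    step = steps.get(sys.argv[1]) if len(sys.argv) == 2 else None
    if step is not None:
        for record in step(sys.stdin):
            print(record)
```

&lt;/pre&gt;A quick way to try it without a cluster is to imitate the shuffle with a local sort, for example: cat some_text_file | python wordcount.py map | sort | python wordcount.py reduce. On a real cluster the same script would be handed to Hadoop Streaming as both the mapper and the reducer command.&lt;br /&gt;&lt;br /&gt;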
Now, if you want better performance in your mapper and reducer functions and would like to call compiled C++ code instead, your solution is called &lt;a href="http://developer.yahoo.com/hadoop/tutorial/module4.html#pipes"&gt;Hadoop &lt;span style="font-weight: bold;"&gt;Pipes&lt;/span&gt;&lt;/a&gt;.&lt;br /&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;img style="margin: 0px auto 10px; text-align: left; cursor: pointer; width: 140px; height: 60px;" src="http://hadoop.apache.org/hive/images/hive_logo_medium.jpg" alt="" border="0" /&gt;&lt;img style="margin: 0px auto 10px; text-align: left; cursor: pointer; width: 63px; height: 82px;" src="http://hadoop.apache.org/pig/images/pig-logo.gif" alt="" border="0" /&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;Now, what if you would still like to access the Hadoop framework but do not fancy the MapReduce programming model? In other words, is there a higher-level and more programmer-friendly way to interface with Hadoop? The answer is, of course, yes. There are several ways to do this, but I will mention two of them: &lt;span style="font-weight: bold;"&gt;Hive&lt;/span&gt; and &lt;span style="font-weight: bold;"&gt;Pig&lt;/span&gt;. &lt;a href="http://hadoop.apache.org/hive/"&gt;Hive&lt;/a&gt; is a tool developed at Facebook that provides SQL-like access to the Hadoop infrastructure. Although the project is not very mature yet, this is a very interesting option to consider and it seems to be giving &lt;a href="http://www.facebook.com/note.php?note_id=89508453919"&gt;very good results to Facebook&lt;/a&gt;. The other option is to use &lt;a href="http://hadoop.apache.org/pig/"&gt;Pig&lt;/a&gt;, developed by Yahoo. Pig provides a higher-level language called Pig Latin that increases productivity, especially if you are dealing with non-Java programmers who are closer to the domain (e.g. data analysts).
Pig Latin is a dataflow language and it even has a graphical front-end plugin for Eclipse called &lt;a href="http://wiki.apache.org/pig/PigPen"&gt;PigPen&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I would not like to finish this personal overview of the Hadoop Ecosystem without mentioning &lt;a href="http://lucene.apache.org/mahout/"&gt;Mahout&lt;/a&gt;, a project for distributed machine learning with Hadoop. Among its examples, &lt;span style="font-weight: bold;"&gt;Mahout&lt;/span&gt; includes implementations of several collaborative filtering algorithms for recommendation. I would also encourage you to take a look at this list of &lt;a href="http://atbrox.com/2010/02/12/mapreduce-hadoop-algorithms-in-academic-papers-updated/"&gt;academic papers about or using Hadoop&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;At this point I have to say that I have mixed feelings about Hadoop and about MapReduce itself. Although it is a powerful framework with immediate application to real-life problems that involve very large datasets, the model feels more like a kludge than a paradigm shift. I understand why people turn to tools like Hive and Pig that hide the MapReduce complexity behind friendlier models such as ER and dataflow networks. Providing a framework that is efficient and usable but also conceptually illuminating is definitely an area for future work. And it seems that I am not the only one thinking along these lines.
Even Yahoo themselves are &lt;a href="http://www.theregister.co.uk/2010/02/16/yahoo_and_ron_brachman_on_mapreduce/"&gt;looking into new ways&lt;/a&gt; that go beyond Hadoop and MapReduce.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-1962172584656599482?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/1962172584656599482/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=1962172584656599482' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/1962172584656599482'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/1962172584656599482'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2010/02/hadoop-ecosystem-personal-overview-from.html' title='The Hadoop Ecosystem (a personal overview from a non-expert)'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-5019548240288100433</id><published>2010-02-02T15:39:00.000-08:00</published><updated>2010-02-11T02:18:10.016-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='agile'/><category scheme='http://www.blogger.com/atom/ns#' term='scrum'/><category scheme='http://www.blogger.com/atom/ns#' term='extreme Programming'/><title type='text'>Is Scrum really (that) evil?</title><content type='html'>Some days ago I was 
surprised to find (the cult of) Scrum in a list of the &lt;a href="http://radar.oreilly.com/2009/12/the-best-and-the-worst-tech-of.html"&gt;Worst Technologies of the Decade&lt;/a&gt;. I posted the link on &lt;a href="http://www.linkedin.com/groups?gid=37631&amp;amp;trk=myg_ugrp_ovr"&gt;LinkedIn's Agile Alliance group&lt;/a&gt; and that started a mild discussion where most people more or less defended Scrum. What follows is my personal take on the issue: Is Scrum really (that) evil?&lt;br /&gt;&lt;br /&gt;(Needless to say, if you know nothing about Scrum you should at least learn &lt;a href="http://en.wikipedia.org/wiki/Scrum_%28development%29"&gt;something&lt;/a&gt; about it before proceeding with the rest of the post.)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://digibit.co.uk/App_Themes/Default/Images/ScrumAgile.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 583px; height: 259px;" src="http://digibit.co.uk/App_Themes/Default/Images/ScrumAgile.png" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Ok, so the first thing people will tell you about Scrum is that it is not a method, obviously not a technology, not a methodology... then what is it? Scrum sells itself as a ... (drums here) "&lt;span style="font-weight: bold;"&gt;framework&lt;/span&gt;"! For those of you with a less agile background, another well-known example of a framework in the context of development methods is the &lt;a href="http://en.wikipedia.org/wiki/Rup"&gt;Rational Unified Process&lt;/a&gt;. If you are thinking of proposing a new development "method", you should definitely think about selling it as a framework.
This will bring you a number of benefits, the most important being:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;You can say that your framework is useful for any kind of situation&lt;/li&gt;&lt;li&gt;If anything goes wrong, you can always blame whoever instantiated the framework for not doing it right&lt;/li&gt;&lt;/ol&gt;Well, that is exactly what happens with Scrum. You cannot say it is evil in itself, or even good or bad: it will be as good or bad as its particular instance. This is, of course, unfair to Scrum users, who may have the feeling that if anything goes right it will be thanks to the framework, but if anything goes wrong it will be their fault.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;(UPDATE:  The following paragraph has been edited after feedback from comments to the post and from Kent Beck himself. I only had time to skim through the &lt;/span&gt;&lt;span style="font-weight: bold;"&gt;2nd Edition &lt;/span&gt;&lt;span style="font-weight: bold;"&gt;of XP Explained, but I think I got the point.)&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Unfortunately, the alternative is not very appealing either. Other methods like eXtreme Programming in their first incarnations would tell you that the only way to apply the method was to apply all practices. There was an important point in the message, since some practices are actually complementary and feed back into others. So you can't, for instance, decide to skip Continuous Integration and still think you can get away with it. In any case, the reality is that you cannot expect all teams to apply all practices: while practices like Unit Testing reach 60% usage, others like Pair Programming do not even reach 25%. Also, as Kent Beck acknowledges in the preface to the 2nd edition of his foundational eXtreme Programming Explained, enforcing all practices is like enforcing a given programming style. And that might not fit every situation.
The solution currently proposed in XP is to present some &lt;span style="font-weight: bold;"&gt;primary &lt;/span&gt;practices and some &lt;span style="font-weight: bold;"&gt;corollary&lt;/span&gt; ones. Also, the message now is that practices cannot be enforced and need to be evaluated in each situation. This brings XP closer to being a toolkit of practices, which is not far from the concept of a framework: the responsibility lies with the particular instance or application.&lt;br /&gt;&lt;br /&gt;Maybe the ideal solution would look something like this: propose a generic framework that includes several practices and variations, and then illustrate the framework with several practical instances that can be used depending on the project at hand. Of course, I am not the first one to think about something like this. As a matter of fact, Alistair Cockburn's &lt;a href="http://alistair.cockburn.us/Crystal+methodologies"&gt;Crystal methodologies&lt;/a&gt; were designed precisely with this idea in mind. Unfortunately the author did not get very far in proposing the different instances, and now the only one fully developed is Crystal Clear, valid for smaller and lighter projects.&lt;br /&gt;&lt;br /&gt;In any case, Scrum &lt;span style="font-weight: bold;"&gt;is&lt;/span&gt; a framework and when something goes wrong it is pretty likely that the "user" will be to blame. However, as I have just explained, there is no clear alternative to putting so much weight on the particular instance of a method.&lt;br /&gt;&lt;br /&gt;Still, can we say that there are some things that are inherently good or bad about Scrum?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;The good things&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Of course there are a few things that are good about Scrum regardless of how you apply the framework.
To name a few:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Accountability:&lt;/span&gt; Scrum makes it much easier to make teams accountable for what they do. It also makes it easier to hold clients accountable for what they asked for and even managers for what they did not provide. Forget about eXtreme Programming cards and boards: Scrum artifacts like the burndown charts or the impediments and product backlogs are great tools if what you need is more accountability in your life... I mean projects.&lt;br /&gt;&lt;/li&gt;&lt;li style="font-weight: bold;"&gt;Credibility of Agile: &lt;span style="font-weight: normal;"&gt;Let's face it, the first thing that comes to mind for many people when they read the agile manifesto is a bunch of long-bearded hipsters coding with one hand and eating chips with the other. And this impression might not improve much when you mention things like the "planning game" or.... (my god!) Pair Programming. Scrum brings credibility to Agile because it looks like "serious stuff".&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;li style="font-weight: bold;"&gt;The agile for non-developers and managers: &lt;span style="font-weight: normal;"&gt;Believe it or not, those who have to decide whether to promote Agile in a company may have no clue what the life of a developer is like&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt; &lt;/span&gt;&lt;/span&gt;(and many could not care less). If you plan on selling Agile by saying you will make developers' lives easier, think again. But if you talk about shortening cycles, minimizing risk and... improving accountability (that is, offering tools for managers to understand what is going on) you might have a chance. The biggest strength of Scrum is that it makes it much easier for managers and decision makers to buy into Agile.
And believe me, this is a major issue that by itself may justify Scrum.&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-size:130%;"&gt;The bad ones&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Too many artifacts:&lt;/span&gt; Ok, but wasn't one of the goals of becoming agile to offer a lightweight method with as few artifacts as possible? Then, what are all these artifacts doing in my life now? Product backlog, sprint backlog, impediments backlog, product burndown chart, sprint burndown chart, daily meeting, retrospective, sprint planning... is this really necessary? The truth is that, if managed right, many of these artifacts are not a huge overhead and are really useful. And again, they are needed to convince managers that we are doing things right. The problem is that, if we are not very careful, it is very easy to abuse them and end up making them more important than the product or team itself.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Too much focus on the process:&lt;/span&gt; Related to the previous point, it is easy to fall into a situation where the &lt;span style="font-weight: bold;"&gt;process&lt;/span&gt; is more important than anything else. It is not about your project, your product, and your team... it is about getting the process right. What? This sounds more like &lt;a href="http://en.wikipedia.org/wiki/Capability_Maturity_Model"&gt;CMM&lt;/a&gt; than Agile!&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Developers? What developers?:&lt;/span&gt; And finally... where does that leave developers? If you implement Scrum and nothing else, you are pretty likely to turn developers' lives into a sweatshop 2.0: they have to work more, faster, and better.... and are more accountable than before. Scrum does not care much about developers, so you had better do this yourself.
That is why a preferred approach is to sprinkle in some XP practices on top of a Scrum-driven project organization. However, because XP will not be so popular among managers and decision makers, you are likely to have much less support (if you don't agree with this, just try mentioning pair programming). Scrum by itself is not developer-friendly and that is a fact. &lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-size:130%;"&gt;Conclusions: is this really agile?&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt;S&lt;/span&gt;&lt;span style="font-size:100%;"&gt;o now do me a favor and take a moment to (re)-read the &lt;a href="http://agilemanifesto.org/"&gt;Agile Manifesto&lt;/a&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt;. What do you read in the first line? "(We value) &lt;/span&gt;&lt;span style="font-size:100%;"&gt;Individuals and interactions over processes and tools". The main &lt;span style="font-style: italic;"&gt;sin&lt;/span&gt; of Scrum is that it makes it really easy to reverse the sign in the equation. It is much easier to sell processes and tools to managers, and because Scrum is a framework it is very easy to apply it in a way that is not even agile anymore.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;My advice, however, is not to avoid Scrum. Use Scrum to convince your organization of the benefits of Agile. Use the "processes and tools" view to make your point to managers and decision makers.
But,&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Do not forget to build motivated teams&lt;/li&gt;&lt;li&gt;Value developers and educate them in eXtreme Programming techniques&lt;/li&gt;&lt;li&gt;Make them feel that they are far more important than any Scrum process or tool by allowing self-organization&lt;/li&gt;&lt;li&gt;Make sure that Scrum Masters (or project managers and the like) understand their dual role of using the "process" to improve accountability and upwards reporting while using people's skills and valuing the individuals in the team.&lt;/li&gt;&lt;/ul&gt;Of course, this is not the magic bullet of agile implementation. As a matter of fact, the combination of Scrum with eXtreme Programming techniques is already a favored approach in many companies (according to the &lt;a href="http://www.blogger.com/www.versionone.com/agilesurvey/"&gt;2009 State of Agile study&lt;/a&gt;, 24% of the companies use this hybrid). But again, individuals and interactions may be more valuable than any process or tool.
So in the end, it may not be so much about how you combine or instantiate your particular flavor of agile method but about how good and motivated your teams are.&lt;br /&gt;&lt;br /&gt;As always, I would love to hear your comments and experiences.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-5019548240288100433?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/5019548240288100433/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=5019548240288100433' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5019548240288100433'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5019548240288100433'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2010/02/is-scrum-really-that-evil.html' title='Is Scrum really (that) evil?'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-3190210540644333515</id><published>2009-12-20T14:42:00.000-08:00</published><updated>2009-12-21T02:11:30.424-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='hci'/><category scheme='http://www.blogger.com/atom/ns#' term='survey'/><title type='text'>The hidden complexity of survey design (Part 1)</title><content type='html'>A 
couple of weeks ago I attended a &lt;a href="http://upf.edu/survey/Noticies/4.html"&gt;two-day course on Survey Design and Evaluation&lt;/a&gt;. In my recent research (see for instance the &lt;a href="http://technocalifornia.blogspot.com/2009/08/rate-it-again.html"&gt;Rate It Again publication at the last Recsys conference&lt;/a&gt;) I have become more and more interested in how people give their opinions.&lt;br /&gt;&lt;br /&gt;The course was taught entirely by &lt;a href="http://saris.sqp.nl/saris/"&gt;Professor Willem Saris&lt;/a&gt;, a very well-known researcher in survey design who was able to attract attendees from all over the world for this course. Although the course was fairly advanced, it touched upon the very issues that I wanted to see discussed. In this post I will try to very briefly mention some of them. Rather than trying to give a thorough explanation, I hope to draw your attention to some of these issues. Even if you are not into Recommender Systems, it is not unusual for Computer Science researchers to be involved in projects that require some sort of survey, and I am sure you will find some of these issues as interesting as I have.&lt;br /&gt;&lt;br /&gt;I will summarize some initial issues in this first post and dive into others in future posts if you consider it interesting enough.&lt;br /&gt;&lt;br /&gt;But, before I start, let me throw in a couple of surprising conclusions just to catch your attention:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Batteries of agree/disagree questions are evil!&lt;/span&gt; Yes, I am sure you have come across them and possibly even designed a survey in which users are asked at the beginning something like "Mark how much you agree/disagree with the following statements". Well, this is to be avoided at any cost.
I will clarify why and how you can replace these kinds of questions.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;You cannot compare results among different demographic groups assuming that the same response means the same to every group.&lt;/span&gt; It turns out that respondents in different countries, for instance, have different rating styles and understand questions differently. For instance, British respondents tend to be much milder in their responses than Spaniards. On a scale from 0 to 5, a British 3 might mean the same as a Spanish 5! In any case, this is not something you can assume in advance; it is something you need to analyze in order to guarantee fair comparisons. More on this later.&lt;/li&gt;&lt;/ol&gt;Ok, so I hope I have caught your attention by now and you agree with me that these issues are very interesting and seldom explained (actually, during the course we saw many examples of professional surveys that were plain wrong).&lt;br /&gt;&lt;br /&gt;The method for developing a survey presented in the course was a three-step procedure: (1) Distinguish between concepts by postulation and concepts by intuition; (2) Develop assertions for concepts by intuition; and (3) Develop requests for an answer from assertions. Let's see them in a bit of detail.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;1. Concepts by postulation and concepts by intuition&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;One first important decision when trying to measure a given concept is whether we can measure it by &lt;span style="font-weight: bold;"&gt;intuition&lt;/span&gt; or by &lt;span style="font-weight: bold;"&gt;postulation&lt;/span&gt;. If we think that a concept is straightforward enough, we can directly ask the question we would like to be answered (e.g. How often do you watch sports on television?). However, many times we are trying to measure concepts for which a simple and direct question won't do (e.g. How interested in politics are you?) 
so we need to measure them by postulation.&lt;br /&gt;&lt;br /&gt;When measuring a concept by postulation, we need to decompose the complex concept we want to measure into a series of &lt;span style="font-weight: bold;"&gt;indicators&lt;/span&gt;. These indicators can be either &lt;span style="font-weight: bold;"&gt;formative &lt;/span&gt;or &lt;span style="font-weight: bold;"&gt;reflective&lt;/span&gt;. Formative indicators are variables that &lt;span style="font-weight: bold;"&gt;define&lt;/span&gt; the concept. They should take into account all the necessary components and are not necessarily correlated. On the other hand, reflective indicators are &lt;span style="font-weight: bold;"&gt;consequences&lt;/span&gt; of the concept being measured (e.g. people watch the news because they are interested in politics). These indicators are correlated since they are all linked by the originating concept.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;2. From concept to assertion&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;There are three forms of assertions for asking about a concept by intuition:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Subject + LV predicator + subject complement (e.g. Politicians are fair)&lt;/li&gt;&lt;li&gt;Subject + predicator + direct object (e.g. I like conservatives)&lt;/li&gt;&lt;li&gt;Subject + predicator (e.g. The importance of world economics has changed)&lt;/li&gt;&lt;/ul&gt;On the other hand, a concept by intuition might be measuring different kinds of subjective variables that can be separated into categories such as: evaluation, importance, feelings, rights, policies... It turns out that depending on the kind of subjective variable we are measuring, a given structure might or might not be appropriate.
For instance, if you are measuring &lt;span style="font-weight: bold;"&gt;importance&lt;/span&gt;, only structure 1 will work (there is a complete table that I cannot reproduce where you see the relation between kind of variable and structure to use).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;3. From assertions to requests for answers&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;One last step is to decide how to present the request to the survey participant. The following list summarizes the different options available:&lt;ul&gt;&lt;li&gt;Direct request&lt;br /&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;With WH word&lt;/li&gt;&lt;li&gt;Without WH word&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Direct Instruction ("Please indicate....")&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Direct Request ("Will you vote...")&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;li&gt;Indirect Request (made of pre request and subordinate clause)&lt;br /&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;With WH word ("Tell me why you think...")&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Without WH word ("Do you think....?")&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;Following this 3 step approach does not guarantee you are avoiding all errors but rather guarantees that you are looking into all the issues that are needed in order to decide what is the right question to measure a given concept.&lt;br /&gt;&lt;br /&gt;And, if you cannot avoid all errors, what can you do about it? Well, you can measure them and take them into account and even predict them. In order to do that I would need to introduce the &lt;a href="http://www.socialresearchmethods.net/kb/mtmmmat.php"&gt;Multitrait Multimethod Approach&lt;/a&gt; and concepts such as &lt;span style="font-weight: bold;"&gt;reliability&lt;/span&gt;, &lt;span style="font-weight: bold;"&gt;validity&lt;/span&gt;, and &lt;span style="font-weight: bold;"&gt;quality&lt;/span&gt;. 
But that shall be in a second post if there is enough interest in this.&lt;br /&gt;&lt;br /&gt;You can read more on these issues in Willem Saris's book "&lt;a href="http://www.amazon.com/Evaluation-Analysis-Questionnaires-Research-Methodology/dp/0470114959/ref=ntt_at_ep_dpi_1"&gt;Design, Evaluation, and Analysis of Questionnaires for Survey Research&lt;/a&gt;".&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.binbin.net/photos/john-wiley-and-sons-ltd/des/design-evaluation-and-analysis-of-questionnaires-for-survey-research.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 124px; height: 200px;" src="http://www.binbin.net/photos/john-wiley-and-sons-ltd/des/design-evaluation-and-analysis-of-questionnaires-for-survey-research.jpg" alt="" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-3190210540644333515?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/3190210540644333515/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=3190210540644333515' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3190210540644333515'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3190210540644333515'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/12/hidden-complexity-of-survey-design-part.html' title='The hidden complexity of survey design (Part 1)'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-8354391811155347323</id><published>2009-12-13T06:44:00.000-08:00</published><updated>2009-12-13T13:29:27.866-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='music recommenders'/><category scheme='http://www.blogger.com/atom/ns#' term='content-based'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><category scheme='http://www.blogger.com/atom/ns#' term='collaborative filtering'/><title type='text'>On the uselessness of content for recommendations</title><content type='html'>This is one of the hot discussions sparked by the &lt;a href="http://www.netflixprize.com/"&gt;Netflix Prize&lt;/a&gt;. During the competition several teams reported trying to use movie metadata, always with discouraging results. This is probably best summarized by a &lt;a href="http://pragmatictheory.blogspot.com/2008/08/you-want-truth-you-cant-handle-truth.html"&gt;2008 post&lt;/a&gt; by Pragmatic Theory, one of the leading teams.&lt;br /&gt;&lt;br /&gt;The issue was re-opened during the last Recsys conference in two ways. First, there was an interesting discussion during one of the panels including the leading teams. Second, a paper with a rather provocative title was published: "&lt;a href="http://www.gravityrd.com/download/recsys2009pila_draft.pdf"&gt;Recommending new movies: even a few ratings are more valuable than metadata&lt;/a&gt;".&lt;br /&gt;&lt;br /&gt;After this, I have seen several discussions in which people used these findings to conclude that content-based recommendations are little more than a dead end, and that it is not worth investing in such research.
One such discussion happened in the &lt;a href="http://www.linkedin.com/groups?gid=1758697"&gt;Recommender Systems group&lt;/a&gt; on LinkedIn. But it was in the &lt;a href="http://listes.ircam.fr/wws/info/music-ir"&gt;Music-IR list&lt;/a&gt; where things heated up the most, turning into a long and interesting thread. Most of what follows is basically an edited version of what I already expressed in those two discussions.&lt;br /&gt;&lt;br /&gt;In a few words, my take on this issue is that results reported in the context of the Netflix competition are (1) algorithm-dependent and (2) dataset-dependent. Although these findings are a valid explanation of why people found no use for metadata in the context of the Netflix prize, one cannot extrapolate this finding to other contexts. Why?&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Results related to the Netflix prize only refer to how some specific content features help improve the success measure chosen in this case (RMSE). It is a well-known fact that RMSE in a Recommender System does not correlate perfectly with user satisfaction. Things like, for instance, serendipity or novelty are more likely to come out of a content-based than a CF Recsys, since content-based approaches are better suited to exploring the long tail.&lt;/li&gt;&lt;li&gt;The dataset in Netflix is somewhat representative of many Recsys cases, but not all. For instance, the sparsity of the rating matrix is much greater in the "movie" dimension than in the "user" dimension. That is, for a given movie, we are likely to have many ratings. On the other hand, for a given user, we are likely to have very few ratings. As some of the participants in the Recsys panel explained, the Netflix problem is more about how to fill in user "missing values" than movie "missing values". That is one of the reasons why movie content does not help much. Adding content to the user dimension (for instance by adding demographics) would probably have helped. 
Obviously, this is not easy to do unless Netflix had included the phone number or SSN of users in the dataset.&lt;/li&gt;&lt;li&gt;When people talk about content information in the context of the Netflix Prize, they are referring to a very specific form of content: editorial metadata coming mainly from &lt;a href="http://www.imdb.com/"&gt;imdb&lt;/a&gt;. But, in different settings, there are many other and better sources of content information. For instance, one can try to infer descriptors by automatically analyzing the signal (either video or audio) and use those features for content-based recommendations. We are still far from having automatic algorithms that can on their own bring useful enough features to map to user preferences. But that does not mean these features do not exist. Another approach to extracting those features is to have experts manually annotate the content. This is what &lt;a href="http://www.pandora.com/"&gt;Pandora&lt;/a&gt; does in their music recommendation system. And although I have not seen hard numbers, it seems users are more satisfied than when using CF alone. &lt;/li&gt;&lt;/ol&gt;I think that we will probably see the use of content (and user demographics) in the second edition of the prize, since the dataset will be very different and will include fewer ratings per movie and more user info.&lt;br /&gt;&lt;br /&gt;All that said, in the general case, and with no other info on the problem, I would probably venture to say that Collaborative Filtering is a more general solution than content-based. 
But clearly, the best solution is to combine both, as each solves a part of the problem.&lt;br /&gt;&lt;br /&gt;So, let me try to summarize my thinking in a set of simple statements:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;CF is more effective than content-based recommendations in the general case.&lt;/li&gt;&lt;li&gt;The fact that editorial metadata has not proved useful to improve RMSE accuracy in the Netflix Prize &lt;span style="font-weight: bold;"&gt;does not&lt;/span&gt; mean that content-based recommendations are useless.&lt;/li&gt;&lt;li&gt;Adding some sort of content description helps recommendations as long as this description does effectively describe the content and maps into user preferences.&lt;/li&gt;&lt;li&gt;Editorial metadata maps directly neither to the content nor to user preferences, so its usefulness may be very limited.&lt;/li&gt;&lt;li&gt;Features automatically derived from the content map directly to the content but not to user preferences in the general case. Lots of research effort still needs to go into closing this semantic gap.&lt;/li&gt;&lt;li&gt;Manually annotated content features map to the content and to user preferences, so they should prove useful, as in the case of Pandora. 
But they might be expensive in the general case.&lt;/li&gt;&lt;/ul&gt;As always, looking forward to your comments.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-8354391811155347323?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/8354391811155347323/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=8354391811155347323' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/8354391811155347323'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/8354391811155347323'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/12/on-uselessness-of-content-for.html' title='On the uselessness of content for recommendations'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-215305794213456355</id><published>2009-10-26T01:52:00.000-07:00</published><updated>2009-10-26T10:12:01.138-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='recsys09'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><category scheme='http://www.blogger.com/atom/ns#' term='new york'/><title type='text'>Recsys 09</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://recsys.acm.org/images/token.jpg"&gt;&lt;img 
style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 150px; height: 150px;" src="http://recsys.acm.org/images/token.jpg" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Last week I attended the &lt;a href="http://recsys.acm.org/"&gt;2009 ACM Conference on Recommender Systems&lt;/a&gt;, Recsys09 for short. The conference took place at &lt;a href="http://www.nyu.edu/"&gt;New York University&lt;/a&gt;'s &lt;a href="http://www.stern.nyu.edu/"&gt;Stern School of Business&lt;/a&gt;, organized by &lt;a href="http://pages.stern.nyu.edu/%7Eatuzhili/"&gt;Alex Tuzhilin&lt;/a&gt;. This was the 3rd edition of this very special conference for me. Special for several reasons, such as the fact that it is the main conference in the area on which I am focusing my research, or the fact that I am co-chairing the conference next year in Barcelona. The area of recommender systems also has a special attraction since it combines people with backgrounds as different as HCI, Marketing, Data Mining, Information Retrieval, or Mathematics. If you add the fact that there is an extremely important representation from industry, including many companies you won't easily see at other conferences, from Netflix to Autodesk and a great number of start-ups, you have an explosive cocktail. People in the audience that rave when they see a formula that cannot fit into one slide mix with senior committee members that propose to automatically reject papers that use the Greek alphabet.&lt;br /&gt;&lt;br /&gt;The conference has been steadily growing for the past years. It started out of a workshop organized in Bilbao by &lt;a href="http://corp.strands.com/"&gt;Strands&lt;/a&gt;. The &lt;a href="http://recsys.acm.org/2007/"&gt;first edition&lt;/a&gt; was then held in Minneapolis, home to the Movielens group, which could also be considered the birthplace of the area as a whole. 
Then off to &lt;a href="http://recsys.acm.org/2008/"&gt;EPFL&lt;/a&gt; and finally this year in NY. The numbers are astonishing for a conference as young (and presumably focused) as this one: more than 280 attendees and an acceptance rate of 19% make it look almost like a first-tier conference.&lt;br /&gt;&lt;br /&gt;If you want to get a good idea of what went on during the conference I recommend you take a look at the &lt;a href="http://twitter.com/#search?q=%23recsys09"&gt;tweets hashed with #recsys09&lt;/a&gt;. And if you want a really quick idea of what were the core topics, look at the beautiful tag cloud below, generated from the tweets by &lt;a href="http://www.csi.ucd.ie/users/barry-smyth"&gt;Barry Smyth&lt;/a&gt;. In the next paragraphs I will briefly highlight what I think were the most important ideas discussed during the conference.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://dejavu6.ucd.ie/wordpress/wp-content/uploads/2009/10/recsys-09-tweet-cloud-small.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 461px; height: 240px;" src="http://dejavu6.ucd.ie/wordpress/wp-content/uploads/2009/10/recsys-09-tweet-cloud-small.jpg" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The first day, we had 3 very interesting tutorials. These tutorials had the great virtue of already setting what would be 3 of the most important topics during the conference: Social Recommendations and Trust, Algorithms, and the Netflix Prize.&lt;br /&gt;&lt;br /&gt;In the first tutorial, &lt;a href="http://www.cs.umd.edu/%7Egolbeck/"&gt;Jennifer Golbeck&lt;/a&gt; did an awesome job of introducing the field of Trust-based Recommendations and explaining the challenges in the field. The tutorial was extremely interactive with many questions and comments from the audience. It is true that the idea of trust is also one that very easily leads to passionate debates and opinions. 
The area of trust and social-based recommendations appeared again and again during the conference. There was a whole session devoted to it in the main track (or 2 if we include the one on tags and Social Networks) and a &lt;a href="http://ls13-www.cs.uni-dortmund.de/homepage/RSWEB/"&gt;workshop&lt;/a&gt; on the last day. Interestingly enough, though, I did hear relevant people from the industry say that they did not believe social recommendations to be of any practical use. Don't really know what to make of that though.&lt;br /&gt;&lt;br /&gt;The second tutorial was more of a traditional and classical lecture on Bayesian Methods. Bayesian Methods are the most popular (but not the only) approach to model-based recommendations. They have two main advantages: they allow for the use of nice probabilistic formalisms, and they allow one to infer knowledge from the resulting model. However, latent models based on Matrix Factorization have proved to be more reliable and, in principle, they also allow one to infer knowledge from the latent variables. During the conference there were 2 different sessions on algorithms, which were dominated by different approaches to hybridize recommendations and by improvements over pre-existing collaborative filtering methods. Among the latter, I should mention the Best Paper winner, Benjamin Marlin. His &lt;a href="http://bit.ly/4jA3CQ"&gt;paper&lt;/a&gt; proves that missing data (i.e. items that have not been rated) cannot be considered random and he introduces a way of taking some non-random effects into account. I found the conclusions of the paper not very striking, but the approach and scope of the idea are. And Marlin deserves the award for being the first to point to this issue, and also for all his great work in the area in general.&lt;br /&gt;&lt;br /&gt;The last tutorial on day 1, which started a thread of its own, was a discussion on the lessons learned from the &lt;a href="http://www.netflixprize.com/"&gt;Netflix Prize&lt;/a&gt;. 
Very, very interesting discussion where some of the issues I mentioned in my previous &lt;a href="http://technocalifornia.blogspot.com/2009/09/netflix-prize-lessons-learned.html"&gt;blog post&lt;/a&gt; were brought up. For instance, I asked about the goodness of RMSE as a success measure. Everybody agrees that the only way to really evaluate a recommender is to do A/B tests on a real system but you cannot do this in an unsupervised way such as the contest. However, I insisted on the possibility of using other measures such as top-N related ones (e.g. nDCG). The (not very convincing) answer to this possibility was from the participants: it would be much harder to optimize algorithms for top-N measures than for the much simpler RMSE. The Netflix prize appeared now and again during the conference, especially since it was finally awarded recently. For instance, there was &lt;a href="http://www.gravityrd.com/download/recsys2009pila_draft.pdf"&gt; a very provocative paper&lt;/a&gt; by one of the participant teams proving that metadata is useless. This has stirred a heated discussion on whether that means that content-based approaches are useless altogether. The simple answer: NO. They are useless in the very specific case of the Netflix competition and dataset, and using RMSE as the success measure. Content-based approaches (and hybrids) are here to stay and need much more research.&lt;br /&gt;&lt;br /&gt;The last thread that was also started on the very first day was the industrial one. As I mentioned before, company presence in Recsys is very relevant. And this year it was kicked off by a panel where Netflix and Yahoo discussed the 8 challenges of the Recommender Systems Field. The panel was extremely interesting because &lt;a href="http://www-users.cs.umn.edu/%7Eriedl/"&gt;John Riedl&lt;/a&gt; did a great job of conducting it and of getting the two industry participants to prepare it for weeks. 
To summarize, the Challenges were: transparency, exploration, navigation, time value, user action interpretation, evaluation, scalability, and relation academy/industry. The next industrial activity in the program was Francisco Marin's keynote where instead of the challenges he talked about the 10 lessons learned during his years of experience. It was a brilliant keynote that impacted many people (especially some students that were then deciding to change the orientation of their PhD). In Francisco's vision the algorithm is only 5% of the Recommender, while the most important part is the User Interface, which should take around 50% of the resources. But, if you want an excellent summary of this keynote, take a look at Neal Lathia's &lt;a href="http://bit.ly/3LiKac"&gt;reconstruction from tweets&lt;/a&gt;. The last activity worth mentioning from this industrial thread was the &lt;a href="http://corp.strands.com/recsys/"&gt;Industry Workshop&lt;/a&gt; on the last day. It was organized by &lt;a href="http://marctorrens.net/"&gt;Marc Torrens&lt;/a&gt; (the other co-chair of next year's conference) and it attracted more than 45 people from industry.&lt;br /&gt;&lt;br /&gt;A final thread that did not start on the first day was the application-related one. There was an applications session that was a sort of miscellaneous but where &lt;a href="http://www.csi.ucd.ie/users/jill-freyne"&gt;Jill Freyne&lt;/a&gt; presented a very interesting and well-delivered paper on the effect of people recommendation on social networks. In this application thread I should include some of the very interesting posters in the poster session. 
Applications that went all the way from a source code recommender from Karatzoglou and Weimer to IPTV or mobile tourist recommender systems.&lt;br /&gt;&lt;br /&gt;Another very interesting thing left out of these 5 threads was the &lt;a href="http://ids.csom.umn.edu/faculty/gedas/cars2009/"&gt;Workshop on Context-aware Recommender Systems&lt;/a&gt; where I presented some of our preliminary work on &lt;a href="http://technocalifornia.blogspot.com/2009/09/context-aware-recommendations.html"&gt;time-dependent music recommendation&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;As a final personal promotion note I should say that &lt;a href="http://technocalifornia.blogspot.com/2009/08/rate-it-again.html"&gt;my paper&lt;/a&gt; was probably an interesting oddball in the conference. It was the only paper that addressed the issue of data quality and user feedback and the impact it has on the recommendations. It made it really tough on the organizers to decide what session it should belong to, so I ended up presenting in the Trust session. But my impression was that it was very well received and it opens up a whole new avenue of future research in the field. 
Here you can check the slides I used during the presentation.&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;img style="visibility: hidden; width: 0px; height: 0px;" src="http://counters.gigya.com/wildfire/IMP/CXNID=2000002.0NXC/bT*xJmx*PTEyNTY1NDk3MjE3MzImcHQ9MTI1NjU*OTczMDE3OCZwPTEwMTkxJmQ9c3NfZW1iZWQmZz*yJm89OTVjNGFkYTA*OWQyNGJkNzhiMGI*ZTZmYWE4MzljMWEmb2Y9MA==.gif" width="0" border="0" height="0" /&gt;&lt;/div&gt;&lt;div style="width: 425px; text-align: center;" id="__ss_2335913"&gt;&lt;a style="margin: 12px 0pt 3px; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; display: block; text-decoration: underline;" href="http://www.slideshare.net/xamat/rate-it-again" title="Rate it Again"&gt;Rate it Again&lt;/a&gt;&lt;object style="margin: 0px;" width="425" height="355"&gt;&lt;param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=recsys09presentation-091024100544-phpapp02&amp;amp;stripped_title=rate-it-again"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowScriptAccess" value="always"&gt;&lt;embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=recsys09presentation-091024100544-phpapp02&amp;amp;stripped_title=rate-it-again" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div style="font-size: 11px; font-family: tahoma,arial; height: 26px; padding-top: 2px;"&gt;View more &lt;a style="text-decoration: underline;" href="http://www.slideshare.net/"&gt;documents&lt;/a&gt; from &lt;a style="text-decoration: underline;" href="http://www.slideshare.net/xamat"&gt;Xavier  Amatriain&lt;/a&gt;.&lt;/div&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;Overall, a great conference. 
And although the bar was set very, very high, we hope to exceed expectations in our 2010 Recsys in Barcelona. Hope to see everyone there!&lt;br /&gt;&lt;br /&gt;(Btw, this is a very personal overview. Feel free to leave yours in the form of comments and let me know if there is any mistake or misinterpretation)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-215305794213456355?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/215305794213456355/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=215305794213456355' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/215305794213456355'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/215305794213456355'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/10/recsys-09.html' title='Recsys 09'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-5432284717594424469</id><published>2009-09-29T15:19:00.001-07:00</published><updated>2009-10-14T09:38:07.454-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='netflix prize'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>The Netflix Prize: Lessons Learned</title><content type='html'>Some time ago I published a post with the 
title "&lt;a href="http://technocalifornia.blogspot.com/2009/05/netflix-prize-what-if-there-is-no.html"&gt;What if there is no Million $&lt;/a&gt;" in which I discussed the possibility that the &lt;a href="http://www.netflixprize.com/"&gt;Netflix prize&lt;/a&gt; had no solution. A few weeks later, two teams beat the 10% threshold that entitled them to the grand prize. Bellkor's Pragmatic Chaos beat The Ensemble in a photo finish, only because they sent their solution 20 minutes earlier.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/StX87-hQtlI/AAAAAAAAAFc/Pfdpcmva5XQ/s1600-h/NeflixCompleted.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 307px; height: 272px;" src="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/StX87-hQtlI/AAAAAAAAAFc/Pfdpcmva5XQ/s320/NeflixCompleted.png" alt="" id="BLOGGER_PHOTO_ID_5392494236171023954" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;If you want more details on how it all happened I recommend you start by reading the two winning teams' web pages. The Ensemble (runner up) has an awesome &lt;a href="http://www.the-ensemble.com/"&gt;web&lt;/a&gt; with lots of information on the prize. And Bellkor's Pragmatic Chaos &lt;a href="http://www.research.att.com/%7Evolinsky/netflix/bpc.html"&gt;web&lt;/a&gt; also gives inside information on the winners' road to the million.&lt;br /&gt;And if you want the nasty technical details, the three teams that merged into the winning Bellkor's Pragmatic Chaos have published their solutions. &lt;a href="http://bit.ly/2uFoQ"&gt;Here&lt;/a&gt; you will find a description of Pragmatic Theory's solution. &lt;a href="http://bit.ly/eBveg"&gt;Here&lt;/a&gt; is Big Chaos'. And &lt;a href="http://bit.ly/16fddh"&gt;here&lt;/a&gt; is the already well-known Bellkor approach.&lt;br /&gt;&lt;br /&gt;After the competition ended, there have been countless reactions and discussions about it. 
See, for example, what &lt;a href="http://www.wired.com/techbiz/media/magazine/16-03/mf_netflix"&gt;Gavin Potter&lt;/a&gt; (a.k.a. Guy in the Garage), one of the Netflix Prize stars, has to say &lt;a href="http://justaguyinagarage.blogspot.com/2009/07/reflections-on-netflix-competition.html"&gt;in his blog&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So, I will add my voice to this choir of reactions to the prize. In this post I will try to summarize what, from my humble personal perspective, have been the biggest lessons learned from the prize. Take this as a warm-up for the panel entitled "What did we learn from the Netflix Prize?" that we will attend in &lt;a href="http://recsys.acm.org/"&gt;NY&lt;/a&gt; next week.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;1. RMSE is not a valid success measure&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Whether RMSE was a valid success measure for a recommender system was discussed very early after the prize started. As a matter of fact, this discussion is even older in academic circles.&lt;br /&gt;&lt;br /&gt;The fact of the matter is that RMSE is not a valid success measure for several reasons. The ultimate one is that there is no direct correlation between this measure and the end-user satisfaction with recommendations. However, using more hci'ish measures related to user satisfaction is out of the question in the context of a prize such as Netflix's. It would be nice to see, though, some post-mortem in which at least the winning approach was used in a user study and compared to the original system. Hopefully, we should see a significant increase in user satisfaction with the recommendations... but, to be honest, I am not all that sure.&lt;br /&gt;&lt;br /&gt;Once we have ruled out user-study-related measures, is RMSE the best we can do? Well, I (and many others) think that there are much better mathematical measures that correlate to the actual goal of a Recommender System. 
See, the main problem with RMSE is that it gives the same weight to the error you make by predicting a 2 where it should have been a 1 as to the one you make when predicting a 4 instead of a 5. But you would never recommend an item with a 2!&lt;br /&gt;&lt;br /&gt;I particularly like Top-N measures such as Precision and Recall of "recommendable" items (either fixing N or, better, defining a recommendable threshold). An even more precise measure is the so-called &lt;a href="http://en.wikipedia.org/wiki/Discounted_cumulative_gain"&gt;Normalized Discounted Cumulative Gain&lt;/a&gt; (NDCG) where item order in the recommendation list is taken into account.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;2. Time matters&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Another interesting finding is the importance of modeling temporal evolution of user preferences. The fact that I liked Meatballs in 1979 does not mean that I will like Meatballs 4 now. As a matter of fact, it does not even mean that I still like the original one now. This is what we call &lt;span style="font-style: italic;"&gt;stability&lt;/span&gt; in rating theory. Yehuda Koren, of the winning team, has a very interesting &lt;a href="http://research.yahoo.com/pub/2824"&gt;publication&lt;/a&gt; on the topic. Neal Lathia's latest &lt;a href="http://www.cs.ucl.ac.uk/staff/N.Lathia/"&gt;publications&lt;/a&gt; also have interesting insights on the temporal evolution of collaborative filtering systems.&lt;br /&gt;&lt;br /&gt;Now, it turns out that, just as you can model the importance of time, you can also take into account many other different factors. As a matter of fact, this is what the Bellkor team calls the "factor model". 
Again, let me point you to a &lt;a href="http://research.yahoo.com/pub/2435"&gt;publication&lt;/a&gt; by Koren to learn more about this (actually, now that we are at it, you might want to take a look at all of Koren's latest publications, most of which are very relevant for our discussion).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;3. Matrix Factorization methods work best&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Probably, the single event that marked a turning point in the Netflix competition was Simon Funk's &lt;a href="http://sifter.org/%7Esimon/journal/20061211.html"&gt;publication&lt;/a&gt; of the SVD solution. Since then, many teams turned to SVD-like solutions. Matrix Factorization is the family of methods, so to say, that includes particular implementations such as SVD but also many others such as non-negative Matrix Factorization, Maximum Margin Matrix Factorization, and so on. Again, the latest &lt;a href="http://research.yahoo.com/pub/2859"&gt;publication&lt;/a&gt; from Koren gives a gentle introduction to Factor models in this context. The &lt;a href="http://www.cs.uic.edu/%7Eliub/KDD-cup-2007/proceedings.html"&gt;papers&lt;/a&gt; from the 2007 KDD cup are also a good source for information on Factor Models in the context of the Prize, since it was then that these approaches were probably thought of as the ultimate solution.&lt;br /&gt;&lt;br /&gt;Factor models are great since they can accomplish slightly better results than standard neighbor-based methods, they offer some sort of insight on the problem and, above all, they can be implemented in a much more efficient way. However, I am still to be convinced that, in isolation, they are the best method in a general case.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;4. 
One method is not enough (nor 100)&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Or, in other words, given any prediction method it usually pays off more to add a new one than to improve the existing one (if I remember correctly, a member of the winning team said something similar in an interview).&lt;br /&gt;&lt;br /&gt;So, yes, as sad as it may seem, there is no magical solution to the Netflix Prize. Factor models sort of work but alone they wouldn't get you the million $. As a matter of fact, you need many, many methods combined to reach that number. The problem with this approach is that the resulting algorithm is as close to a black-box as it can get. No more insights, no more knowledge learned from it: millions of parameters that self-adjust to fit into the solution.&lt;br /&gt;&lt;br /&gt;Don't get me wrong, this is an outstanding accomplishment from an engineering perspective. But it limits the scientific insight on the problem. It also raises the question of how portable the solution is to other domains and even datasets. I would like Netflix to pick 500K different users and 17K new movies and report the error that the system makes on them.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;5. The importance of data&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;and noise&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So, to finish this list, let me bring the discussion home. As I have discussed in a talk at Boston University (and in a &lt;a href="http://technocalifornia.blogspot.com/2009/07/its-all-about-data.html"&gt;previous post&lt;/a&gt;), given a problem such as the one posed by Netflix you have two options: limit yourself to the existing data and try to bang your head to improve the algorithm, or try to improve the data itself. Of course, in the context of the prize, improving the data was not easy (although some tried to add content information to the movie titles without success). 
But in a realistic setting there are many ways this could be feasible and much, much more efficient in terms of resources and results.&lt;br /&gt;&lt;br /&gt;Take the &lt;a href="http://technocalifornia.blogspot.com/2009/08/rate-it-again.html"&gt;approach&lt;/a&gt; to removing noise we propose in our upcoming paper, for instance. As we present in that paper there are improvements above 10% that can be accomplished by simply asking some users to re-rate some items. In another paper we also found that simply re-ordering the items to rate reduced inconsistent ratings and therefore helped in predicting recommendations.&lt;br /&gt;&lt;br /&gt;----&lt;br /&gt;&lt;br /&gt;As a final note, I think that the Netflix Prize has left more questions than answers while putting the spotlight on Recommender Systems research. This is of course great news for us researchers in the area. We can only hope that the 2nd edition of the prize, already announced, will bring more glory to the field :-)&lt;br /&gt;&lt;br /&gt;Let me also congratulate the winners, runners-up, organizers and other participants. 
In case it was not clear: &lt;span style="font-weight: bold;"&gt;you did an awesome job!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Please post your other lessons learned as comments to this post.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-5432284717594424469?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/5432284717594424469/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=5432284717594424469' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5432284717594424469'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5432284717594424469'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/09/netflix-prize-lessons-learned.html' title='The Netflix Prize: Lessons Learned'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_xAtUP4Gu6Zk/StX87-hQtlI/AAAAAAAAAFc/Pfdpcmva5XQ/s72-c/NeflixCompleted.png' height='72' width='72'/><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-8938525134623364876</id><published>2009-09-21T14:47:00.000-07:00</published><updated>2009-09-29T08:39:17.907-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='context'/><category scheme='http://www.blogger.com/atom/ns#' term='music'/><category 
scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Towards Context-aware Recommendations</title><content type='html'>The goal of a Recommender System is to model the users' preferences in order to recommend new items that the user is likely to find of interest. However, we know that user preferences are influenced by contextual conditions, such as the time of the day, mood, or current activity, but this type of information is not exploited by standard models. Context-aware recommender systems (CARS) aim at improving user satisfaction with recommendations by tailoring these to each particular context.&lt;br /&gt;&lt;br /&gt;Context-aware recommendation is a hot research topic since it bridges the gap between recommender systems and other areas of research such as ubiquitous computing. However, research on this topic is still in its early stages and there is a lot to be done. If you want some background reading, I would recommend starting with the &lt;a href="http://portal.acm.org/citation.cfm?id=1454068&amp;amp;dl=GUIDE&amp;amp;coll=GUIDE&amp;amp;CFID=54061305&amp;amp;CFTOKEN=26411187"&gt;work&lt;/a&gt; of Gedas Adomavicius and Alex Tuzhilin.&lt;br /&gt;&lt;br /&gt;I have been meaning to work on contextual recommendations for some time and this summer I had the perfect opportunity since &lt;a href="https://www.inf.unibz.it/%7Elbaltrunas/research.html"&gt;Linas Baltrunas&lt;/a&gt;, a student of &lt;a href="http://www.inf.unibz.it/%7Ericci/"&gt;Francesco Ricci&lt;/a&gt; working on this topic for his PhD, has been collaborating with us in the lab.&lt;br /&gt;&lt;br /&gt;In this first approach to contextual recommendations we have tried to tackle the issue of time-dependent music recommendation. That is, designing a recommendation algorithm that can recommend music that is not only personalized but also better suited to the current time of the day, day of the week, or season of the year. 
Our initial assumption is, of course, that music taste depends on those variables (i.e. you don't listen to the same kind of music on Saturday evening as on Monday morning).&lt;br /&gt;&lt;br /&gt;There are a couple of things about this use case that make it especially interesting (and difficult) when compared to previous work. First, music preference modeling is done through implicit feedback. That is, users don't tell you explicitly what they like or don't; they simply listen to some music more than other music. Converting that to a user preference model has some issues of its own, especially if you need to take into account contextual variables such as time. Second, time is a continuous context variable. All previous work on contextual user modeling and recommendation is done using discrete variables such as who you are with or whether you are at work or at home. This raises another issue since there is a need to &lt;span style="font-weight: bold;"&gt;segment&lt;/span&gt; the data before building the context-aware preference model.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/SsIa09-fF4I/AAAAAAAAAFU/DUm0lwmp2BY/s1600-h/partitioning.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 202px;" src="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/SsIa09-fF4I/AAAAAAAAAFU/DUm0lwmp2BY/s320/partitioning.png" alt="" id="BLOGGER_PHOTO_ID_5386897601580701570" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;We are publishing our first results at the &lt;a href="http://ids.csom.umn.edu/faculty/gedas/cars2009/"&gt;CARS workshop&lt;/a&gt; during the &lt;a href="http://recsys.acm.org/"&gt;2009 Recsys Conference&lt;/a&gt;, in a couple of weeks in NY. &lt;a href="http://xavier.amatriain.net/pubs/baltrunas_CARS09.pdf"&gt;Here&lt;/a&gt; you have the paper. 
Comments and suggestions are welcomed!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-8938525134623364876?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/8938525134623364876/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=8938525134623364876' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/8938525134623364876'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/8938525134623364876'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/09/context-aware-recommendations.html' title='Towards Context-aware Recommendations'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_xAtUP4Gu6Zk/SsIa09-fF4I/AAAAAAAAAFU/DUm0lwmp2BY/s72-c/partitioning.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-5922353831885056511</id><published>2009-08-05T02:29:00.001-07:00</published><updated>2009-08-06T13:07:26.669-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='noise'/><category scheme='http://www.blogger.com/atom/ns#' term='recsys'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Rate it Again</title><content type='html'>&lt;a 
href="http://technocalifornia.blogspot.com/2009/05/netflix-prize-what-if-there-is-no.html"&gt;A few weeks back&lt;/a&gt;, I described our work on trying to measure Natural Noise in user feedback. This was motivated by &lt;a href="http://technocalifornia.blogspot.com/2009/04/i-like-it-i-like-it-not-or-how-miss.html"&gt;a study&lt;/a&gt; we recently published at the UMAP conference. One of the possible solutions that I mentioned then was to address the noise issue directly by trying to apply denoising algorithms to the user data.&lt;br /&gt;&lt;br /&gt;Well, that is exactly what we have done in a paper that has been accepted for publication in &lt;a href="http://recsys.acm.org/"&gt;Recsys09&lt;/a&gt; (NY). The paper is entitled "Rate It Again" and you can access a preprint copy &lt;a href="http://xavier.amatriain.net/pubs/xamatriain_Recsys09.pdf"&gt;here&lt;/a&gt;. The basic idea in our approach is to ask users to re-rate items that they already rated in the past. We can then denoise ratings that prove to be inconsistent by minimizing their contribution to the recommendation process.&lt;br /&gt;&lt;br /&gt;The biggest practical issue with the approach is that we don't want all users to have to re-rate all items in order to identify which ones to denoise. That is why in that same paper we propose ways to decide which items and users are most likely to introduce noise, so that only those have to bear the burden of re-rating.&lt;br /&gt;&lt;br /&gt;We measured relative RMSE improvements of up to 14% and we verified that this is consistent regardless of the particular recommendation algorithm (item and user-based CF, SVD, etc...).&lt;br /&gt;&lt;br /&gt;This is another example of how to improve recommender systems using a data-driven approach. 
Denoising user feedback is a promising avenue, and there is still a lot of room for improvement!&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Update&lt;/span&gt;: I forgot to mention the impressive numbers for this year's Recsys. There were 203 submissions (almost doubling last year's numbers), 140 for long papers. The acceptance rate was down to 17%, making it more competitive than other 1st tier conferences. And, as a matter of fact, having been in the Program Committee, and looking at the &lt;a href="http://recsys.acm.org/accepted_long_papers.pdf"&gt;list of accepted papers&lt;/a&gt;, I can say that the quality of the papers is comparable to (if not higher than) the quality of recommender papers accepted to related first tier conferences.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-5922353831885056511?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/5922353831885056511/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=5922353831885056511' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5922353831885056511'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5922353831885056511'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/08/rate-it-again.html' title='Rate it Again'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-4710841211427108588</id><published>2009-08-04T12:50:00.000-07:00</published><updated>2009-08-05T02:25:31.271-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='umap09'/><category scheme='http://www.blogger.com/atom/ns#' term='sigir09'/><category scheme='http://www.blogger.com/atom/ns#' term='conferences'/><title type='text'>UMAP and SIGIR 09</title><content type='html'>I usually do a short report after I attend a conference. However, because of multiple commitments, deadlines and important things, I have failed to do so for the last two: UMAP 09 in Trento, Italy and SIGIR 09 in Boston, MA. So, I thought I'd give it a go and write a short report on both.&lt;br /&gt;&lt;br /&gt;I will start by saying that lately I am not very fond of conferences in general. Don't get me wrong, I love socializing with other researchers in the area, meeting new people, and getting a chance to present my research to a larger audience while getting expert feedback. The problem is that in most conferences this ends up being secondary. There is a very interesting and &lt;a href="http://cacm.acm.org/magazines/2009/8/34492-time-for-computer-science-to-grow-up/fulltext"&gt;recent article in Communications of the ACM&lt;/a&gt; by Lance Fortnow that does a very good job of analyzing the issue (although I should warn you that I do not subscribe to his solution of going back to journals!). 
In any case, given this context, take my review of these two conferences with a grain of salt. I shall return to the broader issue of conferences sometime soon.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://umap09.fbk.eu/sites/umap09.fbk.eu/files/header.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 418px; height: 64px;" src="http://umap09.fbk.eu/sites/umap09.fbk.eu/files/header.png" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Ok, so I will start by commenting on &lt;a href="http://umap09.fbk.eu/"&gt;UMAP09&lt;/a&gt;, which was held in Trento (Italy) in late June. There, I presented a long paper, "I like it... I like it not", that I already discussed &lt;a href="http://technocalifornia.blogspot.com/2009/04/i-like-it-i-like-it-not-or-how-miss.html"&gt;on this same blog&lt;/a&gt;. UMAP has been organized this year for the first time by joining two pre-existing biennial conferences: UM (User Modeling) and AH (Adaptive Hypermedia) (see information about past conferences &lt;a href="http://umap09.fbk.eu/past_conf"&gt;here&lt;/a&gt;). Both of these conferences were highly regarded, so the resulting union was anticipated to be a success. Besides, the area of User Modeling has been gaining a lot of momentum recently and a conference such as UMAP09 was expected to ride the wave. However, attendance was around 200 people, about the same as either of the two conferences had drawn on its own. The acceptance rate for long papers was 26%.&lt;br /&gt;&lt;br /&gt;For people like me coming from outside the community UMAP looks like a weird conference, and there are many things about the organization that are hard to understand. First, there is the fact that the conference is not sponsored by any well-known organization like ACM, IEEE, etc... but rather by a non-profit organization called &lt;a href="http://www.um.org/"&gt;User Modeling Inc.&lt;/a&gt; 
I am sure there are (or were) good reasons for this, but to an outsider this sounds weird, you'll give me that. Then, and possibly related to the previous point, there is the issue with the proceedings: they are published with Springer (a for-profit publisher) in the infamous LNCS series... and proceedings are not available for download in electronic format even at this stage! If you add the fact that the choice of location in beautiful but hard-to-reach Trento was questionable, I am not surprised the conference turned out to be less than what the organizers expected. I really think some of these issues should be addressed soon: there are many other conferences that are more than happy to accept research related to User Modeling, and UMAP will have to do its best to attract people. However, if well-managed, UMAP should be a very attractive and relevant conference. Next year it will be held in Hawaii and there are already talks of co-locating it with a larger event in 2011.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.sigir2009.org/sites/default/files/logo.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 92px; height: 122px;" src="http://www.sigir2009.org/sites/default/files/logo.jpg" alt="" border="0" /&gt;&lt;/a&gt;Moving on to SIGIR, which is a completely different beast since it is a well-established first-tier conference and it was held in easy-to-reach Boston. I have no complaint about the organization (except for maybe the lack of lunches during main conference days). And as a matter of fact I have to congratulate them for excellent social events: both the banquet at the JFK museum and the Harbor tour were great. I presented our paper "&lt;a href="http://technocalifornia.blogspot.com/2009/05/wisdom-of-few.html"&gt;The Wisdom of the Few&lt;/a&gt;" in a conference that had the lowest acceptance rate since 1997, 16% ... 
quite an honour to be accepted.&lt;br /&gt;&lt;br /&gt;SIGIR -- the most important conference on Information Retrieval, for those that are not in the field -- is one of those large conferences where you have several parallel tracks. This is great since you are always likely to find something you are interested in. However, it has the downside that most people might be attracted to the same track, leaving others almost empty. Curiously enough, this is what happened during the &lt;a href="http://sigir2009.org/Program/industry"&gt;Industry track&lt;/a&gt;: most people were attracted to it, leaving the research tracks with much lower attendance than expected. A suggestion for next year would be not to host the industry track in parallel but in its own dedicated time slot. In any case, this brings me to my important question: why were researchers more attracted to the industry track than to the research ones? The answer is simple: while the average presenter in the industry track is a well-known professional with above-average public speaking skills, the average presenter in the research tracks is a PhD student who can barely hope to hold the audience's attention by not putting too many formulas on the slides. I will leave this analysis here but will try to come back to it in a dedicated post soon. If you are interested in reading more there is a great series of posts on the excellent Industry Track at SIGIR 2009 by the organizer &lt;a href="http://thenoisychannel.com/about/"&gt;Daniel Tunkelang&lt;/a&gt;, Endeca's Chief Scientist. Start &lt;a href="http://thenoisychannel.com/2009/07/29/sigir-2009-day-3-industry-track-matt-cutts/"&gt;here&lt;/a&gt;, and follow on to similar or later posts. 
You can also find other great posts summarizing SIGIR; see Jeff Dalton's summaries &lt;a href="http://www.searchenginecaffe.com/2009/07/sigir-2009-day-1-summary.html"&gt;here&lt;/a&gt;, for instance.&lt;br /&gt;&lt;br /&gt;On the last SIGIR day I attended the &lt;a href="http://ir.mathcs.emory.edu/SSM2009/"&gt;Search in Social Media Workshop&lt;/a&gt;. This turned out to be one of the highlights of the conference. The setup of the workshop was great. It was divided into different topic blocks. For each block there was a keynote. Then other presenters had a short time for presentation and then they all joined for a discussion panel including participation from the audience and from a twitter feed projected on the side... Brilliant! I particularly liked the keynotes by &lt;a href="http://www-users.cs.umn.edu/%7Ekonstan/"&gt;Joseph Konstan&lt;/a&gt; and &lt;a href="http://www.ir.iit.edu/%7Eabdur/"&gt;Abdur Chowdhury&lt;/a&gt;, Twitter's Chief Scientist.&lt;br /&gt;&lt;br /&gt;Overall, going to conferences is a great experience: I got to meet many interesting people, had interesting conversations, and presented 2 papers, getting a lot of feedback. Conferences are essential in the work of a researcher. However, there is a lot of room for improvement in order to make the best of them. 
And definitely CS conferences should take the lead because technology will be key in this transformation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-4710841211427108588?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/4710841211427108588/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=4710841211427108588' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/4710841211427108588'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/4710841211427108588'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/08/umap-and-sigir-09.html' title='UMAP and SIGIR 09'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-6032188526101764446</id><published>2009-07-21T05:53:00.000-07:00</published><updated>2009-07-21T19:42:35.651-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sigir09'/><category scheme='http://www.blogger.com/atom/ns#' term='boston'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>It's all about the Data...</title><content type='html'>(or Data-driven approaches to the Recommendation problem)&lt;br /&gt;&lt;br /&gt;This is the title of the talk I will be giving on Thursday July, 23, 3:30 pm at &lt;a 
href="http://www.bu.edu/maps/?id=757"&gt;Boston University&lt;/a&gt; (Math &amp;amp; CS Building, room 135).&lt;br /&gt;&lt;br /&gt;Here is the abstract of the talk, hope to see you around:&lt;br /&gt;&lt;br /&gt;&lt;pre wrap=""&gt;The Netflix Prize put Recommender Systems (RS) research in the spotlight. Given 100M ratings from 500K users to 17K movies, researchers from all over the world have been racing for almost 3 years to improve accuracy by 10% in order to win the $1M prize. A couple of weeks ago, it was announced that a merger of several teams that used hundreds of predictors might have won the prize. However, there are doubts about the generalization properties of this winning approach.&lt;br /&gt;&lt;br /&gt;Our approach to the Recommender problem has been different: instead of taking data as is and investing in fine tuning a large number of machine learning algorithms to model the data, we have focused on understanding the data and improving it.&lt;br /&gt;&lt;br /&gt;In a recent UMAP 2009 paper named "&lt;a href="http://xavier.amatriain.net/pubs/xamatriain_umap09.pdf"&gt;I Like it, I Like it Not..&lt;/a&gt;", we show that the natural noise due to the inconsistencies in user feedback sets a lower bound on the so-called "magic barrier" in RS and could in fact be very close to the Netflix Prize threshold. Once the inconsistencies of users when providing explicit feedback have been characterized, we can devise ways to minimize them.&lt;br /&gt;&lt;br /&gt;In "&lt;a href="http://xavier.amatriain.net/pubs/xamatriain_sigir09.pdf"&gt;The Wisdom of the Few&lt;/a&gt;" (SIGIR 2009), we propose a different approach to Collaborative Filtering by using feedback from experts instead of regular users. In "Adaptive Data Sources" (ITPW @IJCAI 2009) we propose to use ensembles of data sources instead of ensembles of predictors. 
Finally, in "Rate it Again" (RECSYS 2009), we present an algorithm for denoising user feedback based on a re-rating approach.&lt;br /&gt;&lt;br /&gt;In this talk, I will give an overview of the issue of noise in user feedback for Recommender Systems and will briefly describe the work that we have done (as previously described) to overcome it.&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-6032188526101764446?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/6032188526101764446/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=6032188526101764446' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/6032188526101764446'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/6032188526101764446'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/07/its-all-about-data.html' title='It&apos;s all about the Data...'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-5102564471109542956</id><published>2009-07-13T01:37:00.000-07:00</published><updated>2009-07-12T16:32:32.731-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='crowds'/><category scheme='http://www.blogger.com/atom/ns#' term='web science'/><category 
scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Why the crowds are not (always) wise</title><content type='html'>More often than not I come across situations in which the now famous &lt;a href="http://www.randomhouse.com/features/wisdomofcrowds/"&gt;"Wisdom of the Crowds"&lt;/a&gt; is applied in the wrong context or situation. Let me try to explain it with an example of one such situation:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Let's imagine a well-known online newspaper decides to give a prize to the best... Linux application, for instance. In order to do so, it decides that it will let the crowds decide. First, developers who have an application and are interested in the $100K prize will submit their application. Then users will vote and the prize will be awarded to the application with the most votes.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Ok, so, do you think the winner of the contest will be the "best Linux application"? Of course not. And there are many possible reasons why that won't be the case, right?&lt;br /&gt;&lt;br /&gt;Let's start with the two most important conditions that we need to guarantee if we want to trust the Wisdom of the Crowds:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;There has to be... well, a crowd&lt;/li&gt;&lt;li&gt;Whatever we are measuring needs to be close to the notion of "popularity"&lt;/li&gt;&lt;/ol&gt;Ok, the first condition seems fair and obvious enough. However, it is often forgotten. If you want the crowds to decide on something, you need to have enough opinions to avoid possible bias. Ideally, you would even worry about things such as demographics, etc... Given that this is seldom possible, you need to guarantee that malicious bias or shilling is not possible, or at least hard.&lt;br /&gt;&lt;br /&gt;For instance, in our example, imagine the winning Linux application got 120 votes. "Uhm", I hear you say, "for $100K, I could get 120 people to vote for my app". 
Well, there you are. And there are many cases in which this might not be so explicit, but in which the number of opinions does not make a crowd.&lt;br /&gt;&lt;br /&gt;Ok, let's go to the second condition: we need whatever we are measuring to be somewhat correlated with popularity. In our example, it might be that what we mean by "best application" is precisely the "most popular" one. If so, that's ok. If what we are trying to measure is something else, such as "most innovative", "most secure, stable, ....", chances are that the crowds will not be evaluating these features. Again, in many cases, what "best" means is not critical (think of TV programs such as American Idol where the crowds get to pick the best performer, for instance). However, in some others, we might be getting an answer to the wrong question.&lt;br /&gt;&lt;br /&gt;Although I summarized/simplified the two most important conditions for the Wisdom of the Crowds to take place, the original book by Surowiecki talked about 4 conditions:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Diversity of opinion&lt;/li&gt;&lt;li&gt;Independence&lt;/li&gt;&lt;li&gt;Decentralization&lt;/li&gt;&lt;li&gt;Aggregation&lt;/li&gt;&lt;/ul&gt;(The &lt;a href="http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds#Four_elements_required_to_form_a_wise_crowd"&gt;Wikipedia entry&lt;/a&gt; does a good job of summarizing these.)&lt;br /&gt;&lt;br /&gt;Wisdom in crowds should be taken with care and only where appropriate.&lt;br /&gt;Actually, in many situations, instead of crowds we might prefer to have a number of experts give us their opinion (would you post your symptoms on a web page and decide what pills to take based on the wisdom of the crowds?).&lt;br /&gt;For this reason, we propose a Collaborative Filtering approach based on experts in our forthcoming SIGIR paper (see &lt;a href="http://technocalifornia.blogspot.com/2009/05/wisdom-of-few.html"&gt;my previous blog entry on this&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;And for that same 
reason, I am starting to like the concept of the &lt;a href="http://unescochair.blogs.uoc.edu/24022009/francis-pisani-the-alchemy-of-crowds/"&gt;Alchemy of the Crowds&lt;/a&gt; (as opposed to Wisdom) described by the authors of the same name.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-5102564471109542956?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/5102564471109542956/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=5102564471109542956' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5102564471109542956'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5102564471109542956'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/07/why-crowds-are-not-always-wise.html' title='Why the crowds are not (always) wise'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-3692240511625532134</id><published>2009-07-03T04:39:00.001-07:00</published><updated>2009-07-05T15:41:57.362-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='classifiers'/><category scheme='http://www.blogger.com/atom/ns#' term='svm'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><category scheme='http://www.blogger.com/atom/ns#' term='nearest 
neighbors'/><title type='text'>Sometimes your neighbors might not be your friends</title><content type='html'>A few days ago at UMAP 09, Rachael Rafter and others from &lt;a href="http://www.csi.ucd.ie/users/barry-smyth"&gt;Barry Smyth&lt;/a&gt;'s group from the &lt;a href="http://www.clarity-centre.org/"&gt;Clarity Center&lt;/a&gt; presented an interesting paper entitled "&lt;span style="font-weight: bold;"&gt;What Have The Neighbours Ever Done for Us? A Collaborative Filtering Perspective&lt;/span&gt;" (See abstract at the bottom of &lt;a href="http://umap09.fbk.eu/s7"&gt;this page&lt;/a&gt;**).&lt;br /&gt;&lt;br /&gt;The paper draws its main conclusion from several studies over several datasets. And what it has to say may sound surprising: in standard nearest-neighbors collaborative filtering, more often than not, neighbors contribute negatively to the prediction. In other words, using the user mean would work better in many cases than adding the contribution from the neighbors. The problem is that this gets even worse when looking at precisely the most informative ratings (i.e. those at either extreme of the scale).&lt;br /&gt;&lt;br /&gt;These findings seem to be somewhat related to the paper we are presenting next week at &lt;a href="http://www.dcs.warwick.ac.uk/%7Essanand/itwp09/schedule.html"&gt;The 7th Workshop on Intelligent Techniques for Web Personalization &amp;amp; Recommender Systems at IJCAI&lt;/a&gt; (well, to be precise I should say &lt;a href="http://www.cs.ucl.ac.uk/staff/N.Lathia/"&gt;Neal Lathia&lt;/a&gt; will be presenting it).&lt;br /&gt;&lt;br /&gt;In &lt;a href="http://xavier.amatriain.net/pubs/lathia_ijcai_itwprs09_final.pdf"&gt;that paper&lt;/a&gt;, we present a novel approach to finding nearest neighbors: instead of looking at the users that you have close by, we try to find your most like-minded peers by looking at several different datasets recovered from the web. 
Maybe your movie watching taste (and rating style) is better predicted by a group of experts. Or maybe a group of power users from a completely different dataset fits you better. In our paper we use only three datasets: Netflix, Flixster, and Rotten Tomatoes.&lt;br /&gt;&lt;br /&gt;We found that if we had a perfect classifier that could tell us what group a user belongs to (or will be better predicted by) we could bring the prediction error well below the Netflix Prize threshold. The novelty of this approach is that now, instead of focusing on merging many complex algorithms, "all" we need to do is to treat the recommender problem as a classification problem and use algorithms as simple as we want.&lt;br /&gt;&lt;br /&gt;So, I am hearing you ask: where is the catch? Well, the catch is that we haven't been able to find a good classification algorithm (in that case we would be talking about a long paper in a conference, not one in a workshop). We have tried a bit with several simple classifiers (including SVM, NN, etc...) but without success. However, there are many promising avenues including cost-sensitive classifiers.&lt;br /&gt;&lt;br /&gt;So, if you are an expert on classifiers and want a fun project, join us in finding who your good friends are... beyond your neighbors.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;** Unfortunately I cannot point you to any digital version of the article since there is none available. 
Publishing the proceedings with Springer without clearing the rights for a digital version to be available right away is just another of the bad choices made by the UMAP organizers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-3692240511625532134?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/3692240511625532134/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=3692240511625532134' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3692240511625532134'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3692240511625532134'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/07/sometimes-your-neighbors-might-not-be.html' title='Sometimes your neighbors might not be your friends'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-2177767774042583432</id><published>2009-06-15T02:40:00.001-07:00</published><updated>2009-06-15T02:54:58.700-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='agile'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><title type='text'>The (very draft) Agile Research Manifesto</title><content type='html'>Some time ago &lt;a href="http://technocalifornia.blogspot.com/2008/06/agile-research.html"&gt;I posted&lt;/a&gt; about the 
idea of applying agile principles to Scientific Research. The idea is built upon the observation that the Scientific Process is iterative by nature and shares many ideas with Agile Methods. I have used agile-like methods to coach students and small research teams with very good results.&lt;br /&gt;&lt;br /&gt;After the previous post I have been discussing these ideas with a few people. One of the things that came to mind is that it should be possible to adapt the Agile Manifesto without making substantial changes to its underlying principles. I have recently published &lt;a href="http://xavier.amatriain.net/AgileResearchManifesto/"&gt;my first attempt&lt;/a&gt; at doing so. You should treat this as an early draft up for discussion and feedback (As a matter of fact, the version you will now read already includes feedback from some discussions). And, if you would like to participate in the discussions, please join the &lt;a href="http://www.linkedin.com/groups?gid=1824874"&gt;LinkedIn group&lt;/a&gt; we created for that purpose.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-2177767774042583432?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/2177767774042583432/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=2177767774042583432' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2177767774042583432'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2177767774042583432'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/06/very-draft-agile-research-manifesto.html' title='The (very draft) Agile Research 
Manifesto'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-8560865038317114613</id><published>2009-05-24T16:29:00.000-07:00</published><updated>2009-06-02T15:12:39.706-07:00</updated><title type='text'>Netflix Prize: What if there is no Million $ ?</title><content type='html'>If you are participating in the Netflix Prize, don't worry... This post is not about the economic crisis and Netflix filing for bankruptcy and not paying the prize. It is in fact about a much more "scary" prospect: what if there was no way to get below the threshold set in the competition? Or more precisely, what if the only way to get below the error threshold was actually overfitting to the existing training and testing dataset?&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://web4.cs.ucl.ac.uk/research/csml/images/Netflix_Prize.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 176px; height: 209px;" src="http://web4.cs.ucl.ac.uk/research/csml/images/Netflix_Prize.jpg" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The possibility cannot be discarded. &lt;a href="http://technocalifornia.blogspot.com/2009/04/i-like-it-i-like-it-not-or-how-miss.html"&gt;A few weeks back&lt;/a&gt; I posted on this same blog a discussion of recent work we did analyzing the impact of natural noise on user ratings. That is, when users give us feedback through ratings, they add background noise. There are many possible reasons for this. In some cases the user does not really see a difference between rating an item with a 3 or a 4. 
Other times, the user is not being careful enough when giving the feedback and is letting other factors affect the result. Ratings will be affected by things like how long ago the item was used, what the previously rated item was, or even the mood the user is in.&lt;br /&gt;&lt;br /&gt;But, whatever the reason is, the result is that we have data with noise and/or errors. If we take a random rating and ask the user "What was your rating for item X?" we will inevitably get errors. Uhm... so even the user makes errors when recalling her own ratings? Yes! As a matter of fact we can easily measure this error by asking her to rate the same items several times (see, again, &lt;a href="http://technocalifornia.blogspot.com/2009/04/i-like-it-i-like-it-not-or-how-miss.html"&gt;my previous post&lt;/a&gt; on this).&lt;br /&gt;&lt;br /&gt;Now, here is a rule of thumb: we cannot predict the user any better than this same user can assess her own ratings. Therefore this natural noise threshold sets a "magic barrier". It makes no sense to try to go below this error in our predictions. What difference does it make that our system is very good at predicting that some item should "be" a 3 if the user does not really see the difference between (or is not sure about) a 2 and a 3 for that item? Or, the other way around: how can we predict a 3 with no error if the user "randomly" moved between 2, 3, and 4 when giving us feedback?&lt;br /&gt;&lt;br /&gt;So, returning to the initial issue, the question is now: is the Netflix Prize threshold below this "magic barrier"? Do they even know? Well, a member of the leading team Korbell and I had an informal conversation with Netflix's VP for Personalization. Of course, they cannot say much about the prize, lest they give away vital information for winning the prize. 
However, when we asked him whether he had any information about what this "magic barrier" was on the Netflix Prize dataset, he answered that they did a small study to estimate something similar to this. Their estimate was "around 0.5". That is surely non-negligible but it is safely located below the winning threshold of 0.83. However, remember this was only a small study that gave them a rough estimate. Our measures on a similar dataset yielded RMSE values between 0.57 and 0.82. Although these values depend on several variables such as the time between ratings or even how items are presented to the user, we have reasons to believe the Netflix dataset should be on the higher end of this range (if not higher!). Read more in our &lt;a href="http://xavier.amatriain.net/pubs/xamatriain_umap09.pdf"&gt;UMAP article&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;As a final appendix, let me throw in a few important conclusions that pinpoint future directions. First, it is clear that the RMSE measure should be reconsidered. If, on average, the user does not know the difference between a 2 and a 3, we should not take that into account in our success measure. Top-N measures seem much more suitable as a measure of success in Recommender Systems: the user might not care or see a difference between a 2 and a 3, but she will surely be disappointed if we recommend something she would rate with a 1.&lt;br /&gt;Another strategy is to select only users that are more consistent and use those to generate recommendations for the target user. If the target user is noisy herself, we will still get lousy recommendations. But we will be minimizing errors for the rest. This is the approach we took in our &lt;a href="http://technocalifornia.blogspot.com/2009/05/wisdom-of-few.html"&gt;Wisdom of the Few&lt;/a&gt; paper. Finally, although we cannot aim at getting results below the "magic barrier", there is something we &lt;span style="font-weight: bold;"&gt;can&lt;/span&gt; do: lower that barrier. 
In work we have under submission, we devised a "denoising" algorithm that is able to improve accuracy by almost 15% by lowering this noise threshold. But I will leave this for a future post, once we hopefully get the paper accepted.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-8560865038317114613?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/8560865038317114613/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=8560865038317114613' title='19 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/8560865038317114613'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/8560865038317114613'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/05/netflix-prize-what-if-there-is-no.html' title='Netflix Prize: What if there is no Million $ ?'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>19</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-3358413754856523370</id><published>2009-05-24T16:09:00.000-07:00</published><updated>2009-06-01T16:13:31.544-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='allosphere'/><category scheme='http://www.blogger.com/atom/ns#' term='ted talks'/><title type='text'>Allosphere@TED (part II)</title><content type='html'>In a &lt;a 
href="http://technocalifornia.blogspot.com/2009/02/allosphereted.html"&gt;previous post&lt;/a&gt; I talked about the presentation my former boss (and still friend) JoAnn Kuchera-Morin gave at TED about the &lt;a href="http://www.allosphere.ucsb.edu/"&gt;Allosphere&lt;/a&gt; project, of which I was technical director at UCSB. I am now writing this follow-up post because since then many more people have found out about the project.&lt;br /&gt;&lt;br /&gt;TED put the &lt;a href="http://www.ted.com/index.php/talks/joann_kuchera_morin_tours_the_allosphere.html"&gt;video of the presentation&lt;/a&gt; online a few weeks ago.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;object width="400" height="344"&gt;&lt;param name="movie" value="http://www.youtube.com/v/u-D-zEToJQ4&amp;amp;hl=es&amp;amp;fs=1"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;embed src="http://www.youtube.com/v/u-D-zEToJQ4&amp;amp;hl=es&amp;amp;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="400" height="344"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;Shortly after, the video was &lt;a href="http://hardware.slashdot.org/article.pl?sid=09/04/15/2017209&amp;amp;from=rss"&gt;slashdotted&lt;/a&gt;. And of course, once that happens you are likely to get a lot of attention. 
Some of the most interesting comments I found:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.engadget.com/2009/04/16/allosphere-three-story-virtual-environment-not-available-for-bir/"&gt;AlloSphere three story virtual environment not available for birthday parties&lt;/a&gt; @ Engadget&lt;br /&gt;&lt;a href="http://singularityhub.com/2009/05/14/enter-the-allosphere-a-360%C2%B0-audiovisual-research-dome/"&gt;Singularity Hub&lt;/a&gt;&lt;br /&gt;&lt;a href="http://scienceblogs.com/bioephemera/2009/04/the_allosphere_flying_through.php"&gt;The AlloSphere: Flying through a giant virtual brain?&lt;br /&gt;&lt;/a&gt;&lt;a href="http://dgoudy.wordpress.com/2009/04/19/hci-and-the-allosphere/"&gt;HCI and the Allosphere&lt;br /&gt;&lt;/a&gt;&lt;a href="http://infosthetics.com/archives/2009/04/allosphere_a_new_way_to_interpret_scientific_data.html"&gt;AlloSphere: Interpret Scientific Data in a 3 Story High Metal Sphere&lt;br /&gt;&lt;/a&gt;&lt;br /&gt;I particularly enjoyed &lt;a href="http://www.technovelgy.com/ct/Science-Fiction-News.asp?NewsNum=2258"&gt;this one&lt;/a&gt; and &lt;a href="http://www.virtualworldsnews.com/2009/04/allosphere-not-a-virtual-world-but-visualizing-the-same-data.html"&gt;this other one&lt;/a&gt;. 
Funny, when I was working on the project people made the joke about Professor X's Cerebro, where I was the X :-)&lt;br /&gt;&lt;br /&gt;Hopefully, now that the project has caught more attention, it will be easier to get the right money in the door.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-3358413754856523370?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/3358413754856523370/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=3358413754856523370' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3358413754856523370'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3358413754856523370'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/05/allosphereted-part-ii.html' title='Allosphere@TED (part II)'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-2543164554650124642</id><published>2009-05-24T14:51:00.000-07:00</published><updated>2009-05-24T16:00:11.930-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='fellows'/><category scheme='http://www.blogger.com/atom/ns#' term='acm'/><category scheme='http://www.blogger.com/atom/ns#' term='computer science'/><title type='text'>ACM Fellows 2008</title><content type='html'>A few weeks ago I read the list of 
the 2008 ACM Fellows. Each year the ACM recognizes computer scientists for their contributions. The &lt;a href="http://www.acm.org/press-room/news-releases/fellows-2008"&gt;2008 list&lt;/a&gt; includes 44 new fellows from very different backgrounds. I was happy to find out that I knew some of them, and I definitely agreed with them being on the list. I will do my small part by mentioning them in this blog:&lt;br /&gt;&lt;p&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Alan C. Kay&lt;/b&gt;&lt;br /&gt;&lt;em&gt;&lt;/em&gt;&lt;span style="font-style: italic;"&gt;For fundamental contributions to personal computing and object-oriented programming&lt;/span&gt;&lt;/p&gt;The surprise here is how in the world Alan Kay could still not be an ACM Fellow! I can think of maybe only a handful of people that I would put before him in such a category. If you are doing anything related to computers you probably already know a lot about Alan: father of Object-Oriented Programming (including Smalltalk), inventor of the laptop, window-based interfaces, the &lt;a href="http://laptop.org/en/"&gt;OLPC&lt;/a&gt; project...&lt;br /&gt;Still, this is a good excuse to read and learn &lt;a href="http://en.wikipedia.org/wiki/Alan_Kay"&gt;a bit more about him&lt;/a&gt;. Alan was one of my usual citations in my Software Engineering lessons, and I was fortunate enough to meet him during the presentation of the OLPC project at UCLA, where I learned about many things, including his love for Spanish ham and his current relation with several Open Source projects.&lt;br /&gt;&lt;p&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Perry R. Cook&lt;/b&gt;   &lt;em&gt;Princeton&lt;/em&gt;&lt;em&gt; University&lt;/em&gt;&lt;/p&gt; &lt;p style="font-style: italic;"&gt;For contributions to computer music, physics-based sound synthesis and voice analysis/synthesis&lt;/p&gt;I have known Perry for many years. He was even a professor in one of my PhD courses at UPF. 
But anyone who has done some research in anything related to computer music knows &lt;a href="http://en.wikipedia.org/wiki/Perry_R._Cook"&gt;Perry Cook&lt;/a&gt;. He is best known for his work on physical modeling of instruments and voice synthesis. But he has also co-authored very important software packages such as &lt;a href="http://ccrma.stanford.edu/software/stk/"&gt;STK&lt;/a&gt;, which was a big influence on our &lt;a href="http://clam-project.org/"&gt;CLAM&lt;/a&gt;. More recently, he has also co-authored &lt;a href="http://chuck.cs.princeton.edu/"&gt;ChucK&lt;/a&gt; with Ge Wang. But above all, Perry is a great guy... someone you will want to hang out with after the conference and have some beers.&lt;br /&gt;&lt;p&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Joseph&lt;/b&gt;&lt;b&gt; A. Konstan&lt;/b&gt;&lt;em&gt; - University&lt;/em&gt;&lt;em&gt; of Minnesota&lt;/em&gt;&lt;/p&gt; &lt;p style="font-style: italic;"&gt;For contributions to human-computer interaction&lt;/p&gt;Funny that I am hosting Prof. Konstan this Friday on his visit to Barcelona. I have only met him briefly at previous ACM &lt;a href="http://www.recsys.org/"&gt;Recsys&lt;/a&gt; conferences. And to be honest, I was not very much aware of his previous work. Joseph was well known to me for being one of the founders (together with John Riedl) of the &lt;a href="http://www.grouplens.org/"&gt;GroupLens &lt;/a&gt;research group. Their work on Recommender Systems has been seminal and extremely important for raising awareness of this field in recent years. But apart from that, it turns out that Prof. Konstan has been President of &lt;a href="http://sigchi.org/"&gt;SIGCHI &lt;/a&gt;(one of ACM's most important Special Interest Groups, with 4500 members). He is also known for his work on Online Communities and Computer Systems for HIV prevention. 
Read more on his &lt;a href="http://www-users.cs.umn.edu/%7Ekonstan/"&gt;webpage&lt;/a&gt;.&lt;br /&gt;&lt;p&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;William Buxton&lt;/b&gt;&lt;em&gt; - Microsoft Research&lt;/em&gt;&lt;i&gt; &lt;/i&gt;&lt;/p&gt; &lt;p style="font-style: italic;"&gt;For contributions to the field of human-computer interaction&lt;/p&gt;&lt;p&gt;Bill Buxton is an amazing guy I had the pleasure to meet in Santa Barbara. That was before he was appointed as Microsoft Chief Scientist. But still, he was the kind of person everybody listened to as soon as he started talking. You only need to read &lt;a href="http://www.billbuxton.com/#bio"&gt;his bio&lt;/a&gt; to understand why that is. He also started out in Computer Music, and at that time began working on multi-touch surfaces and composition tools. He then went off to be Chief Scientist at &lt;a href="http://en.wikipedia.org/wiki/Alias_Systems_Corporation"&gt;Alias/Wavefront&lt;/a&gt;, now part of Autodesk and known worldwide for their Maya package. It is a bit weird to see him now as Microsoft Chief Scientist. But, hey... 
that's a heck of a job!&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-2543164554650124642?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/2543164554650124642/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=2543164554650124642' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2543164554650124642'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2543164554650124642'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/05/acm-fellows.html' title='ACM Fellows 2008'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-809240655077776722</id><published>2009-05-21T15:12:00.000-07:00</published><updated>2009-05-21T23:51:04.385-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='firefox'/><category scheme='http://www.blogger.com/atom/ns#' term='pulseaudio'/><category scheme='http://www.blogger.com/atom/ns#' term='jack'/><category scheme='http://www.blogger.com/atom/ns#' term='CLAM'/><category scheme='http://www.blogger.com/atom/ns#' term='flash'/><title type='text'>Linux: Processing audio from the browser</title><content type='html'>Imagine being able to process streams of audio that are playing in your browser directly and in real time in 
an external application. For instance, you could analyze the music of a YouTube video while it's playing.&lt;br /&gt;&lt;br /&gt;Well, this is possible in Linux with a little infrastructure: &lt;a href="http://en.wikipedia.org/wiki/JACK_Audio_Connection_Kit"&gt;jack&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/PulseAudio"&gt;pulseaudio&lt;/a&gt; and a couple of modules that connect one with the other. &lt;a href="http://ubuntuforums.org/showthread.php?t=843012"&gt;This post&lt;/a&gt; on the Ubuntu Forums gives a good enough explanation of the requirements (scroll down until you see the section on "pulseaudio through jack").&lt;br /&gt;&lt;br /&gt;Anyway, I have recorded a two-part screencast where I explain all this and use this setup to process a YouTube video with CLAM and detect its chords in real time.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Part 1&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;object width="320" height="266" class="BLOG_video_class" id="BLOG_video-e7cf974e5aef0531" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"&gt;&lt;param name="movie" value="http://www.youtube.com/get_player"&gt;&lt;param name="bgcolor" value="#FFFFFF"&gt;&lt;param name="allowfullscreen" value="true"&gt;&lt;param name="flashvars" value="flvurl=http://v1.nonxt1.googlevideo.com/videoplayback?id%3De7cf974e5aef0531%26itag%3D5%26app%3Dblogger%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1330441582%26sparams%3Did,itag,ip,ipbits,expire%26signature%3D3A2457C645FE4053B5460FAAD928F3236265C5E2.86554498CA9DBAD88E8FB9BD10037E767FFCE653%26key%3Dck1&amp;amp;iurl=http://video.google.com/ThumbnailServer2?app%3Dblogger%26contentid%3De7cf974e5aef0531%26offsetms%3D5000%26itag%3Dw160%26sigh%3Dx9MPNC7VQl8MckFWmb2VBTwakIU&amp;amp;autoplay=0&amp;amp;ps=blogger"&gt;&lt;embed src="http://www.youtube.com/get_player" type="application/x-shockwave-flash"width="320" height="266" 
bgcolor="#FFFFFF"flashvars="flvurl=http://v1.nonxt1.googlevideo.com/videoplayback?id%3De7cf974e5aef0531%26itag%3D5%26app%3Dblogger%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1330441582%26sparams%3Did,itag,ip,ipbits,expire%26signature%3D3A2457C645FE4053B5460FAAD928F3236265C5E2.86554498CA9DBAD88E8FB9BD10037E767FFCE653%26key%3Dck1&amp;iurl=http://video.google.com/ThumbnailServer2?app%3Dblogger%26contentid%3De7cf974e5aef0531%26offsetms%3D5000%26itag%3Dw160%26sigh%3Dx9MPNC7VQl8MckFWmb2VBTwakIU&amp;autoplay=0&amp;ps=blogger"allowFullScreen="true" /&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Part 2&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;(Final demo step, recorded with a camera because of the problems with running jack, CLAM, and the gtk-RecordMyDesktop app at the same time. Still... you'll get the idea after watching part 1)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;object width="320" height="266" class="BLOG_video_class" id="BLOG_video-a970f9f88c523d60" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"&gt;&lt;param name="movie" value="http://www.youtube.com/get_player"&gt;&lt;param name="bgcolor" value="#FFFFFF"&gt;&lt;param name="allowfullscreen" value="true"&gt;&lt;param name="flashvars" value="flvurl=http://v16.nonxt8.googlevideo.com/videoplayback?id%3Da970f9f88c523d60%26itag%3D5%26app%3Dblogger%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1330441582%26sparams%3Did,itag,ip,ipbits,expire%26signature%3D325B3A7E443631A2ABC04BA94A24CB44C8E7C1CE.259780238F454C63418215B55A2EDEFE122F08F7%26key%3Dck1&amp;amp;iurl=http://video.google.com/ThumbnailServer2?app%3Dblogger%26contentid%3Da970f9f88c523d60%26offsetms%3D5000%26itag%3Dw160%26sigh%3D38HappTljmbYLpzVE5-_Keuj-Nk&amp;amp;autoplay=0&amp;amp;ps=blogger"&gt;&lt;embed src="http://www.youtube.com/get_player" type="application/x-shockwave-flash"width="320" height="266" 
bgcolor="#FFFFFF"flashvars="flvurl=http://v16.nonxt8.googlevideo.com/videoplayback?id%3Da970f9f88c523d60%26itag%3D5%26app%3Dblogger%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1330441582%26sparams%3Did,itag,ip,ipbits,expire%26signature%3D325B3A7E443631A2ABC04BA94A24CB44C8E7C1CE.259780238F454C63418215B55A2EDEFE122F08F7%26key%3Dck1&amp;iurl=http://video.google.com/ThumbnailServer2?app%3Dblogger%26contentid%3Da970f9f88c523d60%26offsetms%3D5000%26itag%3Dw160%26sigh%3D38HappTljmbYLpzVE5-_Keuj-Nk&amp;autoplay=0&amp;ps=blogger"allowFullScreen="true" /&gt;&lt;/object&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-809240655077776722?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='enclosure' type='video/mp4' href='http://www.blogger.com/video-play.mp4?contentId=a970f9f88c523d60&amp;type=video%2Fmp4' length='0'/><link rel='enclosure' type='video/mp4' href='http://www.blogger.com/video-play.mp4?contentId=e7cf974e5aef0531&amp;type=video%2Fmp4' length='0'/><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/809240655077776722/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=809240655077776722' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/809240655077776722'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/809240655077776722'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/05/processing-audio-from-browser-in-linux.html' title='Linux: Processing audio from the browser'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-1239699067818993606</id><published>2009-05-17T15:21:00.000-07:00</published><updated>2009-11-04T14:57:53.910-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rotten tomatoes'/><category scheme='http://www.blogger.com/atom/ns#' term='netflix prize'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>The Wisdom of the Few</title><content type='html'>One of the most common approaches to Recommender Systems is so-called Collaborative Filtering. The main rationale is the following: in order to predict items that you will like, we find the users most similar to you by looking at your previous likes and dislikes. We then recommend items that those users have liked but that you don't know yet.&lt;br /&gt;&lt;br /&gt;There are several caveats with this approach. One of them is that we need an effective way of capturing users' likes and dislikes. Most of the time we need to do this by asking users to explicitly rate items. This is the typical 1 to 5 star rating that you get in many services from Netflix to Amazon. But we know, as I commented in an &lt;a href="http://technocalifornia.blogspot.com/2009/04/i-like-it-i-like-it-not-or-how-miss.html"&gt;earlier post&lt;/a&gt;, that users are noisy when giving that feedback.&lt;br /&gt;So, because rating feedback is noisy, we are prone to make errors when predicting what a user likes or doesn't like.&lt;br /&gt;&lt;br /&gt;But standard Collaborative Filtering has several other problems. First, because we need to compute neighbors and predictions, we need to transmit all user ratings to a centralized server and this can compromise user privacy. 
Second, the number of users and items is likely to be huge, so applying this approach is computationally expensive and has scalability issues. And so on...&lt;br /&gt;&lt;br /&gt;We have proposed a new approach called "Expert-based Collaborative Filtering". In this approach, instead of finding neighbors from a general pool of like-minded users similar to the target, we find neighbors in an expert database. The rationale is that these experts will be much more consistent in their ratings (i.e. less noisy) and data will be less sparse.&lt;br /&gt;&lt;br /&gt;We have conducted experiments using movies and experts from &lt;a href="http://www.rottentomatoes.com/"&gt;Rotten Tomatoes&lt;/a&gt; and concluded that users prefer recommendations drawn from like-minded experts to those predicted from (noisy) like-minded peers.&lt;br /&gt;&lt;br /&gt;At the upcoming &lt;a href="http://www.sigir2009.org/"&gt;SIGIR 2009&lt;/a&gt; conference in Boston we will be presenting the paper entitled "The Wisdom of the Few: A Collaborative Filtering Approach Based on Expert Opinions from the Web". 
&lt;a href="http://xavier.amatriain.net/pubs/xamatriain_sigir09.pdf"&gt;Here&lt;/a&gt; you can access a copy of the paper where you will find a complete explanation about this new approach.&lt;br /&gt;&lt;br /&gt;Update: here are the slides I presented at SIGIR&lt;br /&gt;&lt;br /&gt;&lt;div style="width: 425px; text-align: center;" id="__ss_1814963"&gt;&lt;a style="margin: 12px 0pt 3px; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; display: block; text-decoration: underline;" href="http://www.slideshare.net/xamat/the-wisdom-of-the-few-sigir09" title="The Wisdom of the Few @SIGIR09"&gt;The Wisdom of the Few @SIGIR09&lt;/a&gt;&lt;object style="margin: 0px;" height="355" width="425"&gt;&lt;param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sigirpresentation-090805102207-phpapp02&amp;amp;stripped_title=the-wisdom-of-the-few-sigir09"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowScriptAccess" value="always"&gt;&lt;embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sigirpresentation-090805102207-phpapp02&amp;amp;stripped_title=the-wisdom-of-the-few-sigir09" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="355" width="425"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div style="font-size: 11px; font-family: tahoma,arial; height: 26px; padding-top: 2px;"&gt;View more &lt;a style="text-decoration: underline;" href="http://www.slideshare.net/"&gt;presentations&lt;/a&gt; from &lt;a style="text-decoration: underline;" href="http://www.slideshare.net/xamat"&gt;Xavier  Amatriain&lt;/a&gt;.&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-1239699067818993606?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link 
rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/1239699067818993606/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=1239699067818993606' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/1239699067818993606'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/1239699067818993606'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/05/wisdom-of-few.html' title='The Wisdom of the Few'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-3560140981037850397</id><published>2009-05-17T14:27:00.000-07:00</published><updated>2009-05-17T16:06:02.935-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='madrid'/><category scheme='http://www.blogger.com/atom/ns#' term='conference'/><category scheme='http://www.blogger.com/atom/ns#' term='www2009'/><title type='text'>WWW 2009 Conference</title><content type='html'>I have been meaning to blog about the WWW conference since it happened a few weeks back but have been pushing it back because of deadlines. 
In any case, I did not want to let it go by without at least writing a few lines.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www2009.org/images/Quijote.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 213px; height: 182px;" src="http://www2009.org/images/Quijote.gif" alt="" border="0" /&gt;&lt;/a&gt;The &lt;a href="http://www2009.org/"&gt;WWW09&lt;/a&gt; conference was held in Madrid, April 20-24. I have attended many conferences in my life but this was the first time I was at a WWW conference. And it was overall a really positive surprise. The WWW is a very large conference. Most of the time there are up to 8 parallel tracks in the main conference, not counting posters and other events. However, the organization was extremely good (a German colleague joked that it did not seem like a Spanish-organized event).&lt;br /&gt;&lt;br /&gt;Probably one of the highlights in terms of the organization was the visit of the Prince of Spain for the opening ceremony. Although he speaks perfect English, he gave a talk in Spanish because of protocol rules... weird. You can see my (very bad) recording of the opening ceremony here (&lt;a href="http://www.youtube.com/watch?v=sTn7LGN6KAU"&gt;part 1&lt;/a&gt;, part2, part3).&lt;br /&gt;&lt;br /&gt;The conference was really taken over by the Twitter hype. It was amazing to see people tweeting at all times and about everything. You only need to do a search for &lt;a href="http://twitter.com/#search?q=%23www2009"&gt;#www2009&lt;/a&gt; on Twitter to find out (it was a trending topic for a large part of the conference). 
Or you can see &lt;a href="http://twitpic.com/3rzlw"&gt;this amazing picture&lt;/a&gt; of people tweeting and blogging during a Flamenco concert at the conference reception.&lt;br /&gt;&lt;br /&gt;One of the interesting surprises of the conference was the &lt;a href="http://www2009.org/developers.html"&gt;Developers Track&lt;/a&gt;. One of the bad things about conference talks that I have complained about in the past is that they seldom add anything beyond reading the conference paper. But in the Developers Track, this was not true in any way. There were very good presentations including demos, in-depth explanations with code examples, etc. This is the track where I presented the work resulting from Jun's GSoC that I already explained in a &lt;a href="http://technocalifornia.blogspot.com/2009/04/multilevel-audio-descriptor-aggregator.html"&gt;previous post&lt;/a&gt;. Most of the talk was recorded and is now available as three YouTube videos (&lt;a href="http://www.youtube.com/watch?v=_-c762VK83s"&gt;part1&lt;/a&gt;, &lt;a href="http://www.youtube.com/watch?v=ww42HLa-07s"&gt;part2&lt;/a&gt;, &lt;a href="http://www.youtube.com/watch?v=lMLHYDbX-0c"&gt;part3&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Other interesting presentations from the Telefonica Research team include Pablo Rodriguez' &lt;a href="http://www.youtube.com/watch?v=wgmdU_vmXPQ"&gt;keynote&lt;/a&gt;, and Josep M. Pujol's presentation of the &lt;a href="http://www.porqpine.com/"&gt;Porqpine&lt;/a&gt; search engine.&lt;br /&gt;&lt;br /&gt;I am usually not very fond of very large conferences such as the WWW. But I have to say that I really liked it. Having so much to choose from guaranteed that there was always something interesting to attend. And there was a great balance between academics, hackers and people from industry. 
Definitely, one of the conferences I want to be targeting in years to come.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-3560140981037850397?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/3560140981037850397/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=3560140981037850397' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3560140981037850397'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3560140981037850397'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/05/www-2009-conference.html' title='WWW 2009 Conference'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-542241607462219995</id><published>2009-04-13T14:38:00.000-07:00</published><updated>2009-04-13T14:54:00.042-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='www'/><category scheme='http://www.blogger.com/atom/ns#' term='aggregator'/><category scheme='http://www.blogger.com/atom/ns#' term='CLAM'/><category scheme='http://www.blogger.com/atom/ns#' term='annotator'/><title type='text'>Multilevel Audio Descriptor Aggregator @WWW09</title><content type='html'>If you take a look at the &lt;a href="http://www2009.org/developers.html"&gt;Developer's track&lt;/a&gt; in next week's WWW 
Conference, you will see an article in which I am listed as a co-author: "&lt;a href="http://xavier.amatriain.net/pubs/wang_www09.pdf"&gt;Combining multi-level audio descriptors via web identification and aggregation&lt;/a&gt;".&lt;br /&gt;&lt;br /&gt;This is mostly the result of Jun Wang's work with &lt;a href="http://clam-project.org"&gt;CLAM&lt;/a&gt; in last year's &lt;a href="http://code.google.com/soc/"&gt;Google Summer of Code&lt;/a&gt;. Of course, a bit more of work followed after those 3 months but the bulk of the work was done on the Google grant. It must be said also that Jun is an exceptionally brilliant and hard-working student (Chinese Academy of Sciences).&lt;br /&gt;&lt;br /&gt;The paper explains how CLAM has been used to build a tool that is able to integrate many levels of audio descriptors - ranging from low-level descriptors extracted from the audio signal itself to high level descriptions gathered from the web.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://junjun2008viva.spaces.live.com/Blog/cns%219D8A38440C531493%21837.entry"&gt;In Jun's blog&lt;/a&gt;, you can read a longer explanation including a video clip of the CLAM Aggregator at work.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-542241607462219995?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/542241607462219995/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=542241607462219995' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/542241607462219995'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/542241607462219995'/><link rel='alternate' type='text/html' 
href='http://technocalifornia.blogspot.com/2009/04/multilevel-audio-descriptor-aggregator.html' title='Multilevel Audio Descriptor Aggregator @WWW09'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-7058093950221602459</id><published>2009-04-06T16:08:00.000-07:00</published><updated>2009-04-06T16:14:57.793-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='barcelona'/><category scheme='http://www.blogger.com/atom/ns#' term='recsys'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Recsys 2010 in Barcelona</title><content type='html'>It has just been announced: Recsys 2010 will be held in Barcelona! And Marc Torrens (Strands) and I will be sharing the general chair of the conference.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://recsys.acm.org/"&gt;ACM Recsys&lt;/a&gt; is the premier annual event for researchers and practitioners on Recommender Systems. This year, the 3rd edition of the conference is taking place in New York... a great preparation for next year's conference in Barcelona.&lt;br /&gt;&lt;br /&gt;I have organized conferences before both in Barcelona and Santa Barbara and I know how much work is needed. 
But this is an excellent opportunity and we are sure we will be hosting an excellent event.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-7058093950221602459?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/7058093950221602459/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=7058093950221602459' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/7058093950221602459'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/7058093950221602459'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/04/recsys-2010-in-barcelona.html' title='Recsys 2010 in Barcelona'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-3809277378108491855</id><published>2009-04-02T15:34:00.000-07:00</published><updated>2009-04-06T16:07:49.889-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ratings'/><category scheme='http://www.blogger.com/atom/ns#' term='noise'/><category scheme='http://www.blogger.com/atom/ns#' term='explicit feedback'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>"I like it... 
I like it not" or How miss-behaved users are when giving feedback</title><content type='html'>Many recommender algorithms base their accuracy on the assumption that users are able to provide a good assessment of their preferences. This is known as "explicit feedback": users explicitly rate items and those ratings are used as the basis to create their profile. The &lt;a href="http://www.netflixprize.com/"&gt;Netflix prize&lt;/a&gt;, for instance, assumes that we can predict user ratings to an acceptable level of accuracy by simply taking past ratings into account.&lt;br /&gt;&lt;br /&gt;However, it is a known fact that users are fairly inconsistent when asked whether or not they like a given item, let alone when asked to give an accurate rating on a 1 to 5 scale. Much work has been invested in recent years to come up with better prediction algorithms but, in general, these algorithms disregard the existence of such noise in the user feedback.&lt;br /&gt;&lt;br /&gt;In a recent &lt;a href="http://xavier.amatriain.net/pubs/xamatriain_umap09.pdf"&gt;work&lt;/a&gt;, accepted to the &lt;a href="http://umap09.fbk.eu/"&gt;UMAP 2009&lt;/a&gt; conference, we focus on analyzing this noise in user feedback. In order to do this, we devised a two-part test-retest experiment in which we had users rate the same movies three times. We measured an RMSE roughly between 0.5 and 0.8. This value is related to the "magic barrier" in recommender systems (i.e. 
the minimum error a recommender system can accomplish in practice).&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/SdqKLOGWxFI/AAAAAAAAAEM/myzAtA1Nylk/s1600-h/fig2b.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 241px;" src="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/SdqKLOGWxFI/AAAAAAAAAEM/myzAtA1Nylk/s320/fig2b.jpg" alt="" id="BLOGGER_PHOTO_ID_5321717835059610706" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Other findings include the fact that extreme ratings are more consistent than those in the middle of the scale (see figure) ; and the order in which movies are presented to users affects the consistency of the rating. Read the &lt;a href="http://xavier.amatriain.net/pubs/xamatriain_umap09.pdf"&gt;full paper&lt;/a&gt; for more details... and come to Trento for the UMAP conference for the presentation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-3809277378108491855?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/3809277378108491855/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=3809277378108491855' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3809277378108491855'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3809277378108491855'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/04/i-like-it-i-like-it-not-or-how-miss.html' title='&quot;I like it... 
I like it not&quot; or How miss-behaved users are when giving feedback'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_xAtUP4Gu6Zk/SdqKLOGWxFI/AAAAAAAAAEM/myzAtA1Nylk/s72-c/fig2b.jpg' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-5881958961810240119</id><published>2009-02-15T13:42:00.000-08:00</published><updated>2009-03-03T03:45:40.271-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='venture'/><category scheme='http://www.blogger.com/atom/ns#' term='sylicon valley'/><category scheme='http://www.blogger.com/atom/ns#' term='startup'/><category scheme='http://www.blogger.com/atom/ns#' term='barcelona'/><title type='text'>The Barcelona Startup Scene</title><content type='html'>For some time now, Barcelona has been trying to present itself as the "European Silicon Valley". But, to contradict &lt;a href="http://www.techcrunch.com/2008/04/05/europe-is-searching-for-its-silicon-valley/"&gt;TechCrunch&lt;/a&gt;, there are many more nice things about Barcelona than its climate. First, there is a large population of very talented techies. Surely, this is in part due to the good level of the local universities (what can I say... I teach in one of them :-). Second, local people are known for being creative and open-minded, and you can breathe some of that in the Barcelona air. And finally, because of its "charm", Barcelona attracts many talented and creative people from abroad. Given a choice, they'd rather work on their laptop from the sunny Barcelona beach than from a rainy place in northern Europe, I guess (so, yes... 
it's about the weather).&lt;br /&gt;&lt;br /&gt;Local government agencies try to support start-ups through different programs. However, life is not easy for new tech companies in Barcelona. The main issue is the almost non-existent local venture capital. As Mario Nemirovsky put it in a recent conversation I had with him: there is capital, but not venture. On top of that, being a Spanish-based start-up does not make it easy to get foreign VC money (especially from the US). Not to mention the excessive bureaucratic burden put on new companies here in Spain.&lt;br /&gt;&lt;br /&gt;However, there are many people who are definitely doing their share to change this and make Barcelona the real European Silicon Valley. I will review some of them in the next paragraphs. I admit the following review is biased towards people I personally know, many of whom are actually friends. So if you feel you should be here, let your voice be heard!&lt;br /&gt;&lt;br /&gt;Mario Nemirovsky, whom I mentioned before, is one of these. Mario has founded several companies, such as &lt;a href="http://www.consentry.com/company_management.html"&gt;ConSentry&lt;/a&gt;, in the US. He is considered one of the most &lt;a href="http://www.hispanic-net.org/hispanic-net/opencms/schemas/news_data/news_0001.html"&gt;influential Latinos&lt;/a&gt; in Silicon Valley. He is currently in Barcelona on a grant but his main interest is in bringing the Barcelona startup scene to the next level. As a first initiative, he helped found &lt;a href="http://www.miraveo.com/"&gt;Miraveo&lt;/a&gt;. Miraveo offers a really interesting and revolutionary approach to creating spontaneous ad hoc wireless networks.&lt;br /&gt;&lt;br /&gt;Also based in Barcelona, Terry Jones (@terrycojones) and Esteve Fernandez are about to release the first beta of FluidDB at their mind-blowing startup, &lt;a href="http://www.fluidinfo.com/"&gt;Fluidinfo&lt;/a&gt;. 
Fluidinfo has been called "the next Google" or "world-changing" by people such as Robert Scoble and Tim O'Reilly. It is hard to describe Fluidinfo in a few words so I will use their "Database meets the Wiki" motto. But if you really want to understand what they do, check out the four videos in &lt;a href="http://scobleizer.com/2008/12/05/the-unfundable-world-changing-startup/"&gt;this Scobleizer post&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.3scale.net/"&gt;3Scale&lt;/a&gt; is another Barcelona startup that I know pretty well. Actually, it is the only one of the companies I mention in this post in which I have invested some (very little) money. 3Scale provides a solution to manage all issues related to a web service. Their product is already fully functional and they have a great business model. They were selected for TechCrunch 50 and Le Web Paris this year. Need I say more? :-)&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.bmat.com/"&gt;BMAT&lt;/a&gt; (Barcelona Music and Audio Technologies) is a spin-off from the &lt;a href="http://mtg.upf.edu"&gt;MTG&lt;/a&gt; Research group where I did my PhD. And therefore, I have many friends in there. They have a number of B2B products related to audio and music such as music search and recommendation, voice processing and games, or broadcasting monitoring.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.strands.com/"&gt;Strands&lt;/a&gt; also started out as a music-oriented company but is now much more than that. Apart from their public product, they build custom solutions for recommendation and social networks. Curiously enough, the startup was originally founded in Corvallis (OR) by Francisco and Marc, from Barcelona. 
But now that the company has grown, they have moved much of their activity to Barcelona (although they are still active in Corvallis and several other locations around the world).&lt;br /&gt;&lt;br /&gt;I should also highlight &lt;a href="http://www.fluendo.com/"&gt;Fluendo&lt;/a&gt; and &lt;a href="http://www.flumotion.com"&gt;Flumotion&lt;/a&gt;, companies that started out of the great &lt;a href="http://www.gstreamer.net/"&gt;GStreamer&lt;/a&gt; open source project (which, by the way, is closely related to our own &lt;a href="http://clam-project.org"&gt;CLAM&lt;/a&gt; project).&lt;br /&gt;&lt;br /&gt;I will finish by mentioning a couple of the startups from serial entrepreneur (and friend) Otto Wust. After founding &lt;a href="http://sclipo.com/"&gt;Sclipo&lt;/a&gt; and getting it to win the European 2.0 startup in 2007 and making it to the &lt;a href="http://sclipo.com/blog/?p=57"&gt;RedHerring top 100&lt;/a&gt; list in 2008, Otto is now focusing on a new venture. &lt;a href="http://nicepeopleatwork.com/"&gt;NicepeopleAtWork&lt;/a&gt; is offering a complete video production suite for publishing web videos. Although the company is very young, they already have interesting results and important clients.&lt;br /&gt;&lt;br /&gt;OK, I know I am missing many more, so I might do a Part II of this post. 
In any case, let your voice be heard through the comments.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-5881958961810240119?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/5881958961810240119/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=5881958961810240119' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5881958961810240119'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5881958961810240119'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/02/barcelona-startup-scene.html' title='The Barcelona Startup Scene'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-5556844302828394043</id><published>2009-02-15T09:54:00.001-08:00</published><updated>2009-02-15T13:39:32.144-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='publications'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='peer review'/><title type='text'>Something is moving in Scientific Peer Reviewing</title><content type='html'>&lt;pre style="font-family: arial;" wrap=""&gt;During these past few months I have had pretty bad experiences with peer reviews at conferences: unfair reviews, others that were plain 
wrong,&lt;br /&gt;papers considered off-topic because I did not respect what would be expected by the "community", poor or non-existent intervention of the track or PC-chair... I guess this will all sound familiar if you are in the research business. Being on the other side (the reviewer's) does not help to see things much better.&lt;br /&gt;&lt;br /&gt;There are many people now thinking that the system we have nowadays is flawed in many ways and probably things are going to change (hopefully for the better) soon.&lt;br /&gt;&lt;br /&gt;Along these lines, Michael Nielsen had an interesting &lt;a href="http://michaelnielsen.org/blog/?p=531"&gt;post in his blog&lt;/a&gt;. He talks about the three myths of scientific peer review: (1) scientists have always used peer review; (2) peer review is reliable; and (3) peer review is the way we determine what’s right and wrong in science.&lt;br /&gt;&lt;br /&gt;Jon Crowcroft, S. Keshav, and Nick McKeown also wrote &lt;a href="http://portal.acm.org/citation.cfm?id=1435417.1435430&amp;amp;coll=ACM&amp;amp;dl=ACM&amp;amp;idx=J79&amp;amp;part=magazine&amp;amp;WantType=Magazines&amp;amp;title=Communications%20of%20the%20ACM&amp;amp;CFID=21006648&amp;amp;CFTOKEN=93779101"&gt;an interesting article&lt;/a&gt; in a recent issue of Communications of the ACM. The article, entitled "Scaling the academic publication process to internet scale", proposes a way to use Web 2.0 paradigms, and in particular crowdsourcing, as a way to overcome the flaws of the current system. The goals of their proposed process are stated as: (A1) Authors should not submit poor papers; (A2) Authors should become reviewers; (R1) Reviewers should submit well-substantiated reviews; (R2) Reviewers should not favor their friends; and (R3) Reviewers should not denigrate competing papers. 
Although many things are still left out of their analysis, it does seem like an interesting and promising step forward.&lt;br /&gt;&lt;br /&gt;It is really interesting that just as I was reading this article I found out about Google's &lt;a href="http://code.google.com/p/gpeerreview/"&gt;gPeerreview&lt;/a&gt;. According to Google, they "intend (...) to do for scientific publishing what the world wide web has done for media publishing".&lt;br /&gt;&lt;br /&gt;Again, it is clear many of us think things are not working the way they are. Therefore I can only applaud any initiative that brings us closer to a more fair and sustainable system.&lt;br /&gt;&lt;a class="moz-txt-link-freetext" href="http://code.google.com/p/gpeerreview/"&gt;&lt;/a&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-5556844302828394043?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/5556844302828394043/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=5556844302828394043' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5556844302828394043'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5556844302828394043'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/02/something-is-moving-in-scientific-peer.html' title='Something is moving in Scientific Peer Reviewing'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-3837426188924443258</id><published>2009-02-15T08:38:00.001-08:00</published><updated>2009-02-15T09:13:06.260-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='wsdm'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='wsdm09'/><category scheme='http://www.blogger.com/atom/ns#' term='search'/><category scheme='http://www.blogger.com/atom/ns#' term='web'/><title type='text'>WSDM 2009</title><content type='html'>Last week I attended the Web Search and Data Mining (&lt;a href="http://www.wsdm2009.org/"&gt;WSDM '09&lt;/a&gt;) conference in Barcelona. The conference was organized by Ricardo Baeza-Yates and the people at &lt;a href="http://research.yahoo.com/Yahoo_Research_Barcelona"&gt;Yahoo Research Barcelona&lt;/a&gt;, and it is always nice to attend a top research conference without having to take a plane.&lt;br /&gt;&lt;br /&gt;If you ever wondered what researchers at the big internet players do, this is your conference. As a matter of fact, 66% of the accepted papers had an author from Google, Microsoft or Yahoo. Given that the acceptance rate was pretty low (16%) this might come as a surprise. However, it is not so surprising if you think about how much research in the area of Web Search these companies do (probably more than 66% of the researchers in the area work at these three companies, especially if you count research internships).&lt;br /&gt;&lt;br /&gt;If you look at the &lt;a href="http://www.wsdm2009.org/program.php"&gt;program&lt;/a&gt;, you can see that the hottest topics seem to be, arguably: personalization, tagging, and link and click analysis. The best paper went to Fernando Diaz (Yahoo Labs Montreal) for interesting work on integrating news content into search results. 
&lt;a href="http://ciir.cs.umass.edu/%7Efdiaz/fdiaz-wsdm2009.pdf"&gt;Here&lt;/a&gt;, you can read his paper (btw, all papers will be available for free download, it seems).&lt;br /&gt;&lt;br /&gt;The highlights of the conference, however, were for me two of the keynotes. First, Google's Jeff Dean, of &lt;a href="http://labs.google.com/papers/mapreduce.html"&gt;Map Reduce&lt;/a&gt; and &lt;a href="http://labs.google.com/papers/bigtable.html"&gt;Big Table&lt;/a&gt; fame, gave an amazing talk about the evolution of Google's systems and architecture in response to ever growing demand. After his talk I was wondering what percentage of Google's success could be directly linked to their systems development strategy (versus algorithms, interface, and business model).&lt;br /&gt;&lt;br /&gt;On the last day there was a keynote shared with the co-located &lt;a href="http://waw2009.ewi.utwente.nl/"&gt;WAW&lt;/a&gt; conference. Ravi Kumar, from Yahoo, gave an &lt;a href="http://www.wsdm2009.org/kumar_abs_bio.php"&gt;interesting overview&lt;/a&gt; of his research in Social Networks. I enjoyed his talk. However, I was a bit annoyed by how lightly the idea of "causality" is used in some of these analyses. "If you have an obese friend you have 25% more chances of being obese", said Kumar, referring to an earlier work by sociologists. This confusion between co-occurrence (or homophily) and causality is a bit scary, especially coming from someone like him who has actually worked on the issue. His attempt to model causality was later described. Using the so-called random shuffling test, he interpreted temporal precedence as causality. Yet another debatable interpretation.&lt;br /&gt;&lt;br /&gt;Finally, I have to say that I felt most talks were rather on the "boring" side. This is probably nothing to blame WSDM for. It is rather my feeling that talk-based conferences are challenging my short and volatile attention span, especially if I have a laptop on me. 
I will post more on this soon.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-3837426188924443258?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/3837426188924443258/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=3837426188924443258' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3837426188924443258'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3837426188924443258'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/02/wsdm-2009.html' title='WSDM 2009'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-5646576381969122182</id><published>2009-02-15T08:12:00.000-08:00</published><updated>2009-02-15T08:37:53.392-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='allosphere'/><category scheme='http://www.blogger.com/atom/ns#' term='ted talks'/><title type='text'>Allosphere@TED</title><content type='html'>The &lt;a href="http://www.allosphere.ucsb.edu/"&gt;Allosphere&lt;/a&gt;, the project I was coordinating at UCSB, was presented last week at the &lt;a href="http://www.ted.com/"&gt;TED&lt;/a&gt; talks. In case you are not aware of them, the TED talks are worth knowing about. 
TED stands for Technology, Entertainment and Design but, since it started in 1984, its scope has become even broader. In short, TED talks aim to bring together visionary people to discuss the future. Talks cover many different topics, and every year there are a few that make headlines. This year, for instance, &lt;a href="http://www.ted.com/talks/bill_gates_unplugged.html"&gt;Bill Gates threw some mosquitoes&lt;/a&gt; at the audience to make his point on Malaria in the third world.&lt;br /&gt;&lt;a href="http://www.flickr.com/photos/tedconference/3256513545//"&gt;&lt;br /&gt;JoAnn Kuchera-Morin&lt;/a&gt;, director of the Allosphere, was able to give a short 3-minute talk about the project. And, although 3 minutes may seem a really short time, doing this in such a setting has really brought a lot of buzz around the Allosphere in the media.&lt;br /&gt;&lt;br /&gt;Some interesting posts related to the Allosphere at TED:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;At the &lt;a href="http://blog.wired.com/business/2009/02/ted-immersion-i.html"&gt;Wired blog&lt;/a&gt;&lt;/li&gt;&lt;li&gt;The Allosphere at &lt;a href="http://http://news.cnet.com/8301-11386_3-10158063-76.html"&gt;CNET&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://greyatted.blogspot.com/2009/02/session-4-intestitial-joann-kuchera.html"&gt;GreyNYC&lt;/a&gt; talking about the Allosphere&lt;/li&gt;&lt;li&gt;Post about the Allosphere at the &lt;a href="http://artofscience.wordpress.com/2009/02/10/week-of-ted-allosphere/"&gt;Art of Science&lt;/a&gt;&lt;/li&gt;&lt;li&gt;The Allosphere mentioned at the &lt;a href="http://blogs.harvardbusiness.org/now-new-next/2009/02/ted-diary-do-good-feel-good.html"&gt;Harvard Business Blog&lt;/a&gt;&lt;/li&gt;&lt;li&gt;A post in &lt;a href="http://inmycopiousfreetime.typepad.com/in_my_copious_free_time/2009/02/ted2009-day-2-session-1-see.html"&gt;"In my copious free time"&lt;/a&gt;&lt;/li&gt;&lt;li&gt;An interview with JoAnn before the TED talk at &lt;a 
href="http://www.nowpublic.com/tech-biz/allosphere-professor-joeann-part-1-2-university-ca-santa-barbara-21jan08-kodak-z1012"&gt;NowPublic&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Even  a post in Spanish by &lt;a href="http://www.diegoleal.org/social/blog/blogs/index.php/EduTIC"&gt;Diego Leal&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt; Hopefully all this buzz will help the project get the funding necessary to finalize such a visionary and amazing idea.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-5646576381969122182?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/5646576381969122182/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=5646576381969122182' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5646576381969122182'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5646576381969122182'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2009/02/allosphereted.html' title='Allosphere@TED'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-3508135293671864374</id><published>2008-11-16T15:07:00.000-08:00</published><updated>2008-11-16T15:39:52.092-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='telefonica r+d'/><category scheme='http://www.blogger.com/atom/ns#' 
term='events'/><category scheme='http://www.blogger.com/atom/ns#' term='open research day'/><title type='text'>Open Research Day</title><content type='html'>A couple of weeks ago we had our Telefonica Open Research Day. Finally I found some time to blog about it.&lt;br /&gt;&lt;br /&gt;It was a half-day event in which we had both talks and demos showcasing the latest developments in Telefonica's Scientific teams.&lt;br /&gt;&lt;br /&gt;In the talks we had several invited speakers: &lt;a href="http://personals.ac.upc.edu/mateo/" title="view  Mateo Valero's biography"&gt;Mateo Valero&lt;/a&gt;, Head of the Computer Architecture Department at &lt;a href="http://www.upc.edu/"&gt;UPC&lt;/a&gt; and Director of the &lt;a href="http://www.bsc.es/"&gt;Barcelona Supercomputing Center&lt;/a&gt;, gave a talk on the "Future of Supercomputers"; Sandeep K. Singhal, Product Manager of the Windows Network Team, gave a talk on the "Challenges of Networking in the 21st Century"; &lt;a href="http://www.mit.edu/%7Efca/index.htm" title="view  Federico Casalegno's biography"&gt;Federico Casalegno&lt;/a&gt;, head of the Design Lab at the Massachusetts Institute of Technology, talked about their projects related to social mobile and information sharing; and &lt;a href="http://www.it.uc3m.es/azcorra/" title="view  Arturo Azcorra's biography"&gt;Arturo Azcorra&lt;/a&gt;, from Universidad Carlos III and &lt;a href="http://www.imdea.org/Institutos/Networks/tabid/781/Default.aspx"&gt;IMDEA Networks&lt;/a&gt;, talked about Internet 2.&lt;br /&gt;&lt;br /&gt;Then we had our Multimedia Scientific Director, &lt;a href="http://www.nuriaoliver.com/"&gt;Nuria Oliver&lt;/a&gt;, talk about the challenges related to the explosion of content, seamless connectivity, and decreasing attention span from users. 
And &lt;a href="http://www.rodriguezrodriguez.com/"&gt;Pablo Rodriguez&lt;/a&gt;, our Internet Scientific Director, talked about using the network as a FedEx service to ship bulk data from one point on the globe to another.&lt;br /&gt;&lt;br /&gt;I took part in a panel entitled "Search, Recommendations, and Personalization: Text and Beyond" where we also had Hugo Zaragoza from Yahoo Research, Xavier Serra from the Music Technology Group at UPF, Ferran Marques from UPC, Marc Torrens from Strands, and Alejandro Jaimes, head of the Telefonica Scientific group on Datamining and User Profiling. Xavier Serra and Ferran talked about ways to bridge the semantic gap in multimedia - in the case of music and images, respectively. Hugo talked about the challenges in Search and Marc about the power of Recommendation. Alejandro talked about how culture should be taken into account in applications and algorithms that deal with people. I gave a talk on how Recommender Systems can become an alternative to Search engines (you can find my slides &lt;a href="http://investigacion.tid.es/opendaybcn2008/documentos/presentaciones/Xavier%20Amatriain_ORD_Panel.pdf"&gt;here&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;You can find all the slides for the talks &lt;a href="http://investigacion.tid.es/opendaybcn2008/?kat=4"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Then we had a bunch of interesting demos from many of our research projects. Josep M. 
Pujol and his Social Search Engine prototype; Joachim Newman and his project analyzing &lt;a href="http://www.bicing.com/"&gt;Bicing&lt;/a&gt; users' behavior; Xavier Anguera presenting our project on multimodal interfaces for picture browsing on the cell phone; the This or That project on social interaction over the cellphone and Facebook for shopping, in conjunction with MIT; Xiaoyuan Yang's Kangaroo P2P solution for video broadcasting; several projects on wireless, network...&lt;br /&gt;&lt;br /&gt;Overall, it was quite a successful event, with over 120 people from different backgrounds (universities, industry...) completing a full house.&lt;br /&gt;&lt;br /&gt;Let me know if you need further information on any project or you would like to be included in next year's list of guests.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-3508135293671864374?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/3508135293671864374/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=3508135293671864374' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3508135293671864374'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3508135293671864374'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/11/open-research-day.html' title='Open Research Day'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-938394467392612085</id><published>2008-11-05T15:08:00.000-08:00</published><updated>2008-11-05T16:04:26.739-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='acm multimedia conference'/><category scheme='http://www.blogger.com/atom/ns#' term='vancouver'/><title type='text'>ACM Multimedia 08</title><content type='html'>My next stop was in Vancouver for the &lt;a href="http://www.mcrlab.uottawa.ca/acmmm2008/"&gt;ACM Multimedia conference&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Vancouver is an awesome city and I was fortunate to have two friends (Alberto and Juan) living there. So I was able to attend the conference, have fun, and eat wonderful sushi, all at the same time.&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/SRIr-UWJLiI/AAAAAAAAADw/Dq_LBfnq3J8/s1600-h/PH_44.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 209px; height: 164px;" src="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/SRIr-UWJLiI/AAAAAAAAADw/Dq_LBfnq3J8/s320/PH_44.jpg" alt="" id="BLOGGER_PHOTO_ID_5265319263963000354" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The first two days I was invited to the ACM SIGMM retreat. Around 35 top researchers from around the world gathered to discuss the future of Multimedia. We discussed issues related to research, education, and industry relations. Overall this was probably the best part of the conference as I got to meet and talk at length with very interesting people. Socializing in such a setting was much easier than in the typical, and overcrowded, coffee breaks. 
(In the picture: Nicolas Georganas and Wolfgang Effelsberg, chairs of one of the breakout sessions.)&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/SRIxHc_MSiI/AAAAAAAAAD4/vCkRsIl5sAo/s1600-h/PH_42.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 200px; height: 150px;" src="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/SRIxHc_MSiI/AAAAAAAAAD4/vCkRsIl5sAo/s200/PH_42.jpg" alt="" id="BLOGGER_PHOTO_ID_5265324918459615778" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Outside of the conference, I also got to meet &lt;a href="http://www.tbray.org/ongoing/"&gt;Tim Bray&lt;/a&gt;, currently director of Web Technologies at SUN and well-known for being one of the fathers of technologies such as XML and RDF. I admire him for this and for his views on Agile development and Open Source. I was delighted to hear from him that he is completely in favor of REST and does not believe in the Semantic Web.&lt;br /&gt;&lt;br /&gt;The conference itself was pretty good although I have to admit that I got much more out of demos and posters than out of regular paper presentations. To be honest, I have come to think that most presentations in conferences are a waste of time. Unless you have a good presenter (and that happens around 10% of the time) you are better off reading the article. Demos and posters, however, are different as they offer a one-to-one interaction with authors.&lt;br /&gt;&lt;br /&gt;The Open Source prize was sponsored by us (Telefonica Research) and the winner was the &lt;a href="http://www.networkmultimedia.org/"&gt;Network-Integrated Multimedia Middleware&lt;/a&gt;, a pretty amazing piece of software for addressing multimedia devices over the network. 
It is great to see such awesome projects getting the prize we won in 2006 for &lt;a href="http://clam.iua.upf.edu/"&gt;CLAM&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-938394467392612085?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/938394467392612085/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=938394467392612085' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/938394467392612085'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/938394467392612085'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/11/acm-multimedia-08.html' title='ACM Multimedia 08'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_xAtUP4Gu6Zk/SRIr-UWJLiI/AAAAAAAAADw/Dq_LBfnq3J8/s72-c/PH_44.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-7397680966315212958</id><published>2008-11-05T14:40:00.000-08:00</published><updated>2008-11-05T14:59:31.813-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='strands'/><category scheme='http://www.blogger.com/atom/ns#' term='recsys'/><category scheme='http://www.blogger.com/atom/ns#' term='netflix prize'/><category 
scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Recsys 08</title><content type='html'>Last week I attended the &lt;a href="http://recsys.acm.org/"&gt;2nd ACM Conference on Recommender Systems&lt;/a&gt; in Lausanne. Despite being only in its second edition, the Recsys conference is already showing signs of becoming a top tier conference very soon. 121 submissions (a 100% increase over the first edition) and a 31% acceptance rate (that is including short papers and posters) are indeed very promising figures. Of course, this comes with an overall increase in the quality of the &lt;a href="http://recsys.acm.org/program.html"&gt;accepted papers&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Another highlight of the conference was the set of three very interesting tutorials on the first day. Robin Burke talked about robustness in Recommender Systems. Yehuda Koren, of &lt;a href="http://www.netflixprize.com/"&gt;Netflix Prize&lt;/a&gt; world fame, gave an interesting tutorial on the approach they are using for staying at the top of the Netflix Challenge &lt;a href="http://www.netflixprize.com/leaderboard"&gt;leader board&lt;/a&gt;. Yehuda has now left AT&amp;amp;T and joined Yahoo Research in Israel so it is unclear how much he will continue working on the prize from now on. Finally, Gediminas Adomavicius talked about context-aware Recommender Systems in another very interesting tutorial.&lt;br /&gt;&lt;br /&gt;One of the interesting differences in Recsys in relation to other conferences in related fields is the high percentage of industry participants (around 50%). The guys from &lt;a href="http://www.strands.com"&gt;Strands&lt;/a&gt; have been doing an amazing job of making sure this does not become a purely academic conference. This year they even offered a $100,000 prize for the best start-up idea related to Recommender Systems. 
The winners, also co-leaders of the Netflix prize, presented an IPTV recommender, which reminded me a lot of some of the work we are doing in Telefonica R&amp;amp;D. You can read more about the prize in Strands' &lt;a href="http://blog.strands.com/2008/10/24/gravity-winner-strands-100k-call/"&gt;blog&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;At the end of the conference we made a bid to bring Recsys to Barcelona in 2010 (next year is in New York). If we get selected I will be co-chairing the conference with Francisco Martin, Strands' CEO. Stay tuned for more info on how this develops.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-7397680966315212958?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/7397680966315212958/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=7397680966315212958' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/7397680966315212958'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/7397680966315212958'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/11/recsys-08.html' title='Recsys 08'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-2988213918246634356</id><published>2008-10-08T05:18:00.000-07:00</published><updated>2008-10-08T05:27:23.227-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='teaching'/><category scheme='http://www.blogger.com/atom/ns#' term='software engineering'/><title type='text'>Back to teaching</title><content type='html'>A couple of weeks ago the course started again. After some minor seminars last year I have gone back to "real" teaching at the &lt;a href="http://www.upf.edu"&gt;UPF&lt;/a&gt; university this year. I will teach some classes in the Information Retrieval class and I am responsible for the Software Engineering course for 3rd-year CS students.&lt;br /&gt;&lt;br /&gt;I used to teach this course before going to the US and going back to it is really enjoyable. The syllabus is divided into two parts: in the first one we cover issues related to the Software Process, Methodologies, and Requirements Engineering; in the second one we focus on advanced object-oriented analysis and design.&lt;br /&gt;&lt;br /&gt;On the practical side, students do a complete life cycle simulation, starting with requirements engineering but quickly going into iterative development, including a big focus on test-driven development and patterns.&lt;br /&gt;&lt;br /&gt;This year I have pushed some more content on the "methodologies" side. 
We focus quite a bit on Agile methodologies (especially Scrum and XP) but also study things like RUP or CMMI.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://xavier.amatriain.net/es1"&gt;Here&lt;/a&gt; is the course website with lots of materials (albeit all of them in Catalan)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-2988213918246634356?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/2988213918246634356/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=2988213918246634356' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2988213918246634356'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2988213918246634356'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/10/back-to-teaching.html' title='Back to teaching'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-3317177751737315965</id><published>2008-10-07T16:13:00.000-07:00</published><updated>2008-10-07T16:20:18.831-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='job'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Postdoc position in Recommender Systems</title><content type='html'>&lt;span style="font-size:100%;"&gt;&lt;span 
style="font-family:georgia;"&gt;Looking for people to work with, maybe you are interested...&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-family:georgia;"&gt;The research group on Multimedia in Telefonica Research* Barcelona invites applications for a Postdoc position in the area of Recommender Systems.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;We are looking for dynamic, creative, and resourceful individuals to join our research efforts in designing the next generation of Recommender Systems and Algorithms. Our research impacts all areas of the company, including projects related to IPTV, web, mobile, or internet content distribution. The successful candidate will join a multi-disciplinary team of scientists dedicated to advancing and using computational methods to solve challenging user-oriented problems.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;The applicant should have a Ph.D. degree in Computer Science, Applied Mathematics, Statistics, or other related scientific disciplines, combined with strong computational modeling and/or algorithmic skills. Knowledge and experience in additional areas such as statistical data&lt;/span&gt;&lt;span style="font-family:georgia;"&gt; analysis, data mining, machine learning and pattern recognition, and other topics in artificial intelligence are desirable.  Experience in the Recommender Systems field will be taken into account but it is not strictly necessary.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Although this particular position is designed for a postdoctoral candidate, the group is also actively seeking doctoral students in this area who might be interested in finishing their Thesis in Telefonica Research or in doing a research internship. 
We will also take into consideration strong candidates who are looking for a research position or senior research position. If you are in either of these situations please do not hesitate to apply.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;We offer competitive salary and benefits and a great working atmosphere in beautiful Barcelona (Spain).&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Screening of applications will begin immediately and continue until the position is filled. An initial appointment for a one year term is anticipated with the possibility of reappointment.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Inquiries and applications should be sent to&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Xavier Amatriain &lt;/span&gt;&lt;a style="font-family: georgia;" class="moz-txt-link-rfc2396E" href="mailto:xar@tid.es"&gt;xar@tid.es&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;cc Nuria Oliver &lt;/span&gt;&lt;a style="font-family: georgia;" class="moz-txt-link-rfc2396E" href="mailto:nuriao@tid.es"&gt;nuriao@tid.es&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;with the subject line "RS Application"&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-3317177751737315965?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/3317177751737315965/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=3317177751737315965' title='3 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/17171206/posts/default/3317177751737315965'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3317177751737315965'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/10/postdoc-position-in-recommender-systems.html' title='Postdoc position in Recommender Systems'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-2214279096883861902</id><published>2008-09-23T03:45:00.000-07:00</published><updated>2008-09-23T03:59:31.786-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Latest research on Recommender Systems</title><content type='html'>In the past months I have focused my activity on Research in Recommender Systems. Although I have worked on many more projects, so far we have 3 publications ready and undergoing the submission process. I cannot give any details until they are accepted, but I thought it would be interesting to post what they are about, just to give you a feeling for the kind of work we are doing.&lt;br /&gt;&lt;br /&gt;In "&lt;span style="font-weight: bold;"&gt;The Wisdom of the Few: Using Experts to Predict Ratings from the Crowds&lt;/span&gt;" - written with Neal Lathia (&lt;a href="http://www.ucl.ac.uk/"&gt;UCL&lt;/a&gt;) and Haewoon Kwak (&lt;a href="http://www.kaist.edu/edu.html"&gt;Kaist&lt;/a&gt;) - we devise a variation over traditional collaborative filtering in which neighbors are sought from a database of "experts". 
We try to predict user opinions by simply using a very reduced number of expert opinions.&lt;br /&gt;&lt;br /&gt;In "&lt;span style="font-weight: bold;"&gt;Collaborative Filtering With Adaptive Information Sources&lt;/span&gt;" - again with Neal Lathia but also with Josep M. Pujol from Telefonica Research - we study how the problem of collaborative filtering can be turned into a problem of data classification. Instead of using an ensemble of algorithms we propose to use an ensemble of data sources.&lt;br /&gt;&lt;br /&gt;Finally in "&lt;span style="font-weight: bold;"&gt;I like it... I like it not: Evaluating User Ratings Noise in Recommender Systems&lt;/span&gt;" - with Josep M. Pujol and Nuria Oliver from Telefonica Research - we evaluate users' natural variability when giving feedback through ratings. In our study we measure different effects and discuss how this might be related to the "magic barrier" in Recommender Systems.&lt;br /&gt;&lt;br /&gt;More details soon... if they are accepted :-)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-2214279096883861902?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/2214279096883861902/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=2214279096883861902' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2214279096883861902'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2214279096883861902'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/09/latest-research-on-recommender-systems.html' title='Latest research on Recommender Systems'/><author><name>Xavier 
Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-2768940536170211111</id><published>2008-09-22T15:40:00.000-07:00</published><updated>2008-09-23T03:45:14.439-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='techtransfer'/><category scheme='http://www.blogger.com/atom/ns#' term='models'/><title type='text'>Models of R&amp;D</title><content type='html'>In a recent presentation I had to explain what the different models of industrial R&amp;amp;D are in IT. My goal was to explain the basics in a few lines, so I came up with the idea of presenting R&amp;amp;D organizational models in terms of 3 simple and distinct proto-models.&lt;br /&gt;&lt;br /&gt;I call them the &lt;span style="font-style: italic;"&gt;Microsoft&lt;/span&gt; model, the &lt;span style="font-style: italic;"&gt;IBM&lt;/span&gt; model, and the &lt;span style="font-style: italic;"&gt;Google&lt;/span&gt; model. However, before presenting them I need to stress a few disclaimers:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Overall, what I am presenting here is a gross over-simplification. Reality is much more complex than this. However, I think these models sample the solution space pretty well. I think that any other R&amp;amp;D model can be understood as a variation or combination of these 3.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I use company names for the models because, from what I know, they are pretty good representations of what goes on in these companies on average. 
But again, things are more complex than this: in these very large companies there is room for everything, and you can surely find counter-examples where everything works differently.&lt;/li&gt;&lt;li&gt;As you can read for instance &lt;a href="http://www.computerworld.com/action/article.do?command=viewArticleBasic&amp;amp;articleId=9108098"&gt;here&lt;/a&gt;, large companies are in fact transitioning, so describing their model is indeed a moving target.&lt;/li&gt;&lt;/ol&gt;&lt;span style="font-size:130%;"&gt;The &lt;span style="font-style: italic;"&gt;Microsoft&lt;/span&gt; model&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.google.es/url?sa=t&amp;amp;source=web&amp;amp;ct=res&amp;amp;cd=1&amp;amp;url=http%3A%2F%2Fresearch.microsoft.com%2F&amp;amp;ei=XCTYSNfxLqby0QS4uvmXDQ&amp;amp;usg=AFQjCNEuMOmMwPgltZgdL1Utu0CIJ2srkg&amp;amp;sig2=ieYaAJBnDncM18cV06pUwA"&gt;Microsoft Research&lt;/a&gt; is a separate company from Microsoft&lt;/li&gt;&lt;li&gt;MS Research is entirely devoted to research&lt;/li&gt;&lt;li&gt;There are many more researchers than developers&lt;/li&gt;&lt;li&gt;There is an in-house techtransfer team&lt;/li&gt;&lt;li&gt;Techtransfer model&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Projects always start on the Research side&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The in-house techtransfer team picks up interesting research projects&lt;/li&gt;&lt;li&gt;It builds a prototype and convinces product/marketing teams&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/SNgfUEWcWsI/AAAAAAAAACQ/KklD_AmhoU8/s1600-h/MicrosoftModel.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 367px; height: 127px;" src="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/SNgfUEWcWsI/AAAAAAAAACQ/KklD_AmhoU8/s320/MicrosoftModel.png" alt="" id="BLOGGER_PHOTO_ID_5248979795326032578" border="0" /&gt;&lt;/a&gt;&lt;span style="font-size:130%;"&gt;The 
&lt;span style="font-style: italic;"&gt;IBM&lt;/span&gt; model&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.research.ibm.com/"&gt;IBM Research&lt;/a&gt; is also separate from Development (a la MS)&lt;/li&gt;&lt;li&gt;But, when a Research project is successful&lt;/li&gt;&lt;ul&gt;&lt;li&gt;The researchers are moved into the D team to help develop/coordinate&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Strong incentives for patents, not so many for publications&lt;/li&gt;&lt;li&gt;Constant feedback from Development to Research on what they need&lt;/li&gt;&lt;/ul&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/SNgfodXT8TI/AAAAAAAAACY/F4n9HRbi214/s1600-h/IBMModel.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 365px; height: 176px;" src="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/SNgfodXT8TI/AAAAAAAAACY/F4n9HRbi214/s320/IBMModel.png" alt="" id="BLOGGER_PHOTO_ID_5248980145637945650" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;The &lt;span style="font-style: italic;"&gt;Google&lt;/span&gt; model&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Google has many PhDs/Researchers&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Many doing advanced development&lt;/li&gt;&lt;li&gt;Others doing applied Research&lt;/li&gt;&lt;li&gt;Very little incentive for publication/patent&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Idea: research “embedded” into everyday product development&lt;/li&gt;&lt;li&gt;However: &lt;a href="http://research.google.com/"&gt;small research groups&lt;/a&gt; are being created with some particular focus&lt;/li&gt;&lt;/ul&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_xAtUP4Gu6Zk/SNgf41oS3aI/AAAAAAAAACg/uF9oPm00fTo/s1600-h/GoogleModel.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 273px; height: 
188px;" src="http://4.bp.blogspot.com/_xAtUP4Gu6Zk/SNgf41oS3aI/AAAAAAAAACg/uF9oPm00fTo/s320/GoogleModel.png" alt="" id="BLOGGER_PHOTO_ID_5248980427029536162" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;Results&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;It is hard to say which one (if any) is the best model. However, some of the results, again oversimplifying, are clear:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The Microsoft model results in low techtransfer and little innovation coming from research into the products. However, they have high publication rates and happy researchers.&lt;/li&gt;&lt;li&gt;The IBM model produces moderate techtransfer and research-related innovation. They do have a high patent rate, but that has more to do with the quality of lawyers than the quality of research.&lt;/li&gt;&lt;li&gt;Google has a high level of techtransfer activity (everything is techtransfer). However, they have little publication activity, so some might say they are not really doing research.&lt;/li&gt;&lt;/ul&gt;Again, this is a very much simplified picture of reality. 
But I would like to hear some feedback/comments on what you think.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-2768940536170211111?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/2768940536170211111/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=2768940536170211111' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2768940536170211111'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2768940536170211111'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/09/models-of-r.html' title='Models of R&amp;D'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_xAtUP4Gu6Zk/SNgfUEWcWsI/AAAAAAAAACQ/KklD_AmhoU8/s72-c/MicrosoftModel.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-1731591152667613654</id><published>2008-08-07T16:20:00.000-07:00</published><updated>2008-08-07T09:51:59.390-07:00</updated><title type='text'>Agile Research</title><content type='html'>The 4 main features of any agile methodology are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Iterative development&lt;/li&gt;&lt;li&gt;Focus on executable software&lt;/li&gt;&lt;li&gt;Adaptation to change&lt;/li&gt;&lt;li&gt;Focus on 
individuals&lt;/li&gt;&lt;/ul&gt;But what does all this have to do with Scientific research? My hypothesis, which I presented in a talk recently, is that research is inherently agile. Research teams can and should indeed apply many of the agile principles in their day-to-day work.&lt;br /&gt;&lt;br /&gt;If we take a look at the traditional Scientific Method process (see figure below) we can see that it shares many features with agile methodologies. Above all, the scientific method is inherently iterative. In every iteration we set a hypothesis, test it, and analyze the results.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_xAtUP4Gu6Zk/SIz2ux6z7QI/AAAAAAAAACI/SESYq7wHPao/s1600-h/overview_scientific_method2.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 253px; height: 242px;" src="http://1.bp.blogspot.com/_xAtUP4Gu6Zk/SIz2ux6z7QI/AAAAAAAAACI/SESYq7wHPao/s200/overview_scientific_method2.gif" alt="" id="BLOGGER_PHOTO_ID_5227824551004728578" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;In this sense, it resembles most Agile process definitions such as Scrum or XP: the first part of the process deals with setting up the general questions. This is similar to Scrum's sprint 0, used to define the product backlog, or to XP's Planning Game. Then we start doing iterations, or sprints, in which we ask smaller questions or decide which of the original hypotheses we are going to test.&lt;br /&gt;&lt;br /&gt;Finally, it is interesting to note that the whole process is hypothesis-driven. This is very similar to the Test-driven development practice promoted by XP'ers. 
And there are many other practices (such as Pair programming or Simple Design, equivalent to Occam's Razor) that map directly between agile software development and Scientific Research.&lt;br /&gt;&lt;br /&gt;Based on this I propose an agile scientific method with the following steps:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;Use an Iteration 0 (or XP planning game or Scrum Sprint 0) to build up general hypotheses and list those "stories" that you would like to have in your final article (optionally writing the article stub already).&lt;/li&gt;&lt;li&gt;Every 1-2 weeks come up with a list of prioritized tasks&lt;br /&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;List all possible tasks (ideally 1-2 day workload)&lt;/li&gt;&lt;li&gt;Measure interest of each task towards the final goal&lt;/li&gt;&lt;li&gt;Measure cost in terms of predicted hours of work&lt;/li&gt;&lt;li&gt;List them in order of priority value (= interest - cost)&lt;/li&gt;&lt;li&gt;In the iteration planning re-evaluate the general hypotheses.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Plan stories as an "executable hypothesis": if hypothesis H holds, then tests t1 and t2 should pass.&lt;/li&gt;&lt;li&gt;Maintain the collection of tests as a record of experiments.&lt;/li&gt;&lt;li&gt;Use the red-green-refactor cycle of Test-driven development to refine the model while still complying with experimental data.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;If you are interested you can read the full &lt;a href="http://xavier.amatriain.net/docs/SeminarAgileScience.pdf"&gt;presentation&lt;/a&gt; that Gemma Hornos and I gave at a recent seminar where she also presented some agile initiatives in Telefonica.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-1731591152667613654?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://technocalifornia.blogspot.com/feeds/1731591152667613654/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=1731591152667613654' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/1731591152667613654'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/1731591152667613654'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/06/agile-research.html' title='Agile Research'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_xAtUP4Gu6Zk/SIz2ux6z7QI/AAAAAAAAACI/SESYq7wHPao/s72-c/overview_scientific_method2.gif' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-8458669799265706581</id><published>2008-07-27T15:46:00.000-07:00</published><updated>2008-07-27T15:52:08.030-07:00</updated><title type='text'>It's so hard to blog...</title><content type='html'>when you get used to tweeting!&lt;br /&gt;&lt;br /&gt;I have around 6 unfinished posts and I find it really hard to find the time to polish them to a bloggable level. Investing that time in finishing a blog post is hard when you know that in the same time you could probably send 10 interesting microposts on Twitter.&lt;br /&gt;&lt;br /&gt;In any case, it's not that I like Twitter that much. I like the idea of microblogging but now Twitter is turning into everything else: post board, public mailing list... 
It is really hard to find interesting Twitter users, and most of them are increasingly using it as a social post board instead (I cannot exclude myself from this category either).&lt;br /&gt;&lt;br /&gt;In any case, I hope to get a couple of these blog posts finished soon.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-8458669799265706581?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/8458669799265706581/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=8458669799265706581' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/8458669799265706581'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/8458669799265706581'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/07/its-so-hard-to-blog.html' title='It&apos;s so hard to blog...'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-8430396338540150505</id><published>2008-06-08T15:17:00.000-07:00</published><updated>2008-06-08T15:46:41.288-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='3D scene'/><category scheme='http://www.blogger.com/atom/ns#' term='standard'/><category scheme='http://www.blogger.com/atom/ns#' term='description language'/><category scheme='http://www.blogger.com/atom/ns#' term='3D 
audio'/><title type='text'>Standards for 3D Scene Description</title><content type='html'>What can you do if you want to describe a 3D scene including audio and graphics using some kind of standard? This is the result of a quick survey we did for &lt;a href="http://clam.iua.upf.edu/"&gt;CLAM&lt;/a&gt;; feedback is welcome!&lt;br /&gt;&lt;br /&gt;There are several standards and open languages to describe 3D graphics, but what if you want to add 3D audio description into the scene? Well, that limits your choices but there are still some possibilities.&lt;br /&gt;&lt;br /&gt;First, you can take a look at MPEG4's &lt;a href="http://www.chiariglione.org/mpeg/technologies/mp04-bifs/index.htm"&gt;BIFS&lt;/a&gt; and &lt;a href="http://sound.media.mit.edu/mpeg4/sa-bifs.html"&gt;AudioBIFS&lt;/a&gt;. BIFS is in fact an extension of &lt;a href="http://en.wikipedia.org/wiki/VRML"&gt;VRML&lt;/a&gt;. MPEG4's SAOL (Structured Audio Orchestra Language) can also be included inside AudioBIFS. Unfortunately I don't know of any Open Source reference implementation of this, although parts of it have been implemented, for instance, in Ross Bencina's &lt;a href="http://www.audiomulch.com/%7Erossb/code/sa/SAQuickref.htm"&gt;Audio Mulch&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.web3d.org/x3d/specifications/"&gt;X3D&lt;/a&gt; is a Web3D Consortium standard that is also a successor of VRML (with the advantage of using a new XML format), in which sound is &lt;a href="http://www.tml.tkk.fi/Opinnot/Tik-111.590/2002s/Paperit/pohja_x3d_sound_OK.pdf"&gt;integrated too&lt;/a&gt;. There are no complete implementations of X3D available, and the existing ones are all written in Java. X3D shares a lot with BIFS. As a matter of fact, the MPEG standard already includes a link to &lt;a href="http://en.wikipedia.org/wiki/X3D"&gt;X3D&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Another possibility is using &lt;a href="http://verse.blender.org/vml/"&gt;VML&lt;/a&gt; from the Verse project. 
This project is very much related to &lt;a href="http://www.blender.org/"&gt;Blender&lt;/a&gt;. Using VML can tie you into their data model. However, Verse is released with a BSD license.&lt;br /&gt;&lt;br /&gt;Finally, SpatDIF is an extension of the SDIF format for sound interchange. It uses OSC for real-time communication and SDIF as the intermediate file format. There are no available implementations yet and it is unclear whether it can be easily extended to include 3D graphics. However, SDIF can be &lt;a href="http://archive.cnmat.berkeley.edu/ICMC99/papers/saol+sdif/icmc99-saol+sdif.pdf"&gt;transcoded into SAOL&lt;/a&gt; and, as already mentioned, SAOL can be included inside MPEG4's BIFS.&lt;br /&gt;&lt;br /&gt;So, in summary, it looks as if there are several competing and complementary efforts but none is sufficiently mature yet. Again, any feedback on related experiences will be appreciated.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-8430396338540150505?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/8430396338540150505/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=8430396338540150505' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/8430396338540150505'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/8430396338540150505'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/06/standards-for-3d-scene-description.html' title='Standards for 3D Scene Description'/><author><name>Xavier 
Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-7578864160144816941</id><published>2008-05-26T22:58:00.000-07:00</published><updated>2008-05-26T14:27:36.792-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='agile'/><category scheme='http://www.blogger.com/atom/ns#' term='scrum'/><category scheme='http://www.blogger.com/atom/ns#' term='extreme Programming'/><category scheme='http://www.blogger.com/atom/ns#' term='company'/><title type='text'>Agile Methodologies in Telefonica R&amp;D</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/SDsiLSBH0PI/AAAAAAAAABc/VExmXp1CwiQ/s1600-h/260px-LetsAgile1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 146px; height: 63px;" src="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/SDsiLSBH0PI/AAAAAAAAABc/VExmXp1CwiQ/s320/260px-LetsAgile1.png" alt="" id="BLOGGER_PHOTO_ID_5204791371567190258" border="0" /&gt;&lt;/a&gt;Some time ago I started having interesting but informal conversations with a manager in Telefonica R&amp;amp;D who was also interested in Software Engineering and methodologies. My point was that in a company like ours where innovation is a key issue and almost all development is geared toward bleeding edge products, Agile methodologies should be used. Besides, development teams are usually small, requirements are not clear at the beginning, and there is a need for having working prototypes soon in the development cycle... 
there was no doubt.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_xAtUP4Gu6Zk/SDsigyBH0QI/AAAAAAAAABk/wlZm9Odcmvg/s1600-h/290px-WidSBDC.PNG"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://4.bp.blogspot.com/_xAtUP4Gu6Zk/SDsigyBH0QI/AAAAAAAAABk/wlZm9Odcmvg/s200/290px-WidSBDC.PNG" alt="" id="BLOGGER_PHOTO_ID_5204791740934377730" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/SDsjISBH0SI/AAAAAAAAAB0/LuZawJDNR7A/s1600-h/CalendarioPlanning.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/SDsjISBH0SI/AAAAAAAAAB0/LuZawJDNR7A/s200/CalendarioPlanning.png" alt="" id="BLOGGER_PHOTO_ID_5204792419539210530" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/SDsitiBH0RI/AAAAAAAAABs/9qpw45-e4ww/s1600-h/400px-Qualia1.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/SDsitiBH0RI/AAAAAAAAABs/9qpw45-e4ww/s200/400px-Qualia1.png" alt="" id="BLOGGER_PHOTO_ID_5204791959977709842" border="0" /&gt;&lt;/a&gt;Before long we managed to convince some key people, including our Methodologies Division (who had had bad experiences in the past with Agile).&lt;br /&gt;&lt;div style="text-align: left;"&gt;Although I was very much in favor of starting with &lt;a href="http://www.extremeprogramming.org/"&gt;eXtreme Programming&lt;/a&gt; and taking a bottom-up approach, it soon became obvious to them that in an organization as large as ours they needed a slightly more structured approach, with more focus on management and less on development. 
That is why we chose to go for &lt;a href="http://www.controlchaos.com/"&gt;Scrum&lt;/a&gt;.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Actually, what we are deploying is a mixture of Scrum for the management layer and xP practices on the development layer. This seems to be a favored approach nowadays in many companies (see &lt;a href="http://www.controlchaos.com/about/xp.php"&gt;here&lt;/a&gt;, &lt;a href="http://xpday3.xpday.org/slides/XPScrumPresentationHandouts.pdf"&gt;here&lt;/a&gt; or &lt;a href="http://www.controlchaos.com/download/Primavera%20White%20Paper.pdf"&gt;here&lt;/a&gt;, for example).&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/SDskDSBH0TI/AAAAAAAAAB8/8LrOVfiYopU/s1600-h/TeamMeetingScrum2-2.JPG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_xAtUP4Gu6Zk/SDskDSBH0TI/AAAAAAAAAB8/8LrOVfiYopU/s200/TeamMeetingScrum2-2.JPG" alt="" id="BLOGGER_PHOTO_ID_5204793433151492402" border="0" /&gt;&lt;/a&gt;Just a couple of months afterwards we have more than 10 projects that are successfully working Agile. And so far everything seems to be positive: developers feel better and even some of our clients are now turning to us for implementing agile methodologies in their companies. It is still a bit soon but things look bright!&lt;br /&gt;&lt;br /&gt;Bottom line, be careful with informal conversations you have by the coffee machine at work... they might become true and end up having a huge impact :-)&lt;br /&gt;&lt;br /&gt;ps. 
I hope I have some time to blog a bit more about how to combine xP and Scrum but I have many posts in the queue right now.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-7578864160144816941?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/7578864160144816941/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=7578864160144816941' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/7578864160144816941'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/7578864160144816941'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/05/agile-methodologies-in-telefonica-r.html' title='Agile Methodologies in Telefonica R&amp;D'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_xAtUP4Gu6Zk/SDsiLSBH0PI/AAAAAAAAABc/VExmXp1CwiQ/s72-c/260px-LetsAgile1.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-4028005556812554891</id><published>2008-05-13T16:59:00.001-07:00</published><updated>2008-05-25T14:51:33.443-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='barcelona'/><category scheme='http://www.blogger.com/atom/ns#' term='web'/><category scheme='http://www.blogger.com/atom/ns#' term='being 
digital'/><title type='text'>Digital Beers</title><content type='html'>One of the coolest things about "being digital" (i.e. having a blog, twittering...) is how much of that digital activity ends up bleeding into your analog world. Every now and then people who know about me from the web get in touch and tell me that they are going to be around Barcelona. This way I have come across really interesting people and projects.&lt;br /&gt;&lt;br /&gt;So, if you are going to be around Barcelona (or any other place that I might be visiting) and want to chat while having a non-digital beer please feel free to get in touch. I am always up for that!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-4028005556812554891?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/4028005556812554891/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=4028005556812554891' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/4028005556812554891'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/4028005556812554891'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/05/digital-beers.html' title='Digital Beers'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-7857894976302121106</id><published>2008-04-20T16:27:00.000-07:00</published><updated>2008-04-20T16:39:39.186-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='course'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>A Course on Recommender Systems</title><content type='html'>Last week I gave an internal course on Recommender Systems in Telefonica. Although I only had 12 hours, I ended up preparing a course that could well extend over at least 24 hours of class. Except for some details, I am pretty happy with the syllabus I came up with. Just in case it might be of any help to anyone, this is the index of the course:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;PART I. Introduction to Recommender Systems&lt;/span&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The paradox of choice&lt;/li&gt;&lt;li&gt;What is a Recommender System? 
&lt;/li&gt;&lt;ol&gt;&lt;li&gt; The recommender problem&lt;/li&gt;&lt;li&gt;General scheme of a RS&lt;/li&gt;&lt;li&gt;Tools of the trade&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;Approaches to Recommendation&lt;/li&gt;&lt;ol&gt;&lt;li&gt;Collaborative Filtering&lt;/li&gt;&lt;ol&gt;&lt;li&gt;User-based&lt;/li&gt;&lt;li&gt;  Item-based&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;  Memory-based&lt;/li&gt;&lt;li&gt; Content-based&lt;/li&gt;&lt;li&gt; Other approaches&lt;/li&gt;&lt;ol&gt;&lt;li&gt;  Demographic Methods&lt;/li&gt;&lt;li&gt;  Utility Methods&lt;/li&gt;&lt;li&gt;  Knowledge-based&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;Hybrid approaches&lt;/li&gt;&lt;/ol&gt;&lt;ol&gt;&lt;ol&gt;&lt;li&gt;   Weighted&lt;/li&gt;&lt;li&gt;  Switching&lt;/li&gt;&lt;li&gt;  Mixed&lt;/li&gt;&lt;li&gt;  Feature Combination&lt;/li&gt;&lt;li&gt;  Cascade&lt;/li&gt;&lt;li&gt;  Feature Augmentation&lt;/li&gt;&lt;/ol&gt;&lt;/ol&gt;&lt;li&gt;Evaluating RS&lt;/li&gt;&lt;li&gt;Personalized Search&lt;/li&gt;&lt;/ol&gt;&lt;span style="font-weight: bold;"&gt;Part II. 
Data mining for RS&lt;/span&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Introduction&lt;/li&gt;&lt;ol&gt;&lt;li&gt;Why mine data?&lt;/li&gt;&lt;li&gt;Data mining tasks&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;Data Preprocessing&lt;/li&gt;&lt;ol&gt;&lt;li&gt;Types of data&lt;/li&gt;&lt;li&gt;Problems with data&lt;/li&gt;&lt;li&gt;Aggregation&lt;/li&gt;&lt;li&gt;Sampling&lt;/li&gt;&lt;li&gt;Reducing dimensionality (SVD)&lt;/li&gt;&lt;li&gt;Feature Selection&lt;/li&gt;&lt;li&gt;Discretization and Binarization&lt;/li&gt;&lt;li&gt;Variable Transformation&lt;/li&gt;&lt;li&gt;Feature Selection&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;Distance Measures&lt;/li&gt;&lt;li&gt;Classification&lt;/li&gt;&lt;ol&gt;&lt;li&gt;General Approach&lt;/li&gt;&lt;li&gt;Decision Trees&lt;/li&gt;&lt;li&gt;Rule-based&lt;/li&gt;&lt;li&gt;Nearest-Neighbor&lt;/li&gt;&lt;li&gt;Bayesian Classifiers&lt;/li&gt;&lt;li&gt;Artificial Neural Networks&lt;/li&gt;&lt;li&gt;Support Vector Machines&lt;/li&gt;&lt;li&gt;Ensembles of classifiers&lt;/li&gt;&lt;li&gt;Issues in classifiers&lt;/li&gt;&lt;ol&gt;&lt;li&gt;Model Overfitting&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;Evaluation of Classifiers&lt;/li&gt;&lt;li&gt;Comparing Classifiers&lt;/li&gt;&lt;li&gt;Metrics for classifiers&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;Cluster Analysis&lt;/li&gt;&lt;ol&gt;&lt;li&gt;Introduction&lt;/li&gt;&lt;li&gt;K-means&lt;/li&gt;&lt;li&gt;DBSCAN&lt;/li&gt;&lt;li&gt;Cluster Validation&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;Association Analysis&lt;/li&gt;&lt;ol&gt;&lt;li&gt;Frequent Itemset Generation and the Apriori Principle&lt;/li&gt;&lt;li&gt;Rule Generation&lt;/li&gt;&lt;/ol&gt;&lt;/ol&gt;&lt;span style="font-weight: bold;"&gt;PART III. 
Designing a RS&lt;/span&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Defining the problem&lt;/li&gt;&lt;li&gt;Working with the data&lt;/li&gt;&lt;li&gt;Taking context into account&lt;/li&gt;&lt;li&gt;The decision process&lt;/li&gt;&lt;li&gt;Presenting results&lt;/li&gt;&lt;li&gt;Some notes on domain-specific adaptation&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-7857894976302121106?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/7857894976302121106/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=7857894976302121106' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/7857894976302121106'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/7857894976302121106'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/04/course-on-recommender-systems.html' title='A Course on Recommender Systems'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-1127841600666062697</id><published>2008-04-01T15:11:00.000-07:00</published><updated>2008-04-01T15:21:24.794-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='search'/><category scheme='http://www.blogger.com/atom/ns#' term='web'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender 
systems'/><title type='text'>The Future of Web Search Workshop</title><content type='html'>&lt;a href="http://research.yahoo.com/Yahoo_Research_Barcelona"&gt;Yahoo! Research Barcelona&lt;/a&gt; and the &lt;a href="http://grupoweb.upf.es/WRG/"&gt;UPF&lt;/a&gt; are organizing a very interesting two-day &lt;a href="http://grupoweb.upf.es/tfws08/"&gt;workshop&lt;/a&gt; in Andorra starting on Thursday. The workshop is the third of a series that, under the same name of "Future of Web Search", started in 2006. I will be giving a talk entitled &lt;span class="_destacado"&gt;"Search and Recommendation: two sides of the same coin?". Below is the abstract that can give you an idea of what I will be talking about:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;     Recently the field of Recommender Systems has gained growing popularity     among the research community with new conferences such as the     ACM Recsys going into its 2nd edition and established conferences     such as SIGKDD or SIGCHI focusing a great deal of attention on this     topic.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;     The Recommendation field started from a different background than     web search, namely Data Mining and HCI versus Information Retrieval.     While the goal of Recommendation Systems is to optimize a     fitness function between content and users by "discovering" hidden     relations in the data, Search Engines focus on "retrieving" pre-existing     data.&lt;br /&gt;&lt;br /&gt;  However, there are clear trends that point to both fields coming closer     together. On the one hand, web search is becoming more and more     personalized, highlighting the need for user profiling and collaborative     filtering. 
On the other hand, it is becoming clear that in many cases     search strategies are essential for the performance of Recommender     Systems.&lt;br /&gt;&lt;br /&gt;  As a result, some claim that search is just a "simpler form of recommendation",     where the fitness function to be optimized is that of a     generic average user (e.g. using algorithms such as PageRank). Obviously,     statements in the opposite direction can also be made.     In this talk we will assume that the audience is familiar with Web     Search systems and therefore we will focus on describing the basic techniques     and current research trends in Recommender Systems, highlighting     where and how they are similar or different.     At the end of the talk, we hope to convey the message that the "Future     of Web Search is in Recommendation", hoping that such a claim will     spark an interesting discussion and debate throughout the workshop. &lt;br /&gt;&lt;/span&gt; &lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;a href="http://grupoweb.upf.es/tfws08/program_full.html"&gt;Here&lt;/a&gt; you can see the detailed workshop program with very interesting speakers including Yahoo's own CDO, &lt;/span&gt;&lt;/span&gt;     Usama Fayyad.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-1127841600666062697?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/1127841600666062697/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=1127841600666062697' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/1127841600666062697'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/17171206/posts/default/1127841600666062697'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/04/future-of-web-search-workshop.html' title='The Future of Web Search Workshop'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-2524837695746332227</id><published>2008-03-23T17:53:00.000-07:00</published><updated>2008-03-23T17:56:38.450-07:00</updated><title type='text'>Call for Students: CLAM in Google Summer of Code 08</title><content type='html'>(Please help distribute)&lt;br /&gt;&lt;br /&gt;We are glad to announce that summer 2008 is also going to be a Summer of Code for &lt;a href="http://code.google.com/soc/2008/clam/about.html"&gt;CLAM &lt;/a&gt;. In other words, CLAM has been accepted as a mentoring organization for the &lt;a href="http://code.google.com/opensource/gsoc/2008/faqs.html"&gt;Google Summer of Code&lt;/a&gt;, a program that offers student developers stipends of 4500 USD to write code for open source projects.&lt;br /&gt;&lt;br /&gt;CLAM (C++ Library for Audio and Music) is a project that aims at developing a full-featured application framework for Audio and Music Applications. It offers a conceptual metamodel as well as many different tools for that particular domain. One of its most relevant features is a visual dataflow-building application that lets you develop rapid prototypes without writing code. 
The project started 7 years ago and, among other highlights, it won the ACM award for Best Open Source Multimedia Software in 2006.&lt;br /&gt;&lt;br /&gt;Now we are looking for smart students who enjoy coding free software so that they can earn some bucks for the summer. Last year, &lt;a href="http://iua-share.upf.edu/wikis/clam/index.php/GSoC_2007"&gt;GSoC 2007&lt;/a&gt; was a very fun and productive experience and we are looking forward to repeating it. Take a look at the CLAM GSoC 2008 &lt;a href="http://iua-share.upf.edu/wikis/clam/index.php/GSoC_2008"&gt;wiki page&lt;/a&gt; for more information on how to apply and some sample &lt;a href="http://iua-share.upf.edu/wikis/clam/index.php/SoC_ideas"&gt;ideas&lt;/a&gt; for projects.&lt;br /&gt;&lt;br /&gt;We are waiting for you!&lt;br /&gt;&lt;br /&gt;Application deadline: March 31&lt;br /&gt;&lt;br /&gt;If you have any questions about any of the information below please&lt;br /&gt;contact clam-info@iua.upf.edu or join the #clam channel at FreeNode&lt;br /&gt;IRC.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://clamnews.files.wordpress.com/2008/03/soc-clam-flyer2008.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px;" src="http://clamnews.files.wordpress.com/2008/03/soc-clam-flyer2008.png" alt="" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-2524837695746332227?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/2524837695746332227/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=2524837695746332227' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/17171206/posts/default/2524837695746332227'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2524837695746332227'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/03/call-for-students-clam-in-google-summer.html' title='Call for Students: CLAM in Google Summer of Code 08'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-3239682113513472659</id><published>2008-02-18T14:03:00.000-08:00</published><updated>2008-02-18T14:16:41.278-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CLAM'/><category scheme='http://www.blogger.com/atom/ns#' term='GSoC'/><title type='text'>CLAM 1.2 released</title><content type='html'>Many things have happened in between our last two releases but we finally managed to pull the 1.2 release together codenamed "the gsocked plugged in release". This release includes all the cool stuff our students from the Google Summer of Code developed. 
For this reason CLAM was also featured in the &lt;a href="http://googlesummerofcode.blogspot.com/2008/02/clam-12-released.html"&gt;GSoC blog&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Congratulations to everyone who worked on this release and special thanks to David, who did a great job as release manager for this one.&lt;br /&gt;&lt;br /&gt;More news and downloads on the &lt;a href="http://clam.iua.upf.edu"&gt;CLAM website&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-3239682113513472659?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/3239682113513472659/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=3239682113513472659' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3239682113513472659'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3239682113513472659'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/02/clam-12-released.html' title='CLAM 1.2 released'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-185493230035571035</id><published>2008-02-12T14:09:00.000-08:00</published><updated>2008-02-18T14:03:17.255-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='object-oriented'/><category scheme='http://www.blogger.com/atom/ns#' 
term='graph'/><title type='text'>Everything is a Graph (part 2)</title><content type='html'>In a &lt;a href="http://technocalifornia.blogspot.com/2008/01/everything-is-graph-part-1.html"&gt;previous post&lt;/a&gt; I talked about how graphical models of computation are being used beyond the "traditional" areas of networking and low-level system modeling. It may come as a surprise that such a rich and useful paradigm has not become as widespread as the object-oriented approach. So now I will discuss whether that assessment is true and what might be the reasons.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Why don't we have graph-oriented programming languages?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;First of all, it is interesting to note that object orientation itself comes from graph-oriented (or process-oriented) approaches. It is widely accepted that &lt;a href="http://en.wikipedia.org/wiki/Kristen_Nygaard"&gt;Kristen Nygaard&lt;/a&gt;'s &lt;a href="http://en.wikipedia.org/wiki/Simula"&gt;Simula&lt;/a&gt; language was the first OO language to see the light. However, Simula was (as its name might imply) a simulation language in which the most important concept was the process. Simula did follow a graph-oriented approach and it was only in its later versions (Simula 67 and later) that the idea of "objects" was explicitly presented. So, in some sense, OO can be seen as a generalization of the graph-oriented approach. As a matter of fact, you can indeed understand a graph as a set of interconnected objects called nodes. In the same sense, you can read an OO design in a graphical way where classes or objects are nodes and relations become the graph edges. 
You can read more about this interpretation of OO &lt;a href="http://xavier.amatriain.net/Thesis/html/node20.html"&gt;in my thesis&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Furthermore, at some point in time a few people realized that it would make sense to define a graph-oriented paradigm and design languages to support it. This gave birth to the so-called &lt;a href="http://en.wikipedia.org/wiki/Actor_model"&gt;actor-oriented&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Process-oriented_programming"&gt;process-oriented&lt;/a&gt; languages. I am unsure why these languages miserably failed to make it mainstream, but I can safely assume that it was a mix of different factors such as bad PR (who would choose an approach called actor-oriented?), bad implementation, and even bad timing.&lt;br /&gt;&lt;br /&gt;However, does that mean that graph-oriented languages have failed as a whole? My answer to this would be a definite NO. What happens is that the graph-oriented paradigm lends itself much better to a graphical (that is, graphics-based) representation. Therefore, graph-oriented languages skipped one or two steps in the logical evolution of a programming language: general purpose textual language -&gt; general purpose graphical notation -&gt; domain specific graphical models. 
The OO paradigm started producing a large collection of textual languages, then a general purpose graphical notation (UML), and is currently gearing toward domain-specific graphical modeling languages.&lt;br /&gt;&lt;br /&gt;Curiously enough graph-based models jumped directly to the latter and you can find many examples of domain-specific graphical languages that go from some with a broader scope such as &lt;a href="http://www.mathworks.com/products/simulink/"&gt;Simulink&lt;/a&gt; or &lt;a href="http://ptolemy.berkeley.edu/"&gt;Ptolemy&lt;/a&gt; to some that target a more specific domain such as &lt;a href="http://www.clam.iua.upf.edu/"&gt;CLAM&lt;/a&gt;, &lt;a href="http://www.puredata.org/"&gt;Pd&lt;/a&gt;, or &lt;a href="http://gstreamer.freedesktop.org/"&gt;GStreamer&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;And now with OO tending to the same place and tools like &lt;a href="http://www.metacase.com/"&gt;Metaedit&lt;/a&gt; offering ways to quickly develop your graphical DSM we are seeing how the OO and graph-oriented paradigms are finally coming together again.&lt;br /&gt;&lt;br /&gt;So yes, everything is still an object... 
and almost everything is becoming a graph!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-185493230035571035?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/185493230035571035/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=185493230035571035' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/185493230035571035'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/185493230035571035'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/02/everything-is-graph-part-2.html' title='Everything is a Graph (part 2)'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-2134014148222662462</id><published>2008-01-25T16:11:00.000-08:00</published><updated>2008-01-25T16:17:14.119-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='music business'/><title type='text'>Pay-what-you-listen</title><content type='html'>It occurred to me that with all these services (such as &lt;a href="http://www.last.fm/"&gt;lastfm&lt;/a&gt;, &lt;a href="http://www.mystrands.com/"&gt;mystrands&lt;/a&gt;...) tracking what you listen to during the day we can finally implement a really fair business model for music: pay-what-you-listen. 
Imagine you paid a fixed subscription price and you were assured that this money would go directly to the artists, divided according to how many tracks from each artist you played during that month. I'd sign up for this service today!!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-2134014148222662462?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/2134014148222662462/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=2134014148222662462' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2134014148222662462'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2134014148222662462'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/01/pay-what-you-listen.html' title='Pay-what-you-listen'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-8436191443692342867</id><published>2008-01-24T13:41:00.000-08:00</published><updated>2008-01-24T14:32:17.303-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Social Networks'/><category scheme='http://www.blogger.com/atom/ns#' term='lastfm'/><category scheme='http://www.blogger.com/atom/ns#' term='facebook'/><title type='text'>Why opening APIs is not enough: the Facebook vs. 
Lastfm case</title><content type='html'>Every day I read of &lt;a href="http://www.fluidinfo.com/terry/2008/01/03/i-just-deactivated-my-facebook-account/"&gt;more people&lt;/a&gt; who are becoming disappointed with Facebook and its usability... I am one of them. The other day I had a depressing experience when I tried to find an interesting message someone had sent over the past few weeks. I could not find the message. Was it a wall posting? Or something on SuperWall or FunWall? Or maybe simply a note... or a link. Or a message or anything sent with any of the dozens of applications that I &lt;span style="font-weight: bold;"&gt;have&lt;/span&gt; to have.&lt;br /&gt;&lt;br /&gt;Facebook is OK for staying tuned to what your friends are up to. The problem is that in order to do that you are forced to accept the many applications that people end up aggregating. And then it is a complete mess. You have many friends with many different apps and organizing information in a sensible way is impossible, so you end up reading things you don't care about and possibly missing interesting information. Opening the API made Facebook big, but will it die of success? Opening things up and hoping that they will organize themselves works in some cases (e.g. the web) but in others it is a recipe for disaster.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.plaxo.com/"&gt;Plaxo&lt;/a&gt; does a much better job of organizing things and keeping them manageable. However, Plaxo is still too new to guess in which direction it will grow.&lt;br /&gt;&lt;br /&gt;Lately I have been becoming a more intensive &lt;a href="http://www.last.fm/"&gt;lastfm&lt;/a&gt; user. And I can say that it is becoming my favorite social network. Of course I am an absolute music lover and that makes a difference in this case. But that is not the point. The point is that lastfm is a "focused" and manageable app. 
You sign in because of music and find friends because of music but you can also add your pre-existing friends and laugh at their bad taste :-) Also, lastfm has been constantly adding new features like the &lt;a href="http://blog.last.fm/2008/01/23"&gt;recently announced&lt;/a&gt; availability of full tracks for free.&lt;br /&gt;&lt;br /&gt;You can argue that lastfm is simply a tiny part of what facebook is. So what? It is useful, enjoyable and fun, what else can you ask for? Completeness? If you think so I'd recommend you read Barry Schwartz's &lt;a href="http://www.amazon.com/Paradox-Choice-Why-More-Less/dp/0060005688"&gt;The Paradox of Choice: Why More is Less&lt;/a&gt;, or watch his &lt;a href="http://video.google.com/videoplay?docid=6127548813950043200"&gt;great talk&lt;/a&gt; at Google.&lt;br /&gt;&lt;br /&gt;So instead of a dominating social network like FB aggregating everything I envision dedicated ones (such as lastfm, flixster, mystrands....) becoming more and more popular and aggregating services like Plaxo being used as a common entry into all these different worlds.&lt;br /&gt;&lt;br /&gt;In a sense it is a bit like the evolution that took place in Software Framework design. At one point people were trying to build the "one framework for all". Even if that was ever possible, users of such a monster would be unable to understand and use it. 
The same has happened to all-encompassing metadata standards such as MPEG7, ontologies...&lt;br /&gt;&lt;br /&gt;Bottom-up design has always been better than top-down approaches and I believe Social Networks will prove no different in that sense.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-8436191443692342867?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/8436191443692342867/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=8436191443692342867' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/8436191443692342867'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/8436191443692342867'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/01/why-opening-apis-is-not-enough-facebook.html' title='Why opening APIs is not enough: the Facebook vs. 
Lastfm case'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-273379636650770882</id><published>2008-01-22T09:23:00.000-08:00</published><updated>2008-01-22T09:27:31.872-08:00</updated><title type='text'>Telefonica Research: Doctoral Researcher position in Recommender Systems</title><content type='html'>&lt;span style="font-style: italic; font-family: arial;"&gt;(Thought I'd pass this on as it clearly involves me :-)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: arial;"&gt;The research group on &lt;/span&gt;&lt;a style="font-family: arial;" href="http://research.tid.es/internet"&gt;Internet&lt;/a&gt;&lt;span style="font-family: arial;"&gt; in Telefónica R&amp;amp;D &lt;/span&gt;&lt;span style="font-family: arial;"&gt;Barcelona invites applications for a &lt;/span&gt;&lt;span style="font-weight: bold; font-family: arial;"&gt;Junior Research position&lt;/span&gt;&lt;span style="font-family: arial;"&gt; in &lt;/span&gt;&lt;span style="font-family: arial;"&gt;the area of &lt;/span&gt;&lt;span style="font-weight: bold; font-family: arial;"&gt;Recommendation Systems&lt;/span&gt;&lt;span style="font-family: arial;"&gt;.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: arial;"&gt;We are looking for dynamic, creative, and resourceful individuals to &lt;/span&gt;&lt;span style="font-family: arial;"&gt;join our research efforts in modeling of complex systems and networks &lt;/span&gt;&lt;span style="font-family: arial;"&gt;related to recommendation engines. 
Our research impacts all areas of the &lt;/span&gt;&lt;span style="font-family: arial;"&gt;company, including projects related to IPTV or internet content &lt;/span&gt;&lt;span style="font-family: arial;"&gt;distribution. The successful candidate will join a multi-disciplinary &lt;/span&gt;&lt;span style="font-family: arial;"&gt;team of scientists dedicated to advancing and using computational methods to &lt;/span&gt;&lt;span style="font-family: arial;"&gt;solve challenging user-oriented problems.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: arial;"&gt;The applicant should have a Master's degree in Computer Science, &lt;/span&gt;&lt;span style="font-family: arial;"&gt;Electrical Engineering, Applied Mathematics, Statistics, or other &lt;/span&gt;&lt;span style="font-family: arial;"&gt;related scientific disciplines, combined with strong computational &lt;/span&gt;&lt;span style="font-family: arial;"&gt;modeling and/or algorithmic skills. Knowledge and experience in &lt;/span&gt;&lt;span style="font-family: arial;"&gt;additional areas such as statistical data analysis, data mining, signal &lt;/span&gt;&lt;span style="font-family: arial;"&gt;processing, machine learning and pattern recognition, and other topics &lt;/span&gt;&lt;span style="font-family: arial;"&gt;in artificial intelligence are desirable.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: arial;"&gt;The candidate will carry out both theoretical and applied research &lt;/span&gt;&lt;span style="font-family: arial;"&gt;leading to a PhD degree in Computer Science and will actively &lt;/span&gt;&lt;span style="font-family: arial;"&gt;participate in innovative projects related to this area of research in &lt;/span&gt;&lt;span style="font-family: arial;"&gt;the company. 
Our research group follows an open research model in &lt;/span&gt;&lt;span style="font-family: arial;"&gt;collaboration with universities and other research institutions and &lt;/span&gt;&lt;span style="font-family: arial;"&gt;favors the dissemination of our work both through publications and &lt;/span&gt;&lt;span style="font-family: arial;"&gt;technology transfer. The successful candidate will also be enrolled in a &lt;/span&gt;&lt;span style="font-family: arial;"&gt;local university and will be tutored by a Professor in order to obtain &lt;/span&gt;&lt;span style="font-family: arial;"&gt;the PhD degree. Alternatively, particular agreements with the candidate's &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: arial;"&gt;original institution are also feasible.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: arial;"&gt;Although this particular position is designed for a doctoral student &lt;/span&gt;&lt;span style="font-family: arial;"&gt;candidate, the group is also actively seeking postdoctoral candidates &lt;/span&gt;&lt;span style="font-family: arial;"&gt;in this area. If you are in this situation please do not hesitate to apply.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: arial;"&gt;We offer competitive salary and benefits and a great working atmosphere &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: arial;"&gt;in beautiful Barcelona (Spain).&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: arial;"&gt;Screening of applications will begin immediately and continue until the &lt;/span&gt;&lt;span style="font-family: arial;"&gt;position is filled. 
An initial appointment for a two-year term is &lt;/span&gt;&lt;span style="font-family: arial;"&gt;anticipated with the possibility of reappointment.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: arial;"&gt;Inquiries and applications should be sent to&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family: arial;"&gt;Xavier Amatriain &lt;&lt;/span&gt;&lt;a style="font-family: arial;" href="mailto:xar@tid.es"&gt;xar@tid.es&lt;/a&gt;&lt;span style="font-family: arial;"&gt;&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: arial;"&gt;&lt;/span&gt;&lt;span style="font-family: arial;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: arial;"&gt;with the subject line "RSDoc Application"&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: arial;"&gt;* Telefónica is a world leader in the telecommunication sector, with &lt;/span&gt;&lt;span style="font-family: arial;"&gt;presence in Europe, Africa and Latin America. As of March 2007, &lt;/span&gt;&lt;span style="font-family: arial;"&gt;Telefónica had 206.6 million customers.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: arial;"&gt;Telefónica Research and Development is the innovation company of the &lt;/span&gt;&lt;span style="font-family: arial;"&gt;Telefónica Group. 
Owned 100% by Telefónica, this subsidiary was formed &lt;/span&gt;&lt;span style="font-family: arial;"&gt;in 1988, with the aim of strengthening the Group's competitiveness &lt;/span&gt;&lt;span style="font-family: arial;"&gt;through technological innovation.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: arial;"&gt;It is the most important private R&amp;amp;D company in Spain, in terms of both &lt;/span&gt;&lt;span style="font-family: arial;"&gt;activities and resources, and in terms of number of staff, and it is one &lt;/span&gt;&lt;span style="font-family: arial;"&gt;of the most important companies on the continent as regards &lt;/span&gt;&lt;span style="font-family: arial;"&gt;participation in European Research projects.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;&lt;/tt&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-273379636650770882?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/273379636650770882/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=273379636650770882' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/273379636650770882'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/273379636650770882'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/01/telefonica-research-doctoral-researcher.html' title='Telefonica Research: Doctoral Researcher position in Recommender Systems'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' 
height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-3029984981876794767</id><published>2008-01-20T14:24:00.000-08:00</published><updated>2008-01-20T15:47:37.396-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='object-oriented'/><category scheme='http://www.blogger.com/atom/ns#' term='graphical models'/><category scheme='http://www.blogger.com/atom/ns#' term='graph'/><title type='text'>Everything is a graph (part 1)</title><content type='html'>&lt;a href="http://en.wikipedia.org/wiki/Alan_Kay"&gt;Alan Kay&lt;/a&gt; summarized object-orientation by stating that "everything is an object". Therefore any part of the world that needs to be modeled in a software system can be described in terms of objects and their relations.&lt;br /&gt;&lt;br /&gt;Already in my &lt;a href="http://xavier.amatriain.net/thesis"&gt;thesis&lt;/a&gt; I worked on relating the object-oriented paradigm to graphical (or graph-based) models of computation in the context of signal processing systems. Actually there are many graph-based frameworks and applications in the context of signal processing, multimedia, and related fields. However, lately I have been working with graphical models in many different situations.&lt;br /&gt;&lt;br /&gt;Graphical models are gaining more importance in data mining, for instance, through the use of Bayesian belief networks and other &lt;a href="http://www.cs.ubc.ca/%7Emurphyk/Bayes/bayes.html"&gt;graphical models&lt;/a&gt;. On the other hand, the study of &lt;a href="http://en.wikipedia.org/wiki/Complex_network"&gt;complex networks&lt;/a&gt; and systems has introduced yet other ways to look at graphs from a statistical perspective.&lt;br /&gt;&lt;br /&gt;So in a sense, graphs can give us an equivalent yet complementary view to the object-oriented paradigm. 
Where in OO we have objects, in graphs we have nodes, and where in OO we have "relations between objects", in graphical models we have edges. So we can conclude that &lt;span style="font-weight: bold;"&gt;Everything is a Graph&lt;/span&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-3029984981876794767?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/3029984981876794767/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=3029984981876794767' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3029984981876794767'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/3029984981876794767'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2008/01/everything-is-graph-part-1.html' title='Everything is a graph (part 1)'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-6289654199511383028</id><published>2008-01-02T13:39:00.000-08:00</published><updated>2008-01-02T13:55:28.758-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='radiohead'/><category scheme='http://www.blogger.com/atom/ns#' term='music'/><category scheme='http://www.blogger.com/atom/ns#' term='business'/><title type='text'>The Future of the Music Business</title><content type='html'>I just read a couple 
of very interesting articles in Wired Magazine discussing the Future of the Music Business. In the &lt;a href="http://www.wired.com/entertainment/music/magazine/16-01/ff_byrne?currentPage=all"&gt;first one&lt;/a&gt;, David Byrne concludes that although the "traditional" model for the music business is close to over, music is much more than that, and the future is much more promising for artists, who can now choose from up to six different models. In the &lt;a href="http://www.wired.com/entertainment/music/magazine/16-01/ff_yorke?currentPage=all"&gt;second one&lt;/a&gt; Byrne has an interesting conversation with Radiohead's Thom Yorke.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.wired.com/images/article/magazine/1601/ff_yorke2_630.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 226px; height: 221px;" src="http://www.wired.com/images/article/magazine/1601/ff_yorke2_630.jpg" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Very interesting reads from two of the most interesting minds in the business, indeed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-6289654199511383028?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/6289654199511383028/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=6289654199511383028' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/6289654199511383028'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/6289654199511383028'/><link rel='alternate' type='text/html' 
href='http://technocalifornia.blogspot.com/2008/01/future-of-music-business.html' title='The Future of the Music Business'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-5389161236321189995</id><published>2007-12-29T15:22:00.000-08:00</published><updated>2007-12-29T16:11:27.173-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='iptv'/><category scheme='http://www.blogger.com/atom/ns#' term='imagenio'/><category scheme='http://www.blogger.com/atom/ns#' term='telefonica r+d'/><category scheme='http://www.blogger.com/atom/ns#' term='adsl'/><title type='text'>Cool Projects in Telefonica R&amp;D: the New Generation TV</title><content type='html'>&lt;object height="340" width="150"&gt;In this new series I will introduce some of the cool projects we are working on at Telefonica R&amp;amp;D Barcelona. Although I am not working full-time on any of these projects, I do have some contact with them and the people in charge.&lt;br /&gt;&lt;br /&gt;&lt;param value="MYFLASH.swf" name="movie"&gt;&lt;br /&gt;&lt;embed src="http://vision.tid.es/felicitaciones/christmas.swf" height="137" width="325"&gt;&lt;/embed&gt;&lt;br /&gt;&lt;br /&gt;The first project I decided to talk about is one of the most eye-catching projects we have. No wonder it usually becomes the center of attention at the many demos and open showcases we hold in our center.&lt;br /&gt;&lt;br /&gt;But first it is important to explain that Telefonica's business is not limited to fixed and mobile telephony. 
For instance, Telefonica owns &lt;a href="http://en.wikipedia.org/wiki/Imagenio"&gt;Imagenio&lt;/a&gt;, an &lt;a href="http://en.wikipedia.org/wiki/Iptv"&gt;IPTV&lt;/a&gt; service over ADSL that operates in Spain. As a matter of fact, Imagenio is one of the largest IPTV networks operating in the world. Imagenio is streamed directly over standard ADSL with QoS, and it has proved that this technology is scalable enough to serve large audiences and offer new interactive services like video-on-demand, which allows the user to watch the content any time in the following 24 hours and stop/play/rewind at will.&lt;br /&gt;&lt;br /&gt;The project that I am writing about is actually a number of different projects, all of them working towards defining the new generation of interactive TV. On the one hand we are working on improving the audiovisual experience by providing 3D images on autostereoscopic displays and 3D audio. The first pilot is about to be deployed in the coming months. On the other hand, there is ongoing research and development on providing personalized content and advertisements by creating user profiles and deploying automatic recommendation systems.&lt;br /&gt;&lt;br /&gt;More innovative and breakthrough ideas are also in the works, but we will keep those to ourselves for the time being for obvious reasons :-)&lt;br /&gt;&lt;/object&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-5389161236321189995?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/5389161236321189995/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=5389161236321189995' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/17171206/posts/default/5389161236321189995'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/5389161236321189995'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2007/12/cool-projects-in-telefonica-r-new.html' title='Cool Projects in Telefonica R&amp;D: the New Generation TV'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-2800582393654628551</id><published>2007-12-22T15:49:00.000-08:00</published><updated>2007-12-22T15:53:44.604-08:00</updated><title type='text'>Imagine 2040: a Christmas Story</title><content type='html'>Last week we had a story writing contest at work. Of course the context was a bit geeky so the story had to be about imagining life in 2040. 
If you are curious, here is my original submission in &lt;a href="http://xavier.amatriain.net/docs/Infijo.pdf"&gt;Spanish&lt;/a&gt; and a quick translation into &lt;a href="http://xavier.amatriain.net/docs/Infix.pdf"&gt;English&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-2800582393654628551?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/2800582393654628551/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=2800582393654628551' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2800582393654628551'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2800582393654628551'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2007/12/imagine-2040-christmas-story.html' title='Imagine 2040: a Christmas Story'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-7919215486747425407</id><published>2007-12-20T15:13:00.000-08:00</published><updated>2007-12-20T15:21:45.313-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='university'/><category scheme='http://www.blogger.com/atom/ns#' term='spin-offs'/><title type='text'>University Research Groups trying to become companies: the MTG/BMAT case</title><content type='html'>When some weeks ago I received an 
email about some researchers being fired from the &lt;a href="http://www.upf.edu/"&gt;Pompeu Fabra University&lt;/a&gt;, the issue smelled really bad, but I did not plan on writing about it on my blog. However, after talking to some of them and realizing that they have received no support from the University or their colleagues, I feel that I at least have the obligation to make this public.&lt;br /&gt;&lt;br /&gt;The five researchers that I mentioned had previously sent an internal letter complaining about the lack of separation between the &lt;a href="http://mtg.upf.edu/"&gt;Music Technology Group&lt;/a&gt;, a research group at a public university, and the &lt;a href="http://www.bmat.com/"&gt;BMAT&lt;/a&gt; company. As a response they were fired from the group and the university, and the email one of them then sent to the Chancellor of the university has so far gone unanswered.&lt;br /&gt;&lt;br /&gt;This is the latest of many sad episodes related to the creation of this pseudo-company. Among these, the one that has affected me most personally is the mistreatment of the CLAM project. Even though CLAM was successful enough to win an ACM award or get into the Google Summer of Code, for instance, some time ago it was decided that the project was not only not interesting for the MTG group but was actually "dangerous". The people in charge of BMAT/MTG (it's really hard to distinguish which is which) saw it as direct competition for some of the projects in BMAT. As a result we had to move CLAM, and its two remaining developers, onto a different university group.&lt;br /&gt;&lt;br /&gt;Maybe what these five fired researchers did not know, being all of them non-Spanish, is that these research groups that are half companies and vice versa are common practice here in Spain and, I would dare to say, in many other places in Europe.&lt;br /&gt;&lt;br /&gt;Let me be clear: I am all in favor of spin-offs and start-ups connected to universities. 
This should be done by supporting researchers in the institution who feel they can create a company by commercializing a particular idea or application that was originally developed in the university. However, it is a very different thing to give all the public IP related to a public institution to a company (see &lt;a href="http://www.bmat.com/about_us"&gt;here&lt;/a&gt; how BMAT has an exclusive license for &lt;span style="font-weight: bold;"&gt;all&lt;/span&gt; MTG's technologies). Even worse is to hire researchers in the public institution, with public funds, who are in fact working for the company and only respond to the strategy the company dictates.&lt;br /&gt;&lt;br /&gt;But, to be fair, I cannot blame the people behind BMAT for acting like this. As a matter of fact they have mostly been pushed into doing (or at least allowed to do) what they are doing because the University system in Spain is, like its whole education system, extremely mismanaged. Full-time professors have low salaries (so imagine doctoral or postdoctoral researchers) and the overall investment in R&amp;amp;D is something to laugh at (or to cry about), see &lt;a href="http://www.forfas.ie/ncc/reports/ncc_annual_05/images/fig148.jpg"&gt;here&lt;/a&gt; and &lt;a href="http://www.forfas.ie/ncc/reports/ncc_annual_05/images/fig150.jpg"&gt;here&lt;/a&gt;. In this situation, they have no other way out than to complement their salaries by getting involved in mostly European Research projects or to create companies that are mostly subsidized. Furthermore, academics are then mostly evaluated on how much money they have brought into the institution, in one way or another.&lt;br /&gt;&lt;br /&gt;So there is no easy way out of this, as it involves both governments and companies investing more in R&amp;amp;D. But the least one can expect is for the people behind these initiatives and the university to behave according to some moral principles. 
And firing five people because they complain of an unfair situation does not fit under my criteria of morally acceptable behavior.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-7919215486747425407?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/7919215486747425407/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=7919215486747425407' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/7919215486747425407'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/7919215486747425407'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2007/12/university-research-groups-trying-to.html' title='University Research Groups trying to become companies: the MTG/BMAT case'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-2157205173517505400</id><published>2007-12-07T14:19:00.000-08:00</published><updated>2007-12-07T14:30:28.882-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='awk'/><category scheme='http://www.blogger.com/atom/ns#' term='linux'/><category scheme='http://www.blogger.com/atom/ns#' term='sorting'/><category scheme='http://www.blogger.com/atom/ns#' term='data files'/><title type='text'>Sorting rows in text files</title><content type='html'>If you work with 
scientific data in text files you have probably faced the need to sort rows in a text file following some criteria. If the text file is small enough you can import it as a CSV into a spreadsheet editor and sort it there. But if you have thousands of rows you have to use a script or a command-line tool such as &lt;span style="font-style: italic;"&gt;sort&lt;/span&gt; in Linux.&lt;br /&gt;&lt;br /&gt;The problem with &lt;span style="font-style: italic;"&gt;sort&lt;/span&gt; is that it is not easy to get exactly what you want ... and that is what happened to me some days back. I had a file that included some rows like the following:&lt;br /&gt;&lt;br /&gt;1    100    9&lt;br /&gt;1     17     8&lt;br /&gt;&lt;br /&gt;I wanted to sort rows by the first column and then by the second, so I tried using commands such as:&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;span style="font-style: italic;"&gt;sort -n -k 1,2&lt;/span&gt;&lt;br /&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;Unfortunately this did not work out, as it left the sample lines in the exact same order. The -n switch is supposed to make sort interpret columns as numbers, but with -k 1,2 that is not exactly what happens: the option defines a single key running from column 1 through column 2, and a numeric key only reads the number at its start, so the second column never takes part in the numeric comparison. The tie on the first column then falls back to comparing the lines as text, where 17 ends up after 100 simply because it is a 1 followed by a 7, versus a 1 followed by a zero.&lt;br /&gt;Trying other switches did not help much.&lt;br /&gt;&lt;br /&gt;So, to get this working, I padded the numbers with leading zeros so that they have equal length in all columns. 
For that I used awk and its printf function.&lt;br /&gt;&lt;br /&gt;Putting it all together in a one-liner yielded the following command, which worked like a charm:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center; font-style: italic;"&gt; awk '{printf "%4.0f\t %05.0f \t %1.1f\n",$1,$2,$3}' data.txt&lt;br /&gt;| sort -n -k 1,2 &gt; test&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-2157205173517505400?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/2157205173517505400/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=2157205173517505400' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2157205173517505400'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/2157205173517505400'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2007/12/sorting-rows-in-text-files.html' title='Sorting rows in text files'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-6011919989561717244</id><published>2007-12-02T14:55:00.000-08:00</published><updated>2007-12-02T15:14:13.482-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' 
term='telefonica i+d'/><category scheme='http://www.blogger.com/atom/ns#' term='recommender systems'/><title type='text'>Open Research Day at Telefonica R&amp;D</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/R1M5hQZkPoI/AAAAAAAAABU/bHQWAnhbh_s/s1600-R/OpenResearchDay.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_xAtUP4Gu6Zk/R1M5hQZkPoI/AAAAAAAAABU/HwtUjndWiC4/s320/OpenResearchDay.png" alt="" id="BLOGGER_PHOTO_ID_5139514843260534402" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Last week we had our first Open Research Day in our newly created research group in Telefonica R&amp;amp;D. It was a great event that had invited speakers from external labs as well as our own presentations and demos. We also had pretty good attendance from researchers, mostly from local companies and universities.&lt;br /&gt;&lt;br /&gt;The event also had some impact in the Spanish press, as you can read in &lt;a href="http://www.upf.edu/recull/2007/novembre/07112824.pdf"&gt;Expansión&lt;/a&gt;, &lt;a href="http://www.elpais.com/articulo/cataluna/Telefonica/invierte/millones/Cataluna/elpepuespcat/20071128elpcat_8/Tes"&gt;El País&lt;/a&gt;, or &lt;a href="http://www.tv3.cat/su/tvc/tvcConditionalAccess.jsp?ALTERNATE_OPEN=YES&amp;amp;ID_BACKUP=&amp;amp;ID=113939&amp;amp;QUALITY=A&amp;amp;FORMAT=WM"&gt;TV3&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I gave a presentation talking about some of our projects in the area of Recommender Systems, including some of the things we are doing in relation to the Netflix prize. 
You can download the presentation &lt;a href="http://xavier.amatriain.net/docs/RaidersLostStar.pdf"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/17171206-6011919989561717244?l=technocalifornia.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://technocalifornia.blogspot.com/feeds/6011919989561717244/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=17171206&amp;postID=6011919989561717244' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/6011919989561717244'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/17171206/posts/default/6011919989561717244'/><link rel='alternate' type='text/html' href='http://technocalifornia.blogspot.com/2007/12/open-research-day-at-telefonica-r.html' title='Open Research Day at Telefonica R&amp;D'/><author><name>Xavier Amatriain</name><uri>http://www.blogger.com/profile/14166119485952054870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/_xAtUP4Gu6Zk/Sif0hGYaOLI/AAAAAAAAAE0/UoOaqOybIp8/S220/xamat6.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_xAtUP4Gu6Zk/R1M5hQZkPoI/AAAAAAAAABU/HwtUjndWiC4/s72-c/OpenResearchDay.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-17171206.post-2146863484933412242</id><published>2007-11-26T15:20:00.000-08:00</published><updated>2007-12-02T14:05:25.301-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data mining'/><category scheme='http://www.blogger.com/atom/ns#' term='netflix prize'/><category scheme='http://www.blogger.com/atom/ns#' term='kdd'/><title 
type='text'>Netflix Prize losing its charm?</title><content type='html'>About a year ago &lt;a href="http://www.netflix.com/"&gt;Netflix&lt;/a&gt; announced that it would give $1M to whoever could improve their existing recommendation engine by 10%. That announcement raised huge expectations among hackers and the scientific community. Thousands of researchers/developers jumped on the &lt;a href="http://www.netflixprize.com/"&gt;challenge&lt;/a&gt; and started working on it.&lt;br /&gt;&lt;br /&gt;The original system had an RMSE (root mean squared error) of 0.95, which pretty quickly was improved to below 0.9. One of the people who contributed most to these first improvements was &lt;a href="http://www.google.es/url?sa=t&amp;amp;ct=res&amp;amp;cd=2&amp;amp;url=http%3A%2F%2Fwww.sigkdd.org%2Fexplorations%2Fissues%2F9-1-2007-06%2Fsimon-funk-explorations.pdf&amp;amp;ei=6SdTR5PCOYGkwgGiyaicDg&amp;amp;usg=AFQjCNGzFsdJLZuWSfi5TrrAvPE2pw4pJg&amp;amp;sig2=nnkE-LGH2QJ-nv57sClhmw"&gt;Simon Funk&lt;/a&gt;, a cool developer who implemented a &lt;a href="http://sifter.org/%7Esimon/journal/20061211.html"&gt;variation&lt;/a&gt; on the "traditional" Singular Value Decomposition matrix factorization algorithm. He published his results and code and led the way for other people to reuse his approach and enhance it. Very few leaders (if any) since then have gotten there without using, at least partially, Simon's solution. 
The graph below (borrowed from &lt;a href="http://whimsley.typepad.com/whimsley/2007/07/the-limitations.html"&gt;Tom Slee's blog&lt;/a&gt;) shows the improvement of the RMSE over time:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://docs.google.com/File?id=dfx7r5rf_84dr8gwqgt"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 409px; height: 273px;" src="http://docs.google.com/File?id=dfx7r5rf_84dr8gwqgt" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;One of the rules of the competition was that every year a progress prize of $50K would be given to the leading team until the final improvement was reached. A couple of weeks ago Netflix &lt;a href="http://www.netflixprize.com//community/viewtopic.php?id=799"&gt;announced&lt;/a&gt; that the Progress Prize this year was going to a team made up of &lt;a href="http://www.research.att.com/%7Evolinsky/netflix/"&gt;two researchers from AT&amp;amp;T&lt;/a&gt;. Also, as part of the prize rules, they had to &lt;a href="http://www.research.att.com/%7Evolinsky/netflix/ProgressPrize2007BellKorSolution.pdf"&gt;disclose&lt;/a&gt; how they are working on the solution.&lt;br /&gt;&lt;a href="http://www.research.att.com/%7Evolinsky/netflix/"&gt;&lt;/a&gt;&lt;br /&gt;Summarizing their approach is easy: they are using a combination of 107 individual predictors!! (at this point I am already wondering how many interns you need to implement 107 predictors :-)&lt;br /&gt;&lt;br /&gt;As part of the KDD Cup this year, the leading teams also &lt;a href="http://www.cs.uic.edu/%7Eliub/KDD-cup-2007/proceedings.html"&gt;partially revealed&lt;/a&gt; many of their solutions.&lt;br /&gt;&lt;a href="http://www.research.att.com/%7Evolinsky/netflix/ProgressPrize2007BellKorSolution.pdf"&gt;
