Guy Fieri wants your help! For a TV show on statistical models for real estate

I got the following email from David Mulholland:

I’m a producer at Citizen Pictures where we produce Food Network’s “Diners, Dives and Drive-Ins” and Bravo’s digital series, “Going Off The Menu,” among others. A major network is working with us to develop a show that pits “data” against a traditional real estate agent to see who can find a home buyer the best house for them. In this show, both the real estate agent and the data team each choose two properties using their very different methods. The show will ask the question: “Who will do a better job of figuring out what the client wants, ‘data’ or the traditional real estate agent?”

TV and real estate are two topics I know nothing about, so I pointed Mulholland to some Finnish dudes who do sophisticated statistical modeling of the housing market. They didn’t think it was such a good fit for them, with Janne Sinkkonen remarking that “Models are good at finding trends and averages from large, geographically or temporally sparse data. The richness of a single case, seen on the spot, is much better evaluated by a human.”

That makes sense, but it is also possible that a computer-assisted human can do better than a human alone. Say you have a model that gives quick price estimates for every house. Those estimates are sitting on the computer. A human then goes to house X and assesses its value at, say, $350,000. The human then looks up and sees that the computer gave an assessment, based on some fitted algorithm, of $420,000. What does the human conclude? Not necessarily that the computer is wrong; rather, at this point the human can introspect and consider why the computer estimate is so far off. What features of the house make it so much less valuable than the computer “thinks”? Perhaps some features not incorporated into the computer’s model, for example the state of the interior of the house, or a bad paint job and unkempt property, or something about the location that had not been in the model. This sort of juxtaposition can be valuable.
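As a rough illustration of that juxtaposition, here is a minimal Python sketch that flags houses where the on-site human assessment and the model estimate disagree by more than some threshold. The names (`Appraisal`, `flag_for_review`), the 10% threshold, and house "Y" are all hypothetical; only the $350,000 vs. $420,000 figures come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Appraisal:
    house_id: str
    human_estimate: float  # on-site judgment, in dollars
    model_estimate: float  # output of some fitted model (assumed to exist)


def relative_gap(a: Appraisal) -> float:
    """Signed gap between human and model, as a fraction of the model estimate."""
    return (a.human_estimate - a.model_estimate) / a.model_estimate


def flag_for_review(appraisals, threshold=0.10):
    """Return appraisals whose human/model gap exceeds the threshold, largest gap first."""
    flagged = [a for a in appraisals if abs(relative_gap(a)) > threshold]
    return sorted(flagged, key=lambda a: abs(relative_gap(a)), reverse=True)


appraisals = [
    Appraisal("X", 350_000, 420_000),  # the example from the text
    Appraisal("Y", 300_000, 310_000),  # hypothetical house with close agreement
]

for a in flag_for_review(appraisals):
    print(f"{a.house_id}: human is {relative_gap(a):+.1%} relative to the model")
```

The point of the flag is not to declare either number correct, only to tell the human where to look harder for features the model may be missing.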

That said, I still know nothing about real estate or about what makes good TV, so I offered to post Mulholland’s question here. He said sure, and added:

I’m particularly delighted to hear your analysis of a “computer-assisted human” as that is a direction we have been investigating. Simply put, we do not have the resources to implement any sort of fully computerized solution. I think the computer-assisted human is definitely a direction we would take.

I’d love to hear the thoughts of blog readers. At the moment, the big question we are considering is, “Assuming that we have full access to a user’s data (with the user’s cooperation of course . . . data examples include Facebook, web browser history, online shopping history, geotracking, etc.), how can we use human and computer to best sort through this data to find the house the user will like the most?”

Ball’s in your court now, TV-savvy blog readers!

21 thoughts on “Guy Fieri wants your help! For a TV show on statistical models for real estate”

  1. It might be interesting to hear from people who’ve been real estate agents for a long time. My impression is that a modern real estate agent depends heavily on what we might call exploratory data analysis: running filters on the database, looking for outliers, etc. In that sense, they are already computer-assisted humans.

    Similarly, the modern home buyer has probably spent most of their initial search time on Zillow to narrow down the candidates.

    In fact, you could run this as a three-sided competition: (1) prospective home buyer searching the internet for candidates, (2) real estate agent, (3) statistical model. I think people always wonder how much value the real estate agent adds.

    I wouldn’t bet on the statistical model here for the reason cited (prediction of a single case). Even with an obvious anchor (assessed valuation for tax purposes) Zillow and Trulia often differ wildly in valuation. I looked up a property my child is interested in yesterday. Zillow values it at $382k, Trulia at $169k.

    • I wonder if you could use these really large discrepancies to reverse engineer how each website is weighting various amenities.

      On a related note, given enough input/output from a single agent, you could build a personalized model. Call it the Trusted Agent Score (TM) or something. Buyers pick agents that are most similar to them, and then the Trusted Agent Score can guide you to houses that have favorable feature/price trade offs.

    • From my past experience in sales, it is true that the “modern real estate agent depends heavily on what we might call exploratory data analysis,” but of the client’s cognitive space: what they value and what is likely to satisfy rather than optimize, which the agent can then search for.

      And, unfortunately, judging from the many hours I have had to sort of watch “Love it or leave it” and “Property Brothers” given my wife’s addiction to them, what is initially shown to folks primarily functions to get a better sense of the client’s cognitive space.

  2. I actually worked on something similar for my undergrad thesis in the agent-based modeling group at the MIT Media Lab. It’s a super interesting problem, and there are a number of different ways to approach it. One is some version of collaborative filtering, i.e., the guts of, say, an Amazon or Netflix recommendation engine, at least approximately speaking. In this approach, you would need data not only for the show’s subject but also for other people who own properties (and how much they like them) to match against, but you could construct something of the form “people like you like houses like these” and then go find such a property.

    Another approach could be something like what, say, OkCupid does by asking a bunch of questions and having the user give an answer and an importance rating. This information could then be matched against property characteristics to find the one that generates the highest score. This approach could even be combined with a collaborative filtering algorithm, and it could be a fun thing for the show to have viewers submit information themselves to help fill out the overall dataset. One thing that is crucial to the success of the mathematical approach is the amount of information available about the properties: I am confident that a rich enough dataset could make the automated approach very powerful, but not all of the attributes that would be relevant are generally collected in an organized fashion. (One potentially interesting downfall, on the other hand, is the possibility that people aren’t self-aware enough to know what they want before they see it!)

    Overall, this idea seems to be the moneyball of real estate, which could be super fun! I remember my experience with a realtor, and, while I got something that I think was pretty optimal for me in the end, I was definitely struck by how little the basic information given about a property (square feet, bedrooms, bathrooms, etc.) correlated with whether I liked it or not. Maybe it’s just me; living in Boston, I’ve gotten sort of obsessed with real estate markets and trying to identify good deals. I’d be happy to chat about this further, just drop me an email.

    In related news, you might find this study interesting: http://www.nytimes.com/2005/02/20/business/yourmoney/why-a-real-estate-agent-may-skip-the-extra-mile.html?_r=0
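The answer-plus-importance idea in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OkCupid's actual algorithm: the function `match_score`, the feature names, the weights, and both listings are hypothetical, and the score is simply the importance-weighted share of stated preferences that a property satisfies.

```python
def match_score(answers, property_features):
    """Importance-weighted share of the buyer's stated preferences a property satisfies.

    answers: dict mapping feature name -> (desired value, importance weight)
    property_features: dict mapping feature name -> value for one property
    """
    total_weight = sum(weight for _, weight in answers.values())
    if total_weight == 0:
        return 0.0
    matched_weight = sum(
        weight
        for feature, (wanted, weight) in answers.items()
        if property_features.get(feature) == wanted
    )
    return matched_weight / total_weight


# Hypothetical buyer: each answer carries an importance rating.
buyer = {
    "garage": (True, 3),
    "near_transit": (True, 5),
    "yard": (False, 1),
}

# Hypothetical listings described by the same features.
listings = {
    "A": {"garage": True, "near_transit": False, "yard": False},
    "B": {"garage": False, "near_transit": True, "yard": True},
}

# Rank listings from best to worst match.
ranked = sorted(listings, key=lambda name: match_score(buyer, listings[name]), reverse=True)
print(ranked)
```

Listing A satisfies more preferences by count, but listing B wins because it satisfies the one the buyer weighted most heavily. That is the behavior the importance rating is meant to produce.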

    • I think there is a great idea right there: involve the audience in some way. One way would be, as Jodi says, to ask them to fill out a survey that can be used to make a prediction based on the demographics/psychographics of the home buyer. Another is to use real-time voting, asking the audience which of three homes is the best fit for the prospective buyer featured in the show, and then weighting those votes according to some matching between the registered voters and the prospect. If this is not feasible because the show needs to be pre-recorded, the show could be a three-sided competition: the real estate agent using only intuition; a computer using simple machine learning based on a demographic database, etc.; and a panel of non-experts, whose voiced opinions are aggregated using some clever weighting system. Then see who wins.

  3. The vast majority of real estate shows (e.g. House Hunters) actually showcase individuals who already bought a house, and then the producers fabricate a scenario for them. After all, it’s not a good show if the buyer does not choose any of the options! Not to mention the wasted production costs if the buyer does not actually buy a house.

    More than likely, this show will be rigged in a similar fashion to ensure the buyer always buys something. So the data team will win half the time and the agent will win half the time.

  4. Hasn’t computer-assisted-human play caused a mini-revolution in Chess recently with a lot of superbly creative games that apparently neither computer nor human playing alone resulted in?

    PS. I suck at chess, so I may be spouting BS. I think I read about it.

    • It may be chess, but you may have read about the five game Go match this year between Lee Sedol and Google’s AlphaGo artificial intelligence. Lee Sedol lost the first three games, but he modified his strategy in unexpected ways as he learned and won the fourth game. Game 5 was extremely close. The AI had its own stumbles as Lee modified his strategy. So a TV show about the question of whether the “data” or the realtor identifies the best house could turn into a story that includes unanticipated twists in the trade-offs that inevitably occur in the buying process. (“We never thought we would want that town, that school, a swimming pool, living next to a kennel” etc.)

  5. >>>“The richness of a single case, seen on the spot, is much better evaluated by a human.”<<<

    Doesn't Netflix still do darn well? Or do we just never know how much better a human recommender would have done?

    • Netflix depends on past choices by this person. In real estate, you aren’t likely to have a lot of past choices by this person. And the reason for their interest in another house may be precisely because their previous choices aren’t relevant (e.g. have children now).

    • My experience is that humans are **much** better at recommending movies I will like than Netflix is, despite all the progress they have made on this problem.

      • There’s probably a difference between a human who knows you well (so takes your preferences into account) and a human who bases their recommendations on their preferences rather than taking yours into account.

  6. So is a traditional real estate agent really about “figuring out what the client wants”?

    Or is it more about convincing the client that the house you’d like to sell is the one that the client really wants?

    i.e. Is a typical real estate agent more about elucidating preferences or manipulating preferences?

  7. Assessors use a few different statistical models to set home values. In some places, they actually do inspection sampling to test their models.

    As for the “traditional” agent, they represent the seller, though it appears otherwise on TV, and are paid by the seller in all but a few cases. Their job is to try to show you homes you may want and to guide you through the pricing. That advice is sometimes useful and sometimes not, because some markets are straightforward while in others sellers may set prices low in the expectation of multiple bids that will drive the price past what would have been the straightforward pricing choice. Anyone with half a brain can use the MLS services and various apps to find houses that fit. They may still need help bidding.

  8. OR…you could do some sort of conjoint analysis as is used in marketing (basically a series of “would you rather” type questions), since that would create a visual that lends itself nicely to television. Ok, I’ve thought about this too much.

  9. All kinds of problems with analogies to existing models. Re Netflix-type systems, not only is there little track record on the buyer’s preferences, there is also sparse data on the home’s track record. Re dating systems, the house is not “looking” for a buyer in any comparable sense. All that said, it would be possible, I’m sure, to do a decent job of predicting a new buyer’s search behavior given similar customers online. A Zillow or realtor.com should be able to predict (and presumably does) where I am going next given where I’ve been and who I am after just a few minutes on the site. And it should know roughly when and how to promote a property to a person engaging in a search.

    As a TV show, though, I agree that it is constrained. Let’s suppose that by some metric the computer beats the human by a 60-40 split. Makes a cool conference presentation after N = 100. But how do you “reveal” this pattern by cheering on a series of single events?

  10. I don’t have them handy but I think there are studies demonstrating that real estate agents have definite biases regarding price advice. Perhaps a good regression model would be more accurate in predicting actual selling price. That’s one aspect where an unbiased model might well outperform biased human “expertise”.

    • The seller’s agent seems to be incentivized to trade price for speed of sale. That is, they can pitch a high suggested price to the seller to get their business. Then, when they become the official seller agent their incentive is to allow the price to drop. If price goes down 10%, they only lose 10% of their 3%, an amount that is likely to be marginal when balanced with multiple transactions and weighed against time. The seller, though, may only have a 30% equity stake in the property, and so the 10% price reduction is actually a 33% haircut to their profit. It’s impossible to believe that the seller agent is equally sensitive to drops in price.

      Doesn’t have anything really to do with the TV show idea. Just something on my mind lately.
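The incentive arithmetic in the comment above can be checked with a short Python sketch. The 10% price drop, 3% commission, and 30% equity stake come from the comment; the $400,000 list price and the function name `price_drop_impact` are hypothetical, and the calculation ignores the commission and other closing costs that would come out of the seller's proceeds.

```python
def price_drop_impact(price, drop=0.10, commission_rate=0.03, equity_share=0.30):
    """Compare what a price cut costs the agent versus the seller.

    Assumes the seller owes a fixed mortgage equal to (1 - equity_share) * price,
    so the whole price cut comes out of the seller's equity.
    """
    new_price = price * (1 - drop)

    commission_before = price * commission_rate
    commission_after = new_price * commission_rate

    mortgage = price * (1 - equity_share)  # the debt does not shrink with the price
    equity_before = price - mortgage
    equity_after = new_price - mortgage

    return {
        "agent_loss_dollars": commission_before - commission_after,
        "agent_loss_fraction": 1 - commission_after / commission_before,
        "seller_loss_dollars": equity_before - equity_after,
        "seller_loss_fraction": 1 - equity_after / equity_before,
    }


impact = price_drop_impact(400_000)  # hypothetical list price
for name, value in impact.items():
    print(f"{name}: {value:,.3f}")
```

Under these assumptions the agent gives up about $1,200 (10% of the commission) while the seller gives up about $40,000 (roughly a third of the equity), which is the asymmetry the comment describes.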
