Facebook recently changed the way it gathers data from links submitted in users’ ‘status updates’. At this point the new engine, otherwise known as ‘facebookexternalhit’, is responsible for quickly grabbing the first 49 words of the page you submit, along with any relevant pictures or video. It’s very quick, considering the engine can gather the data in less than 3 seconds and neatly format the text for your update.
Now, if you look at the image below, something doesn’t seem quite right with the way this data was cached. Exactly 2 seconds after I viewed a page externally, the facebookexternalhit spider had cached the exact same data I was looking at. It’s quite possible this is a coincidence and somebody had simply re-posted this page on Facebook. Still, it almost seems as if FB is somehow monitoring or caching the data of sites I look at, even though I’m not logged into my account. It’s smart if it is true, but what is the value of knowing where I search when you already have a log of my Facebook activity, which I would assume is way more valuable?
Just thinking out loud here, but it’s definitely an interesting concept, and it’s kind of funny to think FB might have decided to cut out all of the third-party cookies and tracking devices and started gathering the data themselves. I guess it is indeed true, you can never have enough information about your users… Right, Google?
Good stuff!!!
[Update] Since the facebookexternalhit engine only loads one page and stays for less than 1 second, you will not see the referrer in your Google Analytics live console. Honestly, it would be very interesting to see via analytics how many times one of your URLs is submitted. Note that you do see the referring site if somebody clicks through from the user’s status on Facebook (and stays for more than 5 seconds).
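Since analytics won’t show these visits, one rough way to count them yourself is to look for the ‘facebookexternalhit’ user agent in your raw server access log. The snippet below is only a minimal sketch of that idea: it assumes your server writes the common “combined” log format, and the function name, the sample log lines, the IP addresses and the paths are all made up for illustration.

```python
import re
from collections import Counter

# Assumption: access log lines use the "combined" format, i.e.
# ip - - [time] "METHOD path protocol" status bytes "referrer" "user agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_facebook_fetches(lines):
    """Count, per path, the requests whose user agent mentions facebookexternalhit."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "facebookexternalhit" in match.group("agent").lower():
            counts[match.group("path")] += 1
    return counts


# Hypothetical sample data, not taken from a real log.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2011:10:00:00 +0000] "GET /post-a HTTP/1.1" 200 5120 '
    '"-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"',
    '203.0.113.9 - - [01/Jan/2011:10:00:02 +0000] "GET /post-a HTTP/1.1" 200 5120 '
    '"http://www.facebook.com/" "Mozilla/5.0 (Windows NT 6.1)"',
    '203.0.113.5 - - [01/Jan/2011:10:05:00 +0000] "GET /post-b HTTP/1.1" 200 2048 '
    '"-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"',
    '203.0.113.5 - - [01/Jan/2011:10:09:00 +0000] "GET /post-a HTTP/1.1" 200 5120 '
    '"-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"',
]

if __name__ == "__main__":
    for path, hits in count_facebook_fetches(SAMPLE_LOG).most_common():
        print(path, hits)
```

Keep in mind that a fetch by the spider only tells you Facebook requested the page, not who triggered it or why, so this won’t settle the question above. It just gives a rough per-URL count that analytics can’t.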

