Facebook recently changed the way it gathers data from links submitted via user ‘status updates’. The new engine, otherwise known as ‘facebookexternalhit’, is responsible for quickly grabbing the first 49 words of the page you submit, along with any relevant pictures or video. It’s very quick, considering the engine can gather the data in less than 3 seconds and neatly format the text for your update.

Now, if you look at the image below, something doesn’t seem quite right with the way this data was cached. Exactly 2 seconds after I viewed a page externally, the facebookexternalhit spider had cached the exact same page I was looking at. It’s quite possible this is a coincidence and somebody had simply re-posted the page on Facebook. But it almost seems as if FB is somehow monitoring or caching the data of sites I look at, even though I’m not logged into my account. It’s smart if it is true, but what is the value of knowing where I search when you already have a log of my Facebook activity, which I would assume is far more valuable?

Just thinking out loud here, but it’s definitely an interesting concept, and it’s kind of funny if FB decided to cut out all of the third-party cookies and tracking devices and started to gather the data themselves :) . I guess it is indeed true: you can never have enough information about your users… Right, Google?

Good stuff!!!

[Update] Since the facebookexternalhit engine only loads one page and stays for less than 1 second, you will not see the referrer in your Google live analytics console. Honestly, it would be very interesting to see via analytics how many times one of your URLs is submitted. Note that you do see the referring site if somebody clicks on the user’s status on Facebook (and stays for more than 5 seconds).
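If the analytics console will not show these fetches, the raw web server access log should. Here is a minimal sketch of counting them per URL. It assumes your server writes the common "combined" log format and that the crawler sends a user agent containing the string facebookexternalhit; the function name, the sample IP addresses, and the sample paths are all made up for illustration.

```python
import re
from collections import Counter

# Assumed "combined" access-log layout; adjust the pattern to your server's format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def count_facebook_fetches(lines):
    """Count requests per path whose user agent mentions facebookexternalhit."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "facebookexternalhit" in match.group("agent").lower():
            counts[match.group("path")] += 1
    return counts

# Hypothetical log lines, invented for this example.
sample_log = [
    '192.0.2.10 - - [25/Sep/2011:10:00:00 +0000] "GET /post-a/ HTTP/1.1" 200 5120 "-" "facebookexternalhit/1.1"',
    '192.0.2.11 - - [25/Sep/2011:10:00:02 +0000] "GET /post-a/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [25/Sep/2011:10:05:00 +0000] "GET /post-a/ HTTP/1.1" 200 5120 "-" "facebookexternalhit/1.1"',
    '192.0.2.10 - - [25/Sep/2011:10:06:00 +0000] "GET /post-b/ HTTP/1.1" 200 2048 "-" "facebookexternalhit/1.1"',
]

fetches = count_facebook_fetches(sample_log)
for path, hits in fetches.most_common():
    print(path, hits)
```

Each fetch roughly corresponds to somebody pasting the link into a status update, although Facebook may cache pages, so treat the counts as a lower bound rather than an exact number.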

Sep 25, 2011
 

It’s been quite some time since I’ve seen something like this one. Analytics are completely amazing, and web trending has come very far since the first attempts at user statistical analysis back in the early 90s, when most people only cared about ‘hit counters’.

Every now and then one visitor sparks my interest, and this one wins. Northcom.mil has a spider which has either been completely updated or is NEW, and it is not following the convention for web bots/spiders to identify themselves when they connect to a site. You know... it’s still cool with me.

Honestly, government entities need to be more recursive with site caching and reporting, so I’m a big fan of seeing this. They also need to become less and less reliant on Google and Yahoo to provide them with their data. I mean, the US government has an unlimited budget, right?

You can see in the next image that this specific spider [unknown] has spent roughly 15 seconds caching data on this site. What’s neat is that they started at /feed/ and then did a backward recursive search [newest to oldest], in the same manner as Google does.
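The same kind of summary can be pulled from an access log without an analytics package. The sketch below takes the requests made by one visitor and reports where the crawl started, the order of pages, and how long it lasted. The timestamps use the Apache-style log format, and the paths other than /feed/ are hypothetical placeholders, not the pages the spider actually fetched.

```python
from datetime import datetime

# Apache-style timestamp, e.g. "25/Sep/2011:10:00:00 +0000".
# Note: %b expects English month abbreviations in the default locale.
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

def summarize_visit(entries):
    """Summarize one visitor's requests given (timestamp_string, path) tuples."""
    parsed = sorted(
        ((datetime.strptime(stamp, TIME_FORMAT), path) for stamp, path in entries),
        key=lambda entry: entry[0],  # order by time only
    )
    first_time, first_path = parsed[0]
    last_time, _ = parsed[-1]
    return {
        "first_path": first_path,
        "paths": [path for _, path in parsed],
        "seconds": (last_time - first_time).total_seconds(),
    }

# Hypothetical requests from a single unidentified spider.
visit = [
    ("25/Sep/2011:10:00:06 +0000", "/2011/09/newest-post/"),
    ("25/Sep/2011:10:00:00 +0000", "/feed/"),
    ("25/Sep/2011:10:00:15 +0000", "/2011/08/older-post/"),
]

summary = summarize_visit(visit)
print(summary["first_path"], summary["seconds"])
```

Comparing the order of "paths" against your post dates is a quick way to check whether a crawler really walks from newest to oldest.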

 

© 2012 random technology [RT] technology documentation
