Codaholic

Does the dog have Turing nature?

Social Hash as Identity

(This is a “brain-dump” post).

Many write-ups and studies on social networks and privacy discuss the risk that personal information disclosed on social networks will be used for identity theft. That made me wonder: is it really possible for an identity thief to steal my entire social environment and history, or is such theft simply an indication of how broken the current paper-world identity systems are in the digital age?

Enter the “social hash”: a short identification string, generated from the data in your social network history, that is very likely to be unique to you and very difficult to duplicate in a “prove your identity” challenge.

Let us consider a person’s “social history” (say on Facebook for simplicity, though the same concept applies to Google+), i.e. the linear sequence of events related to the establishment and termination of all of that person’s connections with other people on the network. Each event is one of:

joined on DATETIME at GPSLOC
added/removed PERSON to/from CIRCLE on DATETIME at GPSLOC
checkedin at GPSLOC on DATETIME

A stream of such events is linear in time, since they are all actions taken by one person, with no two events sharing a time stamp even at one-second resolution. A strong hash of this kind of “history string” therefore captures “me” in terms of my social network history. Authorities needing to verify my identity can request a “recent ID”, which they can then verify through a back channel into the social network doubling as an identity service.
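To make the “history string” idea concrete, here is a minimal sketch in Python. The `Event` class, its field names, the `|`-separated record format and the choice of SHA-256 are all my own assumptions for illustration; the scheme itself only asks for some strong hash over an unambiguous serialization of the events.

```python
import hashlib
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Event:
    # Hypothetical record layout mirroring the three event types above.
    kind: str         # "joined", "added", "removed" or "checkedin"
    timestamp: int    # DATETIME as whole seconds since the epoch
    gpsloc: str       # GPSLOC, e.g. "12.9716,77.5946"
    person: str = ""  # PERSON, empty when the event has none
    circle: str = ""  # CIRCLE, empty when the event has none

    def record(self) -> str:
        # One canonical line per event, so the same history always
        # serializes (and therefore hashes) the same way.
        return "|".join(
            [self.kind, str(self.timestamp), self.gpsloc, self.person, self.circle]
        )


def history_hash(events: Sequence[Event]) -> str:
    # Enforce the "linear in time" property: strictly increasing time stamps.
    for earlier, later in zip(events, events[1:]):
        if later.timestamp <= earlier.timestamp:
            raise ValueError("events must have strictly increasing time stamps")
    history_string = "\n".join(e.record() for e in events)
    return hashlib.sha256(history_string.encode("utf-8")).hexdigest()
```

Calling `history_hash` on the same list of events always yields the same 64-character hex string, while adding, dropping or reordering any event changes it (or is rejected outright).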

Now, what should we use in the PERSON field for an event? It seems obvious that it has to be that person’s ID hash as of the event’s DATETIME. With this scheme, it seems to me that the only way an attacker can steal your identity is to actually log in as you, i.e. by hacking into your account. If the trusted social network and ID provider (do I hear a “yeah, right”?) uses multi-point verification before revealing a recent social snapshot hash of you to yourself (e.g. an SMS one-time passcode and a Digipass, like banks use now), an attacker might have reasonable difficulty producing your ID string even after hacking your account. Each time an agency successfully verifies your ID, a new verification event can be tacked on to your running hash, similar to a check-in. Identity verification can also be conducted without anyone actually peeking into your Facebook/G+ account, since the verifier only needs to check hash equality. Existing social networks would also be able to generate such an ID for you retrospectively.
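Two pieces of this paragraph can be sketched in a few lines of Python: looking up a person’s ID hash as of a given DATETIME, and checking a presented ID by hash equality alone. The function names and the idea of storing each person’s past hashes as a sorted list of `(timestamp, id_hash)` pairs are assumptions of mine, not part of any real provider’s API.

```python
import bisect
import hmac
from typing import List, Tuple


def id_hash_at(snapshots: List[Tuple[int, str]], when: int) -> str:
    """Return the ID hash that was current at time `when`.

    `snapshots` is assumed to be sorted by timestamp, with one entry per
    event in that person's history.
    """
    times = [t for t, _ in snapshots]
    i = bisect.bisect_right(times, when)  # entries with timestamp <= when
    if i == 0:
        raise LookupError("person had no history at that time")
    return snapshots[i - 1][1]


def verify_id(presented: str, on_record: str) -> bool:
    # The verifier never sees the account itself, only two hashes.
    # compare_digest avoids leaking the position of the first mismatch
    # through timing.
    return hmac.compare_digest(presented.encode("utf-8"), on_record.encode("utf-8"))
```

The value returned by `id_hash_at` is what would go into the PERSON field of an “added/removed” event, so each person’s history ends up entangled with the histories of the people they connect to.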

Do you think such a “social snapshot” as an ID can make identity theft difficult? There are lots of corner cases to handle and improvements that can be made to the basic scheme described above, but my question is only about the viability of the scheme.

Computing the hash at any given time would not require the whole history string to be available. I described the hash as though it were computed on one long string for simplicity, but we only need to compute a recursive hash – i.e. when a new event happens, a new hash is generated by hashing together the most recent hash and the new event record. This way, you don’t have to keep a long running string anywhere in the social network provider’s backend.
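A minimal sketch of the recursive hash, assuming SHA-256, an all-zero starting value and a newline separator (all three are my choices for illustration). Note that the chained value is not the same number as a hash over one long string; it simply serves the same purpose while needing only the previous hash.

```python
import hashlib
from typing import Iterable

# Assumed starting value before the first event is recorded.
GENESIS = "0" * 64


def extend(previous_hash: str, event_record: str) -> str:
    # New hash = H(previous hash + separator + new event record).
    data = previous_hash.encode("ascii") + b"\n" + event_record.encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def replay(event_records: Iterable[str]) -> str:
    # Recompute the running hash from scratch, e.g. to generate an ID
    # retrospectively from an existing account's stored history.
    running = GENESIS
    for record in event_records:
        running = extend(running, record)
    return running
```

In day-to-day operation the provider would store only the latest hash and call `extend` once per new event; `replay` is only needed when bootstrapping from a stored history.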

If the provider’s user interface and APIs never reveal time information to sub-second accuracy, while a millisecond-accurate time stamp is used in the hash, an attacker who knows everything else about an event still has to guess the hidden milliseconds, which makes the hash somewhat harder to reproduce.
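A rough illustration of how much guessing this adds, under the assumption that the attacker knows every other field of every event and only lacks the millisecond part of each time stamp. The function names and the record format are hypothetical.

```python
import hashlib


def displayed_seconds(ts_ms: int) -> int:
    # What the UI and APIs would reveal: whole seconds only.
    return ts_ms // 1000


def event_hash(kind: str, ts_ms: int) -> str:
    # The hash input uses the full millisecond time stamp.
    return hashlib.sha256(f"{kind}|{ts_ms}".encode("utf-8")).hexdigest()


def candidate_histories(num_events: int) -> int:
    # Each event has 1000 possible millisecond values per displayed second,
    # so the number of histories consistent with the visible data grows as
    # 1000 ** num_events.
    return 1000 ** num_events
```

Two events that look identical at one-second resolution, such as `event_hash("checkedin", 1700000000123)` and `event_hash("checkedin", 1700000000124)`, produce unrelated hashes, and a history of just three events already leaves a billion candidates.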

(PS: My original Google+ post on this is here.)