A GQL query cannot perform a SQL-like "join". How should I work around this?
> > > his users HAVE_MANY friends, so there's some way to do joins
> > This is not necessarily a join, an entity can contain a list of keys
> > to other entities.
> > BigTable has some join capability so I suspect they will eventually
> > introduce something.
You are looking for read-time functionality. Everything about how
Google works is spending disk space to save request time. Push your effort
into pre-computing things at write time and you will be going with the grain of BigTable.
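To make "pre-compute at write time" concrete, here is a minimal sketch in plain Python. It is not the App Engine datastore API: the `store` dict, `put_user`, `add_friend` and `friend_names` are all hypothetical names standing in for entities and puts. The idea is that each user carries a list of friend keys plus a copied list of friend names, so the read path is a single lookup with no join.

```python
# Hypothetical in-memory stand-in for a datastore: key -> entity dict.
# This is an illustration only, not the real App Engine API.
store = {}


def put_user(key, name):
    """Create a user entity that carries its own friend lists."""
    store[key] = {"name": name, "friend_keys": [], "friend_names": []}


def add_friend(user_key, friend_key):
    """Write path: do the 'join' work once, when the friendship is made."""
    user, friend = store[user_key], store[friend_key]
    if friend_key not in user["friend_keys"]:
        user["friend_keys"].append(friend_key)
        user["friend_names"].append(friend["name"])  # denormalised copy
    if user_key not in friend["friend_keys"]:
        friend["friend_keys"].append(user_key)
        friend["friend_names"].append(user["name"])  # denormalised copy


def friend_names(user_key):
    """Read path: one lookup, no join."""
    return list(store[user_key]["friend_names"])


put_user("u1", "Alice")
put_user("u2", "Bob")
put_user("u3", "Carol")
add_friend("u1", "u2")
add_friend("u1", "u3")
```

The price is paid on writes: if a user changes their name, every copy of it in other users' `friend_names` has to be rewritten too.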
>> In the previous example of:
>> Select * from Person,Contact
>> Where Person.ContactID = Contact.ID
>> What exactly would you do?
Merge the concepts of Person and Contact, for starters.
In the dim dark past when relational databases came to the fore, disk was expensive. So we made sure to slice and dice things such that there was no wasted space. Thus instead of optional fields, you created separate tables so that the optional fields could be pulled in using a join.
In this new world where disk space is effectively free, merge these previously split concepts so that the optional fields live in the main object. That is why you keep seeing denormalisation bandied about this group as a tactic for dealing with BigTable. Make a few large entities with optional fields, instead of lots of small entities.
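As a rough sketch of what the merged entity might look like, here it is in plain Python rather than as a datastore model. The field names (`email`, `phone`, `address`) and the `contact_card` helper are assumptions for illustration, since the thread never says what a Contact holds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """One large entity: what used to be the Contact table is now
    a set of optional fields (hypothetical names) on Person itself."""
    name: str
    email: Optional[str] = None
    phone: Optional[str] = None
    address: Optional[str] = None


# Hypothetical stand-in for the datastore: id -> entity.
people = {
    "p1": Person(name="Alice", email="alice@example.com"),
    "p2": Person(name="Bob"),  # no contact details; fields stay empty
}


def contact_card(person_id):
    """Everything the old Person/Contact join returned, in one fetch."""
    p = people[person_id]
    return {"name": p.name, "email": p.email, "phone": p.phone}
```

A person with no contact details simply leaves those fields empty, which is the wasted space the relational design was trying to avoid and which no longer matters much.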
This is the same lesson we had to learn with RPC. With normal procedure calls, having lots of small calls with a few parameters made sense, because stack space needed to be conserved, and the latency of a local call is almost nothing. With RPC, each individual call is expensive, both in computational terms at each end for serialisation and deserialisation and also in raw network latency. Suddenly you had to change the shape of the functions. Instead of lots of little calls, you suddenly had a few calls that returned lots of data. Cheaper to return a copy of the world than make fifty calls to get the small part of the world you were interested in.
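The RPC point can be shown with a toy that counts round trips. `FakeRemote` is a made-up class, not any real RPC library; it only increments a counter where a real client would pay for serialisation and network latency.

```python
class FakeRemote:
    """Hypothetical remote service that counts round trips."""

    def __init__(self, data):
        self._data = data
        self.round_trips = 0

    def get_field(self, key, field):
        """Chatty style: one call per small piece of data."""
        self.round_trips += 1
        return self._data[key][field]

    def get_entity(self, key):
        """Chunky style: one call returns the whole entity."""
        self.round_trips += 1
        return dict(self._data[key])


remote = FakeRemote(
    {"p1": {"name": "Alice", "email": "alice@example.com", "phone": "555-0100"}}
)

# Lots of little calls: one round trip per field.
chatty = {f: remote.get_field("p1", f) for f in ("name", "email", "phone")}
chatty_trips = remote.round_trips

# One big call: same data, a single round trip.
remote.round_trips = 0
chunky = remote.get_entity("p1")
chunky_trips = remote.round_trips
```

Both styles end up with the same data, but the chatty version pays the per-call cost three times and the chunky version once.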
> Select * from Person,Contact
> Where Contact.PersonID = Person.ID
To fit with this new model, you have to give up normalization for ease of scalability. This can be tricky in some circumstances, but surely there's no need for a separate Contact table/model in this case? This seems like a trivial "denormalization" to fit the BigTable model.
If you want to read more about BigTable, the whitepaper discusses the theory and practice nicely [1].