Is future keyworded ?

Introduction

That’s curious. After many years of study, computer scientists settled on ways to structure data and to turn that data into information by reading it from common data structures, tables or whatever. Of course I’m simplifying here to get to the point of this post, but data structures are a huge field of study in Computer Science and a key part of any system. The way you design a system is always coupled with the way you persist, model and query its data.

Well, that said, the fact is that today we are going through a “revolution” in the way we deal with data. The way we do persistence is changing. The way we store data is changing. The relational model that everyone thought was “good enough” for database design is changing too. And these changes are everywhere.

Analysis

These changes are not limited to databases; they also affect other aspects of software development. (Agile methodology?)

For example:

  • Web Services: JSON vs. XML (XML Schema, XSLT, etc.), or REST vs. SOAP style
  • Databases: CouchDB vs. Neo4J vs. RDBMS

In these cases we can clearly see that, after years of using these technologies in our systems, and with everything today going “Web X.0” or “SOA enabled” (or any of the “new wave techs” we hear about everywhere), the fact is that things had to change.

These two new “database engines”, CouchDB and Neo4J, are very promising and are already carving out their space. And remember that CouchDB is still in the Apache Incubator… It’s an important asset for NoSQL enthusiasts. In the other example, JSON/REST web services are growing fast too.

But the point here is: aren’t we missing something?! I mean, is it really that simple to spin up a new database instance to support a high-volume system?! Is it really a good idea to forget about data types and try to do everything with keywords, like JSON and strings? Dynamic languages like Ruby, Python and Groovy hide complexity from the developer, and according to the many videos around the web teaching “how to develop” with these languages, you don’t need more than 15 minutes to create a web system with data access, CRUD operations and some Ajax. This is a real improvement in software development, since things become clearer and easier to get done. But don’t you like to build something yourself instead of delegating all that responsibility to some crazy framework or language that you learned yesterday? I really like Python, Ruby and Groovy, but sometimes things get done so “magically” that I have the feeling I’m missing something. There are some counterpoints to all these new-generation technologies that must be considered, and that most of “them” (I’m still thinking of a name for them) don’t see.
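
To make the point about data types concrete, here is a minimal Python sketch of what happens when a typed record goes through JSON. The `order` record and its field names are made up for illustration; only the standard library is used.

```python
import json
from datetime import date
from decimal import Decimal

# Hypothetical order record that uses rich Python types.
order = {"id": 42, "total": Decimal("19.90"), "placed_on": date(2009, 5, 1)}

# json.dumps cannot serialize Decimal or date by itself,
# so we fall back to str() for anything it does not know.
payload = json.dumps(order, default=str)

# After the round trip the receiver only sees numbers and strings:
# the money amount and the date have both become plain text.
decoded = json.loads(payload)
print(decoded)
```

Nothing in the payload tells the receiver that `total` was a decimal amount or that `placed_on` was a date; that knowledge has to live somewhere else, in code or in a convention both sides agree on.
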
I’ll make a quick list off the top of my head, and I’ll probably make some mistakes since I’m not an expert in any of these technologies, but comments discussing the concepts are welcome.

  • Data structure: Relational databases are still necessary. Loosely coupled, non-relational data is great for high-traffic websites, but commercial systems will store data in a relational manner sooner or later; this is the destiny of most data. Of course, it’s an elegant solution to use these NoSQL (or “NoREL”) databases as a front end that collects all the data, which is then parsed, validated and consolidated in a relational database. But, IMHO, there are people selling this as the most revolutionary… Thank God, there are really good people involved in the NoSQL movement who say the same thing: they don’t want to take the place of relational databases; the idea is to have one more architectural option for some situations.
  • Web Services: Sending data without the well-known XML overhead is awesome and fast. But you must not forget that some backend validation will still be necessary: sometimes you’ll save a lot of time building your RESTful interface with JSON, but then spend a good amount of time designing mechanisms to convert and validate it into other structures. Validation is my main concern here, since a SOAP/XML service designed from a WSDL/XSD already offers a good validation mechanism; these specs were built with that in mind. My advice is to take a deep look at your requirements before choosing a web service without contracts (WSDL) or XML as input: check that you really need that performance and that you have the time available to build the validation on the backend.
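
The two points above can be sketched together in a few lines of Python: schemaless JSON documents arrive, get validated by hand on the backend, and the good ones are consolidated in a relational table. This is only a rough sketch under my own assumptions: the `customer` contract, the field names and the sample documents are all invented, and SQLite stands in for whatever relational database you actually use.

```python
import json
import sqlite3

# Hypothetical contract for a "customer" document: field name -> required type.
CUSTOMER_FIELDS = {"name": str, "email": str, "age": int}


def validate_customer(doc):
    """Return a list of problems; an empty list means the document is acceptable."""
    errors = []
    for field, expected in CUSTOMER_FIELDS.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(doc[field], bool) or not isinstance(doc[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors


def consolidate(raw_documents, conn):
    """Parse, validate and insert each JSON document; return the rejected ones."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer ("
        "name TEXT NOT NULL, email TEXT NOT NULL, age INTEGER NOT NULL)"
    )
    rejected = []
    for raw in raw_documents:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError as exc:
            rejected.append((raw, [f"invalid JSON: {exc.msg}"]))
            continue
        if not isinstance(doc, dict):
            rejected.append((raw, ["expected a JSON object"]))
            continue
        errors = validate_customer(doc)
        if errors:
            rejected.append((raw, errors))
            continue
        conn.execute(
            "INSERT INTO customer (name, email, age) VALUES (?, ?, ?)",
            (doc["name"], doc["email"], doc["age"]),
        )
    conn.commit()
    return rejected


# Made-up documents, as they might come out of a schemaless store.
incoming = [
    '{"name": "Ana", "email": "ana@example.com", "age": 31}',
    '{"name": "Bob", "email": "bob@example.com", "age": "unknown"}',
    '{"name": "Eve"}',
]
conn = sqlite3.connect(":memory:")
rejected = consolidate(incoming, conn)
accepted = conn.execute("SELECT name, age FROM customer").fetchall()
```

Here only the first document reaches the table; the other two are returned with the reasons they were rejected. All of `validate_customer` is work that an XSD would have described declaratively, which is exactly the time you need to budget for when you drop the contract.
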

Conclusion

I started this post at the end of 2008 and was only able to finish it today. Some ideas here have probably changed, but the main message is still the same: all these new technologies are great allies when designing a new system, but they are not the solution to all your problems. Human history is a cycle and we always return to the same point, and the same thing is happening in IT. We evolved to relational data for a reason and to structured data for another reason, and now we are moving away from both without a clear reason.

One thought on “Is future keyworded ?”

  1. I’m not an expert in these technologies either (or in any other, really), but at least from the database point of view, these new approaches aim to solve different problems.

    I think 99% of us will keep using an RDBMS, because we live in an immediate, highly controlled world: it is hard to imagine the transactions of a checking account, for example, being persisted under the “eventually consistent” scheme of a NoSQL engine. On Twitter, though, which adopted this approach, it is no problem if your friends’ new messages take a few seconds to show up.
    Each approach commits more to some aspects and less to others (CAP: ACID vs. BASE).

    Going back to the first paragraph, I don’t put quotes around “new” because they really are new: although it looks like a regression to unstructured data, we can’t compare the current state (in-memory data, distributed across servers, with high availability) with what we had 20 years ago, before relational databases became dominant: sequential files, without an index, stored on a reel of magnetic tape that had to be found and mounted on a reader before any access could happen…

    On REST/JSON vs SOAP/WSDL, it is harder to find a dividing line. I have used both in the opposite way to what you described: the former carrying data structures (and not just “property bags”), and almost never using WSDL/XSD “validation”, which really does deserve the quotes, since it is a rather poor feature.
    Maybe I still need to do a mental exercise (or get to know someone from Caelum) to see how this approach could fit the design of a large enterprise system, but I think the main difference here is not so much how the information travels (or its structure), but whether you need more complex mechanisms such as the ones WS-* gives you (ordering, addressing, transactions, etc.).

    Payload validation got a great external approach with Schematron (which was adopted by ISO/IEC). I haven’t researched it, but if nothing similar exists for JSON, how about we create something? We could even become a project incubated by Apache :-)

    I was going to write in English, but I got lazy, sorry.

    Cheers!
