On November 17th and 18th we attended the fifth edition of Big Data Spain, the biggest big data conference in Spain, organized by Paradigma. The event serves as a meeting point for exchanging knowledge about innovative technologies.
Óscar Méndez, CEO of Stratio, opened the conference and explained how Big Data Spain has grown over the years. It began in 2012 with 220 attendees and 14 talks; this year it had 1,100 attendees and 65 talks and workshops.
First round: Between market lessons and trends
After the welcome, it was the turn of Paco Nathan, leader of the O’Reilly Learning team. In his keynote, Nathan not only highlighted his company’s efforts to spread knowledge and trends in distributed systems, machine learning, predictive modeling and cloud computing; he also showed examples, facts and caveats about the impact that artificial intelligence will have over the coming years on people working in data science, machine learning, distributed systems, cloud technologies or DevOps. Quoting Pedro Domingos, author of «The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World», Nathan argued that the future belongs to those who understand, at a very deep level, how to combine their unique (and human) expertise with what algorithms do best.
Ignacio Bernal, Global Head of Architecture and IT Innovation at BBVA, gave an interesting talk about the need for a complete digital transformation at BBVA: “We must change our mission and our relationship with the customers”. The bank’s solution is a global cloud platform developed by different startups such as Datio, Beeva and Iventia. For this digital transformation, BBVA needs a new technology stack in which the best developers take ownership of their work.
The first block of talks of the day ended with Alan Gates, co-founder of Hortonworks, who explained how his company, using Apache Hadoop together with many tools (such as Hive, Spark, Kafka, NiFi and Storm) and projects (such as Apache Ranger and Apache Atlas), is responding to market trends in data processing and querying, which changes it is pushing into the Hadoop ecosystem, and how it is optimizing the platform for the cloud.
The human brain is still the key
In the talk Scaling data engineering, Michael Hausenblas described the need to manage data coming from multiple business areas, whose volume grows year after year. This brings both opportunities and challenges: real-time data, a variety of sources, etc.
He also concluded that batch and stream processing technologies (Spark, Storm, Hadoop, Flink, Samza, etc.) need to be unified.
The key point of the talk was the rise of technologies that offer a PaaS agnostic of both the underlying hardware and the frameworks that run on it.
Stefan Kolmar talked about polyglot architectures and why they are needed in today’s technological environments.
Nowadays it is unusual to find a solution based on a single database technology; combining several database technologies usually gives better results.
In reality, any technology platform has to deal with large amounts of data, high availability requirements, structured data whose connections should be exploited, the ability to connect data on demand, and the flexibility to connect data with different schemas.
Relational databases have advantages, but they also have drawbacks: users need to know the business, complex relationships between data are hard to establish, and queries need to be optimized.
The highlight of the talk came when he asked what the most powerful database on the market was… the answer was the human brain, because of its ability to establish relationships between concepts and data.
From there he went on to present the benefits of graph-oriented databases: the concepts of nodes and relationships, and how they make it possible to describe the relationships between nodes in much more detail.
The advantages are many (more efficient operations, flexibility, etc.), and so are the use cases: fraud detection, parcel delivery or recommendation engines.
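To make the node and relationship idea more concrete, here is a minimal sketch in plain Python of the fraud-detection pattern mentioned in the talk. It is not the API of any real graph database: `PropertyGraph`, `shared_identifier_accounts` and the sample accounts, phone and device are hypothetical. Accounts are linked to the identifiers they use, and a two-hop traversal finds other accounts that share a phone or device with a given one.

```python
from collections import defaultdict


class PropertyGraph:
    """Hypothetical toy graph: labeled nodes joined by typed, undirected relationships."""

    def __init__(self):
        self.labels = {}               # node id -> label, e.g. "Account" or "Phone"
        self.edges = defaultdict(set)  # node id -> {(relationship type, neighbor id)}

    def add_node(self, node_id, label):
        self.labels[node_id] = label

    def relate(self, a, rel_type, b):
        # Store the relationship in both directions so it can be walked either way.
        self.edges[a].add((rel_type, b))
        self.edges[b].add((rel_type, a))

    def neighbors(self, node_id, rel_type=None):
        return {n for r, n in self.edges[node_id] if rel_type is None or r == rel_type}


def shared_identifier_accounts(graph, account_id):
    """Two hops: account -> identifier it USES -> other accounts using that identifier."""
    linked = set()
    for identifier in graph.neighbors(account_id, "USES"):
        linked |= {n for n in graph.neighbors(identifier, "USES")
                   if graph.labels.get(n) == "Account"}
    linked.discard(account_id)
    return linked


# Hypothetical example data: three accounts, one phone number, one device.
g = PropertyGraph()
for acc in ("A1", "A2", "A3"):
    g.add_node(acc, "Account")
g.add_node("P1", "Phone")
g.add_node("D1", "Device")
g.relate("A1", "USES", "P1")
g.relate("A2", "USES", "P1")
g.relate("A2", "USES", "D1")
g.relate("A3", "USES", "D1")

print(sorted(shared_identifier_accounts(g, "A2")))  # ['A1', 'A3']
```

In a relational schema the same question would typically need one join per hop; in a graph model each hop is a walk along stored relationships, which is the kind of efficiency the talk pointed to.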
Chris Fregly introduced the technology developed by his company, pipeline.io, based on a deployment and testing model for machine learning in which model training is optimized through continuous training, either incremental or partial.
Another advantage of this technology stack is that it can run both in the cloud and on premises.
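As an illustration of what incremental (or partial) training means, here is a minimal sketch, not pipeline.io’s implementation: a linear model that is updated with one stochastic gradient step per incoming mini-batch instead of being retrained from scratch on all the data. The class `OnlineLinearRegression`, the learning rate and the synthetic data stream (y = 2x + 1) are assumptions made for illustration.

```python
import random


class OnlineLinearRegression:
    """Hypothetical model y ~ w.x + b, trained incrementally: one SGD step per mini-batch."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def partial_fit(self, batch):
        """Update the current weights using only this batch of (x, y) pairs."""
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, y in batch:
            err = self.predict(x) - y  # derivative of 0.5 * err**2 w.r.t. the prediction
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        n = len(batch)
        self.w = [wi - self.lr * gw / n for wi, gw in zip(self.w, grad_w)]
        self.b -= self.lr * grad_b / n


# Simulated data stream (assumption): y = 2x + 1, arriving in mini-batches of 16.
rng = random.Random(0)
model = OnlineLinearRegression(n_features=1)
for _ in range(2000):
    batch = [([x], 2 * x + 1) for x in (rng.random() for _ in range(16))]
    model.partial_fit(batch)

print(round(model.w[0], 2), round(model.b, 2))  # close to 2.0 and 1.0
```

Some libraries expose the same idea directly; for example, scikit-learn’s `SGDRegressor` provides a `partial_fit` method for updating an existing model with new batches.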
AI, machine learning, real time, cloud and more about data analytics
The talk Prepping your analytics organization for Artificial Intelligence by Ramkumar Ravichandran was quite enjoyable. He set out to present artificial intelligence as something positive for society, referring to science fiction films that have given it a negative and toxic image.
He explained the difference between artificial and analytical intelligence.
Similarly, he stressed that not every problem is an artificial intelligence problem, and that artificial intelligence is not suitable for every problem.
As an example, he described three levels of artificial intelligence, comparing their actual use in real life with their portrayal in films:
- Narrow artificial intelligence: a specific task. Movies: killer drones. Real life: Google SEO.
- General artificial intelligence: human-like behavior. Movies: Terminator. Real life: autonomous cars.
- Artificial superintelligence: behavior beyond what a human can do. Movies: Skynet. Real life: Google Now.
In AI with the Machine Learning Canvas, Louis Dorard spoke about a framework that connects data collection, machine learning and value creation. In the context of data and artificial intelligence, a canvas or framework can be very useful to describe the learning that takes place in intelligent systems:
- What data are we learning from?
- How are we using the predictions powered by that learning?
- How do we make sure the whole system keeps working well throughout its life cycle?
In his second talk, Computable Content with Jupyter, Docker, Mesos, Paco Nathan focused on the Jupyter environment (and only briefly on Docker and Mesos). On the training side, he showed three online sites that make learning easier:
- https://www.safaribooksonline.com/oriole/ (an initiative in which Paco Nathan has taken part personally)
Returning to Jupyter, Nathan described it as a web interface that improves on the notebook alternative, and gave several reasons: code and video kept in sync, a Docker container running in the cloud for each web session, and 100% HTML views.
Following a tutorial style, in his talk Building a Complete, End-to-End Batch and Real-time Recommendation Engine, Chris Fregly, Research Scientist at PipelineIO, gave a live demonstration of how a real-time recommendation system works. Attendees were asked to take part by choosing three actors and actresses on the website http://demo.pipeline.io, which collected the requests and offered similar suggestions based on each individual selection.
In Big data in 140 characters, Joe Rice, Data Sales & Partnerships EMEA at Twitter, showed how his company has been tied, since its beginnings, to the technology stack we call big data, which touches every aspect of our lives.
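The internals of the demo were not covered in detail here, so the following is only a sketch of one common technique, item-to-item collaborative filtering with cosine similarity, applied to the same interaction: a user picks three actors and gets similar ones back. The `likes` data, the actor names and the `recommend` function are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical implicit feedback: the actors each user has selected in the past.
likes = {
    "u1": {"actor_a", "actor_b", "actor_c"},
    "u2": {"actor_a", "actor_b", "actor_d"},
    "u3": {"actor_b", "actor_c", "actor_d"},
    "u4": {"actor_e", "actor_f"},
    "u5": {"actor_a", "actor_d", "actor_e"},
}


def fans_by_item(likes):
    """Invert user -> items into item -> set of users (each item's binary vector)."""
    fans = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            fans[item].add(user)
    return fans


def cosine(a, b):
    """Cosine similarity between two binary vectors represented as sets of users."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(selected, likes, k=2):
    """Score every unselected item by its summed similarity to the selected items."""
    fans = fans_by_item(likes)
    scores = {
        item: sum(cosine(users, fans.get(s, set())) for s in selected)
        for item, users in fans.items()
        if item not in selected
    }
    # Highest score first; ties broken alphabetically for a deterministic result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


print(recommend({"actor_a", "actor_b", "actor_c"}, likes))  # ['actor_d', 'actor_e']
```

With actor_a, actor_b and actor_c selected, actor_d ranks first because it shares the most users with the selection. A real-time system would typically precompute the similarities in batch and only run the cheap scoring step per request.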
The networking triumph
As every year, Big Data Spain generated high expectations, and even some hype, around its exhibition and announcements, but the truth is that for many attendees the best part of the event is the networking in the common areas, where peers exchange their opinions about the talks and technology trends throughout the day.
Finally, this edition of BDS was also very enriching for everyone at Datio, because for the very first time we took part not only as attendees of the talks but also as exhibition sponsors, sharing a stand with BBVA Data & Analytics, i4s and Beeva. And we’ll be there next year, as big data keeps expanding and Datio continues to develop in this growing ecosystem. See you at #BDS17!