Our times are marked by the digital age, globalization and the huge volume of data generated every day. Big Data brings great challenges and even greater opportunities.
Beyond the famous 5 Vs that characterize it (volume, velocity, variety, veracity and value), Big Data has great potential in the most unexpected fields, above all because new technologies have emerged that meet the need to store and analyze big data with unprecedented efficiency.
Many technologies and concepts belong to this universe of big data, whose growth is unstoppable, such as the Internet of Things (IoT), machine-to-machine (M2M) data exchange, increasingly complex IT environments and predictive machine learning.
The importance of processing data
The challenges of Big Data require approaches and systems capable of collecting and storing data, searching it efficiently and, finally, analyzing it so that the results can be displayed conveniently.
But these challenges come with a whole world of opportunities for many different actors, whether in science, in business or in public organizations.
The knowledge revolution has arrived and the goal is clear: to extract value from data. Here, in addition to innovative technologies, data science and the figure of the data scientist come into play.
It is, in short, a matter of applying technologies and creating ad hoc strategies and methodologies to implement complex algorithms that give us the insight needed for better decision-making.
Not surprisingly, data is of little use unless we implement custom solutions. Without a purpose and a technology capable of managing it, the value of data is zero. In contrast, processing that information toward a defined goal is good data management and yields competitive advantages.
Simply finding the information sought is, in itself, a great success, the key that allows us to advance toward our goal. Data is therefore not an end in itself, but the best way to reach the truly valuable information that will make the difference.
The goals can be of very different kinds, from monetizing the information to making it an effective tool for improving governance, as in smart city projects.
Machine learning, predictive artificial intelligence
Within this complex but exciting context, machine learning, the fashionable branch of artificial intelligence devoted to learning by machines, is based on systems that learn automatically.
By learning we mean identifying complex patterns in millions of data points. Basically, the machine is able to predict behaviour by means of an algorithm that “learns” from examining the data.
The peculiarity of these prediction methods is that they rely on algorithmic approaches in which the certainty of a theoretical model gives way to approximate models based on probability and statistics. This probabilistic way of modelling reality is the one our brain follows, supported by its large computing capacity. Absolute certainty does not exist for our brain; each of us interprets reality and adjusts it according to a certain probability, as required at the time.
Since such a model is free of absolute certainty, its confidence level will move within a given range that is considered to offer a significant level of effectiveness. What is decisive here is the practical usefulness that a certain percentage of correct answers can provide.
And the results can be spectacular, as two of its greatest achievements demonstrate: Google's voice recognition and Facebook's facial recognition. In both cases, the actual model underlying each problem is never reached; instead, machine learning algorithms are applied to obtain approximate models with an acceptable margin of error and very high computing speed. It is this computing speed that we pursue in most cases, seeking a very small prediction error in a time acceptable to the application.
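The idea of an approximate, probability-based model judged against an acceptable margin of error can be sketched in a few lines. The example below is only an illustration under stated assumptions: the data is invented, logistic regression is chosen here as one simple probabilistic model (the text does not name a specific algorithm), and the names `train_logistic`, `error_rate` and `ACCEPTABLE_ERROR` are hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Average gradient of the log-loss over the training examples.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_proba(model, x):
    """Return the estimated probability that x belongs to class 1."""
    w, b = model
    return sigmoid(w * x + b)


def error_rate(model, xs, ys):
    """Fraction of examples whose thresholded prediction is wrong."""
    wrong = sum((predict_proba(model, x) >= 0.5) != bool(y) for x, y in zip(xs, ys))
    return wrong / len(xs)


# Invented toy data: one numeric feature and a binary label.
train_x = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_x = [0.8, 1.2, 3.8, 4.2]
test_y = [0, 0, 1, 1]

# Assumed tolerance: the model is "useful" if it errs on at most 25% of cases.
ACCEPTABLE_ERROR = 0.25

model = train_logistic(train_x, train_y)
test_error = error_rate(model, test_x, test_y)
is_useful = test_error <= ACCEPTABLE_ERROR
```

The model never outputs certainty, only a probability for each case; whether it is good enough is decided by comparing its error on unseen data with the margin the application can tolerate.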
Machine learning applications
The fields of application of machine learning are endless. Sectors such as e-commerce and marketing in general are just a tiny sample of how much a machine learning project can offer.
When planning any initiative, imagination can play a big role, with no restrictions other than legality and ethics. The scope, in short, depends on the leeway, budget and data available.
Data science teams have a great ally in machine learning. Although hybrid approaches exist, these are self-learning systems that operate within an ocean of data without further programming.
Machine learning is, for example, at the heart of the recommendation systems of internet giants such as eBay, Amazon, Twitter, Facebook and LinkedIn, as well as a host of projects in fraud detection, data communications networks, voice recognition, machinery breakdowns and technological equipment failures, and algorithms for predicting disease, leads, crimes or consumer trends.
Machine learning applications in the smart city
In the smart city, any progress in this area could have very interesting applications, from facial or voice recognition in programs for the social inclusion of people with disabilities to, say, a mobile application whose behaviour adapts to the preferences and needs of each user.
Predicting urban traffic or making a medical pre-diagnosis based on the patient’s symptoms are other examples of projects that could arise in the smart city environment, exploiting the great potential of anonymous data.
Far from being useless, such data can contribute a great deal of value, as the following three examples show, focused on improving public health, sustainable mobility and safety in cities.
First, American psychologists found useful information for improving preventive health policies. They did so by analyzing the optimistic or pessimistic tone of tweets in different geographical areas and establishing a correlation with death rates from heart problems.
By overlaying a map generated from the tweets on a map of mortality from coronary disease, they found striking similarities. By analyzing 148 million tweets from 1,347 US counties, they predicted rates of heart disease more effectively than traditional risk factors, including obesity, diabetes or smoking.
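As a rough illustration of what establishing a correlation between tweet tone and mortality involves, the sketch below computes a Pearson correlation coefficient between two series. Everything in it is an assumption made for the example: the per-county values are invented, the variable names are hypothetical, and the study itself may have used a different statistical method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented values, one entry per hypothetical county:
# share of tweets with a pessimistic tone, and heart-disease deaths
# per 100,000 inhabitants.
pessimistic_share = [0.10, 0.25, 0.30, 0.45, 0.60]
heart_mortality = [38.0, 42.0, 41.0, 50.0, 55.0]

r = pearson(pessimistic_share, heart_mortality)
```

A coefficient close to 1 means the two maps rise and fall together; it describes areas as a whole and says nothing about any single person.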
The conclusion of the study, conducted by scientists at the University of Pennsylvania, leaves no doubt: the social and spatial environment has a decisive influence on coronary problems. Ultimately, an analysis at the level of society cannot be applied to individuals, but it is of great importance for implementing ad hoc policies, and even for tracking the results once campaigns have been carried out.
In the design of smart cities, on the other hand, sustainable mobility is one of the major objectives. In this regard, the IEEE International Workshop on Urban Mobility and Intelligent Transportation Systems (UMITS 2016) was a landmark event at which leading initiatives for urban mobility and intelligent transportation systems were presented.
At the event, the paper “Understanding Daily Mobility Patterns in Urban Road Traffic Flow Networks using Analytics” was presented. It was developed by Tecnalia within a project for a cloud platform of services and tools for intelligent mobility in the context of smart cities.
One of the strengths of the project lies precisely in the development of algorithms for predicting different traffic variables (such as the intensity of vehicles on a road or its occupancy), fed by real-time data from the city of Madrid.
According to its creators, applying machine learning techniques to study the different factors affecting traffic made it possible to improve the effectiveness of the predictions.
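To give a feel for what predicting a traffic variable from a stream of readings means, here is a deliberately simple sketch. It uses simple exponential smoothing, which is a stand-in chosen for brevity and not the method used in the Tecnalia project; the readings and the name `exp_smooth_forecast` are invented for illustration.

```python
def exp_smooth_forecast(readings, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing.

    alpha close to 1 follows the latest reading closely;
    alpha close to 0 favours the long-run level.
    """
    if not readings:
        raise ValueError("need at least one reading")
    level = readings[0]
    for value in readings[1:]:
        # Blend the newest reading with the level estimated so far.
        level = alpha * value + (1 - alpha) * level
    return level


# Invented sensor readings: vehicles per hour on one road segment.
intensity = [420, 450, 480, 510, 530]

forecast = exp_smooth_forecast(intensity, alpha=0.5)
```

Because each new reading only updates a single running value, the same calculation can be repeated as real-time data arrives. A production system would typically add further inputs, such as time of day or weather, which is where machine learning techniques come in.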
Finally, a joint effort by Fondazione Bruno Kessler (FBK), MIT and Telefónica R&D stands out as one of the main achievements of machine learning in anticipating social risks from the analysis of human behaviour.
In this case, the risk is crime. The project, “Crime Hot Spots”, mines data generated by smartphones in order to detect future crime scenes. Specifically, it could predict in which district of the city of London a crime is most likely to occur, with an accuracy of about 70 percent.
Compared to conventional systems, it represents a huge step forward. Instead of relying on the costly and time-consuming collection of crime statistics and local demographics, it uses a record compiled from crime and demographic statistics from city sources, along with data emitted by mobile phones, which provide key information about their owners, such as their real-time geolocation, sex and age.
After a phase of refining the system to ensure the anonymity of the data, and after adapting it to other cultural environments, the important step will be to make the information available for public use. Its creators are not wrong when they claim that their results could be of great interest to governments and security forces.