It might appear at first glance that data scientists have some sort of magical quality to them. They are given a data set, wave a wand over it, and produce insights out of thin air that transform the business and take it to greater heights. However, looks really are deceiving: there is so much more that actually goes into the data science process and the building of a machine learning model.
Machine learning teaches computers to do what comes naturally to humans and animals, which is to learn from experience, by finding patterns in data that generate insight. These models are used daily to make the critical decisions that determine whether a business takes off or stays stuck on the ground. Businesses such as YouTube and Netflix rely on machine learning algorithms to sift through millions of options to give users video or movie recommendations. Ever wondered why your car insurance costs less if you pay your account on time? Data scientists in the insurance industry found that people who pay their accounts promptly are less likely to be involved in accidents. In short, these invaluable insights are generated by machine learning. The question to ask is how it comes up with these insights. Where does it all start? This post does not attempt to offer a machine learning workflow but rather discusses key considerations to keep in mind to ensure that you start the machine learning process off correctly and approach it in the most efficient way.
A specific question
Always start a machine learning project with a clear question in mind. Asking the right questions (which can be thought of as generating hypotheses) will result in answers that solve known business problems or even unearth unknown insights that create business value. As you start to formulate your question, consider what it is that you are trying to achieve. Generating the right questions to help further the business is crucial, which is why a solid understanding of the business is so important. For example, traffic congestion is something that affects us all in countless ways, wasting both time and money. A good, clear question around this issue would be “How can the duration of traffic lights be optimised by using data on traffic patterns, weather, and pedestrian traffic?”
Clear goals with measurable outcomes
Machine learning implementation requires a definite purpose and corresponding goals. Only once there are well-defined goals can you begin to survey the available resources and all the possibilities for moving toward these goals. It is important to decide how you will define success and how you will measure whether you have achieved your goal. In the case of the optimisation of traffic light duration, an example of measurable outcomes may be to see a:
- % decrease in the time people spend commuting
- % decrease in length of traffic jams
These goals are both easy to define and to measure. If the model is successful, we should see a % decrease in both of these metrics.
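To make the idea of a measurable outcome concrete, here is a minimal sketch of how a percentage decrease could be calculated. The function name and the commute figures are hypothetical and chosen purely for illustration; they are not real measurements.

```python
def percent_decrease(baseline: float, current: float) -> float:
    """Return the percentage decrease from baseline to current.

    A negative result means the metric increased instead.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) * 100.0 / baseline


# Hypothetical average commute times in minutes (illustrative only)
baseline_commute_min = 40.0  # before the model was deployed
pilot_commute_min = 34.0     # during the pilot

commute_drop = percent_decrease(baseline_commute_min, pilot_commute_min)
print(f"Commute time decreased by {commute_drop:.1f}%")
```

Agreeing on the baseline period before the project starts matters just as much as the formula, because the result is only meaningful relative to that baseline.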
The correct data to answer the question
The saying “Garbage in, garbage out” applies. A model cannot be very effective if it is fed bad data, and a good question that relies on irrelevant data will make an answer difficult to find. It is therefore important to perform an initial data collection and check whether sufficient data is available. The next step is to use descriptive statistics and visualisations to gain initial insights into the quality of the data. It pays to know the data you have and what it can do for you. Preparing your data and making sure you have sufficient, good quality data is arguably the most time-consuming portion of the machine learning process. It is the process of taking data and information in difficult and unstructured formats and converting it into something that can be used by the machine learning model. There is no one way or one tool for making messy data clean. For example, you could use Python, R, SQL, or any number of other tools to get your data ready for the machine learning process.
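As a rough illustration of what a first look at data quality might involve, the sketch below summarises a single column using only Python's standard library. The `summarise_column` helper and the vehicle counts are assumptions made for this example; in practice you would more likely reach for a dedicated data analysis library.

```python
import statistics


def summarise_column(values):
    """Report basic quality indicators for one column of raw readings."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present) if present else None,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


# Hypothetical vehicle counts per 5-minute interval;
# None marks an interval where the sensor returned nothing.
vehicle_counts = [12, 15, None, 14, 230, 13, None, 16]

summary = summarise_column(vehicle_counts)
print(summary)
```

Even a summary this simple raises useful questions: a quarter of the readings are missing, and a maximum of 230 sitting far above the other values suggests either a genuine traffic spike or a faulty sensor that needs investigating before any modelling begins.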
Do not rush the process! Machine learning is iterative
With all the important groundwork complete, the data scientist can get down to ‘the fun stuff’: diving into a clean data set and applying the algorithms that will extract value. This stage normally requires data scientists to experiment with various models until they find a good fit, which makes the process highly iterative. It is important at every stage to get feedback on how well the model is performing. This feedback can push data scientists back to previous steps before they reach the end of the process, forcing them to revisit their methods and techniques, or even to reconsider whether the original question was the right one in the first place. When you finally arrive at a definitive result, rather than sitting back and patting yourself on the back, you will almost always find that the answer simply sparks more questions and the process begins again.
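One simple way to picture this feedback loop is to fit several candidate models and score each one on data it has not seen. The sketch below is only an illustration under stated assumptions: the observations are invented, and the two candidates (a constant baseline and a proportional rule fitted by least squares through the origin) stand in for whatever models a real project would try.

```python
import statistics

# Hypothetical observations: (vehicles queued, seconds needed to clear the queue)
train = [(5, 11), (10, 19), (15, 31), (20, 39)]
validation = [(8, 17), (12, 23), (18, 37)]


def fit_constant(data):
    """Baseline: always predict the mean clearing time seen in training."""
    mean_y = statistics.mean([y for _, y in data])
    return lambda x: mean_y


def fit_proportional(data):
    """Least-squares line through the origin: seconds = slope * vehicles."""
    slope = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
    return lambda x: slope * x


def mean_absolute_error(model, data):
    """Average absolute gap between predicted and observed clearing times."""
    return statistics.mean([abs(y - model(x)) for x, y in data])


candidates = {
    "constant_baseline": fit_constant,
    "proportional": fit_proportional,
}

# Fit each candidate on the training data, then score it on held-out data.
scores = {
    name: mean_absolute_error(fit(train), validation)
    for name, fit in candidates.items()
}
best_name = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: mean absolute error = {score:.2f} seconds")
print(f"Best candidate so far: {best_name}")
```

If none of the candidates scores well enough against the goals you defined, that is the signal to go back a step: gather different data, try another model, or revisit the question itself.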
Making use of these guidelines will enable you to navigate your way through your first machine learning project and help ensure that you extract value from your data. You will find that once you know how to approach and perform the magic of machine learning, it can become a powerful tool for answering your business's questions. For example, the question of whether traffic light durations can be optimised to reduce traffic congestion has already been tackled in Pittsburgh, where a company has installed an adaptive traffic control system at 50 intersections. Since launching the system, the company claims to have reduced waiting times at intersections by 40%, journey times in the city by 25%, and vehicle emissions by up to 20%. Its AI models process the available data to come up with the best way to move traffic through an intersection. To the everyday person, having traffic lights act autonomously and speak to other traffic lights to reduce the amount of time commuters spend waiting in traffic may seem like a magic trick. But once you know how the trick is done, it becomes easier to replicate, and the results will have customers and competitors watching in amazement.