Automated machine learning has the potential to greatly increase the productivity of data scientists and democratize machine learning tools. It can be a powerful solution to the well-documented scarcity of data scientists.
What is automated machine learning?
According to Wikipedia:
Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems.
Automated ML solutions aim to automate some or all steps of the machine learning process, which include:
- Data pre-processing
- Feature engineering
- Feature extraction
- Feature selection
- Algorithm selection & hyperparameter optimization
Because the accuracy of a machine learning solution can be measured, automated systems can fine-tune data, features, algorithms, and algorithm hyperparameters to generate accurate models, relying on established machine learning knowledge and trial and error.
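To make the trial-and-error idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not how any particular autoML product works: the toy dataset, the two candidate "algorithms" (k-nearest neighbors and a single-feature threshold rule), the hyperparameter grids, and all function names are hypothetical. The loop tries every configuration, measures validation accuracy, and keeps the best one.

```python
import random


def make_data(n=200, seed=0):
    """Hypothetical toy data: two features in [0, 1), label 1 when x0 + x1 > 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        rows.append((x, 1 if x[0] + x[1] > 1.0 else 0))
    return rows


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda r: (r[0][0] - x[0]) ** 2 + (r[0][1] - x[1]) ** 2
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def threshold_predict(train, x, feature, cut):
    """Predict 1 when one feature exceeds a cut-off; the training data is unused."""
    return 1 if x[feature] > cut else 0


def accuracy(predict, train, valid):
    """Fraction of validation rows that the predictor labels correctly."""
    hits = sum(1 for x, y in valid if predict(train, x) == y)
    return hits / len(valid)


def search(train, valid):
    """Evaluate every (algorithm, hyperparameters) pair and return the best one."""
    candidates = []
    for k in (1, 3, 5, 9):
        candidates.append(
            (("knn", {"k": k}), lambda tr, x, k=k: knn_predict(tr, x, k))
        )
    for feature in (0, 1):
        for cut in (0.3, 0.5, 0.7):
            candidates.append(
                (
                    ("threshold", {"feature": feature, "cut": cut}),
                    lambda tr, x, f=feature, c=cut: threshold_predict(tr, x, f, c),
                )
            )
    scored = [(accuracy(fn, train, valid), config) for config, fn in candidates]
    best_score, best_config = max(scored, key=lambda s: s[0])
    return best_config, best_score, scored


if __name__ == "__main__":
    data = make_data()
    train, valid = data[:150], data[150:]
    best_config, best_score, _ = search(train, valid)
    print(best_config, best_score)
```

Real autoML tools typically search far larger spaces and use smarter strategies than exhaustive enumeration, but the core loop of proposing a configuration, measuring it, and keeping the best is the same.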
What are the benefits of autoML?
- Cost reductions
- Increased productivity for data scientists
- Democratization of machine learning reduces demand for data scientists
- Increased revenues and customer satisfaction
- Rolling out more models with increased accuracy can improve other, less tangible business results as well. For example, models enable automation, which can improve employee engagement by freeing employees to focus on more interesting tasks.
Why do we rely so much on data scientists when autoML approaches exist?
Data scientists have two advantages in model building when compared to current autoML approaches:
- Conformance to custom specifications: Most autoML tools optimize for model performance. However, that is just one of the specifications of real-life machine learning projects. For example:
  - If the model needs to be embedded in edge devices, computing and storage requirements force companies to choose simpler models.
  - If explainability is desirable, only certain types of models can be used.
- Model performance: On Kaggle, the machine learning competition community, humans are still easily beating models generated by autoML tools. AutoML tools have yet to win any data science competitions.
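The first advantage above, conformance to custom specifications, can be read as constrained selection: filter out candidates that violate a requirement, then pick the most accurate of what remains. The sketch below illustrates that idea only. The candidate list, the accuracy and size figures, and the names `Candidate` and `pick_model` are all made up for illustration and do not come from any benchmark or tool.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float    # validation accuracy (made-up number)
    size_mb: float     # serialized model size in MB (made-up number)
    explainable: bool  # whether the model type is considered explainable


def pick_model(
    candidates: Sequence[Candidate],
    max_size_mb: Optional[float] = None,
    require_explainable: bool = False,
) -> Optional[Candidate]:
    """Return the most accurate candidate that satisfies every constraint, or None."""
    feasible = [
        c
        for c in candidates
        if (max_size_mb is None or c.size_mb <= max_size_mb)
        and (c.explainable or not require_explainable)
    ]
    return max(feasible, key=lambda c: c.accuracy, default=None)


# Hypothetical candidates; the numbers are illustrative only.
CANDIDATES = [
    Candidate("gradient_boosting", 0.94, 120.0, False),
    Candidate("random_forest", 0.92, 300.0, False),
    Candidate("logistic_regression", 0.88, 0.5, True),
    Candidate("small_decision_tree", 0.85, 0.1, True),
]

if __name__ == "__main__":
    print(pick_model(CANDIDATES))                            # no constraints
    print(pick_model(CANDIDATES, max_size_mb=1.0))           # edge-device budget
    print(pick_model(CANDIDATES, require_explainable=True))  # explainability needed
```

With no constraints the most accurate candidate wins, while a storage budget or an explainability requirement shifts the choice to a simpler model, which is the trade-off the bullets describe.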
Over time, it is likely that autoML tools will grow stronger and these advantages will diminish or disappear. More importantly, data scientists and their managers are responsible for important tasks beyond modeling:
- Identify the models to be built. The first step in data science is also the most important one. It requires understanding the business, the data accessible through internal and external resources, data quality issues, privacy and computing requirements, and organizational challenges.
- Manage the human aspects of model implementation. They convince subject matter experts and executives of the superiority of the model compared to the current solution. They explain possible shortcomings of the model and take steps to overcome those shortcomings.
These are more fundamental strengths of data scientists and solving these problems will not come into the realm of machines for quite some time. However, that does not mean that data scientists are uniquely qualified to handle these challenges.
I expect the basics of data science to become as widely known as the basics of statistics are today. While not everyone is familiar with advanced statistics, critical concepts like distributions, variance, and mean are common knowledge and inform corporate decisions. Just as Excel democratized data storage and manipulation and augmented all white-collar workers, autoML tools have the potential to democratize data science for companies.
Which autoML tool should we start with?
We analyzed the ecosystem of autoML providers in this comprehensive article.
Featured image source: gooddata.com