AutoML: In-depth Guide to Automated Machine Learning in 2024

Cem Dilmegani

Automated Machine Learning

Updated on Dec 22



Automated machine learning (AutoML) has the potential to significantly increase the productivity of data scientists and to democratize machine learning tools. As demand for data scientists grows, AutoML tools and services are becoming more popular, helping companies use machine learning to extract business insights effectively and at scale. AutoML can be a powerful response to the well-documented scarcity of data scientists.

What is automated machine learning?

Automated Machine Learning (AutoML) is an emerging technology that automates manual and repetitive machine learning tasks. Automating these tasks can accelerate processes, reduce errors and costs, and produce more accurate results, because it lets businesses compare many candidate algorithms and select the best-performing one. Here is Wikipedia’s definition of AutoML:

Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems.

Which machine learning processes to automate?

AutoML services aim to automate some or all steps of the machine learning process, which includes:

Data pre-processing: This process includes improving data quality and converting unstructured, raw data to a structured format with methods like data cleaning, data integration, data transformation, and data reduction.
Feature engineering: AutoML can automate the creation of features that are more compatible with machine learning algorithms by analyzing the input data.
Feature extraction: This process combines different features or datasets to generate new features that enable more accurate results and reduce the size of the data being processed.
Feature selection: AutoML can automate the task of selecting only useful features for processing.
Algorithm selection & hyperparameter optimization: AutoML tools can choose optimal hyperparameters and algorithms without human intervention.

Since the accuracy of machine learning models can be measured, automated systems can tune data, features, algorithms and hyperparameters to generate accurate models, relying on established machine learning knowledge and trial-and-error.
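To make the trial-and-error idea concrete, here is a minimal sketch in plain Python. It is not how any particular AutoML product works. The toy dataset, the three candidate model types, and the names `SEARCH_SPACE` and `auto_select` are all assumptions made for illustration. The loop tries every combination of algorithm and hyperparameter, scores each one on a validation set, and keeps the best.

```python
import random

def make_dataset(n=200, seed=0):
    """Toy 1-D binary data (hypothetical): label is 1 when x > 0.6, with 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.6)
        if rng.random() < 0.1:
            y = 1 - y  # flip the label to simulate noise
        data.append((x, y))
    return data

def majority_model(train, **_):
    """Baseline: always predict the most common training label."""
    ones = sum(y for _, y in train)
    label = int(ones * 2 >= len(train))
    return lambda x: label

def threshold_model(train, threshold):
    """Predict 1 when the feature exceeds a threshold (the hyperparameter)."""
    return lambda x: int(x > threshold)

def knn_model(train, k):
    """k-nearest neighbours by majority vote; k is the hyperparameter."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        votes = sum(y for _, y in nearest)
        return int(votes * 2 > k)
    return predict

def accuracy(model, data):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

# Candidate algorithms and the hyperparameter values to try for each one.
SEARCH_SPACE = [
    ("majority", majority_model, [{}]),
    ("threshold", threshold_model, [{"threshold": t / 10} for t in range(1, 10)]),
    ("knn", knn_model, [{"k": k} for k in (1, 3, 5, 9, 15)]),
]

def auto_select(train, valid):
    """Exhaustive trial-and-error: keep the candidate with the best validation accuracy."""
    best = None
    for name, build, grid in SEARCH_SPACE:
        for params in grid:
            model = build(train, **params)
            score = accuracy(model, valid)
            if best is None or score > best[0]:
                best = (score, name, params, model)
    return best

data = make_dataset()
train, valid, test = data[:120], data[120:160], data[160:]
score, name, params, model = auto_select(train, valid)
test_score = accuracy(model, test)  # held-out estimate for the chosen model
print(f"selected {name} {params}: validation={score:.2f}, test={test_score:.2f}")
```

Real tools search far larger spaces and usually replace the exhaustive loop with smarter strategies, but the measure-and-compare structure is the same. The separate test split is there because the validation score of the winning candidate tends to be optimistic.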

Please see the image below by DataRobot, a leading autoML vendor, where areas highlighted in gray illustrate which parts of the machine learning process are automated via autoML.

AutoAI vs AutoML

There is no strict distinction between AutoAI and AutoML. Some vendors explain automated AI as a variant of AutoML that uses intelligent automation to automate tasks throughout the entire lifecycle of machine learning (ML) and artificial intelligence models.

On the other hand, some tools labeled AutoML also use intelligent automation and automate as many tasks as possible within the ML lifecycle. So, if you come across different tools labeled as AutoML or AutoAI, check the following to choose the tool that best suits your needs:

How do these tools automate model-building processes?
Which processes can be automated?
What additional features do they offer compared to other tools?

Why is it important now?

Need for more data scientists

As data science becomes a more integrated part of our lives, businesses need more solutions in this field and demand more data scientists to build these solutions. Without data science methods, companies might be unable to understand their processes, monitor performance levels, or take certain actions to prevent huge losses.

According to the U.S. Bureau of Labor Statistics (BLS), data scientist jobs will grow by 36% from 2021 to 2031, while the average growth rate for all occupations is 5%. Given the scarcity of data scientists and the time it takes to build data science solutions, AutoML solutions can help businesses cope with this shortage.

Errors in applying machine learning algorithms

It is up to data scientists to implement machine learning algorithms and choose the method that works best for the business case. However, the implementation process is prone to human error and bias. AutoML tools can automate this process and run a broader set of machine learning algorithms to select the best one, including algorithms that data scientists might not have considered.

Today, Facebook trains around 300,000 machine learning models to improve its machine learning processes and has even created its own AutoML engineer, named Asimo, to automatically generate improved versions of existing models.

Because these capabilities accelerate machine learning processes, AutoML solutions can improve the return on investment (ROI) of machine learning projects.

Have we reached peak AutoML?

The AutoML market generated revenue of $270 million in 2019 and is expected to reach $14,512 million by 2030, advancing at a CAGR of 43.7% during the forecast period (2020–2030). Considering that, we believe AutoML hasn’t reached a peak and that interest in AutoML will continue to grow.
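As a quick sanity check on these figures, the sketch below recomputes the growth rate implied by the two revenue numbers. It assumes that 2019 is the base year, so growth compounds over 11 years. That reading is an assumption, and the helper name `cagr` is ours rather than anything from the forecast.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1

# Figures quoted in the text, in millions of US dollars.
revenue_2019 = 270
forecast_2030 = 14_512

# Assumption: 2019 is the base year, so growth compounds over 11 years.
implied = cagr(revenue_2019, forecast_2030, years=2030 - 2019)
print(f"Implied CAGR: {implied:.1%}")

# Projecting forward from the base year at the quoted 43.7% rate.
projected = revenue_2019 * (1 + 0.437) ** 11
print(f"Projected 2030 revenue: ${projected:,.0f} million")
```

Under that assumption the implied rate lands very close to the quoted 43.7%, and compounding forward at 43.7% comes within about 1% of the forecast, so the three numbers are mutually consistent.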

What are the benefits of AutoML?

Cost reductions
- Increased productivity for data scientists
- Democratization of machine learning reduces demand for data scientists
Increased revenues and customer satisfaction
Rolling out more models with increased accuracy can improve other, less tangible business results as well. For example, models lead to automation, which improves employee engagement by allowing employees to focus on more interesting tasks.

Why do we rely so much on data scientists while there are AutoML approaches?

Data scientists have two advantages in model building when compared to current AutoML approaches:

Conformance to custom specifications: Most AutoML tools optimize for model performance; however, that is just one of the specifications of real-life machine learning projects. For example:
- If a model needs to be embedded in edge devices, computing and storage requirements force companies to choose simpler models.
- If explainability is desirable, only certain types of models can be used.
Model performance: On Kaggle, the machine learning competition community, humans are still easily beating models generated by AutoML tools. AutoML tools have yet to win any data science competitions.
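The first advantage, conformance to custom specifications, can be pictured as selection under constraints. The sketch below is purely illustrative. The candidate models and their accuracy, size, and explainability values are invented placeholders, and `Candidate` and `select` are hypothetical names. It shows how the model with the best accuracy can stop being the right choice once a storage limit or an explainability requirement applies.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    accuracy: float    # validation accuracy between 0 and 1
    size_mb: float     # storage footprint of the trained model
    explainable: bool  # whether stakeholders can inspect its reasoning

# Placeholder values invented for illustration only, not measurements.
CANDIDATES = [
    Candidate("gradient_boosting", 0.93, 120.0, False),
    Candidate("deep_network", 0.95, 450.0, False),
    Candidate("decision_tree", 0.88, 2.0, True),
    Candidate("logistic_regression", 0.86, 0.5, True),
]

def select(candidates, max_size_mb=None, require_explainable=False):
    """Return the most accurate candidate that satisfies every constraint, or None."""
    feasible = [
        c for c in candidates
        if (max_size_mb is None or c.size_mb <= max_size_mb)
        and (c.explainable or not require_explainable)
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.accuracy)

unconstrained = select(CANDIDATES)
for_edge_device = select(CANDIDATES, max_size_mb=5)
for_regulated_use = select(CANDIDATES, require_explainable=True)
print(unconstrained.name, for_edge_device.name, for_regulated_use.name)
```

With these placeholder numbers, the unconstrained pick is the largest model, while both constrained calls fall back to a smaller, simpler one. That is the trade-off the two bullets above describe.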

Over time, it is likely that autoML tools will grow stronger and these advantages will diminish or disappear. More importantly, data scientists and their managers are responsible for important tasks beyond modeling:

Identify the models to be built. The first step in data science is also the most important one. It requires understanding the business, data accessible through internal and external resources, data quality issues, privacy and computing requirements, and organizational challenges.
Manage the human aspects of model implementation. They convince subject matter experts and executives of the superiority of the model compared to the current solution. They explain possible shortcomings of the model and take steps to overcome those shortcomings.

These are more fundamental strengths of data scientists and solving these problems will not come into the realm of machines for quite some time. However, that does not mean that data scientists are uniquely qualified to handle these challenges.

We expect the basics of data science to become as widely known as the basics of statistics are today. While not everyone is familiar with advanced statistics, critical concepts like distributions, variance and mean are common knowledge and inform corporate decisions. Just as Excel democratized data storage and manipulation and augmented all white-collar workers, AutoML tools have the potential to democratize data science for companies.

Which AutoML tool should we start with?

We analyzed the ecosystem of AutoML providers in this comprehensive article. You can also see our sortable list of the most recent AutoML vendors on our website. Here are some of the leading vendors:

Dataiku
DataRobot
dotData
Google Cloud AutoML
H2O.ai
Tazi.ai
TPOT

For more AutoML

If you are interested, feel free to read our AutoML case studies article. AutoML is an important part of the future of AI. For more on the trends shaping AI, feel free to read our research on the future of AI.

If you are unsure about where to start when choosing a vendor, we have data-driven lists of vendors for:

ML software
AutoML software

Finally, if you still have any questions, reach out to us.

Featured image source: gooddata.com

