Machine Learning Mastery
https://whysper.io/feeds/showfeed/M1iNrG5m
[Podcast created from https://feeds.feedburner.com/machinelearningmastery by Whysper™ - The world, in audio™]
en-us
Copyright © 2020 Whysper™
Thu, 22 Oct 2020 18:05:48 Z
https://s3.amazonaws.com/lysten/podcastfeeds/M1iNrG5m/M1iNrG5m_artwork.jpg
Machine Learning Mastery
https://whysper.io/feeds/showfeed/M1iNrG5m

https://machinelearningmastery.com/?p=11098 (whysper debug key: bPegO)
https://machinelearningmastery.com/what-is-ensemble-learning/
A Gentle Introduction to Ensemble Learning (text-to-speech Powered by Whysper™)
[Episode created from https://machinelearningmastery.com/what-is-ensemble-learning/ by Whysper™ - The world, in audio™]
Thu, 22 Oct 2020 18:00:25 Z
2020-10-22T18:05:48Z
0:11:55

https://machinelearningmastery.com/?p=11073 (whysper debug key: yh1s4)
https://machinelearningmastery.com/ensemble-learning-books/
6 Books on Ensemble Learning (text-to-speech Powered by Whysper™)
[Podcast episode created from https://machinelearningmastery.com/ensemble-learning-books/ by Whysper™ - The world, in audio™]
Tue, 20 Oct 2020 18:00:44 Z
2020-10-20T18:03:57Z
0:14:21

https://machinelearningmastery.com/?p=10597 (whysper debug key: 4Aod3)
https://machinelearningmastery.com/softmax-activation-function-with-python/
Softmax Activation Function with Python (text-to-speech Powered by Whysper™)
[Podcast episode created from https://machinelearningmastery.com/softmax-activation-function-with-python/ by Whysper™ - The world, in audio™]
Sun, 18 Oct 2020 18:00:19 Z
2020-10-18T18:04:18Z
0:15:27

https://machinelearningmastery.com/?p=10591 (whysper debug key: TWIqH)
https://machinelearningmastery.com/lars-regression-with-python/
How to Develop LARS Regression Models in Python (text-to-speech Powered by Whysper™)
[Podcast episode created from https://machinelearningmastery.com/lars-regression-with-python/ by Whysper™ - The world, in audio™]
Thu, 15 Oct 2020 18:00:47 Z
2020-10-15T18:06:12Z
0:00:00

https://machinelearningmastery.com/?p=10580 (whysper debug key: EG7ua)
https://machinelearningmastery.com/ridge-regression-with-python/
How to Develop Ridge Regression Models in Python (text-to-speech Powered by Whysper™)
[Podcast episode created from https://machinelearningmastery.com/ridge-regression-with-python/ by Whysper™ - The world, in audio™]
Thu, 08 Oct 2020 18:00:08 Z
2020-10-12T01:10:57Z
0:00:00

https://machinelearningmastery.com/?p=10683 (whysper debug key: LmZAo)
https://machinelearningmastery.com/gaussian-processes-for-classification-with-python/
Gaussian Processes for Classification With Python (text-to-speech powered by Whysper™ - whysper.io)
[Podcast episode created from https://machinelearningmastery.com/gaussian-processes-for-classification-with-python/ by Whysper™ - The world, in audio™]
Thu, 01 Oct 2020 19:00:05 Z
2020-10-01T19:02:26Z
0:07:43

https://machinelearningmastery.com/?p=10675 (whysper debug key: www4q)
https://machinelearningmastery.com/radius-neighbors-classifier-algorithm-with-python/
Radius Neighbors Classifier Algorithm With Python (text-to-speech powered by Whysper™ - whysper.io)
[Podcast episode created from https://machinelearningmastery.com/radius-neighbors-classifier-algorithm-with-python/ by Whysper™ - The world, in audio™]
Tue, 29 Sep 2020 19:00:45 Z
2020-09-29T19:05:28Z
0:00:00

https://machinelearningmastery.com/?p=10668 (whysper debug key: MV2cK)
https://machinelearningmastery.com/linear-discriminant-analysis-with-python/
Linear Discriminant Analysis With Python (text-to-speech powered by Whysper™ - whysper.io)
[Podcast episode created from https://machinelearningmastery.com/linear-discriminant-analysis-with-python/ by Whysper™ - The world, in audio™]
Sun, 27 Sep 2020 19:00:20 Z
2020-09-27T19:05:48Z
0:00:00

https://machinelearningmastery.com/?p=10960 (whysper debug key: DiHVg)
https://machinelearningmastery.com/train-to-the-test-set-in-machine-learning/
How to Train to the Test Set in Machine Learning (text-to-speech powered by Whysper™ - whysper.io)
[Podcast episode created from https://machinelearningmastery.com/train-to-the-test-set-in-machine-learning/ by Whysper™ - The world, in audio™]
Tue, 22 Sep 2020 19:00:41 Z
2020-09-22T19:05:07Z
0:00:00

https://machinelearningmastery.com/?p=10522 (whysper debug key: v2DDl)
https://machinelearningmastery.com/automl-libraries-for-python/
Automated Machine Learning (AutoML) Libraries for Python (text-to-speech powered by Whysper™ - whysper.io)
[Podcast episode created from https://machinelearningmastery.com/automl-libraries-for-python/ by Whysper™ - The world, in audio™]
Thu, 17 Sep 2020 19:00:56 Z
2020-09-17T19:06:04Z
0:15:27

https://machinelearningmastery.com/?p=10512 (whysper debug key: GHIbl)
https://machinelearningmastery.com/combined-algorithm-selection-and-hyperparameter-optimization/
Combined Algorithm Selection and Hyperparameter Optimization (CASH Optimization) (text-to-speech powered by Whysper™ - whysper.io)
[Podcast episode created from https://machinelearningmastery.com/combined-algorithm-selection-and-hyperparameter-optimization/ by Whysper™ - The world, in audio™]
Tue, 15 Sep 2020 19:00:08 Z
2020-09-15T19:03:45Z
0:00:00

https://machinelearningmastery.com/?p=10482 (whysper debug key: ANUeT)
https://machinelearningmastery.com/hyperopt-for-automated-machine-learning-with-scikit-learn/
HyperOpt for Automated Machine Learning With Scikit-Learn (text-to-speech powered by Whysper™ - whysper.io)
[Podcast episode created from https://machinelearningmastery.com/hyperopt-for-automated-machine-learning-with-scikit-learn/ by Whysper™ - Your world, in audio™]
Thu, 10 Sep 2020 19:00:18 Z
2020-09-10T19:05:48Z
0:07:43

https://machinelearningmastery.com/?p=10475 (whysper debug key: zFTu3)
https://machinelearningmastery.com/tpot-for-automated-machine-learning-in-python/
TPOT for Automated Machine Learning in Python (text-to-speech Powered by Whysper)
Automated Machine Learning (AutoML) refers to techniques for automatically discovering well-performing models for predictive modeling tasks with very little user involvement. TPOT is an open-source library for performing AutoML in Python. It makes use of the popular Scikit-Learn machine learning library for data transforms and machine learning algorithms and uses a Genetic Programming stochastic global […]
The post TPOT for Automated Machine Learning in Python appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/tpot-for-automated-machine-learning-in-python/ by Whysper - The Website to Podcast Converter - whysper.io]
Tue, 08 Sep 2020 19:00:21 Z
2020-09-08T19:04:02Z
0:07:43

https://machinelearningmastery.com/?p=10463 (whysper debug key: adU3N)
https://machinelearningmastery.com/auto-sklearn-for-automated-machine-learning-in-python/
Auto-Sklearn for Automated Machine Learning in Python (text-to-speech Powered by Whysper)
Automated Machine Learning (AutoML) refers to techniques for automatically discovering well-performing models for predictive modeling tasks with very little user involvement. Auto-Sklearn is an open-source library for performing AutoML in Python. It makes use of the popular Scikit-Learn machine learning library for data transforms and machine learning algorithms and uses a Bayesian Optimization search procedure […]
The post Auto-Sklearn for Automated Machine Learning in Python appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/auto-sklearn-for-automated-machine-learning-in-python/ by Whysper - The Website to Podcast Converter - whysper.io]
Sun, 06 Sep 2020 19:00:39 Z
2020-09-06T19:05:46Z
0:00:00

https://machinelearningmastery.com/?p=10456 (whysper debug key: TotN1)
https://machinelearningmastery.com/scikit-optimize-for-hyperparameter-tuning-in-machine-learning/
Scikit-Optimize for Hyperparameter Tuning in Machine Learning (text-to-speech Powered by Whysper)
Hyperparameter optimization refers to performing a search in order to discover the set of specific model configuration arguments that result in the best performance of the model on a specific dataset. There are many ways to perform hyperparameter optimization, although modern methods, such as Bayesian Optimization, are fast and effective. The Scikit-Optimize library is an […]
The post Scikit-Optimize for Hyperparameter Tuning in Machine Learning appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/scikit-optimize-for-hyperparameter-tuning-in-machine-learning/ by Whysper - The Website to Podcast Converter - whysper.io]
Thu, 03 Sep 2020 19:00:41 Z
2020-09-03T19:05:53Z
0:00:00

https://machinelearningmastery.com/autokeras-for-classification-and-regression/
https://machinelearningmastery.com/autokeras-for-classification-and-regression/
How to Use AutoKeras for Classification and Regression (text-to-speech Powered by Whysper)
AutoML refers to techniques for automatically discovering the best-performing model for a given dataset. When applied to neural networks, this involves both discovering the model architecture and the hyperparameters used to train the model, generally referred to as neural architecture search. AutoKeras is an open-source library for performing AutoML for deep learning models. The search […]
The post How to Use AutoKeras for Classification and Regression appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/autokeras-for-classification-and-regression/ by Whysper - The Website to Podcast Converter - whysper.io]
Tue, 01 Sep 2020 19:00:46 Z
2020-09-01T19:04:39Z
0:00:00

https://machinelearningmastery.com/multi-label-classification-with-deep-learning/
https://machinelearningmastery.com/multi-label-classification-with-deep-learning/
Multi-Label Classification with Deep Learning (text-to-speech Powered by Whysper)
Multi-label classification involves predicting zero or more class labels. Unlike normal classification tasks where class labels are mutually exclusive, multi-label classification requires specialized machine learning algorithms that support predicting multiple mutually non-exclusive classes or “labels.” Deep learning neural networks are an example of an algorithm that natively supports multi-label classification problems. Neural network models for […]
The post Multi-Label Classification with Deep Learning appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/multi-label-classification-with-deep-learning/ by Whysper - The Website to Podcast Converter - whysper.io]
Sun, 30 Aug 2020 19:00:35 Z
2020-08-30T19:03:31Z
0:00:00

https://machinelearningmastery.com/deep-learning-models-for-multi-output-regression/
https://machinelearningmastery.com/deep-learning-models-for-multi-output-regression/
Deep Learning Models for Multi-Output Regression (text-to-speech Powered by Whysper)
Multi-output regression involves predicting two or more numerical variables. Unlike normal regression where a single value is predicted for each sample, multi-output regression requires specialized machine learning algorithms that support outputting multiple variables for each prediction. Deep learning neural networks are an example of an algorithm that natively supports multi-output regression problems. Neural network models […]
The post Deep Learning Models for Multi-Output Regression appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/deep-learning-models-for-multi-output-regression/ by Whysper - The Website to Podcast Converter - whysper.io]
Thu, 27 Aug 2020 19:00:19 Z
2020-08-27T19:18:28Z
0:15:27

https://machinelearningmastery.com/time-series-forecasting-with-prophet-in-python/
https://machinelearningmastery.com/time-series-forecasting-with-prophet-in-python/
Time Series Forecasting With Prophet in Python (text-to-speech Powered by Whysper)
Time series forecasting can be challenging as there are many different methods you could use and many different hyperparameters for each method. The Prophet library is an open-source library designed for making forecasts for univariate time series datasets. It is easy to use and designed to automatically find a good set of hyperparameters for the […]
The post Time Series Forecasting With Prophet in Python appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/time-series-forecasting-with-prophet-in-python/ by Whysper - The Website to Podcast Converter - whysper.io]
Tue, 25 Aug 2020 19:00:44 Z
2020-08-25T19:03:19Z
0:00:00

https://machinelearningmastery.com/numpy-axis-for-rows-and-columns/
https://machinelearningmastery.com/numpy-axis-for-rows-and-columns/
How to Set Axis for Rows and Columns in NumPy (text-to-speech Powered by Whysper)
NumPy arrays provide a fast and efficient way to store and manipulate data in Python. They are particularly useful for representing data as vectors and matrices in machine learning. Data in NumPy arrays can be accessed directly via column and row indexes, and this is reasonably straightforward. Nevertheless, sometimes we must perform operations on arrays […]
The post How to Set Axis for Rows and Columns in NumPy appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/numpy-axis-for-rows-and-columns/ by Whysper - The Website to Podcast Converter - whysper.io]
Sun, 23 Aug 2020 19:00:54 Z
2020-08-23T19:06:09Z
0:00:00
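The axis convention the episode above covers can be sketched in a few lines of NumPy (a minimal illustration, not taken from the episode itself):

```python
import numpy as np

# a 2x3 array: two rows, three columns
data = np.array([[1, 2, 3],
                 [4, 5, 6]])

# axis=0 runs down the rows, producing one result per column
col_sums = data.sum(axis=0)

# axis=1 runs across the columns, producing one result per row
row_sums = data.sum(axis=1)

print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]
```

The rule of thumb: the axis you name is the one that gets collapsed.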

https://machinelearningmastery.com/hypothesis-test-for-comparing-machine-learning-algorithms/
https://machinelearningmastery.com/hypothesis-test-for-comparing-machine-learning-algorithms/
Hypothesis Test for Comparing Machine Learning Algorithms (text-to-speech Powered by Whysper)
Machine learning models are chosen based on their mean performance, often calculated using k-fold cross-validation. The algorithm with the best mean performance is expected to be better than those algorithms with worse mean performance. But what if the difference in the mean performance is caused by a statistical fluke? The solution is to use a […]
The post Hypothesis Test for Comparing Machine Learning Algorithms appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/hypothesis-test-for-comparing-machine-learning-algorithms/ by Whysper - The Website to Podcast Converter - whysper.io]
Thu, 20 Aug 2020 19:00:24 Z
2020-08-20T19:02:53Z
0:00:00
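The "statistical fluke" question from the summary above can be illustrated with a paired Student's t-test on per-fold cross-validation scores (a sketch only; the scores below are made-up numbers, and the episode may use a modified variant of the test):

```python
from scipy.stats import ttest_rel

# hypothetical 10-fold cross-validation accuracy scores for two algorithms
scores_a = [0.80, 0.82, 0.79, 0.81, 0.83, 0.80, 0.78, 0.82, 0.81, 0.80]
scores_b = [0.76, 0.78, 0.77, 0.75, 0.79, 0.77, 0.74, 0.78, 0.76, 0.77]

# paired test: both algorithms were scored on the same folds
stat, p = ttest_rel(scores_a, scores_b)

alpha = 0.05
if p <= alpha:
    print("Difference is unlikely to be a fluke (reject H0), p=%.4f" % p)
else:
    print("Difference may be a statistical fluke (fail to reject H0), p=%.4f" % p)
```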

https://machinelearningmastery.com/calculate-the-bias-variance-tradeoff/
https://machinelearningmastery.com/calculate-the-bias-variance-tradeoff/
How to Calculate the Bias-Variance Tradeoff with Python (text-to-speech Powered by Whysper)
The performance of a machine learning model can be characterized in terms of the bias and the variance of the model. A model with high bias makes strong assumptions about the form of the unknown underlying function that maps inputs to outputs in the dataset, such as linear regression. A model with high variance is […]
The post How to Calculate the Bias-Variance Tradeoff with Python appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/calculate-the-bias-variance-tradeoff/ by Whysper - The Website to Podcast Converter - whysper.io]
Tue, 18 Aug 2020 19:00:24 Z
2020-08-18T19:02:17Z
0:00:00
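The bias/variance decomposition the episode above discusses can be simulated from scratch with NumPy: repeatedly "train" a trivial estimator (the sample mean) on fresh data and decompose its error. This is an illustrative sketch, not the episode's code, which may use a library routine instead:

```python
import numpy as np

rng = np.random.default_rng(1)
true_value = 10.0

# refit the estimator on 2000 fresh noisy training samples of size 5
estimates = np.array([rng.normal(true_value, 2.0, size=5).mean()
                      for _ in range(2000)])

bias = estimates.mean() - true_value          # systematic error
variance = estimates.var()                    # spread across training sets
mse = ((estimates - true_value) ** 2).mean()  # total error

# the decomposition: MSE = bias^2 + variance (exactly, for this estimator)
print("bias=%.3f variance=%.3f mse=%.3f" % (bias, variance, mse))
```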

https://machinelearningmastery.com/different-results-each-time-in-machine-learning/
https://machinelearningmastery.com/different-results-each-time-in-machine-learning/
Why Do I Get Different Results Each Time in Machine Learning? (text-to-speech Powered by Whysper)
Are you getting different results for your machine learning algorithm? Perhaps your results differ from a tutorial and you want to understand why. Perhaps your model is making different predictions each time it is trained, even when it is trained on the same data set each time. This is to be expected and might even […]
The post Why Do I Get Different Results Each Time in Machine Learning? appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/different-results-each-time-in-machine-learning/ by Whysper - The Website to Podcast Converter - whysper.io]
Sun, 16 Aug 2020 19:00:08 Z
2020-08-16T19:05:22Z
0:00:00

https://machinelearningmastery.com/plot-a-decision-surface-for-machine-learning/
https://machinelearningmastery.com/plot-a-decision-surface-for-machine-learning/
Plot a Decision Surface for Machine Learning Algorithms in Python (text-to-speech Powered by Whysper)
Classification algorithms learn how to assign class labels to examples, although their decisions can appear opaque. A popular diagnostic for understanding the decisions made by a classification algorithm is the decision surface. This is a plot that shows how a fit machine learning algorithm predicts a coarse grid across the input feature space. A decision […]
The post Plot a Decision Surface for Machine Learning Algorithms in Python appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/plot-a-decision-surface-for-machine-learning/ by Whysper - The Website to Podcast Converter - whysper.io]
Thu, 13 Aug 2020 19:00:23 Z
2020-08-13T19:02:23Z
0:00:00
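The "coarse grid across the input feature space" idea from the decision-surface summary above boils down to a meshgrid-predict-reshape pattern. A minimal scikit-learn sketch (synthetic data and a logistic regression stand in for whatever model the episode uses; plotting is left as a comment):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# small 2D classification problem, purely for illustration
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
model = LogisticRegression().fit(X, y)

# build a coarse grid spanning the input feature space
x1 = np.arange(X[:, 0].min() - 1, X[:, 0].max() + 1, 0.1)
x2 = np.arange(X[:, 1].min() - 1, X[:, 1].max() + 1, 0.1)
xx, yy = np.meshgrid(x1, x2)
grid = np.c_[xx.ravel(), yy.ravel()]

# predict every grid point, then reshape back into the grid for contour plotting
zz = model.predict(grid).reshape(xx.shape)
# e.g. with matplotlib: pyplot.contourf(xx, yy, zz); pyplot.scatter(X[:, 0], X[:, 1], c=y)
print(zz.shape)
```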

https://machinelearningmastery.com/seaborn-data-visualization-for-machine-learning/
https://machinelearningmastery.com/seaborn-data-visualization-for-machine-learning/
How to use Seaborn Data Visualization for Machine Learning (text-to-speech Powered by Whysper)
Data visualization provides insight into the distribution and relationships between variables in a dataset. This insight can be helpful in selecting data preparation techniques to apply prior to modeling and the types of algorithms that may be most suited to the data. Seaborn is a data visualization library for Python that runs on top of […]
The post How to use Seaborn Data Visualization for Machine Learning appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/seaborn-data-visualization-for-machine-learning/ by Whysper - The Website to Podcast Converter - whysper.io]
Sun, 09 Aug 2020 19:00:07 Z
2020-08-09T19:01:29Z
0:00:00

https://machinelearningmastery.com/multi-class-imbalanced-classification/
https://machinelearningmastery.com/multi-class-imbalanced-classification/
Multi-Class Imbalanced Classification (text-to-speech Powered by Whysper)
Imbalanced classification refers to those prediction tasks where the distribution of examples across class labels is not equal. Most imbalanced classification examples focus on binary classification tasks, yet many of the tools and techniques for imbalanced classification also directly support multi-class classification problems. In this tutorial, you will discover how to use the tools of imbalanced […]
The post Multi-Class Imbalanced Classification appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/multi-class-imbalanced-classification/ by Whysper - The Website to Podcast Converter - whysper.io]
Thu, 06 Aug 2020 19:00:35 Z
2020-08-06T19:22:46Z
0:00:00

https://machinelearningmastery.com/xgboost-for-time-series-forecasting/
https://machinelearningmastery.com/xgboost-for-time-series-forecasting/
How to Use XGBoost for Time Series Forecasting (text-to-speech Powered by Whysper)
XGBoost is an efficient implementation of gradient boosting for classification and regression problems. It is both fast and efficient, performing well, if not the best, on a wide range of predictive modeling tasks and is a favorite among data science competition winners, such as those on Kaggle. XGBoost can also be used for time series […]
The post How to Use XGBoost for Time Series Forecasting appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/xgboost-for-time-series-forecasting/ by Whysper - The Website to Podcast Converter - whysper.io]
Tue, 04 Aug 2020 19:00:17 Z
2020-08-04T19:21:01Z
0:00:00
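The key step in using a regressor like XGBoost on time series is reframing the series as supervised learning with a sliding window. A pure-NumPy sketch of that transform (the function name and window size are illustrative choices, not from the episode):

```python
import numpy as np

def series_to_supervised(values, n_in=3):
    """Frame a univariate series as supervised learning:
    each row of X is the previous n_in observations, y is the next one."""
    values = np.asarray(values)
    X, y = [], []
    for i in range(n_in, len(values)):
        X.append(values[i - n_in:i])
        y.append(values[i])
    return np.array(X), np.array(y)

series = [10, 20, 30, 40, 50, 60]
X, y = series_to_supervised(series, n_in=3)
print(X)  # [[10 20 30] [20 30 40] [30 40 50]]
print(y)  # [40 50 60]
```

Any tabular regressor can then be fit on (X, y) and used to forecast one step ahead from the last window.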

https://machinelearningmastery.com/repeated-k-fold-cross-validation-with-python/
https://machinelearningmastery.com/repeated-k-fold-cross-validation-with-python/
Repeated k-Fold Cross-Validation for Model Evaluation in Python (text-to-speech Powered by Whysper)
The k-fold cross-validation procedure is a standard method for estimating the performance of a machine learning algorithm or configuration on a dataset. A single run of the k-fold cross-validation procedure may result in a noisy estimate of model performance. Different splits of the data may result in very different results. Repeated k-fold cross-validation provides a […]
The post Repeated k-Fold Cross-Validation for Model Evaluation in Python appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/repeated-k-fold-cross-validation-with-python/ by Whysper - The Website to Podcast Converter - whysper.io]
Sun, 02 Aug 2020 19:00:20 Z
2020-08-02T19:10:28Z
0:00:00
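The repeated k-fold procedure described above maps directly onto scikit-learn's RepeatedKFold. A minimal sketch on a synthetic dataset (the model and dataset are illustrative stand-ins):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

# synthetic dataset purely for illustration
X, y = make_classification(n_samples=200, random_state=1)

# 10-fold cross-validation repeated 3 times with different splits each repeat
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(LogisticRegression(), X, y, scoring="accuracy", cv=cv)

# 30 estimates in total; the repeats smooth out split-to-split noise
print("Accuracy: %.3f (%.3f)" % (scores.mean(), scores.std()))
```

For classification, RepeatedStratifiedKFold is the stratified equivalent.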

https://machinelearningmastery.com/nested-cross-validation-for-machine-learning-with-python/
https://machinelearningmastery.com/nested-cross-validation-for-machine-learning-with-python/
Nested Cross-Validation for Machine Learning with Python (text-to-speech Powered by Whysper)
The k-fold cross-validation procedure is used to estimate the performance of machine learning models when making predictions on data not used during training. This procedure can be used both when optimizing the hyperparameters of a model on a dataset, and when comparing and selecting a model for the dataset. When the same cross-validation procedure and […]
The post Nested Cross-Validation for Machine Learning with Python appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/nested-cross-validation-for-machine-learning-with-python/ by Whysper - The Website to Podcast Converter - whysper.io]
Tue, 28 Jul 2020 19:00:28 Z
2020-07-28T19:08:17Z
0:00:00
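Nested cross-validation separates the hyperparameter search (inner loop) from the performance estimate (outer loop), avoiding the optimistic bias of reusing one procedure for both. A minimal scikit-learn sketch with illustrative model and grid choices:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = make_classification(n_samples=200, random_state=1)

# inner loop tunes hyperparameters; outer loop estimates performance
inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                      cv=inner_cv)

# each outer fold fits a fresh inner search on its training portion only
scores = cross_val_score(search, X, y, cv=outer_cv)
print("Nested CV accuracy: %.3f" % scores.mean())
```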

https://machinelearningmastery.com/loocv-for-evaluating-machine-learning-algorithms/
https://machinelearningmastery.com/loocv-for-evaluating-machine-learning-algorithms/
LOOCV for Evaluating Machine Learning Algorithms (text-to-speech Powered by Whysper)
The Leave-One-Out Cross-Validation, or LOOCV, procedure is used to estimate the performance of machine learning algorithms when they are used to make predictions on data not used to train the model. It is a computationally expensive procedure to perform, although it results in a reliable and unbiased estimate of model performance. Although simple to use […]
The post LOOCV for Evaluating Machine Learning Algorithms appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/loocv-for-evaluating-machine-learning-algorithms/ by Whysper - The Website to Podcast Converter - whysper.io]
Sun, 26 Jul 2020 19:00:35 Z
2020-07-26T19:13:23Z
0:00:00
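LOOCV is k-fold cross-validation taken to the extreme of one fold per sample, which is why it is expensive. A minimal scikit-learn sketch (synthetic data for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=100, random_state=1)

# one fold per sample: 100 models, each tested on a single held-out row
cv = LeaveOneOut()
scores = cross_val_score(LogisticRegression(), X, y, scoring="accuracy", cv=cv)

# each per-fold score is 0 or 1; the mean is the LOOCV accuracy estimate
print("Accuracy: %.3f" % scores.mean())
```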

https://machinelearningmastery.com/train-test-split-for-evaluating-machine-learning-algorithms/
https://machinelearningmastery.com/train-test-split-for-evaluating-machine-learning-algorithms/
Train-Test Split for Evaluating Machine Learning Algorithms (text-to-speech Powered by Whysper)
The train-test split procedure is used to estimate the performance of machine learning algorithms when they are used to make predictions on data not used to train the model. It is a fast and easy procedure to perform, the results of which allow you to compare the performance of machine learning algorithms for your predictive […]
The post Train-Test Split for Evaluating Machine Learning Algorithms appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/train-test-split-for-evaluating-machine-learning-algorithms/ by Whysper - The Website to Podcast Converter - whysper.io]
Thu, 23 Jul 2020 19:00:49 Z
2020-07-23T19:09:20Z
0:00:00
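The train-test split procedure in scikit-learn is a one-liner; a minimal sketch with an illustrative 67/33 split on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, random_state=1)

# 67/33 split; stratify keeps the class proportions similar in both halves
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=1, stratify=y)

# fit on the training half only, then score on the unseen test half
model = LogisticRegression().fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print("Test accuracy: %.3f" % acc)
```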

https://machinelearningmastery.com/selectively-scale-numerical-input-variables-for-machine-learning/
https://machinelearningmastery.com/selectively-scale-numerical-input-variables-for-machine-learning/
How to Selectively Scale Numerical Input Variables for Machine Learning (text-to-speech Powered by Whysper)
Many machine learning models perform better when input variables are carefully transformed or scaled prior to modeling. It is convenient, and therefore common, to apply the same data transforms, such as standardization and normalization, equally to all input variables. This can achieve good results on many problems. Nevertheless, better results may be achieved by carefully […]
The post How to Selectively Scale Numerical Input Variables for Machine Learning appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/selectively-scale-numerical-input-variables-for-machine-learning/ by Whysper - The Website to Podcast Converter - whysper.io]
Tue, 21 Jul 2020 19:00:15 Z
2020-07-21T19:13:17Z
0:00:00
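Applying a transform to only some input columns is what scikit-learn's ColumnTransformer is for. A minimal sketch (the tiny dataset and column choice are illustrative assumptions):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler

# hypothetical data: column 0 is numeric on a large scale, column 1 is already binary
X = np.array([[100.0, 0],
              [200.0, 1],
              [300.0, 1]])

# normalize only column 0; pass column 1 through untouched
ct = ColumnTransformer([("scale", MinMaxScaler(), [0])], remainder="passthrough")
X_t = ct.fit_transform(X)
print(X_t)  # column 0 mapped to [0, 0.5, 1]; column 1 unchanged
```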

https://machinelearningmastery.com/binary-flags-for-missing-values-for-machine-learning/
https://machinelearningmastery.com/binary-flags-for-missing-values-for-machine-learning/
Add Binary Flags for Missing Values for Machine Learning (text-to-speech Powered by Whysper)
Missing values can cause problems when modeling classification and regression prediction problems with machine learning algorithms. A common approach is to replace missing values with a calculated statistic, such as the mean of the column. This allows the dataset to be modeled as per normal but gives no indication to the model that the row […]
The post Add Binary Flags for Missing Values for Machine Learning appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/binary-flags-for-missing-values-for-machine-learning/ by Whysper - The Website to Podcast Converter - whysper.io]
Sun, 19 Jul 2020 19:00:08 Z
2020-07-19T19:24:03Z
0:00:00
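One way to give the model that "this row was imputed" signal is scikit-learn's SimpleImputer with add_indicator=True, which appends a binary flag column alongside the imputed value. A minimal sketch:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# one column with a missing value
X = np.array([[1.0], [np.nan], [3.0]])

# impute with the column mean AND append a binary "was missing" flag column
imputer = SimpleImputer(strategy="mean", add_indicator=True)
X_t = imputer.fit_transform(X)
print(X_t)  # [[1. 0.] [2. 1.] [3. 0.]]
```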

https://machinelearningmastery.com/create-custom-data-transforms-for-scikit-learn/
https://machinelearningmastery.com/create-custom-data-transforms-for-scikit-learn/
How to Create Custom Data Transforms for Scikit-Learn (text-to-speech Powered by Whysper)
The scikit-learn Python library for machine learning offers a suite of data transforms for changing the scale and distribution of input data, as well as removing input features (columns). There are many simple data cleaning operations, such as removing outliers and removing columns with few observations, that are often performed manually on the data, requiring […]
The post How to Create Custom Data Transforms for Scikit-Learn appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/create-custom-data-transforms-for-scikit-learn/ by Whysper - The Website to Podcast Converter - whysper.io]
Thu, 16 Jul 2020 19:00:23 Z
2020-07-16T19:13:17Z
0:00:00
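A custom scikit-learn transform is a class with fit and transform methods, inheriting BaseEstimator and TransformerMixin so it plugs into pipelines. A minimal sketch (the class name and the drop-constant-columns behavior are illustrative, not from the episode):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class DropConstantColumns(BaseEstimator, TransformerMixin):
    """Custom transform: remove columns with zero variance."""

    def fit(self, X, y=None):
        X = np.asarray(X)
        self.keep_ = X.std(axis=0) > 0.0  # boolean mask of informative columns
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.keep_]

X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])  # second column is constant
X_t = DropConstantColumns().fit_transform(X)
print(X_t.shape)  # (3, 1)
```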

https://machinelearningmastery.com/grid-search-data-preparation-techniques/
https://machinelearningmastery.com/grid-search-data-preparation-techniques/
How to Grid Search Data Preparation Techniques (text-to-speech Powered by Whysper)
Machine learning predictive modeling performance is only as good as your data, and your data is only as good as the way you prepare it for modeling. The most common approach to data preparation is to study a dataset and review the expectations of a machine learning algorithm, then carefully choose the most appropriate data […]
The post How to Grid Search Data Preparation Techniques appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/grid-search-data-preparation-techniques/ by Whysper - The Website to Podcast Converter - whysper.io]
Tue, 14 Jul 2020 19:00:19 Z
2020-07-14T19:38:11Z
0:00:00

https://machinelearningmastery.com/framework-for-data-preparation-for-machine-learning/
https://machinelearningmastery.com/framework-for-data-preparation-for-machine-learning/
Framework for Data Preparation Techniques in Machine Learning (text-to-speech Powered by Whysper)
There are a vast number of different types of data preparation techniques that could be used on a predictive modeling project. In some cases, the distribution of the data or the requirements of a machine learning model may suggest the data preparation needed, although this is rarely the case given the complexity and high dimensionality of […]
The post Framework for Data Preparation Techniques in Machine Learning appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/framework-for-data-preparation-for-machine-learning/ by Whysper - The Website to Podcast Converter - whysper.io]
Sun, 12 Jul 2020 19:00:29 Z
2020-07-12T19:13:46Z
0:00:00

https://machinelearningmastery.com/dimensionality-reduction-algorithms-with-python/
https://machinelearningmastery.com/dimensionality-reduction-algorithms-with-python/
6 Dimensionality Reduction Algorithms With Python (text-to-speech Powered by Whysper)
Dimensionality reduction is an unsupervised learning technique. Nevertheless, it can be used as a data transform preprocessing step for machine learning algorithms on classification and regression predictive modeling datasets with supervised learning algorithms. There are many dimensionality reduction algorithms to choose from and no single best algorithm for all cases. Instead, it is a good […]
The post 6 Dimensionality Reduction Algorithms With Python appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/dimensionality-reduction-algorithms-with-python/ by Whysper - The Website to Podcast Converter - whysper.io]
Thu, 09 Jul 2020 19:00:07 Z
2020-07-09T19:17:05Z
0:00:00
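Using dimensionality reduction as a preprocessing step, as the summary above describes, typically means placing it in a pipeline ahead of the model. A minimal sketch with PCA, one of the algorithms commonly used this way (dataset, component count, and classifier are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=100, n_features=20, random_state=1)

# reduce 20 input features to 5 components before fitting the classifier
model = Pipeline([("pca", PCA(n_components=5)),
                  ("clf", LogisticRegression())])
model.fit(X, y)

# the classifier only ever sees the 5-column projected data
print(model.named_steps["pca"].transform(X).shape)  # (100, 5)
```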

https://machinelearningmastery.com/model-based-outlier-detection-and-removal-in-python/
https://machinelearningmastery.com/model-based-outlier-detection-and-removal-in-python/
4 Automatic Outlier Detection Algorithms in Python (text-to-speech Powered by Whysper)
The presence of outliers in a classification or regression dataset can result in a poor fit and lower predictive modeling performance. Identifying and removing outliers is challenging with simple statistical methods for most machine learning datasets given the large number of input variables. Instead, automatic outlier detection methods can be used in the modeling pipeline […]
The post 4 Automatic Outlier Detection Algorithms in Python appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/model-based-outlier-detection-and-removal-in-python/ by Whysper - The Website to Podcast Converter - whysper.io]
Tue, 07 Jul 2020 19:00:35 Z
2020-07-08T01:17:30Z
0:00:00
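Automatic outlier detection, as summarized above, usually means fitting a detector on the training rows and dropping the ones it flags. A minimal sketch with IsolationForest, one such algorithm (the synthetic dataset and 10% contamination rate are illustrative assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import IsolationForest

X, y = make_regression(n_samples=100, n_features=5, random_state=1)

# flag roughly 10% of training rows as outliers (-1) vs inliers (+1)
iso = IsolationForest(contamination=0.1, random_state=1)
labels = iso.fit_predict(X)

# keep only inlier rows before fitting the final model
X_clean, y_clean = X[labels == 1], y[labels == 1]
print(X.shape, X_clean.shape)
```

Only the training set should be filtered this way; the test set stays untouched so the evaluation remains honest.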

https://machinelearningmastery.com/feature-extraction-on-tabular-data/
https://machinelearningmastery.com/feature-extraction-on-tabular-data/
How to Use Feature Extraction on Tabular Data for Machine Learning (text-to-speech Powered by Whysper)
Machine learning predictive modeling performance is only as good as your data, and your data is only as good as the way you prepare it for modeling. The most common approach to data preparation is to study a dataset and review the expectations of a machine learning algorithm, then carefully choose the most appropriate data […]
The post How to Use Feature Extraction on Tabular Data for Machine Learning appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/feature-extraction-on-tabular-data/ by Whysper - The Website to Podcast Converter - whysper.io]
Sun, 05 Jul 2020 19:00:24 Z
2020-07-05T19:25:21Z
0:00:00

https://machinelearningmastery.com/choose-data-preparation-methods-for-machine-learning/
https://machinelearningmastery.com/choose-data-preparation-methods-for-machine-learning/
How to Choose Data Preparation Methods for Machine Learning (text-to-speech Powered by Whysper)
Data preparation is an important part of a predictive modeling project. Correct application of data preparation will transform raw data into a representation that allows learning algorithms to get the most out of the data and make skillful predictions. The problem is choosing a transform or sequence of transforms that results in a useful representation […]
The post How to Choose Data Preparation Methods for Machine Learning appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/choose-data-preparation-methods-for-machine-learning/ by Whysper - The Website to Podcast Converter - whysper.io]
Thu, 02 Jul 2020 19:00:50 Z
2020-07-02T19:21:50Z
0:00:00

https://machinelearningmastery.com/books-on-data-cleaning-data-preparation-and-feature-engineering/
https://machinelearningmastery.com/books-on-data-cleaning-data-preparation-and-feature-engineering/
8 Top Books on Data Cleaning and Feature Engineering (text-to-speech Powered by Whysper)
Data preparation is the transformation of raw data into a form that is more appropriate for modeling. It is a challenging topic to discuss as the data differs in form, type, and structure from project to project. Nevertheless, there are common data preparation tasks across projects. It is a huge field of study and goes […]
The post 8 Top Books on Data Cleaning and Feature Engineering appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/books-on-data-cleaning-data-preparation-and-feature-engineering/ by Whysper - The Website to Podcast Converter - whysper.io]
Tue, 30 Jun 2020 19:00:36 Z
2020-06-30T19:07:31Z
0:00:00

https://machinelearningmastery.com/datapreparationformachinelearning7dayminicourse/
https://machinelearningmastery.com/datapreparationformachinelearning7dayminicourse/
Data Preparation for Machine Learning (7-Day Mini-Course) (text-to-speech Powered by Whysper)
Data Preparation for Machine Learning Crash Course. Get on top of data preparation with Python in 7 days. Data preparation involves transforming raw data into a form that is more appropriate for modeling. Preparing data may be the most important part of a predictive modeling project and the most time-consuming, although it seems to be […]
The post Data Preparation for Machine Learning (7Day MiniCourse) appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/datapreparationformachinelearning7dayminicourse/ by Whysper  The Website to Podcast Converter  whysper.io]
Sun, 28 Jun 2020 19:00:35 Z
20200628T19:22:13Z
0:00:00

https://machinelearningmastery.com/featureengineeringandselectionbookreview/
https://machinelearningmastery.com/featureengineeringandselectionbookreview/
Feature Engineering and Selection (Book Review) (text-to-speech Powered by Whysper)
Data preparation is the process of transforming raw data into a form suitable for learning algorithms. In some cases, data preparation is a required step in order to provide the data to an algorithm in its required input format. In other cases, the most appropriate representation of the input data is not known and must be explored in a […]
The post Feature Engineering and Selection (Book Review) appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/featureengineeringandselectionbookreview/ by Whysper  The Website to Podcast Converter  whysper.io]
Thu, 25 Jun 2020 19:00:48 Z
20200625T19:16:15Z
0:00:00

https://machinelearningmastery.com/knnimputationformissingvaluesinmachinelearning/
https://machinelearningmastery.com/knnimputationformissingvaluesinmachinelearning/
kNN Imputation for Missing Values in Machine Learning (text-to-speech Powered by Whysper)
Datasets may have missing values, and this can cause problems for many machine learning algorithms. As such, it is good practice to identify and replace missing values for each column in your input data prior to modeling your prediction task. This is called missing data imputation, or imputing for short. A popular approach to missing […]
The post kNN Imputation for Missing Values in Machine Learning appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/knnimputationformissingvaluesinmachinelearning/ by Whysper  The Website to Podcast Converter  whysper.io]
Tue, 23 Jun 2020 19:00:11 Z
20200623T19:02:04Z
0:00:00

https://machinelearningmastery.com/datapreparationwithoutdataleakage/
https://machinelearningmastery.com/datapreparationwithoutdataleakage/
How to Avoid Data Leakage When Performing Data Preparation (text-to-speech Powered by Whysper)
Data preparation is the process of transforming raw data into a form that is appropriate for modeling. A naive approach to preparing data applies the transform on the entire dataset before evaluating the performance of the model. This results in a problem referred to as data leakage, where knowledge of the holdout test set leaks […]
The post How to Avoid Data Leakage When Performing Data Preparation appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/datapreparationwithoutdataleakage/ by Whysper  The Website to Podcast Converter  whysper.io]
Sun, 21 Jun 2020 19:00:58 Z
20200621T19:41:47Z
0:00:00

https://machinelearningmastery.com/datapreparationtechniquesformachinelearning/
https://machinelearningmastery.com/datapreparationtechniquesformachinelearning/
Tour of Data Preparation Techniques for Machine Learning (text-to-speech Powered by Whysper)
Predictive modeling machine learning projects, such as classification and regression, always involve some form of data preparation. The specific data preparation required for a dataset depends on the specifics of the data, such as the variable types, as well as the algorithms that will be used to model them that may impose expectations or requirements […]
The post Tour of Data Preparation Techniques for Machine Learning appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/datapreparationtechniquesformachinelearning/ by Whysper  The Website to Podcast Converter  whysper.io]
Thu, 18 Jun 2020 19:00:15 Z
20200618T19:14:14Z
0:00:00

https://machinelearningmastery.com/whatisdatapreparationinmachinelearning/
https://machinelearningmastery.com/whatisdatapreparationinmachinelearning/
What Is Data Preparation in a Machine Learning Project (text-to-speech Powered by Whysper)
Data preparation may be one of the most difficult steps in any machine learning project. The reason is that each dataset is different and highly specific to the project. Nevertheless, there are enough commonalities across predictive modeling projects that we can define a loose sequence of steps and subtasks that you are likely to perform. […]
The post What Is Data Preparation in a Machine Learning Project appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/whatisdatapreparationinmachinelearning/ by Whysper  The Website to Podcast Converter  whysper.io]
Tue, 16 Jun 2020 19:00:47 Z
20200616T19:31:38Z
0:00:00

https://machinelearningmastery.com/datapreparationisimportant/
https://machinelearningmastery.com/datapreparationisimportant/
Why Data Preparation Is So Important in Machine Learning (text-to-speech Powered by Whysper)
On a predictive modeling project, machine learning algorithms learn a mapping from input variables to a target variable. The most common form of predictive modeling project involves so-called structured data or tabular data. This is data as it looks in a spreadsheet or a matrix, with rows of examples and columns of features for each […]
The post Why Data Preparation Is So Important in Machine Learning appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/datapreparationisimportant/ by Whysper  The Website to Podcast Converter  whysper.io]
Sun, 14 Jun 2020 19:00:24 Z
20200614T19:20:14Z
0:00:00

https://machinelearningmastery.com/onehotencodingforcategoricaldata/
https://machinelearningmastery.com/onehotencodingforcategoricaldata/
Ordinal and One-Hot Encodings for Categorical Data (text-to-speech Powered by Whysper)
Machine learning models require all input and output variables to be numeric. This means that if your data contains categorical data, you must encode it to numbers before you can fit and evaluate a model. The two most popular techniques are an Ordinal Encoding and a One-Hot Encoding. In this tutorial, you will discover how […]
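As a quick illustration of the two encodings named above, here is a minimal sketch using scikit-learn on a made-up colour column (the toy data and the choice of scikit-learn transformers are mine, not from the post):

```python
# ordinal vs one-hot encoding of a single categorical column
from numpy import array
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

data = array([['red'], ['green'], ['blue'], ['green']])
# ordinal encoding: each category maps to an integer (sorted alphabetically by default)
print(OrdinalEncoder().fit_transform(data).ravel())  # [2. 1. 0. 1.]
# one-hot encoding: each category becomes its own binary column
print(OneHotEncoder().fit_transform(data).toarray())
```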
The post Ordinal and One-Hot Encodings for Categorical Data appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/onehotencodingforcategoricaldata/ by Whysper  The Website to Podcast Converter  whysper.io]
Thu, 11 Jun 2020 19:00:14 Z
20200611T19:29:01Z
0:00:00

https://machinelearningmastery.com/lineardiscriminantanalysisfordimensionalityreductioninpython/
https://machinelearningmastery.com/lineardiscriminantanalysisfordimensionalityreductioninpython/
Linear Discriminant Analysis for Dimensionality Reduction in Python (text-to-speech Powered by Whysper)
Reducing the number of input variables for a predictive model is referred to as dimensionality reduction. Fewer input variables can result in a simpler predictive model that may have better performance when making predictions on new data. Linear Discriminant Analysis, or LDA for short, is a predictive modeling algorithm for multi-class classification. It can also […]
The post Linear Discriminant Analysis for Dimensionality Reduction in Python appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/lineardiscriminantanalysisfordimensionalityreductioninpython/ by Whysper  The Website to Podcast Converter  whysper.io]
Tue, 12 May 2020 19:00:59 Z
20200512T19:04:56Z
0:26:41

https://machinelearningmastery.com/singularvaluedecompositionfordimensionalityreductioninpython/
https://machinelearningmastery.com/singularvaluedecompositionfordimensionalityreductioninpython/
Singular Value Decomposition for Dimensionality Reduction in Python (text-to-speech Powered by Whysper)
Reducing the number of input variables for a predictive model is referred to as dimensionality reduction. Fewer input variables can result in a simpler predictive model that may have better performance when making predictions on new data. Perhaps the more popular technique for dimensionality reduction in machine learning is Singular Value Decomposition, or SVD for […]
The post Singular Value Decomposition for Dimensionality Reduction in Python appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/singularvaluedecompositionfordimensionalityreductioninpython/ by Whysper  The Website to Podcast Converter  whysper.io]
Sun, 10 May 2020 19:00:50 Z
20200510T19:12:14Z
0:26:35

https://machinelearningmastery.com/principalcomponentsanalysisfordimensionalityreductioninpython/
https://machinelearningmastery.com/principalcomponentsanalysisfordimensionalityreductioninpython/
Principal Component Analysis for Dimensionality Reduction in Python (text-to-speech Powered by Whysper)
Reducing the number of input variables for a predictive model is referred to as dimensionality reduction. Fewer input variables can result in a simpler predictive model that may have better performance when making predictions on new data. Perhaps the most popular technique for dimensionality reduction in machine learning is Principal Component Analysis, or PCA for […]
The post Principal Component Analysis for Dimensionality Reduction in Python appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/principalcomponentsanalysisfordimensionalityreductioninpython/ by Whysper  The Website to Podcast Converter  whysper.io]
Thu, 07 May 2020 19:00:46 Z
20200507T19:21:35Z
0:26:25

https://machinelearningmastery.com/dimensionalityreductionformachinelearning/
https://machinelearningmastery.com/dimensionalityreductionformachinelearning/
Introduction to Dimensionality Reduction for Machine Learning (text-to-speech Powered by Whysper)
The number of input variables or features for a dataset is referred to as its dimensionality. Dimensionality reduction refers to techniques that reduce the number of input variables in a dataset. More input features often make a predictive modeling task more challenging to model, more generally referred to as the curse of dimensionality. Although on […]
The post Introduction to Dimensionality Reduction for Machine Learning appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/dimensionalityreductionformachinelearning/ by Whysper  The Website to Podcast Converter  whysper.io]
Tue, 05 May 2020 19:00:25 Z
20200505T19:02:43Z
0:12:29

https://machinelearningmastery.com/degreesoffreedominmachinelearning/
https://machinelearningmastery.com/degreesoffreedominmachinelearning/
A Gentle Introduction to Degrees of Freedom in Machine Learning (text-to-speech Powered by Whysper)
Degrees of freedom is an important concept from statistics and engineering. It is often employed to summarize the number of values used in the calculation of a statistic, such as a sample statistic or in a statistical hypothesis test. In machine learning, the degrees of freedom may refer to the number of parameters in the […]
The post A Gentle Introduction to Degrees of Freedom in Machine Learning appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/degreesoffreedominmachinelearning/ by Whysper  The Website to Podcast Converter  whysper.io]
Thu, 23 Apr 2020 19:00:50 Z
20200423T19:05:02Z
0:13:23

https://machinelearningmastery.com/howtohandlebigplittlenpninmachinelearning/
https://machinelearningmastery.com/howtohandlebigplittlenpninmachinelearning/
How to Handle Big-p, Little-n (p >> n) in Machine Learning (text-to-speech Powered by Whysper)
What if I have more Columns than Rows in my dataset? Machine learning datasets are often structured or tabular data comprised of rows and columns. The columns that are fed as input to a model are called predictors or "p" and the rows are samples "n". Most machine learning algorithms assume that there are many […]
The post How to Handle Big-p, Little-n (p >> n) in Machine Learning appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/howtohandlebigplittlenpninmachinelearning/ by Whysper  The Website to Podcast Converter  whysper.io]
Tue, 14 Apr 2020 19:00:03 Z
20200414T19:30:30Z
0:16:27

https://machinelearningmastery.com/onevsrestandonevsoneformulticlassclassification/
https://machinelearningmastery.com/onevsrestandonevsoneformulticlassclassification/
How to Use One-vs-Rest and One-vs-One for Multi-Class Classification (text-to-speech Powered by Whysper)
Not all classification predictive models support multi-class classification. Algorithms such as the Perceptron, Logistic Regression, and Support Vector Machines were designed for binary classification and do not natively support classification tasks with more than two classes. One approach for using binary classification algorithms for multi-class problems is to split the multi-class classification dataset into multiple […]
The post How to Use One-vs-Rest and One-vs-One for Multi-Class Classification appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/onevsrestandonevsoneformulticlassclassification/ by Whysper  The Website to Podcast Converter  whysper.io]
Sun, 12 Apr 2020 19:00:00 Z
20200414T19:26:35Z
0:17:32

https://machinelearningmastery.com/argmaxinmachinelearning/
https://machinelearningmastery.com/argmaxinmachinelearning/
What Is Argmax in Machine Learning?
Argmax is a mathematical function that you may encounter in applied machine learning. For example, you may see “argmax” or “arg max” used in a research paper used to describe an algorithm. You may also be instructed to use the argmax function in your algorithm implementation. This may be the first time that you encounter […]
The post What Is Argmax in Machine Learning? appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/argmaxinmachinelearning/ by Whysper  The website to podcast converter  whysper.io]
Thu, 02 Apr 2020 18:00:22 Z
20200402T18:18:29Z
0:12:31

https://machinelearningmastery.com/multioutputregressionmodelswithpython/
https://machinelearningmastery.com/multioutputregressionmodelswithpython/
How to Develop Multi-Output Regression Models with Python
Multi-output regression problems are regression problems that involve predicting two or more numerical values given an input example. An example might be to predict a coordinate given an input, e.g. predicting x and y values. Another example would be multi-step time series forecasting that involves predicting multiple future time series of a given variable. Many machine […]
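Several scikit-learn regressors handle multiple outputs natively; a minimal sketch on a synthetic two-target problem (the dataset and model choice here are my own illustration, not from the post):

```python
# a linear model fit on a multi-output regression task
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# synthetic problem with two numerical targets per example
X, y = make_regression(n_samples=100, n_features=4, n_targets=2, random_state=1)
model = LinearRegression()
model.fit(X, y)
# one input example yields two predicted values
print(model.predict(X[:1]).shape)  # (1, 2)
```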
The post How to Develop Multi-Output Regression Models with Python appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/multioutputregressionmodelswithpython/ by Whysper  The website to podcast converter  whysper.io]
Thu, 26 Mar 2020 18:00:06 Z
20200327T20:53:50Z
0:34:00

https://machinelearningmastery.com/distancemeasuresformachinelearning/
https://machinelearningmastery.com/distancemeasuresformachinelearning/
4 Distance Measures for Machine Learning
Distance measures play an important role in machine learning. They provide the foundation for many popular and effective machine learning algorithms like k-nearest neighbors for supervised learning and k-means clustering for unsupervised learning. Different distance measures must be chosen and used depending on the types of the data. As such, it is important to know […]
The post 4 Distance Measures for Machine Learning appeared first on Machine Learning Mastery.
[Podcast episode created from https://machinelearningmastery.com/distancemeasuresformachinelearning/ by Whysper  The website to podcast converter  whysper.io]
Tue, 24 Mar 2020 18:00:55 Z
20200324T18:03:34Z
0:26:16

https://machinelearningmastery.com/basicdatacleaningformachinelearning/
https://machinelearningmastery.com/basicdatacleaningformachinelearning/
Basic Data Cleaning for Machine Learning (That You Must Perform)
Data cleaning is a critically important step in any machine learning project.
In tabular data, there are many different statistical analysis and data visualization techniques you can use to explore your data in order to identify data cleaning operations you may want to perform.
Before jumping to the sophisticated methods, there are some very basic data cleaning operations that you probably should perform on every single machine learning project. These are so basic that they are often overlooked by seasoned machine learning practitioners, yet are so critical that if skipped, models may break or report overly optimistic performance results.
In this tutorial, you will discover basic data cleaning you should always perform on your dataset.
After completing this tutorial, you will know:
How to identify and remove column variables that only have a single value.
How to identify and consider column variables with very few unique values.
How to identify and remove rows that contain duplicate observations.

Let’s get started.
Basic Data Cleaning You Must Perform in Machine Learning. Photo by Allen McGregor, some rights reserved.
Tutorial Overview
This tutorial is divided into five parts; they are:
Identify Columns That Contain a Single Value
Delete Columns That Contain a Single Value
Consider Columns That Have Very Few Values
Identify Rows That Contain Duplicate Data
Delete Rows That Contain Duplicate Data

Identify Columns That Contain a Single Value
Columns that have a single observation or value are probably useless for modeling.
Here, a single value means that each row for that column has the same value. For example, the column X1 has the value 1.0 for all rows in the dataset:

X1
1.0
1.0
1.0
1.0
1.0
...

Columns that have a single value for all rows do not contain any information for modeling.
Depending on the choice of data preparation and modeling algorithms, variables with a single value can also cause errors or unexpected results.
You can detect columns that have this property using the unique() NumPy function, which will report the number of unique values in each column.
The example below loads the oil spill classification dataset that contains 50 variables and summarizes the number of unique values for each column.

# summarize the number of unique values for each column using numpy
from urllib.request import urlopen
from numpy import loadtxt
from numpy import unique
# define the location of the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/oilspill.csv'
# load the dataset
data = loadtxt(urlopen(path), delimiter=',')
# summarize the number of unique values in each column
for i in range(data.shape[1]):
    print(i, len(unique(data[:, i])))

Running the example loads the dataset directly from the URL and prints the number of unique values for each column.
We can see that column index 22 only has a single value and should be removed.

0 238
1 297
2 927
3 933
4 179
5 375
6 820
7 618
8 561
9 57
10 577
11 59
12 73
13 107
14 53
15 91
16 893
17 810
18 170
19 53
20 68
21 9
22 1
23 92
24 9
25 8
26 9
27 308
28 447
29 392
30 107
31 42
32 4
33 45
34 141
35 110
36 3
37 758
38 9
39 9
40 388
41 220
42 644
43 649
44 499
45 2
46 937
47 169
48 286
49 2

A simpler approach is to use the nunique() Pandas function that does the hard work for you.
Below is the same example using the Pandas function.

# summarize the number of unique values for each column using pandas
from pandas import read_csv
# define the location of the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/oilspill.csv'
# load the dataset
df = read_csv(path, header=None)
# summarize the number of unique values in each column
print(df.nunique())

Running the example, we get the same result: the column index and the number of unique values for each column.

0 238
1 297
2 927
3 933
4 179
5 375
6 820
7 618
8 561
9 57
10 577
11 59
12 73
13 107
14 53
15 91
16 893
17 810
18 170
19 53
20 68
21 9
22 1
23 92
24 9
25 8
26 9
27 308
28 447
29 392
30 107
31 42
32 4
33 45
34 141
35 110
36 3
37 758
38 9
39 9
40 388
41 220
42 644
43 649
44 499
45 2
46 937
47 169
48 286
49 2
dtype: int64
Delete Columns That Contain a Single Value
Variables or columns that have a single value should probably be removed from your dataset.
Columns are relatively easy to remove from a NumPy array or Pandas DataFrame.
One approach is to record all columns that have a single unique value, then delete them from the Pandas DataFrame by calling the drop() function.
The complete example is listed below.

# delete columns with a single unique value
from pandas import read_csv
# define the location of the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/oilspill.csv'
# load the dataset
df = read_csv(path, header=None)
print(df.shape)
# get number of unique values for each column
counts = df.nunique()
# record columns to delete
to_del = [i for i,v in enumerate(counts) if v == 1]
print(to_del)
# drop useless columns
df.drop(to_del, axis=1, inplace=True)
print(df.shape)

Running the example first loads the dataset and reports the number of rows and columns.
The number of unique values for each column is calculated, and those columns that have a single unique value are identified. In this case, column index 22.
The identified columns are then removed from the DataFrame, and the number of rows and columns in the DataFrame are reported to confirm the change.

(937, 50)
[22]
(937, 49)
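As an aside, scikit-learn also provides a transform that removes zero-variance (single-value) columns directly. A minimal sketch on a tiny synthetic array (the toy data is mine, not from the tutorial):

```python
# remove zero-variance (single-value) columns with scikit-learn
from numpy import array
from sklearn.feature_selection import VarianceThreshold

# toy data: the middle column holds a single value in every row
X = array([[1.0, 5.0, 3.0],
           [2.0, 5.0, 1.0],
           [3.0, 5.0, 2.0]])
# the default threshold of 0.0 drops columns whose variance is exactly zero
transform = VarianceThreshold()
X_sel = transform.fit_transform(X)
print(X.shape, '->', X_sel.shape)  # (3, 3) -> (3, 2)
```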
Consider Columns That Have Very Few Values
In the previous section, we saw that some columns in the example dataset had very few unique values.
For example, there were columns that only had 2, 4, and 9 unique values. This might make sense for ordinal or categorical variables. In this case, the dataset only contains numerical variables. As such, only having 2, 4, or 9 unique numerical values in a column might be surprising.
These columns may or may not contribute to the skill of a model.
Depending on the choice of data preparation and modeling algorithms, variables with very few numerical values can also cause errors or unexpected results. For example, I have seen them cause errors when using power transforms for data preparation and when fitting linear models that assume a “sensible” data probability distribution.
To help highlight columns of this type, you can calculate the number of unique values for each variable as a percentage of the total number of rows in the dataset.
Let’s do this manually using NumPy. The complete example is listed below.

# summarize the percentage of unique values for each column using numpy
from urllib.request import urlopen
from numpy import loadtxt
from numpy import unique
# define the location of the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/oilspill.csv'
# load the dataset
data = loadtxt(urlopen(path), delimiter=',')
# summarize the number of unique values in each column
for i in range(data.shape[1]):
    num = len(unique(data[:, i]))
    percentage = float(num) / data.shape[0] * 100
    print('%d, %d, %.1f%%' % (i, num, percentage))

Running the example reports the column index and the number of unique values for each column, followed by the percentage of unique values out of all rows in the dataset.
Here, we can see that some columns have a very low percentage of unique values, such as below 1 percent.

0, 238, 25.4%
1, 297, 31.7%
2, 927, 98.9%
3, 933, 99.6%
4, 179, 19.1%
5, 375, 40.0%
6, 820, 87.5%
7, 618, 66.0%
8, 561, 59.9%
9, 57, 6.1%
10, 577, 61.6%
11, 59, 6.3%
12, 73, 7.8%
13, 107, 11.4%
14, 53, 5.7%
15, 91, 9.7%
16, 893, 95.3%
17, 810, 86.4%
18, 170, 18.1%
19, 53, 5.7%
20, 68, 7.3%
21, 9, 1.0%
22, 1, 0.1%
23, 92, 9.8%
24, 9, 1.0%
25, 8, 0.9%
26, 9, 1.0%
27, 308, 32.9%
28, 447, 47.7%
29, 392, 41.8%
30, 107, 11.4%
31, 42, 4.5%
32, 4, 0.4%
33, 45, 4.8%
34, 141, 15.0%
35, 110, 11.7%
36, 3, 0.3%
37, 758, 80.9%
38, 9, 1.0%
39, 9, 1.0%
40, 388, 41.4%
41, 220, 23.5%
42, 644, 68.7%
43, 649, 69.3%
44, 499, 53.3%
45, 2, 0.2%
46, 937, 100.0%
47, 169, 18.0%
48, 286, 30.5%
49, 2, 0.2%

We can update the example to only summarize those variables that have unique values that are less than 1 percent of the number of rows.

# summarize the percentage of unique values for each column using numpy
from urllib.request import urlopen
from numpy import loadtxt
from numpy import unique
# define the location of the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/oilspill.csv'
# load the dataset
data = loadtxt(urlopen(path), delimiter=',')
# summarize the number of unique values in each column
for i in range(data.shape[1]):
    num = len(unique(data[:, i]))
    percentage = float(num) / data.shape[0] * 100
    if percentage < 1:
        print('%d, %d, %.1f%%' % (i, num, percentage))

Running the example, we can see that 11 of the 50 variables have unique values that are less than 1 percent of the number of rows.
This does not mean that these rows and columns should be deleted, but they require further attention.
For example: Perhaps the unique values can be encoded as ordinal values?
Perhaps the unique values can be encoded as categorical values?
Perhaps compare model skill with each variable removed from the dataset?

21, 9, 1.0%
22, 1, 0.1%
24, 9, 1.0%
25, 8, 0.9%
26, 9, 1.0%
32, 4, 0.4%
36, 3, 0.3%
38, 9, 1.0%
39, 9, 1.0%
45, 2, 0.2%
49, 2, 0.2%

For example, if we wanted to delete all 11 columns with unique values less than 1 percent of rows, the example below demonstrates this.

# delete columns where number of unique values is less than 1% of the rows
from pandas import read_csv
# define the location of the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/oilspill.csv'
# load the dataset
df = read_csv(path, header=None)
print(df.shape)
# get number of unique values for each column
counts = df.nunique()
# record columns to delete
to_del = [i for i,v in enumerate(counts) if (float(v)/df.shape[0]*100) < 1]
print(to_del)
# drop useless columns
df.drop(to_del, axis=1, inplace=True)
print(df.shape)

Running the example first loads the dataset and reports the number of rows and columns.
The number of unique values for each column is calculated, and those columns that have a number of unique values less than 1 percent of the rows are identified. In this case, 11 columns.
The identified columns are then removed from the DataFrame, and the number of rows and columns in the DataFrame are reported to confirm the change.

(937, 50)
[21, 22, 24, 25, 26, 32, 36, 38, 39, 45, 49]
(937, 39)
Identify Rows That Contain Duplicate Data
Rows that have identical data are probably useless, if not dangerously misleading during model evaluation.
Here, a duplicate row is a row where each value in each column for that row appears in identically the same order (same column values) in another row.
From a probabilistic perspective, you can think of duplicate data as adjusting the priors for a class label or data distribution. This may help an algorithm like Naive Bayes if you wish to purposefully bias the priors. Typically, this is not the case and machine learning algorithms will perform better by identifying and removing rows with duplicate data.
From an algorithm evaluation perspective, duplicate rows will result in misleading performance. For example, if you are using a train/test split or kfold crossvalidation, then it is possible for a duplicate row or rows to appear in both train and test datasets and any evaluation of the model on these rows will be (or should be) correct. This will result in an optimistically biased estimate of performance on unseen data.
If you think this is not the case for your dataset or chosen model, design a controlled experiment to test it. This could be achieved by evaluating model skill with the raw dataset and the dataset with duplicates removed and comparing performance. Another experiment might involve augmenting the dataset with different numbers of randomly selected duplicate examples.
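The controlled experiment described above might be sketched as follows. This is a toy illustration under my own assumptions (synthetic data, a decision tree, duplicates created by repeating the first 50 rows), not a rigorous study:

```python
# compare cross-validated accuracy with and without duplicate rows
from pandas import DataFrame, concat
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# synthetic classification dataset
X, y = make_classification(n_samples=200, n_features=5, random_state=1)
df = DataFrame(X)
df['target'] = y
# augment with duplicates by repeating the first 50 rows
df_dup = concat([df, df.head(50)], ignore_index=True)

model = DecisionTreeClassifier(random_state=1)
for name, d in [('raw', df), ('with duplicates', df_dup)]:
    scores = cross_val_score(model, d.drop(columns='target'), d['target'], cv=5)
    print(name, round(scores.mean(), 3))
```

A higher mean score on the duplicated version would suggest optimistic bias from duplicates leaking across folds.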
The pandas function duplicated() will report whether a given row is duplicated or not. All rows are marked as either False to indicate that it is not a duplicate or True to indicate that it is a duplicate. If there are duplicates, the first occurrence of the row is marked False (by default), as we might expect.
The example below checks for duplicates.

# locate rows of duplicate data
from pandas import read_csv
# define the location of the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv'
# load the dataset
df = read_csv(path, header=None)
# calculate duplicates
dups = df.duplicated()
# report if there are any duplicates
print(dups.any())
# list all duplicate rows
print(df[dups])

Running the example first loads the dataset, then calculates row duplicates.
First, the presence of any duplicate rows is reported, and in this case, we can see that there are duplicates (True).
Then all duplicate rows are reported. In this case, we can see that the three duplicate rows that were identified are printed.

True
      0    1    2    3    4
34   4.9  3.1  1.5  0.1  Iris-setosa
37   4.9  3.1  1.5  0.1  Iris-setosa
142  5.8  2.7  5.1  1.9  Iris-virginica
Delete Rows That Contain Duplicate Data
Rows of duplicate data should probably be deleted from your dataset prior to modeling.
There are many ways to achieve this, although Pandas provides the drop_duplicates() function that achieves exactly this.
The example below demonstrates deleting duplicate rows from a dataset.

# delete rows of duplicate data from the dataset
from pandas import read_csv
# define the location of the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv'
# load the dataset
df = read_csv(path, header=None)
print(df.shape)
# delete duplicate rows
df.drop_duplicates(inplace=True)
print(df.shape)

Running the example first loads the dataset and reports the number of rows and columns.
Next, the rows of duplicated data are identified and removed from the DataFrame. Then the shape of the DataFrame is reported to confirm the change.

(150, 5)
(147, 5)
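Note that drop_duplicates() also accepts a subset argument to consider only certain columns when deciding what counts as a duplicate, and a keep argument to control which occurrence survives ('first' by default, 'last', or False to drop every copy). A small sketch on made-up data:

```python
# finer control over which duplicates are removed
from pandas import DataFrame

df = DataFrame({'a': [1, 1, 2, 2], 'b': ['x', 'x', 'y', 'z']})
# default: keep the first occurrence of each fully duplicated row
print(df.drop_duplicates().shape)              # (3, 2)
# consider only column 'a' when deciding what counts as a duplicate
print(df.drop_duplicates(subset=['a']).shape)  # (2, 2)
# keep=False drops every copy of a duplicated row
print(df.drop_duplicates(keep=False).shape)    # (2, 2)
```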
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Tutorials
How To Load Machine Learning Data in Python
Data Cleaning: Turn Messy Data into Tidy Data

APIs
numpy.unique API.
pandas.DataFrame.nunique API.
pandas.DataFrame.drop API.
pandas.DataFrame.duplicated API.
pandas.DataFrame.drop_duplicates API.

Summary
In this tutorial, you discovered basic data cleaning you should always perform on your dataset.
Specifically, you learned:
How to identify and remove column variables that only have a single value.
How to identify and consider column variables with very few unique values.
How to identify and remove rows that contain duplicate observations.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Thu, 19 Mar 2020 18:00:22 Z
20200323T04:33:35Z
0:32:00

https://machinelearningmastery.com/neuralnetworksarefunctionapproximators/
Neural Networks are Function Approximation Algorithms
Supervised learning in machine learning can be described in terms of function approximation.
Given a dataset comprised of inputs and outputs, we assume that there is an unknown underlying function that is consistent in mapping inputs to outputs in the target domain and resulted in the dataset. We then use supervised learning algorithms to approximate this function.
Neural networks are an example of a supervised machine learning algorithm that is perhaps best understood in the context of function approximation. This can be demonstrated with examples of neural networks approximating simple onedimensional functions that aid in developing the intuition for what is being learned by the model.
In this tutorial, you will discover the intuition behind neural networks as function approximation algorithms.
After completing this tutorial, you will know:
Training a neural network on data approximates the unknown underlying mapping function from inputs to outputs.
One-dimensional input and output datasets provide a useful basis for developing the intuitions for function approximation.
How to develop and evaluate a small neural network for function approximation.

Let’s get started.
Neural Networks are Function Approximation Algorithms
Photo by daveynin, some rights reserved.
Tutorial Overview
This tutorial is divided into three parts; they are:
What Is Function Approximation
Definition of a Simple Function
Approximating a Simple Function

What Is Function Approximation
Function approximation is a technique for estimating an unknown underlying function using historical or available observations from the domain.
Artificial neural networks learn to approximate a function.
In supervised learning, a dataset is comprised of inputs and outputs, and the supervised learning algorithm learns how to best map examples of inputs to examples of outputs.
We can think of this mapping as being governed by a mathematical function, called the mapping function, and it is this function that a supervised learning algorithm seeks to best approximate.
Neural networks are an example of a supervised learning algorithm and seek to approximate the function represented by your data. This is achieved by calculating the error between the predicted outputs and the expected outputs and minimizing this error during the training process.
It is best to think of feedforward networks as function approximation machines that are designed to achieve statistical generalization, occasionally drawing some insights from what we know about the brain, rather than as models of brain function.
— Page 169, Deep Learning, 2016.
We say “approximate” because although we suspect such a mapping function exists, we don’t know anything about it.
The true function that maps inputs to outputs is unknown and is often referred to as the target function. It is the target of the learning process, the function we are trying to approximate using only the data that is available. If we knew the target function, we would not need to approximate it, i.e. we would not need a supervised machine learning algorithm. Therefore, function approximation is only a useful tool when the underlying target mapping function is unknown.
All we have are observations from the domain that contain examples of inputs and outputs. This implies things about the size and quality of the data; for example:
The more examples we have, the more we might be able to figure out about the mapping function.
The less noise we have in the observations, the crisper the approximation of the mapping function we can make.

So why do we like using neural networks for function approximation?
The reason is that they are a universal approximator. In theory, they can be used to approximate any function.
… the universal approximation theorem states that a feedforward network with a linear output layer and at least one hidden layer with any “squashing” activation function (such as the logistic sigmoid activation function) can approximate any […] function from one finite-dimensional space to another with any desired nonzero amount of error, provided that the network is given enough hidden units
— Page 198, Deep Learning, 2016.
Regression predictive modeling involves predicting a numerical quantity given inputs. Classification predictive modeling involves predicting a class label given inputs.
Both of these predictive modeling problems can be seen as examples of function approximation.
To make this concrete, we can review a worked example.
In the next section, let’s define a simple function that we can later approximate.
Definition of a Simple Function
We can define a simple function with one numerical input variable and one numerical output variable and use this as the basis for understanding neural networks for function approximation.
We can define a domain of numbers as our input, such as floating-point values from -50 to 50.
We can then select a mathematical operation to apply to the inputs to get the output values. The selected mathematical operation will be the mapping function, and because we are choosing it, we will know what it is. In practice, this is not the case and is the reason why we would use a supervised learning algorithm like a neural network to learn or discover the mapping function.
In this case, we will use the square of the input as the mapping function, defined as:

y = x^2

where y is the output variable and x is the input variable.
We can develop an intuition for this mapping function by enumerating the values in the range of our input variable and calculating the output value for each input and plotting the result.
The example below implements this in Python.

# example of creating a univariate dataset with a given mapping function
from matplotlib import pyplot
# define the input data
x = [i for i in range(-50,51)]
# define the output data
y = [i**2.0 for i in x]
# plot the input versus the output
pyplot.scatter(x,y)
pyplot.title('Input (x) versus Output (y)')
pyplot.xlabel('Input Variable (x)')
pyplot.ylabel('Output Variable (y)')
pyplot.show()
Running the example first creates a list of integer values across the entire input domain.
The output values are then calculated using the mapping function, then a plot is created with the input values on the xaxis and the output values on the yaxis.
Scatter Plot of Input and Output Values for the Chosen Mapping Function
The input and output variables represent our dataset.
Next, we can pretend to forget that we know what the mapping function is and use a neural network to relearn or rediscover it.
Approximating a Simple Function
We can fit a neural network model on examples of inputs and outputs and see if the model can learn the mapping function.
This is a very simple mapping function, so we would expect a small neural network could learn it quickly.
We will define the network using the Keras deep learning library and use some data preparation tools from the scikitlearn library.
First, let’s define the dataset.

...
# define the dataset
x = asarray([i for i in range(-50,51)])
y = asarray([i**2.0 for i in x])
print(x.min(), x.max(), y.min(), y.max())
Next, we can reshape the data so that the input and output variables are columns with one observation per row, as is expected when using supervised learning models.

...
# reshape arrays into rows and cols
x = x.reshape((len(x), 1))
y = y.reshape((len(y), 1))
Next, we will need to scale the inputs and the outputs.
The inputs will have a range between -50 and 50, whereas the outputs will have a range between 0^2 (0) and (-50)^2 (2,500). Large input and output values can make training neural networks unstable; therefore, it is a good idea to scale the data first.
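Min-max normalization maps each value via x' = (x - min) / (max - min). A minimal sketch of the formula in plain Python (the scikit-learn MinMaxScaler used below does the same per column, and also remembers the min and max so the transform can be inverted later):

```python
# min-max normalization: x' = (x - min) / (max - min)
x = [-50.0, 0.0, 25.0, 50.0]
lo, hi = min(x), max(x)
x_scaled = [(v - lo) / (hi - lo) for v in x]
print(x_scaled)  # [0.0, 0.5, 0.75, 1.0]
```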
We can use the MinMaxScaler to separately normalize the input values and the output values to the range between 0 and 1.

...
# separately scale the input and output variables
scale_x = MinMaxScaler()
x = scale_x.fit_transform(x)
scale_y = MinMaxScaler()
y = scale_y.fit_transform(y)
print(x.min(), x.max(), y.min(), y.max())
We can now define a neural network model.
With some trial and error, I chose a model with two hidden layers and 10 nodes in each layer. Perhaps experiment with other configurations to see if you can do better.

...
# design the neural network model
model = Sequential()
model.add(Dense(10, input_dim=1, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(10, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1))
We will fit the model using a mean squared error loss and use the efficient Adam version of stochastic gradient descent to optimize the model.
This means the model will seek to minimize the mean squared error between the predictions made and the expected output values (y) while it tries to approximate the mapping function.

...
# define the loss function and optimization algorithm
model.compile(loss='mse', optimizer='adam')
We don’t have a lot of data (about 100 rows), so we will fit the model for 500 epochs and use a small batch size of 10.
Again, these values were found after a little trial and error; try different values and see if you can do better.

...
# fit the model on the training dataset
model.fit(x, y, epochs=500, batch_size=10, verbose=0)
Once fit, we can evaluate the model.
We will make a prediction for each example in the dataset and calculate the error. A perfect approximation would be 0.0. This is not possible in general because of noise in the observations, incomplete data, and complexity of the unknown underlying mapping function.
In this case, it is possible because we have all observations, there is no noise in the data, and the underlying function is not complex.
First, we can make the prediction.

...
# make predictions for the input data
yhat = model.predict(x)
We then must invert the scaling that we performed.
This is so the error is reported in the original units of the target variable.

...
# inverse transforms
x_plot = scale_x.inverse_transform(x)
y_plot = scale_y.inverse_transform(y)
yhat_plot = scale_y.inverse_transform(yhat)
We can then calculate and report the prediction error in the original units of the target variable.

...
# report model error
print('MSE: %.3f' % mean_squared_error(y_plot, yhat_plot))
Finally, we can create a scatter plot of the real mapping of inputs to outputs and compare it to the mapping of inputs to the predicted outputs, to see what the approximation of the mapping function looks like spatially.
This is helpful for developing the intuition behind what neural networks are learning.

...
# plot x vs yhat
pyplot.scatter(x_plot,yhat_plot, label='Predicted')
pyplot.title('Input (x) versus Output (y)')
pyplot.xlabel('Input Variable (x)')
pyplot.ylabel('Output Variable (y)')
pyplot.legend()
pyplot.show()
Tying this together, the complete example is listed below.

# example of fitting a neural net on x vs x^2
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from numpy import asarray
from matplotlib import pyplot
# define the dataset
x = asarray([i for i in range(-50,51)])
y = asarray([i**2.0 for i in x])
print(x.min(), x.max(), y.min(), y.max())
# reshape arrays into rows and cols
x = x.reshape((len(x), 1))
y = y.reshape((len(y), 1))
# separately scale the input and output variables
scale_x = MinMaxScaler()
x = scale_x.fit_transform(x)
scale_y = MinMaxScaler()
y = scale_y.fit_transform(y)
print(x.min(), x.max(), y.min(), y.max())
# design the neural network model
model = Sequential()
model.add(Dense(10, input_dim=1, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(10, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1))
# define the loss function and optimization algorithm
model.compile(loss='mse', optimizer='adam')
# fit the model on the training dataset
model.fit(x, y, epochs=500, batch_size=10, verbose=0)
# make predictions for the input data
yhat = model.predict(x)
# inverse transforms
x_plot = scale_x.inverse_transform(x)
y_plot = scale_y.inverse_transform(y)
yhat_plot = scale_y.inverse_transform(yhat)
# report model error
print('MSE: %.3f' % mean_squared_error(y_plot, yhat_plot))
# plot x vs y
pyplot.scatter(x_plot,y_plot, label='Actual')
# plot x vs yhat
pyplot.scatter(x_plot,yhat_plot, label='Predicted')
pyplot.title('Input (x) versus Output (y)')
pyplot.xlabel('Input Variable (x)')
pyplot.ylabel('Output Variable (y)')
pyplot.legend()
pyplot.show()
Running the example first reports the range of values for the input and output variables, then the range of the same variables after scaling. This confirms that the scaling operation was performed as we expected.
The model is then fit and evaluated on the dataset.
Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times.
In this case, we can see that the mean squared error is about 1,300, in squared units. If we calculate the square root, this gives us the root mean squared error (RMSE) in the original units. We can see that the average error is about 36 units, which is fine, but not great.
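The conversion from MSE to RMSE mentioned above is just a square root, e.g. for the MSE reported in this run:

```python
# RMSE is the square root of MSE, expressed in the original units of y
from math import sqrt

mse = 1300.776  # the MSE reported by the run above
rmse = sqrt(mse)
print('RMSE: %.1f' % rmse)  # RMSE: 36.1
```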
What results did you get? Can you do better?
Let me know in the comments below.

-50 50 0.0 2500.0
0.0 1.0 0.0 1.0
MSE: 1300.776
A scatter plot is then created comparing the inputs versus the real outputs, and the inputs versus the predicted outputs.
The difference between these two data series is the error in the approximation of the mapping function. We can see that the approximation is reasonable; it captures the general shape. We can see that there are errors, especially around the 0 input values.
This suggests that there is plenty of room for improvement, such as using a different activation function or different network architecture to better approximate the mapping function.
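As one illustrative experiment along these lines, the sketch below fits the same x-squared data with a two-hidden-layer tanh network. It uses scikit-learn's MLPRegressor rather than the Keras model above purely to keep the sketch short; the layer sizes and seed are arbitrary choices, not tuned values:

```python
# experiment sketch: approximate y = x^2 with a tanh network
from numpy import asarray
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPRegressor

# same dataset as the tutorial: x in [-50, 50], y = x^2
x = asarray([i for i in range(-50, 51)], dtype=float).reshape(-1, 1)
y = (x ** 2).reshape(-1, 1)
scale_x, scale_y = MinMaxScaler(), MinMaxScaler()
x_s = scale_x.fit_transform(x)
y_s = scale_y.fit_transform(y).ravel()
# two hidden layers of 10 tanh units; fixed seed for repeatability
model = MLPRegressor(hidden_layer_sizes=(10, 10), activation='tanh',
                     max_iter=2000, random_state=1)
model.fit(x_s, y_s)
# predictions back in the original units of y
yhat = scale_y.inverse_transform(model.predict(x_s).reshape(-1, 1))
print(yhat.shape)
```

Plotting yhat against x, as in the tutorial, shows how a different activation shapes the approximation.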
Scatter Plot of Input vs. Actual and Predicted Values for the Neural Net Approximation
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Tutorials
Your First Deep Learning Project in Python with Keras Step-By-Step

Books
Deep Learning, 2016.

Articles
Function approximation, Wikipedia.

Summary
In this tutorial, you discovered the intuition behind neural networks as function approximation algorithms.
Specifically, you learned:
Training a neural network on data approximates the unknown underlying mapping function from inputs to outputs.
One-dimensional input and output datasets provide a useful basis for developing the intuitions for function approximation.
How to develop and evaluate a small neural network for function approximation.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Tue, 17 Mar 2020 18:00:19 Z
20200323T04:30:57Z
0:27:13