
Machine Learning with Scikit-Learn - [Part 44] by cristi

In this tutorial we're going to discuss and code another method of automatic feature selection in scikit-learn: model-based selection.

According to the textbook we are following, model-based selection uses a supervised model to compute the importance of each feature. It then keeps only the most important features.

Since the selection relies on a per-feature importance score, the model used has to be able to provide one. In scikit-learn, that means an estimator that exposes feature_importances_ or coef_ after fitting. Two such models that we already know are Decision Trees and ensembles of trees, like Random Forests.
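As a minimal sketch of what "being able to determine feature importance" looks like in practice, the snippet below fits a Random Forest and reads its importance scores. The breast cancer dataset and the values of n_estimators and random_state are assumptions chosen for illustration; the post does not say which data or settings are used in the video.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Assumed example data: 569 samples, 30 numeric features.
X, y = load_breast_cancer(return_X_y=True)

# Tree ensembles compute an importance score for every feature during fitting.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X, y)

# One non-negative score per feature; the scores are normalized to sum to 1.
importances = forest.feature_importances_
print(importances.shape)
print(importances.sum())
```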

In this tutorial we're going to use a RandomForestClassifier for our model-based selection example. The scikit-learn class for model-based selection is SelectFromModel, and in this example we give it:

- the estimator that determines the importances (in this case RandomForestClassifier)
- the parameters for that classifier (n_estimators, etc.), which are set on the classifier itself
- a threshold used to make the selection, in this case 'median', which keeps the features whose importance is at or above the median
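Put together, the pieces listed above might look like the following sketch. The values n_estimators=100 and random_state=42 are assumptions for illustration, not necessarily the ones used in the video.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# The classifier's own parameters are passed to the classifier,
# and the threshold is passed to SelectFromModel.
select = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=42),
    threshold="median",
)
print(select)
```

Nothing is computed at this point; the forest is only trained once the selector is fitted.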

Once we have it, we fit it on the training data and then use it to transform the training set. We then look at both the original training set and the reduced training set produced by the selector. Finally, we do some visualization and train an algorithm on both sets so that we can compare their performance.
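The workflow just described can be sketched as follows. Everything about the data is an assumption: the breast cancer dataset padded with 50 random noise columns, the 50/50 split, the seeds, and the LogisticRegression used for the comparison are illustrative choices, since the post does not spell them out.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed setup: 30 real features followed by 50 pure-noise features.
cancer = load_breast_cancer()
rng = np.random.RandomState(42)
noise = rng.normal(size=(len(cancer.data), 50))
X_w_noise = np.hstack([cancer.data, noise])

X_train, X_test, y_train, y_test = train_test_split(
    X_w_noise, cancer.target, random_state=0, test_size=0.5
)

# Fit the selector on the training data only, then transform both sets.
select = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=42),
    threshold="median",
)
select.fit(X_train, y_train)
X_train_sel = select.transform(X_train)
X_test_sel = select.transform(X_test)

# Same number of rows, fewer columns after selection.
print("original:", X_train.shape)
print("selected:", X_train_sel.shape)

# Train the same model on both versions and score each on its test set.
lr_all = LogisticRegression(max_iter=10000).fit(X_train, y_train)
lr_sel = LogisticRegression(max_iter=10000).fit(X_train_sel, y_train)
score_all = lr_all.score(X_test, y_test)
score_sel = lr_sel.score(X_test_sel, y_test)
print("test accuracy, all features:     ", score_all)
print("test accuracy, selected features:", score_sel)
```

The exact scores depend on the split, the seeds, and the scikit-learn version, so they may differ from the ones shown in the video.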

In the video, the algorithm trained on the data with the select method applied performs better than the one trained on the original dataset. Please see the full video for a comprehensive understanding of this:

<center><iframe width="560" height="315" src="https://www.youtube.com/embed/VvJcmxnAmxA" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe></center>
___
### <center>To stay in touch with me, follow @cristi</center>   
___

[Cristi Vlad](http://cristivlad.com) Self-Experimenter and Author
Created: 2018-01-27 16:19:09
@dani74 ·
Nice post
@khanaasim ·
Some noise features are selected by the selection function in preference to real ones. Why is that?
@cristi ·
I just answered this question in the video. It seems that some noise features have a higher importance than some of the original ones. Basically, some of the original features may be completely irrelevant to the training of the algorithm...
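One way to check this is to count how many kept features come from each group. The sketch below assumes a setup of 30 real features followed by 50 added noise columns; the dataset, split, and seeds are illustrative assumptions, not details confirmed by the post.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

# Assumed setup: columns 0-29 are real features, columns 30-79 are noise.
cancer = load_breast_cancer()
n_original = cancer.data.shape[1]
rng = np.random.RandomState(42)
X = np.hstack([cancer.data, rng.normal(size=(len(cancer.data), 50))])
X_train, X_test, y_train, y_test = train_test_split(
    X, cancer.target, random_state=0, test_size=0.5
)

select = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=42),
    threshold="median",
).fit(X_train, y_train)

# Boolean mask over all columns: True means the feature was kept.
mask = select.get_support()
kept_original = int(mask[:n_original].sum())
kept_noise = int(mask[n_original:].sum())
print("original features kept:", kept_original, "of", n_original)
print("noise features kept:   ", kept_noise, "of", mask.size - n_original)
```

Under these assumptions the 'median' threshold keeps roughly half of the 80 columns, which is more than the 30 real features available, so at least some noise columns are kept no matter how well the forest ranks them. A stricter threshold would keep fewer of them.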
@khanaasim ·
Maybe so, but the confusion still persists.
@hernanjosegb ·
Good tutorial, my friend, very simply explained. Thank you very much and greetings, my brother. Good content on Steemit!