Reward is enough - Journal of Artificial Intelligence by remlaps-lite

hive-160342 · @remlaps-lite · Jun 14 '21

$0.38

<div class=pull-right>

[![](https://ars.els-cdn.com/content/image/1-s2.0-S0004370221X00057-cov150h.gif)](https://www.sciencedirect.com/science/article/pii/S0004370221000862)

</div>

<h6><sup>( May 24, 2021; <i>Artificial Intelligence</i> )</sup></h6>

<blockquote>

<b>Abstract</b>

In this article we hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence.

</blockquote>

Read the rest in the journal <i>Artificial Intelligence</i>: [Reward is enough](https://www.sciencedirect.com/science/article/pii/S0004370221000862)

- [PDF](https://www.sciencedirect.com/science/article/pii/S0004370221000862/pdfft?md5=12802032b840c6cc044e57c3a5aaa7c3&pid=1-s2.0-S0004370221000862-main.pdf)

---

-h/t [Communications of the ACM](https://cacm.acm.org/opinion/articles/253154-reward-is-enough-for-generalized-ai/fulltext)

👍 kiwi-crypto, whatsup, cub1, remlaps1, penny4thoughts, famigliacurione, cmp2020, ruzmaira, remlaps, simonjay, primevaldad, rpalmer13, esouthern3, jmor, remlaps-lite, lisa.palmer, cub2, astronomyizfun, remlaps2, cmp2020-lite, ricardo306
👎 dufbes, carslep, asam1267


@ruzmaira · Jun 15 '21

$0.33

Hmm, interesting article. They are essentially looking for a way for an artificial intelligence to have its own thoughts, so that it can manipulate any object, remember where it left something, and know how to choose between good and bad.

It might even have facial expressions that depend on how it feels. Hmm, I think these abilities would be a double-edged sword in the future, as we have talked about before.

Do you think that an artificial intelligence can develop some kind of sensation through reward stimulation?

👍 remlaps1, remlaps, famigliacurione, remlaps-lite, remlaps2


@remlaps · Jun 16 '21

> Do you think that an artificial intelligence can develop some kind of sensation through reward stimulation?

Yeah, I do think so.  I remember reading about [this](https://ai.facebook.com/blog/near-perfect-point-goal-navigation-from-25-billion-frames-of-experience/) last year.

> The AI community has a long-term goal of building intelligent machines that interact effectively with the physical world, and a key challenge is teaching these systems to navigate through complex, unfamiliar real-world environments to reach a specified destination — without a preprovided map. We are announcing today that Facebook AI has created a new large-scale distributed reinforcement learning (RL) algorithm called DD-PPO, which has effectively solved the task of point-goal navigation using only an RGB-D camera, GPS, and compass data. Agents trained with DD-PPO (which stands for decentralized distributed proximal policy optimization) achieve nearly 100 percent success in a variety of virtual environments, such as houses and office buildings. We have also successfully tested our model with tasks in real-world physical settings using a LoCoBot and Facebook AI’s <A HREF="https://ai.facebook.com/blog/open-sourcing-pyrobot-to-accelerate-ai-robotics-research/">PyRobot platform</A>.

When they talk about "<i>reinforcement learning</i>", that's a reward-based learning model.
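To make "reward-based" concrete, here is a minimal sketch of trial-and-error learning. It is a toy illustration only, not DD-PPO and not anything from the paper. It assumes a made-up environment with three actions that each pay a fixed reward; `run_bandit`, `rewards`, and `epsilon` are hypothetical names chosen for this example.

```python
import random


def run_bandit(rewards, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error on a toy multi-armed bandit.

    `rewards` is a hypothetical list of fixed payoffs, one per action.
    Returns the agent's learned value estimate for each action.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # running average reward per action
    counts = [0] * len(rewards)       # how often each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(rewards))
        else:
            # Exploit: pick the action with the highest estimate so far.
            action = max(range(len(rewards)), key=lambda a: estimates[a])

        reward = rewards[action]  # the only feedback the agent receives
        counts[action] += 1
        # Incremental update of the average reward seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates


# Assumed payoffs for illustration; the agent is never told these directly.
estimates = run_bandit([0.1, 0.5, 0.9])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("learned estimates:", estimates)
print("preferred action:", best_action)
```

The agent is never told which action is best. It only sees the reward that follows each choice, and its preference shifts toward whatever has paid off most. Real systems replace the three-entry table with a neural network and a far richer environment, but the reward signal plays the same role.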

👍 ruzmaira


@tanveer741 · Jun 15 '21

$0.27

I found this YouTube video by Yannic Kilcher, a machine learning expert, helpful for understanding the research paper.

https://www.youtube.com/watch?v=dmH1ZpcROMk

👍 remlaps1, remlaps, primevaldad, remlaps-lite, remlaps2


@remlaps-lite · Jun 16 '21

Very cool video!  So far, I have only had time to skim the article, but I hope to read it later.  In the meantime, I am listening to the video.  He doesn't just explain the paper, but also presents some counterarguments and criticisms. Thank you very much.


@primevaldad · Jun 16 '21

$0.22

> Sophisticated abilities may arise from the maximisation of simple rewards in complex environments. 

There's a [You're Wrong About](https://www.buzzsprout.com/1112270/4446851-koko-the-gorilla) podcast episode that essentially makes the case that in order to communicate with sign language, Koko the gorilla had simply learned gestures for rewards with a more sophisticated framework than what we typically see in research. Obviously, that's an oversimplification, but there's a wonderful debate about the whole thing. 

> According to our hypothesis, the ability of language in its full richness, including all of these broader abilities, arises from the pursuit of reward. It is an instance of an agent's ability to produce complex sequences of actions (e.g. uttering sentences) based on complex sequences of observations (e.g. receiving sentences) in order to influence other agents in the environment (cf. discussion of social intelligence above) and accumulate greater reward [7].

If reward is enough, and seeking reward is a singular universal mechanism for the development of intelligence, it would seem that either side of the Koko argument is moot. The "Koko was only responding to rewards" camp is in fact just echoing the sentiment that Koko was demonstrating general intelligence.  Therefore, that cannot by itself stand as an argument that Koko had not demonstrated general intelligence. In contrast, arguing that Koko did demonstrate a high level of intelligence would simply be reiterating the counterargument that Koko manifested sophisticated abilities through the maximization of rewards in complex environments.

👍 remlaps1, remlaps, remlaps-lite, remlaps2


@remlaps · Jun 16 '21

> If reward is enough, and seeking reward is a singular universal mechanism for the development of intelligence, it would seem that either side of the Koko argument is moot.

Very interesting point.  And this mirrors the question of free will and whether human intelligence is really anything more than just a biological form of computation -- i.e. <A HREF="https://youtu.be/C5DfnIjZPGw">Chalmers' <i>hard problem of consciousness</i></A>.

👍 primevaldad
