DATA can tell a story! Visualizing science-related Steemit in a few pictures! (2017 First Half) by manfredcml

Recently I've been thinking of playing with the data on Steemit posts under the category **"Science"** to see if there are any interesting insights. I’ve done some simple analysis and summarized my findings by visualizing the results in this post. I hope this article is helpful to you in some way, or at least provides a little information to start discussions on how to make the science community grow.

![Betterment_DataVisualization.jpg](https://steemitimages.com/DQmcYneF4sFwR6PPbxSuM8ZFUry4KhxDASut8x2b9EwuCoG/Betterment_DataVisualization.jpg)

------
### <center> Data source </center>
All the data were obtained using [SteemData](https://steemdata.com/) through Python. Thanks @furion for creating such a great application!

![steemdata.png](https://steemitimages.com/DQmV6sTec1sARX1JrUJzD7g2shrFivBBSrvWpNaStAS79aY/steemdata.png)

------
### <center> Coverage </center>
I extracted all the posts under the category **“Science”** created between 1 Jan 2017 and 30 Jun 2017, a total of 4,601 posts. Only top-level posts were counted in the statistics below; replies to those posts were not included.
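
As a rough illustration of this selection step, here is a minimal sketch that works on plain Python dictionaries rather than on a live SteemData connection. The field names `category`, `depth` and `created` mirror the post properties shown on Steem front ends, but the exact SteemData schema and the toy records are assumptions, not my original script.

```python
from datetime import datetime

def select_science_posts(records, start, end):
    """Keep top-level posts in category 'science' created in [start, end)."""
    selected = []
    for rec in records:
        is_root_post = rec["depth"] == 0          # replies have depth >= 1
        in_category = rec["category"] == "science"
        in_window = start <= rec["created"] < end
        if is_root_post and in_category and in_window:
            selected.append(rec)
    return selected

# Toy records (hypothetical), standing in for documents fetched from the database.
records = [
    {"author": "alice", "category": "science", "depth": 0, "created": datetime(2017, 3, 5, 14, 0)},
    {"author": "bob", "category": "science", "depth": 1, "created": datetime(2017, 3, 5, 15, 0)},   # a reply
    {"author": "carol", "category": "art", "depth": 0, "created": datetime(2017, 4, 1, 9, 0)},      # other category
    {"author": "dave", "category": "science", "depth": 0, "created": datetime(2017, 7, 1, 0, 0)},   # after the window
]

posts = select_science_posts(records, datetime(2017, 1, 1), datetime(2017, 7, 1))
print([p["author"] for p in posts])
```

The end of the window is exclusive, so passing 1 Jul 2017 keeps everything created up to the last second of 30 Jun.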

For statistics related to payouts, only the payouts in SBD to the **authors** were included. If the rewards had not yet been paid out when I prepared the charts, the pending payouts were used as a proxy.
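
The payout rule above could be sketched roughly as follows. The field names `total_payout_value` and `pending_payout_value` and the `"1.225 SBD"` string format are taken from the post properties shown on Steem front ends; treating "paid value greater than zero" as the sign that a post has been paid out is a simplifying assumption of this sketch.

```python
def parse_sbd(amount):
    """Convert a string such as '1.225 SBD' to a float."""
    value, unit = amount.split()
    if unit != "SBD":
        raise ValueError("expected an SBD amount, got %r" % amount)
    return float(value)

def author_payout(post):
    """Author payout in SBD, falling back to the pending payout as a proxy."""
    paid = parse_sbd(post["total_payout_value"])
    pending = parse_sbd(post["pending_payout_value"])
    # Assumption: a post that has been paid out has a non-zero paid value.
    return paid if paid > 0 else pending

paid_post = {"total_payout_value": "1.225 SBD", "pending_payout_value": "0.000 SBD"}
pending_post = {"total_payout_value": "0.000 SBD", "pending_payout_value": "3.500 SBD"}
print(author_payout(paid_post), author_payout(pending_post))
```

The pending value may not be split between author and curators in the same way as the final payout, which is one reason it is only a proxy.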

------
### <center> Let’s visualize! </center>

#### How many science posts are there?

The number of articles created under the category “Science” per month didn’t exceed 500 in the first 4 months of 2017. Yet, there were more than 750 science articles in May and the number surged to approximately 2,100 in June.

I also looked at the payouts for each of the past 6 months. Surprisingly, the **average payout** per post was around 15 to 16 SBD in May and June. However, such averages are easily distorted by a few outliers, namely the authors who consistently received high rewards. For this reason, I also plotted the median payouts as the red line, which shows that the median payout for a science article in each of the 6 months was, unfortunately, close to zero.

![Figure_1.png](https://steemitimages.com/DQmfJxUgfH2Q2VV7KbeCQjTFHJwdLXbx3ZEqGDXHcZK3H2C/Figure_1.png)
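
To make the gap between the average and the median concrete, here is a small sketch of the monthly summary using only the standard library. The numbers are invented toy values chosen so that one large payout dominates the month; they are not the real data behind the chart.

```python
from collections import defaultdict
from statistics import mean, median

def monthly_summary(posts):
    """posts: iterable of (month, payout) pairs, e.g. ('2017-05', 0.25)."""
    by_month = defaultdict(list)
    for month, payout in posts:
        by_month[month].append(payout)
    return {
        month: {"count": len(values), "mean": mean(values), "median": median(values)}
        for month, values in sorted(by_month.items())
    }

# Four small payouts and one outlier (toy values).
toy_posts = [
    ("2017-05", 0.0),
    ("2017-05", 0.0),
    ("2017-05", 0.25),
    ("2017-05", 0.75),
    ("2017-05", 79.0),
]
summary = monthly_summary(toy_posts)
print(summary["2017-05"])
```

In this toy month a single post pulls the mean up to 16 SBD while the median stays at 0.25 SBD, which is the same kind of pattern the red line in the chart reveals.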

#### Who are the top authors?

I analyzed the payouts to all authors who wrote posts under the category “Science” in the first half of 2017. The following is a simple box plot showing the distribution of payouts to the 10 authors with the **highest average payouts** over all the science articles posted from January to June. For friends who aren’t familiar with box plots, you may refer to [this page](https://en.wikipedia.org/wiki/Box_plot). In simple words, the longer a box, the more dispersed the payouts to the corresponding author are. I also plotted the average payout to each of those authors as the red line below for easy reference.

![Figure_2.png](https://steemitimages.com/DQmNVQECNe1DDmFNY3iW1o1H1bytuRsPLQdrEPLdKwfXzgh/Figure_2.png)
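
The ranking behind this chart can be sketched as below: group payouts by author, then sort authors by their mean payout. The author names and amounts are hypothetical, and the function is an illustration rather than my original code.

```python
from collections import defaultdict
from statistics import mean

def top_authors_by_mean(posts, n=10):
    """posts: iterable of (author, payout). Returns [(author, payouts), ...]."""
    payouts = defaultdict(list)
    for author, payout in posts:
        payouts[author].append(payout)
    ranked = sorted(payouts.items(), key=lambda item: mean(item[1]), reverse=True)
    return ranked[:n]

# Hypothetical authors and payouts in SBD.
toy_posts = [
    ("alice", 10.0), ("alice", 30.0),
    ("bob", 5.0),
    ("carol", 50.0), ("carol", 0.0), ("carol", 40.0),
]
top = top_authors_by_mean(toy_posts, n=2)
print([(author, mean(values)) for author, values in top])
```

Each author's list of payouts is kept intact, so the result can be handed to a plotting routine such as matplotlib's `boxplot`, one box per author.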

#### When do Steemians often post science articles?

The distribution of the total number of posts by hour of day is shown in the histogram below. Unsurprisingly, more posts were created between 3 pm and 8 pm than at any other time during the study period.

![Figure_3.png](https://steemitimages.com/DQmeRhqqrFjDHuqdXhwS6arXVTXFUJdgWD1CzNZ8bYudC41/Figure_3.png)
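
Counting posts per hour only needs the `created` timestamp of each post. The sketch below assumes timestamps in the `YYYY-MM-DD HH:MM:SS` form and does no time zone conversion, so the hours come out in whatever zone the timestamps were stored in (assumed here to be UTC).

```python
from collections import Counter
from datetime import datetime

def posts_per_hour(timestamps):
    """Return a list of 24 counts, one per hour of day (index 0 = midnight)."""
    counts = Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour for ts in timestamps
    )
    return [counts.get(hour, 0) for hour in range(24)]

# Toy timestamps for illustration.
toy_timestamps = ["2017-07-06 18:01:18", "2017-07-06 18:03:24", "2017-07-07 05:10:00"]
histogram = posts_per_hour(toy_timestamps)
print(histogram[18], histogram[5])
```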

#### Does posting at certain hours earn more?

Is it possible to earn a higher payout by posting a story at a specific time? Let’s have a look at the chart below! It seems that the science articles created between 7 pm and 8 pm received the highest rewards on average. Yet, I also plotted the median payouts as the blue line as a friendly reminder that average payouts are easily distorted by extreme values.

![Figure_4.png](https://steemitimages.com/DQmc4Kamc26gNXgd8i1TheagyaeUVJCXqND34XKE7Lf8oYv/Figure_4.png)

#### What are the most popular tags?

What are the commonly used tags for science articles? To get the answer, I counted the occurrences of each tag across all articles with “science” as the main tag over the past half-year; the top 10 tags are presented in the bar chart below. Note that 2 or more of the tags shown in the chart may appear in a single article (e.g. “technology” and “news” can both be used in an article about tech news), and such overlap was not specially handled. As we can see, in terms of popularity, “life” won the race, followed by “news”, “technology”, “space” and so on. As for the average payout, science posts tagged “steemstem” had the highest average payout despite “steemstem” being the least frequent of the top 10 tags.

![Figure_5.png](https://steemitimages.com/DQmfLdSPzzAKAJH51N6G2p13R6aoJcnypmZeyv98KCZGRjo/Figure_5.png)
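
The tag statistics can be sketched as follows. A post with several tags contributes its full payout to each of them, which matches the note above that overlap was not specially handled. Excluding the main tag “science” from the ranking is an assumption of this sketch, and the posts are toy data.

```python
from collections import Counter, defaultdict

def tag_stats(posts, main_tag="science"):
    """posts: iterable of (tags, payout). Returns {tag: (count, mean_payout)}."""
    counts = Counter()
    totals = defaultdict(float)
    for tags, payout in posts:
        # Assumption: skip the main tag, since every post in the sample has it.
        for tag in set(tags) - {main_tag}:
            counts[tag] += 1
            totals[tag] += payout
    return {tag: (counts[tag], totals[tag] / counts[tag]) for tag in counts}

# Toy posts: (list of tags, author payout in SBD).
toy_posts = [
    (["science", "life", "news"], 2.0),
    (["science", "life"], 4.0),
    (["science", "steemstem"], 20.0),
]
stats = tag_stats(toy_posts)
for tag, (count, avg) in sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True):
    print(tag, count, avg)
```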

#### The fancy word clouds!

I also wanted to see the most popular words that appeared in the **titles** of science articles. I therefore collected the titles of all science articles published in the previous 6 months and created the following word cloud with the help of the Python module [“wordcloud”](https://github.com/amueller/word_cloud). Apart from general terms like “science” and “scientist”, it’s interesting to see that terms related to astronomy were also very popular!
![Figure_6.png](https://steemitimages.com/DQmdYrR56vh8rSAC8Nd9HQ1r2THB1Zh1cpQuRq4aeMPD79M/Figure_6.png)

We’ve seen a word cloud for titles. How about the tags? I also took all the tags and created the word cloud below to show which tags appeared frequently in science articles!
![Figure_7.png](https://steemitimages.com/DQmeymXRnjySPZDLdBTSeVWLVtbUtbB2zEPjnNGrbUttXfc/Figure_7.png)
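
A word cloud is essentially a picture of word frequencies, so the counting step can be sketched with the standard library alone. The stop-word list and the titles below are made up for illustration; the real word clouds were drawn by the wordcloud module, which handles tokenizing and stop words itself.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on"}

def title_word_frequencies(titles):
    """Count lower-cased words across titles, skipping stop words."""
    words = []
    for title in titles:
        for word in re.findall(r"[a-z']+", title.lower()):
            if word not in STOPWORDS:
                words.append(word)
    return Counter(words)

toy_titles = [
    "The Science of Black Holes",
    "Black holes and the birth of stars",
    "Science news: a new planet",
]
frequencies = title_word_frequencies(toy_titles)
print(frequencies.most_common(3))
```

A frequency table like this can be passed to `WordCloud().generate_from_frequencies(...)` in the wordcloud module to draw the picture.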

#### Can we predict the tag given the title only? --- A more advanced topic

Having done several exploratory analyses, I started to wonder whether it’s possible to suggest a suitable tag for an article from its **title** alone using some sort of machine learning algorithm (ads time: if you’re interested in machine learning, you can have a look at my previous articles [here](https://steemit.com/technology/@manfredcml/do-you-have-to-be-a-genius-to-work-on-machine-learning) and [here](https://steemit.com/technology/@manfredcml/kicking-off-your-first-machine-learning-project-is-easier-than-you-thought)).

I’m not going to train any machine learning model here; rather, I would like to use a trick to “plot the titles” of articles under different categories on a graph (for technical buddies: I used vector representations from a model pre-trained on the Google News corpus and reduced the dimensions with a t-SNE embedding). If the titles of articles in the same category are similar in some sense, they should form clusters on the graph. To see this, I extracted all articles created last month under 5 distinct categories, namely “science”, “art”, “food”, “politics” and “love”. The graph is shown below, and there really are clusters in several regions (e.g. the red points and green points are concentrated in some areas).

![Figure_8.png](https://steemitimages.com/DQmcoog7kdwGne8MR3WajwRCAQWWLBw3KTqSia4Jr1RaoQc/Figure_8.png)
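
For readers curious about how a title becomes a point, here is one common way to do it, sketched with a toy embedding table: average the vectors of the words in the title. Averaging is an assumption of this sketch (the post only says that pre-trained vector representations were used), and the 3-dimensional toy vectors stand in for the much larger pre-trained Google News vectors.

```python
def title_vector(title, embeddings, dim):
    """Average the word vectors of the words in a title.

    Words missing from the embedding table are skipped; a title with no
    known words maps to the zero vector.
    """
    vectors = [embeddings[w] for w in title.lower().split() if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(component) / len(vectors) for component in zip(*vectors)]

# Toy embedding table (hypothetical values).
toy_embeddings = {
    "black": [1.0, 0.0, 0.0],
    "hole": [0.0, 1.0, 0.0],
    "cake": [0.0, 0.0, 1.0],
}
vec = title_vector("Black hole", toy_embeddings, dim=3)
print(vec)
```

Once every title has a vector, the rows can be stacked into a matrix and projected to two dimensions, for example with scikit-learn's `TSNE(n_components=2).fit_transform(matrix)`, before plotting one colour per category.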

What if I’m not interested in the main category and instead want to give precise **tags** to a science article? Using the same methodology (this time for **science** articles ONLY), I randomly selected 5 tags and plotted the graph below to see if there would be any patterns. Only the points corresponding to “astronomy” showed a clear pattern, so assigning accurate tags to a science article from its title alone, without going through the contents, seems to be a difficult task.

![Figure_9.png](https://steemitimages.com/DQmdNA5R6LxeGMWgUcKFDBzTLaUJ7r9Ry74T9KTmxZ6oe7h/Figure_9.png)

---------

#### <center> Let the data tell stories! </center>

The Steem blockchain contains tons of valuable information which is useful in many ways. The above analysis is just a simple one and a lot more is yet to be explored by you!

---------
If you like my posts, please upvote, resteem and follow me @manfredcml!

Recent articles
[Does intuition matter in solving problems?](https://steemit.com/science/@manfredcml/does-intuition-matter-in-solving-problems)
[Can Steemit help stop the spread of diseases?](https://steemit.com/steemit/@manfredcml/can-steemit-help-stop-the-spread-of-diseases)
[Do you have to be a "genius" to work on machine learning?](https://steemit.com/technology/@manfredcml/do-you-have-to-be-a-genius-to-work-on-machine-learning)
[Kicking off your first machine learning project is EASIER than you thought!](https://steemit.com/technology/@manfredcml/kicking-off-your-first-machine-learning-project-is-easier-than-you-thought)
Posted 2017-07-06 18:01:18 in category science. Tags: science, technology, steemstem, blockchain, steem.
@winlion ·
Grade post
@manfredcml ·
Thank you : )
@fredrikaa ·
Great article. I'm happy you informed us in the #steemstem group about it.

Being an economist working with Space Technologies and Satellite Data, I can see so many overlaps in our interests. Having only recently gotten into blockchain, I'm excited to start applying some Data Analytics to the data coming from the steem platform myself, so thanks a lot for sharing!

Followed you and upvoted. Hope to have many more exchanges with you on the platform :)
@manfredcml ·
Thanks for reading @fredrikaa : ) Followed you as well! Hope you enjoy playing around with the data. Apart from visualizing the data, building machine learning tools as add-ins to Steemit may be another way of using it. By the way, I'm thinking about how to apply machine learning to topics in economics: there are plenty of such applications in financial economics, but there doesn't seem to be much work on investigating social issues with AI.
@fredrikaa ·
It is happening more and more. Although the people with an economics or finance background who also master machine learning and advanced data analytics in R or Python (most just learn analogue tools like SPSS and STATA) usually get sucked up by big banks or big exchanges :P