Stack Exchange Inc operates a network of forums, the most recognizable being Stack Overflow, covering topics that range from English Language to Bicycles. A full listing of current sites is available at [url removed, login to view]
As part of its business model, Stack Exchange lets visitors propose new forums when they have an idea for a topic. Each idea passes through a sequence of stages: discussion, proposal, commitment, private beta, public beta, and finally graduation into a fully fledged member of the Stack Exchange network. You can read more about the process at [url removed, login to view]
In order to run the process, Stack Exchange Inc created a portal called Area 51. The Area 51 portal can be accessed at [url removed, login to view]
There are many ideas that enter the new site creation process but never graduate. Some get closed at the private beta stage, some get closed in the public beta stage, and some stay forever in the public beta stage without graduating. I want to study why.
The active private beta, public beta, and graduated site dataset is publicly available on the Internet Archive website: [url removed, login to view]
The dataset for the inactive (closed) private beta and public beta sites can be obtained by searching Google for the phrase "didn't have enough activity during the beta" to identify the closed sites.
We end up with the following datasets:
> 22 closed Private Beta forums
> 11 closed Public Beta forums
> 83 active Public Beta forums that have neither graduated nor shut down ("zombies")
> 39 active Graduated forums
The total dataset will be around 20 GB in compressed format.
We will also need to make use of the Stack Exchange API as some pieces of data are not in the data files. The Stack Exchange API is extensive and can be accessed at [url removed, login to view]
My prior attempt, using custom Python code running on an Amazon Web Services instance, has not gone well, and I need to redo everything quickly to meet a deadline. The Python code will be made available to you.
In short, we want to compare key statistics across forums after adjusting for the size of each forum's user base. For example: what is the average number of posts per user, how does the age of users change over time, how does the geographic location of users change over time, and how does posting frequency vary by hour of the day (is the forum active 24 hours a day with new messages, or only at certain hours)?
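To make two of these statistics concrete, here is a minimal Python sketch. It assumes each post has already been reduced to a record with a hypothetical `user_id` field and an ISO-format `created` timestamp; these field names are illustrative and are not taken from the actual data files. It also assumes "posts per user" means posts divided by the number of distinct users who posted, which is one possible definition among several.

```python
from collections import Counter
from datetime import datetime


def posts_per_user(posts):
    """Average number of posts per distinct posting user (assumed definition)."""
    users = {p["user_id"] for p in posts}
    return len(posts) / len(users) if users else 0.0


def hourly_share(posts):
    """Fraction of all posts created in each hour of the day (index 0-23)."""
    counts = Counter(datetime.fromisoformat(p["created"]).hour for p in posts)
    total = sum(counts.values())
    return [counts.get(h, 0) / total if total else 0.0 for h in range(24)]


# Hypothetical sample records, for illustration only.
sample_posts = [
    {"user_id": 1, "created": "2014-03-01T09:15:00"},
    {"user_id": 1, "created": "2014-03-01T09:45:00"},
    {"user_id": 2, "created": "2014-03-01T21:05:00"},
    {"user_id": 3, "created": "2014-03-02T09:30:00"},
]

average = posts_per_user(sample_posts)
shares = hourly_share(sample_posts)
print(f"posts per user: {average:.2f}")
print(f"share of posts in hour 9: {shares[9]:.2f}")
```

Because the hourly figures are shares rather than raw counts, forums of very different sizes can be compared on the same scale.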
The person assisting will need to be willing to go beyond what is specified, as we may come up with new ideas as we work together, e.g. identifying new statistics that we should collect. I have attached the statistics that have been used so far. That said, some have been identified as inaccurate because they are not properly adjusted for the size of the user base. For example, the number of up votes on a post is influenced by the number of users: a site with more users will end up with more up votes simply because of its larger user base rather than the merit of the post itself.
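As a rough illustration of the adjustment described above, the sketch below divides a site's total up votes by its user count so that sites of different sizes become comparable. The site names and figures are made up for the example, and dividing by the total user count is only one possible normalization (active users or voters could be used instead).

```python
def upvotes_per_user(total_upvotes, user_count):
    """Up votes normalized by the size of the user base."""
    if user_count <= 0:
        raise ValueError("user_count must be positive")
    return total_upvotes / user_count


# Hypothetical figures, for illustration only.
sites = {
    "small_site": {"upvotes": 1200, "users": 400},
    "large_site": {"upvotes": 9000, "users": 6000},
}

normalized = {
    name: upvotes_per_user(s["upvotes"], s["users"]) for name, s in sites.items()
}

for name, value in normalized.items():
    print(f"{name}: {value:.2f} up votes per user")
```

In this made-up example the larger site has more up votes in absolute terms but fewer per user, which is the kind of distortion the raw counts hide.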