In Progress

Analyze online forum postings using MATLAB or SPSS

Stack Exchange Inc operates a network of forums on topics ranging from English Language to Bicycles, the most recognizable being Stack Overflow. A full listing of current sites is available at [url removed, login to view]

As part of its business model, Stack Exchange lets visitors propose new forums when they have an idea for a new topic. Each idea goes through a process of discussion, proposal, commitment, private beta, and public beta before graduating into a fully-fledged member of the Stack Exchange network. You can read more about the process at [url removed, login to view]

In order to run the process, Stack Exchange Inc created a portal called Area 51. The Area 51 portal can be accessed at [url removed, login to view]

There are many ideas that enter the new site creation process but never graduate. Some get closed at the private beta stage, some get closed in the public beta stage, and some stay forever in the public beta stage without graduating. I want to study why.

The active private beta, public beta, and graduated site dataset is publicly available on the Internet Archive website: [url removed, login to view]

The dataset for the inactive private beta and public beta sites can be obtained by searching Google for "didn't have enough activity during the beta".

We end up with the following datasets:

> 22 closed Private Beta forums

> 11 closed Public Beta forums

> 83 active Public Beta forums that have neither graduated nor shut down ("zombies")

> 39 active Graduated forums

The total dataset will be around 20 GB in compressed format.
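Because the dataset is large, reading each file as a stream rather than loading it whole is likely to help. Below is a minimal sketch, assuming the data files are XML with one `<row>` element per post and an `OwnerUserId` attribute; the file layout, the attribute names, and the function name `count_posts_per_user` are assumptions for illustration and should be checked against the actual files.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_posts_per_user(xml_source):
    """Stream <row> elements and count posts per OwnerUserId.

    xml_source is a filename or a binary file object. The <row> layout
    and the OwnerUserId attribute are assumptions about the data files.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "row":
            owner = elem.get("OwnerUserId")
            if owner is not None:  # skip posts with no recorded owner
                counts[owner] += 1
            elem.clear()  # release the element so memory use stays low
    return counts


# Tiny made-up sample standing in for a real posts file.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" CreationDate="2014-01-05T09:15:00" OwnerUserId="7" />
  <row Id="2" PostTypeId="2" CreationDate="2014-01-05T10:02:00" OwnerUserId="7" />
  <row Id="3" PostTypeId="2" CreationDate="2014-01-06T21:40:00" OwnerUserId="9" />
  <row Id="4" PostTypeId="1" CreationDate="2014-01-07T03:00:00" />
</posts>"""

sample_counts = count_posts_per_user(io.BytesIO(SAMPLE))
```

The same function can be pointed at a file path on disk; only the per-user counts are kept in memory, not the posts themselves.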

We will also need to make use of the Stack Exchange API as some pieces of data are not in the data files. The Stack Exchange API is extensive and can be accessed at [url removed, login to view]
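As a rough illustration of how API requests could be prepared, here is a minimal sketch that only builds request URLs and does not send them. The API root `https://api.stackexchange.com/2.3`, the `info` and `questions` endpoints, and the `site`, `page`, and `pagesize` parameters are assumptions based on the public API and should be confirmed against its documentation; `build_api_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed API root; confirm the current version in the API documentation.
API_ROOT = "https://api.stackexchange.com/2.3"


def build_api_url(endpoint, site, **params):
    """Return a request URL for one endpoint of one site.

    Parameters are sorted so the same inputs always give the same URL,
    which makes it easier to cache responses during a long run.
    """
    query = {"site": site, **params}
    return f"{API_ROOT}/{endpoint.lstrip('/')}?{urlencode(sorted(query.items()))}"


# Site-level totals (e.g. user counts) for one forum.
info_url = build_api_url("info", "bicycles")

# Second page of questions, 100 per page.
questions_url = build_api_url("questions", "bicycles", page=2, pagesize=100)
```

Keeping URL construction separate from the network call makes it possible to check the requests before spending any of the API's request quota.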

My prior attempt, using custom-coded Python running on an Amazon Web Services instance, has not gone well, and I need to redo everything quickly in order to meet a deadline. The Python code will be made available to you.

In short, we want to compare key forum statistics that have been adjusted for the size of the user base. For example: what is the average number of posts per user, how does the age of users change over time, how does the geographic location of users change over time, and how does posting frequency vary by hour of the day (is the forum active 24 hours a day with new messages, or does it only get active at certain hours)?
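Two of these statistics can be sketched as follows: average posts per user, and the share of posts made in each hour of the day. This is a minimal sketch only; the record fields `user_id` and `created` and the sample posts are made up for illustration, and timestamps are assumed to be ISO 8601 strings in a single time zone.

```python
from collections import Counter
from datetime import datetime


def posts_per_user(posts):
    """Average number of posts per distinct posting user."""
    users = {p["user_id"] for p in posts}
    return len(posts) / len(users) if users else 0.0


def hourly_share(posts):
    """Fraction of all posts that fall in each hour of the day (0-23)."""
    hours = Counter(datetime.fromisoformat(p["created"]).hour for p in posts)
    total = sum(hours.values())
    return [hours.get(h, 0) / total if total else 0.0 for h in range(24)]


def active_hours(posts):
    """Number of distinct hours of the day with at least one post."""
    return sum(1 for share in hourly_share(posts) if share > 0)


# Made-up sample records; real ones would come from the data files.
sample_posts = [
    {"user_id": 7, "created": "2014-01-05T09:15:00"},
    {"user_id": 7, "created": "2014-01-05T09:45:00"},
    {"user_id": 9, "created": "2014-01-06T21:40:00"},
    {"user_id": 3, "created": "2014-01-07T09:05:00"},
]

avg_posts = posts_per_user(sample_posts)
shares = hourly_share(sample_posts)
n_active_hours = active_hours(sample_posts)
```

Using shares rather than raw counts per hour lets a small forum and a large forum be compared on the same scale. Note that this version counts only users who posted at least once; dividing by all registered users would give a different figure.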

The person assisting will need to be willing to go beyond what is specified, as we might encounter new ideas as we work together, e.g. identifying new statistics that we should collect. I have attached the statistics that have been used so far. That said, some have been identified as inaccurate because they are not properly adjusted for the size of the user base. For example, the number of up votes on a post is influenced by the number of users: a site with more users will end up with more up votes simply because of its larger user base rather than the merit of the post itself.
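The adjustment described above can be illustrated with a minimal sketch that divides a raw total by the number of users. The site names and figures below are invented purely for illustration, and a simple per-user ratio is only one possible adjustment; it is not necessarily the one that will be used in the final analysis.

```python
def upvotes_per_user(total_upvotes, user_count):
    """Raw up votes divided by the size of the user base."""
    if user_count <= 0:
        raise ValueError("user_count must be positive")
    return total_upvotes / user_count


# Invented figures for two hypothetical sites.
sites = {
    "big_site": {"upvotes": 50000, "users": 10000},
    "small_site": {"upvotes": 3000, "users": 400},
}

normalized = {
    name: upvotes_per_user(s["upvotes"], s["users"]) for name, s in sites.items()
}

# Rank sites by the adjusted figure rather than the raw total.
ranking = sorted(normalized, key=normalized.get, reverse=True)
```

In this invented example the larger site has far more up votes in total, yet the smaller site has more up votes per user, which is the kind of reversal the raw statistics would hide.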

Skills: Matlab and Mathematica, SPSS Statistics, Statistical Analysis, Statistics


About the Employer:
(5 reviews) Brantford, Australia

Project ID: #6791862

Awarded to:


Dear Client, I am writing to apply for your job. I have a Bachelor of Science in Applied Statistics with Computing and have expertise in data analysis and statistics in general using SPSS and R. I am competent, More

$155 USD in 3 days
(1 Review)

5 freelancers are bidding an average of $203 for this job


As a postgraduate in Statistics and a graduate in Mathematics, I have a lot of experience in handling statistical data, especially in Time Series Modelling & Forecasting, Regression analysis, correlation, ANOVA, statist More

$300 USD in 3 days
(16 Reviews)

Hi, I am an expert in several coding and query languages and in statistical tools like R, SPSS, SAS, and a host of other data tools. Please ping me on chat to discuss the project in detail. Best regards

$250 USD in 3 days
(5 Reviews)

Dear Sir, I graduated in 2007 from the Indian Institute of Technology New Delhi, which is considered to be among the best institutes in Asia and the world, in "Industrial and Production Engineering" impart More

$155 USD in 3 days
(0 Reviews)

No proposal has been provided yet

$155 USD in 3 days
(0 Reviews)