Does anyone know what type of statistical analysis would be appropriate for discovering groups of commonly used features?

Say I have a spreadsheet where the rows are users and the columns are "has used feature X in the past 30 days". How do I go from that to "there are three clusters of users who tend to use these distinct subsets of features"?
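The thread doesn't name a method. One common choice for binary "used / didn't use" data is hierarchical (agglomerative) clustering with Jaccard distance, which ignores features that neither user touched. Below is a minimal stdlib-only sketch of that technique. The user IDs, feature names and the helpers `jaccard_distance`, `cluster_users` and `feature_profile` are hypothetical, made up for illustration. It treats each row of the spreadsheet as the set of features that user used in the last 30 days.

```python
from itertools import combinations


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |A & B| / |A | B|; two users who used nothing count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_users(users: dict[str, frozenset], k: int) -> list[list[str]]:
    """Average-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[name] for name in users]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # average pairwise Jaccard distance between members of the two clusters
            d = sum(jaccard_distance(users[x], users[y])
                    for x in clusters[i] for y in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    return clusters


def feature_profile(users: dict[str, frozenset], members: list[str]) -> dict[str, float]:
    """Fraction of a cluster's members who used each feature."""
    counts: dict[str, int] = {}
    for name in members:
        for feature in users[name]:
            counts[feature] = counts.get(feature, 0) + 1
    return {f: c / len(members) for f, c in sorted(counts.items())}


# Hypothetical data: each user maps to the set of features used in the past 30 days.
users = {
    "u1": frozenset({"export", "reports"}),
    "u2": frozenset({"export", "reports", "sharing"}),
    "u3": frozenset({"export", "reports"}),
    "u4": frozenset({"chat", "sharing"}),
    "u5": frozenset({"chat", "sharing", "mentions"}),
    "u6": frozenset({"api", "webhooks"}),
    "u7": frozenset({"api", "webhooks", "export"}),
}

clusters = cluster_users(users, k=3)
for members in clusters:
    print(members, feature_profile(users, members))
```

The per-cluster feature profile is what turns "three clusters of users" into "these clusters use these distinct subsets of features". This naive loop is cubic in the number of users, so it only suits toy data. For a real spreadsheet, `scipy.spatial.distance.pdist(X, metric="jaccard")` followed by `scipy.cluster.hierarchy.linkage(..., method="average")` and `fcluster` does the same job. The number of clusters `k` is still a choice you have to justify.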



Clusters make a nice story, but you're going about it backwards. What are you trying to predict?

The scientific method suggests we create a hypothesis first, then design an experiment (or identify a natural experiment), then collect data, and finally perform a hypothesis test. Deviations from that process increase your risk of spurious results.
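Applied to the usage spreadsheet, "hypothesis first" could mean stating up front something like "users of feature A are more likely than other users to also use feature B", then testing only that. The reply names no specific test. A one-sided Fisher's exact test on the 2x2 table of A-usage vs. B-usage is one standard choice, swapped in here. The sketch below implements it with the standard library; the function name `fisher_one_sided` and the example counts are hypothetical.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for positive association in the 2x2 table

        [[a, b],   # used A:        used B, did not use B
         [c, d]]   # did not use A: used B, did not use B

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # users of feature A
    col1 = a + c          # users of feature B
    n = a + b + c + d     # all users
    total = comb(n, col1)
    # Sum the probabilities of every table with at least as much A-and-B overlap.
    return sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1)) / total


# Hypothetical counts: 100 users, 30 used A (20 of them also used B),
# and 15 of the 70 non-A users used B.
print(fisher_one_sided(20, 10, 15, 55))
```

If SciPy is available, `scipy.stats.fisher_exact(table, alternative="greater")` gives the same one-sided p-value. Running this over every feature pair after eyeballing the data is exactly the deviation the reply warns about. Many tests inflate the chance of a spurious "significant" pair unless you correct for multiple comparisons.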



