
User:WikiInquirer/WikiStudy

The Rough Idea

I cannot divulge too many details here because of the Hawthorne effect. Suffice it to say that I am investigating the cause-and-effect relationship between motivations and knowledge contributions. The ‘cause’ (motivations) is measured in the survey, whereas the ‘effect’ (contributions) is measured in the archival data (the database dump).

Sample Selection

A brief explanation of how I selected the target sample for the survey:

I did a database download in Dec 2006. That data file contained the edit metadata for the most recent edit made to every article in Knowledge. It has now become useful again, because we are going to pick our sample from this database.

Note: While this is certainly not the best way to get our sample, it is our next best alternative. It is better than going to Knowledge and trying to fish around for 600 names by hand (in fact, that method is not even random). The Dec database captures a snapshot of editing activity in early Nov 2006, when the database dump began.
The sampling frame consists of all the editors whose usernames appear in the Dec database. From here, I ran two sampling stages:

1. Users who are not ‘bot’ (robot) accounts.
2. Users who are not anonymous IP addresses.

The result of the above two sampling stages is Sampling Unit A. Sampling Unit A has 278,423 users.
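
The page does not show the actual queries behind these two stages, so the following is only a minimal Python sketch of the same logic over made-up editor records; the field names (username, is_bot) and the IP check are assumptions rather than the study's schema, and the real filters were run as queries inside the imported database.

    import re

    # Hypothetical editor records standing in for the Dec 2006 database; the
    # study ran equivalent filters directly in the imported MediaWiki database.
    editors = [
        {"username": "ExampleEditor", "is_bot": False},
        {"username": "SomeBot",       "is_bot": True},   # removed by stage 1
        {"username": "192.0.2.17",    "is_bot": False},  # removed by stage 2 (anonymous IP)
    ]

    # Crude test for an IPv4-style username, i.e. an anonymous editor.
    is_ip = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$").match

    # Stage 1: keep users who are not 'bot' accounts.
    # Stage 2: keep users who are not anonymous IP addresses.
    sampling_unit_a = [e["username"] for e in editors
                       if not e["is_bot"] and not is_ip(e["username"])]

    print(sampling_unit_a)  # -> ['ExampleEditor']; in the study this left 278,423 users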
Next, I want to stratify Sampling Unit A into two groups: privileged and non-privileged users. Privileged users are those who have administrative powers. The Dec database has the information on who all* the privileged users in the English Knowledge were at that point in time. Non-privileged users are registered users who are not anonymous IP addresses. This stratification can be used as a control variable later on. I am doing non-probabilistic sampling so that I can hear the opinions from both sides.

Firstly, let us look at all the non-privileged users in Sampling Unit A. After filtering all the privileged users out of Sampling Unit A, we have 277,350 users. This gives us Sampling Unit B. I then called a pseudorandom function in the database to shuffle Sampling Unit B like a deck of cards and drew the first 300 names. These 300 names are non-privileged users who are passed on to Sampling Unit C.

Secondly, we want to find out how many privileged users exist in Sampling Unit A. The query shows that 1,073 privileged users exist in Sampling Unit A. Again, I called the pseudorandom function to shuffle these 1,073 names and drew the first 300. These 300 names are privileged users who are passed on to Sampling Unit C.
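
The shuffle was done with a pseudorandom function in the database (in MySQL that is commonly something like ORDER BY RAND() with a LIMIT, although the exact call is not named here). As an illustration only, the same stratified draw can be sketched in plain Python with made-up username lists standing in for the two strata:

    import random

    # Hypothetical username lists; in the study, Sampling Unit B held 277,350
    # non-privileged users and the privileged stratum held 1,073 users.
    sampling_unit_b  = [f"NonPrivUser{i}" for i in range(277_350)]
    privileged_users = [f"PrivUser{i}"    for i in range(1_073)]

    rng = random.Random(20070303)  # any fixed seed; the study's seed is not stated

    # Shuffle each stratum like a deck of cards and take the first 300 names;
    # random.sample is equivalent to shuffling and keeping the first k entries.
    sampling_unit_c = rng.sample(sampling_unit_b, 300) + rng.sample(privileged_users, 300)

    print(len(sampling_unit_c))  # -> 600 usernames, 300 from each stratum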
Our final result: Sampling Unit C has a total of 600 usernames, divided equally between privileged and non-privileged users (300 in each group). Direct invitations are extended to Sampling Unit C over the weekend of Mar 3-4, 2007.

Technicalities

The problem is, as I have stated on my user page, that the data import is taking a really long time. Specifically, I am importing the data from the stub dump of 3.1 GB. In contrast to the earlier file mentioned in the section 'Sample Selection', this data file contains the entire history of all edits made prior to Nov 2006. This explains why I require survey participants to have registered their accounts before Jan 2006, so that I can track the edits they made leading up to Nov 2006. After uncompressing the file, it ballooned to a size of 20+ GB; after running it through the mwdumper tool, the data file shrank to 10+ GB and was finally ready for insertion into the database. Please do not be concerned by the variance in file size: no data is lost; this is expected when you convert a .xml file (a structured, human-readable file which carries more redundant data) to a .sql file. And so, the data import has been ongoing since Jan 12, 2007, which means it has been running for 50 days now (as at Mar 4).
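
As a rough sketch of the conversion-and-import pipeline described above (not the exact commands used for the study), the two steps can be scripted as below; the dump filename, database name, and MySQL user are placeholders, and mwdumper's documented sql:1.5 output format is assumed to match the target schema.

    import subprocess

    # Placeholder names: the actual dump file, database, and credentials used
    # in the study are not given on this page.
    dump_xml = "stub-meta-history.xml"  # the uncompressed 20+ GB XML dump (assumption)
    sql_file = "stub-meta-history.sql"  # the roughly 10 GB SQL file produced by mwdumper

    # Step 1: convert the XML dump into SQL INSERT statements with the mwdumper
    # tool (documented usage: java -jar mwdumper.jar --format=sql:1.5 <dump>).
    with open(sql_file, "w") as out:
        subprocess.run(
            ["java", "-jar", "mwdumper.jar", "--format=sql:1.5", dump_xml],
            stdout=out, check=True,
        )

    # Step 2: feed the SQL file into MySQL; this is the long-running import step.
    # Credentials are omitted here and assumed to come from a MySQL option file.
    with open(sql_file) as src:
        subprocess.run(["mysql", "-u", "wikiuser", "wikidb"], stdin=src, check=True)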

