This bot will post external link analysis, find probable spambot-created pages, and eventually tag them for speedy deletion. It will also generate a set of statistics that can be used by the community to determine whether some pages are being used as spam carriers.
120:: Checks download.wikimedia.org for new database dumps by comparing the available ones with the last one it processed. If new ones are found, it can generate a list of URLs for page.sql.gz and externallinks.sql.gz, to be downloaded via
After executing the queries, the script trims the resulting lists to a fixed size so that the generated pages do not become too big. If a resulting listing has more than 500 items, the bot stops, as the dump result must be manually
210:: This script handles communication between the bot and the Knowledge (XXG) project. It logs the bot in and uploads the generated listings to a set location. Currently, that is being done at
286:, updating the statistics on that page. Permission for the bot to run there will be requested once the bot has been approved on the English Knowledge (XXG).
130:: Loads page.sql.gz and externallinks.sql.gz into a local database, then runs several custom queries to gather statistics:
66:. This bot does not yet have the approval of the community, or approval has been withdrawn or expired, and therefore shouldn't be making edits that
Administrators: if this bot is making edits that appear to be unassisted to pages not in the operator's or its own userspace, please
The bot runs once per database dump. In the case of the
English Knowledge (XXG), I expect it to run once every 45-60 days.
114:: The "bot" itself. The script just calls each of the following scripts in order, handling any problem they may have.
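The "call each script in order and handle failures" behaviour could look roughly like the sketch below. The original is a bash script whose contents are not shown here, so this is a Python illustration only: the `./` paths, the lack of arguments, and the stop-at-first-failure policy are all assumptions.

```python
import subprocess
import sys

# Script names come from the task list; the "./" paths and the
# absence of command-line arguments are assumptions.
STEPS = ["./download.sh", "./process.sh", "./upload.sh"]


def run_steps(steps, runner=subprocess.run):
    """Run each step in order and stop at the first failure.

    Returns the list of steps that completed successfully.
    `runner` is injectable so the logic can be exercised without
    the real scripts being present.
    """
    done = []
    for step in steps:
        result = runner([step])
        if result.returncode != 0:
            # Assumed policy: report the failure and do not continue.
            print(f"{step} failed with exit code {result.returncode}",
                  file=sys.stderr)
            break
        done.append(step)
    return done
```

Calling `run_steps(STEPS)` would run the three scripts from the current directory. Stopping early means a failed download never leads to an upload of stale listings.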
Generates a list of pages with titles containing one of several patterns used by malfunctioning bots, like
214:. First, the script determines whether there is a current dump, and if so, archives it at
The bot itself is composed of a set of bash shell script files, each doing a single task:
WHERE externallinks.el_from = page_id AND page_is_redirect = 0 AND page_namespace = 0
User:ReyBrujo/Dumps/yyyymmdd/Articles with between xxx and yyy external links
Generates a list of articles sorted by the number of external links each has.
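To see what the articles-by-link-count query returns, it can be tried on a toy database. The sketch below uses Python's built-in `sqlite3` instead of the MySQL database the bot uses, and the table definitions and sample rows are made up for illustration; only the column names are taken from the query.

```python
import sqlite3


def articles_by_link_count(conn):
    """Count external links per article, most-linked first."""
    query = """
        SELECT COUNT(el_from) AS total, el_from, page_title
        FROM externallinks, page
        WHERE externallinks.el_from = page_id
          AND page_is_redirect = 0
          AND page_namespace = 0
        GROUP BY el_from
        ORDER BY total DESC
    """
    return conn.execute(query).fetchall()


# Toy schema and data (assumptions, far smaller than a real dump).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER, page_title TEXT,
                       page_namespace INTEGER, page_is_redirect INTEGER);
    CREATE TABLE externallinks (el_from INTEGER, el_to TEXT);
    INSERT INTO page VALUES (1, 'Alpha', 0, 0), (2, 'Beta', 0, 0),
                            (3, 'Talk_page', 1, 0);
    INSERT INTO externallinks VALUES
        (1, 'http://example.org/a'),
        (2, 'http://example.org/b'),
        (2, 'http://example.com/c'),
        (3, 'http://example.net/d');
""")
rows = articles_by_link_count(conn)
print(rows)
```

The page in namespace 1 is filtered out, so only the two articles appear, with the one holding two links first.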
edits that would be extremely tedious to do manually, in accordance with the
User:ReyBrujo/Dumps/yyyymmdd/Articles with more than xxx external links
SELECT COUNT(el_to) AS total, SUBSTRING_INDEX(el_to, '/', 3) AS search
User:ReyBrujo/Dumps/yyyymmdd/Sites linked between xxx and yyy times
218:. Then it uploads the listings and the dump page, with the format:
Finally, the bot will also edit a global page currently found at
are delimiters when a single listing would have over 500 items.
are delimiters when a single listing would have over 500 items.
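One way to picture the splitting of a long listing into pages of at most 500 items is the sketch below. How the bot actually derives the xxx and yyy delimiters is not spelled out, so using the highest and lowest counts in each chunk is an assumption, as are the function and field names.

```python
def split_listing(rows, page_size=500):
    """Split a listing into pages of at most `page_size` rows.

    `rows` is a list of (total, title) tuples already sorted by
    total in descending order. Each page records the highest and
    lowest totals it contains, which could serve as the delimiters
    in the page title (assumed interpretation).
    """
    pages = []
    for start in range(0, len(rows), page_size):
        chunk = rows[start:start + page_size]
        pages.append({
            "high": chunk[0][0],   # largest count on this page
            "low": chunk[-1][0],   # smallest count on this page
            "rows": chunk,
        })
    return pages


# Made-up listing of 1100 entries with strictly decreasing counts.
sample = [(1200 - i, f"Page_{i}") for i in range(1100)]
pages = split_listing(sample)
print([(p["high"], p["low"], len(p["rows"])) for p in pages])
```

With real data several pages can share the same count at a boundary, so a production version would need a rule for ties.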
except in the operator's or its own user and user talk space.
User:ReyBrujo/Dumps/yyyymmdd/Sites linked more than xxx times
is usually 500 in the case of the
English Knowledge (XXG)
is the database dump date (and not the processing date)
Generates a list of externally linked sites, sorted in descending order by the number of links to each.
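The grouping key in the site query, `SUBSTRING_INDEX(el_to, '/', 3)`, keeps everything before the third slash of a URL, which is the scheme plus host. A rough Python equivalent is sketched below; the function names and sample URLs are illustrative, and the sketch skips the join against `page` that restricts the real query to namespace 0.

```python
from collections import Counter


def site_of(url):
    """Mimic SUBSTRING_INDEX(url, '/', 3): text before the 3rd '/'.

    If the URL has fewer than three slashes it is returned whole,
    which matches the SQL function's behaviour.
    """
    return "/".join(url.split("/")[:3])


def sites_by_link_count(urls):
    """Return (site, count) pairs, most-linked site first."""
    return Counter(site_of(u) for u in urls).most_common()


# Made-up link targets standing in for externallinks.el_to values.
links = [
    "http://example.org/a",
    "http://example.org/b?x=1",
    "https://example.com/",
]
ranking = sites_by_link_count(links)
print(ranking)
```

Because the scheme is part of the key, `http://` and `https://` links to the same host are counted as separate sites, just as in the SQL.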
SELECT COUNT(el_from) AS total, el_from, page_title
156:WHERE page_id = el_from AND page_namespace = 0
170:SELECT page_id, page_title, page_namespace
16:Knowledge (XXG) editing bot run by ReyBrujo
284:meta:User:ReyBrujo/Dump statistics table
174:WHERE page_title LIKE '%index.php%'
299:Unapproved Knowledge (XXG) bots
178:OR page_title LIKE '%/w/%' OR
54:It is used to make repetitive
222:User:ReyBrujo/Dumps/yyyymmdd
216:User:ReyBrujo/Dumps/Archive
176:OR page_title LIKE '%/%'
154:FROM externallinks, page
136:FROM externallinks, page
304:All Knowledge (XXG) bots
68:appear to be unassisted
180:page_title LIKE '%/';
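The four `LIKE` patterns in the title query can be read as plain substring and suffix checks. The following sketch restates them in Python to make the matching rule explicit; the function name and sample titles are hypothetical, and it does not try to reproduce MySQL collation rules.

```python
def looks_bot_created(title):
    """Apply the same four tests as the LIKE patterns in the query."""
    return (
        "index.php" in title      # LIKE '%index.php%'
        or "/" in title           # LIKE '%/%'
        or "/w/" in title         # LIKE '%/w/%'
        or title.endswith("/")    # LIKE '%/'
    )


# Made-up titles for illustration.
for candidate in ["Main_Page/index.php", "Foo/", "Albert_Einstein"]:
    print(candidate, looks_bot_created(candidate))
```

As written, the `'%/%'` pattern already matches every title that the `/w/` and trailing-slash patterns match, so the last two tests are kept here only to mirror the query.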
160:ORDER BY total DESC;
142:ORDER BY total DESC;
212:User:ReyBrujo/Dumps
192:, or ending with
140:GROUP BY el_from
262:is usually 1000.
158:GROUP BY search
60:semi-automated
41:operated by
32:user account
118:download.sh
293:Categories
172:FROM page
128:process.sh
64:bot policy
208:upload.sh
202:analyzed.
112:review.sh
56:automated
226:yyyymmdd
91:Overview
77:block it
43:ReyBrujo
268:where
258:where
244:where
234:where
224:where
103:Tasks
34:is a
30:This
272:and
248:and
122:wget
47:talk
274:yyy
270:xxx
260:xxx
250:yyy
246:xxx
236:xxx
190:/w/
58:or
36:bot
295::
188:,
49:).
196:.
194:/
186:/
124:.
79:.
45:(