Knowledge (XXG)

User:RBSpamAnalyzerBot


Knowledge (XXG) editing bot run by ReyBrujo

This user account is a bot operated by ReyBrujo (talk). It is used to make repetitive automated or semi-automated edits that would be extremely tedious to do manually, in accordance with the bot policy. This bot does not yet have the approval of the community, or approval has been withdrawn or expired, and therefore shouldn't be making edits that appear to be unassisted except in the operator's or its own user and user talk space.

Administrators: if this bot is making edits that appear to be unassisted to pages not in the operator's or its own userspace, please block it.

Overview

This bot will post external link analysis, find probable spambot-created pages, and eventually tag them for speedy deletion. It will also generate a set of statistics that the community can use to determine whether some pages are being used as spam carriers.

The bot runs once per database dump. In the case of the English Knowledge (XXG), I expect it to run once every 45-60 days.

Tasks

The bot itself is composed of a set of bash shell script files, each doing a single task:

review.sh: The "bot" itself. The script just calls each of the following scripts in order, handling any problem they may have.

download.sh: Checks download.wikimedia.org to find new database dumps, comparing the current ones with the last one it processed. If new ones are found, it can generate a list of URLs for page.sql.gz and externallinks.sql.gz, to be downloaded via wget.

process.sh: Loads page.sql.gz and externallinks.sql.gz into a local database, then executes several custom-made queries to gather statistics:

    SELECT COUNT(el_from) AS total, el_from, page_title
    FROM externallinks, page
    WHERE externallinks.el_from = page_id AND page_is_redirect = 0 AND page_namespace = 0
    GROUP BY el_from
    ORDER BY total DESC;

Generates a list of articles sorted by the number of external links each has.

    SELECT COUNT(el_to) AS total, SUBSTRING_INDEX(el_to, '/', 3) AS search
    FROM externallinks, page
    WHERE page_id = el_from AND page_namespace = 0
    GROUP BY search
    ORDER BY total DESC;

Generates a list of externally linked sites in descending order of link count.

    SELECT page_id, page_title, page_namespace
    FROM page
    WHERE page_title LIKE '%index.php%'
    OR page_title LIKE '%/%'
    OR page_title LIKE '%/w/%' OR
    page_title LIKE '%/';

Generates a list of pages with titles containing one of several patterns used by malfunctioning bots, like /, /w/, or ending with /.

After executing the queries, the script trims the resulting lists to a set size so that the generated pages do not become too big. If a resulting listing has more than 500 items, the bot stops, as the dump result must be manually analyzed.

upload.sh: This script handles the communication between the bot and the Knowledge (XXG) project. The script logs the bot in and uploads the generated listings to a set location. Currently, that is being done at User:ReyBrujo/Dumps. First, the script determines whether there is a current dump, and if so, archives it at User:ReyBrujo/Dumps/Archive. Then it uploads the listings and the dump page, with the format:

    User:ReyBrujo/Dumps/yyyymmdd
        where yyyymmdd is the database dump date (and not the processing date)
    User:ReyBrujo/Dumps/yyyymmdd/Sites linked more than xxx times
        where xxx is usually 500 in the case of the English Knowledge (XXG)
    User:ReyBrujo/Dumps/yyyymmdd/Sites linked between xxx and yyy times
        where xxx and yyy are delimiters when a single listing would have over 500 items.
    User:ReyBrujo/Dumps/yyyymmdd/Articles with more than xxx external links
        where xxx is usually 1000.
    User:ReyBrujo/Dumps/yyyymmdd/Articles with between xxx and yyy external links
        where xxx and yyy are delimiters when a single listing would have over 500 items.

Finally, the bot will also edit a global page currently found at meta:User:ReyBrujo/Dump statistics table, updating the statistics in that page. Permission for the bot to run there will be requested after having the bot approved in the English Knowledge (XXG).
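
The site-counting query and the listing step described for process.sh and upload.sh could be sketched roughly as follows. This is only an illustration in Python, not the bot's own bash code: the helper names (site_key, count_sites, split_listing, listing_title) and the sample dump date are hypothetical. The page does not say how the xxx and yyy delimiters are chosen, so the sketch only splits a listing by position and builds the "more than xxx times" title.

```python
from collections import Counter


def site_key(url: str) -> str:
    """Mimic MySQL SUBSTRING_INDEX(url, '/', 3).

    Keeps everything before the third '/', which for a typical URL is the
    scheme plus host. A URL with fewer than three slashes is returned whole.
    """
    return "/".join(url.split("/")[:3])


def count_sites(urls):
    """Return (site, total) pairs, most linked first.

    The SQL query only orders by total; ties are broken by site name here
    (an assumption) so that the output is deterministic.
    """
    totals = Counter(site_key(u) for u in urls)
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


def split_listing(rows, max_items=500):
    """Split one listing into consecutive chunks of at most max_items rows."""
    return [rows[i:i + max_items] for i in range(0, len(rows), max_items)]


def listing_title(dump_date: str, threshold: int) -> str:
    """Build a page title in the 'Sites linked more than xxx times' format."""
    return f"User:ReyBrujo/Dumps/{dump_date}/Sites linked more than {threshold} times"


if __name__ == "__main__":
    # Made-up sample data; a real run would read el_to values from the dump.
    sample = [
        "http://example.com/a",
        "http://example.com/b/c",
        "https://example.org/",
    ]
    for site, total in count_sites(sample):
        print(total, site)
    # "20070101" is a hypothetical dump date used only for illustration.
    print(listing_title("20070101", 500))
```

Grouping on the scheme-plus-host prefix means that links to different pages of the same site are counted together, which is what makes the result usable as a per-site spam indicator.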

Categories: Unapproved Knowledge (XXG) bots, All Knowledge (XXG) bots

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.