:Village pump (proposals)/FritzpollBot - Knowledge

Result: Approved

  • FritzpollBot (discussion closed)
One of the beauties of wikis in general -- and Knowledge in particular -- is that there is no rigid process that specifies how content evolves. The lack of a rigid process allows each of us to develop our own personal style of editing. Some people create stubs, some people fix typos, some people facilitate discussions, and some people bring articles to featured status. The community polices itself through discussions and the creation of guidelines in a somewhat indirect fashion. In essence, it is the status quo that drives wikipedia. The community only gets involved when there is a dispute. The bulk of editing happens without discussion, by a community that follows the status quo. Community discussions help clarify and crystallize the status quo, and occasionally steer it in a new direction. Policy is mostly descriptive.
Reading through this discussion, I find that the practice of using a bot to create pages is in keeping with the status quo. Hundreds of pages have already been created this way. Fritzpoll makes a strong case that using a bot to create an article is like using a can-opener to open cans. It is a tool that helps speed up the task. Many editors use automated tools to help speed up the process. The wiki itself is an automated tool for creating web pages that we all use.
Bots are very powerful tools. Because they have the power to create bad edits just as fast as they can create good edits we don't just let anyone use them. The community has created policies and procedures to keep bots from running amok. We rely on the bot operators to be responsible for the use of their bots. Fritzpoll has been extremely responsible in adapting the proposal so that the choice of "cans to be opened" is under the control of individuals or groups of individuals. Clearly, there still needs to be some work done to optimize the criteria for selecting places to be created, and I expect that these criteria can and will be refined. With the adaptations to the proposal that have already been made, I don't see any likelihood that a million or so articles will quickly flood Knowledge. The bot has only been approved for a limited run. So far, all our status quo practices and policies have been followed.
There need to be strong, convincing reasons to put limits on editing that is in keeping with the status quo. The practice being limited has to be shown to be either disruptive, contrary to the goals of the project, or detrimental to our core principles such as verifiability and transparency. A large majority of opinion in this discussion has been in favor of allowing the proposal. Nevertheless, I don't believe in percentages. A single objection that shows a proposal to be disruptive, harmful or detrimental to the project would outweigh any majority. The main objections raised can be roughly summarized as follows:
  • "New articles should be created by people, not bots". However, there is precedent for creating articles by bot, and the articles still remain years later. There seem to be positives and negatives to creating large numbers of new articles using a bot. I don't find any convincing argument for the need to ban bot created articles simply because they are created by a bot. On the contrary, I think there is a long standing consensus that all edits created by a bot are the creation of the person running the bot, and the responsibility of the person running the bot.
  • "Red-links inspire article creation more than stubs". I would have fully agreed with this statement when I joined Knowledge in 2004. At that time every article was filled with red-links and many editors were driven to turn as many of them to blue as they could. Since then, red-links have become a source of embarrassment to many editors, who see them as pointing out the deficiencies of Knowledge. I don't think anyone has made a conclusive argument either way. Certainly, to the ordinary user of Knowledge, some information is better than nothing.
  • "Places are not inherently notable". I would say that this is the most convincing argument against running the bot. The community expends quite a bit of effort arguing about notability. It seems to have different meanings to different people. It is important to have standards for notability so that Knowledge does not devolve into Facebook. In this regard, notability is a way that we judge the verifiability of information. I don't think anyone is proposing creating articles about places whose existence is unverifiable. I find that Fritzpoll, with the input of the larger community, has made a concerted effort to limit the bot to creating articles of places that the community deems notable. With these limitations, the notability argument is moot.
  • "This will set a bad precedent that will cause a massive increase in the number of articles". I don't see convincing evidence that such an increase would be a bad thing. Some people see it as good, some do not. Since everything is undo-able, and the bot is going to ramp up slowly, there is no chance of this being an irreversible problem if it is found to have a negative effect. If anything, there is an argument to be made that the community input that happened here is a good precedent for any future project that would involve a bot creating a large number of articles.
  • "These pages should be created in a new separate project -- a Wikiatlas". I am not convinced that this would make Knowledge any better. Users commonly click on place names in articles to find out where they are. Putting all these places in a different project would just make it more difficult for our users to find the information they are looking for.
I don't find any of these arguments convincing enough to override our established guidelines and practices. All things considered, I find that there is a consensus for going forward with the proposal. I commend everyone involved for working to create a proposal that takes the concerns of the community into account. -- SamuelWantman 00:06, 10 June 2008 (UTC)
The following is an archived discussion. Please do not modify it.
New proposal formulated - old discussion archived to talk page.

What this discussion is about

User:FritzpollBot was recently approved at Knowledge:Bots/Requests for approval/FritzpollBot to create stub articles for most or all of the documented villages and towns in the world in the style of User:Fritzpoll/GeoBot/Example (apparently using data from the NGIA(?)). Since this would have created about 2 million stub articles, broader approval of the Bot was sought from the community. The initial proposal has been changed, and the current proposal, to be found in the section immediately below, is the subject of current discussion.

Alternative: a new proposal

This entire page was 12 hours old before I even knew it existed, by which time there were misunderstandings and raging arguments taking place. I have read what has been said, and believe there is net support for at least the principle of evening up (to a greater or lesser extent) the geographical coverage of Knowledge. However, there are many legitimate concerns, and I have taken these on board and now present an amended proposal for the community's consideration.

Proposal

The executive summary of my proposal is this: bot automation driven by WikiProjects, operating within community-defined guidelines.

Here is the meat of it.

1. A new WikiProject is created to coordinate the activities of the bot. This allows for a central group of volunteers to assist with the generic tasks involved in making this project work, and gives a centralised place for questions to be asked, and new proposals and requests to be made.

The first job of the new WikiProject will be to clean up the existing articles on settlements before starting work on developing each country. Working country by country, in collaboration with the relevant WikiProjects, this means adding valuable referenced paragraphs of text, infoboxes with referenced data, locator maps, etc. to existing articles, sub-stubs and poor articles to improve quality and consistency.

2. Before beginning work on a new country, the relevant WikiProjects are contacted. These will include the country WikiProjects, continental WikiProjects, and subject-based WikiProjects. We will seek some volunteers; if no volunteers, or too few, can be found for a country, we ignore this country for the time being.

3. Together with the WikiProjects, a collection of sources will be obtained. The default will be an amalgamation of the US GNS data and the census data of various countries obtainable from the following list of resources: http://www.census.gov/main/www/stat_int.html. If census data is not available, unreliable, or incomplete, work on the country will be suspended until it is, or until other, reliable sources can be found to add this kind of data.

4a. Once source collection has occurred, the bot will be tuned to output lists similar in format to those already being created, but with the addition of population data, and hopefully other elements such as elevation data etc. The output will be separated into subpages, with a subpage devoted to those places the bot is unable to reconcile between databases.

4b. The bot will not upload any data for places where the census data indicates that the dwelling is too small, with size to be determined here by community consensus (not voting!). More on this below.

5. Data will be checked, as per the old proposal, by human editors to ensure correct spellings, check for disambiguation, etc. In the case where the bot cannot automatically reconcile data from the existing sources, human editors must add a reference to their corrections to indicate how they reconciled the data (looking at an atlas, for instance). Most data reconciliation failures are likely to be failures to correlate census data to coordinate data. These references will ultimately also be uploaded by the bot.

6. Once the project agrees that the data has been checked, and is ready for upload, relevant parties (such as New Page Patrol) will be given notice of an upload - I propose 30 mins notice - and the bot will automatically create the articles according to a template agreed with the WikiProjects. The articles will include all the above data, and all the references to it.

7. The bot will watchlist the articles to prevent flooding Special:UnwatchedPages and create a list of articles it created - this list will be posted to allow the WikiProject volunteers to watch the pages that they helped to create.

8. When they are first determined, the relevant notability policy will state specifically the initial minimum standards for "inherent notability" of villages, including global standards and any national exceptions. The initial specifics may be more narrow (such as minimum three or four independent reliable sources, and minimum 50% of population of capital city as determined by a specified benchmark source); over time the minima can be ratcheted down to broaden them slowly until the community and WikiProjects indicate when to stop. The bot's new articles will always observe the current notability standard strictly. Added by JJB 14:13, 2 June 2008 (UTC)
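Steps 3 through 4b above describe a data flow that can be sketched in a few lines. The following is only an illustration of the idea, not the bot's actual code: the record fields (`name`, `lat`, `lon`, `population`), the `Place` class and the `reconcile` function are all hypothetical, and matching by exact name is an assumption made to keep the sketch short. Places that match exactly one census row are ready for human checking; everything else goes to the "unable to reconcile" subpage described in step 4a.

```python
from dataclasses import dataclass


@dataclass
class Place:
    """One reconciled settlement (hypothetical record layout)."""
    name: str
    latitude: float
    longitude: float
    population: int


def reconcile(gns_rows, census_rows):
    """Join coordinate rows to census rows by exact place name.

    Returns (matched, unreconciled). A name with no census row, or with
    more than one, is treated as unreconciled and left for human editors.
    """
    # Collect every census population recorded under each name.
    census_by_name = {}
    for row in census_rows:
        census_by_name.setdefault(row["name"], []).append(row["population"])

    matched, unreconciled = [], []
    for row in gns_rows:
        populations = census_by_name.get(row["name"], [])
        if len(populations) == 1:
            matched.append(
                Place(row["name"], row["lat"], row["lon"], populations[0])
            )
        else:
            # Missing or ambiguous census data: needs a human and a reference.
            unreconciled.append(row["name"])
    return matched, unreconciled
```

In this sketch the ambiguous cases are deliberately not guessed at, which mirrors step 5: a human editor resolves them and records the reference used.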

What use is this?

The advantages to the above are that, although a little slower, we end up with more than one-line stubs, and because countries can be worked on in parallel by multiple WikiProjects on their own subpage within a separate WikiProject (the new one proposed above), the speed factor is also maintained. Thus there is an increase in quality with a minor cost in speed compared to the old proposals. By involving the WikiProjects in the way described, we ensure that there is sufficient interest in the articles, we obtain new and useful sources, and we ensure that there is someone to watchlist the pages afterwards.

The difference, therefore, between this and the old proposal is the increase in quality, and breadth of sourcing. These will not be single-sourced articles, and we will be able to devote our time to finding new and reliable sources of data. The WikiProjects also end up with a series of extra articles that they wanted, in the format they wanted. An example of how the project has already been moving in this direction is a discussion I have had with a member of WikiProject Russia, who is collating a list of sourced data in a database, and we want to help them by uploading the data when it is complete.

Other points

This proposal will probably drastically reduce the number of articles created, but I hope people will understand that this proposal by its very nature will not yield a good estimate of the number of articles created. It will be nowhere near the predictions of the first proposal, however.

I also hope the community will understand that an example is difficult to give, since I would have to first go and collect the data and sources for an entire country to create a handful of articles. This would not be in the interest of the articles in question. The rough layout of the articles created under this proposal would not be significantly different to the original - there would be an infobox, categories and text. The text would be more substantial given the additional sources, and the external links currently in the article would not exist in this new iteration.

Onwards to discussion...

I believe this proposal will quell the legitimate fears of vandalism, unsure notability and low quality of stubs. Beyond acceptance of the proposal itself, the one point above that still needs to be considered is point 4b. The easiest automatic criterion is size. There is no need to have a permanent, everlasting limit - a limit that can later be reviewed if it is found to be inadequate is probably best, so that we introduce articles slowly. My suggestion is that the community pick a percentage representing the lowest size of town/village to be included - the percentage would be "as a percentage of the capital city of the country". So if you picked 50%, all dwellings that had a population greater than half the population of the capital city would be included by the bot.

The reason for doing this is that it is fairer than selecting a fixed number, like 30,000, since less developed countries will not necessarily have reached the levels of urbanisation that we consider.
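The size criterion can be made concrete with a short sketch. The function names and all population figures below are invented purely for illustration, and the strict "greater than" comparison follows the 50% example given above.

```python
def included_by_bot(population, capital_population, percent):
    """True if a place is larger than `percent` of the capital's population."""
    return population > capital_population * percent / 100


def select_places(populations, capital, percent):
    """Return the sorted names of places that pass the relative threshold.

    `populations` maps place name to population; `capital` is the key of
    the capital city in that mapping.
    """
    capital_population = populations[capital]
    return sorted(
        name
        for name, population in populations.items()
        if included_by_bot(population, capital_population, percent)
    )


# Two hypothetical countries with very different levels of urbanisation.
large_country = {"CapitalA": 1_000_000, "TownA": 600_000, "VillageA": 20_000}
small_country = {"CapitalB": 40_000, "TownB": 25_000, "VillageB": 5_000}

print(select_places(large_country, "CapitalA", 50))  # ['CapitalA', 'TownA']
print(select_places(small_country, "CapitalB", 50))  # ['CapitalB', 'TownB']
```

With a fixed cut-off of 30,000, TownB would be excluded even though it is more than half the size of its capital; the relative threshold keeps it, which is the fairness argument made above.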

This proposal should satisfy most of those "on the fence" for the previous proposal, should continue to garner support from those supporting it, and may even address some of the concerns of those who opposed. But let's not make the following discussion divisive. I beg, no more straw polls, no more "voting" - let's just talk about this rationally.

Let the games begin! Fritzpoll (talk) 11:53, 2 June 2008 (UTC)

Editors interested in joining the new WikiProject

Editors who are willing to work as a team to develop and take some responsibility for organizing or developing FritzpollBot, please sign here. This may include editors from relevant WikiProjects who may have a specific interest in working on one country as part of the project or editors who have a general interest in working as a team to achieve new objectives:

  1. ♦Blofeld of SPECTRE♦ 14:18, 4 June 2008 (UTC)
  2. -- TinuCherian - 14:35, 4 June 2008 (UTC) for WP:INDIA
  3. Calaka (talk) 14:45, 4 June 2008 (UTC). I will not be a big help unfortunately, but I will try to contribute in any small way I can.
  4. --NickPenguin(contribs) 14:48, 4 June 2008 (UTC) Grunt work is my specialty, and I can dedicate a few evenings a week to this project.
  5. llywrch (talk) As I've mentioned elsewhere, I have materials about Ethiopia, although I can contribute to Eritrea settlements (which badly needs attention -- our Eritrean specialists seem to have dropped out of Knowledge).
  6. EJF (talk) 17:08, 4 June 2008 (UTC) (signed as IP, on wikibreak)
  7. Wrad (talk) 17:33, 4 June 2008 (UTC) Should be able to help with Arabian towns and villages.
  8. I'm an Editorofthewiki 19:06, 4 June 2008 (UTC)
  9. Keeper, although I'm quite useless in most ways, I do have admin buttons for anything they may be needed for (mass deletions/page moves/blocking the opposers (just kidding on the last one)).
  10. I have some skills in rooting up data, but will oppose any stub without a population figure. Phlegm Rooster (talk) 19:24, 4 June 2008 (UTC)
  11. Will help with Russia... eventually.—Ëzhiki (Igels Hérissonovich Ïzhakoff-Amursky) • (yo?); 19:54, 4 June 2008 (UTC)
  12. I'm looking for a project, and I'd like to see the bot go forward. --Falcorian  00:11, 5 June 2008 (UTC)
  13. Adam McCormick (talk) Might be some use to have a template guy and bot op on board. I'm in. Adam McCormick (talk) 00:43, 5 June 2008 (UTC)
  14. John Carter (talk) 20:33, 5 June 2008 (UTC)
  15. User:Fritzpoll (would it be wrong of me to add the namesake here?) :-)
  16. Kaly99 (talk) 16:36, 6 June 2008 (UTC) for WikiProject Sierra Leone and anywhere else help is needed.
  17. —KetanPanchal 05:28, 7 June 2008 (UTC) Should be able to help somewhat with articles related to India

Straw Poll

This is not a vote, even though it looks like one. However, if a significant number of people say "yes without reservations" we can move ahead faster, if a significant number say "no, it's just a bad idea," we can speedy-cancel it.

I like it as it stands

Sign here if you like the proposal as it stands.
  • Stubs are like seeds. The more seeds, the bigger the wikipedia tree!--Nblschool (talk) 02:06, 6 June 2008 (UTC)
  • In before the zot! JJB 14:13, 2 June 2008 (UTC)
  • Like I stated before there is nothing wrong with stubs. Britannica is mostly stubs! Zginder 2008-06-02T15:03Z (UTC)
  • The stubs created by this bot are better than most stubs written by humans. I, for one, welcome our new robot overlords! Plrk (talk) 15:09, 2 June 2008 (UTC)
  • Support. Even if flawed, this proposal will set things in motion, while eventual mistakes will be fixed (Knowledge can do that). --Qyd (talk) 15:19, 2 June 2008 (UTC)
  • Support - Would give a kick-start to certain places with deficiencies. MRM (talk) 15:28, 2 June 2008 (UTC)
  • Absolutely strongest possible support. The world is not ending folks, it is merely getting added to Knowledge. Uniformly, human assisted-ly, using Wiki-botcode. Good freeking grief, get over yourselves. — Preceding unsigned comment added by Keeper76 (talkcontribs)
  • Support - There's enough human intervention involved here for common sense to prevail on notability where it's needed. I'd like to see the functionality I mention below included. Pfainuk talk 15:57, 2 June 2008 (UTC)
They're already on my notepad to be implemented Fritzpoll (talk) 15:59, 2 June 2008 (UTC)
  • Support. I did have reservations, but the proposal in its current form addressed them all fairly well.—Ëzhiki (Igels Hérissonovich Ïzhakoff-Amursky) • (yo?); 16:13, 2 June 2008 (UTC)
  • Support. To those who identify the lack of human input as a problem with this proposal, note the following. I am a human being (or at least my cat seems to think I am.) I am personally compiling a list of the Canadian municipalities that still don't have articles, sticking only to incorporated municipalities for which a bot can easily extract objective source data from the last Canadian census and excluding any place that doesn't meet that requirement. I am personally verifying all of these redlinks to ensure that they aren't just misspelled links to articles that we do already have (and even if we do end up with some accidental duplicates, it takes what, three whole seconds to type #REDIRECT ] and hit save?) I am personally providing Fritzpoll with the relevant and valid Canadian sources. And I am personally committed to ensuring that every article gets reviewed afterward, either by myself or by another WP:CWNB colleague, to ensure that it got done correctly, gets corrected quickly if anything goes wrong, and gets expanded whenever possible with content of the type that only a human can add. So there's human oversight every step of the way — at least for Canadian municipalities, the bot simply won't have the ability to do anything that somebody from WP:CWNB, be it me or someone else, doesn't personally give Fritzpoll the go-ahead to do with it. So I don't see what the problem is. Bearcat (talk) 16:25, 2 June 2008 (UTC)
    This is exactly the kind of work ethic I would like to see enshrined in the proposal itself for all countries that would have articles created. In particular, the articles need to be assured of expansion before they get generated. Ryan Reich (talk) 16:51, 2 June 2008 (UTC)
  • Support - People from other countries are more likely to make an article for their city on a wikipedia in their language. Finding and translating these would take forever. This bot would help people get these articles on the english wikipedia. Jkasd 16:38, 2 June 2008 (UTC)
  • Support. I think this is a fantastic idea, though I understand and accept the concerns of those members below. KV5Squawk boxFight on! 16:39, 2 June 2008 (UTC)
  • yes without reservations - it will create uniformity and consistency. It also adds the potential to attract more editors. Kingturtle (talk) 17:10, 2 June 2008 (UTC)
  • Support As I supported the first proposal and strongly believe all inhabited places are inherently notable. Davewild (talk) 17:17, 2 June 2008 (UTC)
  • Strong Support - I have long wanted to see just such a project and I'm very glad that people are stepping forward to do it. Two comments. First, my observation is that people are *far* more likely to expand a stub than to start an article. Yes, all these articles would probably be created eventually, but I suspect the hand-created ones would usually lack the type of info this bot initially adds. I think the reluctance to start an article compared to adding to a stub is even stronger when the person isn't a native speaker. Any grammatical/spelling errors in the added info from non-native speakers can be fixed by people who know the language but not the subject matter. Secondly, as far as vandalism goes, I don't think it will be a real problem. What is the difference between an article that gets viewed 1000 times/minute and vandalism gets fixed after 5 minutes, compared with an article that gets viewed 1000 times/month and gets fixed after 5 months? The nature of fixing vandalism is that X% of the readers will notice the error, and Y% of those X people will fix it. I suspect that vandalism will be easier to detect in these stub articles and if vandals are using the "random page" function to find articles to vandalize, this would turn out to be a plus. Yeah, not enough of a plus to justify creating stub articles alone, but I just don't see the vandalism problem on wikipedia changing much because of these stubs. Wrs1864 (talk) 17:21, 2 June 2008 (UTC)
    First, just because people are more likely to expand a stub than start an article doesn't mean that all the place stubs that would be created will be likely to get expanded. Speculation of future notability is not supported by the notability policy, and if the stubs are created merely with geographic data, they will not initially claim any notability. Second, your attitude towards vandalism is a little cavalier: in the case of a mass-produced set of an unknown number of articles, perhaps hundreds of thousands, X (as in X%) would be extremely, unusually small. Yet the vandals would not necessarily be as arbitrary as the police: directed misinformation campaigns, based on some kind of political rivalry, racism, or whatever, could directly target locations that might take months to be noticed. This is not something to be brushed off. The only way to combat this is to ensure that human editors are personally invested in the articles created and the notable information placed there. Please comment on my proposal for this below. Ryan Reich (talk) 18:10, 2 June 2008 (UTC)
  • Support seems almost too conservative, but it will be a good start. EJF (talk) 17:25, 2 June 2008 (UTC)
  • Support Agree with EJF--I would have supported even something more radical. This will be a fantastic project. Mangostar (talk) 17:56, 2 June 2008 (UTC)
  • Strongest support Yes, this is going to be a great project, and would, to some extent, reduce the systemic bias in Knowledge.--Dwaipayan (talk) 17:59, 2 June 2008 (UTC)
  • Support Even in this revamped and weakened version of the proposal, it is still something that should be done. I can find no policy issues involved since any published atlas meets the requirements as a secondary source as defined in WP:PSTS. The use of multiple data sets meets the requirements of WP:RS. I find fault with many of the arguments that seem to boil down to WP:NOEFFORT or WP:WHOCARES. I believe that this project reduces significant barriers to getting notable, encyclopedic information into article space because new editors, or non-native English speaking editors, are more likely to edit an existing article than start a new article. I also believe that a consistent starting point for these articles, with a common format, layout, title convention, and proper category tagging will go a long way toward improving the overall quality of the entire encyclopedia. Jim Miller (talk) 18:13, 2 June 2008 (UTC)
    In my reading of the footnotes in WP:PSTS, an atlas is definitely not a secondary source, as it does not provide any kind of analysis or interpretation. It is simply a collection of facts. It also satisfies the criteria of the University of Nevada (Reno) reference as being a collection of "raw research data" (though it is not an original document). Further information that tends to appear in an atlas, such as cultural data, is a secondary source, but the mere geographic facts are not. I think that placing these contents of atlases in the category of secondary sources is contrary to the intent of the term as providing a document of human interest in the facts, and that allowing articles based purely on rote information would diminish Knowledge to the status of a mere repository of information. However, see my proposal for how we can implement this project in a way that does justice to the encyclopedia and to the places, and also encourages participation by new editors. Ryan Reich (talk) 18:29, 2 June 2008 (UTC)
    Then we will disagree. An atlas is not a mere listing of facts, in this case listings of demarcations determined by surveyors to determine boundaries. I would argue that cartography is inherently analytical, and that atlas publishers undergo a defined editorial process of their publications. I have read your other proposal and don't find any merit in your insistence that a human generated stub is any more notable than a bot generated stub. Further, we should not be setting any precedent that would define stricter standards for creating articles than those standards used to justify deleting articles. Any stub that would survive the already low deletion criteria of AfD should not be prevented from creation. Jim Miller (talk) 00:01, 3 June 2008 (UTC)
    An atlas may not, as a whole, be a mere listing of facts, but if all you are using from it is the bare facts, then you haven't used the portion which is a secondary source. The statistics themselves are merely a primary source; I suppose we do disagree on this point, but I think it is the letter of WP:NOT#Knowledge is not an indiscriminate collection of information, and certainly the spirit, that articles replicating only cartographic data do not belong here, regardless of whether they otherwise meet the notability criteria. And if you think that my proposal would simply replace bot-generated stubs with human-generated ones, then you didn't read it. The point is to replace a stub containing no notability claims (hence barely deserving the name of stub, since it doesn't meet the inclusion criteria) with an article that is at least stub-quality and draws from sources testifying to human discourse on the subject; perhaps, from the other parts of the atlas that a bot can't possibly process. My criteria are not restrictive; I merely advocate that we adhere to the notability guidelines, no more than AfD would ask. Many articles are deleted for lack of notability claims. The specific point I'm making in this debate is that statistical data alone is not a notability claim. Ryan Reich (talk) 01:36, 3 June 2008 (UTC)
  • Support This is a well-thought out and useful proposal. Tuf-Kat (talk) 18:25, 2 June 2008 (UTC)
  • Support, it sounds a fine idea and I expect the creation of these stubs to act as a catalyst to increase input from existing editors with knowledge of the country, as well as helping recruit new editors. This should help a great deal with the serious problem that is our US/European bias. Tim Vickers (talk) 18:41, 2 June 2008 (UTC)
  • Support - Why wouldn't I support expanding Knowledge's coverage of notable topics? These stubs will be a great asset. Okiefromokla 19:00, 2 June 2008 (UTC)
  • Support - Why not? Stubs can always be expanded, and this paves the way for interested writers. Der Wohltemperierte Fuchs 19:01, 2 June 2008 (UTC)
  • Support. Although I think this proposal is a little restrictive/conservative and is too bureaucratic. IMO, this bot should operate like any other upload bot. Certainly, groups of Wikipedians going through its uploads is useful, but this shouldn't be used in policy to limit the activity of the bot before-the-fact. Ultimately, we need articles on all designated geographic locations regardless of population size (rather meaningless, IMO - many geographic locations are unpopulated but significant). The purpose of using sources such as US GNS is that significance is inherent in geographic data from these. Every action on Knowledge is reversible and I trust the bot maintainer to be responsible. --Oldak Quill 19:16, 2 June 2008 (UTC)
  • Support, and I supported the original proposal.-gadfium 19:18, 2 June 2008 (UTC)
  • Support if Fritzpoll implements all minor improvements he promised (see his reply to concerns below).Biophys (talk) 19:45, 2 June 2008 (UTC)
  • Support. I like cheese. JKBrooks85 (talk) 19:48, 2 June 2008 (UTC)
  • Support. It is about time, I think this bot is long overdue. -theoneintraining (talk) 20:03, 2 June 2008 (UTC)
  • Support without reservations. The articles would be consistent and uniform to start with and would give a good starting point for other editors who wish to expand on those articles. Places are already notable. -Pparazorback (talk) 20:03, 2 June 2008 (UTC)
  • Support, as before... agree with the above comments about it being a good place to start. Alex Muller 20:23, 2 June 2008 (UTC)
  • Support - I actually have some reservations (I'm sure you have read at least some of them), but per the proposal, we can work these things out within the WikiProject. I support the general idea of the project. Good luck! -- Ynhockey 20:26, 2 June 2008 (UTC)
  • Support - I'm surprised this bot wasn't created long ago... and I think Keeper76 said it well.   jj137 (talk) 20:30, 2 June 2008 (UTC)
  • Support per my previous comments on the talk page. El Greco 21:27, 2 June 2008 (UTC)
  • Support. I was strongly in opposition to the original proposal, but this new proposal resolves those concerns to a sufficient degree that I support it. By using multiple types of information from several databases and by delegating much of the work to human-assisted work from WikiProjects, this looks to be an excellent proposal. Another reason why I support is this: Blindly creating two million stubs is a process that is unlikely to adapt, hard to control, and has a huge potential to do damage; this new proposal could be customized and adapted quite easily, and thus is unlikely to do harm. The key here is adaptability, because there will always be unforeseen challenges with any proposal of this scale. Pyrospirit (talk · contribs) 21:35, 2 June 2008 (UTC)
  • Support, for the most part this new proposal resolves my concerns in that it establishes more than bare coordinates as sources. I'm less concerned about lower population limits; sometimes such stubs should be created for completeness. (Rambot made a good start on Alva, Oklahoma, population 5,000, for several authors to add to, and even Amorita, Oklahoma is better than having nothing there.)--Prosfilaes (talk) 21:38, 2 June 2008 (UTC)
  • Support, although: would it not be possible to hide these places from "random article" until someone other than the bot has edited them at least once?--Aqwis (talkcontributions) 21:51, 2 June 2008 (UTC)
  • Support; I think the new proposal takes the original idea and gives it a higher quality. I think this project will do a good job of creating articles about locations that wouldn't otherwise get articles, because the people who live there may not be part of Knowledge/even have access to it. As these places are often underrepresented, I think this project will do a very good job of filling out the ranks of Knowledge's articles on places. -- Natalya 23:44, 2 June 2008 (UTC)
  • Support. As per previous discussion all my thoughts still stand. While this modified version will reduce the speed of article creation, speed was never a goal (e.g. to be the fastest bot in the world or whatever), and it will hopefully lead to an increase in quality. Calaka (talk) 01:32, 3 June 2008 (UTC)
  • Support. This proposal is awesome. Wrad (talk) 01:34, 3 June 2008 (UTC)
  • Support As I have noted before. The U.S. has articles on every little place that were bot generated (not saying every article; just saying a similar bot ran a similar task on a smaller scale without all of the fuss). The same respect should be given to the rest of the world in some sort of fashion. §hep¡Talk to me! 02:12, 3 June 2008 (UTC)
  • Support as per previous straw poll. Eyes will have to be kept after it, but I believe it will benefit the project overall. Huntster (t@c) 02:25, 3 June 2008 (UTC)
  • Support A great way for Knowledge to improve its coverage of the world's towns and cities. Captain panda 03:00, 3 June 2008 (UTC)
  • Support. PrinceOfCanada (talk) 03:05, 3 June 2008 (UTC)
  • Support As you say, start with large towns, cities, and see how it goes (I foresee up to a million pages within a year or two is within reason). I think this is a great effort, as mentioned it helps with infobox standardization enormously. I am for a more or less fixed population limit. (perhaps established cities, towns, 1000+ or 5000+; I assume that is still a large quantity of articles.) Danski14 03:07, 3 June 2008 (UTC)
  • Support The original proposal was flawed but this one is much better, and it will be of great help to many countries --TheJosh (talk) 03:19, 3 June 2008 (UTC)
  • Strong support Informative, accurate and sourced articles should be welcomed with open arms. Predictions that these articles will somehow attract the vandalism bogeyman, nationalist edit wars, wikidust, and outright misinformation apply no more than they do with any other article. Perhaps even less so. Wikiprojects will be organized and mobilized, and this will produce high quality stubs. --NickPenguin(contribs) 03:25, 3 June 2008 (UTC)
  • Strong support per my comment before -- penubag  (talk) 04:19, 3 June 2008 (UTC)
  • Support Fritzpoll has met some of the concerns I raised in my earlier edit. I have compiled a collection of information on several hundred Ethiopian settlements that I would like to share with Fritzpoll so the bot could add those articles; an example of the minimum the bot would create with this data is Softu. (And hopefully the bot can update all of the Ethiopian articles when the results of the 2007 Ethiopian census are published.) -- llywrch (talk) 04:27, 3 June 2008 (UTC)
  • Support per my archived comments.. wherever they went to...-- Ned Scott 06:22, 3 June 2008 (UTC)
  • Support As per archived comments. Lugnuts (talk) 07:16, 3 June 2008 (UTC)
  • Support Even basic geographic information is useful in its own right, it also provides a framework for many small diverse contributions. ChrisHodgesUK 12:04, 3 June 2008 (UTC)
  • Support - It will improve Knowledge's coverage and per my other comment it will produce consistency and stubs of high quality. I also trust the people behind this proposal to progress thoughtfully and carefully. Suicidalhamster (talk) 13:11, 3 June 2008 (UTC)
  • Support: a much needed project. Having watched users like Blofeld spend so many hours/days/weeks doing this manually, this would free up all of us to progress to the next step, which is researching and expanding individual places. This is the skeleton which must be built for the encyclopedia to expand. T L Miles (talk) 14:30, 3 June 2008 (UTC)
  • Support; great idea. The bot-generated articles about geographical places in the US are helpful and a good starting point; the same ought to be true of the articles created this way. The human involvement in notability decisions will help keep things from getting out of hand. --Spangineer (háblame) 14:38, 3 June 2008 (UTC)
  • Support It's a rather repetitive task. Creating the articles by hand from the data on these settlements would be tedious at best. Additionally, Knowledge is WP:NOTPAPER, so there should be no concern about the number of articles and Knowledge is WP:NOTINARUSH, so I have no qualms about stubs sitting around for years. --Jayron32.talk.contribs 15:34, 3 June 2008 (UTC)
  • Support. The arguments in opposition are unconvincing. We need these articles eventually, and well-crafted stubs based on reliable sources and with human oversight strikes me as probably the best way to go about it.  Sandstein  16:28, 3 June 2008 (UTC)
  • Support without reservation. Remember that the first stage of an article is an uncreated one; this is often difficult to surmount (especially because of the inability for anon users to create pages -- people from rural areas will be more likely to edit their own pages, but many may do so anonymously). By moving these articles to the "created, but stub" stage, we help promote the expansion of these articles. The proposal seems to be very well thought out, so I see no reason to oppose. nneonneo 16:33, 3 June 2008 (UTC)
  • Support John Carter (talk) 19:17, 3 June 2008 (UTC)
  • Support Shaleblade {talk} Yeah, I think this would expand Knowledge. For the better :P. 20:31, June 3 2008 (UTC)
  • Support I like it. One of the first things that new users do when they start poking around with Knowledge is check out the article for their hometown. Expanding something that was little more than a stub, when I thought my hometown deserved better than that, is one of the things that got me started editing in the first place. More editors = Better encyclopedia. 'Nuff said. - Ken Thomas (talk) 21:30, 3 June 2008 (UTC)
  • Support As in previous poll. Garion96 (talk) 22:08, 3 June 2008 (UTC)
  • Support. MrPrada (talk) 22:30, 3 June 2008 (UTC)
  • Yes, please do go ahead with this project. Andre (talk) 23:56, 3 June 2008 (UTC)
  • Support Amalthea (talk) 00:52, 4 June 2008 (UTC)
  • Support Trevor MacInnis (Contribs) 02:09, 4 June 2008 (UTC)
  • Support necessary infrastructure. I frequently link to the original U.S. articles. Royalbroil 02:15, 4 June 2008 (UTC)
  • Support - much better to involve WikiProjects. Let them decide how far down the rabbit hole they want to go (i.e. how low or high to set the population requirement for the settlements). I can't wait to see what this bot has to offer for WP:LITH. Renata (talk) 02:39, 4 June 2008 (UTC)
  • Support. The idea of involving WikiProjects in this process is ideal since it allows for more discussion about what gets added and what doesn't before the bot adds the stubs and more adherence to notability guidelines. Kal 04:41, 4 June 2008 (UTC)
  • Support. Jailerdaemon (talk) 05:25, 4 June 2008 (UTC)
  • Support MBisanz 05:52, 4 June 2008 (UTC)
  • Support This is a great plan. --Kaly99 (talk) 06:11, 4 June 2008 (UTC)
  • Support These articles should be written. As a bot can do it better (that is, 100% coverage once we determine what to cover), let the bot do it. --Falcorian  06:52, 4 June 2008 (UTC)
  • Strong support - As before, this is the best thing we can do to address systematic bias towards the United States and the UK and put wikipedia on the right path, attempting to cover the world evenly as any decent encyclopedia should. Not only this, but the plan to set up a wikiproject involving all of the different country projects, with the possibility of a bot and humans working to improve existing articles and make them consistent before new content is generated, is a massively needed thing in itself. I can't see how people could object to a team which will improve existing articles first. ♦Blofeld of SPECTRE♦ 10:08, 4 June 2008 (UTC)
  • Comment - For Ethiopia it is extremely difficult to get accurate information regarding place names (which have changed repeatedly, without notice or available documentation from the Ethiopian government). Badagnani (talk) 10:12, 4 June 2008 (UTC)
This is the same for many non-English-speaking countries, whether it is Iran, Vietnam, Laos, Saudi Arabia or whatever. For some places over ten different spellings or transliterations exist. Obtaining "official" names for many countries is indeed difficult when many variations exist and there isn't an abundance of data. This is why people from the wikiprojects who have specific knowledge, and who may be natives of that country, may be able to help sort things out before creation ♦Blofeld of SPECTRE♦ 10:24, 4 June 2008 (UTC)

Unfortunately, casual users are the worst contributors of wikipedia when they create articles, with their contributions ending up at Afd more often than not. MickMacNee (talk) 15:14, 8 June 2008 (UTC)
  • Support - my reservations about forging ahead are smaller than my misgivings about leaving it undone. --Alvestrand (talk) 22:06, 5 June 2008 (UTC)
  • Support - I have a .CSV file listing the most populous towns and villages of Mazanderan, with census information about men, women, families, and so on. I have always been looking for safe software (and a broadband connection) to create articles for them, so that people can visit them, even through software such as Google Earth --Parthava (talk) 22:36, 5 June 2008 (UTC)
  • Strongly Support - This is an encyclopedia, and I want it to have all data available in it. After all, that's its purpose. If I want to know if there's a town named West Elbow in Oklahoma, then I'm going to ask here. And, I'd much rather have a bot build all those stubs than wait for someone who lives there to create and screw up what should have been a standard template. The people who live there will add to the template if they can, or leave it if they can't. If there's nothing to add, well, then just the stub is all we get. -SandyJax (talk) 23:45, 5 June 2008 (UTC)
  • Support Why anybody would be against this is beyond me, let's do it. Redekopmark (talk) 04:23, 6 June 2008 (UTC)
Yeah, well, policy might have something to do with it. MickMacNee (talk) 15:14, 8 June 2008 (UTC)
Whatever our views, we are all constrained by what wikipedia is not. MickMacNee (talk) 15:09, 8 June 2008 (UTC)

I like it with reservations

Add your reservations to the discussion sections below, and add a brief signed comment here, linking to the relevant section(s) if you wish.
So, Stockton, California would get an article; Saint Petersburg would not. I love the idea, but this metric is ridiculous. Let's just pick a number--I'd say a population over 1,000 should suffice--and be done with it. But what a great idea for a bot in general. Matt Yeager (Talk?) 22:28, 2 June 2008 (UTC)
  • The absurdness mainly comes from the "Half" and "Capital": 5-10 percent of largest city gives better results. However, I agree with Matt and others, that the metric is not good: cutoffs are essential if percentages are used. See WP:Village pump (proposals)/FritzpollBot#Size limits for more details. Other than that, the proposal addresses, at least in principle, all of my earlier reservations. Geometry guy 22:37, 2 June 2008 (UTC)
  • Per Matt Yeager's excellent points. Relations to capitals aren't the way to go here. Absolute numbers may be arbitrary, but they're probably less biased. --Bfigura 00:40, 3 June 2008 (UTC)
Obviously I support as above, but the only thing I don't like is the 50% thing, which is not a good idea given that every country is different. Somebody used Mexico City as an example, and I agree with you completely. 1000 people would be a decent cut-off point for adding articles on most of the towns in the world of note. The problem arises when there are some places which may have a population of 250 where there would be a great deal to write about, and a town with a population of 40,000 about which there may be little to write. Each country is different, but it needs to be worked out beforehand individually, I think. ♦Blofeld of SPECTRE♦ 11:07, 4 June 2008 (UTC)
If the bot is given a decent run. About 6 seconds and all articles created will have a standard infobox, map, references and hopefully as many details as possible. ♦Blofeld of SPECTRE♦ 11:09, 4 June 2008 (UTC)
  • I like it in general. It's good to get the WikiProjects closely involved. But the capital city metric needs to be discussed further. Zagalejo^^^ 02:48, 3 June 2008 (UTC)
  • After opposing, I'm now giving the bot my cautious support for three reasons. First: Fritzpoll is a trustworthy, intelligent editor, not some zealot bent on shoving 2 million micro-stubs here; he has shown himself willing to listen to the many suggestions that are sure to come about from such a large undertaking. Second: while the bot alone cannot judge notability, this will be a deliberative, interactive process, done in close consultation on a country-by-country basis. Some countries with more complex administrative systems, or which are already sufficiently covered here, might be skipped entirely, while the bot would be very useful indeed for the 36,780 Communes of France. Third: if something goes awry, the process can be stopped or even reversed. At bottom it's a matter of trust, and I am confident the bot will be used wisely, so I endorse its implementation, at least on an experimental basis. Biruitorul 06:23, 3 June 2008 (UTC)
  • Support, but the metric used should be administrative divisions, as Badagnani suggested above. Doing it based on population doesn't make sense to me, even if done on a scaled basis. Population bears no relation to notability, in my mind. You can get some highly populated districts where there is no reliable written source in any language, and then some which are well-reported on yet quite small in population.--Aervanath lives in the Orphanage 09:18, 3 June 2008 (UTC)
  • Support, but the 50% idea is rather limiting. Many countries have had a disproportionate urbanisation over the last few decades. Take some of the established countries: Germany would only have Berlin and Hamburg. Britain will have a similar situation. Agathoclea (talk) 10:58, 3 June 2008 (UTC)
    Comment to all - yes, the suggested metric is a bit daft. The 50% was plucked out as an example, and I just wanted to show that I was open to the idea of some metric. Let the discussion lower down the page determine what criteria need to be met, and let's look past my ill thought-through suggestion relating to percentages of capital cities :) (hides in shame and ignominy) Fritzpoll (talk) 11:22, 3 June 2008 (UTC)
  • Support, but the % idea is rather limiting in general. I don't see what the problem is with little communities, at least as long as they're verifiable: look at Royal, Nebraska, just 0.013% of Washington, D.C. and 0.0044% of Nebraska. Moreover, I'm not sure of how the populations can easily be done in less developed countries with less common or less reliable censuses. I support the entire idea as long as we completely look past the suggestion relating to percentages of capital cities :) Nyttend (talk) 13:41, 3 June 2008 (UTC)
  • Support, but with reservations I would like to point out that 50% population can be too big a threshold. There could be sparsely populated areas, which would be quite notable, but not have the sufficient population. I'd suggest employing other "minimum" criteria as well, viz., area of the region/locality. If a place qualifies in any of the criteria, it should be automatically considered "notable". In all other aspects, I extend strongest possible support as such a format would bring in lot of standardization, and somewhat compensate for lack of initiation ("How do I begin?") that might be resulting in people not coming forward to start articles. Also, the idea of starting this as a separate project is a good idea. Later, features from the "main" Knowledge articles can be merged with this project's article. All the best! —KetanPanchal 14:19, 3 June 2008 (UTC)
  • Support, but with reservation: Can we get enough volunteers for each country/city, especially from areas with less participation on wikipedia? Generally I believe that this is an excellent idea which may well culminate in wikipedia having an article on each geographic location, but my only fear is disproportionate volunteer participation across different areas. For example, developed countries may find a high response in terms of material added, and in turn in the quality of the articles which come out of such participation. However, countries and areas which are less connected to the internet, and about which not much is known globally, would face the danger of being represented by stub articles for a long (if not perpetual) time, until some non-originating volunteer decides to take up the task of putting in material for that area. Tarun2k (talk) 14:08, 3 June 2008 (UTC)
    Hi - thanks for your comments. The volunteers should come from WikiProjects related in some way to the given country. If not enough volunteers could be found, then work on that country would not proceed, so the stubs would not be created in the first place. The idea is to find volunteers to take on the articles before creating them Fritzpoll (talk) 14:12, 3 June 2008 (UTC)
    Hi Tarun and Fritzpoll! I believe, if a region has lesser "English-using" population, articles on that region are less likely to come by the conventional process, which would require an even greater command over English. So, in this regard, a bot created article would be better. It could also be done that the "BOT" would create infoboxes that can be used as a database to develop further articles. This would make the article, more of semiautomated rather than fully automated. But, well to be honest, I don't know what stage of development the BOT is in, and if the users would be notified what "point" (like population, religion, languages, etc.) are planned to be included. If such a bot is going to be created, it'd be better to include all such points by consensus as it might prove difficult to incorporate changes later in the already created articles. Even though, it has been told that it'd be difficult to give example, it'd be nice to give a demonstration of how the bot works, say using a fictitious land like the "Wikiland". Looking forward to replies. —KetanPanchal 14:35, 3 June 2008 (UTC)
    Excellent idea Ketan, I am sure an illustration would be really helpful in allowing wikipedians to come to a consensus. After all, as of now we all are contemplating our own notions as to the output of the bot, but surely if we see how an illustrative page comes out, we can add more to the idea of what all to include and what not, given the fact that we agree in the first place to the idea. And @ Fritzpoll, I really like your idea of finding volunteers for the project but I am not sure about the practicality of the idea. I mean it requires too many ground-rules etc. like (i) how many minimum number of volunteers required for starting the bot on the area, (ii) the time we wait for volunteers to respond, (iii) what of those cases where those areas do not have a wiki project, etc. etc. Don't count me as a critic, but the simple reason for my apprehensiveness is that if there were enough volunteers to come up with the area coverage, there would not have been the need for this bot or this discussion in the very first place. But the fact of the matter is that we lack the requisite number of volunteers to handle all areas, and it is for this reason that the idea to have a bot has started in the first place. Therefore waiting for and finding the requisite and able volunteers for most areas is what I apprehend is quite a longish process. Hope you understand. Tarun2k (talk) 17:38, 3 June 2008 (UTC)
    I don't think that simply the worry that relying on significant human intervention in this project would cause it to take too long is enough to justify going to the opposite extreme (in fact, nothing is enough, but this worry in particular is not so significant). Knowledge has no deadline, and we shouldn't try to rush out low-quality articles simply because we fear it would take too long to do it well. Ryan Reich (talk) 18:37, 3 June 2008 (UTC)
  • I support, not with reservations per se, but I've worked with User:Ganeshk's Ganeshbot which did pretty much the same thing for India. I found it about a year after the bot had been created, and there were plenty of duplicate named cities (when two different cities actually had the same name) and this can get pretty confusing. With a lot of work done by several editors over a period of time, this was cleared up. But if we're gonna go global, I wanna see some dedicated editors (count me out - Geography's not my cup of tea!) who're willing to sit and sift through the articles output by the bot. The bot itself, is an excellent idea - nothing entirely new, but an excellent idea... Knowledge needs it. But simply having stub-ed articles, or article pages which need to be merged with existing articles would not do. A sincere task force, dedicated to working tirelessly is required. If this is in place, then the bot would rule. :) aJCfreak yAk 16:25, 3 June 2008 (UTC)
  • Support as is, with reservations only for the notability size brought up in the discussion so far. 1,000 or 2,500 would do for me I think. (Anyone know the average population of all known cities on the planet? That might work too.) Kresock (talk) 16:37, 3 June 2008 (UTC)
  • Support I think WikiProjects cooperating with an article-creating bot can produce outstanding results. I've been creating articles about small towns and villages in Israel, and I can't begin to tell you how much easier it would be with some assistance from a bot. I manage by translating from Hebrew Knowledge, and if I have to translate all the articles one by one I will, but I see no reason to reject bot-assisted creation on principle. It worked for Rambot's creation of places in the United States, it worked for the creation of a pile of asteroids (see Special:Whatlinkshere for Socorro), and it can work for places anywhere in the world. My only reservation is that the proposal doesn't go far enough. Setting arbitrarily high limits of 50% of the capital city's population or 1,000 inhabitants prevents the bot from realizing its full potential. I would be willing to go down to 100 inhabitants, and maybe less than that. I wrote after my last RFA that I wanted the community to complete its coverage of every incorporated settlement in the world in the next five years. If we can do it in one year, why wait forever? There are undoubtedly some articles that won't be created by human hands for a hundred years without the bot noticing these places. I say, let the bot help. Shalom (HelloPeace) 18:49, 3 June 2008 (UTC)
  • Support. As others have already stated, the half-the-capital-size is not a good metric. Also, I would like to know how the bot should deal with regional disputes; what does the bot do if a town/village/city is claimed by two or more countries? For example, if the half-the-capital-size metric were used, do places in Israel go by half the population of Jerusalem or half the population of Tel Aviv? Please address the issue of how the bot should recognize/handle regional disputes. Thanks. ← Michael Safyan 19:20, 3 June 2008 (UTC)
  • Support but.... Well first of all this page should state near the top what it's about. Apparently the actual topic is on the archived page. (Fixed while I was editing.) I support the concept of a bot adding large numbers of places to WP, but there's going to be trouble deciding the minimum population to include. "Global standards and any national exceptions" won't really cut it. A town size that's notable in Wyoming isn't necessarily notable in California. A size that's notable in Siskiyou County, California isn't necessarily notable in Los Angeles County, California. In fact Lake Hughes, California which is in Los Angeles County has an article, even though it has only a few hundred people. It's not a CDP, so I can't find the population, but for numbers let's say it's 1000, which I'm pretty sure is way high. A named unincorporated area of 2000 people that's adjacent to the city of Los Angeles, or that's in the San Gabriel Valley would likely not be notable. Some method of comparing it to nearby population would be preferable, but it may not be feasible, depending on how a country supplies its data.  Randall Bart   Talk  19:42, 3 June 2008 (UTC)
  • Comment, I don't care what the (non-zero) population of the settlement is, as long as the bot provides the number upon creation. Without a population number for every stub I cannot support the bot. Phlegm Rooster (talk) 19:48, 3 June 2008 (UTC)
  • Support: People are worried that random article might send them to a stub for an African town, but right now, random article is more likely to send you to an article for an American neighbourhood than a good size African town. I think that does a disservice to the overall worldview provided, and the general status of wikipedia. This debate should be about notability, and there are plenty of quite notable locations that don't have any kind of article, simply because there are fewer people using the internet there. Fewer people using the internet does not make a country less notable, but it does ensure it has less content on wikipedia. Let's use this tool WITH good criteria for notability (size, yes, but others too) to do what we're supposed to be doing here. - TheMightyQuill (talk) 19:49, 3 June 2008 (UTC)
  • Support; Matt Yeager said it best. Neutrality 15:03, 4 June 2008 (UTC)
  • Support, the idea of limiting the articles to 10% of the major city seems to be wrong. I would suggest a uniform metric of, say, 5K people. Settlements of that size can be reasonably expected to be in a project like ours. I am not sure how the project will work with Cyrillic names. Does the source have the same transliteration scheme as Wiki? Can we run it on a single administrative region per country (say Moscow Oblast for Russia) and see if everything is OK? If not, we have to update the bot's code (country specific) Alex Bakharev (talk) 12:40, 6 June 2008 (UTC)
  • Support; I think this is a very good idea and will greatly enhance the information available in Knowledge. Many editors are afraid to add pages but most will gladly add info to existing pages, so even if we create stubs, some info is better than none. I do think that where possible we should add the necessary infoboxes and categories rather than just one line stubs.--Kumioko (talk) 12:54, 6 June 2008 (UTC)
  • Support Good idea to populate WP with settlements that can be expanded, but have reservations about percentage metric for cutoff. I think population size would be better. Also, start with a subset of places as a trial run first to gauge impact. — Becksguy (talk) 21:13, 7 June 2008 (UTC)

I have serious reservations that cause me to vote no

Sign here, and put your reservations in the discussion sections below.

Serious reservations -- I don't see how you can expect us to support your proposal without some concrete examples of the bot's output. The samples I saw earlier were pretty bad -- most of them should probably be deleted. Pete Tillman (talk) 22:39, 2 June 2008 (UTC)

The problem here is that the new proposal means getting a WikiProject involved to gather sources, then me adjusting the bot to fuse the sources, outputting the lists, having the lists checked and then using the amended lists to create the articles. This is, necessarily, a long process and one where I will have trouble getting people to engage if it is on a very speculative basis of this "might" happen. I do appreciate your concerns, but I hope that what you disliked most about the earlier samples was their lack of content and not the layout of the articles, etc. In a sense, the bot has proved itself technically capable of creating articles with the information we give it. The problem was, from the earlier discussion, that it didn't have much information to go on. By involving human editors, it will have more information, and output the data in a format similar to the one created by the previous version of the bot - that is, text, infobox, sources. The only differences will be the extra information, extra sources, and no external links. On a separate note, I would think that the bot's new WikiProject would probably first off add a task of cleaning those articles up to a higher standard per the discussions here and elsewhere. If you have any concerns that I haven't addressed, please ask Fritzpoll (talk) 11:30, 3 June 2008 (UTC)
  • Big improvement but still a no (moved to complete oppose because of WP:OWNership issues) -- the scope of the new proposal is closer to right and far less likely to be disruptive to the whole project than 2 million articles; however, I think the proposal needs at least a rough estimate of the size before people can let the bot loose; I would support something in the order of 10,000-30,000 (e.g., something like the number of Pokemon articles we have :). It's a number I can foresee being expanded in the near future. And hey if a good % of the 10k get expanded, I'll happily say I'm wrong and let the next run be 100k. And, in addition to most of the reasons from the "strong oppose" camp, I'm bothered by this quote in the proposal, to "clean up the existing articles on settlements, adding references/infoboxes to existing articles to improve quality": For me, 95% of the quality of Knowledge comes from prose, from editorial judgments about what is important or not about a location or settlement (is it the largest producer of linen in the region? is it known for its great sports teams? its ancient mosque?) -- those that believe that in-line references and (in particular) infoboxes are what make Knowledge useful and important are spending too much time playing editor here and not enough time using the encyclopedia. -- Myke Cuthbert (talk) 15:43, 4 June 2008 (UTC)
As I take it you are referring to me: no, the good of an article comes from paragraphs of text, you are absolutely right. What I am talking about is adding infoboxes containing data on population, area, district, mayor, with a decent locator map, referencing this, and adding some new paragraphs of text to the one line stub articles which are unreferenced and plain useless, which we have in abundance. If you don't think that is useful and an improvement then that's your problem. Who wouldn't want to see every article expanded fully with proper details which are referenced? For somebody to make a judgement of me, who has contributed numerous FA articles and GAs to wikipedia, and to give a lecture that I think infoboxes are the be all and end all of wikipedia is quite something. ♦Blofeld of SPECTRE♦ 16:26, 4 June 2008 (UTC)
Actually, no, I wasn't referring to you, I was referring to the quote from the proposal above that I quoted directly. Though now that you're asking for comments about you, I do wish you wouldn't feel the need to personally rebut every statement you disagree with--it's hardly a community discussion when one person feels the need to say every other word and take every disagreement as a personal attack. -- Myke Cuthbert (talk) 01:02, 5 June 2008 (UTC)
Well that proposal was written by me so it clearly is ♦Blofeld of SPECTRE♦ 10:33, 5 June 2008 (UTC)
  • Oppose At first I thought this was a very exciting project which I thought could see a substantial improvement to the coverage of wikipedia on these subjects. Then I immediately started to think about the key problem of notability. I'm afraid my concerns got even worse when I saw that WP:NPT - the notability guideline mentioned - was only a proposed guideline, and seems to be quite a way off approval! Going to the core, approved, guideline (WP:N), what it states is that a topic is notable if it has "received significant coverage in reliable secondary sources that are independent of the subject". Unfortunately the only subject being given for these places is a primary source - a census, with some kind of "presumed" level of notability based on population size. Frankly I think this is the wrong way to go about creating a large number of notable place articles - a better place to start might be a secondary source like a guide book. AndrewRT(Talk) 22:47, 4 June 2008 (UTC)
  • Oppose As I understand the proposal, I have to say I have serious reservations about it. I'm all for using bots as an aid to help data-mine information or to help with formatting, if it means the end product will be more informative, higher quality articles. However, articles need to be created by and for humans, otherwise they serve no purpose other than to fill up disk space. Here's a basic idea of a proposal I would be more likely to support: a bot that goes through pages that already exist, and then drops sourced data on to the talk page for editors to cherry-pick through and incorporate into the main article. Bosintang (talk) 08:00, 5 June 2008 (UTC)
  • Oppose Unless output quality is high enough. I think the resources invested are better deployed elsewhere. I also agree that it is highly probable that most of those articles will never be edited or even looked at. FelisLeo 12:00, 5 June 2008 (UTC)
  • Oppose as the creation of articles on locations of unproven notability goes against WP:N. I know the current thinking is all inhabited towns are notable, but I don't subscribe to that view, as an article with no reliable secondary sources is little more than a directory entry, which falls outside the scope of Knowledge.--Gavin Collins (talk) 15:03, 5 June 2008 (UTC)
  • Oppose I see two different ideas driving this project. I see some people arguing it should be created because we ought to have articles on these places, and this is the fastest way of getting them. I see other people arguing it should be created because we will have articles on these topics either way; this just gets them created a little sooner and saves some busywork for people creating those articles. I have no problem with the second argument; I think it's a good reason for such a tool (not that it matters; that type of tool could operate off-site and a human editor could simply copy-paste in the appropriate information. It'd be hard to prevent it even if the community disapproved of it.). I'm concerned that the project (and thus, the bot) would end up being driven by the first reason instead, generating useless articles we'd never have had without the bot. What we need is quality articles on notable places. We don't need stub articles on every place of a certain population size, or certain GDP, or whatnot. My reasons in saying this are drawn from others' comments in "just a bad idea", "inherent notability", "400,000 larger articles vs. 2 million stubs", and "Is systemic bias bad?," all stated much more clearly than I can briefly summarize. Coanda-1910 (talk) 06:42, 6 June 2008 (UTC)
  • Oppose per no way to gauge reliability of the underlying data. Settlements may be renamed, relocated or even abandoned, fall victim to armed conflicts, natural disasters or open-pit mining. For Mongolia, the data seems almost ridiculously poor (looks like 30%-50% of all "populated places" have been abandoned for decades); how are we to make sure we don't run into similar problems in places like Bosnia, Rwanda, Sierra Leone, Cambodia etc.? Yaan (talk) 19:20, 8 June 2008 (UTC)
No, it's just a bad idea
Sign here if you think the whole concept of bot-generated or bot-assisted mass-article generation for places stubs is a bad idea.
  • Oppose strongly: encourages proliferation of articles about non-notable places, reinforces notions of "inherent notability", just plain wrong. Kww (talk) 14:18, 2 June 2008 (UTC)
    There are no "non-notable places". Any geographic place is notable; the difference is in what sources can or can't be brought to bear on an article. Bearcat (talk) 15:43, 2 June 2008 (UTC)
    Please read my thoughts at #Inherent notability. I want to build a consensus that this idea is wrong and unnecessary. Ryan Reich (talk) 16:54, 2 June 2008 (UTC)
    Bearcat, I absolutely and totally disagree. Boven Bolivia is an excellent example of a "non-notable" place: an old farmhouse that even its neighbors don't know by that name. Places are notable if, and only if, they have been covered directly and in detail by multiple reliable sources. Single specks on a map or single lines in an atlas don't cut it, and no wikiproject can come up with rules that shortcut it. Kww (talk) 18:52, 2 June 2008 (UTC)
    Being 'covered directly and in detail by multiple reliable sources' is verifiability, not notability. Notability is a subjective extension of verifiability. What is considered "notable" or not changes between each person. The article Boven Bolivia is potentially useful to those travelling in, mapping or writing about Bonaire. No one is able to conceive the potential uses of any one article, but this is what notability claims to do. Verifiability is the only concrete policy of inclusion we've got to stick to and since this bot necessarily uses verifiable sources, verifiability is not an issue with the uploads of this bot.--Oldak Quill 19:32, 2 June 2008 (UTC)
    Your position is troublingly broad. You appear to be claiming that everything is notable because it is notable to someone. This is emphatically not the standard we have ever used on Knowledge, nor the one that is adopted by WP:Notability. That policy specifically restricts notability to the contents of secondary sources, those which transcend mere verifiability to achieve some measure of importance by way of having been discussed by people. This is not a very restrictive principle, yet you insist that it be weakened to the point of uselessness. Taking "verifiability" as our only inclusion criterion would lead to Knowledge becoming a repository of indiscriminate facts, an outcome which is also specifically forbidden by policy (it is also not a directory). I do not ask that every new article be gripping; all I want is to be assured that it is relevant. Ryan Reich (talk) 19:59, 2 June 2008 (UTC)
    No, I reject "notability" (the concept) because it changes from person to person. The foundation of WP:N is verifiability and its inclusion in other sources. I disagree with "significance" being brought into inclusion because significance changes between context, time and place. The reason Knowledge doesn't turn into a 'repository of indiscriminate facts' isn't due to our article inclusion policy (or any other single policy) but due to a number of policies relating to what articles should be and how they should be written. Rejecting article inclusion policy based upon "notability" is entirely independent of whether Knowledge is a repository or an encyclopedia and whether the facts contained in articles are indiscriminate or not. --Oldak Quill 20:24, 2 June 2008 (UTC)
    Don't confuse the Knowledge jargon "notable" with the English word of the same spelling. The English word means "something that I care about"; here, I am arguing only that we must follow the prescriptions of WP:N. You should read, very carefully, WP:N#General notability guideline, in particular the clarification of "presumed" concerning what Knowledge is not. Certainly, notability requires verifiability (of the notability claim: that's in the clarification of "reliable"), but it is more than that. It also requires more than passing factual mention (see the clarification of "significant coverage"). I have already argued that an atlas, at least for the purposes to which FritzpollBot will put it, is not a source (as clarified in "sources"). Indeed, other than verifiability, the only part of the standard satisfied by the stubs that would be created is "independence of the source". Finally, you cannot reject the concept of "notability" and still keep Knowledge. The concept is the core of what we are; it's as basic to the encyclopedia as the US Constitution is to the United States. Ryan Reich (talk) 20:41, 2 June 2008 (UTC)
    Thank you for your considered reply. Notability has been reined-in from a catch-all term for anything considered not-significant to something more concrete and measurable. The policy WP:N as-it-stands is fairly objective but still contains many variables that are open to interpretation and the policy is thrown around a lot in matters of people not considering a subject significant or coverage-worthy. The defining feature of WP:N (compared to WP:V) is depth of coverage and independent sources. Ignoring independent sources here, "depth of coverage" is one such variable in the policy that is open to interpretation. WP:N defines "significant coverage" as "sources address the subject directly in detail, and no original research is needed to extract the content." The policy does not define what it means by "in detail" and this definition of "significant coverage" does not preclude the use of geography databases. The datasources that this bot intends to use give the subjects more than a passing mention and provide enough information to compile a reasonable stub article which concisely overviews the subject (population, location, &c.). Just to your last point: I would suggest that Knowledge is based on no single policy and certainly it is not based on an article inclusion policy. Starting from WP:Five pillars (just one place to start), I have no need to agree with or use WP:N to edit Knowledge. WP:N has redundancy in several other policies and those parts which aren't redundant are subjective and undefined. --Oldak Quill 21:17, 2 June 2008 (UTC)
  • (<-) Hopefully the notability guidelines are not too prescriptive, or else they would inevitably prosecute the innocent. However, I believe that even as they are written, widespread propagation of purely statistical geographic data into otherwise empty stubs doesn't meet these guidelines. It doesn't even meet the standard of the first pillar: Knowledge is an encyclopedia. Among the things that it is not (which are said right in WP:Five pillars) are: an indiscriminate collection of information, a web directory, or a collection of source documents. An indiscriminate directory of all towns in the world, containing only definitional data, would seem to be at least two and a half of these (technically, as I've said, an atlas isn't actually a source document, but this particular kind of data is no improvement over what is found in a source document). Furthermore, WP:No original research is a policy, and its definition of primary and secondary sources clearly (see the footnotes) places raw geographic data in the category of a primary source. You can still disagree that WP:N's prescription that notability is determined by secondary sources alone is binding, but if you do, then you still run up against the WP:NOT objections. Finally, if you do disagree, then you will have to explain why you consider that this extremely broad principle you are advocating is "common sense" justifying the "occasional exception". WP:N may not be absolutely binding like a policy, but you can't use this fact as a shield. Ryan Reich (talk) 22:40, 2 June 2008 (UTC)
    You need to go reread WP:N, OldakQuill. Every phrase I used is a quote from WP:N, not WP:RS. Kww (talk) 20:15, 2 June 2008 (UTC)
    When I used the word "notability" I referred to the concept rather than the policy as-it-stands. Parts of WP:N are drawn from WP:V. Notability is not merely defined as 'covered directly and in detail by multiple reliable sources' and this is more a reflection of WP:V and WP:RS. According to WP:N, "notability" may only be indicated by coverage, but coverage isn't the same as notability. Notability is an abstract concept about the significance of article-subjects and whether they should be included in Knowledge. Coverage in sources (essentially verifiability) is only an indicator of notability. There are no true measures of "notability" and what is "notable" or not ultimately comes down to the individual and what they feel is significant. --Oldak Quill 20:32, 2 June 2008 (UTC)
    Boven Bolivia was created by a person, not a bot. A bot will only do what it's programmed to do, and with a population threshold, it would never create a stub for Boven Bolivia. JD Lambert 19:36, 2 June 2008 (UTC)
  • Oppose: this kind of thing leads to a proliferation of substubs that turn out not to have been worth clicking to. Worthwhile articles are created by intelligent humans; until an intelligent human wants to make an article, let that article continue not to exist. I'm already sick of such nonsense elsewhere. -- Hoary (talk) 14:44, 2 June 2008 (UTC)
  • Oppose. I for one do believe in the inherent notability of locations -- in fact I believe it is even addressed in the notability policy -- in other words the article on Alsask, Saskatchewan has every right to exist on Knowledge as the article on New York City. HOWEVER, I oppose strongly the idea of articles being created by bot. Bots are, by definition, dumb robots, so there's no way for this thing to know if an article already exists for a place, especially places that might rightly be written about as part of larger articles i.e. metro areas, rural municipalities, etc. I have no objection to a bot creating a list of articles that need to be done (I'm not sure how that would work, though), but humans should ultimately create new articles, not a bot. Plus, any attempt to create some sort of limit to notability (such as setting a minimum population) would create a dangerous notability precedent that would violate WP:NPOV. 23skidoo (talk) 14:50, 2 June 2008 (UTC)
    No problem in principle with your opposing, but I am a bit confused. The bot I describe will do everything you say you want - it uses human input at every step, and human editors decide what articles they want created. It doesn't overwrite articles (skips anything blue-linked), and lets human editors check in advance that these existing articles don't need to be disambiguated. When human editors then decide what articles they want to create, it does the grunt work for them. So, oppose if you will, but I'm a little confused about why you are opposing this particular bot. Fritzpoll (talk) 14:55, 2 June 2008 (UTC)
    In fact, WP:Notability specifically does not address "inherent notability", though it links to an essay (meaning, non-binding opinion) on the subject. Please read my comments at #Inherent notability about why I think this concept is contrary to the principles of Knowledge and is not necessary to improve it. Ryan Reich (talk) 16:59, 2 June 2008 (UTC)
  • Oppose Still have no clue why this is needed. Countering systemic bias is a completely weak justification, bordering on political correctness. It seems to be just common sense that if a place article is required, it will be created eventually in the same way all other articles are, there is no need to have a stub sitting there waiting for it to happen. Setting off this bot would create a dangerous precedent with people queueing up with lists of inherently notable things that have no articles. If the issue is standardisation, or helping users, then what seems to have completely passed the proposers by is the alternate approach, to support the manual method of article creation by documenting all of the above in a manual of style or instruction guide. Coding and running the bot just seems to be a massive waste of effort given the number of things that need to be done to already existing articles, given the high likelihood that most of these articles will never get touched. Anyone who blindly thinks all the stubs will get expanded eventually, clearly haven't toured the rest of the already created parts of the wiki enough. The proposal still basically has the look and feel of a 'can be done' not 'must be done' hobby project, desperately looking for a reason to exist. MickMacNee (talk) 15:07, 2 June 2008 (UTC)
Fine to oppose for reasons relating to the bot, but please don't imply anything about my motivation for this. A group of editors who have created hundreds, if not thousands, of articles like this by hand, and wanted them to be created, asked for a bot to do the work. I wrote one in response to this - I didn't just sit down and try to write something for a laugh and then try to force everyone to let me use it. Fritzpoll (talk) 15:12, 2 June 2008 (UTC)
If these 'thousands of articles' are simply of the same content as the proposed bot stubs, then that was a bad idea too, and also seems to be effort looking for a justification. Get offended all you want, but it's a fact that every deleted page or aborted project on wikipedia started out as a good idea in somebody's mind. It's a basic fact of wikipedia. MickMacNee (talk) 15:26, 2 June 2008 (UTC)
  • Oppose, as per existing comments. If an article needs to be made, someone will make it. The generation of stubs for non-notable locations is not really a good idea - it seems that this is being done just to put the stubs there - I fail to see how this will help anything. If people have actual information on these, then this will eventually get made. The only possible thing this can do is to lower the effort required to make a page in the first place. Surely you could simply make a wikiproject where if people have information that they wish to put into wiki a separate web-page could be created where users select the item they are looking for and a pre-pressed article is returned?? This would seem more logical, at least to me. Having a bot generate two million articles that no-one might ever look at seems like a waste of time - I'm all for ignoring efficiency but I'm also all for not doing things without a really good reason - and I fail to see it here. Alternatively if the idea is to make census data available, isn't that what the census website is doing? Is there actually a need to mirror that? I would be for a bot that includes this information in existing articles - no problems. User A1 (talk) 15:35, 2 June 2008 (UTC)
It isn't going to make two million articles. It isn't going to make them automatically. It's only going to create the pages that members of WikiProjects want to have created. If you read the actual proposal, and ignore this irritating page title (!) you'll see that this thing is essentially people-driven, with a bot doing the grunt work of extracting data from sources, listing it for analysis by humans and then only adding the ones that people want to create. Fritzpoll (talk) 15:39, 2 June 2008 (UTC)
reply I read the proposal. All human editors are doing is slowing the rate at which the two million articles are generated by introducing a bottleneck. Given 20 volunteers at 5 min per bot-generated article, and assuming a wikipedia time of 4 hrs/day/volunteer (is the bot operating in parallel, or do all volunteers see the same bot upload proposals?), that is 48 articles per editor per day, which gives 960 total bot articles per day, or 350,400 articles per year. At that rate it is 5.7 years to 2 million articles. User A1 (talk) 15:58, 2 June 2008 (UTC)
No, the additional restriction on notability or population size will heavily restrict the number created, as I suggest following the inclusion of census data. It will thus never reach 2 million. Probably not even 1 million. Fritzpoll (talk) 16:10, 2 June 2008 (UTC)
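The throughput estimate in User A1's reply above can be checked with a short calculation. This is only a sketch of that arithmetic: the volunteer count, review time and daily hours are the figures quoted in the comment, while the 365-day year and the variable names are assumptions made for illustration. It says nothing about how the bot itself works or how many articles would actually be requested.

```python
# Back-of-the-envelope check of the estimate quoted in the discussion.
# All inputs are the commenter's hypothetical figures, not measured data.
VOLUNTEERS = 20              # editors reviewing bot output
MINUTES_PER_ARTICLE = 5      # review time per bot-generated article
HOURS_PER_DAY = 4            # editing time per volunteer per day
DAYS_PER_YEAR = 365          # assumption: volunteers work every day
TARGET_ARTICLES = 2_000_000  # the "two million articles" figure

# Articles one editor can review in a day.
per_editor_per_day = HOURS_PER_DAY * 60 // MINUTES_PER_ARTICLE

# Articles all volunteers can review in a day, then in a year.
per_day = per_editor_per_day * VOLUNTEERS
per_year = per_day * DAYS_PER_YEAR

# Years needed to reach the target at that constant rate.
years_to_target = TARGET_ARTICLES / per_year

print(per_editor_per_day, per_day, per_year, round(years_to_target, 1))
```

Under these inputs the figures match the ones in the comment. The result scales linearly with each input, so halving the number of volunteers or doubling the review time doubles the number of years.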
  • Qualified oppose - I still think it's preferable for people, rather than a bot, to go about creating these articles, which should happen on an as-needed basis, rather than automatically. However, I am relieved by the shape of the new proposal. I like the fact that if this goes through, consultation with editors working on individual countries will happen first. Perhaps the bot will even skip over certain countries if coverage thereof is judged to be complete. I only hope the promise of careful consultation will be honoured, and we will not rush into this, but do it with careful deliberation. Biruitorul 15:49, 2 June 2008 (UTC)
Absolutely: if no one wants the articles, they don't get created. If the source data is incomplete, they don't get created. Any violation of the proposals as they stand should result in the bot being immediately blocked, and the operator (i.e. me) being sanctioned in some way. The bot is just a tool - computer code is not intelligent enough to be let loose on this kind of thing unattended. Think of this bot like a can opener: you use a can opener to open the tin, because it's difficult, though not impossible, to get inside the can without it. What you don't want is your can opener deciding what you should have for dinner. Fritzpoll (talk) 15:54, 2 June 2008 (UTC)
  • Oppose - while every place is notable, the stub articles are worthless if nobody is willing to fill them with useful information Towel401 (talk) 16:06, 2 June 2008 (UTC)
    I was hoping that by including people from the WikiProjects who wanted the articles created, they would want them created for a reason - namely expansion. I guess I did mention this in my proposal above, but looking at it now, it is kinda long-winded! Fritzpoll (talk) 16:09, 2 June 2008 (UTC)
    Every place is potentially notable, and certainly has a lot of reason to be notable, but for the purposes of Knowledge, that has to be determined by reliable secondary sources. The idea that places can be inherently notable is a subversion of the principles of Knowledge. Since a determination of inherent notability is part of the proposal, I want to start a discussion at #Inherent notability to the effect that the idea should not and does not need to exist. Ryan Reich (talk) 17:03, 2 June 2008 (UTC)
  • Oppose, although this variation of the proposal is better than the last one. I still don't like the idea of a bot writing our articles, as it takes away the need for human editors. Even with population, elevation info and such, the articles are still going to be rather stubby, and most of them will never be expanded or improved. Juliancolton 16:44, 2 June 2008 (UTC)
  • For reasons I have already given. seresin ( ¡? ) 19:19, 2 June 2008 (UTC)
  • Strong oppose— there are already issues with special:random; do we really want more problems? Furthermore, there would be far too many stubs, and this would affect the Five-million and Ten million pool. In short, the cons outweigh the pros, and I believe we shouldn't have an automated bot doing the work we should be doing, albeit tedious. Why not have bots write the rest of Knowledge for us? --Mizu onna sango15/珊瑚15 19:52, 2 June 2008 (UTC)
    Could you clarify what you mean by 'issues with special:random'? If you mean too many articles in one subject appear when using this tool, I would say that this isn't an issue. Having a larger proportion of articles about one subject isn't an argument for limiting more articles being created in that subject. The effect of increasing the proportion of geography articles on Knowledge will be reversed as Knowledge grows and extends. Furthermore, I imagine that some subjects (geography and science) will always have more articles and will always be more linked to by Special:Random. If anything, a disproportionate number of articles about one subject is more a call to other subjects to extend their coverage to make up. Special:Random is a useful tool, but its results and how this reflects Knowledge's make-up should not influence our inclusion policy. In a year's time Special:Random may be able to select random articles in a particular category (or exclude particular categories) and this wouldn't be a problem. The 5M and 10M pools are essentially just fun and again shouldn't be used to affect our inclusion policy. --Oldak Quill 20:11, 2 June 2008 (UTC)
    Would you be happy if special random had a 1 in 10 chance of producing a place stub? (I actually think the chance would be higher with the predicted numbers). MickMacNee (talk) 22:06, 2 June 2008 (UTC)
    I'm strongly opposed to this bot, but I think that affecting a Knowledge bet is a really bad reason to be opposed -- the bet is meta-encyclopedic; this is about the encyclopedia. Special:Random is a bit better reason, though whether or not this bot is approved, I'd like Random to ignore stubs. But the perception on the outside world that Knowledge has suddenly become RandomVillagePedia is a reasonable objection. -- Myke Cuthbert (talk) 15:54, 4 June 2008 (UTC)

I've said this to other editors who don't like the idea of a bot. Can I just say that the idea of the bot creating a few articles automatically is so that many editors such as myself no longer have to spend all our time creating geo stubs and trying to address the huge bias on here, but can focus on quality, on building up stubs to start-class articles. I dedicate a lot of time to wikipedia and creating new articles, but if I could solely spend my editing time on here writing the articles that exist, things would be looking a lot better.

Perhaps the millions of articles thing was overly ambitious. It would take over a year to create that many, and to expand each and every one of them would be a difficult task indeed, time which you or I haven't got. What we can do, however, is do several thousand at a time and get the wikiprojects involved, so we can aim to get a team working at expanding a sensible number of the most notable articles, e.g. towns with a population over 1000 that could quite feasibly be expanded and not remain permastubs. Ideally I'd love to have full and detailed articles on everywhere, but the huge problem is access to knowledge. Realistically, if we could get many onto here like this, it would give us a firm basis to build upon sensibly. I think the new proposal has a lot of positive points. I agree bots are stupid; even Mr Fritz, the bot programmer, is the first to say this. But if it is used in the right way and coordinated and regulated closely, it can be a very powerful and efficient tool in setting up a foundation to build upon, as of course we write the articles!! I have spent many weeks alone trying to add infoboxes and refs to the geo articles which already exist by country, and the biggest problem by far is lack of consistency and general shoddiness of starting them. Don't get me wrong, some editors can start articles correctly and get things off to a good start, but the majority are not done in a manner expected of wikipedia and it takes weeks to sort out the mess. But if we have a bank of articles under the whim of wikiprojects, along with the editors like myself who work on geo articles, we can try to build the best we can, which will help people in the most efficient and consistent way we can. Whatever is thought of me, I would rather not create 2 million permastubs either and am here to build an encyclopedia of the highest quality and depth. It should be done in stages but we need to start from somewhere. Best regards ♦Blofeld of SPECTRE♦ 20:08, 2 June 2008 (UTC)

I agree with the large part of the point you are trying to make: the bot will relieve the tedium for human editors who want to populate geography articles with basic information, freeing their efforts to expand these stubs. My core reservation is that the stubs will be created without any plan for further expansion, but merely the hope that they will attract some interest. This bot should be the mechanical core of a much larger organizational effort centered around the national WikiProjects to attach editors to these new stubs and flesh them out. This effort needs some advance planning, namely, the location before creating the stubs of sufficient reliable secondary sources to establish notability claims for each town. Any article for which this cannot be done is an immediate candidate for being what you don't want: a permastub. The community needs to come to a consensus around this point before this bot can be run. Ryan Reich (talk) 20:21, 2 June 2008 (UTC)
  • The more I think about this, the more I don't like it. Even creating only lists isn't all that great. What's wrong with just letting Knowledge expand at a natural rate? The Volapük Knowledge was heavily criticised (and IIRC almost deleted and restarted) for using bots just like this to create thousands of stubs (I think they were even geographical places too). Why would we want to bring that here? Knowledge has content based on what its users want, because it's created by its users. This will likely lead to a systemic bias toward English speaking areas. I think that as a problem, it's overhyped. I would venture a guess that the Arabic Knowledge is biased toward the Middle East and the Russian Knowledge is biased toward Russia. It's perfectly natural and, while it does need to be addressed somewhat, using a bot like this as some sort of full frontal assault against bias isn't a very good idea. We also have to consider how this will affect Knowledge's image. We generally make a press release for every X millionth article; how is it going to look when we release 2 of those a few months apart and the articles are bot created stubs? Yes, we'll have 4 million articles, but only half will be created in the real wiki tradition. Storden, Minnesota (a rambot article picked entirely at random) is a good example of what's likely to happen to most of these articles - since it was created almost 6 years ago, 17 of its 21 edits have been by bots, only 1 human edit has actually added any new text, 1 sentence about a highway, and even that wasn't done until earlier this year. If someone wants thousands of automatically generated and maintained articles about towns, that's a fine idea for a website, but not for Knowledge. Mr.Z-man 20:12, 2 June 2008 (UTC)
    • Heh. I've been to Storden, MN. There isn't anything else to say about it that hasn't been said. And yet, it is notable. And so is the town/village/commune/district in _______ that has 274 people in it, and it should be added much the same way that Storden was added. Keeper | 76 | Disclaimer 15:31, 3 June 2008 (UTC)
      • My reason for mentioning that is that one of the arguments for the bot is that relevant Wikiprojects will add info to the article after it's made so that the article will be helpful, with real content instead of just prose-ified statistics. It's been there for 6 years and been categorized as part of the Minnesota Wikiproject for a year. Yet, no real content has been added, it hasn't even been assessed, just like 1300 other Minnesota articles, most probably rambot articles like that one. Knowledge is user-generated content. If someone wants an atlas website maintained almost solely by bots, they're free to set one up, but that's not what Knowledge is for. Mr.Z-man 20:58, 3 June 2008 (UTC)
  • Oppose much per Mr.Z-man, directly above. From where I'm standing, this seems too much like a solution in search of a problem. This project is always expanding, and as interest in creating articles on tiny villages grows, they will be created. We don't need a bot creating stubs for hundreds, thousands, millions of articles that will likely never be touched again. Systemic bias is a problem, nobody can deny that, but it is nowhere near the problem some would make it out to be. I do not support the concept of this bot; we should focus on vastly improving the quality of articles, not quantity. - auburnpilot talk 21:00, 2 June 2008 (UTC)
  • Oppose because this will just encourage deletionists and inclusionists to have yet another forum for blazing rows. If a place is worth including then it is worth real work. Fiddle Faddle (talk) 23:30, 2 June 2008 (UTC)
  • Oppose per just about all the comments above, particularly concerns over non-notability, and do we really want WP to have a ton of stuff generated by bots? The whole point of WP as I recall was that it would be user-generated, not computer-generated. --Bookgrrl /lookee here 01:33, 3 June 2008 (UTC)
  • If bots can write decent articles (which I think is the case here), then let them have at it! I don't care if I'm reading machine-written text as long as it's useful. If people can think of other ways to use bots to flesh out other areas of the 'pedia, I'd support it there too. As always, we should focus on content, not contributors (human or otherwise). Mangostar (talk) 01:38, 3 June 2008 (UTC)
  • If bots can write decent articles (which I think is the case here): Really? Can you give some links to some of these bot-made, decent articles? -- Hoary (talk) 13:44, 3 June 2008 (UTC)
  • Oppose - I want there to be articles on most towns in the world to make this encyclopedia representative of the world, but adding several million articles of questionable notability is not acceptable. If we were creating lists of towns in different states in a country, that would make sense, as some towns are very small and others are very notable and require a whole article. This does make it seem like notability is inherited, meaning that because they are towns or villages, they need a whole article, and that isn't the case; in many cases, only a little will be able to be reliably sourced, so we will have to, in addition to all the work we currently have to do to improve wikipedia, merge millions of stub articles into lists. Perhaps if we could create a bot to create lists of towns, that would be acceptable, but this should only be done for something whose notability is completely unquestionable. Judgesurreal777 (talk) 02:40, 3 June 2008 (UTC)
  • Oppose Any information that is worth adding to an encyclopedia ought to be added by a human. The information that is suggested as possible bot-loaded material is not bias-free/neutral/unproblematic - particularly for the countries that are likely to be underrepresented. This is a truly bad idea. Also, we don't need a plethora of useless, probably unreliable, misleadingly incomplete sub-stubs. Pinkville (talk) 02:54, 3 June 2008 (UTC)
    To expand/clarify a little: Knowledge already manifests an astounding and grim imbalance of First World vs. Third World content (e.g. thousands of articles on video games vs. comparatively nothing on African art). The reasons for this are obvious, for one, people write about what they know; those of us fortunate enough to be familiar with the technology that makes Final Fantasy possible can indulge ourselves with writing about it, but for the majority of the world's population that have never made a phone call, the subjects they are familiar with aren't likely to appear any time soon in Knowledge. Relying on a bot to assemble ready-made data (from which "reliable" sources, exactly?) on documented countries, leaving aside those countries that are undocumented (or clearly unreliably documented) only exacerbates this imbalance, relegating "peripheral" nations to even greater invisibility. A better "bot" would incite Knowledge editors to learn and write about these vast, unexplored territories of subjects we presently can't even guess at... Pinkville (talk) 18:59, 3 June 2008 (UTC)
  • Oppose The number of errors will likely be overwhelming, tens of thousands at least, maybe hundreds of thousands. Either way, thousands of these stubs, under-attended, could become battlegrounds for nationalistic edit wars and who knows what else. Lastly, as Pinkville hints, no bot can interpret the notability guidelines. This is not a helpful notion. However, it could be very helpful in a separate geo-wiki project. Gwen Gale (talk) 03:05, 3 June 2008 (UTC)
    The bot doesn't have to interpret the notability guidelines. Per the proposal, humans will. The bot will make no decisions about what should and should not be added. Humans will ask for the data; the bot will simply save them time. If they want the articles, presumably it's so that they can expand them. Trust me, I know that a bot can't just create the articles, which is why this bot has never, ever, been proposed by me as a simple "read data, create article" type bot. Fritzpoll (talk) 11:17, 3 June 2008 (UTC)
  • Oppose Ideally, this is the free encyclopedia that anyhuman can edit. Creating an article should have a rationale; someone should care about it beyond its existence as a line of census data. Let's extend this a little bit - every star in the universe is inherently notable, every species of life on Earth is inherently notable, every dropped wrench in orbit around the Earth is notable - are we going to create stubs for each? Franamax (talk) 03:55, 3 June 2008 (UTC)
Every star, every species, every will eventually have a wikipedia article that is deemed to be notable. Perhaps in 10 years? 20? 100? Do not be surprised by the eventual creation of all these things in time to come, either by human/robot/space alien/etc hands. Things change. They do not stay as they are. Peace!Calaka (talk) 04:22, 3 June 2008 (UTC)
Absolutely, and I look forward to showing my grandchildren my edits to Knowledge when it was only on computers. The question though is today - do we achieve a purpose by creating a million stubs? And do we help the world of search by putting a WP stub into the top-three Google results, even though it is uninformative? The articles we create go straight to the top of search rankings, shouldn't they actually say something useful? Franamax (talk) 04:30, 3 June 2008 (UTC)
Ahh I was not aware that google automatically indexed newly created wiki articles on the first page. I assumed that the newly created articles arrived at the top page after several search results of the topic in google and consequential clicks of that link. Calaka (talk) 04:46, 3 June 2008 (UTC)
Google works from a complex algorithm which I don't pretend to understand. It revolves to a large degree on the number of incoming and outgoing links to the page, and I think to the site. Knowledge scores way up there. For instance the obscure Order of the Dogwood which I recently created - two weeks ago, searching that phrase on google would give you pages of links related to Terry Fox and Nancy Greene, now those are all gone, subsumed into the Knowledge link - we're the go-to click. Please anyone correct me if I'm wrong, but this is serious stuff here. Franamax (talk) 06:20, 3 June 2008 (UTC)
  • Oppose per what everyone else has said (info added by humans, not computers). Fin© 10:05, 3 June 2008 (UTC)
  • Strong Oppose. In a nutshell, you want to create a bot that creates articles that no one finds interesting enough to create. I cannot oppose this idea more strongly. I'm not strictly a creationist per se, but in most cases I believe that if someone wants to create an article on a subject that interests them, then they should be allowed to, so long as it's not nonsense or other such vandalism, but if no one wants to create an article on a topic, that means it doesn't need to exist. Ferdia O'Brien /(C) 10:11, 3 June 2008 (UTC)

Can I just ask why you think it was proposed if "nobody is creating these articles"? Such an outlook shows little awareness of what is actually happening on wikipedia. The bot was created precisely because people are and will be adding these articles. There are many editors who create articles on a daily basis on such places that will be covered by the bot, and many are done so badly, without proper references and infoboxes, that it takes a lot of time to clean them up. The bot was proposed actually to make it more efficient, as people are and will add these articles on a daily basis. It is seriously narrow minded to suggest you know the interests of everybody who uses wikipedia and who will be developing these articles. Many editors want these articles, and many editors will develop them: a focus on real world content outside America or the UK. ♦Blofeld of SPECTRE♦ 12:25, 3 June 2008 (UTC)
Many people add articles about cars. I recently saw a comprehensive list of automobiles deleted for notability reasons. What you're now saying is all anyone has to do is find a couple of databases about cars, write a bot, and we're golden, the stubs can stay in perpetuity, on the defence that people create car articles. This applies to hundreds of other topics; I'm not seeing the special justification for places, unless you sign up to the inherent notability clause. Which seems to be driven by political correctness and reflects no actual practical demonstrated need or use, beyond some creative crystal ballery. The fact that people create these articles does not a bot justify. The fact they are messy and people have to tidy them up, well welcome to wikipedia, it's how it works. Other projects get around this with templates, FAQs, style guides and new article patrollers. What you're pushing here is not a useful tool for geo-stubs, it's a complete change to how wikipedia is built. I think it might help if people actually recognised this. MickMacNee (talk) 12:41, 3 June 2008 (UTC)
Can I just say that your argument against mine shows a remarkable ability to not follow what you just read, Blofeld? My point was that if a place is interesting enough to have an article, it will have one, or one will appear because someone will create it. What you want to do is create an article for every place irrespective of interest levels, which just isn't productive. I assume this means I can create an article on each and every star in the galaxy? There's a database of information on them and everything; I could just parse that information, and this way Knowledge can celebrate its 100,002,000,000th article in no time. Ferdia O'Brien /(C) 09:15, 4 June 2008 (UTC)
  • My reply to Gwen above covers your point about "interesting enough to create" - the bot is not deciding what articles to create - human beings are, per the proposal as described Fritzpoll (talk) 11:17, 3 June 2008 (UTC)
Your response to Gwen covers notability, an entirely different thing. Being interested enough to put a name on a list isn't good enough; if editors aren't willing to write and maintain an article, it shouldn't exist. Ferdia O'Brien /(C) 16:28, 4 June 2008 (UTC)
  • Strong Oppose I have "ducked out" of this discussion after taking part in some points in the previous debate which seemed to be rather delicately replied to without actually being answered. This Bot goes against almost everything which makes Wiki the unique project it was supposed to be - not reinventing the internet (which seems to be the implied idea behind the plan), not making Wiki the be-all-and-end-all of research and information. This bot will create 2 million stubs, many of which will never be looked at again, many of which will be ripe for vandalism, and many of which would namecheck geographical areas barely a flicker on any decent atlas. As has been mentioned I think elsewhere, it is quite naive to imagine the tribesfolk of an African village suddenly feeling loved because a computer programme has put their village on an internet site. This Bot will flood Wiki at a time when the number of red-links is clearly not going down, the number of tasks continues to increase, and the backlog at the existing projects grows. Why not allow Wiki to grow organically, not through some means to bypass the editors who have got it here in the first place? I have read the revised proposals but nothing there suggests this idea will get off the ground without causing a multitude of problems. Wiki needs decent articles of great quality. Quantity should not be a factor. I guess the debate has gone too far now, we are stuck with it, but I cannot find it easy to support at all. doktorb words 14:16, 3 June 2008 (UTC)
    I don't know if this may temper your opposition or not, but the proposal has substantially changed, partly in response to comments by people such as yourself. Consequently, the bot is not going to create two million articles. It is not going to create articles that are as lacking in information as the old proposal. It is only going to operate with the consent of people who are interested in having the articles that it creates. Essentially, it all goes back to the can-opener analogy I gave a little further up this page. The bot, especially with the amendments as proposed, is simply a tool to take the grunt work out of gathering data into one place, and then creating articles. But there is so much human intervention from editors that the bot is merely a tool for helping to write out the encyclopaedia. I understand and appreciate your feelings, and don't expect that this will change your opinion, but I hope to at least calm some of the worst of your fears about this. Fritzpoll (talk) 14:32, 3 June 2008 (UTC)
I am grateful for your reply, Fritz, I know you are doing your best to combat some of the negativity here. My reservations are quite strong; I guess we have the passion for the Wiki project but from different perspectives. However, I feel one issue has not been dealt with and maybe here or on my talk page you can address the point I raise about the existing number of projects and backlogs. I am happy to accept that this Bot is going to run off doing its thing, fine, maybe I won't even get involved, that's the joy of Wiki, I have enough to do! But what about the backlogs, the multitudes of AfD nominations, the Arbitration Committee work, the red-link patrols, the double redirect backlogs, all of that; how can the project be helped if such a massive injection of articles is about to land on our collective laps? I am perfectly willing to accept that the figure of 2 million is not accurate, and am happy to see a continued insistence that human interjection will have a central place in this Bot's work. doktorb words 14:45, 3 June 2008 (UTC)
I'll have a chat with you about this on your talkpage Fritzpoll (talk) 15:05, 3 June 2008 (UTC)
  • Strongly opposed This type of information is already on the Internet, in a table-based format (which is most appropriate for this kind of information), and it is extremely frustrating to read exposition of data.
I would argue that the example mentioned above, Softu, is not good enough. Compare this with town articles (or city articles), even of little-known places, that were written by (several) humans. They have maps, information about the local economy, images, nearby areas, landmarks, etc. They are properly categorized, wiki-linked, consistently formatted, and consistently titled. The list of information that would need to be automatically generated for this proposal to make any sense is enormous.
Please do not do this to Knowledge. « D. Trebbien (talk) 15:09 2008 June 3 (UTC)
But articles like Aliyu Amba and many others from the same country of a similar size are. There is a fine line between a starter stub and a bit of development. We are not talking months or years here. Most could be expanded within minutes. Knowledge is not set in stone for life. We already know articles can develop profoundly within days from stub class articles to GA type articles that occasionally people put in the DYK column. I think we should be trying to achieve wonders with wikipedia and allow articles to develop, not hide 95% of the world. As for the comment that "This type of information is already on the Internet": I don't know where you're looking, but search through wikipedia and the vast majority of this encyclopedia is compiled from information already available over the Internet. Personally I'd rather see more paper sources but information has to be verifiable. The purpose of wikipedia is fact reporting; I have no idea what your idea of what wikipedia should be is. I also have absolutely no idea where you are looking, but look through most of the city category by country and the vast majority are not properly referenced or with an infobox and map, and I've spent weeks trying to sort out the uneven mess created by humans with geo articles. In regard to towns and cities around the world, humans have clearly shown extreme inconsistency and an inability to generate articles consistently. Do you have any idea how many articles I've added infoboxes, maps and references to? As for French communes, Dear God, if a bot had created the articles with infoboxes, references and some lines of info, weeks of work would be saved in trying to make them all bare minimum level and more time spent on expanding them into start or B class articles. ♦Blofeld of SPECTRE♦ 15:29, 3 June 2008 (UTC)
Sorry. I meant "already on the Internet in one place", otherwise, where would the bots get the information?
As for the claim that bot-generated content is more consistent, I have a good example from botany-related articles that it is not. Take a look at Category:Rubiaceae and you will find subcategories for genera, which is considered to be an unmanageable practice. E.g. the first one, Category:Alleizettella, which has only one article, Alleizettella rubra. A. rubra was generated by User:Polbot in 2007. This edit was an immediate clean up, then , finally . The other edits were made by bots.
Another example of Polbot work is the article for genus Faramea, which was entirely written by a bot (except for one interlanguage link addition by an IP) when I first looked at it. Personally, I was annoyed by this.
Blofeld, I respect your opinion, as I think you and I are in similar cleanup efforts, but I categorically disagree that dealing with bot content would be easier. It's multiply larger.
« D. Trebbien (talk) 19:28 2008 June 5 (UTC)
May I point out that I, not a bot, wrote both Softu and Aliyu Amba? The reason that Softu remains a stub is that up to this writing I have been unable to find more details about that town -- which is surprising, considering its size (about 25K inhabitants) & the fact that I have been able to make some Ethiopian towns with less than 1000 inhabitants into "Start"-class articles. (And someone else might consider the article on Aliyu Amba to be detailed enough that it should not be graded as "Stub"-class; I tend to be critical about my own work.) -- llywrch (talk) 00:25, 4 June 2008 (UTC)
I know that you wrote Softu, and I only used it as an example of what the bot-generated content might look like at best; I do not anticipate anything more informative.
« D. Trebbien (talk) 19:28 2008 June 5 (UTC)
  • Oppose per excellent argument above - no such thing as inherent notability, articles should be created by humans, Special:Random gets enough boring stubs as is. --Explodicle (talk) 16:37, 3 June 2008 (UTC)
    Yes, the notability issue is one I've asked the community to comment on as part of my proposal. In fact, such discussion is taking place below. The articles are essentially created by humans, in that humans decide what articles they want created and the bot just does it for them, and there are already proposals to change the means by which Special:Random operates anyway. Fritzpoll (talk) 17:03, 3 June 2008 (UTC)
  • Strong oppose. Places are not inherently notable. We don't need 2 million 1 sentence articles that are never going to be expanded. If a place is notable, it will eventually get an article. We don't need to manufacture all of them. Kaldari (talk) 16:58, 3 June 2008 (UTC)
    I notice that you wrote this exact same thing to the old proposal around 20 or so minutes ago. I am concerned, therefore, that you haven't actually read the amended proposal, which explicitly would *not* create 2 million articles, and almost certainly wouldn't just have one sentence. Fritzpoll (talk) 17:03, 3 June 2008 (UTC)
  • Strong Oppose per arguments above. Julius1990 (talk) 19:45, 3 June 2008 (UTC)
  • Oppose just like before. With over 2 million articles, we need to focus on quality, not quantity. If people really want to work on a location article, let them be the ones to create the article, not bot-spam the 'pedia with perma-junk-stub articles. Also as before, I am neutral on having a bot create lists of locations. – sgeureka 21:56, 3 June 2008 (UTC)
  • Oppose. Quality, not quantity. We don't need thousands of pages about non-notable places that will never be well-known enough to fill, and therefore won't be useful. Ironholds 00:50, 4 June 2008 (UTC)
  • Oppose per reasons above; Robots are not to be trusted. --  ShadowJester07  ►Talk  01:48, 4 June 2008 (UTC)
  • Oppose, it will only create articles which will never get edited by a human except for vandalism, which then will stay unnoticed for ages and only give Knowledge a bad image. Knowledge is big enough already to attract enough human editors to get the places worth noting to be added manually. andy (talk) 10:39, 4 June 2008 (UTC)
Can I just say that I have no idea how you think that articles which will never be edited by humans and are unlikely to attract a lot of attention in your view would suddenly be a hot spot for vandals. This is a contradiction, Andy. Most vandals are not particularly intelligent, and I doubt many would spend their time browsing through articles on places in obscure places which as you say are completely uninteresting. If you conducted a survey of the history of vandalism on wikipedia I wonder what proportion would be on biographies, topics related to America or the UK or popular culture articles and what proportion would be on starter articles on villages in Kreblakistan. ♦Blofeld of SPECTRE♦ 14:06, 4 June 2008 (UTC)
Look at the rambot articles. I cited one in my comment above. In nearly 6 years, it had 21 edits, 17 of which were by bots (I think it has some more now that it's been linked to from here). Only 1 human edit added any actual text. Everything else was templates and interwiki links. As for vandalism, vandals don't have to be interested in the topic. The more unwatched articles like these we add, the higher the likelihood vandals will get one by clicking Special:Random. Mr.Z-man 18:16, 4 June 2008 (UTC)
That's a tiny sample; look at Alva, Oklahoma for another example. For a population of 5,000, it's had a lot of development around the basic form that Rambot gave it. It seems likely that any population limit of a few thousand would keep in these cities and exclude the one you're pointing to. On the other hand, I see nothing fatal about Amorita, Oklahoma; the one hit of vandalism was reverted in 12 hours, and there's no evidence that they wouldn't have just hit another set of articles. Or that they won't come back sometime and put good information about where they live, encouraged by the fact the articles exist.--Prosfilaes (talk) 18:31, 4 June 2008 (UTC)
You cannot possibly compare articles on American towns, taking into account how much of the traffic on here comes from America, and argue that villages in Kreblakistan will get the same traffic as articles on America. If nobody could be bothered to develop articles after Rambot that's a huge shame given the number of American users on here. If fewer people spent all their time in discussions and conflict at ANI they'd be expanded in no time at all. Americans, or potential American vandals, will naturally look up their own village or community on wikipedia and may leave it at major risk of vandalism, but that vandals are going to be going through every settlement created in a place like Guinea or Liberia or something is far less likely. ♦Blofeld of SPECTRE♦ 20:32, 4 June 2008 (UTC)
Whose point are you trying to prove? If Americans aren't improving the thousands of American town stubs, I can't imagine that they'd improve stubs about towns in other nations. You obviously didn't read my comment at all. Vandals aren't going to be "going through" anything. If we add 500,000 of these town articles in the next few months, the odds of getting one on a Special:Random click (assuming it is truly random) are probably about 1 in 6. Mr.Z-man 23:12, 4 June 2008 (UTC)
  • Oppose - I have a number of issues with the project proposal as it currently stands. I'm a bit torn in this situation - after all, I ran a bot which was intimately involved with the concept of geographical place articles. I had it easy, however, as sourcing for information regarding incorporated places and census-designated places in the United States is readily available and rock-solid. Where things get fuzzier is when we start talking about "other", unofficial towns and villages and settlements around the nation, places for which no sources other than vast lists of placenames can be found. Even more harrowing is the prospect of sorting out not hundreds or thousands but hundreds of thousands of such questionable settlements. The sourcing being proposed here is pretty thin, in my opinion, and practically impossible to verify on a case-by-case basis. I am also quite inclined to side with those making the argument that creating thousands of articles with content such as "Place is a town in country" is of little value. Allowing for a slower, more natural growth of the encyclopedia - such as allowing folks to make articles on obscure places as the need/desire arises, armed with whatever information and sources they may have on hand - seems preferable to me. Finally, it seems to me that the scope of this project is just far, far too broad. Already the discussion indicates that we are running into many apples and oranges comparisons between different geographical units such as French communes, United States census designated places, Philippine barangays, and so on. An attempt to handle these diverse problems en masse may be a futile attempt from the get-go due to the minute but important differences in the way each country handles its geographic subdivisions.
Perhaps the proponents of this proposal would have better luck attempting to digest this problem in bite sized portions, engaging in the proper discourse on how to handle, say, French communes, processing them, and moving on to repeat the process with the barangays of the Philippines. This would allow for the individual problems caused by each case to be handled in a unique fashion rather than attempting to apply the same fix to every little problem. While I must sympathise with those who are trying to go forward with this project, it just feels like they are trying to bite off more than any of us can chew and the proposal has too many problems to move forward as is. Arkyan 19:59, 4 June 2008 (UTC)
What part didn't you read about treating each country individually and working on each country at a time and gaining the respective views of the relevant wikiprojects or potential natives of these countries and working together with them in planning what is notable and organizing content beforehand? Of course you can't do the world in one go like that. Each country is different and may require some thought, and it will be planned out taking each country at a time, which will be done, whether you happen to be opposed to developing wikipedia or not. ♦Blofeld of SPECTRE♦ 20:23, 4 June 2008 (UTC)
You might want to rethink your response there, it's coming across as being dangerously close to incivility. Are you insinuating that merely because I am opposed to this proposal I am somehow opposed to "developing Knowledge"? I resent that. Arkyan 20:40, 4 June 2008 (UTC)
  • No I cannot support There is a place for bots on this project BUT Knowledge should consist of articles that are initially started by humans who care. Indeed in my view it is often the fact that a small place has no wikipedia article that causes new editors to arise and (a) start such articles then, (b) stay on board to edit other links to that article, and (c) edit articles generally available on the project. Indeed I would go further and say it is the fact that an editor started an article that keeps them coming back and adding to that page - and places of interest, birth, death, family background, etc are often the most likely articles to cause that continued interest.--VS 22:57, 4 June 2008 (UTC)
  • Oppose, while I recognize that the idea is unquestionably in good faith, I just don't believe it's a good idea. If we really want a "gazetteer", let's do it in the proper form—a tabled list, with links to the ones that really do have more sourced information available than a few census standards. We've got enough permastubs already, thank you very much, and this will just give us more merging to do later. (If the bot could create tabled lists, on the other hand, I might well be interested...) Seraphimblade 01:11, 5 June 2008 (UTC)
  • Strong Oppose - any articles worth creating are worth being created by humans. The bot process, as I understand it, was created to automate and simplify repetitive and otherwise minor edits across multiple articles. Excluding the debated Rambot, article creation and moves were usually only permitted to bots for general non-controversial maintenance issues. I am not comfortable with the justification for this bot creating what many here - on both sides of the argument - consider to be permastubs. The bot creators say that we should just wait and let the editors come, even if it takes several years. But shouldn't that argument also be made in respect to article creation? What is being proposed now is that we don't have the time to let humans do the work of creating articles, but somehow we mysteriously are willing to wait on those same arbitrary humans to do something further, even if it takes ten years? Presumably by that time, we'll be using bots to create articles en masse on every _________ (insert anything horrifyingly large in number that might be considered notable as a whole). Perhaps we should just expand the encyclopedia until we reach a billion or two articles, that way we can simply assign every human on the planet three or four articles that they will be legally responsible for maintaining? I also find the argument that this will magically create new editors somewhat fallacious. I do have no doubts that it will lead to some new editing, but my guess is that overall the number will be somewhat negligible. It's not as if we're an underground website - last I looked, Knowledge was the 7th most popular website in the world. People will be coming to us either way, and the statistics regarding page views does not evidence a substantial number gravitating to their hometown's page (or lack thereof). 
I have also made suggestions in the discussion threads below regarding more efficient ways of displaying the data, whether or not individual articles are needed, and expansion of this project to a separate sister Wikimedia project more customized to the geographic needs and parameters. All to no avail. If this project is more interested in breadth than depth, then I will have to seriously reconsider my participation in Knowledge in the long-run, because I can already see the descendant bots scouring databases around the world, creating more articles upon articles, the article-count increasing by several factors, and a desert of permastubs punctuated by occasional oases of human-edited articles with real and considered content. There is no virtue in our article-count numbers, much like there is no virtue in a higher edit-count. What matters is the average quality level. Do we really want to be known as the encyclopedia which has "articles on everything, but good luck finding one worth reading"? Should we reach that point, we will have actually cancelled out our average utility - at which point it becomes more useful to use a paper encyclopedia if only because one is guaranteed substantial content once one locates an article (however few they may have). Alright, that's my piece. I await the editors who will jump on one or two of my arguments and conveniently ignore all the rest... Girolamo Savonarola (talk) 17:45, 5 June 2008 (UTC)
  • Oppose -- was a qualified oppose before (for the reasons outlined above), but moved here because some members of the project group (and not Fritzpoll who seems to be rather balanced) seem to be taking contrary suggestions as personal attacks. It suggests an inability on the part of some of the project leadership to take community suggestions in a spirit that will improve the project. -- Myke Cuthbert (talk) 01:52, 6 June 2008 (UTC)
  • Oppose The more articles we have on notable places (contrary to popular belief not all places are notable), the better and reducing systemic bias is a good thing. However, using a bot to mechanically generate these articles is fraught with problems. Firstly, the determination of what is a notable locality is not as clear-cut as many think. When developing {{Riverina}}, it was the work of many editors and much discussion to consider what would count as a notable locality. Secondly, while I am sure the bot can cater for a variety of circumstances, there are a range of different navboxes, category and stubs, locality maps etc. that vary from nation to nation, and often inside each nation. Thirdly, with the best will in the world, there will be errors that will require correction by humans; duplicate articles, incorrectly spelled names, incorrect geo-coding etc. Leave article creation to people and let the encyclopedia grow organically rather than force-grow it. If this does go ahead, it needs to be rolled out slowly and in conjunction with appropriate WikiProjects. I know I would dread this bot being let loose on Australian geographic articles. -- Mattinbgn\ 02:58, 6 June 2008 (UTC)
  • Oppose -- I really enjoy creating new missing articles (filling a red link) but don't enjoy expanding stubs. Also, the Wikipedias with a large amount of bot-generated content (for example, the Volapük Knowledge at vo:) are plainly horrible, and a lot worse than human-generated Wikipedias that are an order of magnitude or two smaller. Please make sure that the amount of bot-generated content here does not increase. Kusma (talk) 09:37, 6 June 2008 (UTC)
  • Oppose, verging on Luddite. Because of the Dr. Evils of this world, Weapons of Mass Production are inherently dangerous. The proliferation of angry outbursts and arbitrary actions like the renaming of pages and the changing of poll headings do not fill me with confidence that this apparently sensible project will be executed with any degree of good sense. Scolaire (talk) 09:59, 6 June 2008 (UTC)
  • Oppose - if there's enough information to write an article on a topic, do it (and no, census data and a map with a certain point marked do not count as non-trivial coverage in my book). If there's just that marked point, don't try to auto-generate an article. Huon (talk) 15:06, 6 June 2008 (UTC)
I'm a Luddite
Sign here if you're a Luddite.
  • Dude, mechanical looms are just wrong :) However, I do favour the human touch in almost every aspect of Knowledge, with due care to each and every thing we do. It takes a lot longer, but we're more sure of the results. Remember, there's no deadline. Franamax (talk) 04:02, 3 June 2008 (UTC)
    • I would suggest that the creation of the rambot stubs spurred a lot of human editing from Americans who started discovering Knowledge in search engine results for their hometown or places in their local area. Furthermore, for newcomers, making a small addition to something that's there will often be less daunting than creating a new page, especially when they're unfamiliar with stylistic issues (not to mention autoconfirmed requirements for creating new pages that exist nowadays). Would we not see the same consequences from these stubs as the rambot stubs? There is the issue of limited returns from non-English speakers, I suppose. --bainer (talk) 05:31, 3 June 2008 (UTC)
    • (edit conflict) If each and every page that the bot produces has to be checked by a human before being created as an article, doesn't this satisfy your requirement? Yes, there is no deadline, but that doesn't mean that the work has to be repetitive and tedious, which is what creating these stubs by hand would be.--Aervanath's signature is boring 05:34, 3 June 2008 (UTC)
Aervanath, to the first, do you mean that we only require an editor to say with a straight face "oh yes, twenty stubs added per minute and I personally verified each one"? That's a recent-ongoing bone of contention of which you may not be aware. Repetitive and tedious - there's my point, if it doesn't gladden the heart, why do it?
Bainer, I take the point on encouraging contributions to expand articles, which I think is also the thrust of Aerv's comment, let's just put them out there. But to what purpose? One million more places for people to say "This town is sooo boring" or "My firend Paul is teh gay"? We end up having to massage every new editor's contribs up to scratch. The counterpoint might be, why not stub every group of four teenagers who play musical instruments? (And as far as non-English - non-issue, we have the equal responsibility to patrol all these articles for quality). Franamax (talk) 06:04, 3 June 2008 (UTC)
Responding to your first point, from what I understand about the way the bot will operate, the articles will not be created until the relevant WikiProject has reviewed the information and deemed it worthy of its own article. So, yes, the articles will not be personally verified at the very instant of creation, but the bot-created article will have been pre-verified by a human editor.
As for the vandalism, I guess there's a fundamental philosophical difference there. I don't think we should let the fear of vandalism stop us from improving the 'pedia. That feels too much like giving in to them and letting the vandals determine what we do and don't do on Knowledge. The anti-vandal bots and RC patrollers do a great job of reverting vandalism very quickly, and I don't think these new articles will impact that stellar record.--Aervanath lives in the Orphanage 10:09, 3 June 2008 (UTC)
And don't forget about Huggle :)   jj137 (talk) 14:46, 3 June 2008 (UTC)
Franamax, my point with respect to languages and limited returns was not to do with quality, rather that stubs in the English Knowledge on towns in, say, Tajikistan will be less likely to attract contributors than an equivalent stub in the Tajik Knowledge. --bainer (talk) 09:25, 4 June 2008 (UTC)
  • Actually, yes, I am a Luddite in this regard! :) If a human can't be bothered to write an article, what makes one think a human will be bothered to edit it, either? We don't solve systemic bias by having an equal number of articles about ill-covered countries - we solve systemic bias by covering those countries better - both in the articles which already exist, and - where we have enough sources to create substantial articles - by creating new articles about topics in those countries. The bias directive is to improve the coverage, not the article-count. Girolamo Savonarola (talk) 17:50, 5 June 2008 (UTC)
I disagree; if I search for my small village and see it doesn't exist on here, I'm not likely to add it - I don't know the right codes, categories and if I just say "Marutano is a small city in Sri Lanka" it will be speedy-deleted. However, if I see the current example, I'll be much more tempted to "improve" the article. And I say that as the native of a small Canadian village ;) Sherurcij 01:33, 6 June 2008 (UTC)

Discussion

So we just scrap the previous discussions because you didn't see it? Seems strange, at the least. I still oppose the whole plan, per the reasons I already gave. Thanks for the opportunity to voice my opinion, again. IvoShandor (talk) 13:41, 2 June 2008 (UTC)

No, they're on the talk page - people coming to the page afresh won't see the adjusted proposals, and at well over 200K, an archive seemed appropriate Fritzpoll (talk) 13:43, 2 June 2008 (UTC)


Better. However, I still oppose the concept that an arbitrary population size makes a location notable. Locations are notable because things are there or happen there that get discussed directly and in detail by independent third-party sources. That's it. Nothing else counts at all. One liners, inclusions in lists of locations, or being in an atlas doesn't confer notability. Direct, detailed discussion in third-party sources is the only criterion that matters at all.
As for balance, I wouldn't mind if someone created a bot that searched for bot-created geographic locations that have never been edited by humans and deleted them. That would repair the error that was made when bots created innumerable articles for non-notable US locations. Kww (talk) 13:45, 2 June 2008 (UTC)

I agree with archiving. I'm adding some more specific subsections below. --Alvestrand (talk) 13:49, 2 June 2008 (UTC)
Kww, I think the criteria for notability are going to be set by the participating WikiProjects. The criteria for Russia may differ from the criteria for Egypt, and none, one, or both may use population as a criterion, depending on the will of the editors involved in those projects. davidwr/(talk)/(contribs)/(e-mail) 13:56, 2 June 2008 (UTC)
No wikiproject should set criteria for inclusion that are below the general standard, only above. Generally, the project guidelines are shortcuts: if an album meets WP:MUSIC, it is exceedingly rare that it doesn't also meet the general notability guideline. Projects can't override WP:N to suit their particular fancy, they can only provide additional criteria. To do otherwise allows us to have Pokemonpedia, because a small group of editors decides that "included in the Pokemon Index" is sufficient to be notable. This is going down the exact same bad path: appearing on three maps, or in three databases, may make information verifiable, but does not make it notable. I've got no problem with someone saying that towns below 10,000 are generally not notable, but that is not the same as saying that towns above 10,000 are notable. Many towns are completely non-notable, and have no reason to be included in an encyclopedia. Atlases, sure, but encyclopedias, no. Kww (talk) 15:13, 2 June 2008 (UTC)
I think what I was trying to aim for by having a community discussion was to establish the community's guidelines for how notability should be determined. I said population as a suggestion to get the ball rolling, and because it's easy. However, we should be open to any other suggestions for defining it. I certainly wouldn't want WikiProjects overriding our notability guidelines - I just wanted to establish what these guidelines should be if the bot is to run. My proposal is a flexible thing - that's how I hope to reach consensus Fritzpoll (talk) 15:20, 2 June 2008 (UTC)

The watchlisting bit should be removed. Special:Unwatchedpages is already pretty useless, adding them to the bot's watchlist is just a waste of server space and could be misleading if the page is ever made useful in the future. Mr.Z-man 19:18, 2 June 2008 (UTC)

Size limits

I would like to see the bot operate with an absolute size limit for population, and move that limit downwards over time. As an initial number, let me toss out 10.000 - that would make sure there aren't two million entries created in the first round, and increase the consistent coverage of Knowledge. --Alvestrand (talk) 13:49, 2 June 2008 (UTC)

Assuming I read the revised proposal correctly, there is no "first round" - but rather lots of "first rounds," one for each country or other area covered by a WikiProject. The "first round" of country A may include all places no matter how small, if that is the will of the editors. Another country may have a population cut-off of 100,000 if that is the will of the relevant WikiProjects' editors. davidwr/(talk)/(contribs)/(e-mail) 13:58, 2 June 2008 (UTC)
I think I was suggesting that whatever the community at large decides is suitable is what the bot will do. If the community wishes to defer these decisions to the Wikiprojects, then it can. Fritzpoll (talk) 13:59, 2 June 2008 (UTC)
I like the idea of having individual WikiProjects decide what the threshold should be. Resolute, Nunavut appears on many world maps as a dot at the far northern edge of Canada. These same maps may omit many other cities with a population of 100,000. Our own article on this hamlet of 229 is pretty good. So why is it included on so many global maps? Because it's a hub for people living, working and exploring within a sparsely populated area of thousands of square kilometers. On the other hand, we had the whole brouhaha of the Gnaa, Nigeria article in 2006. Initially started as an exercise in trolling by a trolling group also known as GNAA, it was almost impossible to kill because we didn't want to delete articles about towns in underserved areas. Eventually it got deleted -- see the talk page discussion at Knowledge talk:Articles for deletion/Gnaa, Nigeria (3rd nomination); what was initially thought to be a town of 6,559 based on Falling Rain Genomics data really turned out to be farmland when viewed on Google Earth. Even Falling Rain's data made the same point once it was properly interpreted. No Nigerian Wikipedian would have advocated the retention of the Gnaa article but probably most Canadians would advocate keeping the Resolute article. --A. B. 18:27, 2 June 2008 (UTC)
Disagree with size requirements. Niue has around 20 settlements on the island, with about 2000 people. In this case, most of the population emigrated to New Zealand for work. However, prior to the emigration, the locations were often much larger. This is one of the big problems with trying to impose size limitations. When generating articles for US locations, which often have much shorter histories, the bar was set at a population of 3. I could accept using that standard elsewhere as well. John Carter (talk) 19:11, 2 June 2008 (UTC)

I don't strongly favour fixed population limits and like the idea to get guidance from WikiProjects. However, I do think we need a community-wide approximate default, and that variations around this default (both wholesale changes to the default, and individual exceptions) should be based on the availability of reliable secondary sources. I see the need for percentages for smaller countries, but a percentage doesn't work well across the spectrum from Niue to China: cutoffs are needed. I also think that percentage of total population may be a safer measure than percentage of capital city. I would propose something like the following (the specific figures are just to make the principle easier to understand):

  1. For countries with a population over 100000, the minimum* population allowed should be 1000;
  2. For countries with a population between 2000 and 100000, the minimum* population allowed should be 1 per cent of population;
  3. For countries with a population of less than 2000, the minimum* population allowed should be 20.

The * is to emphasise that this can be an approximate default. From the point of view of notability, it is important to get the figure "20" right, and this is best left to WikiProjects to demonstrate their sources. From the point of view of not flooding the encyclopedia, it is important to get the figure "1000" right. (My belief is that 1000 is a safe figure here and will result in less than 200000 new stubs, but a higher figure is fine with me: 10000 is certainly safe.) The percentage scale interpolates between these figures in a way that treats medium-sized countries fairly. Geometry guy 22:29, 2 June 2008 (UTC)
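
The three-tier default proposed above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function name `min_settlement_population` is hypothetical, the figures 1000, 1 per cent and 20 are the illustrative ones from the proposal (not agreed values), and the result is meant as an approximate default that WikiProjects could vary, not a hard rule.

```python
def min_settlement_population(country_population: int) -> int:
    """Approximate default minimum settlement population for a country.

    Hypothetical helper; the thresholds are the illustrative figures
    from the proposal above, not settled policy.
    """
    if country_population > 100_000:
        # Large countries: flat floor of 1000 inhabitants.
        return 1000
    if country_population >= 2_000:
        # Medium countries: 1 per cent of the national population.
        return country_population // 100
    # Very small countries: flat floor of 20 inhabitants.
    return 20


if __name__ == "__main__":
    # Populations quoted later in this thread (Russia, Finland) plus two
    # made-up mid-sized and small values for illustration.
    for name, pop in [("Russia", 142_000_000), ("Finland", 5_300_000),
                      ("mid-sized example", 50_000), ("small example", 1_500)]:
        print(name, min_settlement_population(pop))
```

The 1 per cent band joins the two flat figures without a jump: it gives 1000 at a national population of 100000 and 20 at a population of 2000, which is the sense in which the scale interpolates between them.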

I like the idea of having individual WikiProjects decide what the criterion for inclusion should be, but I don't think that it has to be population size. So no approximate default, please. It is possible that some projects will prefer to have only the articles that are not orphaned, others will prefer to have only the articles with interwikis, others will prefer tables with redirects etc. Colchicum (talk) 22:38, 2 June 2008 (UTC)
It is only an approximate default minimum that I want: WikiProjects can freely ask for many fewer articles to be created. What we need to protect against is WikiProjects generating too many articles without sufficient justification. Geometry guy 22:41, 2 June 2008 (UTC)
Well, yes, but this approach is flawed. E.g. Russia has a population about 30 times larger than that of Finland. Now do you really think that a settlement with a population of 1,000 (or 10,000, or 100) should be notable enough in Finland and not notable 30 km from there in Russia? Why is it the size of the country that matters? Colchicum (talk) 23:17, 2 June 2008 (UTC)
I think you are contributing in write-only mode, which is a pet hate of mine. Russia, population 142 million, Finland, population 5.3 million. Both much bigger than 100000, so, according to my proposal, the approximate minimum size would be the same (I suggest 1000). So your comment supports my approach, and my modification of the initial proposal. Thanks, Geometry guy 00:03, 3 June 2008 (UTC)
What about France and Monaco? I still think that a settlement is always notable enough if the article has interwikis or links pointing to it, regardless of the population size. And I don't understand why Fritzpoll thinks that a country's development can be measured by its population. Cf. Andorra vs. Sudan. Colchicum (talk) 10:03, 3 June 2008 (UTC)

I'd rather not have any population limit; Amorita, Oklahoma (for example) is a perfectly fine stub to build on, and it's likely a better stub than what anyone who was going to create the article would create.--Prosfilaes (talk) 17:52, 4 June 2008 (UTC)

Relation to existing stubs

There are many place stubs with low quality, and lots of information (census data, references, maps) that FritzpollBot can do better than a human editor. I suggest that for existing places, FritzPollbot place a subpage of the form /AutoGenerated with what it would have generated, and a human editor can "pull up" the stuff that he likes. --Alvestrand (talk) 13:49, 2 June 2008 (UTC)

Yes, although actually I'd prefer to see the bot (in a month or two, when this first part is going swimmingly) start to stuff many of the existing stubs into internal sections of its own content, after its one-line lead (rather than mess with subpages, which are widely deprecated in mainspace). It would be able to handle most oddities like additional photos or maps, the stub template itself, and other markup. E.g. if stub says only "Obscure village Y was the birthplace of famous person X" with stub template, the text becomes new section 1 with a neutral title like "Details". The bot could defer complex cases back to the projects. JJB 14:20, 2 June 2008 (UTC)
Better, when going through the data before the article-creation run, editors could be able to signal that they would like to see an autogenerated version (created at, say, Wikiproject:FritzpollBot/Foobar/Foo, for the article named Foo in country Foobar) - these can then replace or merge into the current versions where necessary. It'd be a bit of a waste of time in my view to create thousands of repeat articles in cases where we already know we wouldn't benefit from it. Pfainuk talk 14:27, 2 June 2008 (UTC)
For the record, this can be easily done, and I'm perfectly happy to implement it Fritzpoll (talk) 14:57, 2 June 2008 (UTC)
This sounds like a good idea, but I think this can be done most simply in articlespace. How would an uncreated article get linked to an autogenerated draft, and how would a "random" user discover this draft in order to benefit from it? I think this would just double or triple the number of pages necessary to add these articles, and I think there are other ways to improve this proposal. --NickPenguin(contribs) 03:11, 3 June 2008 (UTC)

Global coordination

I think we should have a Wikiproject page for the bot itself, where global issues are addressed. This should link to other participating WikiProjects.

There should also be a "single place to go" to see all upcoming scheduled runs of the bot, with advance notice of at least 24 hours for large runs and at least 30 minutes for runs larger than a handful of articles.

This page should also have an emergency-stop switch that would stop the bot and suspend all upcoming runs.

davidwr/(talk)/(contribs)/(e-mail) 14:10, 2 June 2008 (UTC)

The WikiProject idea is point 1 of the new proposal. Advance notice is also suggested, though the timing is open to debate. Good idea to include a schedule within the new WikiProject page. A note to me would prevent all upcoming runs as the bot is supervised, and not fully automatic. A stop switch can be added with relative ease - a block by an admin would have the same effect. Fritzpoll (talk) 14:14, 2 June 2008 (UTC)

Thoughts

I might support this (notability has never seemed a problem to me), but I still have questions. 1) Could you please elaborate how it is possible to make sure that all the produced pages will be watchlisted by somebody with enough expertise to detect misinformation in the future? It is still not very clear. Authors normally know something about the places they write about, randomly assigned volunteers don't. 2) What is wrong with the suggestion to create lists with redirects instead of stubs? They are much easier to patrol. 3) How will it be possible to clear the backlog of orphaned stubs? And if we are going to create lists anyway (which may be useful for other reasons, but is an artificial and wrong way to clear the backlog), why do we need the stubs? Colchicum (talk) 14:16, 2 June 2008 (UTC)

Well, because we would seek volunteers from WikiProjects associated with the articles being created and get them heavily involved in the process, they can watchlist the articles. That's one reason why, if there are no volunteers from areas of the encyclopaedia that would be interested in such articles, I don't think we should create them until such volunteers exist. Lists with redirects are perfectly acceptable, but I think from a content point of view, this could be decided by the WikiProjects - how much are they willing to maintain, etc. Certainly, there is no technical objection to this. The lists are purely to administer the bot's operation, so clearly shouldn't count towards clearing the backlog, as ultimately they may well be removed. I think discussion on a country-by-country basis will allow a decision on how to "add to the web" - I see the bot as more of a tool for people to use as they wish - sort of a builder doing the work of some architects, each of whom wants it done slightly differently. I'm not sure if I answered everything - let me know if I didn't Fritzpoll (talk) 14:23, 2 June 2008 (UTC)
1) It will be necessary to write some script which would enable the volunteers to watchlist the pages (selected on the basis of the regional categorization they like) quickly.
2) As to lists vs. stubs -- ok if the relevant projects are explicitly, fairly and without bias informed of both options and the pros and cons raised in this discussion.
3) I suggest to notify WP:O and to create a tag that would place the article in a special subcategory of Category:Orphaned articles for bot-created orphans, so that your bot could immediately tag the pages it creates as such if they are orphaned. We shouldn't clutter the list of human-created orphans. Colchicum (talk) 14:57, 2 June 2008 (UTC)
2 and 3 are simple to do, and will be done. 1)....well, I can certainly output a list based on a user's query and then they can copy and paste the list into their watchlist. Sound good? Fritzpoll (talk) 15:00, 2 June 2008 (UTC)
I have created {{geo-orphan}} to tag the orphaned ones. The tag will place articles in Category:Orphaned articles about a place--Aervanath lives in the Orphanage 05:55, 3 June 2008 (UTC)

Italy experience

I've a similar experience to Llywrch's Ethiopia experience reported above - this year I've spent a lot of time sorting out the place articles in Italy. Which as countries go should be as good as it gets - a highly developed, top 10 economy with one of the most active Wikipedias at it.wikipedia.org, Internet availability is not an issue, massive emigration to anglophone countries and a popular holiday destination to ensure lots of English speakers are interested in it, plus in the top 5 for length of recorded history and archaeology. Everything is in your favour, it must have some of the most notable territory on the planet if you define notability by the availability of sources. And yet....

...notability for Italian settlements seems to dry up somewhere around the 2000 population mark, even on it.wiki where they have all the "home team" advantages. On a consistent basis anyway, that's just an observation of what seems to "work" out in the field. It gets pretty patchy below that, and often the smaller villages work better merged into a larger administrative area, often the most notable thing in an area doesn't fit neatly into a village in any case. There are advantages in having a broad picture in a single article rather than lots of microarticles. But for the rest of the world, I'd suggest a mental model of somewhere in the 5000-10,000 population mark as being the lower limit for consistent notability, which is nice because that typically corresponds to a unit of municipal administration - the township, commune, municipality, district or whatever. In Italy the equivalent level is the comune - the population of 60m people is spread over 8,101 comuni, an average of 7,400 people per comune, although that's distorted a bit by the likes of Rome and Milan, I guess the median is about 6000 people. That's the level of local administration that has a mayor, town hall etc, and is the level you should be aiming for - we (now, finally!) have an article for all 8101 of those. Each of those comuni typically has a couple of subdivisions (frazioni, località and the like, approximating to wards or parishes) so there must be 40,000+ of those - but en.wiki only has about 200 of those, and it.wiki has about 1700 but a lot of those are either- I'd guess that long term the "right" number would be somewhere around 1000 out of 40,000+. However one of my long-term aims would be to create redirects for all 40,000-odd frazioni to the appropriate comune article.

Honestly, the comune articles just seem to "work", whereas going down to frazione level tends not to - there are real synergies in not fragmenting the information too finely, and information about the frazioni is still on Knowledge, it just gets a bit more context from being in the comune article. And that's just from the perspective of Italy, which has so many advantages in proving notability from WP:RS. Perhaps the sensible thing would be to go down to township level for countries like Italy, and down to the step above (counties/provinces/the unit covering 50,000-100,000 people) for countries where WP:RS are more scanty. That way every point on earth is covered by a Knowledge article, but that article has a chance of being notable in its own right. The debate is not about comprehensiveness, but about the granularity of that comprehensiveness. Obviously in almost every case (perhaps barring the likes of Nauru?) the granularity should be below "country" level, but personally I think "village" level is too fine a cut.

And I don't like using geographical entities such as "village" - it doesn't guarantee the WP:CSB comprehensive coverage of the surface of the Earth we want, that you get if you use administrative areas. Governments tend to make sure that all points of their territory are covered by an administrative division, but that's not the case if you rely on geographical points such as "villages". In particular not if you rely on Maplandia Level 3 data. To take a random example, I was in the province of Asti a few months ago. The territory of that province is covered by 118 comuni, which are listed at Category:Communes_of_the_Province_of_Asti. Compare that with Maplandia's coverage of Asti - just in the A's it lists only Asti itself out of 6 comuni beginning with A, and Aie, which I guess is the suburb of Bruno (comune population 379) that Google shows as Borgo Aie some way to the north of where Maplandia has it - even it.wiki doesn't have an article. So a hit rate of 1 in 6, in one of the most developed regions of western Europe! However at first glance their provinces at Level 2 look OK. So another reason to concentrate on getting comprehensiveness at Level 2 level, you can probably hope to get that reasonably "right", but Level 3 will need country specialists.

And I would echo Llywrch's comments about the amount of work needed to produce a "clean" list of articles - even in Italy where the government provides handy lists of comuni in Excel format, it still took a lot of work to knock the articles on it.wiki and en.wiki into shape. Despite all the work that had been done on Italian places, there were a few articles missing because people had seen blue links from articles on everything from Chicago supermarkets to Latvian goddesses. And that's before you get onto the typos in the official list, variations in accent use, renamings, a morass of local dialect names for places, bilingual names, and of course WP:COMMONNAMEs in English. The data needs a LOT of cleaning to get it "right", even when it's from an authoritative government source with no problems about transliteration from a non-Roman alphabet.

So as someone who's done 1000's of edits to "place" articles in a country where WP:RS are as abundant as anywhere, I would beg you to:

  1. Use administrative divisions - the only way to ensure comprehensive coverage of the Earth's surface
  2. Using arbitrary "places" or "villages" doesn't help WP:CSB, as they aren't comprehensive
  3. Don't use arbitrary population limits, aim to get 100% coverage of a single administrative level.
  4. WP:RS may mean that "Level 2" divisions are what you should aim for with comprehensive coverage on Knowledge - counties/provinces or similar
  5. Even in Western Europe, you should only go down to township/commune level for comprehensive coverage
  6. Don't use Maplandia Level 3 - the data sucks
  7. Articles at the parish/ward level only "work" for a relatively small percentage of places, and the township article is improved by having the parishes all in the same place
  8. Be great if you can make redirects of the parishes
  9. Please tag the Talk page with an appropriate country Project - much easier to have that done at the start.
  10. Beware misleading blue links - a check of the appropriate Category:Townships in... can help you catch those

But as long as you don't go mad going down to village level, I fully support the intent, albeit not quite the letter of how the proposal is currently phrased - good luck. FlagSteward (talk) 14:28, 2 June 2008 (UTC)

Thanks for some of the tips. Using census data should help achieve many of the aims you specify. Maplandia will not be used in this new proposal. Redirects are possible if the WikiProject desires them. Tagging the talk page is also no problem. Misleading blue links will be picked up by the last stage, where the uploaded lists are checked. Thank you for your thoughtful response - I hope some consensus on what depth to aim for can be achieved Fritzpoll (talk) 14:35, 2 June 2008 (UTC)
I would also prefer using administrative/governmental/political (whatever you want to call them) divisions to determine which articles to create or not. Political divisions are more likely to reflect the history of a place (and therefore notability) more accurately than simple population figures.--Aervanath lives in the Orphanage 06:33, 3 June 2008 (UTC)
I endorse the critique above using Italia as an example. I have a couple of reservations that I would like to air.
  • Whose census data- a census is a political act, and this introduces bias.
  • Choice of a capital city is a political act- so using it as a metric for notability seems less than rigorous.
  • The Mainz/Mannheim question. Which is the most notable: a city of 200.000 or one of 372.318? The length of the articles could indicate the answer. But an arbitrary limit on 250.000 or 300.000 would have excluded one but not the other.
  • Please can we have a well behaved bot that leaves us with an infobox, checked geocoordinates , and a subsection with = = References = = {-{reflist}-}
ClemRutter (talk) 13:21, 3 June 2008 (UTC)
The last point remains implemented, per the very original idea behind the bot. The metric for notability was deliberately vague so as to stimulate discussion - it was not clear from the now archived discussion (see the talkpage) what ideas people had about this. It might be best to allow WikiProjects some leeway in determining notability for their area in conjunction with any other interested editor. The reliability of sources used would have to be assessed on a case-by-case basis - insufficient reliability would mean that the creation of articles for the country by the bot would not go ahead Fritzpoll (talk) 13:29, 3 June 2008 (UTC)

Precedent

Is there any precedent for roboticly-adding geographic stubs, even on a small scale? If so, can we learn lessons from it? How about non-geographic stubs? davidwr/(talk)/(contribs)/(e-mail) 15:03, 2 June 2008 (UTC)

User:Rambot (lots of geography stubs), and User:Polbot (some US politicians, and then lots of animals and plant species), are the two I can remember. There must be others as well. Carcharoth (talk) 15:28, 2 June 2008 (UTC)
From what I've heard, Rambot did it without the human intervention that is proposed for this bot. Not sure though Fritzpoll (talk) 15:31, 2 June 2008 (UTC)
According to this, Rambot added ~30000 articles in an 8-day period. With 86400 seconds in a day, 8 days, and 30000 articles to create, this works out to 23.04 seconds per article. (86400 X 8 = 691200 seconds / 30000 = 23.04) I think it's safe to assume this was done without human intervention. Thingg 16:13, 2 June 2008 (UTC)
Well, I can do simple AWB jobs at 6 articles per minute, checking for errors and skipping problems on the way. Give me 23.04 seconds and I have enough time to correct a small error on each edit. :-) Unfortunately I can't do that 24/7 :-) Seriously, though, all bot tasks involve human intervention and human oversight: the bots are error tested, the operation is monitored by the operator, and the bot flags up exceptions when human intervention is needed (or skips and writes to a log). What makes a task a bot task is that most of the time the bot autosaves the edit, without a human check before each save. In the case of this bot, my understanding is that, after much careful preparation, human intervention and error testing, the actual run to generate the stubs will mostly involve autosaving like any other bot task. But please correct me if I am wrong. Geometry guy 21:52, 2 June 2008 (UTC)
You've nailed it exactly, much more succinctly than I could Fritzpoll (talk) 11:39, 3 June 2008 (UTC)
heh. wow did I get owned... That is kind of what I meant btw. I think... ¬_¬ Thingg 13:20, 3 June 2008 (UTC)
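
The per-article rate quoted in this thread can be checked with a few lines of Python. This is only a sketch of the arithmetic from the figures given above (about 30000 articles in 8 days); the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def seconds_per_article(articles: int, days: float) -> float:
    """Average wall-clock seconds available per article over the whole run."""
    return SECONDS_PER_DAY * days / articles


# Figures quoted in the thread: ~30000 articles in an 8-day period.
rate = seconds_per_article(articles=30_000, days=8)
print(f"{rate:.2f} seconds per article")

# For comparison, the manual AWB pace mentioned above is 6 articles per
# minute, i.e. 10 seconds per article, but only while a human is at the keyboard.
print(f"{60 / 6:.0f} seconds per article at 6 articles per minute")
```

The comparison only shows that the sustained rate is plausible for an unattended run; it says nothing about how much preparation or checking happened before or after.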

How to handle blue-links

I would recommend that instead of "just skipping" blue-links, blue-links can be logged with the blue-link-article sizes, and checked for "#redirect." Redirect- and too-small articles - which might turn out to be garbage articles or stubs of inferior quality to the bot-generated version - can then be visited by hand and if appropriate, replaced by or merged with a bot-created version. davidwr/(talk)/(contribs)/(e-mail) 16:01, 2 June 2008 (UTC)

Yeah, sorry - forgot to mention this. As a rollover from the old proposal, this logging mechanism will still be in place (as it took long enough to write it) Fritzpoll (talk) 16:03, 2 June 2008 (UTC)
All blue-linked articles should eventually be checked by hand to make sure they aren't something completely unrelated. The town of Big River Dam (England) might be a blue-link for the dam, not the town. This can be partially automated by checking to see if the blue-link is in a "cities and towns in " or similar category, but even that will have false positives and false negatives. davidwr/(talk)/(contribs)/(e-mail) 16:04, 2 June 2008 (UTC)
Also, a lot of this will hopefully be avoided by human checking - but the bot will still catch human errors using the logging described Fritzpoll (talk) 16:06, 2 June 2008 (UTC)
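
A minimal sketch of the blue-link triage discussed above, assuming the bot already has the existing page's wikitext and category list in hand. The `ExistingPage` structure, the size threshold and the category prefix are all hypothetical and are not taken from the bot's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class ExistingPage:
    """Hypothetical snapshot of a page that already exists (a blue link)."""
    title: str
    wikitext: str
    categories: list = field(default_factory=list)


MIN_BYTES = 1000  # assumed cut-off for "too small"; to be tuned by reviewers


def triage(page, category_prefix="Cities and towns in"):
    """Return the reasons a blue link should be checked by hand (empty if none)."""
    reasons = []
    if page.wikitext.lstrip().lower().startswith("#redirect"):
        reasons.append("redirect")
    elif len(page.wikitext.encode("utf-8")) < MIN_BYTES:
        reasons.append("too small")
    # A page outside the expected place category may be about something else
    # entirely (the dam rather than the town).
    if not any(c.startswith(category_prefix) for c in page.categories):
        reasons.append("not in a place category")
    return reasons


dam = ExistingPage("Big River Dam", "x" * 2000, ["Dams in England"])
print(dam.title, triage(dam))
```

As noted in the thread, the category test will have false positives and false negatives, so the output is a work list for human review rather than a decision.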

Talk pages

The bot should create talk pages with the appropriate wikiprojects filled in. It should also add a first section noting the original stub was created by a bot. davidwr/(talk)/(contribs)/(e-mail) 16:01, 2 June 2008 (UTC)

Yes, again, consultation with the WikiProjects here is essential. A small tag from the Bot's WikiProject should suffice as the bot notification? Fritzpoll (talk) 16:05, 2 June 2008 (UTC)

Name conflicts and disambiguations

If a country has two places with the same name, we need to flag that for human intervention and/or have an automated way of handling it. It's not inconceivable that two towns in two different counties or provinces will have the same name. If the country's naming convention is "townname, countryname" that will be a conflict. This will also be a problem for blue-links, where one of the places already has an article and the other does not. davidwr/(talk)/(contribs)/(e-mail) 16:08, 2 June 2008 (UTC)

As people examine the lists of places, these should become apparent. It's why the bot will list places in alphabetical order by country - the duplicate names end up next to each other. Disambiguation then is a breeze, existing articles can be moved to the disambiguated name, and ta-da! Then the bot doesn't have to try to do this either Fritzpoll (talk) 16:36, 2 June 2008 (UTC)
What "lists of places" are you talking about? Kaldari (talk) 18:52, 3 June 2008 (UTC)
The Knowledge:WikiProject Missing encyclopedic articles/Places page I think contains the various extant lists of populated locations, with several red links for others which haven't been started yet. I'm fairly sure that's what he meant. John Carter (talk) 18:56, 3 June 2008 (UTC)
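
The observation that duplicate names end up next to each other in an alphabetical list can also be used to flag them automatically. The following is a sketch only; the function name and the sample places are invented for illustration.

```python
from itertools import groupby


def find_name_conflicts(places):
    """Group (name, region) pairs for one country by name.

    Returns a dict mapping each name that occurs more than once to the
    list of regions it occurs in, so a human can disambiguate.
    """
    def name_key(place):
        return place[0].casefold()

    ordered = sorted(places, key=name_key)  # duplicates become adjacent
    conflicts = {}
    for _, group in groupby(ordered, key=name_key):
        entries = list(group)
        if len(entries) > 1:
            conflicts[entries[0][0]] = [region for _, region in entries]
    return conflicts


# Hypothetical input list for a single country.
sample = [
    ("Springfield", "North County"),
    ("Ashton", "East Province"),
    ("Springfield", "South County"),
]
print(find_name_conflicts(sample))
```

This only catches exact name matches (ignoring case); near-duplicates from variant spellings or transliterations would still need a human eye.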

I am an English teacher who has students create articles for municipalities/towns. I am trying to standardize how we deal with towns that have municipal status (see my talk page for details). At the moment, I have students combine town/municipality as governmentally they are the same and most often have the same or very similar names. I would have to imagine that this bot would generate pages using the town's formal name? If so, this would not be a problem as this is how I am titling the pages I have created although some disagree with doing this.Thelmadatter (talk) 14:48, 5 June 2008 (UTC)

Percent of capital city is not the right metric

The size of capital cities depends strongly on whether the city is also the commerce center (such as London, Paris, Tokyo, Moscow) or just a government center (USA, Germany before unification). The difference can be more than an order of magnitude (Bonn has 300K, Berlin 3M people). Better, perhaps, would be a percentage of the largest city. LouScheffer (talk) 16:32, 2 June 2008 (UTC)

Good point, and I support percentage of the largest city as the right choice for the project. Maybe start at 25%?? Cheers, Pete Tillman (talk) 22:13, 2 June 2008 (UTC)
I don't think population should be the metric at all. Political divisions (as I said above) are more likely to reflect the notability of a particular location. My preferred limit would be creating articles down to (at most) the 2nd-to-lowest administrative level. In most cases, I'm sure even that far may be too low for notability purposes.--Aervanath lives in the Orphanage 06:41, 3 June 2008 (UTC)
City population distributions are decidedly non-linear. If you were to apply that criterion to the United States, we'd have articles on New York City, Los Angeles, Chicago, Dallas, Philadelphia, Detroit, Houston, and Atlanta. If you want to go with relative population, I'd recommend starting at either 1% or 0.1%. --Carnildo (talk) 21:01, 3 June 2008 (UTC)
I think a town of 5-10K population is inherently notable. We can really expect that over centuries somebody was born there or died there or there was a battle nearby or a historical monument or something. Assuming a major city of 1M population, 5K is 0.5%, so Carnildo is probably right. I think we can say that the bot should create articles about each settlement of more than 5K population. Alex Bakharev (talk) 12:52, 6 June 2008 (UTC)
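
To see how much the choice of percentage matters, the "fraction of the largest city" rule discussed above can be sketched as follows. The place names and populations are invented for illustration and are not census figures.

```python
def qualifying_places(populations, fraction):
    """Return the names of places whose population is at least `fraction`
    of the largest city's population, in alphabetical order."""
    cutoff = max(populations.values()) * fraction
    return sorted(name for name, pop in populations.items() if pop >= cutoff)


# Hypothetical country with a largest city of 1M people.
populations = {
    "Alphaville": 1_000_000,
    "Betatown": 300_000,
    "Gammaburg": 40_000,
    "Deltaford": 6_000,
    "Epsilon": 900,
}

print(qualifying_places(populations, 0.25))   # 25% of the largest city
print(qualifying_places(populations, 0.005))  # 0.5%, i.e. 5K for a 1M city
```

With these made-up numbers a 25% cut-off keeps only the two largest places, while 0.5% keeps everything above 5000 inhabitants, which is the gap between the two positions taken in this section.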

Inherent notability

Since a lot of people, including the proposer, are prepared to jump on the bandwagon of "inherent notability" of places, I want to state my arguments that such a concept should not exist. Here is the comment I left at the old discussion:

Here is a comment on "inherent notability" which I left at Knowledge talk:Notability (Places and transportation)#Inhabited places: Regardless of whether it is the community's norm (and regardless of whether it is already done in other policies), I do not think it is a good idea to declare, merely by administrative fiat, that a broad class of topics is inherently notable. This amounts to an abdication of responsibility for the quality of the encyclopedia. Consider how notability can be proved absent such a declaration: given a topic, one consults first primary sources which document it, and then secondary sources which, in commenting on the primary sources, relate it to other ideas and provide the crucial aspect of notability: human commentary. This is why Knowledge is a tertiary source, merely weaving together the contents of secondary sources as a unified testament to human interest in a subject. The insistence on secondary sources is at the core of two of our core policies: no original research, and notability. By requiring that we rely on secondary sources, we force our articles to incorporate only documented facts and opinions, preventing us from being what we can never successfully be: a forum for the original publication of new ideas. But it also ensures that we only write about topics which have been demonstrated to matter outside themselves: that's what the existence of secondary sources (commentary) proves. That's the basis for our notability criteria. By declaring something to have inherent notability, we give license to circumvent secondary sources and, therefore, sacrifice true notability. Essentially, a permissive notability policy is original research.

In addition to this, I have also said that in my eyes, a place article whose only contents are rote information like coordinates and population (or even elevation, etc.) can be considered to be no more than a dictionary definition of the place, and should be treated like we treat dictionary definitions of words: deleted. The principle, here and in the above paragraph, is that the only information which is notable according to WP:Notability is that which comes from secondary sources, and therefore represents some kind of human interest in the subject.

On the other hand, geographic locations represent a large body of formally similar subjects, of each of which there is potentially a lot to say; it is dishonest to claim that two similar towns, one with an article and one without, truly differ in notability; more probably they differ in the amount of expertise of our editors. The question here is how to identify two locations as being "similar", and how to provide for all similar locations the kind of minimal acceptable notable information that we know must exist. Since this information is not rote, by definition including it will require human intervention, but there is still a place for a bot to collect the basic rote data and form the stubs for each article (which must then be raised to acceptable standards by people). Since the resources available to contribute to articles in various countries or regions vary (that is, the resources of the geographic WikiProjects), the determination of "granularity" of coverage must be made on this basis. The observations made in #Italy experience about the utility of administrative regions seem to me to be the key to reconciling these requirements.

Here is my suggestion:

  1. Article creation will proceed on a country-by-country basis, but it doesn't have to be sequential.
  2. For each country, the Wikiproject's willing members will be tallied and the number will be compared to a list of administrative regions of the country, at various levels of granularity. The generation of articles will proceed at the finest level of granularity for which there is sufficient membership to ensure some kind of individual attention to each article.
  3. The members of the WikiProject will gather reliable secondary sources on these regions. Once a sufficient guarantee of notability is obtained for each region, the bot can generate stubs for all of them in the format currently proposed (coordinates, population, etc.)
  4. After the stubs are created, the WikiProject will go to work on them, adding truly notable information from their sources.

The ordering of points 3 and 4 is important: we should not encourage the indiscriminate creation of articles for which there is no guarantee of notability! However, the services of a bot in doing work that no human would want or be able to do are invaluable in facilitating the discriminating creation of articles whose expansion is certain. This method is a lot of work, to be sure, but it will do the most justice to each country according to the maximum extent of the resources we have available to us. Once this "first pass" is completed (hopefully within my lifetime :) we can move on to finer granularities; quite possibly, this project will attract attention to the geographical WikiProjects and there will be more participants the second time around.
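
Step 2 of the suggestion above (choosing the finest administrative level the WikiProject can realistically handle) could be sketched like this. The proposal does not say what counts as "sufficient membership", so the articles-per-member limit is purely an assumption, as are the level names and counts.

```python
def finest_feasible_level(levels, members, max_articles_per_member):
    """Pick the finest administrative level whose number of regions does not
    exceed what the willing members can give individual attention to.

    `levels` is a list of (level_name, region_count) ordered from coarsest
    to finest. Returns None if even the coarsest level is too large.
    """
    capacity = members * max_articles_per_member  # assumed workload model
    chosen = None
    for name, region_count in levels:
        if region_count <= capacity:
            chosen = name
    return chosen


# Hypothetical country: 20 provinces, 110 districts, 8000 communes.
levels = [("province", 20), ("district", 110), ("commune", 8000)]
print(finest_feasible_level(levels, members=12, max_articles_per_member=25))
```

Under these assumptions twelve members could cover the district level but not the commune level; the run would then stop at districts until more participants join.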

I admit that this will not counter the systemic bias we already have to the same degree that the original, contentious proposal would have. We will not achieve the same granularity all over the world. Like I said in the first discussion, this is an inevitable consequence of the inherent bias in the world concerning reliable, usable, and available sources for English-speaking readers and editors, and the inherent bias on en.wikipedia of editor expertise. However, it will combat this bias to the fullest extent possible under these circumstances and provide a fair shake for all parts of the world. By covering each country by administrative regions, we create a framework for further expansion and a convenient measure of our coverage. As the editor bias is alleviated, the bias in sources will itself lessen (because a broader experience among editors will open up more sources to us, and of course, over time more sources will appear), and the bot can be run again to achieve greater coverage.

We can never achieve good encyclopedic content from a fully mechanical process. The bot should be used to make our work easier, not to replace it entirely. Just having articles named after these places is not enough: the articles should be on each place, and that takes human participation. Ryan Reich (talk) 16:21, 2 June 2008 (UTC)

Hmmm...implementable. If consensus is reached that this is the notability requirement I allude to requiring in the body of the proposal, then it will be implemented Fritzpoll (talk) 16:55, 2 June 2008 (UTC)
I completely agree with Ryan: inherent notability is a dubious concept for a source-based encyclopaedia. Notability, as it presently exists in Knowledge, encapsulates the more important ideas that we need to be able to say something about the subject, and that this "something" needs to be verifiable through secondary sources. We need a human to determine whether sufficient source materials are available to make this possible. Jakew (talk) 17:07, 2 June 2008 (UTC)
"Notability" is a dubious concept for an encyclopedia. I suspect the whole concept was invented because people were tired of seeing articles for MySpace bands. If information has to be verifiable through secondary sources, you're talking about coverage — and coverage does not make something worthy of notice. You think parts of the human body are not inherently notable? You think languages are not inherently notable? You think lifeforms on this planet are not inherently notable? The idea that information has to be verifiable and information has to be worthy of your personal attention are two totally different things. Notability is a perception, not a reality. --Pixelface (talk) 07:06, 4 June 2008 (UTC)
Although I certainly wouldn't (and don't) support having a separate article on every named community in existence regardless of its actual legal status or availability of sources, my view (at least as pertains to places in Canada, with the acknowledgement that it may not easily translate to other places due to differing structures of government) is that any place that is legally incorporated as a municipality (city, town, etc.) should have an article, and smaller communities or neighbourhoods within that municipality should be redirected to the municipality unless and until there are sufficient sources for the communities as independent topics. As far as I'm concerned, if a place is incorporated as a municipality, then properly sourced census data and/or the municipality's own website are, in and of themselves, sufficient sources to justify the creation of at least a stub, although obviously additional content needs to be properly sourced as well. I do, however, absolutely oppose any notion that any notability standard beyond legally incorporated as a municipality by the laws of its province or territory can or should be used to distinguish "notable" from "non-notable" municipalities. Unincorporated places are a different story, certainly, but at least for Canada the rule needs to be that if it's incorporated as a municipality, then it definitely gets an article even if it has a population of two. Bearcat (talk) 17:17, 2 June 2008 (UTC)
Essentially, you are proposing that government warrant of incorporation is a secondary source (hence, sufficient to establish notability). I don't think this is the right standard for an encyclopedia, because it is indiscriminate. The use of the word "notable" is unfortunate in this context because it seems to imply that a non-notable town is without value, whereas what it really implies is that the town is simply not the subject of enough external recognition to fill its own article. You are advocating that every incorporated town get an article, even if only the fact that it is incorporated is in there; thus, the Knowledge geography articles will constitute an atlas. If this is what you want, then I agree with MickMacNee below that we need an entirely different Wikiatlas or whatever for this sort of thing. As he says, and I agree, "knowledge" is not "information" alone, and an encyclopedia demands more. Ryan Reich (talk) 17:41, 2 June 2008 (UTC)
No, I'm stating that those are starter sources which are, in and of themselves, sufficient to justify the start of an article. No incorporated municipality in Canada is ever going to be entirely unreferenceable beyond that — by the very nature of being a municipality, media sources will exist: mayors get interviewed in newspapers and on TV. Events happen. Famous people are born there. Museums and community festivals take place. And on, and so forth. The sources simply are not a problem. But people are much more likely to add new content and sources to an article that already exists than they are to create and properly format a new one from scratch. The fact that the process of starting a brand new article from scratch on an incorporated municipality is kind of time-consuming and exhausting is precisely why not all Canadian municipalities are already done, which is why there should be a way to automate the parts of the process that can be simply filled in by a bot, such as filling in a basic infobox. There isn't a single incorporated municipality of any size on earth whose article is entirely unexpandable beyond what a bot can automate, but the article needs to start somewhere, and there's no reason why census-data-plus-municipal-website should be insufficient as a start. Bearcat (talk) 21:24, 2 June 2008 (UTC)
The example of the US towns generated by Rambot suggests that you are overly optimistic. Examples are floating around this page of articles that, in fact, have never received human attention in the more than six years of their existence. A reasonable application of the notability standard would conclude that they do not meet it (and would have concluded it upon creation; the intervening years simply show that what you hope could come to pass, in fact, does not), and the same will be true of Canadian towns, African towns, or farm houses in the Caribbean. If, as you claim, the sources can always be found, then it is your responsibility to find them when the articles go up; otherwise, WP:N is not visibly satisfied, WP:V is certainly not satisfied, and the article should be deleted. My proposal is simply that the bot be made available to the Wikiprojects to generate templates for facilitating the start of articles, exactly as you want, and that some organizational effort be made so that articles be started in parallel at a particular level of granularity across each country, and with notability claims as we require of everything else. I have also said elsewhere that simply because people may be more likely to improve some stubs than to create the article themselves does not mean that if we create every possible stub, that most of them are likely to be improved, and in the mean time, their existence is a policy violation, a vandalism problem, and a blight. In short: I approve of the automated part of the process; I simply disagree with you about where in the process that automation should occur. Ryan Reich (talk) 21:39, 2 June 2008 (UTC)
I'd like to note that I started to identify and list Canadian municipalities that are still redlinked on Knowledge yesterday, and the process has already resulted in two new, well-referenced and well-formatted articles about Canadian municipalities — Bristol, Quebec and Sutton, Quebec — getting created by human editors today. So this process also serves to help editors identify and work on missing topics in advance, as well. And it's every Wikipedian's responsibility equally to add content and sources to any individual article. Nobody has any special responsibility for any individual article beyond ensuring that the content that they do personally add is properly sourced. A clearly-defined and organized list that exists for a WikiProject to collectively review for inclusion of further sources, yeah, absolutely. That's already one of the inherent results of this project as proposed in the first place. But it's not my personal responsibility to go out of my way to hunt down unsourced articles on Knowledge just for the sake of adding sources to them. My only responsibility is to properly reference the content that I do add to the articles I choose to work on. Bearcat (talk) 21:47, 2 June 2008 (UTC)
Also, there's a big difference between "unexpanded" and "unexpandable". The former does not automatically imply the latter. Bearcat (talk) 22:04, 2 June 2008 (UTC)
We agree entirely, but are somehow still arguing. We agree that the process of automatically making available to editors a list of needed place articles facilitates their creation. We agree that when that process is started, the resulting articles are well-sourced and filled with nontrivial facts. I apologize for suggesting that it is your personal responsibility to find these sources for every such article; the point I wanted to make was that the success you demonstrate with Bristol, Quebec, for example, is the result of a human editor stepping forward to take responsibility for an article and substantiate the supposition that notable facts exist about this town. You say that your only responsibility is to properly reference the content you add to the articles that you choose to work on; consider, then, that the responsibility of the users of this bot is to do this for every article it creates. Your examples strengthen my belief that the use of this bot must be coordinated with interested editors who will do for the stubs it creates exactly what you have done with Bristol and Sutton. And do this before the stubs are unleashed. Ryan Reich (talk) 23:17, 2 June 2008 (UTC)
If editors were going through geographic redlinks that systematically, the bot wouldn't be necessary in the first place, because the articles would all already be done. But they're not. Bearcat (talk) 04:53, 3 June 2008 (UTC)
So why don't we create a Google bot then? Create a bot that automatically stubifies any redlink with text it can find from 2/3 independent websites that mention the link? You are just short circuiting a normal part of the wikipedia manual expansion process for the sake of it. As per above, it was the presence of redlinks that caused the expansion; this actually disproves the idea that hundreds of editors are just waiting for the stubs so they can expand them. MickMacNee (talk) 12:29, 3 June 2008 (UTC)
The bot helps the system. It provides the list of places and their basic information. It generates the articles quickly. And besides, you have already claimed in this discussion that your WikiProject is making a systematic run through Canada, and you want the bot to make this job easier. Clearly, the will to implement this system exists already; using the bot to organize the work provides the boost that will get the project on its feet. But it is work that the bot must be organizing, not the creation of substandard (in the literal sense) articles for its own sake. Ryan Reich (talk) 14:12, 3 June 2008 (UTC)
That's precisely what I've been saying all along: it's a tool to help kickstart the creation of proper articles. Nobody, least of all me, is arguing in favour of adding 400 barebones stubs for Canadian places and then leaving them in that condition permanently. In fact, one of my biggest ongoing projects of late has been redirecting badly written and/or poorly referenced stubs about Canadian places to larger parent articles. Bearcat (talk) 00:02, 5 June 2008 (UTC)
Like I said, it seemed like we were agreeing. Sorry if I misconstrued your position. Ryan Reich (talk) 22:53, 5 June 2008 (UTC)
Step 4 just isn't going to happen in my opinion, and will be ruined by the fact that there are going to be enough people in every project who conclude that every stub proposed by the bot should be created through 'inherent notability', completely nullifying the human intervention element being held up above as being the difference between the dumb automatic creation of an atlas as opposed to the assisted creation of an encyclopaedia of knowledge. The core issue is that some people have misunderstood the purpose of wikipedia, and clearly don't see the issue that 'X is a place in region Y, which has a population of Z' is information, but is not knowledge. Inherent notability of places is appropriate to a wikiatlas, not wikipedia, in the same way that words are inherently notable for wiktionary. I think a completely new wiki project on the lines of wikimapia is what is needed here, not this tremendous diversion of effort away from the real jobs that need doing here. MickMacNee (talk) 17:25, 2 June 2008 (UTC)
In order to make it possible, we need to establish a recognized official policy on "inherent notability" (hopefully, that there is none). Then, this policy can be brought to bear on articles created as "nullification" attempts and they can be properly deleted or trans-wikied. What you say about a separate Wikiproject is right, but just because there is a separate project for all places doesn't mean that we shouldn't have a place here for those that deserve real articles under our actual standards. Just like book has an article despite being merely a word, so should possibly tens of thousands of towns have articles despite millions of others not having enough notable information on them to warrant one. What I want to accomplish here is a determination that this is the correct standard, and agreement that what I've proposed is the correct way to implement it with FritzpollBot. Ryan Reich (talk) 17:52, 2 June 2008 (UTC)
"The insistence on secondary sources is at the core of two of our core policies: no original research, and notability."....AND that's when I stopped reading. If you expect anyone to take your opinion seriously you should get your basic facts straight. --Pixelface (talk) 06:52, 4 June 2008 (UTC)
Alternatively, you could read the whole thing and decide if I have a point that goes beyond my attempt at rhetoric. For example, you could take the sentence to be: "The notability principle, with its insistence on secondary sources, strengthens and complements one of our core policies: no original research." No real change in meaning, but more defensible from the policies. I was just starting to learn this three days ago; forgive the occasional lapse. Ryan Reich (talk) 23:02, 5 June 2008 (UTC)

People over at WP:Fiction have been struggling with the same inherent notability concept, and I will again state my opinion that it is not only wrong, it is inherently dangerous. Things achieve notability by being described, directly and in detail, by reliable, independent, third-party sources. No other criteria can force inclusion. Attempts to decree that something is inherently notable are a shortcut that leads to controversy and strange divisions. Why are towns notable, but not shopping malls? Television episodes? Asteroids? Stars? Garage-rock bands with a really cool MySpace page? It's an effort to force an issue that cannot be supported by logic, and as such will inevitably cause trouble. Kww (talk) 19:02, 2 June 2008 (UTC)

All notability decisions lead to strange decisions and controversy. Asteroids are inherently notable; we have articles on all numbered asteroids, I believe. Inherent notability just means that we have that many fewer complex decisions on AfD, and reduces some systemic bias by reducing the quick deletion of inherently notable things that don't have their sources on Google in English.--Prosfilaes (talk) 21:36, 2 June 2008 (UTC)
You are arguing by assertion. Asteroids aren't inherently notable. One group of editors decided that they were. As for complex decisions ... isn't that why we have editors? Kww (talk) 23:44, 2 June 2008 (UTC)
I have no concept of what you mean by "arguing by assertion" and how it differs from what you're doing. Asteroids are "inherently notable" in that they all have articles created, and those articles are either not submitted to AfD or invariably get kept if sent to AfD. No, editors are not here to run AfD; editors are here to create an encyclopedia. AfD (=some form of discussion about whether an article should be kept) is a necessary evil, but any article that we can keep from having to go through AfD if it's just going to be kept is a win.--Prosfilaes (talk) 22:48, 5 June 2008 (UTC)
I think Ryan Reich is correct on his fundamental assertion that there is no such thing as inherent notability. As he says, notability is confirmed by coverage in multiple verifiable secondary sources. However, I think this is merely a problem of semantics. Certain categories of articles WILL be notable, no matter what. For example, we have an article for every country in the world. Now, are these inherently notable, under Ryan's definition? No, nothing is. They are notable because every country in the world has been written about in multiple verifiable secondary sources. However, because we can reasonably assume that every country in the world has had things written about them, then if we somehow discover a country which doesn't yet have an article, we can go ahead and create the stub, even though we may not have the proper sourcing that a finished article would have. Even in the extreme case that verifiable secondary sources don't even exist yet, I highly doubt that anyone's going to argue that an independent country is not notable enough for the 'pedia. Why? Because we know that for something like an independent country, there will very soon be sources about it. That is what the various notability guidelines are trying to do. They are taking the notability policy and applying it to groups of articles, as opposed to one article at a time. So, while there may be no such thing as inherent notability, there are certainly groups of articles that we can assume are notable right from the get-go. (inserted) The challenge for the community is now to decide at what level we are willing to assume notability.--Aervanath lives in the Orphanage 11:01, 3 June 2008 (UTC)
The trouble with "inherent" (or if you prefer, "assumed") notability is that it lends itself to intellectual laziness. To use your example, if you pick a (real) country at random, then I can pretty much guarantee that you will be able to find sources that discuss it, and in considerable detail. From that, you might conclude that all countries are notable, and as a generalisation that's pretty reasonable, provided that we don't use it as an excuse not to think. One example is that sources may contain typos, so if an atlas lists a country called "Frince" or "Argentony", then it may prove to be impossible to find sources about it, and hence there's no potential for verifiable content, even if we can verify that one source states that it exists. Another example is that articles about "countries" may turn out to be hoaxes or about fantasy micronations, and again often there's no potential for an article with verifiable content.
So I would say that assuming notability is a mistake. We should always check whether the subject has been noted. If a subject has been noted, in detail, in secondary sources then we can have verifiable, neutral content. If it hasn't, we can't. And if we can't find any sources, then the obvious answer is that we shouldn't have an article at this time, but that can change if/when sources become available. Jakew (talk) 14:11, 3 June 2008 (UTC)

You can find someone to note most any placename

I'm not too keen on flooding the internet namespace with placename articles, considering how difficult it is already becoming to get past all the content-less directory pages out there. But if we do this mechanically, we will in practice take one of the big (semi-)governmental listings, which is going to lead us into two different kinds of errors that these directories contain.

But let's try a different angle for a second. By using newspapers, I can source the existence of a lot of small places. For instance, in the vicinity of my present location, I can identify each subdivision (or in some cases clusters of subdivisions) as a named community. They are certainly real, and the newspapers refer to them by name. Are they notable places? Well, they don't fit into the traditional town-centered model of place-ness, so the post office for instance doesn't generally assign them post offices with names. The census isn't that fine-grained.

OK, so we go for the government-like sources. Well, using census designated places is a problem, because they do not reflect real borders. I've been working on Columbia, Maryland, whose legal existence, like some other new towns, is peculiar. It would be possible to go through real estate records and determine quite precisely what the boundaries of Columbia are, because it is covenants that determine whether or not any given property is really in Columbia. The CDP map of Columbia is quite "inaccurate" insofar as it shows several areas outside Columbia (e.g. Holiday Hills and Allview) that are geographically distinct; it also assigns part of Clarksville to Columbia, for reasons that aren't at all apparent to me. Meanwhile, the post office keeps trying to reduce the granularity of "place-ness". Their theory of how large Silver Spring, Maryland is, for example, is quite inflated. There are also some blatant mistakes in their assignment of secondary names (e.g. they have Colesville and Cloverly swapped).

My sense is that a mechanical translation of these data into articles is going to generate a great deal of argument. If we go with the Census, we will get data, but we'll also get a lot of "places" whose geography is a debatable artifact of statistical collecting convenience. If we use the list of geographic place names, we will produce a ton of stubs, and there will undoubtedly be argument as to whether many of them actually exist. In the meantime, the process is likely to generate a lot of cluttery misinformation. Mangoe (talk) 19:22, 2 June 2008 (UTC)

Regarding CDPs, you could argue that at the time of the census the borders of the CDP were whatever the Census Bureau said they were, deed records notwithstanding. If deed records of land parcels 1-100 were part of ABCville, but the census counted all land between The River and The Road, regardless of what parcel it was in, the census-bureau definition would be the one to use, with a notation that it was current as of the last time the census bureau used it. You could also write an article about ABCville but call it "ABCville, a community outside Metropolis, parts of which are part of the CDP by the same name" then use the deed records to define the boundaries. davidwr/(talk)/(contribs)/(e-mail) 19:32, 2 June 2008 (UTC)
Those boundaries were wherever the census said they were (and indeed, they provide maps), but the problem is not as to the existence of these boundaries, but as to their meaning. Take a look at the article on Columbia. By any standard, it is a notable place, census or no census, as it represents an important social experiment. The problem with identifying it with/as the CDP, though, is that the latter includes large areas which aren't part of the experiment! Or to be more accurate, places that aren't part of the experiment. Allview is "in" Columbia according to the post office and the census, but it isn't part of the planned community; it predates it by some six or seven years.
To put it in other words: I think it is going to be difficult to find geographical authorities for these places that amount to more than lists of names and some sort of coordinates. The census provides more data, but it is not a geographical authority, and its geographic divisions are founded in statistical efficacy, not whether other people consider such-and-such to be a place. The situation for the post office is worse, since zip code maps are figments of back deduction from delivery routes. They both work OK in places where there is a lot of empty space between "places", but they don't work that well when their granularity starts to fall below normal senses of "place". Even semi-rural density produces problems. How big is Laurel, Maryland, anyway? Well, it is incorporated, at least; but the post office assigns it to four counties, and the census makes similar divisions. The problem is, of the four CDPs outside Laurel, only one of them is a real place (Maryland City). To the degree that North Laurel is a real place, it mostly doesn't have anything to do with the CDP (most of which is either Scaggsville or Hammond Village or any of a bunch of old developments); I've never heard anyone refer to "West Laurel" at all. And in two years the CDPs are going to move; I'm already seeing that the intermediate dataset leaves out a lot of places.
Yes, one could research all this, which is rather the point. Bot-generated articles are by their nature unresearched, and therefore fairly cry out for thoughtful correction. Or thoughtless correction, for that matter. Mangoe (talk) 21:51, 2 June 2008 (UTC)
Thanks for the thoughtful posts. Another problem with (for example) the RAMBOT-generated articles on CDPs is that they're hard to correct, as some editors are convinced the CDPs are sacrosanct. See, for example, Talk:Village of Oak Creek, Arizona#Proper name? to see how tedious such a correction (by a knowledgeable local resident, me) can be. Oy. Pete Tillman (talk) 22:23, 2 June 2008 (UTC)
It's not a correction, though; it's merely one POV.--Prosfilaes (talk) 22:32, 2 June 2008 (UTC)
Sorry, but the census doesn't agree. Please read these:
They intend to correct their geography to match the places the locals recognize, and they're taking off all the size standards. It is possible that each of the developments I've mentioned will be listed separately, as by my reading of the proposal, each of them is a distinct, countable, named neighborhood. Interestingly, they take one of the cases I've mentioned as a problem example. Mangoe (talk) 11:07, 3 June 2008 (UTC)
Asteroids: don't go there

Someone mentioned asteroids above. They provide an excellent reason not to do this. So far there are "only" 544 articles in Category:Asteroids. So far. If one picks an article in the category at random— say, this one— you'll find that it is nothing more than a row out of some catalog. Or rather, a really huge table. A page for each is a really crappy presentation, but these kinds of projects are unstoppable once they get underway, because the effort invested in them protects them from serious criticism. And at least the asteroids are nice, discrete chunks of rock that can be readily differentiated one from the other. Mangoe (talk) 14:49, 3 June 2008 (UTC)

How broad or narrow is inherent notability?

I don't think the problem is with inherent notability, but on how it is used. I don't think anyone would have a reasonable problem with very very narrow categories such as "any nondwarf planet in this solar system is inherently notable", which includes 8 members and any planets newly discovered. The issue as Ryan points out is overbroad inherent notability. The community has left the line deliberately vague. This bot indicates it's now time to draw a line. It's probable that a very narrow standard like "all cities with four independent RS and population of 50% of capital city are inherently notable" would gain at least 80% consensus, although it excludes many other notable cities. When you start to tweak it-- 40%? 30%? Percent of national population (probably better, per Geometry Guy)?-- consensus also decreases. However the bot requires the community to take up a position and have the national projects determine hard and fast minimum standards for inherent notability, so that the bot can function within those strict standards. If it's population 500, then 501 gets you autogeneration and 499 doesn't, and if you don't like that, sofixit. After that we can lower the standards later, even if the bot may not pick them up for many moons after the first run. I encourage everyone to consider editing WP:NPT and its talk. JJB 15:11, 3 June 2008 (UTC)

In my proposal, I essentially argued that there should be some method of classification — by administrative region, for example — so that among all the entities at a particular level, if any one of them is notable, then we should believe that the others will turn out notable as well. However, I don't think that this determination should preclude finding actual proof of notability: reliable secondary sources. It's simply a way of organizing this geographic project. What you have proposed, namely "all cities with four independent RS and a population of 50% of capital city are inherently notable" is redundant, in that with the four independent RS (secondary, of course), there is no need for inherent notability: that's proof according to WP:N. What you say about, for example, nondwarf planets, is the same as my idea about finding a method of classification (in your example, it's "being a nondwarf planet"). But, I repeat: this is just a way of identifying broad classes of likely notable subjects. You still have to provide the notability claims. However, I agree: let's contribute to WP:NPT; it's more important than just this bot. Ryan Reich (talk) 15:32, 3 June 2008 (UTC)
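The hard cut-off JJB describes above ("501 gets you autogeneration and 499 doesn't") can be sketched roughly as follows. This is only an illustration of a strict minimum standard, not FritzpollBot's actual logic: the `Place` record, its field names, the `min_sources` parameter, and the decision to treat a population of exactly 500 as eligible are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    """Hypothetical record for one candidate place (not a real bot structure)."""
    name: str
    population: int
    independent_sources: int  # count of independent reliable sources found


def eligible_for_autogeneration(place: Place,
                                min_population: int = 500,
                                min_sources: int = 1) -> bool:
    """Apply a strict, published minimum standard with no fuzzy middle ground.

    Assumption: the boundary value itself (population == min_population)
    counts as eligible; the discussion does not settle that case.
    """
    return (place.population >= min_population
            and place.independent_sources >= min_sources)


candidates = [
    Place("Aville", 501, 2),
    Place("Btown", 499, 2),
    Place("Cburg", 10_000, 0),  # large, but no sources found
]
selected = [p.name for p in candidates if eligible_for_autogeneration(p)]
```

Lowering the standard in a later phase would then mean rerunning the same filter with a smaller `min_population`. Ryan Reich's objection maps onto `min_sources`: with enough independent sources, the population test adds nothing to the notability claim.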

Dissenting comments

I think the mixed use of rhetoric here may be a bit misleading. The idea of inherent notability is that certain classes of subjects can be identified as inherently notable based on shared features. You don't really deny that notable subjects can be identified as such; you seem to be arguing that even though inherent notability can sometimes be demonstrated, our general notability guideline is still a useful one in that it implements some reasonable verifiability standards. There is certainly a correlation between notability and verifiability (hence WP:N), but the two aren't intrinsically intertwined -- there are some notable subjects for which we can't find adequate sources, and vice versa. So it's not really a question of notability that you raise, but essentially one of verifiability.

Now, I don't really agree with your comments regarding primary and secondary sources. You're right that Knowledge articles are generally tertiary in nature, and this is supported as a general principle by WP:V, WP:RS and so on. But I think that a careful analysis of the underlying intent suggests that the principle doesn't apply in this case. Often material from primary sources is subject to widely varying interpretations. The secondary source doctrine was designed to facilitate neutrality efforts mainly in historical-political articles and the like. Another common problem with primary sources is that they tend to be inaccessible -- unpublished interviews are an obvious example. Imagine if editors were to use those methods in, say, the Holocaust denial article -- big trouble would likely ensue.

In this context those issues aren't really pertinent -- the primary sourcing involved is used to show that a particular geographic area exists, that it has a certain location, population and so on. The data is easily accessible, it contains few ambiguities, and bits of data that could possibly be controvertible tend to be uncontentious and inconsequential in nature (i.e., a possibly skewed population estimate is nothing like a misrepresentation of an author's political philosophy). — xDanielx /C\ 08:19, 4 June 2008 (UTC)

Diversity of data types

My reservation, and it is a small one, is that I'd still like to see more data types integrated than just population and co-ordinates. Depending on region and country, there are potentially many data sets that could be integrated (and referenced) to produce a valuable resource. Economic data, political, religious, linguistic, geographical, for a start, plus there may be local websites that have pages for many towns in a region. This could make the difference between an article being a copy-paste from an existing gazetteer and a new and useful synthesis. Pseudomonas(talk) 17:06, 2 June 2008 (UTC)

Absolutely agree - the more the better. That's why I need editors familiar with the topics to be able to provide the sources, or to find them, or to know where they are. Only when we've exhausted everything would we fuse the data together for checking. Fritzpoll (talk) 17:07, 2 June 2008 (UTC)
In that case I thoroughly look forward to seeing the results; this could be a great opportunity for integrating data that would otherwise be near impossible to get in one place. Pseudomonas(talk) 17:40, 2 June 2008 (UTC)

Proper sourcing, no spam

The end result should be articles with proper sourcing and no spam. Every claim in the article should exist at a reliable published source whose page is indicated in the References (or Source) section; and no link should be in the article that is not warranted by the information at that link. WAS 4.250 (talk) 17:28, 2 June 2008 (UTC)

Also, are you going to clean up the example articles that you have already created that violated this "Proper sourcing, no spam" standard? Or perhaps someone already did? Could links to those articles be placed here for accountability and verification? WAS 4.250 (talk) 17:28, 2 June 2008 (UTC)

The first job of the new WikiProject will be to clean up the existing articles, adjust the source to point at a more precise location, and remove the Encarta links that I know you feel strongly about. I am personally a little tied up with this page and the others being spawned, but I will see if Blofeld et al. can fix these now. Alternatively, you can remove the Encarta links yourself! But seriously, this will be done in due course. Under the new proposal, there would be more than one source, so sourcing would be even better. Best wishes Fritzpoll (talk) 11:46, 3 June 2008 (UTC)

Point 8: Clarification?

I'm not sure I understand point 8.

When they are first determined, the relevant notability policy will state specifically the initial minimum standards for "inherent notability" of villages, including global standards and any national exceptions. The initial specifics may be more narrow (such as minimum three or four independent reliable sources, and minimum 50% of population of capital city as determined by a specified benchmark source); over time the minima can be ratcheted down to broaden them slowly until the community and WikiProjects indicate when to stop. The bot's new articles will always observe the current notability standard strictly.

You're saying we should have a base global standard for "inherent notability" and then fine tune it to national cases. I don't think anyone would argue against this, but what that base global standard should be is the point of contention. Your suggestions for initial criteria seem problematic. For instance, there is no city in Hungary or Mexico with anywhere near 50% of population of the capital. Maybe I've misunderstood this point entirely.

More importantly, if it will always be broadened to include more locations, why bother creating a global standard to begin with? TheMightyQuill (talk) 18:56, 2 June 2008 (UTC)

I've already started what I hope will be a discussion on "inherent notability" above. It would be better if conversation on the subject were centralized there. Ryan Reich (talk) 17:45, 2 June 2008 (UTC)
Not sure I actually added that bit...either the place is notable enough or not. We can't change this by changing the criteria when we run out of things to do. I really just need to know what the community wants in terms of minimum notability Fritzpoll (talk) 12:09, 3 June 2008 (UTC)
That was me attempting a friendly amendment, as adverted, and that is why I left in vaguenesses like "such as". Since inherence is so hotly disputed, if you start with a very narrow minimum, you can show a relatively stronger consensus that the small number of such cities are inherently notable (while other notables like St. Petersburg Russia don't make that particular list). And I would second the argument that "50% of capital" is unexpectedly US-centric. But it doesn't matter what narrow minimum you start with, so long as you can tighten it in later phases. Now, if ratcheting down is not technologically easy, then please delete the appropriate clauses and refactor it as the national project getting consensus on local notability in one go and then the bot going ahead. But right now there is no hard-and-fast minimum notability standard anywhere because vague guidance has been favorable. Due to the bot's existence, vague guidance on this point must become deprecated. Differing national and local standards are fine but they must be enunciated in guidelines. I'm not picky about other points than the bot strictly obeying an enunciated minimum notability standard, and I think Fritzpoll is fine with that if the community does its duty and enunciates the standard unambiguously. JJB 14:57, 3 June 2008 (UTC)
"The national project getting consensus on local notability" is a good idea if and only if that means that they will determine that there are enough sources at hand to support notability claims for the locales in question. If it means that they will unilaterally decide to ignore the notability standard then it's bad: even on this encyclopedia, just because (especially because!) a group of people with similar and sympathetic interests gets together to declare that a subject is "notable" doesn't make it so. That's WP:ILIKEIT, and I don't like it. Ryan Reich (talk) 18:15, 3 June 2008 (UTC)
I think that, unless clear evidence to the contrary is presented, there would be no reason not to use the standard that was used earlier for generating articles within the US, which was having a population of 3. Even that, considering the fact of the existence of historically notable ghost towns, is possibly unduly limiting. But I can't see any reason why, in effect, US locations should emerge as having lower standards of inclusion than locations elsewhere in the world. John Carter (talk) 18:21, 3 June 2008 (UTC)
I think I've argued extensively that this standard was wrong. Ryan Reich (talk) 18:29, 3 June 2008 (UTC)
And I note that most of the discussion here does seem to be based on, so far as I can tell, your own WP:POV. I also note that the notability guideline is only a guideline. Really, in all honesty, this page seems to me to be almost an instance where one individual is trying through his repeated postings to try to impose his opinion, that individual being you. Part of the argument seems to be that this will create stubs. Well, most of the articles we have are stubs. Should we apply the same standard to living editors who generate such articles? You're saying there's no need to rush into it. The same standard can be applied to living editors as well. This does seem to me, at least in part, your own personal attempt to try to impose your opinions on wikipedia. Granted, some of what you say I agree with, such as contacting the various national projects to have them determine the outcome. But I really have to question when one party tries so hard to try to impose a personal opinion when several measures were taken already and approved it. This isn't, and shouldn't be, a case where one person through his repeated postings seeks to impose his own POV on wikipedia, without clear support for it, and, in fact, some degree of pre-existing disagreement from the broader community. I'm sure you have several articles you could be working on, rather than seemingly almost obsessively responding to almost every individual post here. I think it might help if you were to actually do so. John Carter (talk) 18:47, 3 June 2008 (UTC)
Don't put a scare link on "POV": point-of-view in an article is forbidden; in an open debate, it's required. As I see it, this debate is an opportunity to address the issue of inherent notability, and what you call my "obsession" has been an attempt at tying together the discussion on this point. I can no more impose my opinion on Knowledge than I could personally delete all the stubs if they ended up as I'm hoping they won't: because any change here results from community consensus. I have been trying to gauge the extent of consensus on both sides of the issue by raising a flag when it arises naturally in the discussion, and there is a significant amount of dissatisfaction with the state of affairs created by the Rambot operation years ago. All the new talk now is about getting the national WikiProjects involved in sourcing FritzpollBot's articles, which suggests that these editors are agreeing with my assertions about notability requirements (not necessarily because of me, mind you. They could have their own reasons, and I'm not looking for or getting much of the credit). This is not, at all, a case where one person is imposing his POV without clear support; I am trying to get a movement towards consensus on a point which had already broken open. No consensus has at all formed yet, because this is an issue which addresses what it means to be "encyclopedic", and since that's the goal here in this big place, people have strong and different opinions on it. However, if this debate proceeded without a view towards that larger issue, the outcome could easily be decided on less compelling, more expedient grounds (quite possibly, by the way, guided by the "POV" of the people most interested in using the bot) which ultimately harm Knowledge. Finally, your last sentences border on being rude. Ryan Reich (talk) 19:46, 3 June 2008 (UTC)

Auto-draft, human-patrol, auto-move

Some Wikiprojects may want you to do a run with the output in a separate namespace, maybe wikiprojectname/FritzpollBotresults/articlename, patrol the articles, then robo-move articles successfully patrolled. As part of the patrol process, articles can be rejected, moved to a human-assistance-required namespace, or marked "ready to move." Please consider giving this capability to the bot. davidwr/(talk)/(contribs)/(e-mail) 17:19, 2 June 2008 (UTC)

As I understand it, that's pretty much what's going to happen. The data are going to be placed in a WikiProject's subpage, and human editors are going to check over each article before it gets created in mainspace.--Aervanath lives in the Orphanage 06:55, 3 June 2008 (UTC)
Can be implemented on a case-by-case basis depending on the needs of the WikiProject Fritzpoll (talk) 15:42, 3 June 2008 (UTC)
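The draft, patrol, then move workflow davidwr suggests could be sketched along these lines. It is a rough illustration only: the status names, the `draft_title` helper, and the `plan_moves` function are hypothetical, and the subpage pattern simply follows the "wikiprojectname/FritzpollBotresults/articlename" example given above. No actual page moves are performed; the sketch only works out which drafts a bot would be allowed to move.

```python
from enum import Enum


class PatrolStatus(Enum):
    """Hypothetical outcomes of a human patrol of one bot-drafted page."""
    UNPATROLLED = "unpatrolled"
    REJECTED = "rejected"
    NEEDS_HUMAN = "human assistance required"
    READY_TO_MOVE = "ready to move"


def draft_title(project: str, article: str) -> str:
    """Title of the draft subpage, following the pattern suggested above."""
    return f"{project}/FritzpollBotresults/{article}"


def plan_moves(project: str, patrol: dict) -> list:
    """Return (draft title, mainspace title) pairs for approved drafts only.

    Anything not explicitly marked READY_TO_MOVE stays where it is,
    so an unpatrolled draft can never reach mainspace by accident.
    """
    return [
        (draft_title(project, article), article)
        for article, status in patrol.items()
        if status is PatrolStatus.READY_TO_MOVE
    ]


patrol_results = {
    "Aville": PatrolStatus.READY_TO_MOVE,
    "Btown": PatrolStatus.REJECTED,
    "Cburg": PatrolStatus.NEEDS_HUMAN,
    "Dford": PatrolStatus.UNPATROLLED,
}
moves = plan_moves("WikiProject Example", patrol_results)
```

The default-deny choice (only an explicit "ready to move" triggers a move) reflects the point made in the replies that human editors check each article before it is created in mainspace.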

Why not a new Wikimedia sister project, a la Wikispecies?

If the bulk of these articles is simply going to consist of interpolated database info, then it would seem to me that creating a new whole wiki specially designed for this purpose would be more ideal. Locations with Knowledge articles could be interwiki-linked, and perhaps a bot could create soft interwiki redirects to all locations which exist in that sister project but do not currently have articles here. Nothing would stop human editors with an interest from creating articles here, and this would alleviate any concerns of having tens of thousands, if not millions, of new, unwatched, and likely unmaintained articles here. Girolamo Savonarola (talk) 17:37, 2 June 2008 (UTC)

Wikimedia projects are not intended to be content forks and are supposed to present a different type of information (i.e. Knowledge includes encyclopedia articles, Wiktionary includes dictionary entries...). Wikispecies was not set up to include encyclopedia articles or anything approaching Knowledge-like content. For what it's worth, Wikispecies was set up with significant community opposition (hence the charter to limit it to taxonomy) and the project has failed. Of what value would it be to create similar projects with overlapping purpose but separate user-bases, policy and governance? Of what value would it be to remove so much content from Knowledge when Knowledge is not limited in size and does not face the same problems as paper encyclopedias? The benefit of Knowledge containing such a broad array of information is that this content is subject to the same policies (it is consistent), there is no requirement to browse out of the site to get to articles about particular things and articles are exposed to a much larger user-base (that is, articles are seen by more people and problems are more likely to be ironed-out). Content forks create redundancy, create inconsistency and divide the user-base so that articles have more limited exposure. --Oldak Quill 19:50, 2 June 2008 (UTC)
You miss my point - I am suggesting that what will currently be the only content of these bot-articles (ie - statistics) would perhaps be better suited as the main corpus of an atlas-like sister project. Any human editors that would like to initiate Knowledge articles on any of these places will still be able to, as they always have been. Girolamo Savonarola (talk) 20:26, 2 June 2008 (UTC)
It depends how statistics are presented. If statistics are presented as prose and infoboxes in an encyclopedia article (like many geography and science articles), this content should be in Knowledge. This bot intends to start prose articles, not just tables of statistics. --Oldak Quill 20:38, 2 June 2008 (UTC)
If all they are is statistics, then you are just putting lipstick on a pig. Knowledge is not a collection of statistics, and if you can't find real prose, not just window dressing, to put in these articles, then this content should not be in Knowledge. Note that although this policy article does advocate using infoboxes, it specifically does so as readability devices only. The main point is that statistics must be presented in context, which these stubs will not do. Ryan Reich (talk) 21:06, 2 June 2008 (UTC)
(xpost) Prose is not our sole standard for an article, nor should it be. My point is that while these all may be (and ideally should be) encyclopedia articles, the lack of information besides statistics indicates that there may be a more useful, informative, and accessible way to present the data, prose or not. Anyone wishing to actively create articles on these places, where the article would consist of other real-world data from reliable secondary sources, is of course welcome to do so. But creating bot-written encyclopedia articles of prose statistics for the sake of creating them is not a goal worth pursuing independently, in my opinion, no. If there is a concern about preserving the information on-wiki, then surely this information could all be just as easily rendered into tables within list articles concerning the local regions, and with the much greater benefit of having regional comparisons quickly at hand.
In short, if we want to create a geographical register, why not create a geographical register rather than trying to force-feed one into Knowledge? Girolamo Savonarola (talk) 21:12, 2 June 2008 (UTC)
(edit conflict) "Lipstick on a pig" certainly describes the Rambot-generated prose text in articles like Storden, Minnesota, mentioned previously. I'd rather see the auto-generated prose kept to a bare minimum and just keep the numbers in the infobox. — Andrwsc (talk · contribs) 21:14, 2 June 2008 (UTC)


Lately I've been hitting the 'Random article' link quite a lot, and it's interesting how many of the pages I hit are Rambot generated pages for small communities in the United States that have only ever been substantially contributed to by bots. They contain nothing but tabulated statistics expanded into prose form. Nobody is reading these pages and nobody is adding to them, because there is nothing to add to them. Some of the area sizes of these communities are no larger than a couple of city blocks and the population no larger than small-sized office buildings. There are no secondary sources for these places; they've never been written about in a book, a magazine, or a newspaper; so basically, what's the point of these pages existing? Whatever altruistic goals Wikimedians set out for, surely it should start out with being informative and useful?
As for the proposal, I'm not hostile to it, and I believe the project opponents' hearts are in the right place. I think it's a good idea to use technology to mine and unify statistics from across different sources and to provide them openly in a human-readable format. I do think, however, it should be taken one step at a time, and a separate project from wikipedia would be more appropriate such as wiki-atlas, as has been proposed in this section. Statistics about Podunk, Kirgygstan aren't inherently interesting or useful isolated by themselves, but if presented in a concise and consistent manner, making them easier to compare and contrast with other rinky dink communities, they would be. This would also make it easier to translate between languages, which would do more for countering systemic bias than believing in an illusion that just because a wiki article exists means they are thought about.
Knowledge articles should be left for places that not just have a story to tell, but that have already had a story told in secondary sources, such as books and newspapers; it's an encyclopedia after all. For those inclined to start an article about whatever place they're interested in, the information would then still be there in the wiki-atlas, and still be every bit as useful. Bosintang (talk) 13:54, 6 June 2008 (UTC)

Interwikis - another asset, even for stubs

It is possible that at least some of these exist on other language wikis. With interwiki links, even a sub-stub could be very useful if it links to an article in another language with which a reader is somewhat familiar (or which could be machine translated). (A lot of locations in Africa have decent articles in French even where the articles here are atrocious, for instance.) It might be worth doing some sort of query of the wiki of the relevant language for all the titles of the articles to be created, so interwikis could be added automatically where appropriate. Mangostar (talk) 17:53, 2 June 2008 (UTC)
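The title lookup Mangostar describes might look something like the sketch below. It makes two loud assumptions: that the list of titles already existing on the other-language wiki has been obtained by some separate query (not shown here), and that a place has the same title in both languages, which will often be false and would need human checking. The function name and parameters are hypothetical.

```python
def interwiki_links(planned_titles, foreign_titles, lang_code):
    """Map each planned article title to an interwiki link, where one exists.

    planned_titles: titles the bot intends to create here.
    foreign_titles: titles known to exist on the other-language wiki
                    (assumed to have been fetched separately).
    lang_code:      language prefix of the other wiki, e.g. "fr".

    Assumption: an exact title match means the same place. Titles that
    differ between languages are silently missed by this naive check.
    """
    existing = set(foreign_titles)
    return {
        title: f"[[{lang_code}:{title}]]"
        for title in planned_titles
        if title in existing
    }


links = interwiki_links(
    planned_titles=["Aville", "Btown", "Cburg"],
    foreign_titles=["Aville", "Cburg", "Zedville"],
    lang_code="fr",
)
```

Places with no match simply get no link, so a stub is never given an interwiki pointing at a page that does not exist.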


Llywrch's must-read, expert assessment of the issues

Llywrch has been deeply involved in this sort of thing, improving our coverage of places in Ethiopia. Even handled at the Wikiproject level, there are many problems as well as payoffs (or as Llywrch put it in his edit summary: "warning: there be monsters -- & also treasures -- here"). Please see his now-archived comments in response to the first proposal:

--A. B. 18:44, 2 June 2008 (UTC)

400,000 larger articles vs. 2 million stubs

We should consider who's going to monitor 2 million more articles for vandalism, errors, etc. Merging stub-type data on all these places into fewer, bigger articles might make for better ongoing monitoring and quality control. For instance, an article on Anywhere Township would have a section for Nowhereville and each of the 4 other little hamlets in the township. The 5 hamlets each have a redirect to the township article. --A. B. 19:15, 2 June 2008 (UTC)

As a reader, I prefer navigating and using smaller-scope stub articles about particular things than big list articles (containing several stub articles). Having an article per item facilitates easier and more dynamic browsing. Stub articles are useful to the reader and present a lot of information in a compact area. Traditional encyclopedia articles and dictionary entries are more similar to stubs (and similarly contain concise and compact information) than to 32kb articles. 32kb articles are preferable, but it is untrue to say that stubs are useless or difficult. Furthermore, I feel more ready to expand a stub article than an item in a list article. --Oldak Quill 19:56, 2 June 2008 (UTC)
While this is also my biggest concern, it was claimed above that "This proposal will probably drastically reduce the number of articles created <...>. It will be nowhere near the predictions of the first proposal." Colchicum (talk) 20:01, 2 June 2008 (UTC)
I based this comment on the fact that the original proposal would have created 1.8 million articles, with no inherent criteria for restricting it. Now we are all discussing what restrictions should be placed on articles about places - any restrictions will reduce this number, and the reduction will probably be fairly drastic. How drastic depends on how it is implemented. Fritzpoll (talk) 12:14, 3 June 2008 (UTC)

Ensuring quality articles before their creation

Perhaps each geographical WikiProject that wants to use this bot should have a data check and implementation plan for review by others before unleashing the Sorcerer's Apprentice-bot to create hundreds of articles. --A. B. 19:18, 2 June 2008 (UTC)

This is exactly what I have proposed myself. This kind of quality assurance is necessary before undertaking such a large and complicated project. It is all the more necessary because Knowledge policy mandates that articles make notability claims; for a bot to create multitudes of articles without notability claims would constitute a systematic violation of this policy. Ryan Reich (talk) 19:44, 2 June 2008 (UTC)
Plenty of other categories, such as school-stubs and living-things-stubs, have only boilerplate references or no references at all. davidwr/(talk)/(contribs)/(e-mail) 20:02, 2 June 2008 (UTC)
Two responses: 1. WP:OTHERSTUFF. 2. Maybe they should be deleted, depending on the notability claims they make. It's one thing to have unsourced claims; it's another to have no claims and no sources. There's nothing to say that a systematic violation of policy hasn't already happened elsewhere; I just don't want it to happen here. Ryan Reich (talk) 20:10, 2 June 2008 (UTC)

Underlying Issues

I Oppose the idea, but for different reasons, which i think need to be separate from the main straw poll, because it is part of what i oppose. I think that Fritzpoll has done the community a service by creating this bot, and bringing the proposal forward, not to mention modifying it dramatically, and taking a lot of flak for making the idea a possibility. So, Thank you, Fritzpoll. Because of the amazing and quick blow-up that happened yesterday, however, i have to oppose: It clearly shows that there are severe underlying issues which need to be resolved:

  • Inherent Notability: Ryan Reich has made good points about this, and i agree that it is a dubious concept that cannot be brushed under the rug at this point.
  • Scope: It seems that, as Girolamo Savonarola has suggested, the capacity of this bot is better suited to the creation of a new project, holding data about all these places, names, villages, hamlets, whatever.
  • Behaviour: I was appalled by some of the comments made in the other discussion. As i pointed out there, i saw hectoring and badgering going on, as the majority of Opposers were argued against. I also saw Opposers talking about "horrible" imaginations, "inherently repulsive" ideas, and something being "awful, awful, AWFUL". Not the good, temperate language we ought to expect (and, to be truthful, usually find) in WP.
  • Ownership: It seems to me that the reason for the behaviour was an attitude behind it, one of, "We own this project, and therefore...." Not from one side or the other only, i hasten to add. Now, i may only have made a thousand or so edits, but my understanding is that i am as valued an editor (though maybe not as valuable) as he who has made twenty or thirty thousand; let me tell you, that the idea that the fewer edits we lesser mortals have the less our contribution is valued or we are respected is a sure way to drive us away. And that is not within the goal of this project, is it?

I think that these issues are far more important to resolve now than the question of whether we let a bot create articles. Perhaps i could be accused of being overdramatic, so i won't say that the future of WP depends on it, but i will say that in any organisation the huge blow-ups are almost always not actually about the apparent trigger. FritzpollBot is our trigger only, not the most important thing we must focus on. Cheers, Lindsay 20:04, 2 June 2008 (UTC)

Clearly, I agree, and thanks to Lindsay for saying what needed to be said. This is an opportunity to settle a long-standing policy debate over "inherent notability", hopefully in the direction that we do not relax our standards just to gain some more page titles. The question of FritzpollBot also needs to be settled, and the precedent set by the consensus (eventually) reached will certainly form an important part of this discussion, but the bot is not the biggest issue here. Ryan Reich (talk) 21:17, 2 June 2008 (UTC)


What if the bot were created as a tool for regional WikiProjects to do with as they would, with no global concerted effort to actually run the bot worldwide? This seems to be in the revised plan already. The main wikiproject would live on to support the bot as a technical tool, not to decide what articles got created or when. davidwr/(talk)/(contribs)/(e-mail) 22:43, 2 June 2008 (UTC)
This is the direction I've also been going recently, but we still need to settle the issue of whether the raw stubs the bot creates are suitable articles. Otherwise, the geographic wikiprojects may well just do what was originally proposed, each on its own. Ryan Reich (talk) 23:23, 2 June 2008 (UTC)

Is systemic bias bad?

A good deal of this discussion is centred on systemic (or systematic) bias. In general there is an underlying assumption that systemic bias is a BAD THING. The argument is phrased along the lines of "why should a one-horse town in the United States be notable and an equivalent town in Africa not?" It seems at times as though Knowledge:No systemic bias has become the sixth pillar of Knowledge. I believe there is also a (mostly) unconscious but still significant association of systemic bias with racial or ethnic prejudice. But the purpose of an encyclopaedia is to inform, not to document, and in order for an article to inform it must first be read. Now, consider two villages: village X in a remote part of England and village Y in a remote part of Nepal.

  • The probability of students in X being asked to write about their home town is close to 1.
  • The probability of students in X being asked to write about Y is close to 0.

Of course, the converse is also true. I dearly hope that one day all citizens of Y will have access to broadband internet and that Nepalese Knowledge will grow to the size of English Wiki, but the usefulness of an article on X in Nepalese Wiki will still be close to 0. This is the nature of systemic bias and it is why systemic bias is a GOOD THING.
A paper encyclopaedia is biased towards what the editors want us to know. Knowledge is biased towards what we ourselves want to know. Correction of that bias by creating articles about what we don't want to know serves no educational purpose while creating the potential for harm (accidental or deliberate misinformation etc.). Nobody on English Wiki currently wants to know about village Y. Blofeld of SPECTRE, even if he wants to know about every village in the world, doesn't want to know about village Y because he is unaware of its existence. When the day comes that people do want to know about it, it will be time for a human to create an article. But for 90%+ of village Ys that day will never come.
The current proposals are in conformity with most of the above; nonetheless I feel that these points need to be made. Scolaire (talk) 21:00, 2 June 2008 (UTC)

I hope you don't mind that I replaced all your "systematic"s with "systemic"s. Systematic bias is quite bad, amounting as it does to a conspiracy to skew the contents of the encyclopedia; systemic bias is "merely" an imbalance in the encyclopedia caused by the system itself. Ryan Reich (talk) 21:13, 2 June 2008 (UTC)
I don't mind too much. My link was meant to be red, though—Knowledge:No systemic bias is neither a policy nor a guideline. Scolaire (talk) 21:28, 2 June 2008 (UTC)
Sorry, I missed the irony. Ryan Reich (talk) 22:14, 2 June 2008 (UTC)
If an article does not exist, readers cannot access it and the usefulness of the encyclopedia on this topic is zero. If a stub exists on a very obscure topic (like a small Nepalese village) then even if only one reader accesses this article per year, the utility of the encyclopedia on this topic will be greater than zero. Consequently, having articles on obscure topics will increase the utility of Knowledge, even if these articles are not frequently accessed and the resulting increase in usefulness is not very large. As the result is a net positive, the small magnitude of the positive effect is not an argument against this course of action. Tim Vickers (talk) 21:38, 2 June 2008 (UTC)
And if fewer than one reader reads (as opposed to accesses) the article? And on what basis can you assume that a majority or even a substantial minority of bot-created articles will be read? Scolaire (talk) 21:46, 2 June 2008 (UTC)
Even if only 1/10 of the articles are read, this still only changes the magnitude of the positive effect. What your argument fails to do is show that there is a negative effect on our readers from the creation of these articles. As this makes Knowledge more useful, rather than less useful, then we should do it. Tim Vickers (talk) 21:58, 2 June 2008 (UTC)
"Knowledge is biased towards what we ourselves want to know. Correction of that bias by creating articles about what we don't want to know serves no educational purpose while creating the potential for harm (accidental or deliberate misinformation etc.)." That's the negative effect. Scolaire (talk) 22:03, 2 June 2008 (UTC)
Knowledge is biased towards what we want others to know. M-72 (talk) 03:12, 6 June 2008 (UTC)
So you are arguing that more people will be misinformed than informed by this project? That certainly seems a tenuous argument. Mangostar (talk) 22:16, 2 June 2008 (UTC)
I would not by any means be the only person to argue that! But in fact my argument is what it says: systemic bias in many cases is natural and right, and anybody undertaking a project of this nature has to take cognizance of that fact. Scolaire (talk) 22:27, 2 June 2008 (UTC)
  • As someone mentioned above, there is nothing more infuriating than clicking a supposed blue link to find a bare skeleton of information. Creating articles because we think one person might read them seems a real stretch. And I also don't get this accessibility-to-editing argument; we all as editors should probably recognise there are mere weeks between an editor who can (meaningfully) improve a stub and someone who can create an article. As I said, this project is more appropriately changed to an MOS for article creation, or new article copyediting. MickMacNee (talk) 22:01, 2 June 2008 (UTC)
  • I disagree wholeheartedly. As someone who works on a lot of articles about the developing world (lately it's been Cambodia), I am most frustrated to see red links. Even a tiny skeleton of information can provide the basis for further research. For instance, if I am researching a small town of a certain name, I would want to know whether there is one town XYZ in Cambodia, or multiple towns XYZ (which this project could tell me), so that I don't go mangling several towns into one article. I might also want to know the geographic location of a town, which could help me match up different transliterations of one town's name, or distinguish between similarly named towns, as the case might be. This project would also allow me to do this. Even a directory of towns would help identify major population centers in a given area. For the article Ratanakiri, for instance, it took ages to verify which towns outside the provincial capital were big enough to deserve mention in the province article. This project would also assist readers and editors in this respect. Mangostar (talk) 22:13, 2 June 2008 (UTC)
  • And as a sidenote, if you don't like being surprised by stubs, you can have their links be a different color by changing your preferences. Then you can just pretend they are redlinks, which is apparently better?... Mangostar (talk) 22:18, 2 June 2008 (UTC)
    • The point is that you would be doing that in preparation to create an article with more than that basic information. Having a mass information bank hoping that others will come and edit is unnecessary. The same could be said for companies, people etc etc; there are databases of basic information all over about everything, and wikipedia goes further than being a basic store of information. What is wrong with posting the resources and basic article template at a MOS page if the problem is not knowing where to find this information? MickMacNee (talk) 22:21, 2 June 2008 (UTC)
  • I might support the addition of other basic bot articles if I thought their contents were notable. (I believe most or all settlements are notable, so this is fine by me.) Similarly, because I believe these are all notable, it is not indiscriminate. And all of wikipedia comes from other resources and is organized for convenience--that's the beauty of it. Your description of "a few central, comprehensive sources" misrepresents the reality of getting data from govt sites for third world countries. To find census data for Cambodian villages took a good deal of browsing through google hits on various queries--and it is in 18 orphaned numbered PDFs that don't appear to be linked from any other places on the Cambodian statistics site. This would be too much effort to go through to create any individual article or handful of articles, but when it is undertaken all at once (converting the PDF to XLS, merging with coords, etc) it scales and is efficient enough to make sense as a project. It is silly to think that humans should do this on a case-by-case basis, rather than getting some super-duper spreadsheets together and running a bot. Mangostar (talk) 22:35, 2 June 2008 (UTC)
  • I don't want to take too hard a line on this; the bot is a great idea for helping with the creation of articles. If the bot could do all the looking-up that you had to do, your life would be much easier. But I absolutely object to indiscriminately unleashing the bot to create articles containing just that information, because it is not notable in and of itself. Yes, every place has a strong likelihood of being notable in a thousand different ways, and certainly in one or two, but just like with every Knowledge article, that notability must be established by recourse to a reliable, secondary source: not just a collection of raw facts, but analysis, opinion, or connections. Furthermore, if the bot could provide you with a list, there is no need to create the articles; just generate the list for yourself, or include it as a subpage in WikiProject whatever (in your case, Cambodia). My point is that even if the stub articles would have a potential for expansion, without a notability claim establishing that potential, they are no more than dictionary definitions. Authors need to take responsibility for their articles. Ryan Reich (talk) 23:30, 2 June 2008 (UTC)
  • Given the common sense explanation of the origin of bias above, it might be better to organise a concerted translation project for all the place articles that the other language wikis may have that we don't. MickMacNee (talk) 22:01, 2 June 2008 (UTC)
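The batch merge of census figures with coordinates that Mangostar describes in this thread might look something like the sketch below. Everything in it is an assumption made for illustration: the column names, the placeholder rows, and the hypothetical helpers `read_rows` and `merge_by_place`. It also assumes the PDF tables have already been converted to CSV by hand or by another tool.

```python
import csv
import io

# Placeholder data standing in for a converted census table and a coordinates table.
CENSUS_CSV = """name,province,population
Village A,Province 1,1200
Village B,Province 1,450
"""
COORDS_CSV = """name,province,lat,lon
Village A,Province 1,13.70,107.00
Village C,Province 2,12.50,105.10
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_by_place(census_rows, coord_rows):
    """Join census rows to coordinates on (name, province).

    Keying on province as well as name keeps same-named towns in
    different provinces apart. Rows without coordinates are returned
    separately so a human can review them.
    """
    coords = {(r["name"], r["province"]): r for r in coord_rows}
    merged, unmatched = [], []
    for row in census_rows:
        key = (row["name"], row["province"])
        if key in coords:
            merged.append({**row, "lat": coords[key]["lat"], "lon": coords[key]["lon"]})
        else:
            unmatched.append(row)
    return merged, unmatched


merged, unmatched = merge_by_place(read_rows(CENSUS_CSV), read_rows(COORDS_CSV))
```

Returning the unmatched rows rather than dropping them reflects the concern raised elsewhere in this discussion that a data check should happen before any articles are created.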
No, IMHO systemic bias is not bad at all, it is just a matter of supply and demand. But a stub, provided that it is watched and it is not orphaned, is useful regardless of that. Colchicum (talk) 22:26, 2 June 2008 (UTC)

For all those who are concerned about how these stubs will make Knowledge look bad, I would argue to the contrary, and particularly because of the combating systemic bias effect. Systemic bias is bad in part because it makes people take Knowledge less seriously. Many people believe Knowledge is all about collecting geeky trivia that is of interest to American 20-something guys or is all about American and British shopping malls or elementary schools because, frankly, that is at least a good chunk of what goes on here. I'm not a big elitist and don't have a problem with that content existing, but think how people's perceptions of Knowledge might change if it had a reputation for being a premier reference for all places, and the premier reference on places in the developing world instead. Mangostar (talk) 22:41, 2 June 2008 (UTC)

How would it look if all those places were nearly empty stubs containing no distinguishing facts? We would be just as US-centric as we are now, and everyone would know that no thought or consideration was given to any of those articles. Writing these stubs is no more than an exercise in political correctness; to really counteract systemic bias, we would have to change the process that leads to it. That is the process which connects notable information with articles on Knowledge, and according to my proposal, this bot can and should serve as a facilitator in establishing that connection, but can never make the connection by itself. Ryan Reich (talk) 23:35, 2 June 2008 (UTC)
A point alluded to above is: who will protect all these 2 million stubs from vandalism? Systemic bias is fine as long as that bias is towards what is useful. Useful means useful to the audience of Knowledge, and if that audience is 95% Western, then too bad, the way to fix that is to get more non-Western people on the Internet - something Knowledge cannot solve - not create millions of unwatched stubs. - Merzbow (talk) 00:23, 3 June 2008 (UTC)
This is a chicken and egg argument. Where is the incentive to use and edit WP if content you are interested in doesn't exist yet? Creating stubs to reduce systemic bias greatly lowers the barrier to entry for editing and using those articles. Suicup (talk) 00:34, 3 June 2008 (UTC)
I'm not seeing this purported gulf of ability between an editor that can add meaningful information to a stub, and one that can create an article. If style is the only issue, create a template and provide a list of resources. Creating thousands of pre-prepared stubs on this level, when we don't do it for any other topic, goes against the core principle of how wikipedia develops. MickMacNee (talk) 12:18, 3 June 2008 (UTC)
Agreed. Knowledge could become the definitive resource for information on these sorts of places, attracting readers (and hence editors). To give an analogous example, I don't use Knowledge for reference on fashion topics (another systemic bias issue, here because of the lack of female contributors), since the coverage is awful. (I'll contribute every now and again though.) If Knowledge were a place to get information on fashion, maybe more fashion-minded people would come to read, and in the process improve the articles. Mangostar (talk) 01:34, 3 June 2008 (UTC)
Systemic bias is bad: by failing to counter it, we tacitly accept that there is a geographical and cultural limit on the things we should write about on the 'pedia. In fact, Scolaire's example of the villages of England and Nepal is rather ironic, in that in itself it shows systemic bias. Yes, schoolchildren in England may not be asked to write about a village in Nepal. But right next door to Nepal is India, which has 90,000,000 English speakers, which is 50% more than the entire population of the United Kingdom. The percentage of those people who want to know about a village in Nepal is going to be far greater than those in England. Should we not cater to them as well? — Preceding unsigned comment added by Aervanath (talkcontribs) 08:45, 3 June 2008
Actually, I would be quite happy to substitute India for England in that example. The point is the remoteness of the village, not its proximity to an English-speaking population. If "a far greater percentage" means 0.000001% rather than 0.00000001% then a bot-generated article/stub will still have a usefulness of virtually zero. The systemic bias that I am talking about arises from people's desire to learn, and creating articles not to be read is only a cosmetic exercise. Scolaire (talk) 09:01, 3 June 2008 (UTC)
I guess we just disagree at a more basic level then. I think you underestimate people's basic curiosity, and I think that these articles WILL be read and improved upon as Knowledge's user base continues to grow and expand. English has the most speakers (native or otherwise) out of any language, and it is the language of communication, business and diplomacy practically everywhere. As the Knowledge that caters to these readers, in what is the world language right now, we have a responsibility to serve all of them equally, not just the native speakers.--Aervanath lives in the Orphanage 09:39, 3 June 2008 (UTC)
This can be said about any topic, it doesn't just affect place names. As said elsewhere, why not create a concerted translation effort of all the place articles that non English speakers have no doubt created on other wikis, if this is the real concern. MickMacNee (talk) 12:21, 3 June 2008 (UTC)
Actually I proposed a WP Translation sister project a while back for translation between more languages than Babelfish and those sites have, which would make translation between wikipedias a huge benefit, but Jimbo never responded. Personally I think breaking down the language barrier is perhaps the most important step towards knowledge accessibility ♦Blofeld of SPECTRE♦ 12:46, 3 June 2008 (UTC)
I do mean a human effort though, machine translation is next to useless. MickMacNee (talk) 12:51, 3 June 2008 (UTC)
Yeah, I've seen some horrific pieces of work handed in at schools and Universities which are so obviously machine translated that it's actually quite funny to read, if disturbing to think the students couldn't be bothered to do it themselves. Fritzpoll (talk) 12:57, 3 June 2008 (UTC)

I'll just echo what I said above about bias, since I think it's worth considering:

We don't solve systemic bias by having an equal number of articles about ill-covered countries - we solve systemic bias by covering those countries better - both in the articles which already exist, and - where we have enough sources to create substantial articles - by creating new articles about topics in those countries. The bias directive is to improve the coverage, not the article-count. Girolamo Savonarola (talk) 17:52, 5 June 2008 (UTC)

Do topics have a "right" to exist as articles?

I think that's what's at the heart of this discussion. I personally don't think that atomic-level topics need to exist on every little thing when the merging, redirection, or collation of such information creates a more useful and complete article. There is nothing inherently great about expanding our article count, but having more encyclopedic information - and more importantly having a higher average quality level - is going to have a large bearing on the public perceptions on and success of this encyclopedia. Exponentially expanding our stubs is not the way to achieve this, nor is it realistic to expect them to be quickly dealt with in any case, given the number of active editors currently on-hand wiki-wide, much less the small percentage of those who would be willing to edit these articles. Adding millions of new stubs into the system without any regard for how the information is organized and broken down into bite-sized articles merely dilutes our overall quality level, and thus creates a much larger system-wide negative factor for all editors, readers, and outside observers (to varying degrees) which cannot be equally mitigated by the small additional utility generated by several million substubs (of which less than 1% are likely to be improved beyond stub-level within any reasonable timeframe).

On the other hand, organizing this information more responsibly at a higher regional level, and in lists and tables, where the statistics can not only be laid bare, but actually used for easier and more useful local comparisons, will be able to maintain a higher standard of article quality, generate a far more manageable set of articles, and have a higher utility than a vast array of stand-alone stubs. Furthermore, settlements which can be expanded further and have a human editor interested in doing so can always still split off into their own articles (while presumably leaving the tabled data intact on the regional page). Everyone benefits.

We don't get points for a raw article-count, so why is this being treated as such? Average article depth and quality is the metric we need to be keeping our eyes on. Girolamo Savonarola (talk) 01:01, 3 June 2008 (UTC)

Lists are good summary tools, but content doesn't exactly blossom on a list, or at least not the same way it does in an article. If the subject is notable, it deserves an article, and then we allow time for the articles to grow. Besides, this proposal will only allow the articles to be created when the relevant WikiProjects have become involved, so it's not like these articles will just get abandoned. --NickPenguin(contribs) 01:15, 3 June 2008 (UTC)
We merge things all the time - in this case, the articles are bot-operated prosified statistics. There's no good reason why they couldn't be rendered as tables within lists, and then when someone wants to expand them, they can be split off, with human input and further information. Therefore any topic which an editor is willing to actually substantially improve would become an article, while the vast number which many of us are worried will not undergo any significant edits, will not be left in abandoned substubs. Girolamo Savonarola (talk) 01:17, 3 June 2008 (UTC)
Perhaps this is where we do not see eye to eye. You envision one or two editors contributing a substantial amount of high quality content, while I envision dozens of anonymous users contributing a handful of sentences and corrections. We all dream of Knowledge being perfect tomorrow, but we forget that it wasn't built yesterday. --NickPenguin(contribs) 01:28, 3 June 2008 (UTC)
I am intrigued by the proposal and am less than vehement in my opposition to it, but I do think it's not the best way to move forward. For the reasons Girolamo Savonarola laid out above, I would support it if it were a question of bot-generated tables. With such a scheme, many or most—perhaps eventually all—of the entries for which an article would be appropriate could get their own articles in due time. Full articles likewise could be based on stubs, but there is an advantage to that only if one believes that stubs have significantly more intrinsic value than items on a list, and I suspect that wouldn't be the case for most of the two million stubs that are proposed. The majority of them, I believe, would still be stubs a decade later.
The primary disadvantage I foresee could well be a serious one, and that is vandalism. Currently, vandalism is controlled to a large extent because the editors who create articles or make the major additions which turn stubs into articles monitor them on their watchlists. Who will watch two million new stubs? Bots cannot do it alone, at least not effectively, and there aren't enough human editors to do it.
I, for one, do not dream of Knowledge being perfect tomorrow or ever, but I do sincerely hope it will continue to improve in the years and decades to come. Perhaps it wasn't built yesterday, but it was built in a remarkably short length of time, and that shows all too often. The way I see it, genuine improvement will not happen rapidly because of better tools, no matter how brilliant their design and clever their implementation; it will happen incrementally because of human editors treating it as a labor of love—taking meticulous care over details, emphasizing quality over quantity and accuracy over speed, and taking the time to get it right. Rivertorch (talk) 05:41, 3 June 2008 (UTC)

(<-)You're right, Rivertorch, better tools will not automatically cause genuine improvement. But they do make the task of genuinely improving the 'pedia far, far easier. The discovery of fire didn't automatically make our ancestors' lives as good as we have it today, but it created the potential where none was there before.--Aervanath lives in the Orphanage 07:22, 3 June 2008 (UTC)

Also, as for the "right" of topics to exist, see the discussion RyanReich started, above at #Inherent notability.--Aervanath lives in the Orphanage 07:22, 3 June 2008 (UTC)

I quite like Girolamo Savonarola's suggestion of creating preliminary lists and tables that can be used as springboards to creating genuine articles. Such a procedure would also provide an opportunity to structure and prioritise which articles should be created. For example, I think it would be worthwhile to prioritise creating articles on Third World places, since they are virtually non-existent at the moment. But the key is this: it is far better to have only a few accurate articles than to have myriad unreliable articles - particularly on "unfamiliar" subjects such as Third World places. I'd like to mention, also, that there are existing Internet resources that provide the type of geographical information we're talking about; one is the Getty Thesaurus of Geographic Names (TGN), which provides the most accepted form of each placename, variants, map coordinates, bibliographic sources, and often historical notes about the place (here's an example, for Ouagadougou). TGN is scholarly, well-funded (for this type of academic endeavour), very carefully thought out and maintained, but still faces the ongoing problems of keeping up to date with name changes, political developments, etc. If Knowledge is going to start a wave of new articles of this kind I believe they must be at least as good as those in TGN or else there's no point in doing it at all. Pinkville (talk) 19:35, 3 June 2008 (UTC)

Looking through TGN, it seems like a fine resource but I'm not too impressed, nor am I convinced that it will be that much help in most cases. Ouagadougou is a capital city, so the entry is relatively good. Compare our article on the city though--it's much better already. TGN is not terribly comprehensive (96 inhabited places for all of Cambodia), and generally the entries seem to give only a placename, the place in an administrative hierarchy, and coordinates (see this typical entry for Kântuŏt Sâmraông)--just what people here have been complaining about as insufficient. Mangostar (talk) 20:28, 3 June 2008 (UTC)
Also, for small villages like that one, its source is just US govt data anyways--why not go straight to the horse's mouth? Mangostar (talk) 20:30, 3 June 2008 (UTC)
You're highlighting the problems I have with this bot proposal. TGN has been around for many years, with funding, expert documentation personnel, etc. and it is still far behind achieving a global vision, keeping up with political changes, etc. Since en:Knowledge also doesn't have a truly global vision, the prospects for a bot-initiated project to create unbiased, accurate, reliably-sourced, globally comprehensive, and useful articles are rather gloomy. Any project to add a minimum of real information on thousands of places is going to have to rely on multiple (not always equally reliable) sources (including UN, TGN, other organisations) and this will require humans to evaluate each case - particularly for Third World places where the information is thinner on the ground. Pinkville (talk) 20:41, 3 June 2008 (UTC)
Agree with the point above, that we don't want too many micro articles. Unfortunately, and I speak from some experience here, there are several parts of the world where we have few if any really interested or knowledgeable editors, so we won't have the people to evaluate each case in all cases. Having said that, I think that the current arrangement, where people from each group generate the list of place names to have articles created for, and presumably in the process exclude names which they're dubious about qualifying for articles, will probably address at least in part some of those concerns. It will require having a few people willing to go through the effort of trying to determine which populated areas in Tonga, for instance, qualify as notable and verifiable from other sources, but I think that there probably are enough editors willing to do a little work in that area to get most of that done within a few months. China, India, Russia, and the other huge countries might actually wind up being the countries where it's harder to find information on all the settlements, but they should have more editors willing to work on them as well. John Carter (talk) 20:54, 3 June 2008 (UTC)

Is NGIA data reliable?

How much do we actually know about NGIA data and where they get their information from? If we're going to base thousands (hundreds of thousands?) of new articles (even partially) on this source, we'd better be sure we know what we're getting. Kaldari (talk) 19:29, 3 June 2008 (UTC)

Such information must not come from a single state-sponsored source. The United Nations or independent academic (therefore subject to peer review) sources (such as the TGN I linked just above) are more likely to be free from political or strategic bias. Pinkville (talk) 19:39, 3 June 2008 (UTC)
Wow, TGN seems to be quite a useful resource. I would feel much more comfortable using a source such as that than NGIA, which doesn't seem to cite any of its data. Kaldari (talk) 19:53, 3 June 2008 (UTC)
NGIA I think is based on the US Defense Mapping Agency and similar US government attempts to find out just where every potential bomb target in the world is. OK, they probably had other reasons too, but much of the data was compiled by the Defense Department. So, considering it is US Government, although I can't swear to this, I'm assuming many/most of the coordinates are determined by the military's satellite network, and I personally tend to think that's probably about as reliable a source as we can get. The only places where they might have a clear COI is regarding domestic locations, but the US locations had been run by a bot several years ago and are probably, basically, already finished. John Carter (talk) 20:19, 3 June 2008 (UTC)
I'm kind of confused by worries about "political or strategic bias". The coordinates can be checked against a satellite map to confirm that a village actually is at the location given. If there is a village at the coordinates listed, I can see no political or strategic reason that would cause the US govt to misname it. Mangostar (talk) 20:24, 3 June 2008 (UTC)
But we're not only talking about map coordinates, other data has been discussed for inclusion. Regardless, when non-single-state sources are readily available, reliable, and provide multiple sources, they should be used under these circumstances to underscore NPOV. Pinkville (talk) 20:30, 3 June 2008 (UTC)
"I can see no political or strategic reason that would cause the US govt to misname it." Access to reliable data might be a problem, though, even for US government agencies. Yaan (talk) 20:33, 3 June 2008 (UTC)
One simple example. Myanmar is a politically-charged name that has been internationally adopted as the official name of Burma - even though many (probably the majority) Burmese oppose the name (being associated with only one ethnic group in the country). Because the military government has been supported by various nations (e.g. the US, et al), the use of Myanmar by the latter parties tends to support this régime - and so you have a political/strategic reason to "misname" the country. Pinkville (talk) 20:55, 3 June 2008 (UTC)
I was just addressing "political and strategic bias" because it was brought up by Pinkville. As a practical matter, basically all the statistical data about small towns and villages will necessarily come from national censuses and hence single-state sources. As for TGN, as I noted in a separate section above, it includes relatively few entries and often sources its data exclusively to US govt databases anyways. I am more than open to concrete criticism of the US data where problems have been demonstrated, but I think at this point no one has shown such problems as would disqualify it from being a WP:RS. It appears to be the best and most complete data set that exists. Mangostar (talk) 20:41, 3 June 2008 (UTC)
But notice that the discussion has already implicitly excluded information from non-governmental organisations, which often have more reliable (but not necessarily officially sanctioned) information from empirical evidence on populations, etc. My point in mentioning TGN wasn't to suggest it - alone - as an alternative to US government sources, but to indicate that any one source will never be adequate, and so relying on one is problematic. Furthermore, to rely on only US governmental sources is to adopt whatever systemic biases they have and to implicitly suggest - yet again - that the US speaks for the world. So I'd favour using UN data from the start - with other sources, including NGOs. Pinkville (talk) 20:55, 3 June 2008 (UTC)
One point about TGN. Yes, they use US government data at times, but they also research its reliability and neutrality. Pinkville (talk) 20:57, 3 June 2008 (UTC)
Perhaps we are not too far from agreeing here. However, I am still totally lost on how a site that only lists geographical coordinates can have any "systemic bias" whatsoever. As for the US speaking for the world, it seems totally irrelevant whether we use US sources or non-US sources as long as they are reliable. I am all in favor of using as many sources as possible, but the fact that the NGS is willing to list locations in its database solely on the basis of their inclusion of the same US database that we will be using (unless you think they somehow know that the US govt coordinates for these 96 towns in Cambodia are more reliable than US govt coordinates for other towns in Cambodia) indicates that the NGS thinks the US govt source is reliable, and we should too. Mangostar (talk) 21:04, 3 June 2008 (UTC)
No, I don't have any problem with map coordinates being taken from a US government source, I simply think that any additional information should come from other sources/a range of sources. From the start I think it's important to keep in mind that the appearance of a global vision is an important component of attempting to represent a global vision. Fundamentally, I'm concerned that already marginalised countries don't get short shrift in this kind of project (one of the reasons I'm against the bot proposal), and I'm concerned that a kind of slippery slope could materialise in which accepting one (or another) official government source for certain (potentially politically significant) information might lead to allowing biases to stand for real information. Humans creating one article at a time can more easily control for such problems (as difficult as it is, regardless), but with a bot, I can see thousands of nuggets of disinformation filling in existing gaps in Knowledge and standing uncorrected for very long periods of time. To give a more concrete example, it's routine for marginalised peoples to be undercounted in (even well-conducted) census figures; in my country, Canada, native people, homeless people, the poor, etc. are slightly under-represented in the census because they don't have the same access to the communication tools (mail, telephone, internet, etc.) that are or might be used to count them (and this has important economic and political implications in terms of funding for services, etc.). Nevertheless, Canada presumably operates an efficient and comparatively unbiased census... but what are the chances for countries that have fewer resources, more corruption, ongoing internal conflicts, etc.? And what if the main sources we intend to use are based in a country or countries with deep involvement in the circumstances of such already unreliably documented nations (e.g. what quantity of salt should we take with figures produced by any US governmental organisation concerning Iraq?). I just think these are some of the issues we should keep in mind while devising any scheme to create articles on places in such an automated fashion. Pinkville (talk) 01:28, 4 June 2008 (UTC)

To bring up Mongolia again, NGIA has a lot of "populated places" whose names contain "Hüryee", "Hiid", or "Dugang" (monasteries and temples), most of which were either destroyed or renamed in the late 1930s. They also have some populated places that contain "örtöö" (postal horse relay station), a name that seems also very much out of fashion now. A quick Google Maps survey by me, of places whose names contain "Hüryee", gave the impression that most of them are uninhabited, renamed or even that the NGIA location is dozens of kms off.

What is maybe worse (but maybe also more specific to Mongolia) is that they call the district centers by their official names rather than by the official name of the district. Mongolians will almost universally use the name of a district both for the district and the district center. If you ask for the right way to "Jargalant" in Khövsgol, people will always send you to the district center of Jargalant district (the official name of which is Orgil), and never to the district center of Tömörbulag district (the official name of which is Jargalant). Same goes for locally-made motorist or tourist maps. NGIA's "BGN standard" does it just the other way 'round. This is not a case of bias, just a case of inclusionism in the case of monasteries and horse relay stations, and poor data in the case of district centers. Yaan (talk) 18:07, 4 June 2008 (UTC)

French communes, anyone?

Back in February Blofeld, AlbertHerring and myself created thousands of articles on towns in France. The issue was raised at the AN, and Blofeld etc. were allowed to do so. What makes, say, Burkina Faso different? Because we, not a bot created them? We certainly acted like bots, creating 4 articles a minute when we could've been doing other things. Whether or not the article is created by a bot or a human is no different: it gets created nonetheless. They started out worse than this and look at some of them now. Also, many of the involved Wikiprojects have very few editors (check out Knowledge:WikiProject Gabon)...I'm an Editorofthewiki 21:03, 3 June 2008 (UTC)

So now Knowledge has Skin-jobs?  :) davidwr/(talk)/(contribs)/(e-mail) 21:56, 3 June 2008 (UTC)

Size limit - poll or algorithm?

First of all, may I say—and I should have said it 24 hours ago—congratulations on putting together a well-thought out and workable project. My one reservation remains that in essence the scheme puts the cart before the horse. Instead of wikiProjects being given the means to create the articles they want, they will be given the articles and must learn to want them—"you marry for duty; love comes later." The key factor in closing the gap between the needs of the users and the output of the bot will be the lower limit of size. I suggest there is a need to develop a robust algorithm to determine the optimal size limit. The current proposal, asking the users, seems to me rather like developing a traffic plan for a major city and then doing a vox pop on the best places to site traffic signals. An algorithm should take account not only of current size limits on the wikiProject, but of related factors such as the quality of the average article at the lower limit, and unrelated factors such as the number of active members (on geography-related articles) and the frequency of article improvement drives. There should not be a need for a separate algorithm for every single wikiProject, but at the same time I would not favour a one-size-fits-all model. I have written above about systemic bias; I believe the algorithm should to some extent incorporate systemic bias. In other words, the requirements of Knowledge:WikiProject Georgia (country), for instance, will be different from those of Knowledge:WikiProject Ireland. My own suburb of Dublin has its own article, of which I'm quite fond, but I would be very surprised if WikiProject Georgia were equally enthused about an article on an equally undistinguished suburb of Tbilisi. In summary, an appropriate cut-off, scientifically arrived at, provides the best chance of creating those articles—and only those articles—that will most benefit Knowledge. Scolaire (talk) 21:16, 3 June 2008 (UTC)

I would like to respectfully disagree with the above editor. The proposal, as I read it, calls for the lists of potential articles to be created be generated, but then the various involved WikiProjects will be given the opportunity to pare down any names whose notability they can't be sure of and remove them from the final list of articles for the bot to create. Granted, they could create those articles anyway, without the bot, but in a lot of cases the other sources available, like travel guides and history books, won't have the data the bot will. If I'm wrong, I would appreciate clarification of where, but that was the impression I received. John Carter (talk) 21:49, 3 June 2008 (UTC)
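For illustration only, the kind of per-project size cut-off Scolaire describes could be sketched roughly as follows. Nothing here comes from the actual bot proposal: the `ProjectProfile` fields, the 0-to-1 quality scale, the `0.01` weight and the formula itself are all assumptions made up for this sketch, chosen only so that a project with more active editors and better existing small-place articles gets a lower population cut-off.

```python
from dataclasses import dataclass


@dataclass
class ProjectProfile:
    """Hypothetical per-WikiProject inputs named in the comment above."""
    current_min_population: int  # smallest settlement size the project covers today
    quality_at_limit: float      # assumed scale: 0.0 (bare stub) to 1.0 (well developed)
    active_members: int          # editors active on geography-related articles
    drives_per_year: int         # frequency of article improvement drives


def population_cutoff(profile: ProjectProfile, weight: float = 0.01) -> int:
    """Return an illustrative lower population limit for bot-created articles.

    The formula is an assumption, not a tested model: editing capacity
    (members scaled by improvement drives) multiplied by the quality of
    existing articles at the current limit pulls the cut-off downwards.
    """
    capacity = profile.active_members * (1 + profile.drives_per_year)
    factor = 1.0 / (1.0 + weight * capacity * profile.quality_at_limit)
    return max(1, round(profile.current_min_population * factor))


if __name__ == "__main__":
    # Invented numbers, purely to show that two projects get different cut-offs.
    well_staffed = ProjectProfile(1000, 0.8, 50, 4)
    thinly_staffed = ProjectProfile(1000, 0.2, 3, 0)
    print(population_cutoff(well_staffed), population_cutoff(thinly_staffed))
```

One function with per-project inputs matches the request for something between a separate algorithm for every wikiProject and a one-size-fits-all model; whether any such formula would beat simply asking the projects is exactly what is disputed in this section.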

Stubs? A deletionist dream!

  • In wikipedia experience, it appears that attempts to automatically create stubs are a deletionist's dream to quickly post AfD on the article based mainly on low attendance and, secondarily, notability. If deleted, that would later create havoc for people who would want to reinstate the article and then go through the deletion review. This seems to be a trend now to make it harder to create articles on certain topics. Having those articles auto-made means it is just a matter of elapsed time before the AfD nominators want to stroke a win ego with easy pickings. — Dzonatas 23:36, 3 June 2008 (UTC)

See Knowledge:Articles for deletion/Amurn. The deletionist challenge has already begun! MWAH-HAHAHAHA! (sorry, evil laugh reserved for Blofeld; never again). But seriously, this really could disrupt the project majorly. I'm an Editorofthewiki 01:39, 4 June 2008 (UTC)

If the community approves the bot, then I think deleting its articles will be more difficult based solely on the fact that "consensus has already determined that these articles should be created". *shrug* That's what I'd hope at least. --Falcorian  02:52, 4 June 2008 (UTC)

I think the Philippine WikiProject will opt out of this

The editors of Philippine-related content had agreed on a tentative consensus that barangays--the lowest level of political unit in the Philippines, currently totaling to 41,995, and are roughly the equivalent of villages in other countries--do not inherently deserve articles. In the next higher level of political units--municipalities and cities (total: 1,631)--all of them already have articles with 2000 census info and locator maps. So this proposed bot job will have very little use for our particular needs. --seav (talk) 07:26, 4 June 2008 (UTC)

Can I point out that this drive is intended to cover countries which have very poor coverage (much of Africa, Asia and Latin America) and the weaker parts of Europe, not those countries like Philippines and Italy which already have articles on all of the smaller towns or communes. Thank you ♦Blofeld of SPECTRE♦ 09:58, 4 June 2008 (UTC)

Ah, but there is no mention of which countries are actually included in the scope of this bot discussion. Also, there are no articles on 99% of barangays in the Philippines and barangays are the closest equivalent of the villages that this bot project is intended to create articles on. So I'm right to assume that the Philippines is part of the intended scope of this bot project, and I'm preempting by saying that the Philippine WikiProject would likely opt out of this project given that this project intends to consult the various country-based WikiProjects. --seav (talk) 10:26, 4 June 2008 (UTC)
You're completely missing the point of what is being proposed. It is exactly the responsibility of the new wikiproject and bot group to discuss with the wikiprojects first whether places are notable enough to start. When we come to the Philippines we ask "Hello WikiProject Philippines. Which articles are notable or need starting?" You reply "99% of barangays are missing, but they aren't worthy of note", so we say "OK, moving on with the next country", plain and simple. ♦Blofeld of SPECTRE♦ 10:56, 4 June 2008 (UTC)
No, I get the point. That's why I created this section: to essentially say "skip the Philippines." Read the title of the section. --seav (talk) 11:11, 4 June 2008 (UTC)
I suppose what Blofeld means is that it is not yet time to look at exactly which countries will be excluded from the scope of the bot. There are other things to consider first. Waltham, The Duke of 21:56, 4 June 2008 (UTC)

I think this raises a good point, the very countries that wikipedia supposedly has systemic bias against that this will supposedly fix, are the ones that are likely to have no wikiproject editors with the knowledge to work with this bot, and many that do are probably already done. I think giving example countries might be a very good idea at this time. MickMacNee (talk) 22:36, 4 June 2008 (UTC)

Um, I think that's actually all countries, although Mongolia doesn't have one specifically devoted exclusively to the extant nation of Mongolia. John Carter (talk) 16:47, 5 June 2008 (UTC)
Actually, I noticed that you're the only member for a whole lot of those WikiProjects, so I guess you're going to be hearing from us a lot ;P --NickPenguin(contribs) 22:52, 5 June 2008 (UTC)
I agree, many were created as subprojects of the existing regional projects, at least in part as ways to allow editors who have an interest in only one country in a region to be able to keep track of the articles of interest to them. All those you mention do however use the regional projects banner. And, it should also be noted, many of these are comparatively new projects. However, the fact that there is so little interest in so many countries, however, is I think one of the reasons why it makes sense to use the automatic creation method, unless we want to continue to see huge, gaping lacks of content regarding several parts of the world. John Carter (talk) 22:56, 5 June 2008 (UTC)
I am inclined to agree. I also think there's something to the adage that if you build it, they will come. No one (or very few people, it seems) want to join a WikiProject with only one or two articles to edit. --NickPenguin(contribs) 22:58, 5 June 2008 (UTC)
No, if there are not enough editors interested in doing articles on a country, then no one will join the project (except of course JC). Johnbod (talk) 02:23, 6 June 2008 (UTC)
Probably right, although it should be noted that, to date, many haven't done much to publicize themselves. However, as stated elsewhere, it's probably better to have one comparatively inactive project, with a common banner, than a similarly inactive project with its own banner which has to be replaced with that same common banner somewhere down the road. However, that would, I think, make it more important to ensure that the articles relating to those areas which are of encyclopedic merit, if not particular popularity, to be generated somehow, even if by bot if that is the only practicable way to do so, otherwise we will continue to have fairly large, rather unpleasantly obvious, gaps in our coverage, which most would probably prefer not to see. John Carter (talk) 02:45, 6 June 2008 (UTC)

Notable houses

There are articles about individual houses (see James B. Simmons House for example). So, articles about many small villages would also satisfy WP:Notability. The size of a village is not a consideration. The notability is. Biophys (talk) 04:09, 5 June 2008 (UTC)

It does not follow. I could use your argument to say that there are articles about individual people and so articles about many small groups of people (e.g., show bands) would also be notable. If there are small villages that are notable, then it's because they are noted by many secondary sources and not because there exist articles about houses. --seav (talk) 04:52, 5 June 2008 (UTC)
I'm unclear as to which side of the debate Biophys is actually espousing. Girolamo Savonarola (talk) 04:54, 5 June 2008 (UTC)
I *think* what Biophys is saying is that, if a house can be notable, then size is irrelevant. So in other words, we should be creating articles only on notable places, rather than arbitrarily trying to say "any place with a population over X is inherently notable". I didn't make a !vote in the straw poll, but I must say I agree with this reasoning, and also am against increasing the page count by so much simply to satisfy /one/ issue of systemic bias -- but no others. ♫ Melodia Chaconne ♫ (talk) 13:38, 5 June 2008 (UTC)
Let's have a bot to create an article on every house in the world ;-) Scolaire (talk) 08:17, 5 June 2008 (UTC)
Be careful, county tax assessors have databases on every property in their county, and some of these are online. Phlegm Rooster (talk) 16:45, 5 June 2008 (UTC)
I suggested one for every star in the galaxy in the Straw Poll. 100,002,000,000 articles here we come! Ferdia O'Brien /(C) 14:20, 5 June 2008 (UTC)
If you have a couple of publications, specifically about this star, you can go ahead and create an article. Same thing applies to the houses, to villages, and to anything at all. The bot should be looking for publications about the places (I am not sure if it does), and if he finds something, an article can be created. There was a similar discussion with regard to User:ProteinBoxBot. See here. The references about proteins were taken from a database, which resolved the issue. Biophys (talk) 16:42, 5 June 2008 (UTC)
Yeah NASA, ESA and RSA all have databases detailing most of them, a bot could reference these easily. Ferdia O'Brien /(C) 17:06, 5 June 2008 (UTC)
I do wonder, though... If there will be an extra-terrestrial civilisation a few million years from now a few million light-years away from here (the current ones cannot have noticed us yet), and that civilisation will produce an encyclopaedia much like we do, will our Sun be notable enough for inclusion? And if our planet and civilisation actually find their way into such an alien, future compendium of knowledge, what monumental act of stupidity will humanity have done to manage to attract inter-stellar attention and thus become eligible for inclusion? Food for thought, eh? Waltham, The Duke of 23:22, 5 June 2008 (UTC)
There is a serious question here whether the present size of a village or settlement is directly relevant to its notability. There are or have been several ghost towns which have little if any population today, but are of extreme historical notability which would deserve a separate article. What I am going to try to do with many of the areas, starting I think with Anguilla, which I've already done, is add on the list of geographic articles a notation on a specific other source which can be used to help establish notability. Maybe that's the way that they could all be done. The bot could then choose those names which already have other, specifically indicated, sources to establish their notability, and when the bot is done creating the articles those references can be added by whomever to help establish their notability. John Carter (talk) 16:52, 5 June 2008 (UTC)
I have started becoming rather confused by the entire discussion. Researching notability for articles does not seem very massive or robot-like to me, but rather a part of the usual, manual article-creation process. And one would think that notable places will have been mostly covered anyway, despite the general coverage level for settlements at large; this kind of information is more atlas-like than anything else. Although the point about ghost towns is a valid one, the only way I can see it usefully applied on a 'bot process is to take not the current population of a settlement as standard for inclusion, but its highest ever. The alternative source could be used to lengthen and enrich an article, but since the articles created by the automatic process are not supposed to be deleted, how would notability thus established affect anything? Waltham, The Duke of 23:22, 5 June 2008 (UTC)
Actually, for what it's worth, it seemingly isn't the case that several locations in Africa in particular have gotten any coverage, despite in several cases being rather large cities, because (1) until the recent creation of the Africa project banner, it was less easy to keep track of articles, and (2) in many of those countries in particular there haven't been many editors interested in the subject in the first place, and they have often been more focused on other matters, like wars, governments, famines, and such. The only thing I'm seeing based on the proposal is that, at the least, any new articles about places which are notable, but not in countries which have gotten much attention, will be more than just a single sentence stub, which many of the current similar man-made articles are. Possibly, if the bot can be adjusted to it, and I think it probably can, a second run can be made later to ensure that many of those existing weak articles contain more information than they now do as well. The same AutoCreation technique was used regarding several US Congressmen a few years ago as well, as I've seen having edited some of those articles for the first time, noting that they have hidden text about being autocreated in them. Unfortunately, what people think should happen often times isn't what really happens. I grant you that several of the autocreated stubs may be no more likely to get attention than many of the existing stubs made by humans are, although in several cases many of these missing articles may be more important than those. But I think it does make sense that we at least make an effort toward addressing some of the weaknesses we do have, by what seems, in several cases, to be the most effective way possible. John Carter (talk) 23:47, 5 June 2008 (UTC)

End the discussion already

It's quite clear that the majority of viewpoints on this bot are from a philosophical standpoint, and therefore unresolvable if there are enough people to blindly support it, which they do so easily and without much supporting text, because there is no tangible 'harm' from the proposal. i.e. thousands of untouched bot created stubs that all look the same will do no harm to wikipedia. And there is actually no consequence for anyone who does support or oppose it, as anyone can walk away and never be affected by this issue again, the stubs created will still be there.

People don't appear to be willing to address the major issues raised by this bot about changing how wikipedia creates articles, uses redlinks, encourages stub development, or determines if something is 'worthy of note' (although yes, there is a mini discussion above). People supporting seem content to treat approval of this bot as an issue solely pertinent to geo-stubs, as if there isn't a wealth of other database resources on notable topics out there.

If you look at the wikiprojects of some of these supposedly discriminated against and under-represented countries, the only members are Blofeld and one or two others, which casts doubt on the proposed wikiproject participation model. The response from wikiproject Philippines only cast further doubt on the merits of this assertion. With Blofeld's many comments on this page, this bot looks to me more and more like a personal tool to make his perceived urgent mission of documenting every place on earth easier (despite the fact wikipedia is in no rush, which is oddly quoted on the support side, but not the oppose), as a counter against the proliferation of pokemons or some other perceived current failing of wikipedia.

I personally do not buy the argument that there are millions of users out there just waiting for someone to create a stub of their town, so they can edit it. Nor are there people out there castigating wikipedia for the lack of these stubs. But if others do think that, then let's discuss that for what it is, a fundamental change to the principles of how wikipedia serves the public, and get it enshrined somewhere more concrete than a village pump discussion. Because as I understood it, editors are readers, and vice versa. It is the encyclopaedia you can add to, not the encyclopaedia you wait for others to create the basic framework of before you read it/expand it.

Frankly, no one on either side has any practical or documentable evidence for either position, the debate is currently very much an exercise in assertion without proof. I can point at Category:Villages in Botswana as supporting my position, where a single user has mass created 222 virtually identical stubs by hand, and based on a significant sampling taken before I got bored, not a single one has been expanded since, while one was Afd'd and rejected because "everywhere is notable". And Botswana isn't exactly lacking a healthy amount of citizens/expats with an internet connection, even if they don't necessarily live in these villages. They are all linked to and not orphaned, but only through the use of templates. The only two substantive articles I found, with real world knowledge, and not just statistics and location information (barring the creation info on the presence of a primary school), had actually been created before the mass stub project, so needing a stub didn't seem to put those users off creating/editing. However, supporters will merely point out that these 222 articles have only existed for a few months, and they can presumably all be found on an atlas.

So I am saying, even though I oppose it from a philosophical standpoint, let's just approve it now, and move on. With the lack of a higher discussion, if it's a success, great (but in my opinion we'll never really know), and if it's a failure, well, where's the actual harm? (as long as someone recodes special:random) MickMacNee (talk) 16:01, 5 June 2008 (UTC)

The general purpose of having a "stub that may never be expanded" is to fill a hole: Such a stub should be created when the absence of an article on the subject weakens Knowledge. This was much more important in the early days, and still has importance in areas that are still "greenfield" such as topics relating to some 3rd world countries. If, for example, INSERT_COUNTRY_HERE had no city-articles for the cities other than its capital, then it would be imperative to create stubs for the other major cities: Not having such articles is a glaring hole in Knowledge. It would be desirable but not mandatory to create stubs for lesser cities, and at some point down at the village or community level you could argue it isn't even desirable.
The question of "where to draw the lines" between "we have to have it, even if it's a stub that may never be modified," "it would be nice if we had something, even if it were only a stub," and "if you can't say anything about it more than a stub, it's a waste and doesn't belong" doesn't have a consensus answer. However, if we or the editors in interested WikiProjects can agree on a group of geographical locations that almost everyone agrees should have an article, even if there are many more locations that do not have consensus or which by consensus are not worthy of a stub, then we can proceed to create those for which there is a consensus to create a stub.
As for the other issues you raise, such as weblinks, yes, those need to be hashed out. davidwr/(talk)/(contribs)/(e-mail) 16:21, 5 June 2008 (UTC)
The lack of interest in a higher discussion is the most disappointing part of this debate, and the majority of supporters in the poll seem to support not because they (visibly) reject the opposing arguments, but because they don't care to take part and are willing to, as you say, let what will be, be as long as it does no harm to them. Unfortunately, the use of a poll (even combined with an unstructured debate) in this proposal has led to the appearance of a democratic process, which it is not. Perhaps the debate would have been the same without the poll, but then there would be no excuse for ignoring what is clear anyway: there is no agreement here. Ryan Reich (talk) 17:20, 5 June 2008 (UTC)
Part of the reason there is no agreement is that several of the parties involved have voiced opposition based on a very incomplete, if not potentially in fact inaccurate, view of what is being proposed to be done. A clear idea of what is being proposed can be found at the Knowledge:WikiProject Missing encyclopedic articles/Places/anguilla page, which I think clarifies that in fact not all the names being listed are going to necessarily have articles created, but only those which can have independent reliable sources attesting to their notability. On that page, I have added notices regarding the comparatively few which I have to date been able to verify meet such independent notability criteria. The others which don't have such established notability would not be expected to be created by the bot. However, having the relevant data available on such a sub page, should such an article be created later, wouldn't necessarily be bad, as it would allow the information to be more easily inserted after the article was created. John Carter (talk) 17:31, 5 June 2008 (UTC)
Dually, an unknown but seemingly large number of the supporters, which include some of the major customers of this bot, also insist that the bot's scope be wider than this (though it's hard to tell at this point). I do approve of what you're exhibiting, and if it were clear that this is what would be done, I think everyone else would too. That it's not clear is a combination of it not being explicit in the proposal, and the desires of the more ardent inclusionists. Since the support and opposition seem to meet at this point, perhaps a third vote is in order to determine if everyone can agree on it, assuming of course that enough people aren't already tired of the subject. (As for this higher debate I wish would happen, perhaps it is hasty to expect it to happen all at once, right here. Anyway, consensus in the third vote would be an important part of that debate.) Ryan Reich (talk) 18:06, 5 June 2008 (UTC)
I can only say that, for what it's worth, on a lot of the smaller countries, like those in the Caribbean and Oceania particularly, and some in Africa and elsewhere, I think the lists of places to be created are definitely going to be both created and reviewed by me personally before implementation, as I am often one of the few if not only individuals who have shown any interest in those areas. I ain't exactly looking forward to the amount of work that might be putting me in for, but I can say that only those which have, so far as I can see, at least two or three reasonable paragraphs of information available from some of the geographic encyclopedias or numerous travel guides, both of which I have fairly easy access to, will be moved by me into the "sourced" sections. John Carter (talk) 18:39, 5 June 2008 (UTC)
Hence my comment about the proposed idea of mass participation of wikiproject members appearing misleading looking at some projects and this discussion, as between you and Blofeld, this looks more like a handy tool for completing a personal one or two-man mission, and not a community/wikiproject requested initiative. Several top level topic wikiprojects barely receive regular attention from a few core members in my experience. Not that there is anything wrong with that, or is there?. Do two men make a wikiproject? It's an interesting question. MickMacNee (talk) 19:28, 5 June 2008 (UTC)
And, dare I say, a completely irrelevant one, particularly as there are 14 people to date who have signed onto the project as per section 3 of this page, which evidently you didn't see before. If the above is some sort of attempt to indicate that the choices made will be biased in favor of creating articles, then I strongly suggest that you actually look at the list I linked to, in which about 1/3 of the articles suggested actually qualify as being notable. The real question, and perhaps a bit more of an interesting one, is whether this wikipedia is intended to maintain a systemic bias, based on the interests of the existing editors. The majority of the opposition to this proposal seems to believe that should be the case. Alternately, we could, at least potentially, try to at least appear to be interested in presenting an accurate summary of the world, something I think most of us would agree we should be doing. And, perhaps, by creating these articles, the other extant projects, including the historical ones, the other geographic ones (like Rivers, Mountains, Volcanoes, and the like), may work on these related articles as well. I agree that there is little if any sustained activity in any number of fields. The perhaps most interesting question, to my eyes, is whether we are going to decide that, on the basis of lack of participation of the existing editors, we are going to allow a very large segment of the known reliable, verifiable, information on clearly notable parts of the geography of the world, which is probably one of the most important aspects of any encyclopedia, to not be included because of lack of interest of the majority of the editors. I honestly think that the result of this discussion may well decide that question, for good or bad. John Carter (talk) 20:29, 5 June 2008 (UTC)
As said, you're just repeating philosophical assertions that can't be proved or disproved with facts and can only be argued against with philosophical assertions. There is still the continuing insistence that this mission to document every part of the world is priority number one for wikipedia because if we don't, somehow wikipedia is so obviously broken. All this despite the fact the opposing argument that these permastubs will take forever to be developed, is rebutted with the fact that wikipedia is in no rush and wikipedia is imperfect. Can you honestly not see how this has more of the look of a personal mission of zeal than any coordinated effort to fulfill the goals of wikipedia (which are to present knowledge, not information). The newly signed up project participants are to be applauded, it doesn't change the fact that projects like Lesotho etc etc, have but two members. MickMacNee (talk) 23:02, 5 June 2008 (UTC)
You know, I have never seen anyone make a statement such as you have said above, that documenting every part of the world be priority number one. You, however, seem to have seen it as a "continuing" matter. As I have indicated above, regarding the Anguilla page, in fact 2/3 of the places mentioned do not, so far as I can see, qualify. In short, sir, if I may be so bold, it seems to me that all you are doing is repeating your own philosophical assertions, and even going, possibly, further, in apparently referring to facts of this "continuing" matter which I for one have never seen. Also, regarding the matter of the projects with few members. You will find that historically there have been several projects, many with their own separate banners, which became inactive, and had to be merged into some other project. I think Eritrea was one such. By creating these projects first, using a single banner, it became much less likely that the same would have to be done in the future. Also, specifically regarding African projects, you would know, if you had looked into the history of the matter, there were no projects for any of the Subsaharan African countries other than South Africa for some time, even though there were several editors interested in perhaps only one or few countries, but no way for them to keep track of the articles relevant to that country. Evidently, you saw fit to neither AGF or investigate the subject to find that out. Regardless, however, I think we would all be better served if you might actually confine your own philosophical commentary to the matter at hand, rather than raising these almost completely irrelevant impugnings of others. John Carter (talk) 23:37, 5 June 2008 (UTC)
I'm frankly not sure what you're on about with the first part above, but the second, I think a lack of apparent project members, and now you're saying entire country projects as well, is entirely relevant given the repeated explanations above about how the bot is a mere part of this whole process to be run in conjunction with wikiproject members. MickMacNee (talk) 23:59, 5 June 2008 (UTC)
You admitted above that you do not know much about a subject which you still feel free to jump to conclusions about. I strongly suggest that you make an effort to become familiar with matters before jumping to conclusions about them, otherwise your comments are without substance and can be, at least potentially, seen as being irrelevant. In the case of the existing countries of Oceania, Africa, the Caribbean, and the like, the work could be done in conjunction with the continental or regional project, of which the others are subprojects, whatever their names, and I'm assuming that in fact they would be. You did notice that in almost all cases they are listed as subprojects of a larger project, right? John Carter (talk) 01:23, 6 June 2008 (UTC)
I find my interest in any conversation wanes when it descends to the level of 'you don't know what you're talking about'. Like I said above, I've got to the point where I don't give a crap if it goes ahead or not, because there is no way that a support or oppose opinion can be factually made. Whether there are 2 or 2000 Oceania project members actually waiting for this bot, can probably be judged by the number of them that have arrived at this discussion. MickMacNee (talk) 01:37, 6 June 2008 (UTC)
If you have no interest in the subject, as you said above, and apparently have no interest in familiarizing yourself with it before jumping to conclusions anyway, then may I suggest to you that you perhaps leave the conversation to those who do? And I note once again that you feel just as free to jump to conclusions regarding the activities of others. I'm assuming, based on your comments, that you verified which if any projects the various supporters and opposers of the proposal above belong to. I know from direct experience that at least several members of the Africa project voted above. While there is no Oceania project, which I assume you knew, there are Melanesia, Polynesia, and Micronesia projects, and they don't have many members, or, likely, many articles to be created from the bot, so I would assume their interest would be less. But I do see some evidence above that the most relevant continent, Africa, has had some of its most active editors supporting this. Oh, and I'm still waiting for you to point out where someone said this was the most important task pending for wikipedia, or however you phrased it above, by the way. John Carter (talk) 01:43, 6 June 2008 (UTC)
Which members? How many? For an entire continent. And Blofeld's comments make it absolutely clear to me that he sees this bot as a priority task for wikipedia, but I'm repeating things I've already said. 02:09, 6 June 2008 (UTC)

Ahh! Diametrically opposed philosophical views are flying! Whatever can we do? Why don't we pick a name for a dedicated WikiProject, and with a fresh slate, the interested parties can find consensus on the fine details of the bot. We all promise not to do something crazy. --NickPenguin(contribs) 02:14, 6 June 2008 (UTC)

Personally, I think the project already exists, Knowledge:WikiProject Missing encyclopedic articles. It is to a subpage of that project that we're placing the lists of places. I have also noted that a few other projects have expressed concerns about their own missing articles. WikiProject Mammals, for instance, has indicated on its talk page that it still has several articles that it believes to be notable which have yet to be created. If, and that is a big if, we can find some sort of public domain source to at least begin articles on the ones which that project believes are more or less required, and I think they were considering those to be the genus as opposed to species articles, then it could potentially help there as well. The same may well be true for any number of other projects. And, like I said, I think that at least I am going to try to ensure that, however long the lists are, I am going to try to ensure that only those which I can verify through sources at hand to me definitely have enough potential content to merit at least a very good stub, and probably a start, will be those which I request be created. However, I do think that it would also make sense to check with all the various geographic projects, including the WikiProjects Rivers, Lakes, Glaciers, Mountains, Volcanoes, and others to see if they wish to participate as well. I know the sites we're using list other geographical features as well, and if nothing else we might be able to add infoboxes to those that don't have them and at least a little content to those which are weak stubs. John Carter (talk) 12:58, 6 June 2008 (UTC)
Finally some recognition from someone that this has nothing to do with systemic bias. This issue affects all topics where databases can be found and stubs created using bots, which is a fundamental change to the way wikipedia creates articles, worthy of more discussion than the drive by support votes stacking up above about 'bias' being the reason to do this. There's absolutely no systemic bias to the argument that we don't have enough coverage of mammals etc etc. MickMacNee (talk) 13:08, 6 June 2008 (UTC)
No, that isn't what I said at all, either regarding your first contention that there isn't a systemic bias or the second that the Mammals project case I mentioned is even remotely relevant to this current discussion. I mentioned that only because, like I said, if there is some interest in having similar work done elsewhere, and Mammals was the example that came to mind, then, when this problem is addressed, there is no good reason to think that the same could not be done elsewhere. John Carter (talk) 14:44, 6 June 2008 (UTC)
They are the same thing, using a bot to increase the coverage of wikipedia by auto suggesting articles from databases, exposing the systemic bias argument as a mere makeweight for making changes that go further than just correcting bias in geo-projects, despite the fact no one has an effective argument on whether this supposed bias is a good or bad thing presently, or if it is a priority to fix quickly with bots, or just another facet of the encyclopedia that anyone can edit, which will improve as we move towards the deadline. MickMacNee (talk) 16:47, 6 June 2008 (UTC)
To address the concern that stubs will never be improved/expanded, can the bot be allowed only if the related-country project has at least 10 members with 500 edits (or more) who have been active in the last 6 months? This way the project will have to invite/establish members first before requesting a bot-run. Thoughts? Regards, Ganeshk (talk) 23:20, 6 June 2008 (UTC)
Doesn't even approach the inherent problem (pardon the pun). How about it can only be used if all members of the project take an oath that they will never attempt to claim that geographic locations are inherently notable, and will only use the bot to create articles about places that have been examined directly and in detail in multiple independent third-party sources? Kww (talk) 00:04, 7 June 2008 (UTC)
Uh, while we're at it, why don't we make them take an oath that they are not and never have been a member of the Communist Party? They can claim whatever they want.--Prosfilaes (talk) 01:48, 7 June 2008 (UTC)
I'm not sure if my point flew right past you or your point flew right past me, so I'll repeat without sarcasm: the chief problem with this bot is that it will be used by people that have declared their topic area exempt from any need to prove notability. Kww (talk) 02:44, 7 June 2008 (UTC)
Are these people who, somehow, have failed to figure out how to create new articles? If they create non-notable articles, you put them up for AfD. People creating lots of non-notable articles is a problem that the community knows how to deal with.--Prosfilaes (talk) 03:56, 7 June 2008 (UTC)
One of the tiny places in Category:Villages in Botswana was put up for AfD, and speedy kept after about 3 votes as 'all places are notable'. I'll leave you to find which one it was. MickMacNee (talk) 11:53, 7 June 2008 (UTC)
So what you're saying is that consensus is that these articles are notable. Change that, and then AfD will clear them out. So long as that consensus stands, then the subject of the article is notable.--Prosfilaes (talk) 23:46, 7 June 2008 (UTC)
10 members? Each project could get potentially thousands of new articles. This is my main issue with creating articles en masse like this. It takes a few seconds for the bot to create them, but adding meaningful content (something other than statistics and other numbers) can take quite a while, gathering sources that may only apply for one article and writing actual prose. If the project gets as few as 1000 articles, and each takes 20 minutes to expand (meaningful expansion), each of those 10 members is going to have to spend more than 30 hours expanding. Mr.Z-man 03:19, 7 June 2008 (UTC)
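The workload figure above can be checked with a quick calculation. This is only a minimal sketch using the numbers quoted in the comment (1000 articles, 20 minutes each, 10 members); the function name `hours_per_member` is a hypothetical helper made up for illustration, and an even split of work between members is an assumption.

```python
def hours_per_member(articles: int, minutes_per_article: float, members: int) -> float:
    """Hours each member would spend if expansion work were split evenly (assumed)."""
    total_minutes = articles * minutes_per_article  # total expansion effort
    return total_minutes / members / 60             # per-member share, in hours


# Figures quoted in the comment above: 1000 articles, 20 minutes each, 10 members.
print(round(hours_per_member(1000, 20, 10), 1))  # 33.3
```

That is roughly 33 hours each, consistent with the "more than 30 hours" estimate; with thousands of articles per project the figure scales proportionally.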


  • Actually, Prosfilaes, the community has proven that it cannot deal with massive quantities of articles generated by people that don't recognize that their articles aren't about notable things. Just look at the massive problems in dealing with television episode articles ... twice through arbcom, and so far, the only editors to be punished are the editors that are trying to stem the flood. I really think that it is important to prevent this problem, rather than try and fix it after it has expanded even further than it already has. Kww (talk) 12:10, 7 June 2008 (UTC)
  • Look at television episode articles; it's completely appalling how hard it is to delete articles that are verifiable and of interest to our audience. Articles which the community agrees aren't about notable things disappear quite quickly; it's just that things which the community doesn't agree on are a lot harder to delete. If someone made articles for every garage band in Oklahoma, AfD would quickly clean them up.--Prosfilaes (talk) 23:46, 7 June 2008 (UTC)

Systemic bias?

I have to say that I'm not following the "systemic bias" angle on this. It seems far more likely that if there is any such bias lurking about (which I'm a bit dubious about anyway), mechanical and quantity creation is simply going to fulfill the definition of a computer as "a device for the amplification of human error". We're simply going to favor the bias of whatever source it is that we use for the data, except amplified in numbers and made hard to combat because of "garbage in, gospel out." Mangoe (talk) 21:30, 5 June 2008 (UTC)

The "systemic bias" refers to the fact that, for better or ill, the majority of the editors of the english wikipedia, which is this, tend to be from English-speaking countries, and as a result tend to have a disproportionate interest in, and access to, sources relevant to the English speaking world. As has been pointed out by others, there are often well-developed articles in other languages about locations which this version of wikipedia completely lacks. This, which I personally think may well be just the first possible run of this bot, seems to me to be one of the few available steps to try to counter that problem. With any luck, we may be able to establish interlanguage links on subjects which do have articles elsewhere, and in that way encourage editors in those languages, where they can, to develop the content here as well. I don't think anyone at this point is proposing that articles on all subjects listed in any atlas is what is being sought here, just a way to begin to at least address the perceived weakness the english wikipedia has regarding several parts of the world. And, of course, if other publicly available sources of information can be found later, or if the proposed project, shown above, gets enough interested parties, we may well find other sources of information which can be added as well. Such was also done, at one point, when a bot autocreated articles about former members of the US Congress. But it does seem to me that use of a bot might well be one of the few available ways to address the apparent systemic bias of the english wikipedia regarding several parts of the world. John Carter (talk) 22:51, 5 June 2008 (UTC)
A bot autocreated articles about former members of the US Congress? Did that not increase systemic bias? Do we now have pro- and anti-bias bots? Or is it just a scheme to increase volume by alternately creating and counteracting bias? Scolaire (talk) 23:41, 5 June 2008 (UTC)
Yeah, several years ago, before I started here, so I know this only second-hand. It used the existing public domain bios available from the US Congress website. If there were similar public domain bios available on leaders of foreign countries, it might be possible to get a bot to generate them as well, but I'm not entirely sure how relevant it is to raise that earlier, separate matter regarding this one, as I personally can't see any direct relationship between them, other than, in the way I used it, to indicate that we have autocreated articles on other notable subjects as well. John Carter (talk) 23:52, 5 June 2008 (UTC)
I don't think that the congressbot is really quite analogous since at least, if it is agreed that all congresscritters, past or present, are notable (and I have my doubts), we can at least be assured that they exist. That's really my concern about this: that I don't think in general that we have a source of geographic information that is reliable enough.
But the bias issue turns the other way around. We had some issues a few years back about articles on train stations, where there were distinct national differences of opinion on what was notable enough to have an article. The whole thing was torpedoed by the fact that in two countries people had gotten so many articles entered that it was impossible to tell them that they'd gone overboard, particularly since the Americans were pushing higher standards of notability, and they could be safely accused of some-kind-or-other-centrism in applying their standards to other countries. That the proponents of these other countries were being equally arrogant in dodging criticism in this manner couldn't be discussed. Mangoe (talk) 02:59, 6 June 2008 (UTC)
You may have a point regarding overcreation of stubs. However, like I said before, I think we will probably be following the model I at least initially suggested with the Knowledge:WikiProject Missing encyclopedic articles/Places/anguilla page. As I said, there I separated out those particular listings which I can verify meet WP:NOTABILITY requirements, and, personally, those are the only articles I would request be created. Those in the top section I cannot personally attest have enough information on them to meet notability requirements. If the same is done with the other areas, then I don't think that problem will necessarily arise, and it does look like there are enough people interested to make it likely that any such attempt will be reviewed. The other listings, if they're near one of the created articles, can be mentioned in that article, but wouldn't be given an article of their own without themselves meeting notability requirements. Granted, I and a few of the others who have signed on have a lot of work in other areas in front of us as well, and I can't guarantee how quickly if at all the articles will get the additional sourced material, but doing things like this can I think make it much more likely that non-notable locations won't be included. However, unfortunately, it still is the case that Olosega County of American Samoa still doesn't yet have any sort of content, and several other similar articles on even that basic level of articles don't exist yet. John Carter (talk) 14:50, 6 June 2008 (UTC)

PLEASE - no more micro stubs

Having trawled through Knowledge on and off for a while now, there is nothing more frustrating than thinking you are about to read an interesting article about some obscure thing or place only to find that all it lists is the thing's name and location.

Please do NOT generate pretend articles which are just reformatted rows out of a massive catalogue of limited data in each row.

By all means generate list articles showing the data for all places in a certain subdivision of a country, in a wikitable, for example.

I DO support automatically generating NON micro stubs where data from more than one source can be integrated. Preferably from more than two basic sources.

Why do we not try to generate start class articles ?

Peet Ern (talk) 13:45, 6 June 2008 (UTC)

If I may, I believe that to create a bot that automatically creates a start class article as a rule would be difficult to say the least. I do believe that there should be enough data on some database that would allow us to create say a two or three paragraph (even if only a couple sentences each) on the details of the cities that this bot is designed to create. Once the initial article is built we can then tweak the bot to add more references, add infoboxes (if not done on the initial build), add more data to the article etc. The key is to start somewhere. Based on what I have seen above some have raised concerns about what cities it will build in what countries. I recommend we pick 1 country initially, set some criteria for the bot (such as what the minimum population should be) and see what it looks like. Maybe Iraq or Afghanistan given the high volume of edits to articles related to these areas. There are still plenty of cities or towns (even large ones) that have not been added to wikipedia from these 2 countries. Just my 2 cents but I say we go for it and let the bot start working. Worst case we have a few extra stubs, best case we add a few articles that generate interest in wikipedia from the citizens of the listed town or city.--Kumioko (talk) 22:36, 6 June 2008 (UTC)

Consensus - time to assess

You'll all notice that my contributions have been noticeably absent from this page. Partly, this resulted from work commitments, but I also think it gave the community more time to respond to the proposal. This debate is not a vote - I objected to the introduction of a straw poll, but was, in essence, overruled! :) As such, I have actually asked an uninvolved admin to assess the consensus of this discussion. I will not overrule the consensus, as I hope many of you will appreciate (especially those participating at my recent RfA). I await the 'result' with interest, and will comment further once an opinion has been rendered. Fritzpoll (talk) 14:03, 6 June 2008 (UTC)

Hello, uninvolved admin here :). This debate is closeable, in my opinion, at this point. However to do it full justice will require considerable time and effort, and likely I will not be able to make a judgement regarding consensus until Monday, in order to do some off-line analysis and due to RL commitments (it is the weekend !). I would ask that if anyone in the community has concerns that either;
  1. I am not un-involved and not able to make a judgement on this, or
  2. That there is still significant further debate to be had that has not already occurred, or
  3. That this needs wrapping up sooner than I am able to
please make notification here. I don't want to spend several hours on this only for an editor to express concerns that I may be biased or unable to consider impartially here. Ta! Pedro :  Chat  11:31, 7 June 2008 (UTC)
I would only add that I personally think that there might be further modifications to the procedure by which the articles to be chosen to be generated might be selected, and that those modifications, if enacted, might both potentially increase the number of potentially creatable articles, and, at the same time, decrease the number of articles actually created. I base this on the idea, which I think is at least slightly substantiated, that several populated places, such as harbors, might not be listed as populated places, but rather as harbors or whatever, in the various sources. I am e-mailing the creator of the bot with the possibly lengthy and meandering proposal of the revisions I am suggesting. John Carter (talk) 15:09, 7 June 2008 (UTC)
Alas, I'm not going to close this, but would like to give expanded reasoning and hopefully some insight. Firstly, some notes, to save them being wasted;
  • I find well over 200 different voices here, indicating an acceptable proportion of the active community has commented to make consensus discernible.
  • I find the debate has had suitable coverage to make Wikipedians aware of its existence
  • I find the conduct of all editors who have kindly taken the time to comment to be of the kind of collegial level we should expect, and my respect goes out to everyone here - although it made this summing up that much harder!
At first glance, this looked like consensus was for the revised bot to go ahead. However working through the arguments in oppose this afternoon I became less certain that there is consensus. Opposers to the idea have generally been challenged, whereas supporters have not been (as is common during Knowledge debates). However the opposers seem to have presented robust arguments to their position when challenged. There is considerable strong feeling amongst the opposers, and whilst there is also strong feeling in support I find it, overall less strong than that in opposition.
Consensus is not a numbers outcome. If it was then this would have consensus to move on without issue. I believe clear consensus here can only be found if not only were there considerable community support (which there is) but also a significant amount of the opposition were less passionate about the outcome - e.g. "I strongly oppose it for X Y and Z reasons, but if it goes ahead then that's the way it is". I don't find this here. I find significant reasoned opposition that has great disquiet about the concept. Consensus is not at all totally clear, particularly if only the above is taken into account.
However, the supporters cannot be ignored due to a lack of challenge to their support (although there was much, it appears to be more related to wider ranging issues and not the one at hand). Additionally, one must also factor in that supporters of a given recommendation are often less strong and vocal in their words, but are still passionate in their support - just feeling or assuming that they do not need to be verbose in defining their reasoning. Most importantly, given the level of attention this conversation has received, given that this is a debate that is unlikely to attract "fly-by" comments due to its deep policy/guideline/technical nature, I find a lot of community support. I believe that although there are issues, and this is a tight call, there is probably consensus to move forwards, despite the accurate and passionate concerns of the opposers.
Why I must recuse myself from closing. Reading back and forth I find I believe that there would be more help than harm in this revised proposal. Were I to have come to this page with the sole intention of commenting I would support the proposal. Given that I have been convinced by the arguments that this should move forth, and given that I, on balance, believe consensus is, in fact, for this to move forth, I must recuse myself. My apologies, but I cannot in good faith now act neutrally, although it would be easy for me to pretend otherwise.
I trust that my notes and comments above will be of assistance in resolving this debate, one way or the other. Pedro :  Chat  20:18, 7 June 2008 (UTC)
Thanks for your careful reading of all this stuff, comments and integrity in recusing yourself. From my perspective, this proposal can only be judged by actual results from a limited trial run, and I would urge the proposer to arrange for such a trial in a location of his (or the discussion group's) choice, using (as best he can) the constructive comments and suggestions upthread. It would be interesting to see if the bot could produce something useful. Cheers, Pete Tillman (talk) 23:20, 7 June 2008 (UTC)
Well I'm not sure if it's happened in the last three weeks, but if not, having read the above - Pedro for BUROCRAT!! Thanks for the analysis! Franamax (talk) 20:45, 8 June 2008 (UTC)

I, too, have been asked to close this discussion. I would be happy to do so. As I have not been involved with this issue, it will take me several hours to read through it and formulate a decision. I should be finished no later than 36 hours from now. -- SamuelWantman 00:47, 8 June 2008 (UTC)

At the end of the day, the proposal will be passed (and as above I don't care either way to be honest, because it honestly doesn't actually have a quantifiable harmful effect on wikipedia), but in the process, a whole bunch of established principles about how wikipedia works, or what wikipedia is, are going to be junked unilaterally, just because nobody will take this debate on at the level it belongs, above the village pump level, and would rather pretend this is just about a pet project for geo-stubs. MickMacNee (talk) 15:21, 8 June 2008 (UTC)

A bunch of established principles are going to be interpreted in different ways than you would, yes. "Unilateral" means by one party, but the interpretation of policy and changes of that interpretation are happening by general consensus. I find it bizarre that you can look at this page and think that anyone can miss the large arguments about notability; I find it incredibly unlikely that it would matter where you choose to post this discussion, or that it would have a different outcome; cities are considered inherently notable by general consensus, and the idea of inherent notability is useful within limited bounds.--Prosfilaes (talk) 17:45, 8 June 2008 (UTC)
I find it incredible you think I was just talking about notability. You have quite clearly missed the point and have not followed this discussion at all. But I do find it odd that you apparently think any Botswanan village is notable, when the majority here actually don't, even though they support this project. I think you need to do some background research on this topic to be honest, instead of giving half-arsed opinions about consensus as you have above. MickMacNee (talk) 18:20, 8 June 2008 (UTC)
Why did you waste your time writing that? By "the level belongs" did you mean insult?--Prosfilaes (talk) 20:10, 8 June 2008 (UTC)
No, but again, this fact is clearly passing you by. MickMacNee (talk) 20:16, 8 June 2008 (UTC)
And if this proposal is passed, there's nothing to say that the opposers can't join the discussion in the WikiProject. Their voices would be a welcome balance, if anything. --NickPenguin(contribs) 18:05, 8 June 2008 (UTC)
Qualified Support I have been highly critical of this project from the start, but after receiving assurances from this thread, and messages to my talk page, I am fairly confident that enough has been done to take this from village pump level to "higher up". There is enough consensus here to indicate that the problems I have may not be resolved, but there is no point using any further argument. This now has to go to the next level of Knowledge, it's too big for the VP to say "yea" or "nay". doktorb words 20:24, 8 June 2008 (UTC)
Qualified Support I opposed, but I think the proposal is sufficiently modified to address many of the opposers' concerns. If this is taken on step-by-step with appropriate reviews and central posting so opposers can monitor the progress, we should do it. Franamax (talk) 20:49, 8 June 2008 (UTC)
Qualified support. I've said all along that I would support the proposal if there were an assurance that articles would be created with discretion and an eye towards provable notability. I don't think there's much point in obstructing any progress to twist the arms of the people who will be implementing it until they swear an oath to do as I say, and I trust reasonably that the new WikiProject will have at least some parties who will speak for moderation. Pete Tillman's suggestion, above, to make a trial run is a good one, and should be the first order of business when things get started. Ryan Reich (talk) 21:30, 8 June 2008 (UTC)
Comment - If there is a trial run, I think it may be with American Samoa, based on the Knowledge:WikiProject Missing encyclopedic articles/Places/American samoa page and the data we can gather relating to it. What I have received agreement from the bot operator about is that we will try to first verify which places have enough sourcing to be created in their own right, and then add to that created article a little information on the other places which seem notable enough to be mentioned in that article. So, as an example, if Fatua`äiga Catholic Center isn't found to be sufficiently sourceable to have its own article, it might be included in the list of locations on Tatuila Island. John Carter (talk) 17:37, 9 June 2008 (UTC)
Qualified support. I welcome this motif, which reflects my own point of view. However, I think we should leave the closer of the discussion to base his conclusions on the entire discussion. Geometry guy 22:11, 8 June 2008 (UTC)
Um, is there a reason we're doing some sort of mini-straw poll here? If we really want the closer to base his conclusion on the whole discussion, why are we bolding !votes in this section? Mr.Z-man 22:58, 8 June 2008 (UTC)
I agree. Geometry guy 23:16, 8 June 2008 (UTC)
