Discussion:
[Xmldatadumps-l] Wikipedia page IDs
Renato Stoffalette Joao
2016-12-03 13:47:23 UTC
Hi all.

Firstly, apologies for any duplicates or for posting the question to
the wrong mailing list.

Secondly, could anybody kindly explain whether some Wikipedia pages
have changed their IDs over time, and if so, point me to where this
might be documented?
I have Wikipedia pages-articles XML dumps from 2006 and 2008, and
while parsing them I ran across cases such as the following one.
In the dumps from 2006 and 2008 the South Africa page has the ID 68854,
while in the most recent Wikipedia pages-articles XML dump (i.e. 2016)
the same article has the ID 17416221.
I am trying to match some Wiki pages by IDs across time, but the example
above is not helping.

Many thanks in advance for any help.
--
Renato Stoffalette Joao
- PhD Student -
L3S Research Center / Leibniz Uni.
15th Floor, Room:1519
Appelstraße 9a
30167 Hannover, Germany
+49.511.762-17759
John
2016-12-03 15:42:47 UTC
It looks like the page was deleted and restored, which gave it a new page ID.
Originally, when pages were deleted the page_id was not kept, so a new
page_id was issued when the page was restored. This behavior has since
been fixed and should no longer happen.
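
Since old deletions and restorations mean a page_id is not guaranteed to be stable across dumps, one workaround is to match pages by title and only then compare IDs. The following is a minimal sketch, not a definitive tool: the helper names (local, title_to_id, changed_ids) and the two tiny sample documents are made up for illustration, and only the two IDs come from this thread. It assumes the usual <page>/<title>/<id> layout of pages-articles dumps. Note that titles can change too (page moves), so title matching is a heuristic.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, which differs between export schema versions."""
    return tag.rsplit("}", 1)[-1]


def title_to_id(source):
    """Map page title -> page ID for one dump (file path or file-like object)."""
    mapping = {}
    for _, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title = page_id = None
        # Look at direct children only: <revision> carries its own <id>.
        for child in elem:
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "id":
                page_id = int(child.text)
        if title is not None and page_id is not None:
            mapping[title] = page_id
        elem.clear()  # drop the page's content; real dumps are large
    return mapping


def changed_ids(old, new):
    """Titles present in both mappings whose page ID differs."""
    return {t: (old[t], new[t]) for t in old.keys() & new.keys() if old[t] != new[t]}


# Made-up miniature "dumps"; only the two page IDs are taken from the thread.
OLD_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page><title>South Africa</title><id>68854</id>
    <revision><id>1</id><text>sample</text></revision></page>
  <page><title>Example</title><id>42</id>
    <revision><id>2</id><text>sample</text></revision></page>
</mediawiki>"""

NEW_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>South Africa</title><ns>0</ns><id>17416221</id>
    <revision><id>3</id><text>sample</text></revision></page>
  <page><title>Example</title><ns>0</ns><id>42</id>
    <revision><id>4</id><text>sample</text></revision></page>
</mediawiki>"""

old_ids = title_to_id(io.BytesIO(OLD_DUMP))
new_ids = title_to_id(io.BytesIO(NEW_DUMP))
diff = changed_ids(old_ids, new_ids)
print(diff)
```

For real dumps, pass the path of a decompressed XML file (or an open bz2.open(...) handle) to title_to_id instead of the in-memory samples.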
Federico Leva (Nemo)
2016-12-03 18:15:13 UTC
Post by Renato Stoffalette Joao
Secondly, could anybody kindly explain to me if some Wikipedia pages
changed their IDs from the past ? Or if so point to me where this might
be documented ?
https://www.mediawiki.org/wiki/Manual:Page_table#page_id

Please avoid such massive crossposting for questions.

Nemo
