Renato Stoffalette Joao
2016-12-03 13:47:23 UTC
Hi all.
Firstly, apologies for any duplicates or for posting this question to
the wrong mailing list.
Secondly, could anybody kindly explain whether some Wikipedia pages
have had their IDs changed over time? And if so, could you point me to
where this is documented?
I have Wikipedia pages-articles XML dumps from 2006 and 2008, and while
parsing them I ran into cases such as the following one. In the 2006
and 2008 dumps the South Africa page has the ID 68854, while in the
most recent Wikipedia pages-articles XML dump (2016) the same article
has the ID 17416221.
I am trying to match Wikipedia pages by ID across time, but cases like
the one above make that unreliable.
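To make the comparison concrete, here is a minimal sketch of the kind of check involved. It is not my exact parser: the function names title_to_id and changed_ids are made up for illustration, and it assumes every <page> element has a direct <title> child and a direct <id> child, which is what I see in the pages-articles dumps.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, which differs between dump schema versions."""
    return tag.rsplit("}", 1)[-1]


def title_to_id(source):
    """Map page title -> page ID for one dump (file path or binary file object)."""
    mapping = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = page_id = None
        # Look at direct children only: <revision> and <contributor>
        # carry their own <id> elements, which are not the page ID.
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "id":
                page_id = int(child.text)
        if title is not None and page_id is not None:
            mapping[title] = page_id
        elem.clear()  # release the page's subtree; the dumps are large
    return mapping


def changed_ids(old, new):
    """Titles present in both mappings whose page ID differs: title -> (old, new)."""
    return {
        title: (old[title], new[title])
        for title in old.keys() & new.keys()
        if old[title] != new[title]
    }
```

Run on the 2008 and 2016 dumps, changed_ids would report South Africa with the pair (68854, 17416221). Note that this matches pages by exact title, so it does not account for pages that were renamed between dumps.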
Many thanks in advance for any help.
--
Renato Stoffalette Joao
- PhD Student -
L3S Research Center / Leibniz Uni.
15th Floor, Room:1519
Appelstraße 9a
30167 Hannover, Germany
+49.511.762-17759