OldStuff
shilad edited this page Aug 11, 2013 · 1 revision
- Universal parameters for all scripts:
    [-c conf] [-l languages...] [-h threads]
    -c Sets the configuration file to the specified path
    -l Selects languages by language code to retrieve from Wikimedia, separated by commas
    -h Sets the maximum number of processors to use for parallel language processing
- Run the requestedlinkgetter.sh file. The parameters should be formatted to match the following:
    [-o outputpath] [-n names...] [-d date]
    -o Sets the path to output the tsv file containing all the links
    -f Selects types of dump files to retrieve, separated by commas
    -y Sets the date to retrieve from. Files are retrieved from on or before this date
- Run the filedownloader.sh file. The parameters should be formatted to match the following:
    [-o outputpath] [-t tsvpath]
    -o Sets the directory in which to output the downloaded dumps
    -t Selects the tsv file from which to read the download links
- Run the dumploader.sh file. The parameters should be formatted to match the following:
    [file ...]
    file Selects the dump files to load
- Run the redirectloader.sh file. The parameters should be formatted to match the following:
    [-d]
    -d Drops and recreates all tables and indexes
- Run the wikitextdumploader.sh file. The parameters should be formatted to match the following:
    [-d]
    -d Drops and recreates all tables and indexes
- Run the conceptmapper.sh file. The parameters should be formatted to match the following:
    [-d] [-n algorithms]
    -d Drops and recreates all tables and indexes
    -n Selects the algorithms to use to map concepts
- Run the universallinkloader.sh file. The parameters should be formatted to match the following:
    [-d] [-n algorithms]
    -d Drops and recreates all tables and indexes
    -n Selects the algorithms to use to map concepts
Optional scripts:
- Run the phraseloader.sh file. The parameters should be formatted to match the following:
    [-n analyzer]
    -p Selects the phrase analyzer to use
- Run the luceneloader.sh file. The parameters should be formatted to match the following:
    [-d] [-n namespace...] [-i index...]
    -d Drops and recreates all Lucene indexes
    -p Specifies the namespaces to index
    -i Selects the types of indexes to use, as described by the configuration file
- Download dump files
    - Obtain dump links
    - Download files from those links
- Load the Dump as XML
    - Convert Dump into RawPages
    - Convert RawPages into LocalPages
    - Mark Redirects to be dealt with after this process
- Resolve Redirects
    - Load into Redirect Table, fully resolved
- WikiTextParser does the following
    - Load links into table with src/dest IDs
    - Load categories with the source article as a category member
- Load Concepts
- Load Concept Links
Optional:
- Load Phrases Database
- Load Lucene Database
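
The script order and the pipeline stages above line up one to one, so a small dry-run wrapper can make the sequence explicit. The following is only a sketch: the configuration file name, language codes, thread count, tsv path, and dump directory are hypothetical placeholders, and the choice to pass -d to every loader assumes a fresh load. It uses only flags whose usage line and description agree on this page, and it prints the commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch: prints the loader commands in the documented order.
# Every value below is an illustrative assumption, not a project default.
CONF="${CONF:-my.conf}"              # hypothetical configuration file
LANGS="${LANGS:-simple,la}"          # hypothetical language codes
THREADS="${THREADS:-4}"              # hypothetical processor limit
LINKS_TSV="${LINKS_TSV:-links.tsv}"  # hypothetical tsv output path
DUMP_DIR="${DUMP_DIR:-dumps}"        # hypothetical download directory

plan_pipeline() {
    # Universal parameters shared by all scripts.
    common="-c $CONF -l $LANGS -h $THREADS"

    # Download dump files: obtain links, then download from them.
    echo "./requestedlinkgetter.sh $common -o $LINKS_TSV"
    echo "./filedownloader.sh $common -o $DUMP_DIR -t $LINKS_TSV"

    # Load the dump (the glob is left unexpanded in the printed plan).
    echo "./dumploader.sh $common $DUMP_DIR/*"

    # Resolve redirects, parse wikitext, load concepts and concept links.
    # -d drops and recreates tables, so it suits a fresh load only.
    echo "./redirectloader.sh $common -d"
    echo "./wikitextdumploader.sh $common -d"
    echo "./conceptmapper.sh $common -d"
    echo "./universallinkloader.sh $common -d"

    # Optional stages, enabled with WITH_OPTIONAL=1.
    if [ "${WITH_OPTIONAL:-0}" = 1 ]; then
        echo "./phraseloader.sh $common"
        echo "./luceneloader.sh $common -d"
    fi
}

plan_pipeline
```

Reviewing the printed plan first is a deliberate choice here, since -d is destructive. Once the plan looks right, piping it to a shell (for example `plan_pipeline | sh` from the directory that holds the scripts) would run the steps in order.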