WikiXRay Parser Options

From Meta, a Wikimedia project coordination wiki
user@machine:path$ python dump_sax.py --help
usage: dump_sax.py [options]

options:
  -h, --help            show this help message and exit
  -t STUBTH, --stubth=STUBTH
                        Max. size in bytes to consider an article as stub
                        [default: 256]
  --pagefile=FILE       Name of the SQL file created for the page table
                        [default: page.sql]
  --revfile=FILE        Name of the SQL file created for the revision table
                        [default: revision.sql]
  --textfile=FILE       Name of the SQL file created for the text table
                        [default: text.sql]
  --skipnamespaces=NAMESPACES
                        List of namespaces whose content will be ignored
                        [comma separated values, without blanks; e.g.
                        --skipnamespaces=name1,name2,name3]
  -i STRING, --inject=STRING
                        Optional string to inject at the very start of
                        articles' text; string must be provided within quotes
                        (e.g. --inject='my string') or double quotes
  -f, --fileout         Create SQL files from parsed XML dump
  -s, --streamout       Generate an output SQL stream suitable for a direct
                        import into MySQL database
  -m, --monitor         Insert SQL code directly into MySQL database [default]
  -u MySQL_USER, --user=MySQL_USER
                        Username to connect to MySQL database
  -p MySQL_PASSWORD, --passwd=MySQL_PASSWORD
                        Password for MySQL user to access the database
  -d DBNAME, --database=DBNAME
                        Name of the MySQL database
  --port=MySQL_SERVER_PORT
                        Listening port of MySQL server
  --machine=SERVER_NAME
                        Name of MySQL server
  -v, --verbose         Display standard status reports about the parsing
                        process [default]
  -q, --quiet           Do not display any status reports
  -l LOGFILE, --log=LOGFILE
                        Store status reports in a log file; do not display
                        them
  --insertmaxsize=MAXSIZE
                        Max size in KB of the MySQL extended inserts [default:
                        156] [max: 256]
  --insertmaxnum=MAXROWS
                        Max number of individual rows allowed in the MySQL
                        extended inserts [default: 50000][max: 250000]
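
The --insertmaxsize and --insertmaxnum options cap each MySQL extended insert by size and by row count. The following is only a minimal sketch of how such a cap might be applied, not the actual dump_sax.py code. The function name batch_extended_inserts, its parameters, and the assumption that rows arrive as already-escaped SQL value tuples are all hypothetical; only the two default values are taken from the help text above.

```python
def batch_extended_inserts(table, rows, max_kb=156, max_rows=50000):
    """Yield extended INSERT statements that respect both limits.

    Hypothetical helper, not part of WikiXRay. `rows` is an iterable of
    already-escaped SQL value tuples such as "(1,'Title')".
    """
    max_bytes = max_kb * 1024
    prefix = "INSERT INTO %s VALUES " % table
    base_size = len(prefix.encode("utf-8")) + 1  # +1 for the trailing ';'
    batch, size = [], base_size
    for row in rows:
        row_size = len(row.encode("utf-8")) + 1  # +1 for the ',' separator
        # Flush the current batch if adding this row would break a limit.
        if batch and (len(batch) >= max_rows or size + row_size > max_bytes):
            yield prefix + ",".join(batch) + ";"
            batch, size = [], base_size
        # A single row larger than max_bytes is still emitted, on its own.
        batch.append(row)
        size += row_size
    if batch:
        yield prefix + ",".join(batch) + ";"


if __name__ == "__main__":
    sample = ["(%d,'t')" % i for i in range(5)]
    for stmt in batch_extended_inserts("page", sample, max_rows=2):
        print(stmt)
```

The size estimate counts one separator per row, so it slightly overestimates the statement length and stays on the safe side of the limit.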

Please note that some of these options (logging to a file, skipping namespaces, and text injection) are not yet implemented; they are planned for inclusion in the coming days.
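
As an illustration of the behaviour the help text describes for two of these pending options, here is a hedged sketch of how a --skipnamespaces value and an --inject string might be handled. The functions parse_skip_namespaces, should_skip, and inject_text are hypothetical names, and the sketch assumes that a page's namespace is the part of its title before the first colon. It does not show how dump_sax.py will actually implement them.

```python
def parse_skip_namespaces(value):
    """Split 'name1,name2,name3' (no blanks) into a set of namespace names."""
    if not value:
        return set()
    return {name for name in value.split(",") if name}


def should_skip(title, skipped):
    """Return True if the title's prefix before the first ':' is a skipped namespace.

    Assumption: namespaced pages are titled 'Namespace:Page'. Titles without a
    colon are treated as main-namespace articles and are never skipped.
    """
    prefix, sep, _rest = title.partition(":")
    return bool(sep) and prefix in skipped


def inject_text(text, injected=None):
    """Prepend the injected string, if any, at the very start of the article text."""
    return (injected or "") + text


if __name__ == "__main__":
    skipped = parse_skip_namespaces("Talk,User")
    for title in ["Talk:Physics", "Physics", "User:Example"]:
        print(title, should_skip(title, skipped))
```

Main-namespace titles that contain a colon are only skipped if the text before the colon matches one of the listed namespace names exactly.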