11- Follow the benchmark's Readme.txt to download and prepare the data.
Note: Make sure you have enough free disk space before you start downloading the data. Tip: Use the /mnt/resource_nvme directory to store the data.
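A quick way to check the free space before downloading is sketched below. The DATA_ROOT variable is a hypothetical name used only for this example, and the fallback to the current directory is there so the check still runs on machines where /mnt/resource_nvme does not exist.

```shell
# DATA_ROOT is a hypothetical variable for this sketch; it defaults to the
# directory suggested in the tip above.
DATA_ROOT="${DATA_ROOT:-/mnt/resource_nvme}"

# Fall back to the current directory if that path is not present here.
[ -d "$DATA_ROOT" ] || DATA_ROOT="."

# Human-readable summary of the filesystem that holds DATA_ROOT.
df -h "$DATA_ROOT"

# Available space in kilobytes (POSIX output format), convenient for scripts.
AVAIL_KB=$(df -Pk "$DATA_ROOT" | awk 'NR==2 {print $4}')
echo "Available on ${DATA_ROOT}: ${AVAIL_KB} KB"
```

Compare the reported figure with the dataset size stated in the benchmark's Readme.txt before starting the download.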
12- Run the following command to get the docker image name and tag.
Note the image name and tag associated with the benchmark you are running. In the next step, replace <CONTAINER_NAME> with <REPOSITORY>:<TAG>.
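As an illustration of how <REPOSITORY> and <TAG> combine, the sketch below lists local images with the standard Docker CLI and then builds the name by hand. The values example/benchmark and latest are hypothetical placeholders, not the actual image for any benchmark.

```shell
# List local images as REPOSITORY:TAG, if the Docker CLI is available.
if command -v docker >/dev/null 2>&1; then
    docker images --format '{{.Repository}}:{{.Tag}}' \
        || echo "docker is installed but the daemon is not reachable"
fi

# Hypothetical values; replace them with the ones listed for your benchmark.
REPOSITORY="example/benchmark"
TAG="latest"

# <CONTAINER_NAME> is the repository and tag joined by a colon.
CONTAINER_NAME="${REPOSITORY}:${TAG}"
echo "CONTAINER_NAME=${CONTAINER_NAME}"
```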
13- The command below runs the benchmark. Each benchmark has its own environment variables that must be set before the run. Read the explanation of each variable to decide what value to give it.
Run the command below to set the number of experiments to run:
The variables in the above command refer to the directory structure created by the Data download and preprocessing steps.
For BERT:
DATADIR: Point this to the 4320_shards_varlength folder downloaded with the training dataset.
DATADIR_PHASE2: Point this to the 4320_shards_varlength folder downloaded with the training dataset.
EVALDIR: Point this to the eval_varlength folder downloaded with the validation dataset.
CHECKPOINTDIR: Point this to a new results folder under the BERT data directory.
CHECKPOINTDIR_PHASE1: Point this to the phase1 folder within the BERT data directory.
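A minimal sketch of setting the BERT variables follows. BERT_DATA and its default path are hypothetical, and the sketch assumes all folders sit directly under that one root; adjust the paths if your download produced a different layout.

```shell
# BERT_DATA is a hypothetical root for the downloaded BERT data (assumption).
BERT_DATA="${BERT_DATA:-/mnt/resource_nvme/bert_data}"

# Both training variables point to the same shards folder, as described above.
export DATADIR="${BERT_DATA}/4320_shards_varlength"
export DATADIR_PHASE2="${BERT_DATA}/4320_shards_varlength"
export EVALDIR="${BERT_DATA}/eval_varlength"
# New folder for results; create it before the run if it does not exist.
export CHECKPOINTDIR="${BERT_DATA}/results"
export CHECKPOINTDIR_PHASE1="${BERT_DATA}/phase1"

# Report any path that is not present yet, so typos surface before the run.
for d in "$DATADIR" "$DATADIR_PHASE2" "$EVALDIR" "$CHECKPOINTDIR" "$CHECKPOINTDIR_PHASE1"; do
    [ -d "$d" ] || echo "missing: $d"
done
```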
For RNNT:
DATADIR: Point this to the directory where the RNNT data is downloaded.
METADATA_DIR: Point this to the folder called "tokenized" within the downloaded RNNT data.
SENTENCEPIECES_DIR: Point this to the folder called "sentencepieces" within the downloaded RNNT data.
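The RNNT variables can be set in the same way. RNNT_DATA and its default path are hypothetical names for this sketch. Note that DATADIR has a different meaning for each benchmark, so set these variables in the shell session used for the RNNT run only.

```shell
# RNNT_DATA is a hypothetical location of the downloaded RNNT data (assumption).
RNNT_DATA="${RNNT_DATA:-/mnt/resource_nvme/rnnt_data}"

export DATADIR="${RNNT_DATA}"
# Both folders are described above as living inside the downloaded data.
export METADATA_DIR="${DATADIR}/tokenized"
export SENTENCEPIECES_DIR="${DATADIR}/sentencepieces"

# Report any path that is not present yet.
for d in "$DATADIR" "$METADATA_DIR" "$SENTENCEPIECES_DIR"; do
    [ -d "$d" ] || echo "missing: $d"
done
```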