Nov 20 2020 06:28 AM
I've added a simulator package and started training, but the training never moves past 'Connecting simulators'. In the list of Simulators, my package called 'block' shows up, and when I start training, 5 more simulators are listed, all called 'block Unmanaged'. So instances are starting, but are they being flagged as 'unmanaged'? In Azure I can see that 2 separate Azure 'Container instances' have been created, one with 1 container and the other with 4 containers. I've looked at the logs of the instances and they all say 'Registered' and 'Idle' (see image attached). So, for some reason, the training doesn't seem to be able to connect to the instances. Any advice appreciated.
Before pushing the Docker image, I was able to run this image as a container and successfully train the brain with it as an unmanaged simulator. The image is python:3.9-slim with the Bonsai API and bonsai-common packages added. The Workspace and Access Key are currently set in code.
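Since the Workspace and Access Key are hard-coded, one thing worth ruling out is that the container ignores whatever settings are supplied to it when it is launched as a managed instance. A minimal sketch, assuming those settings arrive as environment variables named `SIM_WORKSPACE`, `SIM_ACCESS_KEY` and `SIM_CONTEXT` (check these names against the Bonsai documentation); `SimConfig` and `resolve_sim_config` are hypothetical helpers, not part of bonsai-common:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class SimConfig:
    workspace: str
    access_key: str
    context: Optional[str]  # None when no SIM_CONTEXT value is supplied


def resolve_sim_config(env: Mapping[str, str] = os.environ,
                       local_workspace: str = "",
                       local_access_key: str = "") -> SimConfig:
    """Prefer values injected into the container; fall back to local dev values.

    The environment variable names used here are assumptions for illustration.
    """
    workspace = env.get("SIM_WORKSPACE") or local_workspace
    access_key = env.get("SIM_ACCESS_KEY") or local_access_key
    if not workspace or not access_key:
        raise ValueError("workspace and access key must be provided")
    return SimConfig(workspace, access_key, env.get("SIM_CONTEXT"))
```

Locally you would call `resolve_sim_config(local_workspace=..., local_access_key=...)`; inside the container, any injected values take precedence over the hard-coded ones instead of being silently overridden.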
Attached screenshots...
Thanks
Nov 20 2020 01:39 PM
Hi @Cliff_Evans, can we please have some more information, e.g. workspace ID, brain name and brain version? We need this information to look up the simulators that are causing the issue for you.
Nov 21 2020 02:38 AM
Hi @shivanshi thank you for your response.
I still have the same problem. To help debug it, I have given the various pieces from my original question unique names. The new names are shown below and new screenshots are attached.
To recap: I have a brain (Block) that, when I attempt to train it using the managed package (BlockPackage), unexpectedly creates multiple unmanaged simulators called 'block' and then gets stuck at 'Connecting simulators'. I have nothing in the workspace (or repository) called 'block', but I did previously create an unmanaged simulator called 'block' when I was testing my simulator code locally (btw this testing worked and successfully trained the brain). It's as though the training is stuck using this unmanaged approach. In my Inkling I have tried both the statement "package 'BlockPackage'" and no package statement at all, so that I am prompted to 'Select a simulator for training'; both routes produce the same issue. Also, bizarrely, the training creates two sets of instances, one with 1 container and one with 4 containers. When I look at the logs of any of the containers, the code is running and reports 'Registered' and 'Idle'.
My apologies in advance if I'm doing something wrong.
I suspect, and hope, that if I delete everything and just start again it might be OK. But I'll leave it as is for now so you can take a look.
Many thanks
Cliff
WorkspaceId: ef6cc48c-9b24-452e-9a9f-8abba5ae8d8b
Brain name: Block
Brain version: v01
Managed simulator: BlockPackage
Container registry repository: blockimage
Teach inkling for simulator:
Nov 23 2020 09:57 AM
Hi @shivanshi
I've solved the problem. The error was mine.
With the intention of creating a simple simulator, I used a variation of the example 'bare bones' simulator code here: https://github.com/microsoft/bonsai-common
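For anyone starting from the same template: it separates the connection code from a simulation model exposed through a few callbacks. A rough sketch of the model half only, assuming the callback names `episode_start`, `episode_step`, `get_state` and `halted` used by the bonsai-common bare-bones template; `BlockModel`, its state fields and its dynamics are hypothetical, and none of the Bonsai registration or Workspace/Access Key code is included:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BlockModel:
    """Hypothetical 1-D block pushed along a track; a stand-in for a real sim model."""
    dt: float = 0.1        # time step in seconds (assumed)
    limit: float = 10.0    # episode halts once |position| exceeds this (assumed)
    position: float = 0.0
    velocity: float = 0.0

    def episode_start(self, config: Dict[str, float]) -> None:
        # Reset the state from the episode config sent by the brain.
        self.position = float(config.get("initial_position", 0.0))
        self.velocity = 0.0

    def episode_step(self, action: Dict[str, float]) -> None:
        # Treat the action as an acceleration (unit mass assumed) and take one explicit Euler step.
        self.velocity += float(action.get("force", 0.0)) * self.dt
        self.position += self.velocity * self.dt

    def get_state(self) -> Dict[str, float]:
        # The state dictionary reported back to the brain after each step.
        return {"position": self.position, "velocity": self.velocity}

    def halted(self) -> bool:
        # Signal that the episode cannot continue (block left the track).
        return abs(self.position) > self.limit
```

Keeping the model free of connection code like this makes it easy to test locally before it is wrapped in the simulator session and pushed as a container image.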
Nov 23 2020 10:44 AM
@Cliff_Evans Great, good to see you got it working!