FAQ
Question: When I reserve a machine looking for a U250, I am assigned a node with a U250, but from within the pod, I can see the U250 and also the U280 (which is also present on that node). Is this intended behavior?
Answer: This issue has been fixed from Kubernetes version 1.17 onward. If you are using an earlier version, try updating your Kubernetes cluster to the latest version. The earlier problem was that Kubernetes could not handle multiple types of FPGA cards or shells on one node. See the following link for details: https://github.com/kubernetes/kubernetes/issues/70350
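To check whether a cluster already meets the 1.17 threshold mentioned above, a version comparison along these lines may help. This is only a sketch: the function name version_at_least is hypothetical, and it assumes a GNU-style sort that supports the -V (version sort) option.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
version_at_least() {
    # Strip a leading "v" so "v1.17.3" and "1.17.3" compare the same way.
    have="${1#v}"
    want="${2#v}"
    # sort -V orders version strings; if the wanted version sorts first
    # (or ties), the version we have is new enough.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}
```

The server version reported by kubectl version can be passed as the first argument, for example: version_at_least v1.18.2 1.17 && echo "multiple FPGA types per node should be handled".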
Question: I am testing Vitis in the AWS FPGA environment. An AFI, agfi-069ddd533a748059b, is loaded first when I run systemctl start mpd for the very first time after booting the machine. Later, inside my-pod, when I run ./helloworld vector_addition_hw.awsxclbin, a different AFI, agfi-2, which is the one associated with vector_addition_hw.awsxclbin, is loaded and is the one actually used. Do both have “vector_addition_hw” in their name?
Answer: AWS F1 allows a user to use the FPGA in two ways: 1) the traditional hardware design flow using the HDK, and 2) a software-like flow using SDAccel/Vitis. The flow you are using is the SDAccel/Vitis one. To differentiate the two flows, AWS came up with a device ID scheme that Xilinx adheres to. A Xilinx Runtime daemon named MPD is installed as part of XRT and is started with the command “systemctl start mpd”. This daemon has to download the prebuilt AFI “agfi-069ddd533a748059b” so that the AWS hardware can distinguish the SDAccel/Vitis flow from the HDK flow. When the user application runs, that AFI is replaced by the user's AFI. Every time you reboot the machine, MPD is restarted, so you will see agfi-069ddd533a748059b loaded again. Once the accelerator pod runs, it can load the AFI of interest.
Question: An application fails to run inside a container, with an error such as “Failed to find Xilinx platform”, although it runs fine outside the container. xbutil list shows the device inside the container, the same as outside the container.
Answer: Linux uses ICDs ("Installable Client Drivers") to set up OpenCL. The file /etc/OpenCL/vendors/xilinx.icd tells the ICD loader which OpenCL implementations (ICDs) are installed on the system. Some applications link directly against the system's standard OpenCL library rather than the Xilinx-specific OpenCL library. In that case, the OpenCL API calls in the application fail if the .icd file is not set up correctly.
The problem above is caused by a missing /etc/OpenCL/vendors/xilinx.icd inside the container. Copying /etc/OpenCL/vendors/xilinx.icd (which contains the single line “/opt/xilinx/xrt/lib/libxilinxopencl.so”) from the host into the container with the following command solves the issue:
docker cp /etc/OpenCL/vendors/xilinx.icd containerID:/etc/OpenCL/vendors/xilinx.icd
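Because the file holds only that one line, it could also be created directly inside the container (or while building the image) instead of being copied from the host. The following is a minimal sketch under that assumption. The function name write_xilinx_icd is hypothetical, and the default library path is the one quoted above, which should be adjusted if XRT is installed elsewhere.

```shell
# Hypothetical helper: write a one-line xilinx.icd into the given vendors directory.
# $1: vendors directory, for example /etc/OpenCL/vendors
# $2: optional path to the Xilinx OpenCL library (defaults to the path quoted above)
write_xilinx_icd() {
    vendors_dir="$1"
    lib_path="${2:-/opt/xilinx/xrt/lib/libxilinxopencl.so}"
    # Create the vendors directory if the container image does not have it yet.
    mkdir -p "$vendors_dir" || return 1
    # The ICD loader expects one library path per line.
    printf '%s\n' "$lib_path" > "$vendors_dir/xilinx.icd"
}
```

Inside the container this would be called as write_xilinx_icd /etc/OpenCL/vendors, which typically requires root privileges.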