ALS Computing Group Brings Machine Learning Models to Beamtimes around the World
From the types of samples to the techniques used to study them, user experiences at beamlines around the world can vary, but one commonality connects them: beamtime is precious. At different facilities, users encounter different beamline controls and varying availability of computing infrastructure to process their data. Beyond familiarizing themselves with different equipment and software setups, they also need to ensure that they're collecting meaningful, consistent data no matter where they are. For the past several months, the ALS Computing group has been traveling around the world for beamtime. Their firsthand experience is informing the development of a suite of tools aimed at lowering the barriers to advanced data processing for all users.
Today’s beamtime experience
As a beamline scientist at the ALS, Dula Parkinson has helped numerous users with microtomography, a technique that can yield ten gigabytes of data in two seconds. “In many cases, users won’t have done this kind of experiment or analysis before, and they won’t have the computing infrastructure or software needed to analyze the huge amounts of complex data being produced,” he said.
Computational tools and machine-learning models can help users throughout an experiment, from adjusting the setup in real time to processing the data after the beamtime has concluded. Removing these computational bottlenecks can make limited beamtime more efficient and help users glean scientific insights more quickly.
As a former beamline scientist himself, Computing Program Lead Alex Hexemer has first-hand knowledge of the user experience. He was instrumental in the creation of a dedicated computing group at the ALS in 2018, which continues to grow in both staff numbers and diversity of expertise. A current focus for the group is to advance the user experience with intuitive interfaces.
Computing approach to beamtime
Recently, Hexemer and two of his group members, Wiebke Koepp and Dylan McReynolds, traveled to Diamond Light Source, where they worked with Beamline Scientist Sharif Ahmed to test some of their tools during a beamline experiment. “It is always useful to see other facilities from the user’s perspective,” McReynolds said. “We want our software to be usable at many facilities, so getting to test in other environments was very valuable.”
The computational infrastructure is an essential complement to the beamline instrumentation. To standardize their experiments across different microtomography beamlines, the team performed measurements on a reference material—sand with standardized size distributions. Each scan captures a “slice” from the sample; the slices then need to be reconstructed into three-dimensional images that contain 50 to 200 gigabytes of data.
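As a rough illustration of the reconstruction step, the sketch below runs a standard tomographic pipeline on synthetic projection data using the open-source TomoPy toolkit. TomoPy is an assumption here, not software the article names, and the array sizes are placeholders far smaller than a real microtomography scan.

```python
import numpy as np
import tomopy  # assumed reconstruction toolkit; not necessarily what the team uses

# Placeholder projection stack: (n_angles, n_detector_rows, n_detector_cols),
# plus flat-field and dark-field images for normalization.
proj = np.random.rand(180, 64, 64).astype(np.float32)
flat = np.ones((1, 64, 64), dtype=np.float32)
dark = np.zeros((1, 64, 64), dtype=np.float32)
theta = tomopy.angles(proj.shape[0])  # evenly spaced rotation angles over 180 degrees

norm = tomopy.normalize(proj, flat, dark)  # flat-/dark-field correction
norm = tomopy.minus_log(norm)              # linearize attenuation (Beer-Lambert)
center = tomopy.find_center(norm, theta)   # estimate the rotation axis
recon = tomopy.recon(norm, theta, center=center, algorithm='gridrec')
print(recon.shape)  # one reconstructed slice per detector row, forming a 3D volume
```

At the data volumes quoted above, reconstructions like this typically run on cluster or GPU resources rather than a laptop, which is part of the infrastructure challenge the group is working on.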
Within that data, the researchers need to glean meaningful information. "We need to segment the data," explained Hexemer. "This is sand. This is the vial holding the sand. This is air in between." Identifying the segments allows researchers to more easily decide where to take the next scan: in essence, where to move the beam to detect more sand and less vial. But this type of analysis has traditionally happened after an experiment. That means researchers might take more scans than necessary, because some scan parameters yield less informative measurements.
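To make "segmentation" concrete, the sketch below labels each pixel of a reconstructed slice as air, vial, or sand using a handful of annotated regions and a random-forest classifier on simple image features. This is a generic, hypothetical approach for illustration only; it is not the CAMERA algorithm described later, and the image contents and label values are placeholders.

```python
import numpy as np
from scipy import ndimage
from sklearn.ensemble import RandomForestClassifier

# Placeholder reconstructed slice and sparse annotations
# (0 = unlabeled, 1 = air, 2 = vial, 3 = sand).
slice_img = np.random.rand(128, 128).astype(np.float32)
labels = np.zeros(slice_img.shape, dtype=np.int8)
labels[:8, :] = 1          # a region marked as air
labels[:, :4] = 2          # a strip marked as vial wall
labels[60:70, 60:70] = 3   # a patch marked as sand

# Per-pixel features: raw intensity, a smoothed version, and an edge response.
features = np.stack([
    slice_img,
    ndimage.gaussian_filter(slice_img, sigma=2),
    ndimage.sobel(slice_img),
], axis=-1).reshape(-1, 3)

# Train on the annotated pixels, then classify every pixel in the slice.
annotated = labels.ravel() > 0
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(features[annotated], labels.ravel()[annotated])
segmentation = clf.predict(features).reshape(slice_img.shape)
```

A label map like this is what lets the software distinguish "more sand, less vial" when deciding where the next scan should go.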
Here, the computing group saw a need for users to assess the quality of their data in near real time. "The goal is to be able, at the moment when a scan comes in, to do some immediate analysis to inform the experiment further," Koepp said. "Our goal is that, algorithmically, you'll be able to greatly reduce the number of scans you need to take to get the same amount of meaningful data," McReynolds added.
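A minimal sketch of what such an on-the-fly decision loop could look like is below. Every name in it (the acquire and segment callables, the label value for sand, the threshold) is hypothetical; it only illustrates the idea of skipping scan positions that a quick segmentation shows to be mostly vial or air.

```python
import numpy as np

def fraction_of_interest(segmentation, sample_label=3):
    """Fraction of voxels labeled as sample (e.g., sand) rather than vial or air."""
    return float(np.mean(segmentation == sample_label))

def run_adaptive_scan(positions, acquire, segment, min_sample_fraction=0.2):
    """Visit candidate scan positions; keep only scans with enough sample in view.

    `acquire` and `segment` are placeholders for the beamline control call and the
    trained segmentation model, supplied by the facility and the analysis pipeline.
    """
    kept = []
    for pos in positions:
        volume = acquire(pos)           # trigger a quick scan at this stage position
        seg = segment(volume)           # run immediate segmentation on the result
        if fraction_of_interest(seg) >= min_sample_fraction:
            kept.append((pos, volume))  # worth a detailed follow-up measurement
    return kept
```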
The seed for this idea has already been planted: Berkeley Lab Staff Scientist Peter Zwart and his collaborators in CAMERA developed machine-learning algorithms for segmentation. Through beamtime at Diamond and the ALS, the computing group is expanding the functionality and testing the robustness of these algorithms. "We're replicating the experimental setup at different facilities as closely as possible," Koepp said, "because different data processing steps, different exposure times, etc., could all potentially affect model performance."
But to take advantage of these algorithms, synchrotron users need to be able to access and use powerful computational infrastructure that can process the many gigabytes, and even terabytes, of data.