DRC scripts, reports and parallelism

Hi,

Hope you are doing well,

In the context of gf180, I've been experimenting with DRC: how to make it faster and how to improve quality of life from a GUI user's point of view.
My initial issue was the speed of the DRC deck. For a fairly simple full-chip design with deep DRC, I was getting run times of more than 20 minutes on my beefy computer when starting the DRC from the GUI.
I knew there should be some way to improve this, because CPU utilization was generally low, albeit with some large spikes. That gave me the feeling that the process was memory-bound rather than CPU-bound.
The gf180 PDK (same with IHP) also provides a Python utility in which each rule file is run in its own klayout process (roughly as sketched after this list). This improves speed considerably (~5x), but suffers from a few issues:
1) Each run generates its own report, so checking whether the design is DRC clean is a nightmare
2) The general layers have to be recomputed in each process. Depending on the number of cores this does not always translate into a longer run time, but it is a clear limitation. Moreover, the general layers are duplicated in RAM for each process, which increases memory usage.
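
To make the discussion concrete, here is a minimal sketch of that per-process scheme. The rule file names, the GDS path and the -rd variable names are placeholders I made up; only the klayout -b / -r / -rd batch flags are the real interface.

```ruby
# Hedged sketch of the "one klayout process per rule file" scheme.
rule_files = ["metal1_rules.drc", "poly_rules.drc", "via_rules.drc"]

pids = rule_files.map do |drc|
  report = drc.sub(".drc", ".lyrdb")
  # Every subprocess reloads the layout and recomputes its general layers,
  # and each one writes its own report file.
  Process.spawn("klayout", "-b",
                "-r", drc,
                "-rd", "input=full_chip.gds",
                "-rd", "report=#{report}")
end

pids.each { |pid| Process.waitpid(pid) }
```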

I've experimented a bit and implemented my own DRC approach to try to solve both issues. It is written entirely in Ruby using forks and performs well (similar performance to the Python script, with untapped potential and lower RAM usage).
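
The actual script is more involved, but the shape of the idea is roughly the following skeleton. build_general_layers and run_rule_group are hypothetical placeholders, not the real code; the point is only that the expensive shared state is built once in the parent and inherited copy-on-write by the forked children, each of which writes its own report file.

```ruby
# Hypothetical skeleton of the fork-based approach (not the real script).

def build_general_layers
  # placeholder: derive the general/common layers once, up front
  {}
end

def run_rule_group(name, layers)
  # placeholder: run the checks of this group against the shared layers
  # and write "#{name}.lyrdb"
end

shared = build_general_layers

pids = ["metal1_rules", "poly_rules", "via_rules"].map do |group|
  # Children inherit `shared` via copy-on-write instead of recomputing it.
  Process.fork { run_rule_group(group, shared) }
end

pids.each { |pid| Process.waitpid(pid) }
```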

Here are the issues currently preventing it from being merged into the upstream PDK.

GUI Cancel

Because something in Qt or the graphics driver panics when joining threads in the GUI, I have to run the forking script in a klayout batch-mode process. This means that (1) the project has to be saved to disk before running DRC and, more importantly, that (2) there is currently no way of canceling a DRC run from the GUI.
(2) is because there seems to be no cancel flag exposed from the C++ side to the Ruby API. From my understanding, the cancel action causes the C++-side DRC calculation to throw an exception, which is then caught on the Ruby side. I do not think I can use this mechanism in a clean way in my case.
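
For context, the launch boils down to little more than saving the layout and spawning the batch process, roughly like below (paths and the driver script name are placeholders). After the spawn, the only handle left on the run is the pid.

```ruby
# Rough illustration of the launch; paths and the script name are made up.
gds = "/tmp/full_chip.gds"
RBA::CellView.active.layout.write(gds)        # (1) the project must be saved first

pid = Process.spawn("klayout", "-b",
                    "-r", "parallel_drc.rb",  # hypothetical forking driver script
                    "-rd", "input=#{gds}")
Process.detach(pid)

# The closest thing to a cancel today would be Process.kill("TERM", pid),
# but nothing ties that to the GUI's cancel action.
```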

Report merging and report files

My first issue with DRC reports is that there is currently no way of obtaining one without creating a file. That means that for parallel DRC runs I have to create one temporary file per run, which is not so clean. I implemented basic functionality for getting reports as a string rather than a file here, but I don't think it is very good, so I reverted to using files.
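
So today each forked run gets a throw-away report file, along these lines (just a sketch; the group names are placeholders):

```ruby
require "tmpdir"

# One temporary .lyrdb per run; the parent merges them afterwards and the
# directory is cleaned up automatically when the block exits.
Dir.mktmpdir("drc_reports") do |dir|
  report_files = ["metal1_rules", "poly_rules"].map do |group|
    File.join(dir, "#{group}.lyrdb")
  end
  # ... fork the runs, have each write its report_files entry, then merge ...
end
```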

My second and bigger issue is that there is no way of merging report files so that they are presented cleanly in the GUI report view. In the present case only a simple approach is needed, since the top cell of all report files is guaranteed to be the same. I've implemented this outside klayout here. Be warned, that specific part is mostly AI-generated (the goal was a proof of concept).
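
For reference, a script-level merge along the lines of what I prototyped could look roughly like this. It is only a sketch and assumes the RBA::ReportDatabase accessors (load, save, cell_by_qname, category_by_path, cell_by_id, category_by_id, create_item, each_value, add_value) behave the way I read the documentation; the file names are placeholders.

```ruby
# Hedged sketch: merge several .lyrdb files that share the same top cell.
def merge_reports(report_files, output_file)
  merged = RBA::ReportDatabase.new("DRC merged")

  report_files.each do |path|
    src = RBA::ReportDatabase.new("part")
    src.load(path)

    src.each_item do |item|
      src_cell = src.cell_by_id(item.cell_id)
      src_cat  = src.category_by_id(item.category_id)

      # Reuse the cell/category if it already exists in the merged database.
      dst_cell = merged.cell_by_qname(src_cell.name) || merged.create_cell(src_cell.name)
      dst_cat  = merged.category_by_path(src_cat.name) || merged.create_category(src_cat.name)

      dst_item = merged.create_item(dst_cell.rdb_id, dst_cat.rdb_id)
      item.each_value { |v| dst_item.add_value(v) }
    end
  end

  merged.save(output_file)
end

merge_reports(["metal1_rules.lyrdb", "poly_rules.lyrdb"], "full_chip_drc.lyrdb")
```

This ignores category descriptions and sub-categories, which should be acceptable for the simple case where all reports come from the same top cell.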

Conclusion

Do you think that this approach to DRC is sane?
How would you see the cancel functionality implemented? Is there some way I did not think of, or is new functionality needed on the klayout side?
Same question for the report file merging. If you can guide me a bit (where to implement this function), I could try to make an MR.

Thanks!

Comments

  • Hi @Scafir,

    I'd not try parallelization inside the application. Qt and large parts of the application are not thread-safe. You need to provide strict thread isolation and there are too many side effects beyond your control. Specifically I don't trust the script engines and the native modules involved. A single breach in thread safety will give you bugs that happen rarely and are extremely difficult to debug.

    I think the safest approach is to split the job into subprocesses and join the results. I don't understand what is so bad about files. After all, that's still the most reliable and portable way to exchange information between subprocesses. Threads are fashionable, but the sharing of memory creates too many issues and I don't want to waste my time with hunting rare glitches. A subprocess is easy to kill.

    But I think you'd need some kind of DRC dashboard - some UI that shows the running subprocesses and their status and has a kill button to stop all these subprocesses. At least that's how I would envision some farm-enabled DRC.

    Merging of DRC reports: there is no single function to do that right now, but the report database has an API, so it should be possible to do the merging in a script. Still, a merge feature is an enhancement worth considering. There is a method called "apply" which does some attribute merging, but it is intended for applying waiver flags from a waiver database.

    A general note aside: DRC performance can usually be optimized significantly by rearranging the DRC deck and using alternative implementation options. GF180 suffered from some issues initially and I can't say whether all my suggestions were implemented. I'm willing to take a look, but I'd need a (preferably large) sample layout and a pointer to the current GF180 deck. For me that's also an opportunity to optimize the engine. Right now it is not strong in parallelization and core utilization is usually bad.

    Matthias
