Modify Python script to allow for more concurrent tasks (Debian Linux)
$30-75 USD
Cancelled
Posted over 12 years ago
Paid on delivery
I have a Python script that creates multiple threads. When it starts more than 500 threads, I get this output:
Traceback (most recent call last):
  File "./[login to view URL]", line 391, in <module>
    [login to view URL]()
  File "/usr/lib/python2.6/[login to view URL]", line 474, in start
    _start_new_thread(self.__bootstrap, ())
[login to view URL]: can't start new thread
At 500 threads, CPU usage is low and 15903092k of RAM is free.
The purpose of the script is to download websites and scan them for keywords. Essentially, it is a web crawler.
It appears that the limiting factors are currently stack size and the global interpreter lock.
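One way to check whether stack size or a per-user limit is what stops new threads is to print the relevant limits before starting the crawl. The following is a minimal sketch, assuming Python 3 on Linux (the script in this post runs on Python 2.6); `thread_limits` is a hypothetical helper name, not part of the existing script.

```python
import resource
import threading


def thread_limits():
    """Collect limits that commonly cap thread creation on Linux."""
    # Soft limit on the stack size of the process (bytes, or -1 for unlimited).
    stack_soft, _stack_hard = resource.getrlimit(resource.RLIMIT_STACK)
    # Soft limit on the number of processes/threads the user may own.
    nproc_soft, _nproc_hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        # 0 means Python uses the platform default stack size for new threads.
        "python_thread_stack_size": threading.stack_size(),
        "rlimit_stack_soft": stack_soft,
        "rlimit_nproc_soft": nproc_soft,
    }


if __name__ == "__main__":
    for name, value in thread_limits().items():
        print(name, value)
```

If the stack limit is large and the thread count is high, the reserved stack space alone may be what exhausts the address space, even with plenty of free RAM.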
This project is to:
1. Remove the need to change the stack size and to set a maximum thread limit within the code. I suggest doing this by moving away from a threaded design, but I'm open to discussion about this.
2. Overcome the global interpreter lock's limitation to one CPU. The script must run on 8+ CPUs.
3. Currently, certain websites cause threads to segfault or hang. You need to implement appropriate error handling so that the script logs an error and continues.
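The three requirements above could be approached together by running each download in its own child process under a single supervising loop. Separate processes each have their own interpreter lock, and a child that crashes or hangs can be detected and killed without taking the crawler down. This is only a sketch under stated assumptions, not the script's actual design: it assumes Python 3.7+ on Linux (it uses the `fork` start method), and `crawl`, `_child`, `task`, `max_procs`, and `timeout` are hypothetical names. `task` stands in for whatever function downloads and scans one URL.

```python
import collections
import logging
import multiprocessing
import os
import time
from multiprocessing.connection import wait

log = logging.getLogger("crawler")
_ctx = multiprocessing.get_context("fork")  # assumption: Linux only


def _child(conn, task, url):
    """Run in the child: send back the result or a description of the error."""
    try:
        conn.send(("ok", task(url)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def crawl(urls, task, max_procs=None, timeout=30.0):
    """Run task(url) in one child process per URL, at most max_procs at once.

    Returns {url: (status, payload)} where status is "ok", "error",
    "crashed" (payload is the exit code) or "timeout".
    """
    max_procs = max_procs or os.cpu_count() or 1
    pending = collections.deque(urls)
    running = {}  # read end of pipe -> (url, process, deadline)
    results = {}
    while pending or running:
        # Start children until the concurrency cap is reached.
        while pending and len(running) < max_procs:
            url = pending.popleft()
            reader, writer = _ctx.Pipe(duplex=False)
            proc = _ctx.Process(target=_child, args=(writer, task, url))
            proc.start()
            writer.close()  # only the child keeps the write end open
            running[reader] = (url, proc, time.monotonic() + timeout)
        # Sleep until a child reports or dies, or the nearest deadline passes.
        nearest = min(deadline for _, _, deadline in running.values())
        ready = wait(list(running), max(0.0, nearest - time.monotonic()))
        now = time.monotonic()
        for reader in list(running):
            url, proc, deadline = running[reader]
            if reader in ready:
                try:
                    outcome = reader.recv()
                except EOFError:
                    # Pipe closed with no message: the child died (e.g. segfault).
                    proc.join()
                    outcome = ("crashed", proc.exitcode)
            elif now >= deadline:
                proc.kill()  # hung child: stop it and move on
                outcome = ("timeout", None)
            else:
                continue
            proc.join()
            reader.close()
            del running[reader]
            if outcome[0] != "ok":
                log.error("%s failed: %s %r", url, *outcome)
            results[url] = outcome
    return results
```

Because the work is mostly waiting on the network, `max_procs` would likely be set well above the CPU count. One process per URL is simple but costs more than a thread, so a pool of long-lived workers may be a better fit if startup overhead turns out to matter.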