OOM error caused by JSON-LD errors #403

@antoineeripret

Hi @eliasdabbas,

If you crawl a website that happens to have a lot of JSON-LD parsing errors, the following handler is triggered (source):

except Exception as e:
    jsonld = {"jsonld_errors": str(e)}
    self.logger.exception(
        " ".join([str(e), str(response.status), response.url])
    )

This can lead to an OOM because jsonld includes the full error string for every failing page. With a dozen pages it doesn't really matter, but on a larger crawl with a lot of errors it does.

Do we really need to store the full error? Can't we use something simpler?
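
As a rough sketch of what "something simpler" could look like, the stored value could be limited to the exception class name plus a truncated message. The helper name summarize_error and the 200-character cap MAX_ERROR_CHARS are hypothetical, chosen only for illustration; they are not part of the library:

```python
# Hypothetical cap on the stored message length; the right value is open for discussion.
MAX_ERROR_CHARS = 200


def summarize_error(exc, max_chars=MAX_ERROR_CHARS):
    """Return a short, bounded description of an exception.

    Keeps the exception class name and at most `max_chars` characters
    of its message, so the stored string has a fixed upper size.
    """
    message = str(exc)
    if len(message) > max_chars:
        message = message[:max_chars] + "..."
    return f"{type(exc).__name__}: {message}"


# Stand-in for a JSON-LD parsing failure with a very long error message.
try:
    raise ValueError("x" * 10_000)
except Exception as e:
    jsonld = {"jsonld_errors": summarize_error(e)}
```

With something like this, each failing page would store a bounded string instead of the whole error text. The full message could still go to the log if it is needed for debugging.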

Happy to provide a PR, but I wanted to discuss the approach with you first :)
