{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":180136168,"defaultBranch":"master","name":"trafilatura","ownerLogin":"adbar","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2019-04-08T11:38:48.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/2125866?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1717259688.0","currentOid":""},"activityList":{"items":[{"before":null,"after":"847e6c1d41205b908b1edb44d775d1a8f0e804ca","ref":"refs/heads/dependabot/pip/dependencies-98f4b8e004","pushedAt":"2024-06-01T16:34:48.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"dependabot[bot]","name":null,"path":"/apps/dependabot","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/29110?s=80&v=4"},"commit":{"message":"build(deps): bump the dependencies group with 5 updates\n\nBumps the dependencies group with 5 updates:\n\n| Package | From | To |\n| --- | --- | --- |\n| [trafilatura](https://github.com/adbar/trafilatura) | `1.8.1` | `1.10.0` |\n| [html2text](https://github.com/Alir3z4/html2text) | `2020.1.16` | `2024.2.26` |\n| [html-text](https://github.com/zytedata/html-text) | `0.5.2` | `0.6.2` |\n| [justext](https://github.com/miso-belica/jusText) | `3.0.0` | `3.0.1` |\n| [resiliparse](https://github.com/chatnoir-eu/chatnoir-resiliparse) | `0.14.5` | `0.14.7` |\n\n\nUpdates `trafilatura` from 1.8.1 to 1.10.0\n- [Release notes](https://github.com/adbar/trafilatura/releases)\n- [Changelog](https://github.com/adbar/trafilatura/blob/master/HISTORY.md)\n- [Commits](https://github.com/adbar/trafilatura/compare/v1.8.1...v1.10.0)\n\nUpdates `html2text` from 2020.1.16 to 2024.2.26\n- [Release notes](https://github.com/Alir3z4/html2text/releases)\n- [Changelog](https://github.com/Alir3z4/html2text/blob/master/ChangeLog.rst)\n- [Commits](https://github.com/Alir3z4/html2text/compare/2020.1.16...2024.2.26)\n\nUpdates `html-text` from 0.5.2 to 0.6.2\n- [Changelog](https://github.com/zytedata/html-text/blob/master/CHANGES.rst)\n- [Commits](https://github.com/zytedata/html-text/compare/0.5.2...0.6.2)\n\nUpdates `justext` from 3.0.0 to 3.0.1\n- [Release notes](https://github.com/miso-belica/jusText/releases)\n- [Changelog](https://github.com/miso-belica/jusText/blob/main/CHANGELOG.rst)\n- [Commits](https://github.com/miso-belica/jusText/compare/v3.0.0...v3.0.1)\n\nUpdates `resiliparse` from 0.14.5 to 0.14.7\n- [Commits](https://github.com/chatnoir-eu/chatnoir-resiliparse/compare/v0.14.5...v0.14.7)\n\n---\nupdated-dependencies:\n- dependency-name: trafilatura\n dependency-type: direct:production\n update-type: version-update:semver-minor\n dependency-group: dependencies\n- dependency-name: html2text\n dependency-type: direct:production\n update-type: version-update:semver-major\n dependency-group: dependencies\n- dependency-name: html-text\n dependency-type: direct:production\n update-type: version-update:semver-minor\n dependency-group: dependencies\n- dependency-name: justext\n dependency-type: direct:production\n update-type: version-update:semver-patch\n dependency-group: dependencies\n- dependency-name: resiliparse\n dependency-type: direct:production\n update-type: version-update:semver-patch\n dependency-group: dependencies\n...\n\nSigned-off-by: dependabot[bot] ","shortMessageHtmlLink":"build(deps): bump the dependencies group with 5 updates"}},{"before":"7e245aa323ebabd4d72ea7f37b3dc2a007e57650","after":null,"ref":"refs/heads/prepare_v1.10","pushedAt":"2024-05-30T15:45:29.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"}},{"before":"bbf7bec12f2d0491c9d0dfc974dab822f4a8a65c","after":"b36b6fad68b02cef00d615c5a061e78b52504e6b","ref":"refs/heads/master","pushedAt":"2024-05-30T15:45:28.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"prepare version 1.10.0 (#608)\n\n* prepare version 1.10.0\r\n\r\n* fixes","shortMessageHtmlLink":"prepare version 1.10.0 (#608)"}},{"before":"3099393e8d1c79cf6b1503220a5b5a77e050f0c4","after":"7e245aa323ebabd4d72ea7f37b3dc2a007e57650","ref":"refs/heads/prepare_v1.10","pushedAt":"2024-05-30T15:34:38.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"fixes","shortMessageHtmlLink":"fixes"}},{"before":null,"after":"3099393e8d1c79cf6b1503220a5b5a77e050f0c4","ref":"refs/heads/prepare_v1.10","pushedAt":"2024-05-30T15:29:15.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"prepare version 1.10.0","shortMessageHtmlLink":"prepare version 1.10.0"}},{"before":"9569dad27dab2ebedac74856f339b60e95124ff5","after":"bbf7bec12f2d0491c9d0dfc974dab822f4a8a65c","ref":"refs/heads/master","pushedAt":"2024-05-30T12:51:02.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"Markdown fixes: table formatting (#601)\n\n* fix: do not add new lines in markdown cells\r\n\r\n* fix: markdown tables can only have one header\r\n\r\n* fix: add a space after text before a paragraph in a cell\r\n\r\nParagraph is a block level element so we would normally have a new line before\r\nit. However, here we are in a markdown cell and can't have new lines, so add a\r\nspace. Otherwise words will get concatenated.\r\n\r\n* fix: match maximum cell count on each row\r\n\r\nMarkdown does not support colspan, but at least this way we don't lose any cell\r\ndata.\r\n\r\n* fix: cells always need to append vertical bars\r\n\r\nCurrently there was a scenario where if a cell only contains a single

with\r\nsome text but no text directly, then vertical bars would not get appended for\r\nthat cells.\r\n\r\n* Fix table processing tests and add a few more\r\n\r\nAdded the following table tests in text format:\r\n- removing new lines in cells,\r\n- only allowing a single header row,\r\n- handling colspan by appending columns.\r\n\r\n* fix: remove row span attribute once it is no longer useful","shortMessageHtmlLink":"Markdown fixes: table formatting (#601)"}},{"before":"f4bbf10fa27a741b3cb2b3d64348d89f29566e94","after":null,"ref":"refs/heads/fix_stdin_binary","pushedAt":"2024-05-28T14:33:27.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"}},{"before":"0170a9faff634f92204b63c02b2a5f56c11e2482","after":"9569dad27dab2ebedac74856f339b60e95124ff5","ref":"refs/heads/master","pushedAt":"2024-05-28T14:33:26.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"CLI fix: read standard input as binary (#607)\n\n* CLI fix: read standard input as binary\r\n\r\n* strip test output","shortMessageHtmlLink":"CLI fix: read standard input as binary (#607)"}},{"before":"c6e5a05c7a1ccc0b1889a9b0796a033c2634bd31","after":"f4bbf10fa27a741b3cb2b3d64348d89f29566e94","ref":"refs/heads/fix_stdin_binary","pushedAt":"2024-05-28T13:16:24.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"strip test output","shortMessageHtmlLink":"strip test output"}},{"before":null,"after":"c6e5a05c7a1ccc0b1889a9b0796a033c2634bd31","ref":"refs/heads/fix_stdin_binary","pushedAt":"2024-05-28T13:07:07.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"CLI fix: read standard input as binary","shortMessageHtmlLink":"CLI fix: read standard input as binary"}},{"before":"28793934248a84fccd285b05f1e6367033f8b7ba","after":"0170a9faff634f92204b63c02b2a5f56c11e2482","ref":"refs/heads/master","pushedAt":"2024-05-24T14:01:44.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"CLI file processing fixes: options, mtime and tests (#605)\n\n* CLI fixes: file processing options, mtime, and tests\r\n\r\n* use stat for efficiency","shortMessageHtmlLink":"CLI file processing fixes: options, mtime and tests (#605)"}},{"before":"01021a75fb477dc82b2ec92ab1df22a311ce18c8","after":null,"ref":"refs/heads/fix_cli_fileproc","pushedAt":"2024-05-24T14:01:44.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"}},{"before":"ce0bca085cea27b33b8271853a1325efac61556e","after":"01021a75fb477dc82b2ec92ab1df22a311ce18c8","ref":"refs/heads/fix_cli_fileproc","pushedAt":"2024-05-24T12:56:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"use stat for efficiency","shortMessageHtmlLink":"use stat for efficiency"}},{"before":null,"after":"ce0bca085cea27b33b8271853a1325efac61556e","ref":"refs/heads/fix_cli_fileproc","pushedAt":"2024-05-24T12:44:22.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"CLI fixes: file processing options, mtime, and tests","shortMessageHtmlLink":"CLI fixes: file processing options, mtime, and tests"}},{"before":"6d6ebffec552b1d7d5d8e8b068a6efe50bc7871f","after":null,"ref":"refs/heads/add_meta","pushedAt":"2024-05-22T16:27:28.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"}},{"before":"f21f2935271f9c14a75bc23e775c61118d080e45","after":"28793934248a84fccd285b05f1e6367033f8b7ba","ref":"refs/heads/master","pushedAt":"2024-05-22T16:27:27.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"metadata: add author XPath extractors (#567)\n\n* metadata: add author XPaths\r\n\r\n* add @data-testid with test","shortMessageHtmlLink":"metadata: add author XPath extractors (#567)"}},{"before":"0d56c02ddf85d96ccbbd995eeb2af5005577728c","after":"6d6ebffec552b1d7d5d8e8b068a6efe50bc7871f","ref":"refs/heads/add_meta","pushedAt":"2024-05-22T16:22:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"add @data-testid with test","shortMessageHtmlLink":"add @data-testid with test"}},{"before":"1ce0e76ced5d3589a1fbe7aa17fc35a96c3c431a","after":"f21f2935271f9c14a75bc23e775c61118d080e45","ref":"refs/heads/master","pushedAt":"2024-05-21T10:48:49.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"docs: fix typos (#603)\n\n* fix typos\r\n\r\nno functional change\r\n\r\n* Update evaldata.py\r\n\r\n---------\r\n\r\nCo-authored-by: Adrien Barbaresi ","shortMessageHtmlLink":"docs: fix typos (#603)"}},{"before":"090eda6964e99423733033230a9f805d0fdc3117","after":null,"ref":"refs/heads/fix_txt_lists","pushedAt":"2024-05-16T15:24:21.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"}},{"before":"9307d90a3c0b4639e4dc4f02b7be827a1fe5fa6d","after":"1ce0e76ced5d3589a1fbe7aa17fc35a96c3c431a","ref":"refs/heads/master","pushedAt":"2024-05-16T15:24:20.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"fix: list spacing in TXT output (#598)","shortMessageHtmlLink":"fix: list spacing in TXT output (#598)"}},{"before":"f98f557b236e8eaf646e24c68d7e28950e065473","after":"9307d90a3c0b4639e4dc4f02b7be827a1fe5fa6d","ref":"refs/heads/master","pushedAt":"2024-05-16T11:47:26.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"Port of is_probably_readerable from mozilla (#587)\n\n* port of is_probably_readerable from mozilla\r\n\r\n* change xpath selector\r\n\r\n* test: add unit test\r\n\r\n* test: fixes\r\n\r\n* fix cli test\r\n\r\n* add tests for uncovered lines and fix xpath\r\n\r\n* fix test and lint error\r\n\r\n* switch xpath to use //div/br and use parent\r\n\r\n* minor changes\r\n\r\n* order imports","shortMessageHtmlLink":"Port of is_probably_readerable from mozilla (#587)"}},{"before":null,"after":"090eda6964e99423733033230a9f805d0fdc3117","ref":"refs/heads/fix_txt_lists","pushedAt":"2024-05-16T11:35:14.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"fix: list spacing in TXT output","shortMessageHtmlLink":"fix: list spacing in TXT output"}},{"before":"7abf8fe6f903fb292f66e338ba1057cea4639baa","after":null,"ref":"refs/heads/accept_encoding","pushedAt":"2024-05-15T16:12:16.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"}},{"before":"2f66f1c2e3474b8555c82142c46831588cb8b242","after":"f98f557b236e8eaf646e24c68d7e28950e065473","ref":"refs/heads/master","pushedAt":"2024-05-15T16:12:15.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"downloads: fix deflate decoding and add optional zstd to accepted encodings (#594)\n\n* downloads: fix deflate and add optional zstd to accepted encodings\r\n\r\n* polish\r\n\r\n* better logging and minimal version","shortMessageHtmlLink":"downloads: fix deflate decoding and add optional zstd to accepted enc…"}},{"before":"23da16aaee74364234bb742f30309d4b7b35452a","after":"7abf8fe6f903fb292f66e338ba1057cea4639baa","ref":"refs/heads/accept_encoding","pushedAt":"2024-05-15T16:04:46.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"better logging and minimal version","shortMessageHtmlLink":"better logging and minimal version"}},{"before":"c5d0bf078f12b8f47611086058375a49c252af86","after":"23da16aaee74364234bb742f30309d4b7b35452a","ref":"refs/heads/accept_encoding","pushedAt":"2024-05-15T13:01:46.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"polish","shortMessageHtmlLink":"polish"}},{"before":null,"after":"c5d0bf078f12b8f47611086058375a49c252af86","ref":"refs/heads/accept_encoding","pushedAt":"2024-05-15T12:39:54.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"downloads: fix deflate and add optional zstd to accepted encodings","shortMessageHtmlLink":"downloads: fix deflate and add optional zstd to accepted encodings"}},{"before":"e114a47ed3d3577e7150bf332f388fd1a43d0d54","after":null,"ref":"refs/heads/update_justext","pushedAt":"2024-05-13T11:28:04.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"}},{"before":"ef32fe753a2f3bece8d9e191e1b00866b4710da1","after":"2f66f1c2e3474b8555c82142c46831588cb8b242","ref":"refs/heads/master","pushedAt":"2024-05-13T11:28:03.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"setup: update justext and lxml dependencies (#593)\n\n* setup: update justext and lxml dependencies\r\n\r\n* test lxml update\r\n\r\n* add CI test\r\n\r\n* restore setup clause","shortMessageHtmlLink":"setup: update justext and lxml dependencies (#593)"}},{"before":"e00b3c9f38ebd488d415a2996afd73a022ee7312","after":"e114a47ed3d3577e7150bf332f388fd1a43d0d54","ref":"refs/heads/update_justext","pushedAt":"2024-05-13T11:01:01.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"adbar","name":"Adrien Barbaresi","path":"/adbar","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2125866?s=80&v=4"},"commit":{"message":"restore setup clause","shortMessageHtmlLink":"restore setup clause"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEWc5sRgA","startCursor":null,"endCursor":null}},"title":"Activity · adbar/trafilatura"}