Commit Graph

49 Commits

Author SHA1 Message Date
Blade He acc30d4b72 If the PDF-to-HTML API fails to return text, fall back to getting the text with pymupdf. 2025-01-15 18:36:02 -06:00
Blade He a89aa9c4de support fetching data from Prospectus 2025-01-14 16:21:48 -06:00
Blade He 0a867dcf07 complete configuration for AUS Prospectus 2025-01-07 16:25:13 -06:00
Blade He 201a809ffa comment remove_abundant_data function 2025-01-06 15:27:43 -06:00
Blade He 309bb714f6 fix issue for parsing data via Vision Function. 2024-12-11 16:49:04 -06:00
Blade He d673a99e21 Switch back to extracting data directly from the image stream, instead of first getting text from the image stream and then extracting data from that text.
Reason: the quality of the text obtained from the image stream is not good enough.
2024-12-10 16:17:47 -06:00
Blade He f71e2968cc simplify code 2024-12-09 22:24:40 -06:00
Blade He 75ea5e70de 1. support fetching data from pages with garbled text using the ChatGPT4o Vision function.
2. multilingual share features configuration
2024-12-09 17:47:42 -06:00
Blade He d96f77fe00 Split share class names when multiple share classes appear on the same line 2024-12-06 16:31:42 -06:00
Blade He a25991e2bb 1. Set TOR reported name priority
2. Optimize investment mapping logic
2024-12-06 09:54:43 -06:00
Blade He 95c386911c Clean fund name after getting response from ChatGPT 2024-12-04 22:08:09 -06:00
Blade He 70362b554f Fix issue for "The last fund name of previous PDF page" logic:
If the current page's fund name starts with "The last fund name of previous PDF page" and has more content below it, then remove "The last fund name of previous PDF page".
2024-12-04 16:57:52 -06:00
Blade He 36fbaa946e Add the statement when transferring the last fund name of previous PDF page:
The last fund name of previous PDF page:
page_text = f"\nThe last fund name of previous PDF page: {previous_page_fund_name}\n{page_text}"
2024-12-03 11:50:31 -06:00
Blade He a11a99fdc3 1. Optimize instructions: do not fetch data from statements containing "up to".
2. Add exception handler in function.
2024-12-03 11:27:28 -06:00
Blade He bc32860f87 remove_abundant_data 2024-12-02 17:16:56 -06:00
Blade He 843bbbd13f dynamically load instructions for multilingual support. 2024-11-20 17:00:22 -06:00
Blade He 2645d528b1 support output data point reported name 2024-10-29 16:47:45 -05:00
Blade He 9d453c9fae minor updates 2024-10-28 15:15:55 -05:00
Blade He 3f2bb38208 Resolve issue where the first records have only a share class name and no fund name (the fund name is in the previous page text). 2024-10-16 16:55:32 -05:00
Blade He f166e73362 optimize data extraction algorithm: if no numeric cost value can be found in the PDF page text, extract the data with ChatGPT Vision 2024-10-15 15:57:54 -05:00
Blade He df66489c5f support this scenario: fund and share have the same name. 2024-10-11 13:14:04 -05:00
Blade He 17284c74f0 optimize for investment mapping: share feature logic 2024-10-09 14:07:07 -05:00
Blade He 04a2409c58 optimize investment mapping algorithm 2024-10-08 23:53:55 -05:00
Blade He aa2c2332ae optimize for more cases 2024-10-08 17:16:01 -05:00
Blade He d92053a16e optimize mapping metrics algorithm 2024-10-01 12:19:45 -05:00
Blade He 18174bf1cf optimize mapping: choose proper candidates mapping list. 2024-10-01 11:35:29 -05:00
Blade He 60a26377e5 optimize investment mapping algorithm 2024-09-30 16:32:56 -05:00
Blade He 3aa596ea33 optimize mapping logic 2024-09-27 16:39:56 -05:00
Blade He 39cd53dc33 support calculating mapping metrics based on document investment mapping in database 2024-09-27 13:20:50 -05:00
Blade He 598e2ab820 investment mapping: optimize for currency logic 2024-09-25 17:28:22 -05:00
Blade He dd6701f18c 1. optimize investment mapping algorithm
2. implement investment mapping metrics
2024-09-25 15:15:38 -05:00
Blade He 0f14bf4a7a 1. get document/provider mapping data
2. optimize metrics algorithm
3. Increase max token length after switching ChatGPT4o to the 2024-08-06 version.
2024-09-23 17:21:02 -05:00
Blade He 8496c7b5ed optimize instructions
optimize metrics algorithm
2024-09-20 16:46:44 -05:00
Blade He 91530d6089 add more description for Performance Fees calculation rules 2024-09-20 11:58:48 -05:00
Blade He c4985ac75f optimize data extraction and metrics calculation algorithms 2024-09-19 22:45:08 -05:00
Blade He 48dc8690c3 support extracting data from PDF page images 2024-09-19 16:29:26 -05:00
Blade He 67371e534e only calculate metrics for the intersection of the document lists 2024-09-19 11:54:51 -05:00
Blade He 27b3540c63 optimize metrics calculation algorithm 2024-09-19 11:44:17 -05:00
Blade He 98e86a6cfd implement calculation of data extraction metrics. 2024-09-18 17:10:54 -05:00
Blade He 50e6c3c19d minor change 2024-09-16 16:43:03 -05:00
Blade He 932870f406 support splitting text when the output exceeds 4K tokens. 2024-09-16 12:03:13 -05:00
Blade He e17414173a update to get more precise results 2024-09-12 16:00:49 -05:00
Blade He 0887608719 support auto-mapping fund/share by raw names. 2024-09-09 17:34:53 -05:00
Blade He 878383a72c support extracting consecutive page(s) so that data on a following page without a table header is not missed. 2024-09-06 16:29:35 -05:00
Blade He 1caf552065 support extracting data with ChatGPT4o.
The instructions are generated dynamically.
2024-09-05 17:22:26 -05:00
Blade He 7c83f9152a try to improve page filter precision 2024-09-04 17:01:12 -05:00
Blade He 7198450e53 support calculating page filter metrics. 2024-09-03 17:07:53 -05:00
Blade He 32676728f6 optimize prompts 2024-08-28 10:21:26 -05:00
Blade He 6519dc23d4 support filtering pages by data point keywords 2024-08-23 16:38:11 -05:00