Commit Graph

22 Commits

Author SHA1 Message Date
Blade He 2cd4f5f787 Supplement provider information to ground truth data
Calculate metrics based on providers
Integrate "merge" data algorithm for AUS Prospectus final outputs
2025-03-07 15:02:12 -06:00
Blade He 75ea383354 support identifying AUS Prospectus document category: MIS or Super 2025-02-24 15:08:15 -06:00
Blade He 353bc28599 update a little 2025-02-11 11:49:53 -06:00
Blade He d41fae3dba prepare for 100 multi-funds document samples 2025-01-17 16:26:31 -06:00
Blade He ace0ac2674 a little change 2025-01-15 18:22:08 -06:00
Blade He a89aa9c4de support fetching data from Prospectus 2025-01-14 16:21:48 -06:00
Blade He 0a867dcf07 complete configuration for AUS Prospectus 2025-01-07 16:25:13 -06:00
Blade He 65e752e25a implement merge_output_data function; whether to output in this format depends on confirmation with the data/developer teams 2024-12-18 09:19:55 -06:00
Blade He 309bb714f6 fix issue with parsing data via Vision Function. 2024-12-11 16:49:04 -06:00
Blade He bc32860f87 remove_abundant_data 2024-12-02 17:16:56 -06:00
Blade He 0349033eaf update to add more statistics methods 2024-11-06 16:39:42 -06:00
Blade He 81a424b00d Support replacing share class names in the database with more readable ones.
Examples from document 532422720:
M&G European Credit Investment Fund A CHFH Acc -> M&G European Credit Investment Fund A CHF H Accumulation

M&G European Credit Investment Fund A CHFHInc -> M&G European Credit Investment Fund A CHF H Income

M&G European High Yield Credit Investment Fund E GBPHedgedAcc -> M&G European High Yield Credit Investment Fund E GBP Hedged Accumulation
2024-11-05 11:14:56 -06:00
Blade He 9d453c9fae a few small updates 2024-10-28 15:15:55 -05:00
Blade He aa2c2332ae optimize for more cases 2024-10-08 17:16:01 -05:00
Blade He d25bae936c Optimize investment mapping algorithm. 2024-09-26 12:18:37 -05:00
Blade He 0f14bf4a7a 1. get document/provider mapping data
2. optimize metrics algorithm
3. Expand max token length after switching ChatGPT4o to the 2024-08-06 version.
2024-09-23 17:21:02 -05:00
Blade He 98e86a6cfd implement calculation of data extraction metrics. 2024-09-18 17:10:54 -05:00
Blade He 878383a72c support extracting continuous page(s) so that next-page data without a table header is not missed. 2024-09-06 16:29:35 -05:00
Blade He 6519dc23d4 support filtering pages by data point keywords 2024-08-23 16:38:11 -05:00
Blade He 993664cf78 add many functions to prepare data. 2024-08-22 10:37:56 -05:00
Blade He f91e0cf1a8 auto-fix json data format 2024-08-19 17:59:32 -05:00
Blade He fa46b45ad5 support outputting tables in markdown format from PDF documents 2024-08-19 15:49:45 -05:00
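
The share class renaming described in commit 81a424b00d can be illustrated with a minimal sketch. This is not the repository's implementation: the function name `expand_share_class_name`, the currency list, and the abbreviation table are all hypothetical and cover only the three examples quoted in that commit message. It assumes the pattern is a currency code, an optional hedge marker (`H` or `Hedged`), and a distribution abbreviation (`Acc` or `Inc`), possibly fused together.

```python
import re

# Hypothetical lookup tables; a real system would likely need many more entries.
CURRENCIES = ("CHF", "GBP", "EUR", "USD")
DISTRIBUTION = {"Acc": "Accumulation", "Inc": "Income"}

# Currency code, optional hedge marker, optional whitespace, distribution abbreviation.
_PATTERN = re.compile(
    r"\b(?P<ccy>" + "|".join(CURRENCIES) + r")"
    r"(?P<hedge>Hedged|H)?"
    r"\s*(?P<dist>Acc|Inc)\b"
)


def expand_share_class_name(name: str) -> str:
    """Split fused currency/hedge/distribution tokens and expand abbreviations."""

    def _repl(match: re.Match) -> str:
        parts = [match.group("ccy")]
        if match.group("hedge"):
            parts.append(match.group("hedge"))
        parts.append(DISTRIBUTION[match.group("dist")])
        return " ".join(parts)

    # Names that do not match the pattern are returned unchanged.
    return _PATTERN.sub(_repl, name)


if __name__ == "__main__":
    for raw in (
        "M&G European Credit Investment Fund A CHFH Acc",
        "M&G European Credit Investment Fund A CHFHInc",
        "M&G European High Yield Credit Investment Fund E GBPHedgedAcc",
    ):
        print(raw, "->", expand_share_class_name(raw))
```

A regex with named groups keeps the three parts explicit, so adding a new currency or abbreviation only means extending the tables rather than changing the matching logic.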