143 Commits

Author SHA1 Message Date
Blade He 52515fc152 1. simplify management_fee_and_costs instructions
2. optimize management_fee_and_costs instructions
3. resolve issues in complex scenarios: management_fee, recoverable_expenses, and indirect_costs need to be summed as management_fee_and_costs
2025-03-06 17:27:18 -06:00
Blade He c4ed65770d Try to support more complex management_fee_and_costs scenarios
Support calculating metrics for all data points
2025-03-05 17:21:13 -06:00
Blade He cd7e09757d check in calc_metrics to repo. 2025-03-05 09:57:02 -06:00
Blade He d00820c14d update AUS Prospectus data point configurations 2025-03-04 16:52:06 -06:00
Blade He f4b4d00f58 optimize instructions for management fee and costs.
support dynamically loading complex instructions by keywords
2025-03-04 08:32:55 -06:00
Blade He d3be711859 optimize administration fees instructions 2025-02-28 22:12:18 -06:00
Blade He d4bc3aba4e optimize for management fees 2025-02-28 16:55:33 -06:00
Blade He d0295995d8 support checking whether the next page contains a table with the same structure as the current page.
If so, run the data extraction pipeline on the next page.
2025-02-27 23:08:57 -06:00
Blade He d0128d6279 1. optimize for administration fees.
2. optimize for management fees
2025-02-27 17:36:41 -06:00
Blade He 543cab74e1 1. get production name
2. if a data point has a production name, set the relevant data point value(s) on each fund/share
2025-02-27 12:07:49 -06:00
Blade He 412692e1c4 update keywords for management fee and costs 2025-02-27 08:34:46 -06:00
Blade He 70079d176e Support removing duplicated values, keeping only the latest ones. 2025-02-26 17:05:58 -06:00
Blade He f467945cd4 support benchmark name data extraction 2025-02-26 10:05:46 -06:00
Blade He 357bb6d580 1. support dynamically showing fund-level data examples.
2. optimize for minimum_initial_investment data point
2025-02-25 10:35:53 -06:00
Blade He e60e1fd546 move configuration files for all datapoints to "all_datapoints" folder 2025-02-24 15:23:16 -06:00
Blade He 590f7e2249 1. backup data points configurations
2. simplify data points configurations for the 11 important data points.
2025-02-24 15:21:32 -06:00
Blade He 75ea383354 support identifying the AUS Prospectus document category: MIS or Super 2025-02-24 15:08:15 -06:00
Blade He bb6862b179 update a little 2025-02-19 14:32:08 -06:00
Blade He 705933bbdd optimized for phase 2 data 2025-02-18 18:52:26 -06:00
Blade He 353bc28599 update a little 2025-02-11 11:49:53 -06:00
Blade He 01e2a0e38d add configuration for datapoints data types
update configuration for minimum initial investment
support applying the value to all funds for minimum initial investment
2025-02-05 12:08:12 -06:00
Blade He a8810519f8 optimize instructions configuration
optimize drilldown part logic
2025-02-04 15:29:24 -06:00
Blade He f9ef4cec96 update sql_query cache file store location
Cache for at most 5 days, then clean from local disk.
2025-01-31 10:59:54 -06:00
Blade He 7f37f3532f switch example document 2025-01-27 14:59:26 -06:00
Blade He 6f831e241c Merge branch 'aus_prospectus_ravi' 2025-01-27 12:32:42 -06:00
Blade He 41f8c307ff a little change 2025-01-27 12:32:36 -06:00
Blade He 47c41e492f 1. only get name mapping data from document mapping
2. Compare name mapping metrics between Ravi's and mine.
2025-01-27 12:29:49 -06:00
Blade He d9b0bed39a a little change 2025-01-22 09:57:42 -06:00
Blade He 350550d1b0 fix issue for removing item from list 2025-01-21 17:24:05 -06:00
Blade He e2b9bcbdbc initial abbreviation configurations 2025-01-21 17:09:45 -06:00
Blade He b15d260a58 migrate name mapping algorithm from Ravi 2025-01-21 16:55:08 -06:00
Blade He d41fae3dba prepare for 100 multi-funds document samples 2025-01-17 16:26:31 -06:00
Blade He b93a8d55e8 update for output data as template 2025-01-17 11:41:58 -06:00
Blade He f10ff8ee33 update for deployment 2025-01-16 20:34:43 -06:00
Blade He fb4a6402f0 support output merged data format 2025-01-16 16:31:04 -06:00
Blade He 2eace81f51 support more configurable parts 2025-01-16 13:54:45 -06:00
Blade He db0827435b supplement EMEA AR configuration files 2025-01-16 11:30:44 -06:00
Blade He 9f0e77a11e support loading configurations by doc_source parameter 2025-01-16 11:17:48 -06:00
Blade He acc30d4b72 if getting text via the PDF-to-HTML API fails, fall back to getting text with PyMuPDF. 2025-01-15 18:36:02 -06:00
Blade He ace0ac2674 a little change 2025-01-15 18:22:08 -06:00
Blade He a89aa9c4de support fetching data from Prospectus 2025-01-14 16:21:48 -06:00
Blade He e230a5bf15 a little change 2025-01-09 12:19:24 -06:00
Blade He 91c86bb983 update AUS Prospectus relevant configuration 2025-01-08 17:40:57 -06:00
Blade He 0a867dcf07 complete configuration for AUS Prospectus 2025-01-07 16:25:13 -06:00
Blade He 201a809ffa comment remove_abundant_data function 2025-01-06 15:27:43 -06:00
Blade He c335992ced update requirements.txt 2025-01-06 13:56:09 -06:00
Blade He 9348e32caa support more performance fee keywords 2025-01-06 13:14:20 -06:00
Blade He 65e752e25a implement merge_output_data function; whether to output in this format depends on confirmation with the data/developer teams 2024-12-18 09:19:55 -06:00
Blade He 309bb714f6 fix issue for parsing data via Vision Function. 2024-12-11 16:49:04 -06:00
Blade He d673a99e21 switch back to extract data from image stream directly, instead of getting text from image stream as the first step, then extract data from extracted text.
The reason is that the quality of the text extracted from the image stream is not good enough.
2024-12-10 16:17:47 -06:00