Commit Graph

154 Commits

Author SHA1 Message Date
Ravi Maheshwari 76fbb7c071 Merge branches 'aus_prospectus_ravi' and 'aus_prospectus_ravi' of https://msstash.morningstar.com/scm/dc/dc-ml-emea-ar into aus_prospectus_ravi 2025-03-12 14:16:27 +05:30
Blade He c7c36dbdd2 1. update performance_fee name to performance_fee_costs 2. support extract data for total_annual_dollar_based_charges 2025-03-11 17:15:39 -05:00
Blade He b7506c78f3 Add API code file 2025-03-10 16:00:17 -05:00
Blade He e9f6383258 apply configuration file to replace disorder table header contents 2025-03-10 11:09:00 -05:00
Blade He 2548606ccc a little change 2025-03-10 08:20:01 -05:00
Blade He 604ab326a7 a little change 2025-03-08 21:50:44 -06:00
Blade He 4ee762963e optimized for management_fee_and_costs and administration_fees 2025-03-08 21:40:00 -06:00
Blade He fa2dede454 optimize for management_fee_and_costs and management_fee 2025-03-07 18:38:36 -06:00
Blade He 2cd4f5f787 Supplement provider information to ground truth data; Calculate metrics based on providers; Integrate "merge" data algorithm for AUS Prospectus final outputs 2025-03-07 15:02:12 -06:00
Blade He 52515fc152 1. simplify management_fee_and_costs instructions 2. optimize management_fee_and_costs instructions 3. resolve the issues for complex scenarios: need sum management_fee, recoverable_expenses, indirect_costs as management_fee_and_costs 2025-03-06 17:27:18 -06:00
Blade He c4ed65770d Try to support more complex management_fee_and_costs scenarios; Support calculate all of data points metrics 2025-03-05 17:21:13 -06:00
Blade He cd7e09757d check in calc_metrics to repo. 2025-03-05 09:57:02 -06:00
Ravi Maheshwari fdcb4b2ec0 Merge branch 'main' of https://msstash.morningstar.com/scm/dc/dc-ml-emea-ar into aus_prospectus_ravi 2025-03-05 12:01:12 +05:30
Blade He d00820c14d update AUS Prospectus data point configurations 2025-03-04 16:52:06 -06:00
Blade He f4b4d00f58 optimize instructions for management fee and costs. support dynamic loading complex instructions by keywords 2025-03-04 08:32:55 -06:00
Blade He d3be711859 optimize administration fees instructions 2025-02-28 22:12:18 -06:00
Blade He d4bc3aba4e optimize for management fees 2025-02-28 16:55:33 -06:00
Blade He d0295995d8 support judge whether next page contents with same structure table as current page. If yes, handle next page data extraction pipeline. 2025-02-27 23:08:57 -06:00
Blade He d0128d6279 1. optimize for administration fees. 2. optimize for management fees 2025-02-27 17:36:41 -06:00
Blade He 543cab74e1 1. get production name 2. if some data point with production name, set each fund/ share with relevant data point value(s) 2025-02-27 12:07:49 -06:00
Blade He 412692e1c4 update keywords for management fee and costs 2025-02-27 08:34:46 -06:00
Blade He 70079d176e Support remove duplicated values to keep the values to be the latest ones. 2025-02-26 17:05:58 -06:00
Blade He f467945cd4 support benchmark name data extraction 2025-02-26 10:05:46 -06:00
Blade He 357bb6d580 1. support dynamic show fund level data examples. 2. optimize for minimum_initial_investment data point 2025-02-25 10:35:53 -06:00
Blade He e60e1fd546 move configuration files for all datapoints to "all_datapoints" folder 2025-02-24 15:23:16 -06:00
Blade He 590f7e2249 1. backup data points configurations 2. simplify data points configurations for important 11 data points. 2025-02-24 15:21:32 -06:00
Blade He 75ea383354 support identify aus prospectus document category: MIS or Super 2025-02-24 15:08:15 -06:00
Blade He f7d53acdde support get sqlpass api by configuration 2025-02-19 14:37:21 -06:00
Blade He bb6862b179 update a little 2025-02-19 14:32:08 -06:00
Blade He 705933bbdd optimized for phase 2 data 2025-02-18 18:52:26 -06:00
Blade He 353bc28599 update a little 2025-02-11 11:49:53 -06:00
Blade He 01e2a0e38d add configuration for datapoints data types; update configuration for minimum initial investment; support apply value to all of funds for minimum initial investment 2025-02-05 12:08:12 -06:00
Blade He a8810519f8 optimize instructions configuration; optimize drilldown part logic 2025-02-04 15:29:24 -06:00
Blade He f9ef4cec96 update sql_query cache file store location; At most cache 5 days, then clean from local disk. 2025-01-31 10:59:54 -06:00
Blade He 7f37f3532f switch example document 2025-01-27 14:59:26 -06:00
Blade He 6f831e241c Merge branch 'aus_prospectus_ravi' 2025-01-27 12:32:42 -06:00
Blade He 41f8c307ff a little change 2025-01-27 12:32:36 -06:00
Blade He 47c41e492f 1. only get name mapping data from document mapping 2. Compare name mapping metrics between Ravi's and mine. 2025-01-27 12:29:49 -06:00
Blade He d9b0bed39a a little change 2025-01-22 09:57:42 -06:00
Blade He 350550d1b0 fix issue for removing item from list 2025-01-21 17:24:05 -06:00
Blade He e2b9bcbdbc initial abbreviation configurations 2025-01-21 17:09:45 -06:00
Blade He b15d260a58 migrate name mapping algorithm from Ravi 2025-01-21 16:55:08 -06:00
Blade He d41fae3dba prepare for 100 multi-funds document samples 2025-01-17 16:26:31 -06:00
Blade He b93a8d55e8 update for output data as template 2025-01-17 11:41:58 -06:00
Blade He f10ff8ee33 update for deployment 2025-01-16 20:34:43 -06:00
Blade He fb4a6402f0 support output merged data format 2025-01-16 16:31:04 -06:00
Blade He 2eace81f51 support more configurable parts 2025-01-16 13:54:45 -06:00
Blade He db0827435b supplement EMEA AR configuration files 2025-01-16 11:30:44 -06:00
Blade He 9f0e77a11e support load configurations by doc_source parameter 2025-01-16 11:17:48 -06:00
Blade He acc30d4b72 if fail to get text by pdf to html API, then try to get text by pymupdf. 2025-01-15 18:36:02 -06:00