37cf06a394 Confirm span pages calculation, the management fee and costs page only with management_fee_and_costs and management_fee datapoints
Blade He
2025-04-03 18:08:27 -0500
f333cc30f5 1. fit the scenario when document type is not 1 or 4, 5; 2. support the scenario: "investment fees and costs including performance" statement in performance fee data page, instead of in management fee and costs data page.
Blade He
2025-04-03 17:06:43 -0500
4b896f4460 update latest metrics based on optimized matching algorithm. Support: 2663 -> 2773; percentage of Share Matched: 80.08 -> 82.59
Blade He
2025-04-02 20:39:31 -0500
427a379b3b 1. support re-calling the ChatGPT API to match non-matched prediction fund/share names; 2. if the document has fewer than 3 funds, cancel the production name judgment logic
Blade He
2025-04-02 16:34:41 -0500
4cee95db9a fix issue for post actions
Blade He
2025-03-31 22:04:31 -0500
50e51e0894 recover main.py
Blade He
2025-03-31 17:16:05 -0500
f7b9652c75 gitignore a virtual environment directory
Russell Spence
2025-03-28 08:36:44 -0500
355b145cf7 If total_annual_dollar_based_charges is found and is divisible by 52 or 12, then set the fund name and share name to the document production name
Blade He
2025-03-28 01:33:33 -0500
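A minimal sketch of the rule described in this commit message, not the repository's actual code. The function names and the record layout are hypothetical; reading 52 and 12 as weekly and monthly charges, and skipping a zero charge, are assumptions.

```python
from decimal import Decimal


def is_periodic_charge(amount, periods=(52, 12)):
    """Return True if amount is a non-zero whole multiple of 52 or 12.

    Assumption: 52 and 12 stand for weekly and monthly charges per year.
    """
    value = Decimal(str(amount))  # Decimal avoids float remainder noise
    return value != 0 and any(value % period == 0 for period in periods)


def assign_names(record, production_name):
    """Return a copy of record named after the document production name
    when its annual dollar-based charge looks periodic (hypothetical layout)."""
    charge = record.get("total_annual_dollar_based_charges")
    if charge is not None and is_periodic_charge(charge):
        return {**record, "fund_name": production_name, "share_name": production_name}
    return record
```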
46f86b124b update instructions fund name section structure
Blade He
2025-03-28 00:51:51 -0500
8a5723c150 optimize for Entry Fee/ Nil Entry case
Blade He
2025-03-27 21:10:33 -0500
d925992326 1. Support regex keywords for complex special cases. 2. Support setting a sub-datapoints list on a complex special cases node. 3. Simplify the common management fee and costs instructions. 4. Add markdown title characters: ## or ### to instructions.
Blade He
2025-03-27 16:00:19 -0500
dc560e1e01 update metrics
Blade He
2025-03-26 23:14:28 -0500
ff2325c72d 1. fix issue when assigning values based on production name; 2. optimize instructions for extract non-necessary data by Cost of Product message
Blade He
2025-03-26 18:58:45 -0500
8ad472fb39 UPDATE metrics code file
Blade He
2025-03-24 18:00:53 -0500
dd1f8f76ae update for metrics
Blade He
2025-03-24 17:12:13 -0500
4edc4b4768 clean code
Blade He
2025-03-24 17:10:16 -0500
9be6d1296d update benchmark check logic
Blade He
2025-03-19 00:52:25 -0500
5ba39a394b 1. keep fund/ share db list before applying LLM; 2. add key words for interposed_vehicle_performance_fee_cost
Blade He
2025-03-18 22:15:31 -0500
c71936c5ff 1. optimize benchmark_name instructions; 2. documents may contain multiple identical raw fund names, so do not remove entries from unmatched_db_list when matching a relevant raw fund/share name; otherwise some raw names cannot be matched to a db name.
Blade He
2025-03-18 17:22:21 -0500
0cea2e501b For AUS Prospectus, skip calling Vision ChatGPT when the page contents have no numeric text or contain garbled text. (But keep this logic for EMEA LUX AR, because of some special provider cases in this market's documents.)
Blade He
2025-03-18 14:15:43 -0500
6614972849 Raw Code added to identify benchmark names
Ravi Maheshwari
2025-03-18 18:57:08 +0530
0ad17e338e Code added to save anomalies
Ravi Maheshwari
2025-03-18 17:56:50 +0530
2817490652 Code added to save anomalies
Ravi Maheshwari
2025-03-18 17:54:33 +0530
ad371f6584 Changed Performance matrix code to get all anomalies to analyze and Prompt to get better accuracy
Ravi Maheshwari
2025-03-18 16:43:55 +0530
b3941ee4b3 update instructions for total_annual_dollar_based_charges
Blade He
2025-03-17 15:07:02 -0500
0ce604021c updating gitignore
Ravi Maheshwari
2025-03-17 18:52:49 +0530
dc9180ca1b Rollback file name
Ravi Maheshwari
2025-03-17 17:09:48 +0530
af3d1222a6 Changes done for Bugfix: 1. SSL issue 2. Ignore Example Tables 3. Performance fee
Ravi Maheshwari
2025-03-17 17:07:08 +0530
dd15c1c48e Optimize for benchmark name
Blade He
2025-03-14 11:51:10 -0500
bceff71fa4 Set re-run parameters to be True: re_run_extract_data = True, re_run_mapping_data = True, force_save_total_data = True
Blade He
2025-03-14 04:03:22 -0500
0f65537478 optimize instructions for minimum_initial_investment
Blade He
2025-03-14 04:02:15 -0500
f539340d04 1. optimize instructions: only load the relevant fund name for investment objective, instead of the full page text with the most recent investment objective; 2. Exclude tables with only one numeric column: Cost Product
Blade He
2025-03-14 01:04:51 -0500
551f754379 Fix issue when saving data extraction data
Blade He
2025-03-13 18:36:04 -0500
a48af9ddf0 A. Metrics score, Blade's updates: 1. Set the secondary key to be the share class name, instead of the fund name; 2. Remove data points whose support is 0 when calculating the metrics; 3. Add the message list to store the error message; 4. Support saving metrics/error messages to an Excel file; 5. Support statistics for different document lists; 6. Set F1-Score as the first column in the metrics table. B. Optimize instructions for benchmark_name
Blade He
2025-03-13 17:52:06 -0500
a090b5cc9e 1. the metrics key should be the share class name: sec_name; 2. support output metrics data as Excel file; 3. Optimize instructions for performance_fee_costs
Blade He
2025-03-13 11:53:27 -0500
c7c36dbdd2 1. update performance_fee name to performance_fee_costs; 2. support extract data for total_annual_dollar_based_charges
Blade He
2025-03-11 17:15:39 -0500
b7506c78f3 Add API code file
Blade He
2025-03-10 16:00:17 -0500
e9f6383258 apply configuration file to replace disorder table header contents
Blade He
2025-03-10 11:09:00 -0500
2548606ccc a little change
Blade He
2025-03-10 08:20:01 -0500
604ab326a7 a little change
Blade He
2025-03-08 21:50:44 -0600
4ee762963e optimized for management_fee_and_costs and administration_fees
Blade He
2025-03-08 21:40:00 -0600
fa2dede454 optimize for management_fee_and_costs and management_fee
Blade He
2025-03-07 18:38:36 -0600
2cd4f5f787 Supplement provider information to ground truth data. Calculate metrics based on providers. Integrate "merge" data algorithm for AUS Prospectus final outputs.
Blade He
2025-03-07 15:02:12 -0600
52515fc152 1. simplify management_fee_and_costs instructions; 2. optimize management_fee_and_costs instructions; 3. resolve the issues for complex scenarios: need to sum management_fee, recoverable_expenses, indirect_costs as management_fee_and_costs
Blade He
2025-03-06 17:27:18 -0600
c4ed65770d Try to support more complex management_fee_and_costs scenarios. Support calculating metrics for all data points.
Blade He
2025-03-05 17:21:13 -0600
cd7e09757d check in calc_metrics to repo.
Blade He
2025-03-05 09:57:02 -0600
d00820c14d update AUS Prospectus data point configurations
Blade He
2025-03-04 16:52:06 -0600
f4b4d00f58 optimize instructions for management fee and costs. support dynamic loading complex instructions by keywords
Blade He
2025-03-04 08:32:55 -0600
d3be711859 optimize administration fees instructions
Blade He
2025-02-28 22:12:18 -0600
d4bc3aba4e optimize for management fees
Blade He
2025-02-28 16:55:33 -0600
d0295995d8 support judging whether the next page contains a table with the same structure as the current page. If yes, run the data extraction pipeline on the next page.
Blade He
2025-02-27 23:08:57 -0600
d0128d6279 1. optimize for administration fees. 2. optimize for management fees
Blade He
2025-02-27 17:36:41 -0600
543cab74e1 1. get production name; 2. if some data point comes with the production name, set each fund/share with the relevant data point value(s)
Blade He
2025-02-27 12:07:49 -0600
412692e1c4 update keywords for management fee and costs
Blade He
2025-02-27 08:34:46 -0600
70079d176e Support removing duplicated values, keeping only the latest ones.
Blade He
2025-02-26 17:05:58 -0600
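One way the de-duplication in this commit could look, as a sketch only. The key fields and record shape are hypothetical, and "latest" is assumed to mean the record that appears last in the input list.

```python
def keep_latest(records, key_fields=("fund_name", "datapoint")):
    """Drop duplicate records, keeping the last occurrence of each key.

    Assumption: later items in `records` are newer than earlier ones.
    """
    latest = {}
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        # Remove any earlier entry first so the result is ordered by
        # each key's latest occurrence, not its first one.
        latest.pop(key, None)
        latest[key] = record
    return list(latest.values())
```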
f467945cd4 support benchmark name data extraction
Blade He
2025-02-26 10:05:46 -0600
357bb6d580 1. support dynamically showing fund level data examples. 2. optimize for minimum_initial_investment data point
Blade He
2025-02-25 10:35:53 -0600
e60e1fd546 move configuration files for all datapoints to "all_datapoints" folder
Blade He
2025-02-24 15:23:16 -0600
590f7e2249 1. backup data points configurations; 2. simplify data points configurations for the 11 important data points.
Blade He
2025-02-24 15:21:32 -0600
75ea383354 support identify aus prospectus document category: MIS or Super
Blade He
2025-02-24 15:08:15 -0600
f7d53acdde support get sqlpass api by configuration
main
Blade He
2025-02-19 14:37:21 -0600
bb6862b179 update a little
Blade He
2025-02-19 14:32:08 -0600
705933bbdd optimized for phase 2 data
Blade He
2025-02-18 18:52:26 -0600
353bc28599 update a little
Blade He
2025-02-11 11:49:53 -0600
01e2a0e38d add configuration for datapoints data types; update configuration for minimum initial investment; support applying a value to all funds for minimum initial investment
Blade He
2025-02-05 12:08:12 -0600
a8810519f8 optimize instructions configuration; optimize drilldown part logic
Blade He
2025-02-04 15:29:24 -0600
f9ef4cec96 update sql_query cache file store location. Cache for at most 5 days, then clean from local disk.
Blade He
2025-01-31 10:59:54 -0600
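The 5-day cache expiry in this commit could be sketched as below. This is an illustration under assumptions, not the repository's code: the function name, the flat cache directory, and the use of file modification time as the age are all hypothetical.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 5 * 24 * 60 * 60  # 5 days, as stated in the commit message


def clean_cache(cache_dir, max_age_seconds=MAX_AGE_SECONDS, now=None):
    """Delete files in cache_dir whose mtime is older than max_age_seconds.

    Returns the sorted names of the removed files. Assumes cache files sit
    directly in cache_dir (no recursion into subdirectories).
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(cache_dir).glob("*"):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```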
7f37f3532f switch example document
Blade He
2025-01-27 14:59:26 -0600
6f831e241c Merge branch 'aus_prospectus_ravi'
Blade He
2025-01-27 12:32:42 -0600
41f8c307ff a little change
Blade He
2025-01-27 12:32:36 -0600
47c41e492f 1. only get name mapping data from document mapping; 2. Compare name mapping metrics between Ravi's and mine.
Blade He
2025-01-27 12:29:49 -0600
d9b0bed39a a little change
Blade He
2025-01-22 09:57:42 -0600
350550d1b0 fix issue for removing item from list
Blade He
2025-01-21 17:24:05 -0600
e2b9bcbdbc initial abbreviation configurations
Blade He
2025-01-21 17:09:45 -0600
b15d260a58 migrate name mapping algorithm from Ravi
Blade He
2025-01-21 16:55:08 -0600
d41fae3dba prepare for 100 multi-funds document samples
Blade He
2025-01-17 16:26:31 -0600
b93a8d55e8 update for output data as template
Blade He
2025-01-17 11:41:58 -0600
f10ff8ee33 update for deployment
Blade He
2025-01-16 20:34:43 -0600
fb4a6402f0 support output merged data format
Blade He
2025-01-16 16:31:04 -0600
2eace81f51 support more configurable parts
Blade He
2025-01-16 13:54:45 -0600
db0827435b supplement EMEA AR configuration files
Blade He
2025-01-16 11:30:44 -0600
9f0e77a11e support load configurations by doc_source parameter
Blade He
2025-01-16 11:17:48 -0600
acc30d4b72 if getting text via the pdf to html API fails, fall back to getting text with pymupdf.
Blade He
2025-01-15 18:36:02 -0600
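The fallback described in this commit can be sketched generically. The two extractors are passed in as plain callables so no real API is assumed: `primary` stands in for the pdf to html API call and `fallback` for a pymupdf-based reader, both hypothetical here. Treating empty text as a failure is also an assumption.

```python
def extract_text_with_fallback(pdf_path, primary, fallback):
    """Try primary(pdf_path); on an exception or empty text, use fallback.

    Both arguments are caller-supplied callables taking a path and
    returning a string (hypothetical stand-ins, see the note above).
    """
    try:
        text = primary(pdf_path)
        if text and text.strip():
            return text
    except Exception:
        # Any failure in the primary extractor triggers the fallback.
        pass
    return fallback(pdf_path)
```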
ace0ac2674 a little change
Blade He
2025-01-15 18:22:08 -0600